
Lecture Notes in Artificial Intelligence 1562

Subseries of Lecture Notes in Computer Science


Edited by J. G. Carbonell and J. Siekmann

Lecture Notes in Computer Science


Edited by G. Goos, J. Hartmanis and J. van Leeuwen
Berlin
Heidelberg
New York
Barcelona
Hong Kong
London
Milan
Paris
Singapore
Tokyo
Chrystopher L. Nehaniv (Ed.)

Computation for
Metaphors,
Analogy, and Agents

Series Editors

Jaime G. Carbonell, Carnegie Mellon University, Pittsburgh, PA, USA


Jörg Siekmann, University of Saarland, Saarbrücken, Germany

Volume Editor
Chrystopher L. Nehaniv
University of Hertfordshire
Faculty of Engineering and Information Sciences
College Lane, Hatfield Herts AL10 9AB, UK
E-mail: c.l.nehaniv@herts.ac.uk

Cataloging-in-Publication data applied for

Die Deutsche Bibliothek - CIP-Einheitsaufnahme


Computation for metaphors, analogy, and agents / Chrystopher L. Nehaniv (ed.). -
Berlin ; Heidelberg ; New York ; Barcelona ; Hong Kong ; London ; Milan ; Paris
; Singapore ; Tokyo : Springer, 1999
(Lecture notes in computer science ; 1562 : Lecture notes in artificial
intelligence)
ISBN 3-540-65959-5

CR Subject Classification (1998): I.2, J.4, J.5, K.4

ISBN 3-540-65959-5 Springer-Verlag Berlin Heidelberg New York

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is
concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting,
reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication
or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965,
in its current version, and permission for use must always be obtained from Springer-Verlag. Violations are
liable for prosecution under the German Copyright Law.
© Springer-Verlag Berlin Heidelberg 1999

Printed in Germany
Typesetting: Camera-ready by author
SPIN 10702939 06/3142 – 5 4 3 2 1 0 Printed on acid-free paper
Preface

Metaphor and analogy have served as powerful methods in language, cognition,


and the history of science for human agents and cultures. Software, robotic, and
living agents also show or may take advantage of such methods in interacting
with their worlds.
This is a book about `crossing the lines' from one domain into another, and
about what can then emerge. The focus of this volume is the phenomena of
meaning transfer and meaning construction between different domains (minds,
systems, technologies, cultures, etc.) and their computational structure and de-
sign. The tools of transfer include imitation, analogy, metaphor, narrativity and
interaction which support mapping, thinking, processing, learning, reasoning,
manipulating, surviving or understanding for agents coping with their worlds.
In metaphor, meaning transferred (between different agents or from one realm
to another within a single system) may constitute, for example, symbolic or non-
representational knowledge, particular sets of behaviors, a structural description
or finite-state automaton model of a physical phenomenon, cognitive models
and hierarchical categories, coordinate systems affording understanding, or a
paradigmatic viewpoint for construction of science or social reality. Meaning is
nevertheless only constructed with regard to some situated agent or observer un-
der constraints grounded in the interaction of its own structure and environment.
Good mappings and metaphors for situated agents are, moreover, not arbitrary,
but their usefulness and quality depend upon the degrees to which they respect
such grounding and structural constraints.
This volume brings together the work of researchers from various disciplines
where aspects of descriptive, mathematical, computational, or design knowledge
concerning metaphor and analogy have emerged. Such areas include, for ex-
ample, embodied intelligence, robotics, software and virtual agents, semiotics,
linguistics, cognitive science, psychology, philosophy, cultural anthropology, his-
tory of science, consciousness studies, mathematics, algebraic engineering, and
intelligent control.

April 1998 Chrystopher L. Nehaniv


Aizu-Wakamatsu City
Japan
Computation for Metaphors, Analogy & Agents
CMA2 is an international workshop organized and sponsored by the Cybernetics
and Software Systems Group and the Software Engineering Laboratory of the
University of Aizu and is supported by grants of the Fukushima Prefectural
Government, Japan.

Conference General Chair


Shoichi Noguchi University of Aizu, Japan

Scientific Program Chair


Chrystopher Nehaniv University of Aizu, Japan

Advisory Committee
Rodney A. Brooks MIT Artificial Intelligence Lab, U.S.A.
Joseph Goguen University of California, San Diego, U.S.A.
Douglas R. Hofstadter Indiana University, U.S.A.
Alex Meystel National Institute of Standards and
Technology, U.S.A.
Melanie Mitchell Santa Fe Institute, U.S.A.

International Program Committee


Meurig Beynon University of Warwick, U.K.
Lawrence Bull University of the West of England, U.K.
Zixue Cheng University of Aizu, Japan
Kerstin Dautenhahn University of Reading, U.K.
Gilles Fauconnier University of California, San Diego, U.S.A.
Robert M. French University of Liege, Belgium
Joseph Goguen University of California, San Diego, U.S.A.
Karsten Henckell New College, University of South Florida,
U.S.A.
Masami Ito Kyoto Sangyo University, Japan
Jacob L. Mey Odense University, Denmark
Alex Meystel National Institute of Standards and
Technology, U.S.A.
Chrystopher Nehaniv (Chair) University of Aizu, Japan
Minetada Osano University of Aizu, Japan
Thomas S. Ray ATR Human Information Processing Research
Labs, Japan & University of Delaware, U.S.A.
John L. Rhodes University of California at Berkeley, U.S.A.
Paul Thagard University of Waterloo, Canada
Local Organizing Committee
Qi-Ming Chen, Zixue Cheng, Tsuyoshi Ishikawa, Yuko Kesen, Takao Maeda,
Chrystopher Nehaniv, Minetada Osano, Kazuaki Yamauchi (Secretariat)

Referees
Steve Battle, Meurig Beynon, Aude Billard, Larry Bull, Zixue Cheng,
Kerstin Dautenhahn, Gilles Fauconnier, Robert M. French, Joseph Goguen,
Karsten Henckell, Masami Ito, William Martens, Jacob L. Mey, Alex Meystel,
Chrystopher Nehaniv, Minetada Osano, Thomas S. Ray, John L. Rhodes,
Paul Thagard, and other anonymous referees
Table of Contents

Introduction
Computation for Metaphors, Analogy and Agents ............................... 1
   Chrystopher L. Nehaniv (University of Aizu, Japan & University of
   Hertfordshire, U.K.)

Metaphors and Blending

Forging Connections ........................................................ 11
   Mark Turner (University of Maryland, U.S.A.)
Rough Sea and the Milky Way: 'Blending' in a Haiku Text .................... 27
   Masako K. Hiraga (University of the Air, Japan)
Pragmatic Forces in Metaphor Use: The Mechanics of Blend Recruitment
in Visual Metaphors ........................................................ 37
   Tony Veale (Dublin City University, Ireland)

Embodiment: The First Person

The Cog Project: Building a Humanoid Robot ................................. 52
   Rodney A. Brooks, Cynthia Breazeal, Matthew Marjanovic,
   Brian Scassellati, Matthew M. Williamson (MIT Artificial Intelligence
   Lab, U.S.A.)
Embodiment as Metaphor: Metaphorizing-In the Environment ................... 88
   Georgi Stojanov (SS Cyril & Methodius University, Macedonia)

Interaction: The Second Person

Embodiment and Interaction in Socially Intelligent Life-Like Agents ....... 102
   Kerstin Dautenhahn (University of Reading, U.K.)
An Implemented System for Metaphor-Based Reasoning with Special
Application to Reasoning about Agents ..................................... 143
   John A. Barnden (University of Birmingham, U.K.)
GAIA: An Experimental Pedagogical Agent for Exploring Multimodal
Interaction ............................................................... 154
   Tom Fenton-Kerr (University of Sydney, Australia)
When Agents Meet Cross-Cultural Metaphor: Can They Be Equipped to
Parse and Generate It? .................................................... 165
   Patricia O'Neill-Brown (Japan Technology Program, U.S. Dept. of
   Commerce)

Imitation: First and Second Person

Imitation and Mechanisms of Joint Attention: A Developmental Structure
for Building Social Skills on a Humanoid Robot ............................ 176
   Brian Scassellati (MIT Artificial Intelligence Lab, U.S.A.)
Figures of Speech, a Way to Acquire Language .............................. 196
   Anneli Kauppinen (University of Helsinki & Helsinki Polytechnic,
   Finland)

Situated Mapping: Space and Time

"Meaning" through Clustering by Self-Organization of Spatial and
Temporal Information ...................................................... 209
   Ulrich Nehmzow (University of Manchester, U.K.)
Conceptual Mappings from Spatial Motion to Time: Analysis of English
and Japanese .............................................................. 230
   Kazuko Shinohara (Otsuma Women's University, Japan)

Algebraic Engineering: Respecting Structure

An Introduction to Algebraic Semiotics, with Application to User Interface
Design .................................................................... 242
   Joseph Goguen (University of California, San Diego, U.S.A.)
An Algebraic Approach to Modeling Creativity of Metaphor .................. 292
   Bipin Indurkhya (Tokyo University of Agriculture and Technology, Japan)
Metaphor and Human-Computer Interaction: A Model Based Approach ........... 307
   J. L. Alty and R. P. Knott (Loughborough University, U.K.)

A Sea-Change in Viewpoints

Empirical Modelling and the Foundations of Artificial Intelligence ....... 322
   Meurig Beynon (University of Warwick, U.K.)
Communication as an Emergent Metaphor for Neuronal Operation .............. 365
   Slawomir J. Nasuto, Kerstin Dautenhahn, and Mark Bishop
   (University of Reading, U.K.)
The Second Person — Meaning and Metaphors ................................. 380
   Chrystopher L. Nehaniv (University of Aizu, Japan & University of
   Hertfordshire, U.K.)

Author Index .............................................................. 389
Computation for Metaphors, Analogy and
Agents

Chrystopher L. Nehaniv

Cybernetics and Software Systems Group


University of Aizu, Aizu-Wakamatsu City, Fukushima 965-8580, Japan
nehaniv@u-aizu.ac.jp

Abstract. As an introduction to papers in this book we review the


notion of metaphor in language, of metaphor as conceptual, and of metaphor as
primary to understanding. Yet the view of metaphor taken here is more general.
We propose a constructive view of metaphor as mapping or synthesis of
meaning between domains, which need not be conceptual ones. These
considerations have implications for artificial intelligence (AI), human-
computer interaction (HCI), algebraic structure-preservation, construc-
tive biology, and agent design. In this larger setting for metaphor, con-
tributions of the selected papers are overviewed and key aspects of com-
putation for metaphors, analogy and agents highlighted.

1 Metaphor beyond Language and Concepts

Metaphor and analogy have traditionally been considered the strict domain of
rhetoric, poetics and linguistics. Their study goes back in long scholarly histories
at least to the ancient Greece of Aristotle and the India of Panini. More recently
it has been realized that human metaphor in language is primarily conceptual,
and moreover that metaphor transcends language, going much deeper into the
roots of human concepts, epistemologies, and cultures. Seen as a major com-
ponent in human thought, metaphor has come to be understood and studied
as belonging also to the realm of the cognitive sciences. Lakoff and Johnson’s
and Ortony’s landmark volumes [22,36] cast metaphor in cognitive terms (for
humans with their particular type of embodiment) and shed much light on the
constructive nature of metaphorical understanding and creation of conceptual
worlds.
Our thesis is that these ideas on metaphor have a power extending beyond
the human realm, not only beyond language and into human cognition, but to
the realm of animals, as well as robots and other constructed agents. In building
robots and agents, we are engaging in a kind of constructive biology, working
to realize the mechanism-as-creature metaphor, which has guided and inspired
much work on robots and agents. Such agents may have to deal with aspects of
time, space, mapping, history and adaptation to their respective Umwelt (“world
around”).

* Current address: Interactive Systems Engineering, Department of Computer Science,
University of Hertfordshire, Hatfield, Hertfordshire AL10 9AB, United Kingdom,
E-mail: c.l.nehaniv@herts.ac.uk

By looking at the linguistic and cognitive understanding of metaphor and
analogy, and at formal and computational instantiations of our understanding
of metaphor and analogy, the constructors of agents, robots and creatures may
have much to gain. Understanding through building is a powerful way to validate
theories and uncover explanatory mechanisms. Moreover, building can open one’s
eyes to the light of new understanding in both theory and practice.
An intriguing metaphorical blend is the notion of a Robot. The concept
of a robot is understood as a cognitive blend of the concepts “machine”1 and
“human” (or “animal”).2 Attempting to build such a mechanism, one is led to
the question of ‘transferring’ – realizing analogues of – human or animal-like
abilities in a new medium. Moreover, if this new mechanism should act like an
animal, how will it need to interact with, adapt to, and perhaps interpret the
world around it? How is this agent to ‘compute’ in this way? And how is it to
be engineered in order to meet either or both of these mutually reflective goals?
Scientific advances (and delays) have often rested on metaphors and analo-
gies, and paradigmatic shifts may be largely based on them [20]. But compu-
tation employing conceptual metaphors has mostly been carried out via human
thought. In the realms of human-computer interaction (HCI), artificial intelli-
gence (AI), artificial life, agent technology, constructive biology, cognitive sci-
ence, linguistics, robotics, and computer science, we may ask for means to em-
ploy the powerful tool of metaphor for the synthesis and analysis of systems for
which meaning makes sense3 , for which a correspondence exists between inside
and outside, among behaviors, embodiments and environments.
1 The machine itself as tool and metaphor has had a long and creative history [11].
Indeed the conceptualization of what we consider mechanistic explanations in physics,
biology and engineering has changed very much in the course of the history of ideas.
For instance, Newton’s physics was criticized as being non-mechanistic, since it required
action at a distance without interconnecting parts (Toulmin [43]). Modern mech-
anistic scientific explanations were not necessarily mechanistic in the older sense of
the term. ‘Mechanistic’ represents a refined, blended concept that has evolved over
many centuries.
2 A related blend is the notion of ‘cyborg’ — a ‘cybernetic organism’, which is more
proximal for us than ‘robot’ in that it entails a physical blending of biological life,
including ourselves, with the machine. Indeed, our use of tools such as eyeglasses,
hammers, numerical notations, contact lenses, and other prosthetics to augment our
bodies and minds has already made us cyborgs. This can be taken as an empowering
metaphor when we take control of and responsibility for our own cybernetic
augmentation (Haraway [12], Nehaniv [28]).
3 See the discussion paper “The Second Person — Meaning and Metaphors” [30]
at the end of this book for an outline of a theory of meaning in a setting extending
Shannon-Weaver information theory to situated agents and observers and addressing
the origin, evolution and maintenance of interaction channels for perception and
action.

Richards [37] formulated a metaphor as a mapping from a topic source domain
(‘tenor’) to a target domain (‘vehicle’), by means of which something is asserted
(or understood) about the topic. Cognitive theories realized that metaphor
is not an exceptional decorative occurrence in language, but is a main mechanism
by which humans understand abstract concepts and carry out abstract reasoning
(e.g. Lakoff and Johnson [22], Lakoff [21], Johnson [18]). On this view, metaphors
are structure-preserving mappings (partial homomorphisms) between conceptual
domains, rather than linguistic constructions. Common metaphorical schemas in
our cultures are grounded in embodied perception. Correspondences in experi-
ence (rather than just abstract similarity) structure our cognition. Common
conceptual root metaphors in English are studied by Lakoff and Johnson [22],
and are extended, with detailed attention to root analogies in the English lexicon,
by Goatly [9].
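To fix ideas, the phrase "structure-preserving mappings (partial homomorphisms)" can be given a minimal formal reading. The following is only an illustrative formalization in the standard algebraic sense, not a definition taken from the works just cited: conceptual domains are modeled as relational structures and a metaphor as a partial map that respects relations wherever it is defined.

% Illustrative formalization only: "metaphor as a partial homomorphism" between
% conceptual domains modeled as relational structures S (source) and T (target).
\[
  S = (D_S, R_S), \qquad T = (D_T, R_T), \qquad
  R_S \subseteq D_S \times D_S, \quad R_T \subseteq D_T \times D_T ,
\]
\[
  f \colon D_S \rightharpoonup D_T \ \text{ is a partial homomorphism iff }\
  (x, y) \in R_S \ \text{ and }\ x, y \in \mathrm{dom}(f)
  \ \Longrightarrow\ \bigl(f(x), f(y)\bigr) \in R_T .
\]

On this reading, the requirement that good metaphors respect grounding and structural constraints amounts to asking that f not violate the source relations it actually maps, while remaining partial: much of either domain may stay unmapped.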
An important extension for conceptual metaphors is the framework of Mark
Turner and Gilles Fauconnier (see Turner’s paper in this volume), who argue that
metaphors and analogies are not sufficiently accounted for by mappings between
pre-existing static domains, but are actually better understood as constructs in
forged conceptual spaces, which are blends of conceptual domains, over some
common space, with projections from the blend space back to the constituent
factors (e.g. a ‘tenor’ and ‘vehicle’) affording recruitment of features from the
blend space in which much inference and new structure may be generated.4
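One schematic way to picture the blend structure just described, anticipating the category-theoretic reading suggested in footnote 4 below, is as a square of spaces and maps. This is only a sketch, under the assumption that the input spaces and the common (generic) space can be treated as objects of a suitable category of conceptual spaces.

% Sketch only: the blend B of input spaces I_1, I_2 (e.g. a 'tenor' and a 'vehicle')
% over a shared generic space G, read as a pushout (more generally, a colimit).
\[
\begin{array}{ccc}
  G   & \longrightarrow & I_1 \\
  \big\downarrow &      & \big\downarrow \\
  I_2 & \longrightarrow & B
\end{array}
\qquad\qquad
B \;\simeq\; I_1 \sqcup_{G} I_2 .
\]

In the pushout reading the canonical maps run from the inputs into the blend; the "projections from the blend space back to the constituent factors" described above correspond to tracing these connections in the reverse direction, which is how inferences generated in the blend are carried back to a tenor or a vehicle.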
We shall not restrict ourselves to concepts or language. A more general,
not necessarily symbolic view is also possible if one conceives of metaphor and
analogy as the study of ‘meaning transfer’ between domains or, in light of the
theory of cognitive blending, as the realm of ‘meaning synthesis’: putting
things together that already share something to create a new domain guiding
thought, perception or action. Other types of meaning can be seen for instance
in Dawkins’ notion of memes as replicators in minds and cultures [7], transmit-
ted by imitation and learning, propagating, often in difficult circumstances, via
motion through behavioral or linguistic media. Still another type of meaning is
constituted by agent behavior in response to sensory stimuli to effect changes in
its environment.

1.1 Human-Computer Interaction

The idea of metaphor has been applied in Human-Computer Interaction (HCI),
Cognitive Ergonomics, and Software Engineering. For example, building user
interfaces based on metaphors is now standard engineering practice. Examples
are windows, widgets, menus, desktops, synthetic worlds (e.g. nanomolecular
manipulation via a virtual reality (VR) and force-feedback haptic interface),
and personal assistant agents. The search for improved interaction metaphors
is an active research and development area (e.g. [41]). Here we are in a realm of
metaphor in human-tool interaction, which is clearly primarily conceptual (and
at times merely sensorimotorial) rather than linguistic. Language games have
become interaction games, with the meaning of artifacts defined by the actions
they afford.

4 The understanding of readers with a knowledge of basic category theory may be
enhanced by the suggestion that Fauconnier-Turner blends may be considered as
category-theoretic pushouts or, more generally, as colimits of conceptual domains.
A particular case is the area of ‘intelligent software agents’. This has grown
into a large arena of research and application, concerned with realizing the
software-as-agent metaphor in interfaces, entertainment and synthetic worlds,
as well as for workload and information overload reduction (cf. [38]). As with
other types of semantic change in human language and cultures, what may at
first have been marked as strange may become common: these metaphors become
definitional identities; rather than conceptual mappings, they become realities.
Some pieces of software are really agents.

1.2 Algebraic Engineering: Preserving Structure


Can the creativity of human metaphor be understood in formal terms? How do
humans understand each other’s metaphors and analogies? What if the humans
live in different cultures, speak different languages, or have radically differing
experiences? How can understanding of metaphor and analogy be explained?
Metaphors and mappings cannot be arbitrary or they will be useless and
without sense. Meaning for situated agents is constrained by the grounding of
the agent, the agent-environment coupling, and the dynamics and structure of
both the agent and environment. The study of mappings which respect this
structure (homomorphisms) is the applied algebraic subject Algebraic Engineer-
ing, which addresses various challenges of building agents, artificial intelligence
and other areas. Several of the papers here directly address algebraic structures
with applications to the study of creativity and user-interface design.
Many human-generated coordinate systems for understanding phenomena,
such as decimal notation for numbers, the structure of coordinate systems im-
plicit in the use of clocks, conservation laws in physics, can be built systemati-
cally using techniques of algebraic engineering (e.g. Nehaniv [29,25,28,27]). The
methods of decomposition used are those of completing a relation by making
it into a structure-preserving mapping, while refining the relation with lower-
level detail in a way that respects structure and affords understanding [28,29].
Relational morphisms can be considered as analogies between formal models
affording understanding [25,29]. This formal treatment is related to S. Ryan Jo-
hansson’s idea [17] that metaphors provide a kind of software for the human
mind by offering suggestions or commands to attempt to consistently relate two
systems and thus force the mind to construct understanding by trying to resolve
ambiguities and contradictions of the resulting mapping.5

5 A programme for automatic manipulation of formal models affording understanding
via algebra can itself be understood as an attempt to resolve the meta-metaphor:
relational morphisms are metaphors, and failed metaphors can be completed to working
metaphors as constrained by kernel theorems that describe the creation of meaning
from resolving the failure of attempted metaphors to work perfectly [25,27,29]. This
viewpoint is applied to the study of imitation in [33].

The work of Joseph Goguen presented here [10] in algebraic semiotics represents
one approach to provide a formal language in which to talk precisely about
the quality of user-interfaces in terms of the degree to which they preserve the al-
gebraic structure of semiotic systems. Unlike most formal approaches, it remains
agent- and user-centered, considering situated interaction in its particulars, allowing
it to avoid overconstraint and other pitfalls of objectivism, while focusing on
the central role of structure-respecting mappings. In understanding metaphors
and analogies concerning real-world things, one would do well to avoid forcing
fixed conceptual representations onto them, since conceptualizations can be con-
structed dynamically in creative analogy, perception and problem-solving (Hof-
stadter et al. [14], Mitchell [23], Holyoak and Thagard [15]), which allow for fluid
‘conceptual slippage’. Scientists know well not to neglect their intuitions of vague
analogies, since these may lead to deep insights that may later be substantiated
by hard empirical data.
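For readers who want the algebra invoked in this section made explicit, the relational morphisms mentioned above have a standard definition. It is stated here for abstract semigroups purely as one illustrative setting, not as the exact formulation used in the works cited.

% Standard definition, included only to illustrate "completing a relation to a
% structure-preserving mapping"; here S and T are semigroups.
\[
  \varphi \colon S \to 2^{T} \ \text{ is a relational morphism iff }\
  \varphi(s) \neq \emptyset \ \text{ and }\
  \varphi(s_1)\,\varphi(s_2) \subseteq \varphi(s_1 s_2)
  \quad \text{for all } s, s_1, s_2 \in S .
\]

An ordinary homomorphism is the special case in which every $\varphi(s)$ is a singleton; loosely, "completing" a relation means refining $\varphi$ until the condition above holds, and the failures of that condition are where, on the view sketched in footnote 5 above, new meaning is forced to be constructed.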

2 Overview of Papers

The mechanics of metaphors and blending comprise the first section of the book.
Mark Turner [44] presents a sophisticated approach to metaphor and analogy
in terms of ‘blends’, a framework that can be expressed in category-theoretical
terms of pushouts (or more general colimits) of conceptual spaces over a com-
mon skeletal space. It is shown how this framework works better for the analysis
of analogy than traditional source-target approaches, especially since elements
of the constructed (blend) space are recruited to the analogy. Thus meaning is
often constructed (‘forged’) in the blend rather than merely transferred between
domains by mapping. Masako K. Hiraga [13] illustrates the Fauconnier-Turner
framework by her detailed study of metaphorical blends in a famous haiku of the
Japanese poet Basho. She carries out a beautiful tour de force analysis involving
levels of logographics, grammar, poetics, morphophonemics, and culture. Tony
Veale [45] gives applications to visual metaphors using a sophisticated implemen-
tation of a computational system for finding and understanding metaphor, with
special attention to computational feasibility and pragmatics; it uses the blend
framework and notions of recruitment (semantic crossover from domains of the
blend), and makes good use of some traditional AI methods.
The agent-centered or first-person viewpoint is the focus of the next section
which concentrates on the details of embodiment and agent-environment cou-
pling from an agent perspective: Rodney A. Brooks, Cynthia Breazeal, Matthew
Marjanović, Brian Scassellati and Matthew M. Williamson [4] discuss alternative
essences of intelligence and lessons from embodied AI, presenting the MIT Hu-
manoid Robot Cog and the embodied AI viewpoint. Emergent dynamics driven
by human interaction (turn-taking) and exploitation of natural dynamics in
the robot (arm swinging and force-feedback with a slinky toy) have also been
achieved by the MIT group. Key ideas are to reject monolithic control and full
internal models, not to attempt general purposehood, and to recognize the
importance of development6, social interaction, embodiment, bootstrapping and
sensory integration. Georgi Stojanov [42] proposes to
view embodiment as metaphor for dealing with the environment. The control
schema of an embodied agent is transferred to another as the basis of learning
to navigate an environment. The theme of ‘understanding something through
something else’ is illustrated in this work connecting embodied agent control
with metaphor, and will no doubt provide material for good discussions of the
relations among meaning, behavioral schemata, and sensory perception.
The relatedness of the first-person to others (second person) is taken up in the
next section on interaction and mapping between agents: Roboticist and biologist
Kerstin Dautenhahn [6] identifies key properties of embodiment and intelligence
with special attention to socially intelligent natural agents (e.g. humans, pri-
mates, cetaceans) living in individualized (as opposed to anonymous insect-like)
societies. Historical and physical grounding for agents situated in the environ-
ment with which they interact is the source of any notion of ‘meaning’, and prop-
erties of life and intelligence attributed by an observer may be approached via a
bottom-up study (starting from the properties of matter or basic components)
from which higher-level phenomena emerge. Questions of embodiment for vari-
ous types of agents are considered. The Embodied AL (Artificial Life) approach
is of interest both to designers of systems whose object is to use ideas from life
to build some useful artifact and to those who, through building concrete systems,
seek to understand properties of life. Robot implementations
of emergent balancing in a hilly landscape, cooperative power regeneration, and
learning a simple vocabulary via imitation illustrate some minimal conditions
sufficient for the emergence of interesting social interaction in embodied agents
that relies on interactivity but in no way on symbolic representation or mod-
eling. Shaping behavior in human-robot interaction is illustrated by a ‘dancing
with strangers’ implementation. John A. Barnden [2] describes an implemented
system for metaphor-based reasoning with special application to reasoning about
agents. His AI reasoning system handles metaphorical description about other
agents’ or people’s mental states (possibly in a nested manner) and beliefs about
other’s beliefs in a form close to natural language. His notion of ‘pretense cocoon’
is used to isolate and nest possibly inconsistent beliefs. Tom Fenton-Kerr [8] ex-
plores interaction issues in pedagogical agent design with multimodal interaction,
and points out dangers in the poor design of various existing software agents and
assistants (e.g. inappropriate anthropomorphism, intrusiveness, or attitude for
the tasks at hand) and suggests some important factors for consideration via
a case study. Patti O’Neill-Brown [35] identifies problems and solutions for the
human understanding of metaphor in a cross-linguistic context (English native
speakers learning Japanese) in an implemented computer-assisted instruction
system. Problems of cross-cultural understanding that depend on the ubiquity of
metaphor in human language will be faced by software agents in going beyond
lexical surface meaning.

6 Here ‘development’ means an incremental or ‘subsumption’ approach building on
what has been achieved so far, suppressing, invoking, or otherwise modulating its
behaviors in wider contexts by means of new layers of structure.
The relationship between agent and other (first and second person) is the
focus of the fourth section, concentrating on imitation: Brian Scassellati [39]
describes developmental scaffolding for imitation and mechanisms of shared at-
tention, and how these are being realized in the MIT Cog project. Scassellati’s
work addresses efficient eye-localization, leading to determination of where a
human agent is looking as a basis for shared attention, and as grounding for deictic
and declarative gazing, as motivated by studies of primates, child development,
and autism in humans. Anneli Kauppinen [19] studies imitation and analogy in
language acquisition, realized in the use of figures-of-speech, by analyzing evi-
dence from child language-acquisition (from Finnish and other languages) that
acquisition of such figures may be a crucial principle in learning morphology,
semantics, and syntax – leading to constructive grammar in which pragmatics
play an integral role.
Mapping and Algebraic Engineering are the themes of the next two sec-
tions of the book and share the fundamental concern of respecting structure:
Structure-preservation in spatiotemporal mapping is addressed in the work of
Ulrich Nehmzow [34] for navigation in a mobile robot and by Kazuko Shino-
hara [40] for human conceptual schemata illustrated by standard metaphors
from English and Japanese. Joseph Goguen [10] gives a tutorial on semiotics
that spans sign systems, blends and algebraic approaches to software specifica-
tion and interface design. Main insights are (1) signs mediate meaning (Peirce),
(2) signs come in structured systems (Saussure), (3) structure-preserving maps
(‘morphisms’) are often at least as important as structures (Noether, Eilen-
berg and Mac Lane), and (4) discrete structures can be described by algebraic
theories. Goguen’s notion of semiotic morphism formalizes ‘theory mapping’
which is useful in user-interface design, software specification, and analysis of
conceptual blends. Bipin Indurkhya [16] offers a mathematically informed ap-
proach to formally modeling creativity that arises from metaphor, including
the study of how different viewpoints on the same system organize the model
differently. J. L. Alty and R. P. Knott [1] give a theoretical framework for
metaphor and applications in HCI useful in identifying and predicting problems
that can arise in user-interface metaphors, and also useful in finding areas where
metaphor/functionality/implementation can be improved for HCI systems.
The final section reports the sea-change in viewpoints away from an external,
objectivist (third person) perspective toward the first- and second-person view-
points requisite to an adequate treatment of agents, mapping and metaphor:
Meurig Beynon [3] details the Empirical Modeling (EM) approach to the foun-
dations of AI going beyond the traditional logicist framework. The approach
handles well a view of intelligence that acknowledges the provisional and empir-
ical nature of all knowledge of state and behavior. Examples of the applications
include the agent-oriented, open modeling and analysis of a railway disaster. The
approach has also been applied to computer graphics. The openness and arti-
ficiality of all intelligence are treated from a perspective that intends to tran-
scend symbolic, logicist, third-person (see comments above) AI. S. J. Nasuto,
K. Dautenhahn, and M. Bishop [24] present communication as a possible alter-
native metaphor for natural neural computation as opposed to the information-
processing, computational perceptron metaphor. This is applied to an artificial
retina and attention modeling via use of stochastic diffusion search. The discus-
sion paper by C. Nehaniv [30] addresses the notion of meaning for observers and
agents in relation to information theory and channels of perception and action in
evolved and designed agents. The papers of Beynon and Nehaniv both describe
the recent sea-change in the study of meaning and intelligence characterized by
a shift away from an external observer perspective toward a first-person, agent-
centered viewpoint, and a second-person relatedness- and interaction-centered
viewpoint. This sea-change is a natural development as one moves from areas
of science where an objectivist perspective has worked well (e.g., chemistry and
classical physics) to areas of science where agents matter (e.g., biology, cognitive
science, artificial intelligence).
From this mix of extremely multidisciplinary work representing roboti-
cists, cognitive scientists, linguists, biologists, computer scientists, engineers and
mathematicians, certain areas of consensus emerged.
There is a certain tension between linguists and cognitive theoreticians on the
one hand and agent-oriented computer scientists and roboticists on the other. The
linguists and cognitive scientists are often satisfied when they succeed in analyzing
or expressing problems related to metaphor and analogy (descriptive view) in a
formal setting, whereas for the other side (constructive view) this is only enough to
get scientists interested – a starting point – for questions of how one would use
such ideas in implementing a useful system (robotic or software agent).
For some of the computational approaches to metaphor, human language or
knowledge represented for a system tends to be assumed to be already
appropriately structured and available, so that software can then perform
reasoning about another agent’s beliefs or use of metaphor. For
scientists working at a lower level of building an agent embodied in a concrete
environment, the problem of how to get such knowledge, to learn what knowledge
is appropriate, and how to use that knowledge is more fundamental. It is useful to
create a dialogue among these different viewpoints since scientific workers
are often largely unaware of the existence of techniques from other fields. It is
also clear that there remain many exciting challenges in bridging the concerns of
the various branches of science and humanities represented here. This book will
have succeeded if it stimulates further integrative thought and work in these
directions.

References
1. J. L. Alty and R. P. Knott, Metaphor and Human-Computer Interaction: A
Model Based Approach. In [31], 307–321, (this volume). 7
2. John A. Barnden, An Implemented System for Metaphor-Based Reasoning
with Special Application to Reasoning about Agents. In [31], 143–153, (this
volume). 6
3. Meurig Beynon, Empirical Modelling and the Foundations of Artificial Intelli-
gence. In [31], 322–364, (this volume). 7
4. Rodney A. Brooks, Cynthia Breazeal, Matthew Marjanović, Brian Scassellati,
and Matthew M. Williamson, The Cog Project: Building a Humanoid Robot.
In [31], 52–87, (this volume). 5
5. Kerstin Dautenhahn, I could be you — the phenomenological dimension of
social understanding. Cybernetics and Systems 28(5):417–453, 1997.
6. Kerstin Dautenhahn, Embodiment and Interaction in Socially Intelligent Life-
Like Agents. In [31], 102–142, (this volume). 6
7. Richard Dawkins, The Selfish Gene, Oxford University Press, 1976. 3
8. Tom Fenton-Kerr, GAIA: An Experimental Pedagogical Agent for Exploring
Multimodal Interaction. In [31], 154–164, (this volume). 6
9. Andrew Goatly, The Language of Metaphors, Routledge, 1997. 3
10. Joseph Goguen, An Introduction to Algebraic Semiotics, with Application to
User Interface Design. In [31], 242–291, (this volume). 4, 7
11. Hermann Haken, Anders Karlqvist, and Uno Svedin, eds., The Machine as
Metaphor and Tool, Springer-Verlag, 1993. 2
12. Donna Haraway. A Cyborg Manifesto: Science, Technology, and Socialist-
Feminism in the Late Twentieth Century. In Simians, Cyborgs and Women:
The Reinvention of Nature New York: Routledge, 149–181, 1991. 2
13. Masako K. Hiraga, Rough Sea and the Milky Way: ‘Blending’ in a Haiku Text.
In [31], 27–36, (this volume). 5
14. Douglas R. Hofstadter and the Fluid Analogies Research Group, Fluid Con-
cepts and Creative Analogies, Basic Books, 1995. 5
15. Keith J. Holyoak and Paul Thagard, Mental Leaps: Analogy in Creative
Thought, MIT Press, 1996. 5
16. Bipin Indurkhya, An Algebraic Approach to Modeling Creativity of Metaphor.
In [31], 292–306, (this volume). 7
17. S. Ryan Johansson, The Brain’s Software: The Natural Languages and Po-
etic Information Processing. In Hermann Haken, Anders Karlqvist, and Uno
Svedin, eds., The Machine as Metaphor and Tool, Springer-Verlag, 9–43, 1993.
4
18. Mark Johnson, The Body in the Mind, University of Chicago Press, 1987. 3
19. Anneli Kauppinen, Figures of Speech, a Way to Acquire Language. In [31],
196–208, (this volume). 7
20. Thomas S. Kuhn, The Structure of Scientific Revolutions, University of
Chicago Press, 1962. 2
21. George Lakoff, The Contemporary Theory of Metaphor. In Andrew Ortony,
ed., Metaphor and Thought, 2nd edition, Cambridge University Press, 202–251,
1993. 3
22. George Lakoff and Mark Johnson, Metaphors We Live By, University of
Chicago Press, 1980. 1, 3, 3
23. Melanie Mitchell, Analogy-Making as Perception, MIT Press, 1993. 5
24. Slawomir J. Nasuto, Kerstin Dautenhahn, and Mark Bishop, Communication
as an Emergent Metaphor for Neuronal Operation. In [31], 365–379, (this vol-
ume). 8
25. C. L. Nehaniv. Text of a public lecture on the algebra of understanding.
Technical Report 94-01-043, University of Aizu, September 1994. 4, 4, 4
26. C. L. Nehaniv. From relation to emulation: The Covering Lemma for trans-
formation semigroups. Journal of Pure & Applied Algebra, 107:75–87, 1996.
27. C. L. Nehaniv, Algebra and Formal Models of Understanding. In M. Ito, ed.,
Semigroups, Formal Languages and Computer Systems, Kyoto Research Insti-
tute for Mathematical Sciences, RIMS Kokyuroku, vol. 960, 145–154, August
1996. 4, 4
28. C. L. Nehaniv. Algebraic Models for Understanding: Coordinate Systems and
Cognitive Empowerment In J. P. Marsh, C. L. Nehaniv, B. Gorayska, eds., Pro-
ceedings of the Second International Conference on Cognitive Technology: Hu-
manizing the Information Age, IEEE Computer Society Press, 147-162, 1997.
2, 4, 4
29. C. L. Nehaniv, Algebra for Understanding. In C. L. Nehaniv and M. Ito, eds.,
Algebraic Engineering: Proceedings of the First International Conference on
Semigroups and Algebraic Engineering (Aizu, Japan) and the International
Workshop on Formal Languages and Computer Systems (Kyoto, Japan), World
Scientific Press, 1–16, 1999. 4, 4, 4, 4
30. C. L. Nehaniv, The Second Person — Meaning and Metaphors. In [31], 380–
388, (this volume). 2, 8
31. C. L. Nehaniv, ed., Computation for Metaphors, Analogy and Agents, (Lecture
Notes in Artificial Intelligence, Vol. 1562), Springer Verlag, (this volume). 8,
8, 9, 9, 9, 9, 9, 9, 9, 9, 9, 10, 10, 10, 10, 10, 10, 11, 11
32. C. Nehaniv and K. Dautenhahn. Embodiment and Memories — Algebras of
Time and History for Autobiographic Agents. In Robert Trappl, ed., Cyber-
netics and Systems ’98, Proc. 14th European Meeting on Cybernetics and
Systems Research (Symposium on Embodied Cognition and Artificial Intelli-
gence; co-organized by Maja Mataric and Eric Prem), Vienna, Austria, 14-17
April 1998. Austrian Society for Cybernetic Studies, volume 2, 651–656, 1998.
33. C. Nehaniv and K. Dautenhahn, Mapping Between Dissimilar Bodies: Affor-
dances and the Algebraic Foundations of Imitation. In: J. Demiris and A. Birk,
eds., Proc. European Workshop on Learning Robots 1998 (EWLR-7), (Edin-
burgh, Scotland - 20 July 1998), 1998. 4
34. Ulrich Nehmzow, “Meaning” through Clustering by Self-Organization of Spa-
tial and Temporal Information. In [31], 209–229, (this volume). 7
35. Patricia O’Neill-Brown, When Agents Meet Cross-Cultural Metaphor: Can
They Be Equipped to Parse and Generate It? In [31], 165–175, (this volume).
6
36. Andrew Ortony, Metaphor and Thought, 2nd edition (1st edition: 1979), Cam-
bridge University Press, 1993. 1
37. I. A. Richards, The Philosophy of Rhetoric, Oxford University Press, 1936. 2
38. Doug Riecken, guest editor, special issue on ‘Intelligent Agents’, Communica-
tions of the Association for Computing Machinery, 37 (7), July 1994. 4
39. Brian Scassellati, Imitation and Mechanisms of Joint Attention: A Develop-
mental Structure for Building Social Skills on a Humanoid Robot, In [31],
176–195, (this volume). 7
40. Kazuko Shinohara, Conceptual Mappings from Spatial Motion to Time: Anal-
ysis of English and Japanese. In [31], 230–241, (this volume). 7
41. Ben Shneiderman, Designing the User Interface: Strategies for Effective
Human-Computer Interaction, 2nd ed., Addison-Wesley, 1992. 3
42. Georgi Stojanov, Embodiment as Metaphor: Metaphorizing-In the Environ-
ment. In [31], 88–101, (this volume). 6
43. Stephen Toulmin, From Clocks to Chaos: Humanizing the Mechanistic World-
View. In Hermann Haken, Anders Karlqvist, and Uno Svedin, eds., The Ma-
chine as Metaphor and Tool, Springer Verlag, 139–153, 1993. 2
44. Mark Turner, Forging Connections, In [31], 11-26, (this volume). 5
45. Tony Veale, Pragmatic Forces in Metaphor Use: The Mechanics of Blend Re-
cruitment in Visual Metaphors. In [31], 37–51, (this volume). 5
Forging Connections
Mark Turner

Department of English Language and Literature,


Program in Neuroscience and Cognitive Science,
University of Maryland, College Park 20742
markt@umd5.umd.edu

Abstract. Conceptual connections that look inevitable in retrospect often come


from industrious and dynamic creative work below the horizon of observation. I
introduce the theory of conceptual integration and discuss constraints that shape
and guide the construction of meaningful connections.

On Monday, October 27, 1997, when the Dow Jones Industrial Average fell more than
five hundred points, precipitously and unnervingly, on huge volume, in a single day,
and the last two hours saw broad panic selling, investors wondered whether the next
day would be a bloodbath. Later that evening, the internet was flooded with thousands
of postings analyzing whether the crash was like the infamous crash on Black Monday
ten years earlier. I read them all evening.
These professional and amateur investors never questioned the fundamental
importance of knowing whether the analogy was true. Evidently, punishment awaited
anyone who made the wrong call. If the analogy held, then the investor in equities
should preserve positions and buy aggressively into the market, which would rise.
Yet there were reasons to doubt the analogy. Even after their five-hundred point
fall, stocks were still expensively valued by traditional measures. Most investors had
enjoyed unprecedented capital gains on paper in the previous few years, and many
could not resist the argument that it would be prudent to realize those gains before the
market plunged into the vortex of Asian currency troubles. Thailand's monetary
turmoil—in a domino cascade running through Indonesia, Korea, Hong Kong, Japan,
and the United States—could be lethal.
The analysts on the internet took it for granted that establishing analogy or
disanalogy depends upon rebuilding, reconstruing, reinterpreting the two inputs—in
this case, the two crashes. They began with provisional background structure and
connections—for example, the Dow on Black Monday corresponded to the Dow in
October, 1997 (even though the thirty companies comprising the Dow Industrials had
been changed), the drop on Black Monday in 1987 corresponded to the drop on
October 27, 1997, and so on. But this structure and these correspondences provided
only a launching pad, not the analogy itself. In particular, they provided none of the
inferences investors sought as the basis for their consequential decisions and actions.
The effective claims in the internet analyses were introduced with phrases like,
"What this crash is a case of . . .," "We must not forget that the 1987 crash . . .," and
"It would be a mistake to think of the 1987 crash as . . . ." There were injunctions like
"Don't blur categories—the professionals preserve their careers as professionals but
the small investors don't have that motivation." In the picture painted by these
analyses, analogy and disanalogy are processes centrally concerned with construction
and reconstruction of the inputs. The analogs are forged as the analogy is forged.
Creative forging of analogs and connections is essential for at least an important
category of analogies. Consider the following French political cartoon, which I take
from the front page of Le Figaro for 13 January 1997.
This cartoon, typical of newspaper political cartoons in this respect, makes its point
persuasively and unmistakably at a glance. It concerns the politically sensitive debate
over a policy of setting retirement at age 55. The headline reads "Retirement at 55:
Chirac bridles." The subhead reads, "Even though 61% of French citizens support the
policy . . ." The cartoon shows an expectant father in the waiting room of a maternity
ward. He has been reading the newspaper report of French president Jacques Chirac's
resistance to the retirement policy. The obstetrician has just entered the waiting room,
followed by a frowning nurse. The obstetrician throws up his hands at a loss and says
to the father, "Your baby refuses to allow me to deliver him into the world until he can
be told at what age he can take his retirement, if he finds work."
The immediate and powerful inference for French readers is that people demanding
the retirement policy are being absurd. Extreme assurances are unavailable in life, and
it is nonsense to condition everyday life on obtaining them. For workers to go on
strike to secure such a retirement policy would be like a fetus's going on strike in the
delivery room. The doctor's last clause, "if he finds work," is biting. Unemployment
and underemployment are severe in France, especially among the young. "Chomage"
is a principal topic of daily news. The inference of this last phrase is that it is
spectacularly stupid to demand governmental spending on early retirement when the
country faces the far more threatening issue of unemployment. What the baby should
demand, if it demands anything, is opportunity for employment, not a promise of early
retirement if it happens to be lucky enough to get a job.
Some readers may make yet other inferences of absurdity. The baby can cause
difficulties during delivery but may itself suffer, even die, in the consequence, so it
would be irrational for the baby to intend these difficulties. The baby's refusal can
even be viewed as silly, vain, and arrogant, since, inevitably, natural and medical
processes must compel the baby to be born regardless of the difficulties.
The central inference of this analogy is that the French electorate should drop its
support for the retirement policy and focus instead on supporting the government in its
fight against a sick economy and high unemployment. This message fits the political
dispositions of Le Figaro.

Suppose we began to analyze this political analogy by adopting the mistaken but
common folk assumption that analogy starts with two pre-existing analogs, aligns and
matches them, and projects inferences from one to the other. The analogs to be
matched for this cartoon would be a scene with a father in the waiting room of a
maternity ward and a scene with French workers demanding a policy of early
retirement. I can see no significant matches between these two notions. I can match
the labor of the mother to the labor of the French workers, but that connection has
nothing to do with this analogy and leads nowhere. I have no pre-existing knowledge
of fetuses according to which I can match them with French workers who make
demands about their conditions of employment. There is the possible match between
the non-delivery of the baby and the non-delivery of passengers and goods—French
transportation workers were at the time striking in support of the policy—but that
match is optional, provides no inference of absurdity, and could be fatally misleading
since it matches the obstetrician responsible for the delivery with the transportation
workers responsible for delivery, and this match destroys the analogy. It seems clear
that any straightforward matching between these two pre-existing notions, if there is
any, misses the analogy.
Matching does not work, but neither does projection of inferences from source to
target. The familiar source space would be birth in a maternity ward, supplemented
with the frame of the waiting room, and the target space would be French labor
politics. But there are no fetuses in the source space who make ridiculous demands of
any kind and no doctors who toss up their hands in exasperation at the absurd ideas of
the fetus. In the source space, members of the delivery team do not come into the
waiting room to protest the unreasonable views of the fetus. None of this and none of
the associated inferences in fact exist in the source to be projected onto the target in
the first place.
The absurdity of the situation does not belong to the pre-existing source.
Interestingly, it does not belong to the pre-existing target, either. The inference of the
cartoon is that the demands of the French workers are so absolutely absurd and
unheard-of as to be completely astonishing. They are wild from any perspective. But
if such an absurdity were already part of the pre-existing target, there would be no
need to make the analogy. The motivation for making the analogy is that 61% of the
French do in fact support these demands, and those citizens need to be persuaded to
drop their support.
The cartoon is unmistakably organized by the abstract conceptual frame of the
source space—a waiting room in a maternity ward. It also contains a few specified
elements, and it is illuminating to consider what they are doing in the cartoon.
Consider the newspaper in the expectant father's right hand. Naturally, an expectant
father might read a newspaper while he waits, and the analogist exploits this
possibility. But the motivation for including the newspaper in the cartoon is not to
evoke the frame of a waiting room and not to lead us to match or project the
newspaper to some analogical counterpart in the target space. There is a counterpart
newspaper in the target space, in fact this identical newspaper, but the connection
between them is identity, not analogy. The newspaper has been incorporated deftly
into the frame of the waiting room because it is important in the target: it announces
president Chirac's resistance to the policy of retirement. The construal of the waiting
room, we see, is driven by the analogy. The source analog is being forged so the
analogy can work.
The newspaper headline is the least of the elements in the waiting room that appear
there under pressure from the target. The difficulty of the delivery and the doctor's
frustration are motivated only by the target. In fact, there are elements in this cartoon
that are impossible for the source space of real waiting rooms. The perversity of the
fetus, the disapproval of the fetus by the obstetrician and the nurse and presumably the
father, the speech of the fetus and its logic, the biting irony of putting the problem of
retirement ahead of the problem of unemployment—an irony clearly conveyed by the
cartoonist but not recognized by the doctor whose words convey it—come only from
the target.
The mental operations that account for this analogy and its work are not retrieval of
pre-existing source and target notions, alignment and matching of their elements, and
projection of inferences from source to target. Instead, the relevant mental operation
is, as Gilles Fauconnier and I have called it, "conceptual integration" (Fauconnier &
Turner 1994, 1996, in press a, in press b, and in preparation; Turner and Fauconnier
1995, in press a, and in press b; Fauconnier 1997; and Turner 1996a and 1996b). There
is a website presenting the research on conceptual integration at
http://www.wam.umd.edu/~mturn/WWW/blending.html.
Conceptual integration—sometimes called "blending" or "mental binding"—
develops a network of mental spaces, including contributing spaces and a blended
space. In the example of the cartoon, the contributing spaces are the French labor
situation, with workers, and the maternity ward, with a fetus. The blend has a single
element that is both a faction in the French labor debate and a baby. Fauconnier and I
call a network of such connections and emergent structures a "conceptual integration
network." A conceptual integration network has projection of elements from
contributing spaces to the blend; cross-space mappings between the contributing
spaces; compositions of elements in the blend; completion of structure in the blend by
recruitment of other frames; and elaboration of the structure in the blend. The
operations of composition, completion, and elaboration in the blend frequently
develop emergent structure there that is not available from the contributing spaces. In
conceptual integration networks, inferences can be projected from the blend to either
input. In the case of analogy, the contributing spaces are asymmetric: one is a source
and one is a target. But causal, ontological, intentional, modal, and frame structure
can come from the target to the blend, and inferences can be projected from the blend
to both source and target. Conceptual integration networks have structural and
dynamic properties and develop under a set of competing optimality constraints which
Fauconnier and I have discussed elsewhere.
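For readers approaching this from the computational side, the network just enumerated can be caricatured as a small data structure. The following Python sketch is purely illustrative: the space contents, counterpart pairs, and emergent items are invented here for the cartoon example, and the representation is this edition's gloss, not Fauconnier and Turner's formalism or notation.

# Illustrative toy only: a conceptual integration network for the cartoon,
# with two contributing spaces, a cross-space mapping, and a blended space
# whose emergent structure is present in neither input.

input_maternity = {"fetus", "obstetrician", "waiting room", "newspaper", "delivery"}
input_labor = {"workers", "government", "retirement-at-55 demand",
               "unemployment", "entry into working life", "newspaper"}

# Cross-space mapping between counterparts in the two contributing spaces.
cross_space = {
    "fetus": "workers",
    "obstetrician": "government",
    "delivery": "entry into working life",
}

# Selective projection into the blend, completion with recruited frames,
# and elaboration producing emergent structure.
blend = {
    "composed": {"fetus/workers", "obstetrician/government"},
    "completed_with": {"waiting-room frame", "strike frame"},
    "emergent": {"fetus refuses delivery until a retirement age is guaranteed",
                 "manifest absurdity", "biting irony"},
}

def project_back(blend, target_space_name):
    """Read the blend's emergent inferences as claims about one input space."""
    return {f"{inference} (about {target_space_name})"
            for inference in blend["emergent"]}

print(project_back(blend, "French labor politics"))

Nothing in what follows depends on such a representation; the point is only that each named component (projection to the blend, cross-space mapping, composition, completion, elaboration, and back-projection of inferences) is a definite relation among spaces that any implementation would have to make explicit.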
Of particular importance for this cartoon, construction and interpretation can be
done on any space at any time as the network develops. In particular, the input spaces
can be re-represented, rebuilt, reconstrued, and reinterpreted. For example, although
notions of the waiting room in a maternity ward do not include conventionally that the
obstetrician comes out to report a problem, or centrally that the expectant father is
reading a newspaper, nonetheless these structures can be recruited to the source space,
and are in this case, since they are needed for blending, under pressure from the target,
with its labor problems and politicians whose views are reported by the media. When
an organizing frame of the blend has been borrowed from the source, it can be
elaborated for the blend with structure left out of the source or impossible for the
source. For example, the baby in the cartoon has highly developed intentional,
expressive, and political capacities, projected to it from the workers in the target, but
we do not project those abilities to the source: we do not interpret this cartoon as
asking us to revise our notions of fetuses to include these advanced abilities.
We keep the source, the target, and the blend quite distinct in this network and do
not become confused. Given the genre of the cartoon, we know that the purpose of
this analogy is to project inferences from the blend to the target rather than to the
source. (Seana Coulson [1996] has shown that there are other genres with other
standard directions of projection.) In the blend, we develop the inference that
something has gone wrong with the natural course of things and that agents dealing
with it are exasperated, but we do not project back to the source the inference that
when delivery is actually failing, it's fine for the obstetrician to take a walk out to the
waiting room to whine for sympathy, instead of redoubling his medical efforts in the
delivery room. We do not project back to the source the inference that in a true
medical emergency the reaction of the expectant father and the obstetrician should be
dumbfounded astonishment at the uncooperative behavior of the fetus rather than
anxiety over the health of the mother and child.
We do project the absurdity of the baby's demand in the blend to the worker's
demand in the target—that is the point of the analogy—but this projection is
complicated. The baby in the blend is an individual who has not yet obtained
employment. Part of the reason we judge the baby to be irrational is that, for the
individual, it would be manifestly illogical to care more about retiring early than about
having a job, since retiring at all is conditional upon having a job. Yet this inference
cannot project identically to each individual working French citizen, who is in fact
already employed. Nor does it seem to project identically to each individual
unemployed French citizen, who may in fact be more concerned about having a job
than about retiring early. The inference projects not identically but to a related
inference for the target, an inference not for individuals but for French citizens as a
political body. The baby's individual retirement age projects to the retirement age to
be set by policy, and the baby's individual prospects for employment project to general
employment trends in France. In the target, these numbers are distributed in a way
that does not give wild absurdity—61% of French citizens are unruffled by their
conjunction—but in the blend, these numbers have become the prospects faced by a
single individual, whose passion to know his conditional retirement age but
nonchalance about his prospects for employment yield a manifest absurdity and irony,
judgments that the cartoonist hopes to induce the reader to project back to the target.
The intended implication of the analogical integration network is that since
unemployment is a general concern for the nation, French citizens should not ask for
expensive retirement policies. The two central inferences of the analogy—manifest
absurdity and biting irony—are constructed only in the blend; they are not available
from the inputs.
The analogy of this cartoon, which appears on the front page of the newspaper as
an illustration of the main story, and which presents no difficulty whatever to its
readers, gives us a picture of analogy as a simultaneous forging of contributing spaces,
a blend, and connections in a dynamic integration network.
We see quite a different picture of the nature of analogy, this time an explicit
academic picture, if we look at work in artificial intelligence. Forbus, Gentner,
Markman, and Ferguson (in press) take the view that there is consensus in AI on the
main theoretical assumptions to be made about analogy, and in particular on the
usefulness of decomposing analogical processing into constituent subprocesses such as
retrieving representations of the analogs, mapping (aligning the representations and
projecting inferences from one to the other), abstracting the common system, and so
on . . .
But for at least an important range of analogies, including many standard analogies
in political science and economics, this decompositional view of analogy fails. There
are two reasons for its failure. First, the analogies I have in mind cannot be explained
as operating over pre-derived construals that are independent of the making of the
analogy. Rather, the construal of the inputs depends upon the attempt to make the
analogical mappings.
Second, models in this Artificial Intelligence tradition do not seem to allow a place
for analogical meaning to arise that is not a composition of the meanings and
inferences of the inputs, yet the analogies I have in mind include essential emergent
meaning (e.g., absurdity) that cannot be viewed as a conjunction of structures in the
inputs.
Forbus, Gentner, Markman, and Ferguson make their claims about the theoretical
consensus for decomposition of processes as part of an attack on Douglas Hofstadter,
or rather a counterattack, since Hofstadter had claimed that their work, and similar
work in the relevant AI tradition1, is hollow, vacuous, a "dead-end" because it takes as
given what Hofstadter calls "gist extraction." Gist extraction is "the ability to see to
the core of the matter." Hofstadter views this ability as "the key to analogy making—
indeed to all intelligence" (Hofstadter, 1995). In collaboration with David Chalmers
and Robert French (1992), Hofstadter argues that there is no illumination to be found
in this tradition because the programs compute over merely meaningless symbolic
structures, because these formal structures are cooked beforehand in ways that make
matching easy, and, most importantly, because the cooking is done by the
programmer, not the program. In Hofstadter's view, the programmer has already done
the all-important gist extractions, boiled the meanings out of them, and substituted in
their place formal sets of predicate calculus symbols that already contain, implicitly,
the highly abstract, nearly vacuous formal match. The programmer then provides
these formal nuggets to the program. A program that detects the formal match
between them is not making analogies.
It seems to me that the people who understand the nature of analogy in this acerbic
debate are the practical-minded non-academics who were actually making analogies
and disanalogies and posting them on the internet on the night of October 27, 1997—
Grey Monday, as it came to be called, once its aftermath was known. For them,
finding analogy or disanalogy is a process of forging, not merely finding, connections,

1 See e.g., Falkenhainer, Forbus & Gentner, 1989; Gentner 1983; Gentner & Gentner,
1983; Gentner & Stevens, 1983; Gick & Holyoak 1980, 1983; Holland, Holyoak,
Nisbett & Thagard, 1986; Holyoak & Thagard, 1989, 1995.
and to forge those connections requires forging the inputs as you forge the
connections, revising the entire system of inputs and connections dynamically and
repeatedly, until one arrives at a network of inputs and connections that is persuasive.
My claim that analogy works by forging such a network may seem at first
counterintuitive because it runs against the folk theory according to which "finding an
analogy" consists of comparing two things in the world and locating the "hidden"
matches. We speak of "seeing" the analogy, which presupposes that the analogy is
completely there to be seen. On this folk theory, things in the world are objectively as
they are, things match objectively or not, and analogies and disanalogies are scientific
discoveries of objective truth. This view is reassuring and attractive. By contrast,
when I speak of forging inputs and connections, with continual revision and
backtracking, to build a network of spreading coherence that is "persuasive," it may
sound as if I am offering a dismal and barbarous postmodern hash in which anything
can be anything, any construal of the inputs will do, any connections will serve, since
all meaning is invented, a mere "construct," anyway.
But not so. Human beings have, over time, invented many human-scale concepts to
suit their purposes—chair, sitting, rich, Tuesday, marriage—but these inventions are
highly constrained by both our biological endowment and our experience. First, there
are mental operations we all must use. Human beings must use conceptual framing,
categorization, blending, grammar, and analogy, for example. There is such a thing as
human nature, and it includes certain fundamental kinds of mental operation, analogy
being one of them. That is one kind of constraint. Second, profound constraints come
from success and failure. Some concepts and connections lead to success while others
lead to failure. Some help you live, some make you ill. With the right analogies, you
make a killing in the market, with the wrong ones, you get slaughtered. I have no
hesitation in saying that inventive forging of analogies can result in scientific
discovery of true analogies. In fact, it has resulted in scientific discovery of true
analogies. When a network is constructed that works, we call it true.
There is another reason that the folk theory of analogy appears attractive: after the
fact, in the rearview mirror, an established analogy usually looks exactly like a match
between existing structures, and it is easy to forget the conceptual work of forging
construals and connections that went into building the network.
Reforging the inputs while constructing the analogy was common procedure on the
night of Grey Monday. The analysts on the internet expressed revisions of the inputs
elaborately and unmistakably, using phrases like "What if what really happened on
Black Monday was . . ." and "You need to think of today's events not as X but instead
as Y."
I take it that this kind of reforging is typical for analogies in business and finance.
Consider the cover of The Economist for August 9, 1997. It shows a kite high in the
air and a man in a business suit flying it. The kite is labeled "Dow," for the Dow
Jones Industrials Average, and the caption reads "Lovely while it lasts." The final
conceptual product that comes out of understanding this analogy looks as if it matches
source and target and as if it projects an inference from source to target. But that
description of the product is not a model of the process.
When I think of someone flying a kite, at least a traditional kite like this one, rather
than a trick kite, I imagine that it is easy to do in good wind. If there is a difficult
stage in flying a kite, it is the beginning, when the kite is near the ground. Once the
kite is very high, it is much easier to keep aloft, given the relative constancy of the
wind and the absence of obstructions. The kite-flier wants to keep the kite at a single
high altitude, and when he has had his fun, he winds up his string.
The phrase "Lovely while it lasts" is conventionally used to suggest that "it" won't
last, and interpreting "it" as referring to the Dow suggests that the cartoon concerns an
impending fall in the market. Under pressure from this target, we can reconstrue the
source by recruiting to it some possible but peripheral structure: namely, gravity pulls
objects down with constant force, while winds are irregular; therefore, in some
moment, the winds will die and the kite will fall.
The inevitability of this fall is the inference to be projected to the target. But it is
constructed for the source only under pressure from the target.
If we look at this blend, we see that even though the organizing conceptual frame of
the blend is indeed flying a kite, much of its central structure does not come from that
source and indeed some of it conflicts strongly with that source. In the blend of
flying-a-kite and investing-in-the-stock-market, the kite-flier faces extreme difficulty
in keeping the kite aloft. In fact, he is physically struggling. Yet the kite is very high
and the winds are so fine that they are blowing the kite-flier's tie and hair forward.
This is highly unconventional structure for the source because, given the wind, he
should not be struggling at all.
We also know that this kite-flier is not satisfied merely to keep the kite up; he is a
special, bizarre, unique kite-flier with a special kite, who will be content only if the
kite constantly gains altitude, or meets some more refined measure, such as never
dropping in any given period of time lower than eight percent above its low in the
previous period. This is highly unconventional for the source.
In this blend, it is upsetting if the kite loses two percent of its altitude, dangerous if
it loses five percent, a major correction if it loses ten percent, and a complete disaster
if it loses thirty percent. Of course, in the source, none of these events presents any
problem at all; indeed, the only great disaster would be the kite's hitting the ground.
And yet, in the target, there is no possibility that the market could fall to zero, or even
down by half. We see, then, that the projection of inferences from the source is very
complicated. We need from the source the structure according to which constant
gravity will ultimately find a moment to overcome completely the inconstant winds,
but we cannot take from the source the inference that gravity will ultimately make the
kite fall to zero altitude and be smashed.
Now consider the man flying the kite. He is wearing a business suit and tie. This is
not impossible for the source, but it is odd, and the only motivation for building it into
the source is pressure from the target world of business and investment.
What counterpart in the target could the kite-flier in the source have? He must
correspond analogically to something in the target, since the analogy is about harm
that will come to people and institutions, not to the kite. This is a more complex
question than it might seem. Consider that, in the domain of kite-flying, the actual
kite-flier could make the kite crash, raise it by letting out string, lower it by taking in
string, or reel in his kite and go home. But this structure is not recruited for the
source, projected to the blend, or given counterparts in the target. The kite-flier in the
blend cannot be any of these kite-fliers. The kite-flier-investor in the blend cannot sell
the market short and then make the kite lose altitude; he cannot make the Dow kite
crash to the ground; he cannot sell his stocks and get out of the market at its peak;
paradoxically, it is not even clear that he can have any effect on the kite at all, even
though he is holding the string. He can be affected by what happens to the kite but
probably cannot influence the kite significantly. Moreover, a real investor can make
money even if the Dow Average stays fixed, by trading stocks as they rise and fall
individually. Indeed, this is the standard way to make money in the market since,
leaving aside the effects of new investment in the market, there must be a loser for each
winner. This kite-flier in the blend is someone who is somehow invested in the
continuing ascent of the kite that is the Dow, perhaps someone whose money is largely
in Dow or S&P 500 index funds, or other Dow-oriented mutual funds. But notice that
in the source domain of kite-flying, there are no such kite-fliers. These kinds of kite-
fliers exist only in the blend, not in the source.
And finally, the string to the kite is not a possible kite string. It is a somewhat
smoothed graph of the Dow Average over something like the previous fifteen years.
Interestingly, Black Monday of 1987 is not visible, because including a sharp fall of
that sort, followed by the sharp rise, would deform the string unacceptably far from
the strictly increasing smooth curve of the source space. In the source, the path of the
kite-string is a snapshot in time of a line in space, while in the target, the path of the
Dow Average is a graph of the value of a variable over time. (This is why the sky in
the blend is ruled like graph-paper.) In the source, the path of the kite string has to do
with the physics of kites, strings, wind power, and gravity, which should be crucial for
the analogy, since the central inference of the analogy has to do with this physics,
namely, gravity will at some moment be stronger than the winds. In the source
domain, the kite string is indispensable for raising the kite—without it the kite would
surely fall, quickly in light wind.
As we have seen, the blend that provides the inferences of the analogy has structure
for the kite string that either ignores the central structure of the kite string in the source
or powerfully contradicts it. The view of analogy as retrieving pre-existing
representations of analogs, matching and aligning them, and projecting inferences
from the source to the target fails for this analogy, which, like the Figaro cartoon, is
meant to be instantly intelligible and persuasive.
The Figaro and Economist examples work as serious analogical arguments, meant
to be persuasive on central issues of politics and economics, but because they are in
the form of cartoons, it might be tempting to dismiss them as exceptional. On the
contrary, when we turn to celebrated examples discussed in the literature on analogy
in fields like psychology and computer science, we find the same operations of
blending and forging, although they are more easily overlooked because they are
somewhat less visible. Consider the well-known analogy discussed by Keith Holyoak
and Paul Thagard in Mental Leaps: Analogy in Creative Thought (1995) and earlier in
Gick and Holyoak (1983), in which the target analog is a tumor to be destroyed and
the source analog is a fortress to be stormed. The problem in the target is that only a
laser beam of high intensity will kill the tumor, but it would also kill any other cells it
encountered on the way; a beam of low intensity would not harm the patient but would
be ineffective on the tumor. The source analog is a fortress whose roads are mined to
blow up under the weight of many soldiers; a few can get through without harm, but
they will be too few to take the fortress. The solution to taking the fortress is to send
many small groups of soldiers along many roads to converge simultaneously on the
fortress and take it. Analogically, the solution to killing the tumor is to send many
laser beams of low intensity along many paths at the tumor, to arrive simultaneously
and combine to have the effect of a beam of high intensity.
The analogy looks, after the fact, like a straightforward matching of source and
target and projection of useful inferences, but if we look more closely we see, I think,
that this source was put together in this fashion under pressure to make this analogy.
Of course, after the target and source are put together in the right ways so that the
analogy will work, they can be handed to someone as analogs to be connected in a
straightforward fashion, but connecting these pre-built representations is not
understanding analogy.
Consider the actual military situation in the source. When combat resources are
plentiful and easily replaced, commanders facing a crucially important military
objective have historically not hesitated to sacrifice soldiers and replace them. The
straightforward solution for the source is to run animals or soldiers up the road,
sacrificing as many as necessary to clear the mines. With a sufficient supply of
soldiers, the mines will present no problem and the fortress will be taken. After all,
there cannot be many mined places. The residents of the fortress must be able to
move vehicles over the roads, which they could do only by avoiding the few places
that are mined. Moreover, only some spots on a road are suitable for mining in any
event. Bridges, for example, are rarely mined because the mines are too easily
detected. There is no point in mining the road if the soldiers can simply walk through
the field alongside it, so one must either install entire fields of mines or pick very
narrow passes in the topography for placing mines.
But these straightforward and conventional military framings of the source do not
serve the analogy, so the representations of the source given in the scholarship
typically rebuild the source artificially so as to disallow them. For example, the
representation given in Gick and Holyoak and again in Holyoak and Thagard is this:
the attacking general has just enough men to storm the fortress—he needs his entire
army, so cannot sacrifice any of them. The purpose of this weird representation of the
source is clearly to disallow the standard representations so the analogy will work.
That particular forging of the source in the service of the analogy is explicit, but
some other crucial forgings are only implicit. For example, I have told the fortress
story to military officers of various ranks. One of them responded, "it says the fortress
is situated in the middle of the country, surrounded by farms and villages. Why
doesn't the general just send his troops through the fields?" This is an excellent
objection. However, that construal is implicitly disallowed. The Fortress Story tells
us that the attacking general is a "great general," and that he solves this problem by
dividing up his army and sending them charging down different roads. We know that
a "great general" could not have missed so obvious a solution as marching his troops
through the field, and also suspect that the defender of the fortress is unlikely to be so
inept as to mine roads running through open fields, so we conclude that in some
unspecified way the source does not allow this possibility, even though nothing
explicit forbids it. The officer asking the excellent question was answered by a
companion officer, "All of the roads must go through narrow passes or something."
The most profound conceptual reforging in the service of making analogical
connections between tumor and fortress is the most subtle. In the source, it is an
unchangeable truth and a central point in military doctrine that the armed force one
can bring to bear is also a vulnerable asset one does not wish to lose. For example, the
British Home Fleet during World War I was exceptionally strong, but its sheer
existence as a "force-in-being" was so important that it was almost never risked in
actual battle, the single exception being the Battle of Jutland in 1916, the only major
naval battle of the war. In the source, the force and the vulnerability cannot be
separated, and their inseparability is crucial. But if the tumor-fortress analogy is to go
through, they must somehow be separated, because in the target, the force is not
vulnerable. As Holyoak and Thagard note, the laser beam and the laser are not at risk.
Nor can the vulnerability of the force in the source be ignored, because vulnerability is
indispensable structure for the target. The solution is to take what cannot be separated
in the source and to conceive of it as having two aspects—a force whose intensity
varies with the number of soldiers that constitute it, and the physical soldiers who are
vulnerable. These aspects are projected to the blend separately. The military force
with variable intensity is blended with the laser beam; the vulnerable soldiers are
blended with the patient. Again, we see that the important work of analogy is not to
match analogs but, more complexly, to create an integration network which requires
reinterpretation of the analogs.
It may still be tempting to dismiss these examples as inconsequential. Two are
cartoons and one is a hypothetical problem of the sort dreamed up by psychologists
and inflicted upon college students as subjects. However, my last example is a
historical analogy that established policy, changed law, altered the urban landscape,
and cost plenty of money. It is Justice William O. Douglas's invention of a policy as
expressed in his opinion in a case in 1954 on the constitutionality of the Federal Urban
Renewal Program in Washington, D. C. Douglas needed to justify a policy according
to which the Federal government would be authorized to condemn and destroy entire
urban areas, even though nearly all of the privately-owned properties and buildings to
be destroyed met the relevant legal codes, and most of those were in fact individually
unobjectionable. Douglas hit upon the analogical inference that, just as an entire crop,
nearly all of whose individual plants are healthy, must be destroyed and entirely
replanted when some small part of it is blighted, so an urban area, nearly all of whose
individual buildings, utilities, and roads are satisfactory, must be completely destroyed
and redesigned from scratch when it has become socially unsavory. The following
paragraph suggests his reasoning:

    The experts concluded that if the community were to be healthy, if it were not to
    revert again to a blighted or slum area, as though possessed of a congenital disease,
    the area must be planned as a whole. It was not enough, they believed, to remove
    existing buildings that were unsanitary or unsightly. It was important to redesign the
    whole area so as to eliminate the conditions that cause slums—the overcrowding of
    dwellings, the lack of parks, the lack of adequate streets and alleys, the absence of
    recreational areas, the lack of light and air, the presence of outmoded street patterns.
    It was believed that the piecemeal approach, the removal of individual structures
    that were offensive, would be only a palliative. The entire area needed redesigning
    so that a balanced, integrated plan could be developed for the region including not
    only new homes but also schools, churches, parks, streets, and shopping centers. In
    this way it was hoped that the cycle of decay of the area could be controlled and the
    birth of future slums prevented. (Quoted in Schön and Rein 1994, page 24.)

It might seem as if this invention of a justification for policy is the product of
straightforward analogy: agricultural blight, a biological scenario, is mapped
analogically onto urban distress, a social scenario. But that analysis of this analogy,
although appealing, is inadequate. That analysis is based on the assumption that the
thinker first locates all the central structure in the familiar source scenario (here,
blight) and then attempts to project it onto the other target scenario (here, slums), so as
to create the "strongest match," where "strongest" means least difference between the
relations in the two notions. On such an analysis, we look first for causal structure in
blighted crops: there are organisms that inhabit the crop and that directly cause the
problem. Are there organisms that inhabit the slum and that directly cause the
problem? Certainly: the slum-dwellers. For the blighted crops, there is a solution:
destroy the crop completely so as to destroy the organisms completely, and then
replant the crop identically, so that it becomes exactly what it was before it was
inhabited. Projecting this to slums, we have a straightforward solution: raze the slum
areas entirely so as to destroy the residents, and then rebuild the area identically so
that it becomes what it was before it was inhabited.
Of course, this analysis, when spelled out this way, is ludicrous. Douglas
began instead with distinct preferences in thinking about the slums: the residents must
not be harmed, and even inconvenience to them must be attenuated; they are not to be
stigmatized or viewed as the important cause of the problem, even though the causal
chain must inevitably run through their actions; the Federal government is to be
viewed as responsible for correcting such problems; the extension of power to the
Federal government in its dealing with social ills is desirable; and so on. In order to
invent his justification, Douglas was obliged to use conceptual blending.
His blend leads to emergent structure not contained in the inputs. For
example, before this blending, the concept of urban distress does not by itself yield the
policy of razing perfectly acceptable buildings and ripping up useful roads that are in
good repair. In Douglas's "urban blight" blend, the agents that cause blight are
blended not with the biological agents in the area of urban distress but rather with the
area itself. So in the blend, but in neither of the inputs, the problem is handled by
saving the resident organisms but razing the crop/area. A summary of Douglas's
argument as "areas with slums are like crops with blight, so we should do to them
what we do to the crops" misses the conceptual work in the invention of this policy.
Douglas and the experts used elaborate conceptual blending to create a warrant for a
major legal decision that set expensive and highly aggressive governmental policy.
Again, the purpose of the analogy is in fact to create inferences for the target, and after
the fact, in hindsight, the analogy can be viewed as consisting of retrieving pre-
existing analogs, matching and aligning them, and projecting inferences from the
source to the target. But that hindsight analysis misses, I propose, the essential
cognitive operations and conceptual work.
If analogy in general involves dynamic forging of analogs, connections, and blends
as we create a network of spreading coherence, then we must find a new model of
analogy. I nominate the Fauconnier & Turner network model of conceptual
integration for the job.

References

Chalmers, D. J., R. M. French, & D. R. Hofstadter. 1992. "High-level perception, representation
  and analogy: A critique of artificial intelligence methodology." Journal of Experimental and
  Theoretical Artificial Intelligence, 4, 185-211.
Coulson, S. 1996. "The Menendez Brothers Virus: Analogical Mapping in Blended Spaces."
In Conceptual Structure, Discourse, and Language. Edited by Adele Goldberg. Stanford:
Center for the Study of Language and Information.
Falkenhainer, B., Forbus, K. D., & Gentner, D. 1989. "The structure-mapping engine:
Algorithm and examples." Artificial Intelligence 41 (1), pages 1-63.
Fauconnier, Gilles. 1997. Mappings in Thought and Language. Cambridge: Cambridge
University Press.
Fauconnier, Gilles and Mark Turner. [1994]. "Conceptual projection and middle spaces,"
UCSD Cognitive Science Technical Report 9401. San Diego. [Available from
http://cogsci.ucsd.edu and from http://www.wam.umd.edu/~mturn]
____________. [1996] "Blending as a Central Process of Grammar" in Conceptual Structure,
Discourse, and Language. Edited by Adele Goldberg. Stanford: Center for the Study of
Language and Information.
____________. [in pressa] "Principles of Conceptual Integration" in Conceptual Structure,
Discourse, and Language, II. Edited by Jean-Pierre Koenig. Stanford: Center for the Study
of Language and Information.
____________. [in pressb]. "Conceptual Integration Networks." Cognitive Science.
____________. [in preparation] Making Sense.
Forbus, K., D. Gentner, A. B. Markman, & R. W. Ferguson. In press. "Analogy just looks like
high level perception: Why a domain-general approach to analogical mapping is right."
Journal of Experimental and Theoretical Artificial Intelligence, 1997.
Gentner, D. 1982. "Are scientific analogies metaphors?" In D. S. Miall, editor. Metaphor:
Problems and perspectives. Brighton, Sussex: Harvester Press.
Gentner, D. 1983 "Structure-mapping: A theoretical framework for analogy." Cognitive
Science 7, pages 155-170.
Gentner, D., and Donald Gentner. 1983. "Flowing waters or teeming crowds: Mental models of
electricity." In D. Gentner and A. L. Stevens, editors. Mental models, pages 99-130.
Hillsdale, N. J.: Lawrence Erlbaum.
Gentner, D., and A. L. Stevens, editors. 1983. Mental models. Hillsdale, N. J.: Lawrence
Erlbaum.
Gick, M. L. and K. J. Holyoak. 1980. "Analogical problem solving." Cognitive Psychology
12, pages 306-355.
Gick, M. L. & Holyoak, K. J. 1983. "Schema induction and analogical transfer." Cognitive
Psychology 15, pages 1-38.
Hofstadter, Douglas. 1995. "A Review of Mental Leaps: Analogy in Creative Thought." AI
Magazine, Fall 1995, 75-80.
Holland, J. H., Holyoak, K. J., Nisbett, R. E., and Thagard, P. R. 1986. Induction: Processes of
inference, learning, and discovery. Cambridge: MIT Press.
Holyoak, K. J. and Thagard, P. 1989. "Analogical mapping by constraint satisfaction."
Cognitive Science 13(3), pages 295-355.
Holyoak, K. J., and Thagard, P. 1995. Mental leaps: Analogy in creative thought. Cambridge:
MIT Press.
Schön, Donald and Martin Rein. 1994. Frame Reflection: Toward the Resolution of
Intractable Policy Controversies. New York: Basic.
Turner, Mark. 1996a. "Conceptual Blending and Counterfactual Argument in the Social and
Behavioral Sciences," Philip Tetlock and Aaron Belkin, editors, Counterfactual Thought
Experiments in World Politics. Princeton, N.J.: Princeton University Press. pages 291-295.
Turner, Mark. 1996b. The Literary Mind. New York: Oxford University Press.
Turner, Mark. 1991. Reading Minds: The Study of English in the Age of Cognitive Science.
Princeton: Princeton University Press.
Turner, Mark. 1987. Death is the Mother of Beauty: Mind, Metaphor, Criticism. Chicago:
University of Chicago Press.
Turner, Mark. 1989. "Categories and Analogies" in Analogical Reasoning: Perspectives of
Artificial Intelligence, Cognitive Science, and Philosophy. Edited by David Helman.
Dordrecht: Kluwer, 3-24.
Turner, Mark and Gilles Fauconnier. 1995. "Conceptual Integration and Formal Expression."
Metaphor and Symbolic Activity. 10:3, 183-203.
____________. [in pressa] "Conceptual Integration in Counterfactuals" in Conceptual
Structure, Discourse, and Language, II. Edited by Jean-Pierre Koenig. Stanford: Center for
the Study of Language and Information.
____________. [in pressb] "A Mechanism of Creativity." Poetics Today.
Rough Sea and the Milky Way:
‘Blending’ in a Haiku Text*
Masako K. Hiraga

Faculty of Liberal Arts,
The University of the Air,
Chiba City, 261-8586, Japan
hiraga@u-air.ac.jp

Abstract. This paper claims that the model of 'blending' proposed by Turner
and Fauconnier [16, 17] offers a useful tool for understanding poetic creativity
in general and metaphors in haiku1 in particular. It is one of the
characteristics of haiku that two or more entities (objects, ideas, and feelings)
are juxtaposed by loose grammatical configurations such as kireji (‘cutting
letters’) and kake-kotoba (‘hanging words’ or multiple puns). The
juxtaposed entities are put in comparison or equation, and contribute to
enriching the multi-layered metaphorical meaning of haiku. The analysis of
a sample text, a haiku describing a rough sea by Basho Matsuo, demonstrates
the effectiveness of ‘blending’ as an instrument for understanding the
cognitive role played by (i) metaphorical juxtaposition by kireji and (ii)
iconicity of the foregrounded elements in the text.

1 Blending and Metaphor


1.1 Cognitive Theory of Metaphor

Cognitive linguistics [cf. 5, 6, 7, 8, 14] treats metaphor as a key to understanding the
conceptual processes of the human mind. Metaphors are defined as “mappings across
conceptual domains,” in which “the image-schemata structure of the source domain is
projected onto the target domain in a way that is consistent with inherent target domain
structure” [7, p. 245]. In other words, metaphor allows us to understand a relatively
abstract and unstructured subject matter in terms of a more concrete and structured
subject matter through image-schemata, which Johnson [5, p. 79] defines as “those
recurring structures of, or in, our perceptual interactions, bodily experiences and
cognitive operations.”

* I am indebted to Joseph Goguen and Mark Turner for their invaluable comments and
suggestions.

1 Haiku, or hokku as it was called during the time of Basho (1644-1694), is the shortest form of
Japanese traditional poetry, consisting of seventeen morae, divided into three sections of 5-7-5.
Originating in the first three lines of the 31-mora tanka, haiku began to rival the older form in
the Edo period (1603-1867). It was elevated to the level of a profoundly serious art form by the
great master Basho. It has since remained the most popular poetic form in Japan. Originally, the
subject matter of haiku was restricted to an objective description of nature suggestive of one of
the seasons, evoking a definite, though unstated, emotional response. Later, its subject range
was broadened but it remained an art of expression suggesting as much as possible in the
fewest possible words. With the 31-mora tanka, haiku is composed by people of every class,
men and women, young and old. As the Japanese language has only five vowel sounds, [a], [e],
[i], [o] and [u], with which to form its morae, either by themselves or in combination with a
consonant as in consonant-vowel sequences, it is not possible to achieve rhyming in the sense
of European poetry. Brevity, suggestiveness and ellipsis are the life and soul of haiku and tanka.
The reader is invited to read the unwritten lines with the help of imagination and background
knowledge.
Turner and Fauconnier [16, p. 184] propose a model of “conceptual projection
across four or more (many) mental spaces rather than two domains,” to explain a wide
range of phenomena including “conceptual metaphor, metonymy, counterfactuals,
conceptual change” [16, p. 183], “classification, the making of hypotheses, inference,
and the origin and combining of grammatical constructions,” [16, p. 186] “idioms, ...,
jokes, advertising, and other aspects of linguistic and nonlinguistic behavior” [16, p.
186]. Mental spaces are small conceptual arrays constructed for local purposes of
understanding. In this new model, when a conceptual projection occurs, two input
mental spaces (source and target in a metaphor or analogy) are created. These input
spaces have “relevant information from the respective domains, as well as additional
structure from culture, context, point of view, and other background information” [1,
p. 5]. Unlike a unidirectional conceptual projection of the standard model, which
specifies a direction from source to target, the new model shows that a conceptual
projection is indirect and may move either way between input spaces. This is
because the many-space model assigns roles to two middle spaces in addition to the
input spaces. These middle spaces are “a generic space, which contains skeletal
structure that applies to both input spaces, and a blended space, which is a rich space
integrating, in a partial fashion, specific structure from both of the input spaces. The
blend space often includes structure not projected to it from either input space” [16, p.
184], namely, “emergent structure of its own” [16, p. 183]. At the same time,
“inferences, arguments, ideas, and emotions developed in the blend can lead us to
modify the initial input spaces and change our views of the knowledge used to build
those input spaces” [15, p. 83]. Blending is a dynamic creative activity.
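A minimal sketch may help fix the terminology of the many-space model: two input spaces, a generic space of skeletal structure that applies to both, and a blend that partially integrates the inputs and develops emergent structure of its own, with inferences able to flow back to the inputs. The dictionary keys and placeholder contents in the Python fragment below are invented for this illustration and are not part of the model in [16].

# Two input spaces (the source and target of a metaphor or analogy), a generic
# space of skeletal structure common to both, and a blend that integrates
# structure from both inputs in a partial fashion.
network = {
    "input_source": {"a river", "flowing water", "banks on either side"},
    "input_target": {"a band of stars", "the night sky"},
    "generic":      {"an elongated path"},      # applies to both input spaces
    "blend":        {"a river of stars"},       # partial integration of the inputs
}

# The blend often develops emergent structure projected from neither input.
network["blend"].add("the sky experienced as a landscape with a river")

# Projection is not one-way: inferences developed in the blend can lead us to
# modify and reinterpret the input spaces themselves.
for space in ("input_source", "input_target"):
    network[space].add("reconstrued in light of the blend")

# Emergent structure: what is in the blend but in neither input.
print(network["blend"] - (network["input_source"] | network["input_target"]))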

1.2 The Cognitive Account of Literary Metaphor

Literary texts can be metaphoric on two levels: local and global. On the one hand,
literary texts display local metaphors, which are based on either conceptual mappings,
image mappings, or a combination of both. Conceptual mappings are often based on
conventional cognitive metaphors, which literary metaphors either extend, elaborate
or combine in a novel way. On the other hand, some texts as a whole can be read
holistically as global metaphors. According to Lakoff and Turner [9, pp. 146-147],
such global metaphorical readings are constrained in three major ways: (1) by the use
of conventional conceptual mapping; (2) by the use of commonplace knowledge in
addition to conventional metaphors; and (3) by iconicity -- a mapping between the
structure of a poem and the meaning or image it conveys. This last constraint of
iconicity, a mapping from the structure of language to the structure of the image in the
text as well as to the overall meaning of the text, is of particular importance because it
contributes to our recognition of the degree of organic unity of a text.
Hiraga [3] demonstrates that the many-space model is useful in analysing short
poetic texts such as haiku, which have rather obscured grammatical constructions and
dense cultural implications, for the following two reasons: (1) the ‘blending’ model
stresses the importance of “the emergent structure” of the blended space activated by
inferences from the input spaces and the contextual background knowledge, and
therefore, provides an effective tool for understanding the creativity of literary
metaphors (not only of haiku but also of any poetic text); (2) the many-space
approach, which does not specify unidirectional mapping between input spaces,
provides a better explanation of the rhetorical effects produced by loose grammatical
configurations in the haiku texts such as the juxtaposition of phrases by kireji
(‘cutting letters’) and kake-kotoba (‘hanging words’ or multiple puns) or those
produced by personification and allegory. One additional implication of the analysis
presented in Hiraga [3] is that understanding haiku texts, which are extremely short in
form and rich in traditional implications, requires common prior knowledge which is
long-standing in Japanese culture, and which shapes the cultural cognitive model. A
non-exhaustive list of the features of such knowledge would include: (1) pragmatic
knowledge of the context such as time, place, customs, life, etc., which contextualise
the poetic text in general terms; (2) folk models, which originate from myth and folk
beliefs about the conceptualisation of existing things; (3) conventional metaphors, in
Lakoff and Johnson's sense, which have been conventionalised in a given speech
community over time, and which a poet exploits in non-conventional ways; and (4)
the iconicity of kanji, Chinese logographs,2 which link form and meaning, particularly
with regard to their etymological derivation, and thereby serve as a cognitive medium
for haiku texts. The blending model provides an account for the process of
integration of these features of background knowledge in the reading of texts.
The present paper looks at one of the most famous haiku compiled in the travel
sketch by Basho called Oku no hosomichi,3 an acknowledged masterpiece in Japanese
literature [11]. The poem was chosen because (1) it has a kireji which divides the
text into two parts and puts them in metaphorical juxtaposition, and (2) the revision
done by Basho results in foregrounding the elements written in kanji, which play a
cognitive role to strengthen the organic unity of the text through iconicity. In my
analysis, I hope to demonstrate that cognitive poetics offers explanations of the
dynamic creativity of poetic meanings emergent out of blends as well as the organic
unity of form and concept expressed in the text.

2 Analysis
Example 1

araumi ya Sado ni yokotau ama no gawa
rough sea: Sado in lie heaven of river4
‘Rough sea: lying toward Sado Island the River of Heaven’ [12, p. 109].

2 The term ‘logographic’ will be used instead of ‘ideographic,’ because most kanji characters
correspond to words rather than ideas.

3 Oku no hosomichi was written as a travel sketch which consisted of a main narrative body,
fifty haiku poems by Basho and a few other poems by other authors. The fifty haiku poems are
considered as an integrated text in its own right, conforming to the general principle of
composition and structural congruence.

2.1 Metaphorical Juxtaposition

The poem at first glance describes natural scenes. On the one hand, the sea is rough;
and on the other hand, over one’s head, there is the Milky Way arching toward the
island of Sado. Even if one does not have much pragmatic knowledge about Sado
Island or the Milky Way in Japanese history and culture, one may sense a grandness
of scale depicted by this haiku. It is a starry night. The Milky Way is magnificent.
The grandeur of the Milky Way is put in contrast to a dark rough sea beneath the
starry skies. The waves are terrifying; the water churns and moans, as if it would not
allow the boats to cross. It is dangerous and fearful in the night. This dark sea
does indeed separate the people living on the island of Sado from the mainland. The
island is visible across the troubled waves, perhaps with its scattered house-lights.
Human beings (including the poet) are so small in the face of the spectacular pageant
of powerful nature. And yet there are thousands of human lives and stories
embedded in the scenes.
The first five-syllable segment, araumi ya, consists of a noun, araumi (‘rough
sea’), and a kireji (‘cutting letter’), ya. Kireji, a rhetorical device used in tanka and
haiku, consist of about a dozen particles and mark a division point in the text.
Although the functions of the division vary according to the particles, a general effect
of kireji is to leave room for reflection on the feelings or images evoked by the
preceding segment. Ya in Example 1 is a kireji particularly favoured by Basho and
said to have “something of the effect of a preceding ‘Lo!’ It divides a haiku into
two parts and is usually followed by a description or comparison, sometimes an
illustration of the feeling evoked. There is always at least the suggestion of a kind of
equation, so that the effect of ya is often best indicated by a colon” [2, p. 189]. That
is, araumi (‘rough sea’) and the rest of the text, Sado ni yokotau ama no gawa (‘the
Milky Way, which lies toward Sado’), are juxtaposed to constitute a kind of metaphor
in which the feelings or images evoked by a rough sea are illustrated by the feelings
or images evoked by the Milky Way arching over the Island of Sado.
The next seven-syllable segment, Sado ni yokotau (‘[which] lies toward Sado’), is
an adjectival clause which modifies the last five-syllable segment, ama no gawa (‘the
river of heaven’). Sado is a place name, an island located about 50 miles away from
the coast of mid-Honshu. Ni (‘toward’) is a postpositional particle of location.
Yokotau (‘to lie’) is a verb which normally has an animate agent and describes an
action (when used as a transitive verb) or a state (when used as an intransitive verb) of
spreading one’s body on something flat. As the grammatical subject of yokotau in
this poem is ama no gawa (‘the river of heaven’), an inanimate noun, the verb is used
metaphorically. The last five-syllable segment, ama no gawa, is a proper noun
signifying the Milky Way. It also involves a metaphor in which the path-shaped set

4 Word-for-word translation is given by the author and not in [12]. The author consulted [10]
and [12] to provide the word-for-word translation.
of stars is seen as a river. The second and the third segments of the poem thus
constitute a local metaphor, in which the river of stars in the heaven spreads its body
toward the Island of Sado. There are conventional conceptual metaphors behind this
local metaphor, namely, NATURE IS ANIMATE5 (in this case RIVER IS
ANIMATE6), and A PATH-SHAPED OBJECT IS A RIVER.
Now how does this local knowledge about the grammatical and rhetorical
structure of this poem relate to the understanding of the whole text? There are at
least two major input spaces created at the reading of this poem: a rough sea and the
Milky Way. These two input spaces are juxtaposed and mediated by the use of kireji.
The input space of araumi (‘rough sea’) connotes the Sea of Japan, which is famous
for its violent waves, and which geographically lies between the mainland and Sado
Island. Although syntactically Sado modifies ama no gawa (‘the river of heaven’),
the configurational proximity and the semantic continuity of araumi and Sado seem to
suggest a metonymic reading of araumi, particularly at the time of on-line processing
of meaning. That is, a local blend of rough sea and Sado Island. This does not
deny, however, an interpretation of Sado and ama no gawa as being another local
blend, based on the grammatical proximity. The important point here is rather that
the understanding of this poem requires an array of blending, not only sequentially
but also simultaneously. It could be that the input space of Sado simultaneously
relates to the input spaces of a rough sea and the Milky Way.
Let us first consider the background knowledge recruited at the time of the blend,
for the Island of Sado and the Milky Way have rich cultural implications. Sado
Island has a long history. The island is geographically separated from the mainland
by the Sea of Japan. Because the rough waves prevented people from crossing the
sea by boat, the island functioned as a place of exile for felons and traitors from the
10th century up to the end of the 19th century. At the same time, gold mines were
discovered there in the early 17th century, and attracted all kinds of people. At the
time of Basho (1644-1694), the Tokugawa Shogunate had control of the gold mines,
and the people imprisoned in the island were forced to serve as free labour there.
Thus, the metonymy of a rough sea with Sado Island activates the cultural and
historical meanings of the island. Also, the roughness of the waves is consonant
with the roughness of life on the island which involves violence, cruelty, despair, and
so on. Another important point is that the name of this island is written in two
Chinese logographs, which mean ‘to help’ and ‘to cross’ respectively. The cognitive
meanings of the logographs, particularly that of ‘crossing,’ seem to be mapped onto
the image of a rough sea at the time of the blend. One can probably detect, in the
generic space of these two inputs, workings of such salient conventional metaphors as
LIFE IS A BOAT JOURNEY and THE WAVES ARE AN OBSTACLE TO SUCH A
JOURNEY. The difficulty of crossing is highlighted and emergent in the blend,
which further reinforces the sad feelings relating to the difficulty of reunion by
separated people. The blend is built up by recruiting structures from the

5 Metaphorical concepts are indicated in uppercase letters.
6 Some rivers have human male names such as Bando-Taro (‘place-male name’) for Tone
River. Furthermore, rivers are prototypically metaphorised as snakes in Japanese idioms, e.g.,
kawa ga dakoo-suru (‘A river snakes,’) kawa ga hebi no yoo-ni magaru (‘A river curves like a
snake,’) etc.
conceptualisation of natural force (rough sea) and natural geography (island).


In addition, there is a sad legend about the Milky Way, which originated in China
and was brought to Japan. The date on which this poem was composed, the night
before the seventh night of the seventh month of the lunar calendar, suggests that the
poet had this legend in his mind. For the seventh night of the seventh month (i.e.,
the 7th of July) is known and celebrated as the ‘star festival’ after the Chinese story.
The two bright stars on either side of the Milky Way, the star Vega and the star Altair,
are believed to be Princess Weaver and Oxherd. These two stars face each other
across the Milky Way; but, because the Milky Way is so wide and vast they cannot
meet easily. One day a god of heaven pitied Princess Weaver’s lonely life and
arranged for her to marry Oxherd. After they married, the Princess became too lazy
to weave. The angry god punished her and allowed her to visit her husband only
once a year, the night of July 7, but only if the night was fair.
In the blend of Sado Island and the Milky Way, the separation of this legendary
couple is mapped onto the people imprisoned in Sado Island. The generic space
reflects event frames for confinement to both input spaces -- agent, spatial
confinement, limited freedom, limited means of travel, and the mental state of being
separated. The blend has an emergent structure of its own -- the revelation of the
elegy brought up by the separations, real and mythological, on the one hand, and by
the stark contrast of peaceful starry skies with the magnificent Milky Way and
turmoil of human emotions displayed in the history of gold miners and prisoners --
sorrow and ambition, despair and power, on the other.
The global blend of a rough sea and the Milky Way exhibits a structure which
‘blends’ local blends in a reinforced way. These two major input spaces share a
generic space with a set of associations of water, because araumi is a sea with violent
waves and ama no gawa is the river of heaven. Both the sea and the river are paths
of water which block people’s access to the other side. Therefore, the overall effect
of the global blend of a rough sea and the Milky Way is the ‘reinforced’ conceptual
structure of water being an obstacle, a separating line, something which prevents the
loved ones from reuniting. The blend produces a feeling of elegy, or a realisation of
helplessness or nothingness of human beings in front of powerful nature such as
terrifying rough waves and vast starry skies. At the same time, there is another
structure emergent in the blend, i.e., contrasts of various kinds -- a contrast of motion
between the violent waves and the peaceful skies; a contrast of colour and light
between the black and dark sea and the silvery and bright skies; and a contrast of the
real and the legendary between life stories of people and the love story of stars.

2.2 Iconicity of Haiku

Interpretation of this haiku according to the theory of blending is also supported by
some of the iconic effects produced by kanji, on the one hand, and by sound patterns,
on the other.

Foregrounding by Kanji. Let us first look at the visual elements. The Japanese
language has a unique writing system in which three different types of signs are used
to describe the same phonological text: kanji (Chinese logographs), hiragana
(syllabary for words of Japanese origin), and katakana (syllabary for words of foreign
origin other than Chinese). In the context of the present discussion, logographs are
of particular importance because they function as a cognitive medium for poetry.
Basho revised this poem orthographically from 2a to 2b [11].

Example 2

a.  荒海や佐渡に横たふ天河
b.  荒海や佐渡によこたふ天河
    araumi ya Sado ni yokotau ama no gawa

The poem’s three noun phrases, araumi, Sado and ama no gawa, were all spelled in
kanji in both the first (Example 2a) and the revised (Example 2b) versions. The verb
of lying was revised from kanji, a Chinese logograph, to hiragana, two syllabic letters.
The main effect of changing the character type in the verb yokotau (‘to lie’) from kanji
to hiragana is to make that part of the text a ground for the conspicuous profile of
荒海 (‘rough sea’) and 天河 (‘milky way’). In general, because kanji, being logographic
characters, have a distinct angular form and semantic integrity, they differentiate
themselves visually and cognitively as the figure while the remaining hiragana
function as the ground.
The words that contribute to creating input spaces at the time of the blends, i.e.,
荒海 (‘rough sea’) and 天河 (‘milky way’), were spelled in kanji. Sado 佐渡, a place
name, is also written in kanji. Notice also that the three nouns, araumi (‘rough sea’),
Sado, and ama no gawa (‘milky way’), are all written in two kanji. Also, 海 (‘sea’),
渡 (‘to cross water’), and 河 (‘river’) in these three nouns share the same radical
signifying water. Both 荒海 (‘rough sea’) and 天河 (‘milky way’) relate to water, as
described above. The semantic similarity between 荒海 (‘rough sea’) and 天河 (‘milky
way’) in terms of ‘wateriness’ and the obstacle (in real life and in the legend explained
above) and their dissimilarity (violence in the ‘rough sea’ and peacefulness in ‘the
river of heaven’) are also foregrounded. This is a case of diagrammatic iconic effect,
intensifying the meaning of the foregrounded elements by the repetitive use of similar
visual elements -- two-character nouns and the same radical.
In addition, 渡 (‘to cross water’) in Sado 佐渡, the name of the island, seems
important, because this logograph means ‘to cross.’ As the background history and
the legend show, both the ‘rough sea’ and ‘the milky way’ are obstacles for the loved
ones who must cross in order to meet. This character is placed in the middle of the
poem as if it signalled the crossing.

Sound Patterns. The sound structure also exhibits interesting iconic effects which
contribute to supporting the interpretations drawn by the theory of blending. The
following analysis illustrates three possible iconic effects produced by the distribution
of vowels, consonants, and the repetition of adjacent vowels. Example 3 is a
phonological notation of the poem’s syllabic structure.

Example 3
Line 1 a-ra-u-mi ya ([-] = syllabification)
Line 2 sa-do ni yo-ko-ta-u
Line 3 a-ma no nga-wa ([ng] as in thing)

Firstly, the distribution of the vowels shows that the poem dominantly uses back
vowels such as [a] and [o]. As indicated in Table 1, there are 9 [a]’s (53%) and 4
[o]’s (24%) out of 17 vowels.

          a    o    i    u    e    Total
Line 1    3    0    1    1    0      5
Line 2    2    3    1    1    0      7
Line 3    4    1    0    0    0      5
Total     9    4    2    2    0     17

Table 1: Distribution of Vowels

[a] and [o] are pronounced with a wide passage between the tongue and the roof of
the mouth, and with the back of the tongue higher than the front. The backness and
the openness often create ‘sonorous’ effects which may draw associations of
something deep and large [cf. 4]. In this poem, perhaps, these effects have
something to do with the largeness of waves in the rough sea and the depth and width
of the river of heaven.
The sonorous effects are also created by the frequent use of nasals ([m], [n], and
[ng]), and vowel-like consonants ([y] and [w]). Table 2 shows the distribution of
consonants:

          Position of Syllable in Line                 # of        # of
          1    2    3    4    5    6    7             Sonorants   Obstruents
Line 1         r         m    y                           3           0
Line 2    s    d    n    y    k    t                      2           4
Line 3         m    n    ng   w                           4           0
Total                                                     9           4

Table 2: Distribution of Consonants

Dominance of sonorants such as [m], [n], [ng], [r], [y], and [w] is characteristic of the
text. The sonorants often provide prolongation and fullness of the sounds, and hence
usually produce lingering effects [cf. 13, pp. 10-12]. It could be argued that the back
vowels and sonorant consonants jointly reinforce a sound-iconic effect of the ‘depth’
or the ‘largeness’ of the image of ‘water’ elements, i.e., a rough sea and the river of
heaven expressed by the poem. Also note that the only line that has obstruents (i.e.,
non-sonorants such as [s], [d], [k], and [t]) is Line 2, in which the island is
mentioned. If one can interpret ‘sonorants’ as iconically associated with ‘water’
elements, then one can also infer that ‘obstruents’ are associated with ‘non-water,’
namely, the island in this text.
The last point is that the text seems, very cleverly and wittingly, to conceal a key
word which is congruous with the meaning of the poem. The prototypical sound
sequence in Japanese is an alternation of a single consonant and a single vowel such
as CV-CV-CV. This general feature applies to the haiku text, too. A closer look,
however, enables us to recognise that there are a few occurrences of two vowels, [a]
and [u], adjacent to each other such as [a-u]. They occur in a r a u m i in Line 1 and
y o k o t a u in Line 2. In Line 3, there is a similar sound sequence, g a w a, as [w] is
phonetically close to [u]. It could be said that each line of the poem has a vowel
sequence, [a-u], hidden in the sound sequence of a word or two adjacent words.
Very interestingly, this vowel sequence, [a-u], is a verb in Japanese, which means ‘to
meet.’ The hidden repetition of [a-u] (‘to meet’) in each line could be read as an
echo of a hidden longing of the separated people. Again, the iconic effect of this
hidden element supports the reading of the text as a global metaphorical juxtaposition,
i.e., separation of the two stars on either side of the Milky Way mapped onto the
separated people in the Island of Sado from their loved ones in the mainland.
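
The tallies in Tables 1 and 2, and the hidden [a-u] sequence itself, can be checked
mechanically. The short Python sketch below is an illustration added for this purpose
rather than part of the original analysis; the syllable lists simply transcribe
Example 3, the sonorant/obstruent split follows Table 2, and [w] is treated as a
near-[u], as noted for Line 3.

# Illustrative sketch: recompute the vowel and consonant tallies of Tables 1
# and 2 and look for the hidden [a-u] ('to meet') echo in each line.
LINES = {
    "Line 1": ["a", "ra", "u", "mi", "ya"],
    "Line 2": ["sa", "do", "ni", "yo", "ko", "ta", "u"],
    "Line 3": ["a", "ma", "no", "nga", "wa"],
}
SONORANTS = {"m", "n", "ng", "r", "y", "w"}

def split(syllable):
    """Split a CV-style syllable into (onset consonant or '', vowel)."""
    return syllable[:-1], syllable[-1]

for name, syllables in LINES.items():
    onsets = [split(s)[0] for s in syllables if split(s)[0]]
    vowels = [split(s)[1] for s in syllables]
    counts = {v: vowels.count(v) for v in "aoiue"}
    sonorants = sum(1 for c in onsets if c in SONORANTS)
    # The hidden 'a-u': an [a] immediately followed by [u] (or by its close
    # neighbour [w]) in the flattened phoneme string of the line.
    phonemes = [p for s in syllables for p in split(s) if p]
    hidden_au = any(a == "a" and b in ("u", "w")
                    for a, b in zip(phonemes, phonemes[1:]))
    print(name, counts, "sonorants:", sonorants,
          "obstruents:", len(onsets) - sonorants, "hidden a-u:", hidden_au)
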

3 Conclusion

The study of the haiku text, taken from Basho's Oku no hosomichi, has pointed out
that the blending model proposed by Turner and Fauconnier [16, 17] provides an
effective tool for understanding the creative mechanism of haiku. It has been claimed
that the cognitive projection derived from the metaphorical juxtaposition by kireji
(‘cutting letters’) is to be explained as a global blend which integrates input mental
spaces, which are at the same time locally blended spaces. This integration occurs
as a dynamic process of ‘making sense’ over the entire array of many mental spaces,
drawing on our recruitment of cultural and historical knowledge and other background
contexts, and thus creates emergent structures.
Interpretations of the literary text are constrained in certain ways -- by the use of
conventional conceptual mapping, by commonplace knowledge and by iconicity
between the structure and the meaning. The analysis has demonstrated that the
reading of haiku is also dependent on these factors. Basho used conceptual
metaphors, and exploited almost every possible resource in lexicon, syntax, and
orthography to multiply the implications of the short poetic text, e.g., kireji (‘cutting
letters’), kanji (‘Chinese logographs’), allusions, and sound patterns. It is
indispensable to rely also on cultural and historical background knowledge to
understand the enriched meanings of his texts. Finally, iconicity is of particular
importance in a short poetic text such as haiku because brevity seems to require the
form itself to participate in giving images, concepts, and feelings. This has been
demonstrated by Basho’s clever use of kanji and sound structure in visual, auditory
and cognitive terms.

References

1. Caudle, David J. 1995. “Conceptual Metaphor, Cognitive Spaces, and the
Semiotics of Invective.” Unpublished ms.
2. Henderson, Harold G. 1958. An Introduction to Haiku. NY: Doubleday.
3. Hiraga, Masako K. In press. “‘Blending’ and an Interpretation of Haiku: A
Cognitive Approach.” Poetics Today.
4. Jespersen, Otto. 1964[1921].7 “Sound Symbolism.” Language: Its Nature,
Development and Origin, 396-411. New York: Norton.
5. Johnson, Mark. 1987. The Body in the Mind: The Bodily Basis of Meaning,
Imagination, and Reason. Chicago: University of Chicago Press.
6. Lakoff, George. 1987. Women, Fire, and Dangerous Things: What
Categories Reveal about the Mind. Chicago: University of Chicago Press.
7. Lakoff, George. 1993. “The Contemporary Theory of Metaphor.”
Metaphor and Thought, ed. Andrew Ortony, 202-251. Second Edition.
Cambridge: Cambridge University Press.
8. Lakoff, George, and Mark Johnson. 1980. Metaphors We Live By. Chicago:
University of Chicago Press.
9. Lakoff, George and Mark Turner. 1989. More than Cool Reason: A Field
Guide to Poetic Metaphor. Chicago: University of Chicago Press.
10. Matsuo, Basho. 1966[1694]. “The Narrow Road to the Deep North.” Trans.
and ed. Nobuyuki Yuasa. The Narrow Road to the Deep North and Other
Travel Sketches. London: Penguin Books.
11. Matsuo, Basho. 1957[1694]. Oku no Hosomichi (Sora zuikounikki tsuki)
[The narrow road to the deep north and Sora's travel diary]. Annotated.
Shoichiro Sugiura. Tokyo: Iwanami Shoten.
12. Matsuo, Basho. 1996[1694]. Basho's Narrow Road. Trans. Hiroaki Sato.
Berkeley, CA: Stonebridge Press.
13. Shapiro, Karl and Robert Beum. 1965. A Prosody Handbook. New York:
Harper and Row.
14. Sweetser, Eve. 1990. From Etymology to Pragmatics: Metaphorical and
Cultural Aspects of Semantic Structure. Cambridge: Cambridge University
Press.
15. Turner, Mark. 1996. The Literary Mind. Oxford: Oxford University Press.
16. Turner, Mark, and Gilles Fauconnier. 1995. “Conceptual Integration and
Formal Expression.” Metaphor and Symbolic Activity 10(3). 183-204.
17. Turner, Mark, and Gilles Fauconnier. In press. “A Mechanism of Creativity.”
Poetics Today.

7 A reference with two years of publication indicates that the year listed first is that of
the accessed volume according to which the citation is made, and the year in brackets is
that of the source or original work.
Pragmatic Forces in Metaphor Use:
The Mechanics of Blend Recruitment in Visual
Metaphors

Tony Veale

School of Computer Applications
Dublin City University, Dublin, Ireland

Abstract. Metaphor and analogy are cognitive tools which, in serving
specific communicative goals and descriptive needs, are subject to a host
of pragmatic pressures. Knowing that these pressures will shape the in-
terpretation of a given metaphor, an effective communicator will exploit
them to structure the conceptual content of the metaphor in such a way
as to maximise its perceived aptness and argumentative force to the re-
cipient. This paper considers the form that such pressures can take, and
the computational strategies that a communicator can employ to max-
imise the effectiveness of a given metaphor. We choose as our domain of
discourse a collection of visual metaphors which highlights the effect of
pragmatic strategies on metaphoric communication.

1 Introduction

If a software agent is to fluently interact with, and act on behalf of, a human user,
it will require competence with both words and pictures, and metaphors which
combine the two. However, a multitude of pragmatic pressures interact to shape
the generation and interpretation of such multimedia metaphors. These pressures
range from the need to relax strict isomorphism when identifying a mapping re-
lationship between the tenor and vehicle domains, to recruiting intermediate
blends, or self-contained metaphors, as mediators between certain cross-domain
elements that would otherwise be considered too distant in conceptual or imag-
inistic space to make for an apt and aesthetically coherent metaphor. To apply
Hofstadter’s terminology of [6], such pressures fall under the broad rubric of
‘conceptual slippage’. Slippage mechanisms allow a metaphor’s content or mes-
sage to fluidly shift from one underlying concept to another, maximising the
structural coherence of the network of ideas that comprise the message.
This paper examines the complex interactions between these various slippage
pressures, and how they can be accommodated within a computational framework
that can potentially be exploited by a software agent. Though such fluid aspects
of metaphor can be accounted for structurally, they nevertheless demonstrate
that metaphor entails more than a simple structure-matching solution to the
graph-isomorphism problem, harnessing a range of on-the-fly reasoning processes
that can create complex transformational chains between entities. In computa-
tional models of analogy such as the SME model (Structure-Mapping Engine)
of [2] and the ACME model (Analogical Constraint Mapping Engine) of [7], two
cross-domain entities are said to be analogical counterparts if they occupy the
same relative position in their respective semantic structures. In contrast, the
metaphors studied in this research suggest that analogical equivalence is much
more than a matter of structural isomorphism: not only must two cross-domain
concepts occupy the same relative semantic position, there must be a compelling
semantic rationale for one to be mapped to the other.
The phenomena studied in this paper help to make clear exactly what struc-
tural forms this semantic rationale must take. We demonstrate that in many
cases the rationale is blend-centered, and that novel visual metaphors often re-
cruit conventional visual blends to pragmatically motivate the key mappings of
the metaphor. Because these blends represent established metaphors in their own
right, they lend an immediacy to the metaphor in which they are incorporated,
helping to make this encompassing metaphor eye-catchingly apt. However, as we
shall further show, we do not need to posit a new theory of metaphor to account
for these phenomena, as the mechanics of this recruitment process are readily ex-
plained within the computational framework of Veale and Keane’s Sapper model
(see [9,10,11]).

2 Computational Models of Metaphor

At the heart of analogy and metaphor lies a structure-mapping process that is
responsible for creating an isomorphic correspondence between semantic sub-
structures of the tenor and vehicle domains. Isomorphism is a mathematical
notion that guarantees the systematicity and coherence of any resulting inter-
pretation, by ensuring that each relation and object of the tenor domain receives
at most one correspondence in the vehicle domain. Isomorphism is central to
metaphor and analogy because, in logical/computational terms, all meaning is
expressed via structure; if a cognitive process does not respect structure, it can-
not respect meaning, and thus, cannot itself be a meaningful process. Though
a graph-theoretic mathematical notion, isomorphism is implicit in the writings
of many non-mathematical philosophers of metaphor. Max Black (in [1]), for
example, describes metaphor as a process in which a blackened sheet of glass
inscribed with translucent markings (the vehicle) is placed over a visual scene
like the night sky (the tenor). Since only those stars which show through the
markings are visible to the observer, a sub-graph isomorphism between glass and
scene is created (e.g., those stars comprising the Pegasus constellation might be
picked out by a darkened glass inscribed with a picture of a winged horse).
Like SME and ACME, Sapper is a computational model founded upon this
notion of structure-mapping between domains. However, unlike SME and ACME,
Sapper requires that two cross-domain concepts have more in common than an
isomorphic structural setting if they are to be paired in an interpretation of a
given metaphor. In addition to structural isomorphism, Sapper requires that an
analogical pairing of concepts either share a common set of features (abstract
or concrete) or be structurally related to another, more pivotal, pair that do.
Concepts that share a number of semantic features or attributes are said to be
linked by a ‘bridge relation’, and it is upon such ‘bridges’ that Sapper grounds
the interpretation of a given metaphor. For instance, the concepts Scalpel and
Cleaver share the associations Sharp, Blade and Blood, and thus a bridge rela-
tion is established between both. Higher-level analogical correspondences can be
grounded in this bridge if the corresponding concepts relate to the bridge in an
identical semantic fashion; thus, because Surgeons use Scalpels, and Butchers use
Cleavers, a mapping between Surgeon and Butcher can be grounded in the bridge
relation between Scalpels and Cleavers. Bridges based upon low-level literal and
perceptual similarities correspond to basic attributive metaphors, and are con-
sidered by Sapper as instantiations of the generic schema X–metaphor→Y. Sap-
per views metaphor interpretation as a process of bridge-building in which new
bridges are constructed using existing bridges as foundations; thus Sapper might
construct the bridge Surgeon–metaphor→Butcher by building upon the lower-
level bridges Scalpel-metaphor→Cleaver or Surgery–metaphor→Slaughter.
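
The bridge-building step can be pictured with a small Python sketch. The fragment
below is purely illustrative and is not the Sapper implementation; the feature sets,
the use relations and the sharing threshold are invented stand-ins for Sapper's memory
contents in the Scalpel/Cleaver example.

# Illustrative sketch only (not Sapper itself); all facts below are invented.
FEATURES = {
    "Scalpel": {"Sharp", "Blade", "Blood", "Steel"},
    "Cleaver": {"Sharp", "Blade", "Blood", "Heavy"},
}
RELATIONS = [
    ("Surgeon", "use", "Scalpel"),
    ("Butcher", "use", "Cleaver"),
]

def bridge(x, y, threshold=3):
    """Propose X--metaphor-->Y when the two concepts share enough associations."""
    return len(FEATURES.get(x, set()) & FEATURES.get(y, set())) >= threshold

def grounded_pairs(bridges):
    """Pair concepts that relate in an identical way to an existing bridge."""
    pairs = set()
    for bx, by in bridges:
        for a, rel_a, x in RELATIONS:
            for b, rel_b, y in RELATIONS:
                if rel_a == rel_b and (x, y) == (bx, by) and a != b:
                    pairs.add((a, b))
    return pairs

low_level = [(x, y) for x in FEATURES for y in FEATURES if x < y and bridge(x, y)]
print(low_level)                   # [('Cleaver', 'Scalpel')]
print(grounded_pairs(low_level))   # {('Butcher', 'Surgeon')}

With these toy contents the low-level bridge between Scalpel and Cleaver is proposed
from their three shared associations, and the Surgeon/Butcher pairing is then grounded
on it because both concepts stand in the same use relation to the bridged pair.
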
Sapper is a graph-matching system then (see [10] for a full algorithmic
treatment and complexity analysis), one which exploits the bridge schema X–
metaphor→Y to ensure that pivotal elements of a cross-domain mapping are
grounded in perceptual similarity. But as a model of the human metaphoric
faculty, we do not see it as a responsibility of Sapper to establish these low-
level bridges to begin with, rather to build upon concepts so linked to create
higher-level comparisons. Effectively then, Sapper employs a pro-active view of
long-term and short-term memory in which shared associations between concepts
are automatically recognised and noted, making low-level bridge construction a
memory-centred rather than mapping-centred task.

2.1 Argument by Analogy in Sapper: A Worked Example


The importance of structural isomorphism in everyday argument by analogy,
and how it is modelled by the Sapper approach, is best illustrated with a topical
example. Chafing under the U.S. government’s decision in a recent anti-trust
case against Microsoft (on behalf of the competition rights of a rival company,
Netscape Inc.), its CEO and chairman Bill Gates argued that to expect Microsoft
to distribute Netscape Navigator as part of the Windows’98 operating system
was as irrational as expecting CocaCola to bundle two cans of Pepsi with every
sixpack of Coke. The analogy is a good one, for it grounds the corporate rivalry
between Microsoft and Netscape in the well-appreciated, indeed almost visceral,
fear and loathing that has traditionally existed between CocaCola and PepsiCo.
Both of the latter sell near-identical products in an intensely competitive market,
where the most apparent sources of marketability are brand recognition and
customer loyalty. Like Netscape Navigator and Microsoft’s rival browser, Internet
Explorer, both products have little to distinguish them at a content-level, so for
a company to use its distribution mechanisms to deliver a rival’s product to the
market-place can be seen as financial suicide.

[Fig. 1 (diagram): a semantic network linking Microsoft, Windows™, MS-Word, MS-Excel,
IExplorer, IExplorerUserBase, MicrosoftSoftware (attr "Soft", targeting the MassMarket),
NetscapeInc, NetscapeNavigator, NetscapeUserBase and WebAccess through create, control,
part, contain, enable, attr, target and affect relations.]

Fig. 1. The Market Dynamics of Microsoft and NetscapeInc. Semantic Rela-
tions marked with a bullet indicate pejorative (as opposed to strictly logical)
negation; thus, Microsoft–•affect→NetscapeInc means that Microsoft negatively
affects NetscapeInc.

Highlighted in Fig. 1 and 2 are the relational chains common to both that
might conveniently be termed the backbones of each domain structure. In Fig. 1
we see that Microsoft creates (and controls) Windows’98, which in turn contains
the browser IExplorer, which creates a market for itself denoted IExplorerBase,
which in turn reinforces Microsoft as a company. Similarly, in Fig. 2 we note
that CocaCola creates (and controls the makeup of) Coke six-packs, which con-
tain cans of Coke-branded soda, which generate a market for themselves denoted
CokeMarket, which in turn reinforces CocaCola’s corporate status. In the vocab-
ulary of the Sapper approach, we denote these relational chains using the path
notation Microsoft–create→Windows–part→IExplorer–create→IExplorerUserBase
–affect→Microsoft and CocaCola–create→CokeSixPack–part→CokeCan#6–
create→CokeMarket–affect→CocaCola respectively. Both of these pathways are
isomorphic, and ultimately grounded in a metaphoric bridge that reconciles Mi-
crosoftSoftware with ColaSoftDrink (both are, in a sense, “soft” products that
are aimed at the mass market). This allows Sapper to generate a partial inter-
pretation of the analogy that maps Microsoft to CocaCola, Windows’98 to a
sixpack of Coke, IExplorer to a can of Coke (labelled CokeCan#6 in Sapper’s
network representation of memory) and IExplorerUserBase to CokeMarket.
Microsoft and CocaCola are viewed by Sapper as the root concepts of the
analogy, and all isomorphic pathways within a certain horizon, or size limit, orig-
inating at these nodes are considered as the basis of a new partial interpretation.
Typically Sapper only considers pathways that comprise six relations or less, a

[Fig. 2 (diagram): the mirror network, linking CocaCola, CokeSixPack, CokeCan#1 to
CokeCan#6, CokeMarket, ColaSoftDrink (attr Fizzy, Brown, "Soft", targeting the
MassMarket), PepsiCo., PepsiSixPack, PepsiCan#1 to PepsiCan#6 and PepsiMarket through
create, control, part, contain, attr, target and affect relations.]

Fig. 2. The Mirror Domain to that of Fig. 1, Illustrating Similar Market Dy-
namics at Work in the Rivalry between CocaCola and PepsiCo.

modest computational bound which nonetheless allows it to model analogical
reasoning that involves six levels of recursion, a significant cognitive feat from a
human perspective. When all partial interpretations within this limit have been
constructed, Sapper will have mapped PepsiCo to NetscapeInc, NetscapeNaviga-
tor to a can of Pepsi (labelled PepsiCan#6 in memory), and NetscapeUserBase
to PepsiMarket. It simply remains for Sapper to choose a maximal set of partial
interpretations that can be merged together to form an overall interpretation of
the analogy that is rich yet internally consistent.
When the number of partial mappings is small, all possible combinations can
be examined in an attempt to find a non-conflicting set that produces the richest
overall mapping. When the number is too large to permit exhaustive search of
this type, a heuristic approach is instead pursued, whereby the richest partial
interpretation is chosen as the backbone of the analogy, and other interpre-
tations are aggregated around this backbone if it does not violate structural
isomorphism to do so.
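
The two steps just described, enumerating isomorphic pathways from the root concepts
within a fixed horizon and then aggregating the resulting partial interpretations, can
be sketched schematically in Python. The fragment below is an illustration under
simplifying assumptions (the toy triples abbreviate Figs. 1 and 2, and the greedy merge
is a simplified stand-in); it is not the Sapper implementation described in [10].

from collections import defaultdict

def adjacency(triples):
    adj = defaultdict(list)
    for src, rel, dst in triples:
        adj[src].append((rel, dst))
    return adj

def partial_interpretations(tenor, vehicle, root_t, root_v, horizon=6):
    """Collect tenor->vehicle pairings along pairs of label-identical paths."""
    t_adj, v_adj = adjacency(tenor), adjacency(vehicle)
    results = []

    def walk(node_t, node_v, mapping, depth):
        results.append(dict(mapping))
        if depth == horizon:
            return
        for rel_t, next_t in t_adj[node_t]:
            for rel_v, next_v in v_adj[node_v]:
                if rel_t == rel_v and next_t not in mapping:
                    mapping[next_t] = next_v
                    walk(next_t, next_v, mapping, depth + 1)
                    del mapping[next_t]

    walk(root_t, root_v, {root_t: root_v}, 0)
    return results

def merge(partials):
    """Greedy aggregation: richest partial first, added only if it keeps the
    overall mapping one-to-one (i.e. does not violate isomorphism)."""
    overall = {}
    for p in sorted(partials, key=len, reverse=True):
        consistent = all(overall.get(k, v) == v for k, v in p.items())
        injective = all(v not in overall.values() or overall.get(k) == v
                        for k, v in p.items())
        if consistent and injective:
            overall.update(p)
    return overall

# Toy fragments of the two domains, abbreviating Figs. 1 and 2.
TENOR = [("Microsoft", "create", "Windows98"),
         ("Windows98", "part", "IExplorer"),
         ("IExplorer", "create", "IExplorerUserBase"),
         ("IExplorerUserBase", "affect", "Microsoft")]
VEHICLE = [("CocaCola", "create", "CokeSixPack"),
           ("CokeSixPack", "part", "CokeCan#6"),
           ("CokeCan#6", "create", "CokeMarket"),
           ("CokeMarket", "affect", "CocaCola")]

print(merge(partial_interpretations(TENOR, VEHICLE, "Microsoft", "CocaCola")))

Run on these triples, the merged mapping pairs Microsoft with CocaCola, Windows98 with
CokeSixPack, IExplorer with CokeCan#6 and IExplorerUserBase with CokeMarket, mirroring
the partial interpretation discussed above.
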

2.2 A Blend Perspective on Sapper

As Sapper is a mapping mechanism in which elements of two conceptual spaces
are coherently combined to form a third, metaphoric, space, it is particularly
resonant with the conceptual blending, or many-spaces, theory of Fauconnier
and Turner (see [3, 4]). A blend is an integration of two or more con-
ceptual structures to create another, a structure which owes its semantic foun-
dations to its inputs but which also possesses an independent conceptual reality
of its own. Blending theory thus posits a multi-space extension of the classic
two-space model of metaphor and analogy, in which the traditional inputs to the
mapping process, the tenor and vehicle, are each assumed to occupy a distinct
mental space, while the product of their conceptual integration is also assumed
to occupy a separate output space of its own. This allows the newly blended
concept to acquire associations and conventions that do not strictly follow from
the logical makeup of its inputs. For instance, the concept BlackHole is a con-
venient and highly visual blend of the concepts Blackness and Hole, one which
enjoys continued usage in popular and scientific parlance despite evidence that
blackholes are neither hole-like nor black in any real sense (i.e., while blackholes
are conveniently conceptualized as holes in the fabric of space-time, they are now
understood to emit gamma radiation, and are thus not truly black in a scientific
sense; furthermore, these emissions cause the blackhole to shrink, whereas a real
hole should grow larger the more substance it emits).
In addition, blend theory allocates a distinct space, called generic space, to
those schemas which guide the construction of a blend. These schemas operate
at a low-level of description, typically the image-schematic level, and serve both
as selectional filters and basic structure combinators for the input spaces. View-
ing Sapper from the perspective of blending theory then, the tenor and vehicle
structures correspond to the input spaces of the blend, while the lattice of cross-
domain bridges newly established in memory corresponds to the output blend
space. It follows that Sapper’s generic space is the set of conceptual schemas
that enable the generation of this lattice of metaphoric and analogical map-
pings. Thus far we have encountered just one of these schemas, X–metaphor→Y,
but it is reasonable to assume that for every distinct pragmatic force that can
affect the shape of a given metaphoric mapping there will be a corresponding
mapping schema in generic space. By identifying these forces then, one can more
clearly theorize about their underlying generic schemas, and so begin to model
these schemas within a computational framework.

3 Pragmatic Factors in Metaphor Use

The example of Fig. 3 represents a very real and complex illustration of the
pragmatic pressures that interact to create a visually apt metaphor. Here we see
the Economist newspaper use an easily identified piece of consumer gadgetry, a
‘Tamagotchi’ virtual pet, to make a searing indictment of the Japanese financial
system: ‘Firms such as Yamaichi [Japan’s 4th-largest brokerage, recently col-
lapsed] have been kept alive as artificially as the “virtual pets” in Tamagotchi
toys: thank goodness those infernal gadgets are finally being turned off’.
Taken from a serious political newspaper, such a visual metaphor must be
eye-catching yet appropriate, and complex (with a non-trivial political message)
yet instantly understandable.

Fig. 3. A striking visual blend of a ‘Tamagotchi’ game and the Japanese financial
situation after the Yamaichi Brokerage scandal. (Source: ‘The Economist’, Nov.
29, 1997)

In this section we discuss a variety of the cognitive phenomena responsible
for the attention-catching potency of such complex metaphors, while describing
those phenomena within the framework of the Sapper model.

3.1 Recruitment of Sub-Metaphors

Complex metaphors have an internal structure which is itself frequently con-
structed from other, related metaphors. These sub-metaphors are ‘recruited’
(in the sense of [3]) to mediate between cross-domain elements of the larger
metaphor. For instance, the cliche the pen is mightier than the sword yields
a Pen-as-Sword metaphor that can be recruited as part of a domain reconcil-
iation between the concepts Author and General. This notion of recruitment
corresponds to Sapper’s process of building new, higher-level bridges from old,
since each bridge is essentially a previously recognised metaphor. Thus Sapper
builds its interpretation of Author-as-General upon the perceptual bridge Pen–
metaphor→Sword. Frequently however, pivotal elements of a metaphor will not
directly share obvious perceptual qualities such as, in this case, Long, HandHeld,
Pointed and Narrow. Nevertheless, if the metaphor is an apt one, it will be pos-
sible to recruit a perceptually-grounded conceptual blend as a domain mediator.
This blend will have certain properties, both sensory and causal, in common
with the elements it mediates between, even though those elements themselves
may have little or nothing in common.
The Tamagotchi ‘piggy bank’ of Fig. 3 is a clear example of such a mediating
blend, inasmuch as it connects two very disparate concepts (Yamaichi and Puppy)
in a most apt and pleasingly visual manner.

3.2 Double-Think

When one describes a person as a Wolf, one rarely employs a realistic schema for
Wolf, but a stereotypical model which many people now know to be false. This
archetype is closer in nature to the cartoon caricatures of Chuck Jones and Tex
Avery (e.g., lascivious, treacherous, ruthless and greedy) than to accepted reality
(e.g., that a wolf is a family animal, with strong social ties). This caricature is an
anthropomorphic and highly visual blend of properties drawn from both Person
and Wolf, which allows a cognitive agent to easily ascribe human qualities to
a non-human entity (similar observations are reported in French, [5]). More
importantly perhaps, blend recruitment facilitates a fundamental cognitive role
of metaphor that, following Orwell’s ‘1984’, we term ‘Doublethink’, namely, the
ability to hold two complementary perspectives on the same concept in mind
at the same time, and to combine or blend these perspectives for reasons of
inference when necessary.
Consider again the Tamagotchi visual metaphor of Fig. 3, whose creators ex-
ploit the Japanese associations of the Tamagotchi game to describe the situation
now facing Japan’s banking regulators after the downfall of the Yamaichi stock
brokerage. The metaphor particularly stresses the options open to the regula-
tors - to prop up (i.e., ‘feed’) the ailing brokerage, or let it fail (i.e., ‘die’), while
viewing the whole financial fiasco as a ‘game’ gone wrong. Tamagotchi games
conventionally centre around electronic pets such as puppies or kittens, which
the player (the regulator?) is supposed to nourish and nurture via constant in-
teraction. This animal is thus a good metaphor for Yamaichi, but the visual
impact would clearly be diminished if the artist simply substituted a picture of
a bank, no matter how iconic, into the game. This is thus a situation in which
direct mapping between tenor and vehicle elements lacks a sufficient pragmatic
force of its own.
Fortunately, a blend is available, that of ‘piggy-bank’, that possesses the
necessary iconicity to substitute for both Yamaichi and the Tamagotchi puppy
in the metaphor. A Piggy-Bank’s strong associations with money and savings
make it an ideal metaphor for Yamaichi, while its visual appearance makes it an
obvious (after-the-fact) counterpart to the electronic animal of the game.
This is where the notion of ‘double-think’ applies. While being a metaphor for
both a brokerage and a puppy, the Piggy-Bank blend is allowed to exploit con-
tradictory properties of both. Most obvious is the orientation of the Piggy-Bank
- its ‘belly-up’ position is an iconic visual commonly associated with animals -
indicating that Yamaichi is either already bankrupt (dead) or seriously insolvent
(dying). This inverse orientation would make no sense if applied to a literal image
of a bank, yet it is perfectly apt when applied to another artefact, the piggy-
bank, due to its blend of animal visual properties (the most important here being
‘legs’ and ‘belly’). The Piggy-Bank concept is not simply a structural substitute
then for Yamaichi and puppy, but a ‘living’ blend of both.

3.3 Recasting

In the case of the Tamagotchi metaphor of Fig. 3, the slippage situation is actu-
ally even more complex than this. Though the concept Piggy-Bank is identified
as an appropriately visual mid-point between a financial institution and a puppy,
recall that the source of this key sub-metaphor is not actually a puppy at all,
but an electronic simulation of one. We thus need to introduce the idea of a
resemblance schema, taking the form X–resemble→Y. A resemblance relation is
simply a bridge relation between concepts that share a number of perceptual (i.e.,
appearance-related) properties. The transformational chain linking Yamaichi to
the Tamagotchi puppy is thus: Yamaichi–metaphor→PiggyBank–resemble→Pig–
metaphor→Puppy–resemble→TamagotchiPuppy. In effect, Yamaichi and the
Tamagotchi puppy need to be recast for the mediating blend to apply.

Fig. 4. A bowling metaphor is used to convey the rough-and-tumble of modern
Russian politics. (Source: ‘The Economist’, November 22, 1997)

Indeed, recasting seems to be a structural phenomenon which is key to
stamping visual coherence on a metaphor. Consider for instance another graphic
metaphor from the cover of the ‘Economist’ (November 22, 1997), which illus-
trates the rough-and-tumble dynamism of modern Russian politics. To convey
the main thrust of the magazine’s leader column, namely that certain once-
prestigious Russian politicians continue to suffer humiliating downfalls while
Boris Yeltsin remains upright and stable throughout, the ‘Economist’ chooses a
bowling metaphor in which different pins represent various politicians and bowl-
ing balls the fickleness of public opinion. The metaphor, illustrated in Fig. 4,
is well-chosen not only because bowling is a popular sport associated with the
general public as a whole, but because the up / down / stable / rocking status
of the pins conforms to a conventional mode of discourse in politics. However,
visual coherence cannot be bought simply by painting the faces of the politicians
involved onto the appropriate pins, as the conceptual and imaginistic distance
between bowling pins and people is such that the result would simply look con-
trived. Instead, the cover’s creator uses not bowling pins but nested Russian
dolls, of the political variety one frequently sees at tourist stalls. While pos-
sessing an iconic visual quality, such dolls also resemble both bowling pins and
politicians, and so act as a perfect mediating blend between the end-points of
the metaphor.

3.4 Internal Recruitment of Blends

A blend which is recruited to act as a mapping intermediary in this way also acts
as a visual precedent, in effect grounding the mapping in shared background knowl-
edge between creator and reader as well as securing the aptness of the mapping.
However, not all elements of the metaphor may be externally grounded in this
fashion. For instance, in the case of the Yeltsin bowling cartoon, the Russian fi-
nance minister Anatoly Chubais is also illustrated using a Russian doll/bowling
pin blend, yet there is no background precedent for this. Nevertheless, there
exists an internal precedent - Boris Yeltsin. Because Yeltsin is also depicted in
this fashion, and because Chubais is a strong analogical counterpart of Yeltsin
(both are powerful male Russian politicians), it makes sense that any ground-
ing applied to Yeltsin can also be analogically transferred to Chubais. So while
Yeltsin visually maps to the first bowling pin via the transformational chain
Yeltsin–resemble→ YeltsinRussianDoll–resemble→BowlingPin1, Chubais maps
to the second via Chubais–metaphor→Yeltsin–resemble→YeltsinRussianDoll –
resemble→BowlingPin1–resemble→BowlingPin2. It seems from such examples
that metaphor can possess an incestuous quality, feeding not only off other
metaphors and blends recruited from outside, but upon its own internal struc-
ture.

3.5 Analogical Inferencing

Analogy can be seen as a didactic form of metaphor in which the purpose of
communication is to educate by comparison. However, while many metaphors
are simply descriptive, with aesthetic rather than educational goals, metaphors
can also possess a take-home message which the reader transfers from the vehi-
cle domain to the tenor. For instance, in comparing Japan to a Tamagotchi, the
Economist’s take-home message is the opinion that perhaps the Japanese govern-
ment has viewed the problems of financial regulation as a game, while treating
favoured institutions like Yamaichi as ‘virtual pets’. This form of transfer-based
inferencing is readily provided by models of analogy and metaphor such as SME,
ACME and Sapper, given that the cross-domain mapping established by these
models acts as a substitution-key which dictates how elements of the vehicle
domain can be rewritten into the tenor domain.
However, not all metaphors provide a sufficient key for transferring elements
of the vehicle into the tenor. For instance, in the Russian bowling metaphor,
what is to be made of the fact that certain political kingpins are shown falling
on their sides? This idea of a ‘fall from grace’ has a strong metaphoric his-
tory in politics, conventionally denoting failure due to scandal, but this is a
metaphor that must be recruited from outside the current context rather than
identified and exploited internally. So, when presented with an image of a falling
Chubais doll/pin, one must draw upon political knowledge associated with a
‘fallen’ analogical counterpart of Chubais from outside the current context, if it
is not already appreciated that this particular politician is in a perilous position.
For instance, one can defer to another politician such as Nixon and his political
fall, via the analogical chain Chubais–metaphor→Nixon– perform→Resignation–
metaphor→Fall. In essence, we simply need to find a path that metaphorically
links the concept Chubais to the concept Fall, and this path should contain the
semantic sub-structure to be analogically carried into the tenor domain; in this
case the connecting sub-structure suggests that Chubais might perform an act
of resignation. It is necessary that the agent (software or human) reason via
an analogical counterpart like Nixon since the concept Fall may have different
metaphoric meanings in different contexts (e.g., one would not infer that a falling
share-price should also resign).
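
One way to picture this deferral to an external counterpart is the bounded search
sketched below. The memory triples, including the Chubais-Nixon bridge, are invented
stand-ins for whatever political knowledge a reader happens to recruit; the point is
only that the non-metaphor links on the connecting path are the sub-structure to be
carried back into the tenor domain.

from collections import deque

# Hypothetical memory fragment; the facts are invented for illustration.
MEMORY = [
    ("Chubais", "metaphor", "Nixon"),        # analogical counterpart
    ("Nixon", "perform", "Resignation"),
    ("Resignation", "metaphor", "Fall"),
    ("SharePrice", "metaphor", "Fall"),      # a competing, non-agentive reading
]

def find_path(start, goal, limit=4):
    """Breadth-first search for a relation path of at most `limit` links."""
    queue = deque([(start, [])])
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        if len(path) < limit:
            for src, rel, dst in MEMORY:
                if src == node:
                    queue.append((dst, path + [(src, rel, dst)]))
    return None

path = find_path("Chubais", "Fall")
# The non-metaphor links are the transferable sub-structure: Chubais may be
# about to 'perform' a 'Resignation'.
inference = [(s, r, d) for s, r, d in path if r != "metaphor"]
print(path)
print(inference)   # [('Nixon', 'perform', 'Resignation')]
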

3.6 Determining a Relevant Scope


Iconicity clearly plays a key role in determining relevance for visual metaphors.
For example, because Japan is currently linked with a singular salience to the
Tamagotchi game, while Tamagotchi itself is clearly evocative of Japan (as many
miniaturised gadgets are), Tamagotchi thus acts as a good metaphor for modern
Japan.
Nevertheless, the metaphor of Fig. 3 is driven by the need to communicate the
current economic situation in Japan as it applies to the Yamaichi scandal. How-
ever, knowing little about Yamaichi itself, many readers would be hard-pressed
to recognise any iconic associations with what was until recently a rather anony-
mous Japanese brokerage. There exists a strong pragmatic pressure then to widen
the scope of the metaphor, in this case to Japan as a whole, while insisting that
any metaphor chosen to reflect Japan will encompass Yamaichi in a recruited
sub-metaphor. This enlargement of context serves two pragmatic goals: firstly,
the larger ‘Japan’ metaphor serves to place Yamaichi’s woes in a given cultural

?- sapper(japan, tamagotchi, [map(yamaichi,_)]).

If we view japan as a kind of tamagotchi Then

[0.5] tamagotchi → japan

[0.5] tamagotchi_player → japanese_government

[0.1] tamagotchi_rules → economic_policy

[0.6] tamagotchi_puppy → yamaichi

[0.2] death → bankruptcy

[0.2] life → solvency

[0.3] food → money

[0.3] feed_button → japanese_treasury

[0.1] recast_as(yamaichi, piggy_bank) →

recast_as(tamagotchi_puppy, puppy)

Fig. 5. Output from a Prolog implementation of Sapper when given conceptual
descriptions of Japan and Tamagotchi.

context, while secondly, it structurally enriches the metaphor by allowing more
cross-domain elements to participate (e.g., Tamagotchi, the Japanese govern-
ment and its economic policies). The complete mapping generated by Sapper for
this enlarged context is shown in Fig. 5.
The encompassing concept is chosen for the reciprocated salience of its re-
lationship with the tenor. For instance, Yamaichi is strongly associated with
Japan, while Japan is itself causally related to Yamaichi via its government. In
contrast, though Yamaichi is also associated with mountains (its name means
‘Mountain Number 1’), the concept Mountain is not saliently associated with
Yamaichi. Thus the concept Japan, rather than a concept like Mountain, can
be recognised as providing a larger metaphoric context in which to work. Once
Japan is chosen to act as the new tenor of the metaphor, the concept Tamagotchi
can then be chosen as a suitable vehicle due to its iconic value, as again, there is a
reciprocated salience in their relationship (i.e., each is evocative of the other). It
remains for the cognitive agent to ‘run’ the metaphor of ‘Japan is a Tamagotchi
game’ with the caveat that Yamaichi receives a cross-domain mapping in the
interpretation. Many computational models of analogy and metaphor, such as
SME, ACME and Sapper, already provide for this pragmatic directive. Fig. 5
illustrates the output generated by Sapper when given structured descriptions
of these concepts to metaphorically analyse.
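
The reciprocated-salience heuristic, and the pragmatic directive that Yamaichi must
receive a counterpart, might be rendered as in the following sketch. The salience
scores are invented for illustration, and run_mapping is a hypothetical stand-in for a
mapper such as Sapper invoked in the manner of the query shown in Fig. 5.

# Hypothetical association strengths in the range 0..1 (invented values).
SALIENCE = {
    ("Yamaichi", "Japan"): 0.9, ("Japan", "Yamaichi"): 0.4,
    ("Yamaichi", "Mountain"): 0.3, ("Mountain", "Yamaichi"): 0.05,
    ("Japan", "Tamagotchi"): 0.7, ("Tamagotchi", "Japan"): 0.8,
}

def reciprocated_salience(a, b):
    """Both directions must be salient; the weaker direction sets the score."""
    return min(SALIENCE.get((a, b), 0.0), SALIENCE.get((b, a), 0.0))

def widen_scope(concept, candidates):
    """Choose the encompassing concept with the best reciprocated salience."""
    return max(candidates, key=lambda c: reciprocated_salience(concept, c))

new_tenor = widen_scope("Yamaichi", ["Japan", "Mountain"])   # -> 'Japan'
vehicle = widen_scope(new_tenor, ["Tamagotchi"])             # -> 'Tamagotchi'
print(new_tenor, vehicle)
# A mapper would then be run with the original tenor as a constraint, e.g.
# run_mapping(new_tenor, vehicle, must_map=["Yamaichi"])   # hypothetical call
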

4 Pragmatic Mapping Schemas

We have seen how, starting with the Sapper bridging schema X–metaphor→Y,
this schema can be specialised to deal with appearance-based perceptual similar-
ity in the form X–resemble→Y. Taken together, these two schemas provide the
basic building blocks for reasoning about the slippage phenomena of blend re-
cruitment (both internal and external), recasting and doublethink. For instance,
the basis of the Yamaichi:Tamagotchi metaphor can be explained using the com-
posite chain of metaphor and resemblance schemas:

Yamaichi–metaphor→PiggyBank–resemble→Pig–metaphor→Puppy–
resemble→TamagotchiPuppy

while the mapping of Mr. Chubais to a bowling pin in Fig. 4 can also be ex-
plained using the chain:

Chubais–metaphor→Yeltsin–resemble→YeltsinRussianDoll–resemble→
BowlingPin1–resemble→BowlingPin2

Our initial exploration in the domain of political and economic cartoons shows
these chains–each of which is a four-fold composite of the basic metaphor and
resemblance schemas–to be as complex as one is likely to find in this domain.
We can therefore view the generic space guiding the pragmatics of Sapper’s
mapping process as being populated with all permutations of these basic schemas
within a given computational limit. That is, just as there are effective cognitive
limitations on the number of elements one can store in working memory, or
nest in a centre-embedded clause, it is reasonable to assume that the amount of
structural slippage tolerated by the metaphor faculty is similarly bounded for
reasons of computational tractability. Sapper currently operates with a maxi-
mal chain size of four bridge schemas, but again, this proves effective for even
the most complex metaphors we have encountered so far. It remains to be seen
whether the computational limit is pragmatically determined - that is, whether
the context dictates how much computational effort should be applied. For in-
stance, one expects that political cartoons demand more cognitive expenditure
than, say, advertising imagery. This conjecture, among others, is the subject of
current on-going research.
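
A schematic rendering of this bounded composition of schemas is given below, again as
an illustration rather than as Sapper's own code; the bridge facts are those of the
Yamaichi/Tamagotchi example, and the bound of four links mirrors the limit just
discussed.

# Enumerate composite metaphor/resemble chains of at most four links.
BRIDGES = [
    ("Yamaichi", "metaphor", "PiggyBank"),
    ("PiggyBank", "resemble", "Pig"),
    ("Pig", "metaphor", "Puppy"),
    ("Puppy", "resemble", "TamagotchiPuppy"),
]

def chains(start, goal, bound=4, path=()):
    """Yield every metaphor/resemble chain from start to goal within the bound."""
    if start == goal and path:
        yield path
        return
    if len(path) == bound:
        return
    for src, schema, dst in BRIDGES:
        if src == start:
            yield from chains(dst, goal, bound, path + ((src, schema, dst),))

for chain in chains("Yamaichi", "TamagotchiPuppy"):
    print("".join(f"{s}--{r}-->" for s, r, _ in chain) + "TamagotchiPuppy")
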

5 Conclusions
We conclude on this theme of computational felicity, by noting that the model
of blend recruitment presented in this paper may also shed a useful compu-
tational perspective on another intriguing aspect of Fauconnier and Turner’s
theory of blending, namely the metonymy projection principle. Since metaphors
and blends typically serve the communicative purpose of throwing certain ele-
ments of a domain into bas-relief, while de-emphasising others (e.g., see [8]), this
strengthening of associations frequently causes the relational distance between
the tenor and its highlighted association to be fore-shortened in any resulting
conceptual product.
Fauconnier and Turner cite as an example of this principle the concept Grim-
Reaper, a blend which metaphorically combines the concepts Farmer and Death.
In the latter domain, the concepts Skeleton and RottingClothes are causally as-
sociated with Death, via the intermediate concepts Decompose, Rot, Coffin,
Funeral, Graveyard, and so on. But in the resultant blend space, Skeleton and
RottingClothes become directly associated with Death, and are used together as
an explicit visual metonym; the Grim Reaper is thus conventionally portrayed as
a scythe-carrying skeleton, wrapped in a decrepit cloak and cowl. We see a sim-
ilar instance of this phenomenon in the Tamagotchi example of Fig. 3, in which
the associations between Yamaichi, a rather lofty brokerage, and the concepts of
PersonalSavings and SmallInvestor are strengthened by the use of a PiggyBank
as a visual metonym. This has the effect of personalising the metaphor and mak-
ing its consequences more relevant to the intended audience, the bulk of which
will themselves be small, rather than corporate, investors. In both these cases,
metonymic short-cuts emerge because an intermediate blend is recruited that
provides a shorter path to the relevant associations. Skeleton serves as a rich
visual analog of Farmer (both have arms, legs, torso, head, etc.) while evoking
certain abstract properties of Death, whereas PiggyBank is a rich visual analog
of a TamagotchiPuppy, while sharing key abstract properties with Yamaichi.
The computational account we provide of blend recruitment may thus also
provide an algorithmic basis for much of what passes for metonymic projection.
It remains as a goal of future research to establish other aspects of conceptual
integration that can be neatly accommodated within this computational frame-
work.

References
1. Black, M.: Models and Metaphors: Studies in Language and Philosophy. Ithaca, NY:
Cornell University Press (1962).
2. Falkenhainer, B., Forbus, K. D., and D. Gentner.: The Structure-Mapping Engine.
Artificial Intelligence 41 (1989), pp 1-63.
3. Fauconnier, G. and M. Turner.: Conceptual projection and middle spaces. UCSD:
Department of Cognitive Science Technical Report 9401 (1994).
4. Fauconnier, G. and M. Turner.: Conceptual Integration Networks. Cognitive Sci-
ence (in press).
5. French, R.: The Subtlety of Sameness. Cambridge: MIT Press (1995).
6. Hofstadter, D. R. and the Fluid Analogies Research Group.: Fluid Concepts and
Creative Analogies: Computer Models of the Fundamental Mechanisms of Thought.
Basic Books, NY (1995).
7. Holyoak, K. J. and P. Thagard.: Analogical Mapping by Constraint Satisfaction.
Cognitive Science 13 (1989), pp 295-355.
8. Ortony, A.: The role of similarity in similes and metaphors. Metaphor and Thought,
edited by A. Ortony. Cambridge: Cambridge University Press (1979).
9. Veale, T. and M. T. Keane.: Belief Modelling, Intentionality and Perlocution in
Metaphor Comprehension, in The Proceedings of the Sixteenth Annual Meeting of
the Cognitive Science Society, Atlanta, Georgia. Hillsdale, NJ: Lawrence Erlbaum
(1994).
10. Veale, T. and M. T. Keane.: The Competence of Sub-Optimal Structure Map-
ping on ‘Hard’ Analogies, in The Proceedings of IJCAI’97, the International Joint
Conference on Artificial Intelligence, Nagoya, Japan (1997).
11. Veale, T.: ‘Just in Time’ Analogical Mapping: An Iterative-Deepening Approach to
Structure-Mapping, in The Proceedings of ECAI’98, the 13th European Conference
on Artificial Intelligence, Brighton, UK (1998).
The Cog Project: Building a Humanoid Robot

Rodney A. Brooks, Cynthia Breazeal, Matthew Marjanović,
Brian Scassellati, Matthew M. Williamson

MIT Artificial Intelligence Lab
545 Technology Square
Cambridge MA 02139, USA
{brooks,cynthia,maddog,scaz,matt}@ai.mit.edu
http://www.ai.mit.edu/projects/cog/

Abstract. To explore issues of developmental structure, physical em-
bodiment, integration of multiple sensory and motor systems, and social
interaction, we have constructed an upper-torso humanoid robot called
Cog. The robot has twenty-one degrees of freedom and a variety of sen-
sory systems, including visual, auditory, vestibular, kinesthetic, and tac-
tile senses. This chapter gives a background on the methodology that
we have used in our investigations, highlights the research issues that
have been raised during this project, and provides a summary of both
the current state of the project and our long-term goals. We report on
a variety of implemented visual-motor routines (smooth-pursuit track-
ing, saccades, binocular vergence, and vestibular-ocular and opto-kinetic
reflexes), orientation behaviors, motor control techniques, and social be-
haviors (pointing to a visual target, recognizing joint attention through
face and eye finding, imitation of head nods, and regulating interaction
through expressive feedback). We further outline a number of areas for
future research that will be necessary to build a complete embodied sys-
tem.

1 Introduction
Building an android, an autonomous robot with humanoid form and human-
like abilities, has been both a recurring theme in science fiction and a “Holy
Grail” for the Artificial Intelligence community. In the summer of 1993, our
group began the construction of a humanoid robot. This research project has
two goals: an engineering goal of building a prototype general purpose flexible
and dextrous autonomous robot and a scientific goal of understanding human
cognition (Brooks & Stein 1994).
Recently, many other research groups have begun to construct integrated hu-
manoid robots (Hirai, Hirose, Haikawa & Takenaka 1998, Kanehiro, Mizuuchi,
Koyasako, Kakiuchi, Inaba & Inoue 1998, Takanishi, Hirano & Sato 1998, Morita,
Shibuya & Sugano 1998). There are now conferences devoted solely to humanoid
systems, such as the International Symposium on Humanoid Robots (HURO)
which was first hosted by Waseda University in October of 1996, as well as sec-
tions of more broadly-based conferences, including a recent session at the 1998
IEEE International Conference on Robotics and Automation (ICRA-98) in Leu-
ven, Belgium. There has also been a special issue of the Journal of the Robotics
Society of Japan in October of 1997 devoted solely to humanoid robotics.
Research in humanoid robotics has uncovered a variety of new problems
and a few solutions to classical problems in robotics, artificial intelligence, and
control theory. This research draws upon work in developmental psychology,
ethology, systems theory, philosophy, and linguistics, and through the process
of implementing models and theories from these fields has raised interesting
research issues. In this chapter, we review some of the methodology and results
from the first five years of our humanoid robotics project.
Since the inception of our research program, we have developed a methodol-
ogy that departs from the mainstream of AI research (Brooks, Breazeal (Ferrell),
Irie, Kemp, Marjanović, Scassellati & Williamson 1998). Section 2 reviews some
of the assumptions of classical AI that we have found lacking and concentrates
on four aspects of a new methodology that have greatly influenced our research
program: developmental structure, physical embodiment, integration of multi-
ple sensory and motor systems, and social interaction. In section 3, we describe
the current hardware and software environments of our upper-torso humanoid
robot, including twenty-one mechanical degrees of freedom, a variety of sensory
systems, and a heterogeneous distributed computation system. Section 4 focuses
on some of the long-term research issues that members of our group are currently
investigating, and Section 5 describes some of the current tasks and behaviors
that our robot is capable of performing. We conclude in Section 6 with a few of
the open problems that have yet to be addressed.

2 Methodology

In recent years, AI research has begun to move away from the assumptions of
classical AI: monolithic internal models, monolithic control, and general purpose
processing. However, these concepts are still prevalent in much current work and
are deeply ingrained in many architectures for intelligent systems. For example,
in the recent AAAI-97 proceedings, one sees a continuing interest in planning
(Littman 1997, Hauskrecht 1997, Boutilier & Brafman 1997, Blythe & Veloso
1997, Brafman 1997) and representation (McCain & Turner 1997, Costello 1997,
Lobo, Mendez & Taylor 1997), which build on these assumptions.
Previously, we have presented a methodology that differs significantly from
the standard assumptions of both classical and neo-classical artificial intelli-
gence (Brooks et al. 1998). Our alternative methodology is based on evidence
from cognitive science and neuroscience which focus on four alternative at-
tributes which we believe are critical attributes of human intelligence: devel-
opmental organization, social interaction, embodiment and physical coupling,
and multimodal integration.
In this section, we summarize some of the evidence that has led us to abandon
those assumptions about intelligence that classical AI continues to uphold. We
then briefly review the alternative methodology that we have been using in
constructing humanoid robotic systems.

2.1 False Assumptions about Human Intelligence


In studying human intelligence, three common conceptual errors often occur: re-
liance on monolithic internal models, on monolithic control, and on general pur-
pose processing. These and other errors primarily derive from naive models based
on subjective observation and introspection, and biases from common computa-
tional metaphors (mathematical logic, Von Neumann architectures, etc.)(Brooks
1991a, Brooks 1991b). A modern understanding of cognitive science and neuro-
science refutes these assumptions.

Humans have no full monolithic internal models. There is evidence that in
normal tasks humans tend to minimize their internal representation of the world.
Ballard, Hayhoe & Pelz (1995) have shown that in performing a complex task,
like building a copy of a display of blocks, humans do not build an internal model
of the entire visible scene. By changing the display while subjects were looking
away, Ballard found that subjects noticed only the most drastic of changes; rather
than keeping a complete model of the scene, they instead left that information in
the world and continued to refer back to the scene while performing the copying
task.
There is also evidence that there are multiple internal representations, which
are not mutually consistent. For example, in the phenomena of blindsight, cor-
tically blind patients can discriminate different visual stimuli, but report seeing
nothing (Weiskrantz 1986). This inconsistency would not be a feature of a single
central model of visual space.
These experiments and many others like them, e.g. Rensink, O’Regan & Clark
(1997) and Gazzaniga & LeDoux (1978), convincingly demonstrate that humans
do not construct a full, monolithic model of the environment. Instead humans
tend to only represent what is immediately relevant from the environment, and
those representations do not have full access to one another.

Humans have no monolithic control. Naive introspection and observation
can lead one to believe in a neurological equivalent of the central processing
unit – something that makes the decisions and controls the other functions of
the organism. While there are undoubtedly control structures, this model of
a single, unitary control system is not supported by evidence from cognitive
science.
One example comes from studies of split brain patients by Gazzaniga &
LeDoux (1978). As an experimental treatment for severe epilepsy in these pa-
tients, the corpus callosum (the main structure connecting the two hemispheres
of the brain) was surgically cut. The patients are surprisingly normal after the
operation, but with deficits that are revealed by presenting different information
to either side of the (now unconnected) brain. Since each hemisphere controls
one side of the body, the experimenters can probe the behavior of each hemi-
sphere independently (for example, by observing the subject picking up an object
appropriate to the scene that they had viewed). In one example, a snow scene
was presented to the right hemisphere and the leg of a chicken to the left. The
subject selected a chicken head to match the chicken leg, explaining with the
verbally dominant left hemisphere that “I saw the claw and picked the chicken”.
When the right hemisphere then picked a shovel to correctly match the snow,
the left hemisphere explained that you need a shovel to “clean out the chicken
shed” (Gazzaniga & LeDoux 1978, p.148). The separate halves of the subject
independently acted appropriately, but one side falsely explained the choice of
the other. This suggests that there are multiple independent control systems,
rather than a single monolithic one.

Humans are not general purpose. The brain is conventionally thought to
be a general purpose machine, acting with equal skill on any type of operation
that it performs by invoking a set of powerful rules. However, humans seem to
be proficient only in particular sets of skills, at the expense of other skills, often
in non-obvious ways. A good example of this is the Stroop effect (Stroop 1935).
When presented with a list of words written in a variety of colors, performance in
a color recognition and articulation task is dependent on the semantic content
of the words; the task is very difficult if names of colors are printed in non-
corresponding colors. This experiment demonstrates the specialized nature of
human computational processes and interactions.
Even in the areas of deductive logic, humans often perform extremely poorly
in different contexts. Wason (1966) found that subjects were unable to apply the
negative rule of if-then inference when four cards were labeled with single letters
and digits. However, with additional context—labeling the cards such that they
were understandable as names and ages—subjects could easily solve exactly the
same problem.
Further, humans often do not use subroutine-like rules for making decisions.
They are often more emotional than rational, and there is evidence that this
emotional content is an important aspect of decision making (Damasio 1994).

2.2 Essences of Human Intelligence

In an attempt to simplify the problem of building complex intelligent systems,
classical AI approaches tended to ignore or avoid many aspects of human in-
telligence (Minsky & Papert 1970). We believe that many of these discarded
elements are essential to human intelligence. Our methodology exploits four
central aspects of human intelligence: development, social interaction, physical
interaction and integration. Development forms the framework by which humans
successfully acquire increasingly more complex skills and competencies. Social
interaction allows humans to exploit other humans for assistance, teaching, and
knowledge. Embodiment and physical coupling allow humans to use the world
itself as a tool for organizing and manipulating knowledge. Integration allows
humans to maximize the efficacy and accuracy of complementary sensory and
motor systems. We believe that not only are these four themes critical to the
understanding of human intelligence but also they actually simplify the problem
of creating human-like intelligence.

Development: Humans are not born with complete reasoning systems, com-
plete motor systems, or even complete sensory systems. Instead, they undergo
a process of development where they perform incrementally more difficult tasks
in more complex environments en route to the adult state. Building systems de-
velopmentally facilitates learning both by providing a structured decomposition
of skills and by gradually increasing the complexity of the task to match the
competency of the system.
Development is an incremental process. Behaviors and learned skills that
have already been mastered prepare and enable the acquisition of more advanced
behaviors by providing subskills and knowledge that can be re-used, by placing
simplifying constraints on the acquisition, and by minimizing new information
that must be acquired. For example, Diamond (1990) shows that infants between
five and twelve months of age progress through a number of distinct phases
in the development of visually guided reaching. In this progression, infants in
later phases consistently demonstrate more sophisticated reaching strategies to
retrieve a toy in more challenging scenarios. As the infant’s reaching competency
develops, later stages incrementally improve upon the competency afforded by
the previous stages. Within our group, Marjanović, Scassellati & Williamson
(1996) applied a similar bootstrapping technique to enable the robot to learn
to point to a visual target. Scassellati (1996) has discussed how a humanoid
robot might acquire basic social competencies through this sort of developmental
methodology. Other examples of developmental learning that we have explored
can be found in (Ferrell 1996, Scassellati 1998b).
By gradually increasing the complexity of the required task, a developmen-
tal process optimizes learning. For example, infants are born with low acuity
vision which simplifies the visual input they must process. The infant’s visual
performance develops in step with their ability to process the influx of stimula-
tion (Johnson 1993). The same is true for the motor system. Newborn infants
do not have independent control over each degree of freedom of their limbs, but
through a gradual increase in the granularity of their motor control they learn
to coordinate the full complexity of their bodies. A process in which the acuity
of both sensory and motor systems are gradually increased significantly reduces
the difficulty of the learning problem (Thelen & Smith 1994). The caregiver also
acts to gradually increase the task complexity by structuring and controlling
the complexity of the environment. By exploiting a gradual increase in complex-
ity both internal and external, while reusing structures and information gained
from previously learned behaviors, we hope to be able to learn increasingly so-
phisticated behaviors. We believe that these methods will allow us to construct
systems which scale autonomously (Ferrell & Kemp 1996, Scassellati 1998b).
Social Interaction: Human infants are extremely dependent on their care-
givers, relying upon them not only for basic necessities but also as a guide to
their development. This reliance on social contact is so integrated into our species
that it is hard to imagine a completely asocial human; developmental disorders
that affect social development, such as autism and Asperger’s syndrome, are
extremely debilitating and can have far-reaching consequences (Cohen & Volk-
mar 1997). Building social skills into an artificial intelligence provides not only
a natural means of human-machine interaction but also a mechanism for boot-
strapping more complex behavior. Our research program has investigated social
interaction both as a means for bootstrapping and as an instance of develop-
mental progression.
Social interaction can be a means to facilitate learning. New skills may be so-
cially transferred from caregiver to infant through mimicry or imitation, through
direct tutelage, or by means of scaffolding, in which a more able adult manip-
ulates the infant’s interactions with the environment to foster novel abilities.
Commonly, scaffolding involves reducing distractions, marking the task’s critical
attributes, reducing the number of degrees of freedom in the target task, and
enabling the infant to experience the end or outcome before she is cognitively
or physically capable of seeking and attaining it for herself (Wood, Bruner & Ross
1976). We are currently engaged in work studying bootstrapping new behav-
iors from social interactions (Breazeal & Scassellati 1998, Breazeal & Velasquez
1998).
The social skills required to make use of scaffolding are complex. Infants
acquire these social skills through a developmental progression (Hobson 1993).
One of the earliest precursors is the ability to share attention with the caregiver.
This ability can take many forms, from the recognition of a pointing gesture to
maintaining eye contact (see chapter in this volume by Scassellati). In our work,
we have also examined social interaction from this developmental perspective,
building systems that can recognize and respond to joint attention by finding
faces and eyes (Scassellati 1998c) and imitating head nods of the caregiver (Scas-
sellati 1998d).

Embodiment and Physical Coupling: Perhaps the most obvious, and most
overlooked, aspect of human intelligence is that it is embodied. A principal
tenet of our methodology is to build and test real robotic systems. We believe
that building human-like intelligence requires human-like interaction with the
world (Brooks & Stein 1994). Humanoid form is important both to allow hu-
mans to interact socially with the robot in a natural way and to provide similar
task constraints.
The direct physical coupling between action and perception reduces the need
for an intermediary representation. For an embodied system, internal repre-
sentations can be ultimately grounded in sensory-motor interactions with the
world (Lakoff 1987). Our systems are physically coupled with the world and op-
erate directly in that world without any explicit representations of it (Brooks
1986, Brooks 1991b). There are representations, or accumulations of state, but
these only refer to the internal workings of the system; they are meaningless
without interaction with the outside world. The embedding of the system within
the world enables the internal accumulations of state to provide useful behavior.1
In addition we believe that building a real system is computationally less
complex than simulating such a system. The effects of gravity, friction, and
natural human interaction are obtained for free, without any computation. Em-
bodied systems can also perform some complex tasks in relatively simple ways
by exploiting the properties of the complete system. For example, when putting
a jug of milk in the refrigerator, you can exploit the pendulum action of your
arm to move the milk (Greene 1982). The swing of the jug does not need to be
explicitly planned or controlled, since it is the natural behavior of the system.
Instead of having to plan the whole motion, the system only has to modulate,
guide and correct the natural dynamics. We have implemented one such scheme
using self-adaptive oscillators to drive the joints of the robot’s arm (Williamson
1998a, Williamson 1998b).

Integration: Humans have the capability to receive an enormous amount of in-
formation from the world. Visual, auditory, somatosensory, and olfactory cues are
all processed simultaneously to provide us with our view of the world. However,
there is evidence that the sensory modalities are not independent; stimuli from
one modality can and do influence the perception of stimuli in another modality.
For example, Churchland, Ramachandran & Sejnowski (1994) demonstrated
how audition can cause illusory visual motion. Vision can cause au-
ditory illusions too, such as the McGurk effect (Cohen & Massaro 1990). These
studies demonstrate that sensory modalities cannot be treated independently.
Sensory integration can simplify the computation necessary for a given task.
Attempting to perform the task using only one modality is sometimes awkward
and computationally intensive. Utilizing the complementary nature of separate
modalities can result in a reduction of overall computation. We have imple-
mented several mechanisms on Cog that use multimodal integration to aid in
increasing performance or developing competencies. For example, Peskin & Scas-
sellati (1997) implemented a system that stabilized images from a moving camera
using vestibular feedback.
By integrating different sensory modalities we can exploit the multimodal
nature of stimuli to facilitate learning. For example, objects that make noise often
move. This correlation can be exploited to facilitate perception. Wertheimer
(1961) has shown that vision and audition interact from birth; even ten-minute-
old infants will turn their eyes toward an auditory cue. This interaction between
the senses continues to develop; visual stimuli greatly affect the development of
sound localization (Knudsen & Knudsen 1985). In our work, Irie (1997) built an
auditory system that utilizes visual information to train auditory localization.
This work highlights not only the development of sensory integration, but also
the simplification of computational requirements that can be obtained through
integration.

1 This was the fundamental approach taken by Ashby (1960) contemporaneously with
the development of early AI.

Fig. 1. Cog, an upper-torso humanoid robot. Cog has twenty-one degrees of freedom
to approximate human movement, and a variety of sensory systems that approximate
human senses, including visual, vestibular, auditory, and tactile senses.

3 Hardware
In pursuing the methodology outlined in the previous section, we have con-
structed an upper-torso humanoid robot called Cog (see Figure 1). This section
describes the computational, perceptual, and motor systems that have been im-
plemented on Cog as well as the development platforms that have been con-
structed to test additional hardware and software components.

3.1 Computational System


The computational control for Cog is a heterogeneous network of many differ-
ent processor types operating at different levels in the control hierarchy, rang-
ing from small microcontrollers for joint-level control to digital signal processor
(DSP) networks for audio and visual preprocessing.
Cog’s “brain” has undergone a series of revisions. The original was a network
of 16 MHz Motorola 68332 microcontrollers on custom-built boards, connected
through dual-port RAM. Each of these nodes ran L, a multithreading subset of
Common Lisp. The current core is a network of 200 MHz industrial PC com-
puters running the QNX real-time operating system and connected by 100VG
ethernet. The network currently contains 4 nodes, but can be expanded at will
by plugging new nodes into the network hub. QNX provides transparent and
fault-tolerant interprocess communication over the network. The PC backplanes
provide ample room for installing commercial or custom I/O boards and con-
troller cards. The “old” and “new” brains can inter-operate, communicating via
a custom-built shared memory ISA interface card.
Video and audio preprocessing is performed by a separate network of Texas
Instruments C40 digital signal processors which communicate via the proprietary
C40 communications port interface. The network includes C40-based framegrab-
bers, display boards, and audio I/O ports. The processors relay data to the core
processor network via ISA and PCI interface cards.
Each joint on the robot has a dedicated local motor controller, a custom-
built board with a Motorola HC11 microcontroller, which processes encoder and
analog inputs, performs servo calculations, and drives the motor via pulse-width
modulation. For the arms, the microcontroller generates a virtual spring behavior
at 1kHz, based on torque feedback from strain gauges in the joints.

3.2 Perceptual Systems


To obtain information about the environment, Cog has a variety of sensory
systems including visual, vestibular, auditory, tactile, and kinesthetic senses.

Visual System: Cog’s visual system is designed to mimic some of the capa-
bilities of the human visual system, including binocularity and space-variant
sensing (Scassellati 1998a). Each eye can rotate about an independent vertical
axis (pan) and a coupled horizontal axis (tilt). To allow for both a wide field
of view and high resolution vision, there are two grayscale cameras per eye, one
which captures a wide-angle view of the periphery (88.6◦ (V ) × 115.8◦ (H) field
of view) and one which captures a narrow-angle view of the central (foveal) area
(18.4◦ (V ) × 24.4◦(H) field of view with the same resolution). Each camera pro-
duces an NTSC signal that is digitized by a frame grabber connected to the
digital signal processor network.

Vestibular System: The human vestibular system plays a critical role in the
coordination of motor responses, eye movement, posture, and balance. The hu-
man vestibular sensory organ consists of the three semi-circular canals, which
measure the acceleration of head rotation, and the two otolith organs, which
measure linear movements of the head and the orientation of the head relative
to gravity. To mimic the human vestibular system, Cog has three rate gyroscopes
mounted on orthogonal axes (corresponding to the semi-circular canals) and two
linear accelerometers (corresponding to the otolith organs). Each of these devices
is mounted in the head of the robot, slightly below eye level. Analog signals from
each of these sensors are amplified on-board the robot, and processed off-board
by a commercial A/D converter attached to one of the PC brain nodes.

Auditory System: To provide auditory information, two omni-directional mi-
crophones were mounted on the head of the robot. To facilitate localization,
crude pinnae were constructed around the microphones. Analog auditory signals
are processed by a commercial A/D board that interfaces to the digital signal
processor network.

Tactile System: We have begun experimenting with providing tactile feedback
from the robot using resistive force sensors. Each sensor provides a measurement
of the force applied to its sensing surface. As an initial experiment, we have
mounted a 6 × 4 array of these sensors on the front of the robot’s torso. The
signals from these sensors are multiplexed through a single 6811 microcontroller,
thus giving measurements of both force and position. A similar system has been
used to mount tactile sensors on some of the hands that we have used with the
robot.
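
As a concrete illustration of the kind of information this array yields, the short sketch below computes the total contact force and an estimated centre of pressure from a 6 × 4 grid of readings. It is an illustrative Python sketch only, with assumed units and scaling, not the microcontroller code running on the robot.

    import numpy as np

    def contact_summary(readings):
        """readings: 6 x 4 array of calibrated force values (arbitrary units)."""
        readings = np.asarray(readings, dtype=float)
        total = readings.sum()
        if total <= 0.0:
            return 0.0, None                    # no contact detected
        rows, cols = np.indices(readings.shape)
        centre = (float((rows * readings).sum() / total),
                  float((cols * readings).sum() / total))
        return total, centre                    # overall force and centre of pressure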

Kinesthetic System: Feedback concerning the state of Cog’s motor system is
provided by a variety of sensors located at each joint. The eye axes utilize only
the simplest form of feedback; each actuator has a single digital encoder which
gives position information. The neck and torso joints have encoders, as well as
motor current sensing (for crude torque feedback), temperature sensors on the
motors and driver chips, and limit switches at the extremes of joint movement.
The arm joints have the most involved kinesthetic sensing. In addition to all the
previous sensors, each of the 12 arm joints also has strain gauges for accurate
torque sensing, and potentiometers for absolute position feedback.

3.3 Motor Systems

Cog has a total of twenty-one mechanical degrees of freedom (DOF): two six-
DOF arms, a torso with a two-DOF waist, a one-DOF torso twist, a three-DOF
neck, and three DOF in the eyes.

Arms: Each arm is loosely based on the dimensions of a human arm with 6
degrees-of-freedom, each powered by a DC electric motor through a series spring
(a series elastic actuator, see (Pratt & Williamson 1995)). The spring provides
accurate torque feedback at each joint, and protects the motor gearbox from
shock loads. A low gain position control loop is implemented so that each joint
acts as if it were a virtual spring with variable stiffness, damping and equilibrium
position. These spring parameters can be changed, both to move the arm and
to alter its dynamic behavior. Motion of the arm is achieved by changing the
equilibrium positions of the joints, not by commanding the joint angles directly.
There is considerable biological evidence for this spring-like property of arms
(Zajac 1989, Cannon & Zahalak 1982, MacKay, Crammond, Kwan & Murphy
1986).
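
A minimal sketch of this style of control is given below, in Python, for illustration: each joint is treated as a virtual spring whose torque depends on the distance from a commanded equilibrium angle. The gains and the interface shown are assumptions made for exposition; the actual controllers run on the joint-level microcontrollers described in Section 3.1.

    def virtual_spring_torque(theta, theta_dot, theta_eq, stiffness, damping):
        """Joint torque for a virtual spring with variable stiffness, damping,
        and equilibrium position; motion is produced by moving theta_eq."""
        return stiffness * (theta_eq - theta) - damping * theta_dot

Moving the arm then amounts to changing theta_eq (and, if desired, the stiffness and damping) rather than commanding joint angles directly.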
The spring-like property gives the arm a sensible “natural” behavior: if it is
disturbed, or hits an obstacle, the arm simply deflects out of the way. The dis-
turbance is absorbed by the compliant characteristics of the system, and needs
Fig. 2. Range of motion for the neck and torso. Not shown are the neck twist (180
degrees) and body twist (120 degrees)

no explicit sensing or computation. The system also has a low frequency char-
acteristic (large masses and soft springs) which allows for smooth arm motion
at a slower command rate. This allows more time for computation, and makes
possible the use of control systems with substantial delay (a condition akin to
biological systems). The spring-like behavior also guarantees a stable system if
the joint set-points are fed-forward to the arm.

Neck and Torso: Cog’s body has six degrees of freedom: the waist bends side-
to-side and front-to-back, the “spine” can twist, and the neck tilts side-to-side,
nods front-to-back, and twists left-to-right. Mechanical stops on the body and
neck give a human-like range of motion, as shown in Figure 2 (Not shown are
the neck twist (180 degrees) and body twist (120 degrees)).

3.4 Development Platforms

In addition to the humanoid robot, we have also built three development plat-
forms, similar in mechanical design to Cog’s head, with identical computational
systems; the same code can be run on all platforms. These development platforms
allow us to test and debug new behaviors before integrating them on Cog.

Vision Platform: The vision development platform (shown at the left of Figure
3) is a copy of Cog’s active vision system. The development platform has identical
degrees of freedom, similar design characteristics, and identical computational
environment. The development platform differs from Cog’s vision system in only
three ways. First, to explore issues of color vision and saliency, the development
platform has color cameras. Second, the mechanical design of the camera mounts
Fig. 3. Two of the vision development platforms used in this work. These desktop
systems match the design of the Cog head and are used as development platforms for
visual-motor routines. The system on the right has been modified to investigate how
expressive facial gestures can regulate social learning.

has been modified for the specifications of the color cameras. Third, because the
color cameras are significantly lighter than the grayscale cameras used on Cog,
we were able to use smaller motors for the development platform while obtaining
similar eye movement speeds. Additional details on the development platform
design can be found in Scassellati (1998a).

Vision and Emotive Response Platform: To explore ideas in social inter-
action between robots and humans, we have constructed a platform with capa-
bilities for emotive facial expressions (shown at the right of Figure 3). This sys-
tem, called Kismet, consists of the active stereo vision system (described above)
embellished with facial features for emotive expression. Currently, these facial
features include eyebrows (each with two degrees-of-freedom: lift and arch), ears
(each with two degrees-of-freedom: lift and rotate), eyelids (each with one degree
of freedom: open/close), and a mouth (with one degree of freedom: open/close).
The robot is able to show expressions analogous to anger, fatigue, fear, disgust,
excitement, happiness, interest, sadness, and surprise (shown in Figure 4) which
are easily interpreted by an untrained human observer.
A pair of Motorola 68332-based microcontrollers are also connected to the
robot. One controller implements the motor system for driving the robot’s facial
motors. The second controller implements the motivational system (emotions
and drives) and the behavior system. This node receives pre-processed perceptual
information from the DSP network through a dual-ported RAM, and converts
this information into a behavior-specific percept which is then fed into the rest
of the behavior engine.
Fig. 4. Static extremes of Kismet’s facial expressions. During operation, the 11 degrees-
of-freedom for the ears, eyebrows, mouth, and eyelids vary continuously with the cur-
rent emotional state of the robot.

Visual-Auditory Platform: A third development platform was constructed
to investigate the relationships between vision and audition. The development
platform has an auditory system similar to that used on Cog, with two micro-
phones and a set of simplified pinnae. As a simplified visual system, a single
color camera was mounted at the midline of the head.

4 Current Long-Term Projects


This section describes a few of the long-term research issues that our group is
currently addressing. Although each project is still in progress, initial results
from each of these areas will be presented in Section 5.

4.1 Joint Attention and Theory of Mind


One critical milestone in a child’s development is the recognition of others as
agents that have beliefs, desires, and perceptions that are independent of the
child’s own beliefs, desires, and perceptions. The ability to recognize what an-
other person can see, the ability to know that another person maintains a false
belief, and the ability to recognize that another person likes games that differ
from those that the child enjoys are all part of this developmental chain. Further,
the ability to recognize oneself in the mirror, the ability to ground words in per-
ceptual experiences, and the skills involved in creative and imaginative play may
also be related to this developmental advance. These abilities are also central to
what defines human interactions. Normal social interactions depend upon the
recognition of other points of view, the understanding of other mental states,
and the recognition of complex non-verbal signals of attention and emotional
state.
If we are to build a system that can recognize and produce these complex
social behaviors, we must find a skill decomposition that maintains the com-
plexity and richness of the behaviors represented while still remaining simple
to implement and construct. Evidence from the development of these “theory
of mind” skills in normal children, as well as the abnormal development seen
in pervasive developmental disorders such as Asperger’s syndrome and autism,
demonstrates that a critical precursor is the ability to engage in joint attention
(Baron-Cohen 1995, Frith 1990). Joint attention refers to those preverbal social
behaviors that allow the infant to share with another person the experience of a
third object (Wood et al. 1976). For example, the child might laugh and point
to a toy, alternating between looking at the caregiver and the toy.
From a robotics standpoint, even the simplest of joint attention behaviors
require the coordination of a large number of perceptual, sensory-motor, atten-
tional, and cognitive processes. Our current research is the implementation of
one possible skill decomposition that has received support from developmen-
tal psychology, neuroscience, and abnormal psychology, and is consistent with
evidence from evolutionary studies of the development of joint attention behav-
iors. This decomposition is described in detail in the chapter by Scassellati, and
requires many capabilities from our robotic system including basic eye motor
skills, face and eye detection, determination of eye direction, gesture recogni-
tion, attentional systems that allow for social behavior selection at appropriate
moments, emotive responses, arm motor control, image stabilization, and many
others.
A robotic system that can recognize and engage in joint attention behav-
iors will allow for social interactions between the robot and humans that have
previously not been possible. The robot would be capable of learning from an
observer using normal social signals in the same way that human infants learn;
no specialized training of the observer would be necessary. The robot would also
be capable of expressing its internal state (emotions, desires, goals, etc.) through
social interactions without relying upon an artificial vocabulary. Further, a robot
that can recognize the goals and desires of others will allow for systems that can
more accurately react to the emotional, attentional, and cognitive states of the
observer, can learn to anticipate the reactions of the observer, and can modify its
own behavior accordingly. The construction of these systems may also provide a
new tool for investigating the predictive power and validity of the models from
natural systems that serve as the basis. An implemented model can be tested
in ways that are not possible to test on humans, using alternate developmen-
tal conditions, alternate experiences, and alternate educational and intervention
approaches.

4.2 Social Interaction between an Infant and a Caretaker


Other ongoing work focuses on altricial learning in a social context (Breazeal (Fer-
rell) 1998, Breazeal & Scassellati 1998, Breazeal & Velasquez 1998). By treating
the robot as an altricial system whose learning is assisted and guided by the
human caretaker, this approach exploits the environment and social interactions
that are critical to infant development.
An infant’s motivations (emotions, drives, and pain) play an important role
in generating meaningful interactions with the caretaker (Bullowa 1979). The
infant’s emotional responses provide important cues which the caretaker uses
to assess how to satiate the infant’s drives, and how to carefully regulate the
complexity of the interaction. The former is critical for the infant to learn how
its actions influence the caretaker, and the latter is critical for establishing and
maintaining a suitable learning environment for the infant. Similarly, the care-
taker’s emotive responses to the infant shape the continuing interaction and can
guide the learning process.
An infant’s motivations are vital to regulating social interactions with his
mother (Kaye 1979). Soon after birth, an infant is able to display a wide variety
of facial expressions (Trevarthen 1979). As such, he responds to events in the
world with expressive cues that his mother can read, interpret, and act upon.
She interprets them as indicators of his internal state (how he feels and why),
and modifies her actions to promote his well being (Tronick, Als & Adamson
1979, Chappell & Sander 1979). For example, when he appears content she tends
to maintain the current level of interaction, but when he appears disinterested
she intensifies or changes the interaction to try to re-engage him. In this manner,
the infant can regulate the intensity of interaction with his mother by displaying
appropriate emotive and expressive cues.
An important function for a robot’s motivational system is not only to es-
tablish appropriate interactions with the caretaker, but also to regulate their in-
tensity so that the robot is neither overwhelmed nor under stimulated by them.
When designed properly, the intensity of the robot’s expressions provide appro-
priate cues for the caretaker to increase the intensity of the interaction, tone it
down, or maintain it at the current level. By doing so, both parties can modify
their own behavior and the behavior of the other to maintain the intensity of
interaction that the robot requires.
The use of emotional expressions and gestures facilitates and biases learning
during social exchanges. Parents take an active role in shaping and guiding how
and what infants learn by means of scaffolding. As the word implies, the parent
provides a supportive framework for the infant by manipulating the infant’s
interactions with the environment to foster novel abilities. The emotive cues the
parent receives during social exchanges serve as feedback so the parent can adjust
the nature and intensity of the structured learning episode to maintain a suitable
learning environment where the infant is neither bored nor overwhelmed.
In addition, an infant’s motivations and emotional displays are critical in
establishing the context for learning shared meanings of communicative acts
(Halliday 1975). An infant displays a wide assortment of emotive cues such as
coos, smiles, waves, and kicks. At such an early age, the mother imparts a con-
sistent meaning to her infant’s expressive gestures and expressions, interpreting
them as meaningful responses to her mothering and as indications of his inter-
nal state. Curiously, experiments by Kaye (1979) argue that the mother actually
supplies most if not all the meaning to the exchange when the infant is so young.
The infant does not know the significance his expressive acts have for his mother,
nor how to use them to evoke specific responses from her. However, because the
mother assumes her infant shares the same meanings for emotive acts, her con-
sistency allows the infant to discover what sorts of activities on his part will get
specific responses from her. Routine sequences of a predictable nature can be
built up which serve as the basis of learning episodes (Newson 1979).
Combining these ideas one can design a robot that is biased to learn how
its emotive acts influence the caretaker in order to satisfy its own drives. To-
ward this end, we endow the robot with a motivational system that works to
maintain its drives within homeostatic bounds and motivates the robot to learn
behaviors that satiate them. For our purposes, we further provide the robot with
a set of emotive expressions that are easily interpreted by a naive observer as
analogues of the types of emotive expressions that human infants display. This
allows the caretaker to observe the robot’s emotive expressions and interpret
them as communicative acts. This establishes the requisite routine interactions
for the robot to learn how its emotive acts influence the behavior of the care-
taker, which ultimately serves to satiate the robot’s own drives. By doing so,
both parties can modify both their own behavior and the behavior of the other
in order to maintain an interaction that the robot can learn from and use to
satisfy its drives.

4.3 Dynamic Human-like Arm Motion


Another research goal is to build a system that can move with the speed, pre-
cision, dexterity, and grace of a human to physically interact with the world in
human-like ways. Our current research focuses on control methods that exploit
the natural dynamics of the robot to obtain flexible and robust motion without
complex computation.
Control methods that exploit physical dynamics are not common in robotics.
Traditional methods are often kinematically based, requiring accurate calibra-
tion of the robot’s dimensions and mechanical properties. However, even for
systems that utilize only a few degrees of freedom, kinematic solutions can be
computationally expensive. For this reason, researchers have adopted a number
of strategies to simplify the control problems by reducing the effects of sys-
tem dynamics including careful calibration and intensive modeling (An, Atke-
son & Hollerbach 1988), using lightweight robots with little dynamics (Salisbury,
Townsend, Eberman & DiPietro 1988), or simply by moving slowly. Research em-
phasizing dynamic manipulation either exploits clever mechanical mechanisms
which simplify control schemes (Schaal & Atkeson 1993, McGeer 1990) or results
in computationally complex methods (Mason & Salisbury 1985).
Humans, however, exploit the mechanical characteristics of their bodies. For
example, when humans swing their arms they choose comfortable frequencies
which are close to the natural resonant frequencies of their limbs (Herr 1993,
Hatsopoulos & Warren 1996). Similarly, when placed in a jumper, infants bounce
at the natural frequency (Warren & Karrer 1984). Humans also exploit the active
dynamics of their arm when throwing a ball (Rosenbaum et al. 1993) and the
passive dynamics of their arm to allow stable interaction with objects (Mussa-
Ivaldi, Hogan & Bizzi 1985). When learning new motions, both infants and
adults quickly utilize the physical dynamics of their limbs (Thelen & Smith
1994, Schneider, Zernicke, Schmidt & Hart 1989).
On our robot, we have exploited the dynamics of the arms to perform a
variety of tasks. The compliance of the arm allows both stable motion and safe
interaction with objects. Local controllers at each joint are physically coupled
through the mechanics of the arm, allowing these controllers to interact and
produce coordinated motion such as swinging a pendulum, turning a crank, and
playing with a slinky. Our initial experiments suggest that these solutions are
very robust to perturbations, do not require accurate calibration or parameter
tuning, and are computationally simple (Williamson 1998a, Williamson 1998b).

4.4 Multi-modal Coordination


Our group has developed many behaviors and skills for Cog, each involving
one or two sensory and/or motor systems – e.g. face finding, crank turning,
auditory localization. However, to be truly effective as an embodied robot, Cog
requires a general mechanism for overall sensory-motor coordination, a facility
for effectively combining skills or at least preventing them from interfering with
each other.
A multi-modal coordination system will manifest itself in three different ways.
First, for interactions between sensory systems, such a facility would provide
a basis for the combination of several sensory inputs into a more robust and
reliable view of the world. Second, interactions between motor systems produce
synergisms — coactivation of motor systems not directly involved with a task
but which prepare the robot for more effective execution overall. Third, for
interactions between sensory and motor systems, this system would provide a
method for “sensory tuning,” in which adjusting physical properties of the robot
can optimize the performance of a sensory system (foveation is a very basic
example).
The foundation for such a general coordination mechanism rests on two mod-
ules: a system that incorporates intrinsic performance measures into sensorimo-
tor processes, and a system for extracting correlations between sensorimotor
events. Combined, these provide sufficient information for Cog to learn how its
internal systems interact with each other. Unfortunately, finding this information
is by no means trivial.
Performance measures are the most straightforward. For sensory processes,
the performance is estimated by a confidence measure, probably based on a com-
bination of repeatability, error estimates, etc. Motor performance measurements
would be based upon criteria such as power expenditure, fatigue measures, safety
limits, and actuator accuracy.
Extracting correlations between sensorimotor events is more complex. The
first step is segmentation, that is, determining what constitutes an “event” within
a stream of proprioceptive data and/or motor commands. Segmentation algo-
rithms and filters can be hard-coded (but only for the most rudimentary enu-
meration of sensing and actuating processes) or created adaptively. Adaptive
segmentation creates and tunes filters based on how well they contribute to
the correlation models. Segmentation is crucial because it reduces the amount
of redundant information produced by confluent data streams. Any correlation
routine must deal with both the combinatorial problem of looking for patterns
between many different data sources and the problem of finding correlations
between events with time delays.
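
One small ingredient of such a routine is sketched below: given two segmented event streams of equal length (for instance, a motor command channel and a proprioceptive channel), it searches for the time lag at which their normalized correlation is strongest. This is an illustrative Python sketch under assumed stream representations, not the system described in the text.

    import numpy as np

    def best_lag(a, b, max_lag):
        """Return (lag, correlation) maximizing the normalized correlation of
        stream a against stream b shifted by lag samples."""
        a = np.asarray(a, float) - np.mean(a)
        b = np.asarray(b, float) - np.mean(b)
        best = (0, -np.inf)
        for lag in range(-max_lag, max_lag + 1):
            if lag >= 0:
                x, y = a[lag:], b[:len(b) - lag]
            else:
                x, y = a[:len(a) + lag], b[-lag:]
            denom = np.linalg.norm(x) * np.linalg.norm(y)
            if denom > 0:
                c = float(np.dot(x, y) / denom)
                if c > best[1]:
                    best = (lag, c)
        return best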
A general system for multimodal coordination is too complex to implement
all at once. We plan to start on a small scale, coordinating between two and
five systems. The first goal is a mechanism for posture — to coordinate, fixate,
and properly stiffen or relax torso, neck, and limbs for a variety of reaching and
looking tasks. Posture is not merely a reflexive control; it has feed-forward com-
ponents which require knowledge of impending tasks so that the robot can ready
itself. Because a postural system is so reactive and pervasive, it requires a significant
amount of multi-modal integration.

5 Current Tasks

In pursuing the long-term projects outlined in the previous section, we have im-
plemented many simple behaviors on our humanoid robot. This section briefly
describes the tasks and behaviors that the robot is currently capable of perform-
ing. For brevity, many of the technical details and references to similar work
have been excluded here, but are available from the original citations. In ad-
dition, video clips of Cog performing many of these tasks are available from
http://www.ai.mit.edu/projects/cog/.

5.1 Visual-motor Routines

Human eye movements can be classified into five categories: three voluntary
movements (saccades, smooth pursuit, and vergence) and two involuntary move-
ments (the vestibulo-ocular reflex and the opto-kinetic response)(Goldberg, Eg-
gers & Gouras 1992). We have implemented mechanical analogues of each of
these eye motions.
Saccades: Saccades are high-speed ballistic motions that focus a salient object
on the high-resolution central area of the visual field (the fovea). In humans,
saccades are extremely rapid, often up to 900◦ per second. To enable our machine
vision systems to saccade to a target, we require a saccade function S : (x, e) →
∆e which produces a change in eye motor position (∆e) given the current eye
motor position (e) and the stimulus location in the image plane (x). To obtain
accurate saccades without requiring an accurate model of the kinematics and
optics, an unsupervised learning algorithm estimates the saccade function. This
implementation can adapt to the non-linear optical and mechanical properties
of the vision system. Marjanović et al. (1996) learned a saccade function for
this hardware platform using a 17 × 17 interpolated lookup table. The map was
initialized with a linear set of values obtained from self-calibration. For each
learning trial, a visual target was randomly selected. The robot attempted to
saccade to that location using the current map estimates. The target was located
in the post-saccade image using correlation, and the L2 offset of the target was
used as an error signal to train the map. The system learned to center pixel
patches in the peripheral field of view. The system converged to an average of
< 1 pixel of error in a 128 × 128 image per saccade after 2000 trials (1.5 hours).
With a trained saccade function S, the system can saccade to any salient stimulus
in the image plane. We have used this mapping for saccading to moving targets,
bright colors, and salient matches to static image templates.
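
A simplified sketch of this kind of learned saccade map is shown below (Python, illustrative only). It uses a nearest-cell lookup table rather than the interpolated 17 × 17 map of the actual system, ignores the dependence on the current eye position, and its initialization, gain, and learning rate are assumptions; the error-driven update, however, mirrors the training procedure described above.

    import numpy as np

    class SaccadeMap:
        def __init__(self, grid=17, image_size=128, learning_rate=0.1, pixel_gain=0.05):
            self.grid, self.image_size = grid, image_size
            self.lr, self.gain = learning_rate, pixel_gain
            # table[i, j] holds a (pan, tilt) motor change for targets in cell (i, j),
            # initialized from a rough linear calibration.
            centres = (np.arange(grid) + 0.5) / grid * image_size - image_size / 2
            self.table = np.stack(np.meshgrid(centres, centres, indexing="ij"), -1) * pixel_gain

        def _cell(self, x, y):
            i = int(np.clip(x * self.grid // self.image_size, 0, self.grid - 1))
            j = int(np.clip(y * self.grid // self.image_size, 0, self.grid - 1))
            return i, j

        def command(self, x, y):
            """Eye motor change for a salient target at image position (x, y)."""
            return self.table[self._cell(x, y)]

        def update(self, x, y, pixel_error):
            """pixel_error: offset of the target from the image centre after the saccade."""
            i, j = self._cell(x, y)
            self.table[i, j] += self.lr * self.gain * np.asarray(pixel_error, float)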

Smooth-Pursuit Tracking: Smooth pursuit movements maintain the image
of a moving object on the fovea at speeds below 100◦ per second. Our current
implementation of smooth pursuit tracking acquires a visual target and attempts
to maintain the foveation of that target. The central 7 × 7 patch of the initial
64 × 64 image is installed as the target image. In this instance, we use a very
small image to reduce the computational load necessary to track non-artifact
features of an object. For each successive image, the central 44 × 44 patch is
correlated with the 7 × 7 target image. The best correlation value gives the
location of the target within the new image, and the distance from the center
of the visual field to that location gives the motion vector. The length of the
motion vector is the pixel error. The motion vector is scaled by a constant (based
on the time between iterations) and used as a velocity command to the motors.
This system operates at 20 Hz and can successfully track moving objects whose
image projection changes slowly.
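
The following Python sketch illustrates the structure of this tracker under the patch and search-window sizes quoted above; the correlation measure and the velocity gain are assumptions chosen for clarity rather than the robot's actual implementation.

    import numpy as np

    PATCH, SEARCH, GAIN = 7, 44, 0.5     # patch size, search window size, velocity gain

    def crop_centre(image, size):
        r = (image.shape[0] - size) // 2
        c = (image.shape[1] - size) // 2
        return image[r:r + size, c:c + size]

    def track(target, frame):
        """Return a velocity command from the offset of the best match to the
        target patch within the central search window of the new frame."""
        search = crop_centre(frame, SEARCH)
        best_score, best_pos = -np.inf, (0, 0)
        for i in range(SEARCH - PATCH + 1):
            for j in range(SEARCH - PATCH + 1):
                window = search[i:i + PATCH, j:j + PATCH]
                score = np.sum((window - window.mean()) * (target - target.mean()))
                if score > best_score:
                    best_score, best_pos = score, (i, j)
        centre = (SEARCH - PATCH) / 2.0
        error = np.array([best_pos[0] - centre, best_pos[1] - centre])
        return GAIN * error                 # scaled motion vector used as a velocity command

    # target = crop_centre(first_frame, PATCH)   # installed once when tracking begins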

Binocular Vergence: Vergence movements adjust the eyes for viewing ob-
jects at varying depth. While the recovery of absolute depth may not be strictly
necessary, relative disparity between objects are critical for tasks such as accu-
rate hand-eye coordination, figure-ground discrimination, and collision detection.
Yamato (1998) built a system that performs binocular vergence and integrates
the saccadic and smooth-pursuit systems described previously. Building on mod-
els of the development of binocularity in infants, Yamato used local correlations
to identify matching targets in a foveal region in both eyes, moving the eyes to
match the pixel locations of the targets in each eye. The system was also capable
of smoothly responding to changes of targets after saccadic motions, and during
smooth pursuit.

Vestibular-ocular and Opto-kinetic Reflexes: The vestibulo-ocular re-
flex and the opto-kinetic nystagmus cooperate to stabilize the eyes when the
head moves. The vestibulo-ocular reflex (VOR) stabilizes the eyes during rapid
head motions. Acceleration measurements from the semi-circular canals and the
otolith organs in the inner ear are integrated to provide a measurement of head
velocity, which is used to counter-rotate the eyes and maintain the direction of
gaze. The opto-kinetic nystagmus (OKN) compensates for slow, smooth motions
by measuring the optic flow of the background on the retina (also known as the
visual slip). OKN operates at much lower velocities than VOR (Goldberg et al.
1992). Many researchers have built accurate computational models and simula-
tions of the interplay between these two stabilization mechanisms (Lisberger &
Sejnowski 1992, Panerai & Sandini 1998). To mimic the human vestibular sys-
tem, Cog has three rate gyroscopes mounted on orthogonal axes (corresponding
to the semi-circular canals) and two linear accelerometers (corresponding to the
otolith organs).
A simple OKN can be constructed using a rough approximation of the optic
flow on the background image. Because OKN needs only to function at relatively
slow speeds (5 Hz is sufficient), and because OKN only requires a measurement
of optic flow of the entire field, our computational load is manageable. The
optic flow routine calculates the full-field background motion between successive
frames, giving a single estimate of camera motion. The optic flow estimate is a
displacement vector for the entire scene. Using the saccade map that we have
learned previously, we can obtain an estimate of the amount of eye motion we
require to compensate for the visual displacement.
A simple VOR can be constructed by integrating the velocity signal from
the rate gyroscopes, scaling that signal, and using it to drive the eye motors.
This technique works well for transient and rapid head motions, but fails for two
reasons. First, because the gyroscope signal must be integrated, the system tends
to accumulate drift. Second, the scaling constant must be selected empirically.
Both of these deficits can be eliminated by combining VOR with OKN.
Combining VOR with OKN provides a more stable, robust system (Peskin
& Scassellati 1997). The OKN system can be used to train the VOR scale con-
stant. The training routine moves the neck at a constant velocity with the VOR
enabled. While the neck is in motion, the OKN monitors the optical slip. If the
VOR constant is accurate for short neck motions, then the optical slip should
be zero. If the optical slip is non-zero, the VOR constant can be modified in the
appropriate direction. This on-line technique can adapt the VOR constant to an
appropriate value whenever the robot moves the neck at constant velocity over
short distances. The combination of VOR and OKN can also eliminate gradual
drift. The OKN will correct not only for slow head motions but also for slow
Fig. 5. Orientation to a salient stimulus. Once a salient stimulus (a moving hand) has
been detected, the robot first saccades to that target and then orients the head and
neck to that target.

drift from the VOR. We are currently working on implementing models of VOR
and OKN coordination to allow both systems to operate simultaneously.
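
A sketch of this gain adaptation is given below, in Python and under an assumed sign convention (positive optical slip meaning that the eyes lagged the head); the learning rate is illustrative.

    def vor_command(gyro_velocity, vor_gain):
        """Eye velocity that counter-rotates against the measured head velocity."""
        return -vor_gain * gyro_velocity

    def adapt_vor_gain(vor_gain, optical_slip, learning_rate=0.01):
        """One OKN-supervised update, applied while the neck moves at constant
        velocity: residual slip indicates that the VOR gain is too small (or too
        large), so the gain is nudged in the direction that reduces the slip."""
        return vor_gain + learning_rate * optical_slip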

5.2 Eye/Neck Orientation

Orienting the head and neck along the angle of gaze can maximize the range of
the next eye motion while giving the robot a more life-like appearance. Once the
eyes have foveated a salient stimulus, the neck should move to point the head in
the direction of the stimulus while the eyes counter-rotate to maintain fixation
on the target (see Figure 5). To move the neck the appropriate distance, we must
construct a mapping N : (n, e) → ∆n which produces a change in neck motor
positions (∆n) given the current neck position (n) and the initial eye position
(e). Because we are mapping motor positions to motor positions with axes that
are roughly parallel, a simple linear mapping has sufficed: ∆n = (ke − n) for
some constant k.2
There are two possible mechanisms for counter-rotating the eyes while the
neck is in motion: the vestibulo-ocular reflex or an efference copy signal of the
neck motion. VOR can be used to compensate for neck motion without any
additions necessary. Because the reflex uses gyroscope feedback to maintain the
eye position, no communication between the neck motor controller and the eye
motor controller is necessary. This can be desirable if there is limited bandwidth
between the processors responsible for neck and eye control. However, using VOR
to compensate for neck motion can become unstable. Because the gyroscopes
are mounted very close to the neck motors, motion of the neck can result in
additional vibrational noise on the gyroscopes. However, since the neck motion
is a voluntary movement, our system can utilize additional information in order
to counter-rotate the eyes, much as humans do (Ghez 1992). An efference copy
signal can be used to move the eye motors while the neck motors are moving. The
neck motion signal can be scaled and sent to the eye motors to compensate for
the neck motion. The scaling constant is simply k1 , where k is the same constant
2
This linear mapping has only been possible with motor-motor mappings and not
sensory-motor mappings because of non-linearities in the sensors.
The Cog Project: Building a Humanoid Robot 73

TONIC
INPUT c β v1

11
00
00
11
hj [gj]+
y1
1
PROPRIOCEPTIVE

1
0
0
1
INPUT gj OUTPUT
+ y out
ω y1 ω y2
-

1
0
0
1
hj [gj]-
y2

11
00
11
00
2

11
00
00
11
TONIC
INPUT c
β v2

Fig. 6. Schematic of the oscillator. Black circles correspond to inhibitory connections,


open circles to excitatory. The βvi connections correspond to self-inhibition, and the
ωyi connections give the mutual inhibition. The positive and negative parts of the input
gj are weighted by the gain hj before being applied to the neurons. The two outputs
yi are combined to give the oscillator output yout .

that was used to determine ∆n. Just as with the vestibulo-ocular reflex, the
scaling constants can be obtained using controlled motion and feedback from
the opto-kinetic nystigmus. Using efference copy with constants obtained from
OKN training results in a stable system for neck orientation.

5.3 Dynamic Oscillator Motor Control

Neural oscillators have been used to generate repetitive arm motions. The cou-
pling between a set of oscillators and the physical arm of the robot achieves
many different tasks using the same software architecture and without explicit
models of the arm or environment. The tasks include swinging pendulums at
their resonant frequencies, turning cranks, and playing with a slinky.
Using a proportional-derivative control law, the torque at the ith joint can
be described by:
ui = ki (θvi − θi) − bi θ̇i        (1)
where ki is the stiffness of the joint, bi the damping, θi the joint angle, and
θvi the equilibrium point. By altering the stiffness and damping of the arm, the
dynamical characteristics of the arm can be changed. The posture of the arm
can be changed by altering the equilibrium points (Williamson 1996). This type
of control preserves stability of motion. The elastic elements of the arm produce
a system that is both compliant and shock resistant, allowing the arm to operate
in unstructured environments.
Two simulated neurons with mutually inhibitory connections drive each arm
joint, as shown in Figure 6. The neuron model describes the firing rate of a
biological neuron with self-inhibition (Matsuoka 1985). The firing rate of each
neuron is governed by the following equations:

τ1 ẋ1 = −x1 − βv1 − ω [x2]+ − Σ(j=1..n) hj [gj]+ + c        (2)
τ2 v̇1 = −v1 + [x1]+        (3)
τ1 ẋ2 = −x2 − βv2 − ω [x1]+ − Σ(j=1..n) hj [gj]− + c        (4)
τ2 v̇2 = −v2 + [x2]+        (5)
yi = [xi]+ = max(xi, 0)        (6)
yout = y1 − y2        (7)

where xi is the firing rate, vi is the self-inhibition of the neuron (modulated by
the adaptation constant β), and ω controls the mutual inhibition. The output of
each neuron yi is the positive portion of the firing rate, and the output of the
whole oscillator is yout . Any number of inputs gj can be applied to the oscilla-
tor, including proprioceptive signals and signals from other neurons. Each input
is scaled by a gain hj and arranged to excite one neuron while inhibiting the
other by applying the positive portion of the input ([gj]+) to one neuron and the
negative portion to the other. The amplitude of the oscillation is proportional
to the tonic excitation c. The speed and shape of the oscillator output are de-
termined by the time constants τ1 and τ2 . For stable oscillations, τ1 /τ2 should
be between 0.1 and 0.5. The stability and properties of this oscillator system
and more complex networks of neurons are analyzed in Matsuoka (1985) and
Matsuoka (1987).
The output of the oscillator yout is connected to the equilibrium point θv .
One neuron flexes the joint and the other extends it about a fixed posture θp ,
making the equilibrium point θv = yout + θp . The inputs to the oscillators are
either the force (τ ) or the position (θ) of the joint.3 The interaction of the
oscillator dynamics and the physical dynamics of the arm form a tightly coupled
dynamical system. Unlike a conventional control system, there is no “set-point”
for the motion. The interaction of the two coupled dynamical systems determines
the overall arm motion.
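
The sketch below integrates equations (2)-(7) with a simple Euler step and attaches the output to a joint equilibrium point, as described above. It is an illustrative Python implementation; the parameter values, the step size, and the feedback wiring are assumptions rather than the robot's actual tuning.

    import numpy as np

    class MatsuokaOscillator:
        def __init__(self, tau1=0.05, tau2=0.125, beta=2.5, omega=2.5, c=1.0, h=(1.0,)):
            self.tau1, self.tau2, self.beta, self.omega, self.c = tau1, tau2, beta, omega, c
            self.h = np.asarray(h, float)       # input gains h_j
            self.x = np.array([0.1, 0.0])       # firing rates x1, x2 (small initial asymmetry)
            self.v = np.zeros(2)                # self-inhibition states v1, v2

        def step(self, inputs, dt=0.005):
            """inputs: the g_j signals (e.g. joint force or position feedback); returns y_out."""
            g = np.atleast_1d(np.asarray(inputs, float))
            g_pos, g_neg = np.maximum(g, 0.0), np.maximum(-g, 0.0)     # [g_j]+ and [g_j]-
            y = np.maximum(self.x, 0.0)                                # y_i = [x_i]+
            dx = np.array([
                -self.x[0] - self.beta * self.v[0] - self.omega * y[1] - self.h @ g_pos + self.c,
                -self.x[1] - self.beta * self.v[1] - self.omega * y[0] - self.h @ g_neg + self.c,
            ]) / self.tau1
            dv = (-self.v + y) / self.tau2
            self.x += dt * dx
            self.v += dt * dv
            y = np.maximum(self.x, 0.0)
            return y[0] - y[1]                                         # y_out

    # The joint equilibrium point then follows theta_v = y_out + theta_p, so the
    # oscillator and the arm's physical dynamics form a single coupled system.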
The oscillators have two properties which make them suitable for certain
types of repetitive motions. First, they can entrain an input signal over a wide
range of frequencies. In the entrained state, the oscillator provides an output at
exactly the same frequency as the input, with a phase difference between input
and output which depends on frequency. Second, the oscillator also becomes
entrained very rapidly, typically within one cycle. Figure 7 shows the entrainment
of an oscillator at the elbow joint as the shoulder of the robot is moved. The
movement of the shoulder induces forces at the elbow which drive the elbow in
synchrony with the shoulder.

3 These signals in general have an offset (due to gravity loading, or other factors).
When the positive and negative parts are extracted and applied to the oscillators, a
low-pass filter is used to find and remove the DC component.
Fig. 7. Entrainment of an oscillator at the elbow as the shoulder is moved. The joints
are connected only through the physical structure of the arm. Both plots show the
angle of the shoulder (solid) and the elbow (dashed) as the speed of the shoulder is
changed (speed parameter dash-dot). The top graph shows the response of the arm
without proprioception, and the bottom with proprioception. Synchronization occurs
only with the proprioceptive feedback.

Slinky: The entrainment property can be exploited to manipulate objects, such
as a slinky. As the slinky is passed from hand to hand, the weight of the slinky
is used to entrain oscillators at both elbow joints. The oscillators are completely
independent, and unsynchronized, in software. With the slinky forming a physi-
cal connection between the two systems, the oscillators work in phase to produce
the correct motion. The adaptive nature of the oscillators allows them to quickly
recover from interruptions of motion and changes in speed. An example of the
coordination is shown in Figure 8.

Cranks: The position constraint of a crank can also be used to coordinate the
joints of the arm. If the arm is attached to the crank and some of the joints are
moved, then the other joints are constrained by the crank. The oscillators can
sense the motion, adapt, and settle into a stable crank turning motion.
In the future, we will explore issues of complex redundant actuation (such as
multi-joint muscles), utilize optimization techniques to tune the parameters of
the oscillator, produce whole-arm oscillations by connecting various joints into a
single oscillator, and explore the use of postural primitives to move the set point
of the oscillations.

5.4 Pointing to a Visual Target


We have implemented a pointing behavior which enables Cog to reach out its
arm to point to a visual target (Marjanović et al. 1996). The behavior is learned
over many repeated trials without human supervision, using gradient descent
methods to train forward and inverse mappings between a visual parameter space
and an arm position parameter space. This behavior uses a novel approach to
arm control, and the learning bootstraps from prior knowledge contained within
the saccade behavior (discussed in Section 5.1). As implemented, the behavior
assumes that the robot’s neck remains in a fixed position.

Fig. 8. The robot operating the slinky. Both plots show the outputs from the oscil-
lators as the proprioception is turned on and off. With proprioception, the outputs
are synchronized. Without proprioception, the oscillators move out of phase. The only
connection between the oscillators is through the physical structure of the slinky.
From an external perspective, the behavior is quite rudimentary. Given a
visual stimulus, typically by a researcher waving an object in front of its cam-
eras, the robot saccades to foveate on the target, and then reaches out its arm
toward the target. Early reaches are inaccurate, and often in the wrong direction
altogether, but after a few hours of practice the accuracy improves drastically.
The reaching algorithm involves an amalgam of several subsystems. A motion
detection routine identifies a salient stimulus, which serves as a target for the
saccade module. This foveation guarantees that the target is always at the center
of the visual field; its retinal coordinates are therefore fixed, and the position
of the target relative to the robot is
wholly characterized by the gaze angle of the eyes (only two degrees of freedom).
Once the target is foveated, the joint configuration necessary to point to that
target is generated from the gaze angle of the eyes using a “ballistic map.” This
configuration is used by the arm controller to generate the reach.
Training the ballistic map is complicated by the inappropriate coordinate
space of the error signal. When the arm is extended, the robot waves its hand.
This motion is used to locate the end of the arm in the visual field. The distance
of the hand from the center of the visual field is the measure of the reach error.
However, this error signal is measured in units of pixels, yet the map being
trained relates gaze angles to joint positions. The reach error measured by the
visual system cannot be directly used to train the ballistic map. However, the
saccade map has been trained to relate pixel positions to gaze angles. The saccade
map converts the reach error, measured as a pixel offset on the retina, into an
offset in the gaze angles of the eyes (as if Cog were looking at a different target).
This is still not enough to train the ballistic map. Our error is now in terms
of gaze angles, not joint positions — i.e. we know where Cog could have looked,
but not how it should have moved the arm. To train the ballistic map, we also
need a “forward map” — i.e. a forward kinematics function which gives the gaze
angle of the hand in response to a commanded set of joint positions. The error
in gaze coordinates can be back-propagated through this map, yielding a signal
appropriate for training the ballistic map.
The forward map is learned incrementally during every reach: after each
reach we know the commanded arm position, as well as the position measured
in eye gaze coordinates (even though that was not the target position). For the
ballistic map to train properly, the forward map must have the correct signs in
its derivative. Hence, training of the forward map begins first, during a “flail-
ing” period in which Cog performs reaches to random arm positions distributed
through its workspace.
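The structure of this training loop can be made concrete with a linear toy version. Everything
below is an assumption for illustration only: the maps on Cog are learned non-linear functions,
and the kinematics matrix, retina scale, and learning rates here are invented.

```python
import numpy as np

rng = np.random.default_rng(0)
ARM_TO_GAZE = np.array([[0.8, 0.1], [-0.2, 0.9]])   # hypothetical "world" kinematics
PIX_PER_RAD = 500.0                                  # hypothetical retina scale

def saccade_map(pixel_offset):
    # stand-in for the previously learned saccade map (pixels -> gaze angles)
    return pixel_offset / PIX_PER_RAD

F = np.zeros((2, 2))   # forward map: arm command -> gaze angle of the hand
B = np.zeros((2, 2))   # ballistic map: target gaze angle -> arm command

# "Flailing" phase: random reaches train the forward map before it is trusted.
for _ in range(500):
    arm = rng.uniform(-1, 1, size=2)
    hand_gaze = ARM_TO_GAZE @ arm
    F += 0.1 * np.outer(hand_gaze - F @ arm, arm)          # LMS update of F

# Reaching trials: the pixel reach error is converted to a gaze-angle error by
# the saccade map and back-propagated through F to train the ballistic map.
for _ in range(3000):
    target = rng.uniform(-1, 1, size=2)       # gaze angle of the foveated target
    arm = B @ target
    hand_gaze = ARM_TO_GAZE @ arm
    F += 0.1 * np.outer(hand_gaze - F @ arm, arm)          # F keeps learning
    pixel_error = PIX_PER_RAD * (hand_gaze - target)       # error seen on the retina
    gaze_error = saccade_map(pixel_error)                  # pixels -> gaze angles
    B -= 0.2 * np.outer(F.T @ gaze_error, target)          # gradient step in joint space

print("residual error of the trained ballistic map:",
      np.linalg.norm(ARM_TO_GAZE @ B - np.eye(2)))
```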
Although the arm has four joints active in moving the hand to a particular
position in space (the other two control the orientation of the hand), we re-
parameterize in such a way that we only control two degrees of freedom for a
reach. The position of the outstretched arm is governed by a normalized vector
of “postural primitives.” A primitive is a fixed set joint angles, corresponding
to a static position of the arm, placed at a corner of the workspace. Three such
primitives form a basis for the workspace. The joint space command for the arm is
calculated by interpolating the joint space components between each primitive,
weighted by the coefficients of the primitive-space vector. Since the vector in
primitive space is normalized, three coefficients give rise to only two degrees of
freedom. Hence, a mapping between eye gaze position and arm position, and
vice versa, is a simple, non-degenerate R2 → R2 function. This considerably
simplifies learning.
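As a concrete illustration of the re-parameterization, the following sketch interpolates a
four-joint command from a normalized primitive-space vector. The three primitive postures are
made-up placeholders, not Cog's actual primitives.

```python
import numpy as np

PRIMITIVES = np.array([       # one fixed set of joint angles (radians) per corner
    [0.0, 0.5, -0.3, 0.2],
    [0.8, 0.1,  0.4, -0.1],
    [-0.6, 0.9, 0.0, 0.5],
])

def arm_command(coeffs):
    """Blend the three primitive postures according to a (non-negative)
    primitive-space vector; normalization removes one degree of freedom."""
    w = np.asarray(coeffs, dtype=float)
    w = w / w.sum()
    return w @ PRIMITIVES     # weighted interpolation of the joint angles

print(arm_command([0.2, 0.5, 0.3]))
```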
Unfortunately, the notion of postural primitives as formulated is very brit-
tle: the primitives are chosen ad-hoc to yield a reasonable workspace. Finding
methods to adaptively generate primitives and divide the workspace is a subject
of active research.

5.5 Recognizing Joint Attention Through Face and Eye Finding


The first joint attention behaviors that infants engage in involve maintaining
eye contact. To enable our robot to recognize and maintain eye contact, we have
implemented a perceptual system capable of finding faces and eyes (Scassellati
1998c). The system first locates potential face locations in the peripheral image
using a template-based matching algorithm developed by Sinha (1996). Once a
potential face location has been identified, the robot saccades to that target using
the saccade mapping S described earlier. The location of the face in peripheral
image coordinates (p(x,y)) is then mapped into foveal image coordinates (f(x,y))
using a second learned mapping, the foveal map F : p(x,y) → f(x,y). The location
of the face within the peripheral image can then be used to extract the sub-image
containing the eye for further processing.
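One simple stand-in for such a learned peripheral-to-foveal mapping is an affine transform fit
to a few correspondence pairs, as sketched below. The affine form and the calibration pairs are
assumptions for illustration, not necessarily the form of the map learned on Cog.

```python
import numpy as np

def fit_foveal_map(peripheral_pts, foveal_pts):
    """Least-squares affine fit F: peripheral (x, y) -> foveal (x, y)."""
    P = np.hstack([peripheral_pts, np.ones((len(peripheral_pts), 1))])  # homogeneous
    A, *_ = np.linalg.lstsq(P, foveal_pts, rcond=None)
    return lambda p: np.append(p, 1.0) @ A

# hypothetical calibration pairs gathered while the same feature is visible
# in both the peripheral and the foveal camera
periph = np.array([[10, 20], [40, 60], [80, 30], [15, 70]], dtype=float)
fovea = np.array([[50, 90], [170, 250], [330, 130], [70, 290]], dtype=float)
F = fit_foveal_map(periph, fovea)
print(F(np.array([30.0, 40.0])))   # foveal coordinates of a peripheral location
```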
This technique has been successful at locating and extracting sub-images that
contain eyes under a variety of conditions and from many different individuals.
Additional information on this task and its relevance to building systems that
recognize joint attention can be found in the chapter by Scassellati.

5.6 Imitating Head Nods


By adding a tracking mechanism to the output of the face detector and then
classifying these outputs, we have been able to have the system mimic yes/no
head nods of the caregiver (that is, when the caretaker nods yes, the robot
responds by nodding yes). The face detection module produces a stream of face
locations at 20Hz. An attentional marker is attached to the most salient face
stimulus, and the location of that marker is tracked from frame to frame. If
the position of the marker changes drastically, or if no face is determined to be
salient, then the tracking routine resets and waits for a new face to be acquired.
Otherwise, the motion of the attentional marker for a fixed-duration window is
classified into one of three static classes: the yes class, the no class, or the no-
motion class. Two metrics are used to classify the motion, the cumulative sum
of the displacements between frames (the relative displacement over the time
window) and the cumulative sum of the absolute values of the displacements
(the total distance traveled by the marker). If the horizontal total trip distance
exceeds a threshold (indicating some motion), and if the horizontal cumulative
displacement is below a threshold (indicating that the motion was back and
forth around a mean), and if the horizontal total distance exceeds the vertical
total distance, then we classify the motion as part of the no class. Otherwise,
if the vertical cumulative total trip distance exceeds a threshold (indicating
some motion), and if the vertical cumulative displacement is below a threshold
(indicating that the motion was up and down around a mean), then we classify
the motion as part of the yes class. All other motion types default to the no-
motion class. These simple classes then drive fixed-action patterns for moving
the head and eyes in a yes or no nodding motion. While this is a very simple
form of imitation, it is highly selective. Merely producing horizontal or vertical
movement is not sufficient for the head to mimic the action – the movement
must come from a face-like object.
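The classification rule lends itself to an almost literal transcription; in the sketch below the
thresholds are illustrative values and the marker positions are assumed to come from the face
tracker at 20Hz.

```python
def classify_nod(marker_positions, motion_thresh=30.0, drift_thresh=10.0):
    """Classify a fixed-duration window of (x, y) attentional-marker positions
    as 'yes', 'no', or 'no-motion'."""
    dx = [b[0] - a[0] for a, b in zip(marker_positions, marker_positions[1:])]
    dy = [b[1] - a[1] for a, b in zip(marker_positions, marker_positions[1:])]
    h_trip, v_trip = sum(abs(d) for d in dx), sum(abs(d) for d in dy)  # total distance
    h_disp, v_disp = abs(sum(dx)), abs(sum(dy))                        # net displacement
    # horizontal back-and-forth that dominates the vertical motion -> "no"
    if h_trip > motion_thresh and h_disp < drift_thresh and h_trip > v_trip:
        return "no"
    # vertical back-and-forth around a mean -> "yes"
    if v_trip > motion_thresh and v_disp < drift_thresh:
        return "yes"
    return "no-motion"
```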

5.7 Regulating Interactions through Expressive Feedback


In Section 4.2, we described ongoing research toward building a robotic “in-
fant” capable of learning communicative behaviors with the assistance of a hu-
man caretaker. For our purposes, the context for learning involves social ex-
changes where the robot learns how to manipulate the caretaker into satisfying
the robot’s internal drives. Ultimately, the communication skills targeted for
learning are those exhibited by infants such as turn taking, shared attention,
and pre-linguistic vocalizations exhibiting shared meaning with the caretaker.
Towards this end, we have implemented a behavior engine for the develop-
ment platform Kismet that integrates perceptions, drives, emotions, behaviors,
and facial expressions. These systems influence each other to establish and main-
tain social interactions that can provide suitable learning episodes, i.e., where the
robot is proficient yet slightly challenged, and where the robot is neither under-
stimulated nor over-stimulated by its interaction with the human. Although we
do not claim that this system parallels infants exactly, its design is heavily in-
spired by the role motivations and facial expressions play in maintaining an
appropriate level of stimulation during social interaction with adults.
With a specific implementation, we demonstrated how the system engages
in a mutually regulatory interaction with a human while distinguishing between
stimuli that can be influenced socially (face stimuli) and those that cannot (mo-
tion stimuli) (Breazeal & Scassellati 1998). The total system consists of three
drives (fatigue, social, and stimulation), three consummatory behaviors
(sleep, socialize, and play), five emotions (anger, disgust, fear, happiness,
sadness), two expressive states (tiredness and interest), and their corre-
sponding facial expressions. A human interacts with the robot through direct
face-to-face interaction, by waving a hand at the robot, or using a toy to play
with the robot. The toys included a small plush black and white cow and an or-
ange plastic slinky. The perceptual system classifies these interactions into two
classes: face stimuli and non-face stimuli. The face detection routine classifies
both the human face and the face of the plush cow as face stimuli, while the
waving hand and the slinky are classified as non-face stimuli. Additionally, the
motion generated by the object gives a rating of the stimulus intensity. The
robot’s facial expressions reflect its ongoing motivational state and provide the
human with visual cues as to how to modify the interaction to keep the robot’s
drives within homeostatic ranges.
In general, as long as all the robot’s drives remain within their homeostatic
ranges, the robot displays interest. This cues the human that the interac-
tion is of appropriate intensity. If the human engages the robot in face-to-face
contact while its drives are within their homeostatic regime, the robot displays
happiness. However, once any drive leaves its homeostatic range, the robot’s
interest and/or happiness wane(s) as it grows increasingly distressed. As this
occurs, the robot’s expression reflects its distressed state. In general, the facial
expressions of the robot provide visual cues which tell whether the human should
switch the type of stimulus and whether the intensity of interaction should be
intensified, diminished or maintained at its current level.
For instance, if the robot is under-stimulated for an extended period of time,
it shows an expression of sadness. This may occur either because its social
drive has migrated into the “lonely” regime due to a lack of social stimulation
(perceiving faces nearby), or because its stimulation drive has migrated into
the “bored” regime due to a lack of non-face stimulation (which could be pro-
vided by slinky motion, for instance). The expression of sadness upon the robot’s
[Two plots of activation level versus time in seconds under the heading “Interaction with face”: the upper plot shows the anger, disgust, interest, sadness, and happiness emotions; the lower plot shows the social drive, the socialize behavior, and the face stimulus.]

Fig. 9. Experimental results for Kismet interacting with a person’s face. When the
face is present and moving slowly, the robot looks interested and happy. When the face
begins to move too quickly, the robot begins to show disgust, which eventually leads
to anger.

face tells the caretaker that the robot needs to be played with. In contrast, if
the robot receives an overly-intense face stimulus for an extended period of time,
the social drive moves into the “asocial” regime and the robot displays an ex-
pression of disgust. This expression tells the caretaker that she is interacting
inappropriately with the robot – moving her face too rapidly and thereby over-
whelming the robot. Similarly, if the robot receives an overly-intense non-face
stimulus (e.g. perceiving large slinky motions) for an extended period of time,
the robot displays a look of fear. This expression also tells the caretaker that
she is interacting inappropriately with the robot, probably moving the slinky
too much and over-stimulating the robot.
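The regulatory logic can be caricatured in a few lines. The sketch below is not Kismet's
implementation: the numerical ranges are invented, the fatigue drive and the happiness and
tiredness expressions are omitted, and the regime names only loosely follow the description
above.

```python
class Drive:
    def __init__(self, name, lo=-500.0, hi=500.0, drift=20.0):
        self.name, self.lo, self.hi, self.drift = name, lo, hi, drift
        self.level = 0.0

    def update(self, stimulus_intensity):
        # stimulation pushes the drive down toward satiation; without it the
        # drive drifts up into the "lonely"/"bored" regime
        self.level += self.drift - stimulus_intensity
        self.level = max(-2000.0, min(2000.0, self.level))

    def regime(self):
        if self.level > self.hi:
            return "under-stimulated"     # lonely or bored
        if self.level < self.lo:
            return "over-stimulated"      # asocial or overwhelmed
        return "homeostatic"

def expression(social, stimulation):
    if social.regime() == "homeostatic" and stimulation.regime() == "homeostatic":
        return "interest"                 # interaction is of appropriate intensity
    if "under-stimulated" in (social.regime(), stimulation.regime()):
        return "sadness"                  # please play with me
    if social.regime() == "over-stimulated":
        return "disgust"                  # face stimulus too intense
    return "fear"                         # non-face stimulus too intense

# a minute of an overly intense face stimulus drives the social drive asocial
social, stim = Drive("social"), Drive("stimulation")
for _ in range(60):
    social.update(stimulus_intensity=200.0)
    stim.update(stimulus_intensity=20.0)
print(expression(social, stim))           # -> "disgust"
```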

These interactions characterize the robot’s behavior when interacting with


a human. Figure 9 demonstrates how the robot’s emotive cues are used to reg-
ulate the nature and intensity of social interaction, and how the nature of the
interaction influences the robot’s social drives and behavior. The result is an
ongoing “dance” between robot and human aimed at maintaining the robot’s
drives within homeostatic bounds. If the robot and human are good partners,
the robot remains interested and/or happy most of the time. These expressions
indicate that the interaction is of appropriate intensity for learning.
6 Future Research Directions


Human beings are the most complex machines that our species has yet examined.
Clearly a small effort such as that described in this paper can only scratch the
surface of an understanding of how they work. We have concentrated on a number
of issues that are well beyond the purely mechatronic ambitions of many robotic
projects (humanoid and other). Our research has focused on the issues raised by
building a fully integrated humanoid, rather than on building an integrated
humanoid for its own sake.
Our ultimate goal is to understand human cognitive abilities well enough to
build a humanoid robot that develops and acts similar to a person. To date, the
major missing piece of our endeavor is demonstrating coherent global behavior
from the existing subsystems and sub-behaviors. If all of these systems were
active at once, competition for actuators and unintended couplings through the
world would result in incoherence and interference among the subsystems. The
problem is deeper than simply that of multi-modal systems discussed in Section 4.4.

6.1 Coherence
We have used simple cues, such as visual motion and sounds, to focus the visual
attention of Cog. However, each of these systems has been designed indepen-
dently and assumes complete control over system resources such as actuator
positions, computational resources, and sensory processing. We need to extend
our current emotional and motivational models (Breazeal & Scassellati 1998) so
that Cog might exhibit both a wide range of qualitatively different behaviors,
and be coherent in the selection and execution of those behaviors.
It is not acceptable for Cog to be repeatedly distracted by the presence of
a single person’s face when trying to attend to other tasks such as grasping
or manipulating an object. Looking up at a face that has just appeared in the
visual field is important. Looking at the object being manipulated is also
important. Neither stimulus should completely dominate the other, but perhaps
preference should be given based upon the current goals and motivations of
the system. This simple example is multiplied with the square of the number of
basic behaviors available to Cog, and so the problem grows rapidly. At this point
neither we, nor any other robotics researchers, have focused on this problem in
a way which has produced any valid solutions.

6.2 Other Perceptual Systems


We have a small number of tactile sensors mounted on Cog, but nothing near
the number that occur in biological systems. Furthermore, their capabilities are
quite limited when compared to the mammalian somatosensory system.
Cog does have kinesthetic sensors on some joints to provide a sense of how
hard it was working, but we have not yet found a useful way to use that informa-
tion. Nor have we made use of the force sensing that is available at every joint of
the arms beyond direct use in feedback control — there has been no connection
of that information to other cognitive mechanisms.
Finally, we have completely ignored some of the primary senses that are used
by humans, especially infants; we have ignored the chemical senses of smell and
taste.
Physical sensors are available for all these modalities but they are very crude
compared to those that are present in humans. It may not be instructive to try
to integrate these sensory modalities into Cog when the fidelity will be so much
lower than that of the, admittedly crude, current modalities.

6.3 Deeper Visual Perception

So far we have managed to operate with visual capabilities that are much sim-
pler than those of humans, although the performance of those that we do use is
comparable to the best available in artificial systems. We have concentrated on
motion perception, face detection and eye localization, and content-free sensory
motor routines, such as smooth pursuit, the vestibular-ocular reflex, and ver-
gence control. In addition to integrating all these pieces into a coherent whole,
we must also give the system some sort of understanding of regularities in its
environment.
A conventional approach to this would be to build object recognition systems
and face recognition systems (as opposed to our current face detection systems).
We believe that these two demands need to be addressed separately and that
neither is necessarily the correct approach.
Face recognition is an obvious step beyond simple face detection. Cog should
be able to invoke previous interaction patterns with particular people or toys
with faces whenever that person or toy is again present in its environment. Face
recognition systems typically record detailed shape or luminance information
about particular faces and compare observed shape parameters against a stored
database of previously seen data. We question whether moving straight to such
a system is necessary and whether it might not be possible to build up a more
operational sense of face recognition that may be closer to the developmental
path taken by children.
In particular we suspect that rather simple measures of color and contrast
patterns coupled with voice cues are sufficient to identify the handful of people
and toys with which a typical infant will interact. Characteristic motion cues
might also help in the recognition, leading to a stored model that is much richer
than a face template for a particular person, and leading to more widespread
and robust recognition of the person (or toy) from a wider range of viewpoints.
We also believe that classical object recognition techniques from machine
vision are not the appropriate approach for our robot. Rather than forcing all
recognition to be based on detailed shape extraction we think it is important
that a developmental path for object recognition be followed. This will include
development of vergence and binocularity, development of concepts of object
permanence, and the early development of color perception that is robust to
varied lighting.4

4 It is well known that the human visual system, at least in adults, is sensitive to
the actual pigmentation of surfaces rather than the frequency spectrum of the light
that arrives on the retina. This is a remarkable and counter-intuitive fact, and is
rarely used in modern computer vision, where cheap successes with simple direct
color segmentation have gotten impressive but non-extensible results.

6.4 A Sense of Time


Currently, Cog has no sense of time. Everything is in the present, with the
exception of some short term state implemented via the emotional levels present
in the Kismet platform. These emotional states can act as the keys to K-line
like indexing into associative memory, but this is not sufficient to produce the
richness of experience and subsequent intelligence that humans exhibit.
A key technical problem is how to relate the essentially static and timeless
aspects of memory that are present in neural networks, registration maps, self-
organizing maps, nearest neighbor approximations, and associative memory, to
the flow of time we as human beings experience.
This is a real technical challenge. A conventional AI system has separate
program and data, and the program has a natural flow of time that it can then
record in a data structure. Our models do not make this sort of distinction; there
is neither a sequential place in memory nor a process to capitalize on it. Given
that we have rejected the conventional approaches, we must find a solution to
the problem of how episodic memory might arise.
This chapter has focused on the current capabilities of our humanoid robotic
systems and the future directions that our research will address. These problems
are simply the beginning of what we hope will be a rich source of both new
research questions and innovative solutions to existing problems.

7 Acknowledgments
Support for this project is provided in part by an ONR/ARPA Vision MURI
Grant (No. N00014-95-1-0600).

References
An, C. H., Atkeson, C. G. & Hollerbach, J. M. (1988), Model-based control of a robot
manipulator, MIT Press, Cambridge, MA.
Ashby, W. R. (1960), Design for a Brain, second edn, Chapman and Hall.
Ballard, D., Hayhoe, M. & Pelz, J. (1995), ‘Memory representations in natural tasks’,
Journal of Cognitive Neuroscience pp. 66–80.
Baron-Cohen, S. (1995), Mindblindness, MIT Press.
Blythe, J. & Veloso, M. (1997), Analogical Replay for Efficient Conditional Planning,
in ‘Proceedings of the American Association of Artificial Intelligence (AAAI-97)’,
pp. 668–673.
Boutilier, C. & Brafman, R. I. (1997), Planning with Concurrent Interacting Actions,
in ‘Proceedings of the American Association of Artificial Intelligence (AAAI-97)’,
pp. 720–726.
Brafman, R. I. (1997), A Heuristic Variable Grid Solution Method for POMDPs, in
‘Proceedings of the American Association of Artificial Intelligence (AAAI-97)’,
pp. 727–733.
Breazeal, C. & Scassellati, B. (1998), ‘Infant-like Social Interactions between a Robot
and a Human Caretaker’, Adaptive Behavior. In submission.
Breazeal, C. & Velasquez, J. (1998), Toward teaching a robot “infant” using emo-
tive communication acts, in ‘Socially Situated Intelligence: Papers from the 1998
Simulated Adaptive Behavior Workshop’.
Breazeal (Ferrell), C. (1998), A Motivational System for Regulating Human-Robot
Interaction, in ‘Proceedings of the American Association of Artificial Intelligence
(AAAI-98)’.
Brooks, R. A. (1986), ‘A Robust Layered Control System for a Mobile Robot’, IEEE
Journal of Robotics and Automation RA-2, 14–23.
Brooks, R. A. (1991a), Intelligence Without Reason, in ‘Proceedings of the 1991 Inter-
national Joint Conference on Artificial Intelligence’, pp. 569–595.
Brooks, R. A. (1991b), ‘Intelligence Without Representation’, Artificial Intelligence
Journal 47, 139–160. originally appeared as MIT AI Memo 899 in May 1986.
Brooks, R. A. & Stein, L. A. (1994), ‘Building brains for bodies’, Autonomous Robots
1(1), 7–25.
Brooks, R. A., Breazeal (Ferrell), C., Irie, R., Kemp, C. C., Marjanović, M., Scas-
sellati, B. & Williamson, M. M. (1998), Alternative Essences of Intelligence, in
‘Proceedings of the American Association of Artificial Intelligence (AAAI-98)’.
Bullowa, M. (1979), Before Speech: The Beginning of Interpersonal Communication,
Cambridge University Press, Cambridge, London.
Cannon, S. & Zahalak, G. I. (1982), ‘The mechanical behavior of active human skeletal
muscle in small oscillations’, Journal of Biomechanics 15, 111–121.
Chappell, P. & Sander, L. (1979), Mutual regulation of the neonatal-maternal interactive
process: context for the origins of communication, in M. Bullowa, ed., ‘Before
Speech’, Cambridge University Press, pp. 191–206.
Churchland, P., Ramachandran, V. & Sejnowski, T. (1994), A Critique of Pure Vision,
in C. Koch & J. Davis, eds, ‘Large-Scale Neuronal Theories of the Brain’, MIT
Press.
Cohen, D. J. & Volkmar, F. R., eds (1997), Handbook of Autism and Pervasive Devel-
opmental Disorders, second edn, John Wiley & Sons, Inc.
Cohen, M. & Massaro, D. (1990), ‘Synthesis of visible speech’, Behaviour Research
Methods, Instruments and Computers 22(2), pp. 260–263.
Costello, T. (1997), Beyond Minimizing Change, in ‘Proceedings of the American As-
sociation of Artificial Intelligence (AAAI-97)’, pp. 448–453.
Damasio, A. R. (1994), Descartes’ Error, G.P. Putnam’s Sons.
Diamond, A. (1990), Developmental Time Course in Human Infants and Infant Mon-
keys, and the Neural Bases of Inhibitory Control in Reaching, in ‘The Devel-
opment and Neural Bases of Higher Cognitive Functions’, Vol. 608, New York
Academy of Sciences, pp. 637–676.
Ferrell, C. (1996), Orientation Behavior using Registered Topographic Maps, in ‘From
Animals to Animats: Proceedings of 1996 Society of Adaptive Behavior’, Cape
Cod, Massachusetts, pp. 94–103.
Ferrell, C. & Kemp, C. (1996), An Ontogenetic Perspective to Scaling Sensorimotor
Intelligence, in ‘Embodied Cognition and Action: Papers from the 1996 AAAI
Fall Symposium’, AAAI Press.
Frith, U. (1990), Autism : Explaining the Enigma, Basil Blackwell.
Gazzaniga, M. S. & LeDoux, J. E. (1978), The Integrated Mind, Plenum Press, New
York.
Ghez, C. (1992), Posture, in E. R. Kandel, J. H. Schwartz & T. M. Jessell, eds, ‘Prin-
ciples of Neural Science’, 3rd edn, Appleton and Lange.
Goldberg, M. E., Eggers, H. M. & Gouras, P. (1992), The Ocular Motor System, in
E. R. Kandel, J. H. Schwartz & T. M. Jessell, eds, ‘Principles of Neural Science’,
3rd edn, Appleton and Lange.
Greene, P. H. (1982), ‘Why is it easy to control your arms?’, Journal of Motor Behavior
14(4), 260–286.
Halliday, M. (1975), Learning How to Mean: Explorations in the Development of Lan-
guage, Elsevier, New York, NY.
Hatsopoulos, N. G. & Warren, W. H. (1996), ‘Resonance Tuning in Rhythmic Arm
Movements’, Journal of Motor Behavior 28(1), 3–14.
Hauskrecht, M. (1997), Incremental Methods for computing bounds in partially ob-
servable Markov decision processes, in ‘Proceedings of the American Association
of Artificial Intelligence (AAAI-97)’, pp. 734–739.
Herr, H. (1993), Human Powered Elastic Mechanisms, Master’s thesis, Massachusetts
Institute of Technology, Cambridge, Massachusetts.
Hirai, K., Hirose, M., Haikawa, Y. & Takenaka, T. (1998), The Development of the
Honda Humanoid Robot, in ‘Proceedings of the 1998 IEEE International Confer-
ence on Robotics and Automation (ICRA-98)’, IEEE Press.
Hobson, R. P. (1993), Autism and the Development of Mind, Erlbaum.
Irie, R. E. (1997), Multimodal Sensory Integration for Localization in a Humanoid
Robot, in ‘Proceedings of Second IJCAI Workshop on Computational Auditory
Scene Analysis (CASA’97)’, IJCAI-97.
Johnson, M. H. (1993), Constraints on Cortical Plasticity, in M. H. Johnson, ed., ‘Brain
Development and Cognition: A Reader’, Blackwell, Oxford, pp. 703–721.
Kanehiro, F., Mizuuchi, I., Koyasako, K., Kakiuchi, Y., Inaba, M. & Inoue, H. (1998),
Development of a Remote-Brained Humanoid for Research on Whole Body Ac-
tion, in ‘Proceedings of the 1998 IEEE International Conference on Robotics and
Automation (ICRA-98)’, IEEE Press.
Kaye, K. (1979), Thickening Thin Data: The Maternal Role in Developing Communi-
cation and Language, in M. Bullowa, ed., ‘Before Speech’, Cambridge University
Press, pp. 191–206.
Knudsen, E. I. & Knudsen, P. F. (1985), ‘Vision Guides the Adjustment of Auditory
Localization in Young Barn Owls’, Science 230, 545–548.
Lakoff, G. (1987), Women, Fire, and Dangerous Things: What Categories Reveal about
the Mind, University of Chicago Press, Chicago, Illinois.
Lisberger, S. G. & Sejnowski, T. J. (1992), ‘Motor learning in a recurrent network
model based on the vestibulo-ocular reflex’, Nature 260, 159–161.
Littman, M. L. (1997), Probabilistic Propositional Planning: Representations and Com-
plexity, in ‘Proceedings of the American Association of Artificial Intelligence
(AAAI-97)’, pp. 748–754.
Lobo, J., Mendez, G. & Taylor, S. R. (1997), Adding Knowledge to the Action De-
scription Language A, in ‘Proceedings of the American Association of Artificial
Intelligence (AAAI-97)’, pp. 454–459.
MacKay, W. A., Crammond, D. J., Kwan, H. C. & Murphy, J. T. (1986), ‘Measurements
of human forearm posture viscoelasticity’, Journal of Biomechanics 19, 231–238.
Marjanović, M. J., Scassellati, B. & Williamson, M. M. (1996), Self-Taught Visually-
Guided Pointing for a Humanoid Robot, in ‘From Animals to Animats: Proceed-
ings of 1996 Society of Adaptive Behavior’, Cape Cod, Massachusetts, pp. 35–44.
Mason, M. T. & Salisbury, Jr., J. K. (1985), Robot Hands and the Mechanics of Ma-
nipulation, MIT Press, Cambridge, Massachusetts.
Matsuoka, K. (1985), ‘Sustained oscillations generated by mutually inhibiting neurons
with adaptation’, Biological Cybernetics 52, 367–376.
Matsuoka, K. (1987), ‘Mechanisms of frequency and pattern control in neural rhythm
generators’, Biological Cybernetics 56, 345–353.
McCain, N. & Turner, H. (1997), Causal Theories of Action and Change, in ‘Proceed-
ings of the American Association of Artificial Intelligence (AAAI-97)’, pp. 460–
465.
McGeer, T. (1990), Passive Walking with Knees, in ‘Proc 1990 IEEE Intl Conf on
Robotics and Automation’.
Minsky, M. & Papert, S. (1970), ‘Draft of a proposal to ARPA for research on artificial
intelligence at MIT, 1970-71’.
Morita, T., Shibuya, K. & Sugano, S. (1998), Design and Control of Mobile Manipula-
tion System for Human Symbiotic Humanoid, in ‘Proceedings of the 1998 IEEE
International Conference on Robotics and Automation (ICRA-98)’, IEEE Press.
Mussa-Ivaldi, F. A., Hogan, N. & Bizzi, E. (1985), ‘Neural, Mechanical, and Geometric
Factors Subserving Arm Posture in humans’, Journal of Neuroscience 5(10), 2732–
2743.
Newson, J. (1979), The growth of shared understandings between infant and caregiver,
in M. Bullowa, ed., ‘Before Speech’, Cambridge University Press, pp. 207–222.
Panerai, F. & Sandini, G. (1998), ‘Oculo-Motor Stabilization Reflexes: Integration of
Inertial and Visual Information’, Neural Networks. In press.
Peskin, J. & Scassellati, B. (1997), Image Stabilization through Vestibular and Retinal
Feedback, in R. Brooks, ed., ‘Research Abstracts’, MIT Artificial Intelligence
Laboratory.
Pratt, G. A. & Williamson, M. M. (1995), Series Elastic Actuators, in ‘Proceedings
of the IEEE/RSJ International Conference on Intelligent Robots and Systems
(IROS-95)’, Vol. 1, Pittsburgh, PA, pp. 399–406.
Rensink, R., O’Regan, J. & Clark, J. (1997), ‘To See or Not to See: The Need for
Attention to Perceive Changes in Scenes’, Psychological Science 8, 368–373.
Rosenbaum, D. A. et al. (1993), ‘Knowledge Model for Selecting and Producing Reach-
ing Movements’, Journal of Motor Behavior 25(3), 217–227.
Salisbury, J., Townsend, W. T., Eberman, B. S. & DiPietro, D. M. (1988), Preliminary
Design of a Whole arm Manipulation System (WAMS), in ‘Proc 1988 IEEE Intl
Conf on Robotics and Automation’.
Scassellati, B. (1996), Mechanisms of Shared Attention for a Humanoid Robot, in
‘Embodied Cognition and Action: Papers from the 1996 AAAI Fall Symposium’,
AAAI Press.
Scassellati, B. (1998a), A Binocular, Foveated Active Vision System, Technical Report
1628, MIT Artificial Intelligence Lab Memo.
Scassellati, B. (1998b), Building Behaviors Developmentally: A New Formalism, in
‘Integrating Robotics Research: Papers from the 1998 AAAI Spring Symposium’,
AAAI Press.
Scassellati, B. (1998c), Finding Eyes and Faces with a Foveated Vision System, in
‘Proceedings of the American Association of Artificial Intelligence (AAAI-98)’.
Scassellati, B. (1998d), Imitation and Mechanisms of Shared Attention: A Develop-
mental Structure for Building Social Skills, in ‘Agents in Interaction - Acquiring
Competence through Imitation: Papers from a Workshop at the Second Interna-
tional Conference on Autonomous Agents’.
Schaal, S. & Atkeson, C. G. (1993), Open loop Stable Control Strategies for Robot
Juggling, in ‘Proceedings 1993 IEEE International Conference on Robotics and
Automation’, Vol. 3, pp. 913–918.
Schneider, K., Zernicke, R. F., Schmidt, R. A. & Hart, T. J. (1989), ‘Changes in limb
dynamics during the practice of rapid arm movements’, Journal of Biomechanics
22(8–9), 805–817.
Sinha, P. (1996), Perceiving and recognizing three-dimensional forms, PhD thesis, Mas-
sachusetts Institute of Technology.
Stroop, J. (1935), ‘Studies of interference in serial verbal reactions’, Journal of Exper-
imental Psychology 18, 643–62.
Takanishi, A., Hirano, S. & Sato, K. (1998), Development of an anthropomorphic Head-
Eye System for a Humanoid Robot, in ‘Proceedings of the 1998 IEEE Interna-
tional Conference on Robotics and Automation (ICRA-98)’, IEEE Press.
Thelen, E. & Smith, L. (1994), A Dynamic Systems Approach to the Development of
Cognition and Action, MIT Press, Cambridge, MA.
Trevarthen, C. (1979), Communication and cooperation in early infancy: a descrip-
tion of primary intersubjectivity, in M. Bullowa, ed., ‘Before Speech’, Cambridge
University Press, pp. 321–348.
Tronick, E., Als, H. & Adamson, L. (1979), Structure of early Face-to-Face Commu-
nicative Interactions, in M. Bullowa, ed., ‘Before Speech’, Cambridge University
Press, pp. 349–370.
Warren, C. A. & Karrer, R. (1984), ‘Movement-related potentials during development:
A replication and extension of relationships to age, motor control, mental status
and IQ’, International Journal of Neuroscience 1984, 81–96.
Wason, P. C. (1966), Reasoning, in B. M. Foss, ed., ‘New Horizons in Psychology’,
Vol. 1, Penguin Books, Harmondsworth, England, pp. 135–51.
Weiskrantz, L. (1986), Blindsight: A Case Study and Implications, Clarendon Press,
Oxford.
Wertheimer, M. (1961), ‘Psychomotor coordination of auditory and visual space at
birth’, Science 134, 1692.
Williamson, M. M. (1996), Postural Primitives: Interactive Behavior for a Humanoid
Robot Arm, in ‘Fourth International Conference on Simulation of Adaptive Be-
havior’, Cape Cod, Massachusetts, pp. 124–131.
Williamson, M. M. (1998a), Exploiting natural dynamics in robot control, in ‘Four-
teenth European Meeting on Cybernetics and Systems Research (EMCSR ’98)’,
Vienna, Austria.
Williamson, M. M. (1998b), Rhythmic robot control using oscillators, in ‘IROS ’98’.
Submitted.
Wood, D., Bruner, J. S. & Ross, G. (1976), ‘The role of tutoring in problem-solving’,
Journal of Child Psychology and Psychiatry 17, 89–100.
Yamato, J. (1998), Tracking moving object by stereo vision head with vergence for
humanoid robot, Master’s thesis, MIT.
Zajac, F. E. (1989), ‘Muscle and tendon: Properties, models, scaling, and application
to biomechanics and motor control’, CRC Critical Reviews of Biomedical Engi-
neering 17(4), 359–411.
Embodiment As Metaphor:
Metaphorizing-in the Environment1

Georgi Stojanov

Computer Science Institute, Faculty of Electrical Engineering


SS Cyril and Methodius University in Skopje, Republic of Macedonia
geos@cerera.etf.ukim.edu.mk

Abstract. The paper describes a general mechanism for the internalization of the
environment in autonomous agents. After reviewing the role of representation in
behavior-based autonomous agents, we propose a metaphor framework that unifies
various research threads in the domain. We start from a variant of the so-called
similarity-creating metaphors for the case of an implicit target domain (the
environment of the agent). The mechanism is based on a fairly simple idea of
assimilation via inborn schemas as understood in Piaget’s developmental
psychology. These schemas represent the source domain for the metaphor. They are
ordered sequences of elementary actions that the agent is capable of performing.
Because of the environmental constraints, when the agent tries to execute some
schema, only certain subsequences from the original schema will actually be
performed. These subsequences are called enabled schema instances. Thus, the
environment unfolds its structure to the agent via the subset of the enabled schema
instances. Another way to look at this is to say that what the agent gets is a
metaphorical description of its environmental niche (the implicit target domain) in
terms of instances of its inborn schemas (the source domain). After describing the
basic idea by means of an example, we present some simulation results that show
the plausibility of the model. The simulated agent solves the navigational problem in
an initially unknown environment. The paper closes with a discussion section where
we compare our model with some related works and make the case for the metaphor
framework as a proper unifier of diverse research work in embodied and situated
cognition.

1 Introduction

After the “behaviourist turn” [40] in the field of AI (e.g. [9]), which may be regarded as a
reaction to the classical, so-called explicit symbolic representation approaches, things
swung to the other extreme. The need for representation was denied and the accent was
put on building reactive-type systems that act in the real world. The usual argument was
that in a noisy and fast-changing environment there was no time left for the agent to
1 The author wishes to express his gratitude to the Ministry of Culture and of Science of the
Republic of Macedonia for the awarded grants which helped the work described in this paper.

constantly update its internal model of the world and act accordingly; it was better for
it simply to (re)act. However, it was soon realized within this behavior-based (BB)
approach [21] that, apart from simple behaviors like obstacle avoidance, wall (light,
odour, or any gradient) following, wandering, or exploration (“insect-type intelligence”), it
was impossible to introduce systematically and naturally any type of learning or
adaptation. This was a trivial consequence of the fact that there were no variables in
this “fixed-topology network of simple finite state machines” [10] to be changed (tuned,
learned, adapted). An obvious remedy was to introduce, well – some representations.
Indeed, in the decade that followed, various architectures appeared within the framework of BB
robotic systems which introduced different types of representations. For a taxonomy of
these systems with respect to their treatment of representations see [37]. In our opinion,
one of the most important lessons learned from this episode in AI was that the
representations were to be contingent on the particular embodiment of the artifact. We can
mention here the works of Mataric [24, 25, 26, 27], Drescher [12], Indurkhya [18], Dean
et al. [11], Bickhard [2, 3, 4], and others. These types of representations were supposed to
avoid the problems of traditional representationalism (for an excellent critique of the
traditional approach see Chalmers, French, and Hofstadter in [17]). However, what was
lacking was some kind of general framework which would act as a common denominator
for the above-mentioned research dispersed over many diverse domains.
In this paper we put forward the idea of looking at the act of learning useful
environment models as a process of obtaining a metaphorical description of the environment
in terms of the agent’s internal structures (i.e. its particular embodiment) during
agent-environment interaction. This process is considered to be a basic cognitive operation. It is similar to
what has been called similarity creating metaphors (SCM) in the metaphor research
literature (Black [5, 6], Hausman [14, 15, 16], Ricoeur [30, 31, 32, 33], Indurkhya [18]).
In our case we have a variant of SCM where the target domain is the environment itself,
and the source domain is the agent’s internal structures. An agent acts in the environment
exercising its set of basic behaviors and trying to satisfy its needs (goals, or drives, which
can be treated as a possibility to exercise some kind of consummatory behavior). For
example, an agent may be hungry and the goal would be to find a place in the
environment where the food is located. In order to perform better than random search, it
should somehow use the history of its interactions. The agent cannot know anything about
the external environment beyond the effects it has produced on its internal structures, i.e.
the target domain is implicit. Here, we do not treat the cases where the environment is
explicitly given to the agent, for example, by means of some connectivity matrix among
perceptually different “places”. Rather, it should build its idiosyncratic cognitive map of
the environment. This map is metaphorical in the sense that it stands for the agent-
environment past interactions. How “successful” a metaphor is depends on what it is used
for, i.e. what the agent’s goal is (e.g. navigation). As noted in [18] most of the research on
metaphor in cognitive science has concentrated on similarity-based metaphors. This
thread was further pursued in computational models of metaphor (understanding or
generating): both the target and the source domain are given and the program (agent,
artifact) tries to compute the similarities and the most plausible mappings. Notable
exceptions are the works of Hofstadter and his group [17]. Their view is that the essence
of metaphor and analogy making is the very process of constructing the representation of
the situations in both domains, not finding the mapping among pregiven representations.
We describe the agent’s internal structure using the notion of inborn schemas. In Section 2
we elaborate on this notion; for the time being we can only say that a schema represents an
ordered sequence of elementary actions that the agent is capable of performing.
In the remainder of this section we briefly present the basic idea by means of an
example. Suppose we have an agent inhabiting some maze-like world as given in Figure
1. The agent’s basic action set consists of three actions: F(orward), L(eft), and R(ight). They
move the agent forward, left, or right relative to its current position and orientation. Apart
from the sensations from the proprioceptors, informing the agent about the movement
constraints at the place it occupies, various other sensory inputs (visual, sonar,
chemical, etc.) may be included in S.

[Figure 1: (a) the maze-like world; (b) the agent, with sensory input S, the inborn schema FLR, and the elementary actions F, L, and R.]

Fig. 1. A maze-like world and the agent that inhabits it. See text for explanations.

In this example, the agent possesses only one inborn schema: FLR. Being in the
environment, the agent spontaneously tries to exercise it. To simplify matters, we
assume that the agent can occupy only certain places in the maze (marked with circles),
can have one of four possible orientations (E, W, N, S), and moves at discrete time
instants. A successfully performed elementary action can displace the agent only to a
place neighboring its current position. For example, if the agent is at the lower-left
corner and facing north (see Figure 2) when trying to exercise the FLR schema, it will
only succeed in moving forward, that is F__, and its next position will be as shown in Figure
2b.
Fig. 2. Agent trying to exercise the FLR schema from its current position and orientation.

The environment, being as it is, will systematically impose constraints on the agent’s
behavior, thus favoring only particular instances of the initial schema. For example, being
in a corridor, the agent can only move forward, that is, use the F__ instance of the schema
(that is, the F__ behavior). Cruising through the maze for a while and depending on the
initial position and orientation, the following sets of instances of the initial schema will be
favored for obvious reasons: F__, F_R, and __R, or F__, FL_, and _L_. So, as a result
of this interaction, one of the following two basic environment conceptualizations (or
metaphorizations) will emerge (Figure 3):

[Figure 3: two node-and-link diagrams; in (a) the nodes are F__, F_R, and __R, in (b) they are F__, FL_, and _L_, and the links are labeled with the percept sets (SF__, SF_R, S__R, SFL_, S_L_) that enable the corresponding behaviors.]

Fig. 3. Two different conceptual structures that may result from agent-environment interaction.
“SXXX”s represent percepts enabling XXX behavior.

The F__ node represents the “following the corridor” concept/behavior, while the “turning left”
and “turning right” behaviors are represented by FL_ or _L_ and F_R or __R,
respectively. Note that in these metaphorizations all the corridors collapse into a single F__
node. This is true for the turns also. This is so because our agent does not have any
preferred “S”s that it should strive for. However, what this conceptual structure tells the
agent is that after following the corridor it must turn to the left or to the right and then
switch again to the corridor-following concept/behavior. Another important point is that
two identical percepts are interpreted in different ways, depending on which
concept/behavior (node) is currently active.
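The assimilation step itself is easy to sketch. The fragment below is an illustration under our
own assumptions (grid encoding, helper names, and a "turn and step" reading of the L and R
actions), not the author's implementation: it executes the inborn schema from a given position
and returns the enabled schema instance, i.e. the subsequence of actions the environment
actually allowed.

```python
MOVES = {"E": (1, 0), "W": (-1, 0), "N": (0, -1), "S": (0, 1)}
LEFT = {"E": "N", "N": "W", "W": "S", "S": "E"}
RIGHT = {v: k for k, v in LEFT.items()}

def try_schema(schema, pos, heading, free_cells):
    """Execute an inborn schema (e.g. "FLR"); blocked actions are skipped and
    the actually performed subsequence (the enabled schema instance) is kept."""
    enabled = []
    for action in schema:
        new_heading = heading if action == "F" else (
            LEFT[heading] if action == "L" else RIGHT[heading])
        dx, dy = MOVES[new_heading]
        target = (pos[0] + dx, pos[1] + dy)
        if target in free_cells:          # environmental constraint
            pos, heading = target, new_heading
            enabled.append(action)
        else:
            enabled.append("_")           # this action is not enabled here
    return "".join(enabled), pos, heading

# a straight corridor enables only the F component of the FLR schema
corridor = {(x, 0) for x in range(5)}
print(try_schema("FLR", (0, 0), "E", corridor))   # -> ('F__', (1, 0), 'E')
```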
2 Schemas as Source Domains for Environment Metaphors

In the previous section we presented an example where the inner structure of the agent
was defined via its inborn schemas. Indeed, this approach is very appealing, and
one can say that the notion of schema is a leitmotif in psychology, AI, and cognitive
science. Speaking about the origins of the concept of schema as classically used in AI
(e.g. [23]), Arbib [1] points to the neurologist Henry Head and his body schema. Head
used the concept of body schema to explain the cases of patients with parietal lobe lesions
who would neglect, for example, half of their bodies. According to him, the lesion
destroys a part of the body schema and that part of the body is then neglected by the
patient, i.e. no context is provided to interpret the incoming sensory inputs from those
body parts. Perhaps a clearer example of the schema notion is given by Sir Frederic
Bartlett, a student of Head. In his 1932 book “Remembering” he observes that people do
not remember things (events, situations) in a photographic manner. Rather, having heard
something, for example, and being asked to repeat it, they rarely use the exact words.
Some parts are emphasized and given more space and others just sketched or even
omitted. The point is that hearing and understanding something means projecting it on the
internal individual space of schemas; remembering, then, is not a passive process but an
active reconstruction in terms of those schemas that were activated during the exposure to
the story (picture, movie...). Humans, as linguistically competent agents, are constantly
producing novel schemas in terms of stories, or narratives, thus enriching the source
domain for constant metaphorization of their new experiences.
So far in our theory, we are concerned only with agents without linguistic competence.
Most closely related to our understanding of the notion of schema is Piaget’s schema as
used in his theory of mental development [28]. Initially, according to the theory, the infant
has no concept of object permanence and this concept is constructed by internalizing the
various appearances of an object through interactions. Interactions are performed by
exercising the set of schemas (or schemata) the child is born with. A schema is an
organized sequence of behavior (e.g. sucking, grasping). According to Piaget, the very
existence of a schema in a child’s repertoire of action itself creates a motivation for its
use. That is, the motivation is intrinsic in the schema. The child tries to “understand” the
object by incorporating it in some existing schema: the act (the schema) of sucking may
be provoked with whatever object is placed in the mouth. This process is called
assimilation. Depending on the result of such an exercise (that is, the consequence) and
mental growth, initial schemas may change, and this process is called accommodation. An
example is the reaching-and-grasping-objects schema [34]: initially it consists of a fairly
crude “swipe and grab” in the general direction of an attractive object. As the baby grows
the schema becomes more refined and is adapted to the object’s position and size. It
begins to accommodate to the object [29].
What is important for us is that the internal representations of the environment are
inherently contingent on the agent’s structure, that is its specific embodiment. The
“reality” is re-presented via the modifications of its schemas. These modifications
metaphorically stand for its past experiences.
We ended the previous section with an example illustrating the use of the schema
notion. Our agent metaphorized-in its environment through its history of interactions with it.
In the next section we show how it can use the concept/behavior structure that emerged, in
order to achieve some goals – i.e. how the agent can make these metaphorical descriptions
of its environment useful with respect to a given goal.

3 Using Metaphors in Achieving Goals

These metaphorical concept/behavior structures that emerge during agent-environment
interaction are now the basis for building some useful metaphorizations or models of the
environment. Models can be useful only with respect to some goal.

[Figure 4: (a) the maze with the food object; (b) the agent’s internal structure, now containing the consummatory schema C alongside the inborn schema FLR.]

Fig. 4. a) Agent in a maze with an object provoking a desirable percept sd in it. b) Internal structure of the
agent with a possible conceptualization of the environment (see Figure 3a for details). The “C”
node represents the consummatory behavior which may be provoked by sd.

If we now put something in the maze that provokes some desirable sd in our agent, we will
create a goal in it. If we put the agent somewhere in the maze it will try to find the desired
thing, that is, to achieve the goal. Let us call that something food and place it in the upper
right part of the maze (see Figure 4a). In order to appreciate the food the agent has to be able
to exhibit appropriate behavior. Let us call it consummatory behavior and represent it with a
schema within the agent as in Figure 4b.
If we assume that the Figure 3a conceptualization took place, the agent will bump onto the
food while performing the F__ behavior. It will then “think” that, in order to get to sd, it
will suffice to do F__. This means that in the conceptual network a link will be built from
F__ to the C node (Figure 5). However, for the agent in the lowermost corridor or in one of
the three small corridors this will not do. If, for instance, it is in the position shown in
Figure 3.4a it may reach the goal by performing F__-(F)_R-F__-C.
[Figure 5: the conceptual structure of Figure 3a with an additional link from the F__ node to the consummatory node C, labeled with the desirable percept sd.]

Fig. 5. After bumping onto the goal this conceptual structure is built...
[Figure 6: the structure of Figure 5 extended with a second corridor node F__’, connected to F__ through the turning nodes; the percept set of the original F__ node is now SF__\SF__’, while F__’ carries the percepts SF__’.]

Fig. 6. ... But there are “F__”s from which performing F__ does not lead to the goal.

That is, there are instances of F__ behavior where the sd percept cannot be observed.
These are the percepts from the SF set that do not occur during executions of
F__ that lead to sd. This distinction leads to the creation of another instance of F__, named
F__’, containing those percepts and linked with the right F__ node via the (F)_R nodes
(Figure 6).

Fig. 7. A situation where the conceptual structure from Figs. 3-6 does not help.
[Figure 8: the full conceptual structure, with corridor nodes F__, F__’, and F__’’ and two pairs of turning nodes (F_R, __R and F_R’, __R’); each link is labeled with the percept set that enables the corresponding behavior, and F__ is linked to the consummatory node C via sd.]

Fig. 8. The “correct” conceptual structure that always leads to the goal. A\B denotes percept set
difference.

According to its observations the agent assumes it is in F__’. But performing the F__’-
(F)_R-F__ sequence does not bring it to the food. Again, this expectation failure will lead
to further differentiation among the F__ nodes. Introduction of the new nodes leads to the
conceptual structure shown in Figure 8.
How does the agent use this map to get to the food? Whenever performing F__ it
observes the percepts and locates itself in the F__ or the F__’ node. If in F__, it will
eventually perceive sd. Being in F__’, however, it should make a turn and then
continue with F__. In doing so it marks positively the percepts from SF_’, S__R, and SF_R sets
that occurred in a successful trial that began from F__’. This is because this procedure
will not work if the agent starts from a position like the one shown in Figure 7.
Above, we used an example which showed an autonomous agent solving the
navigation problem. However, there are no assumptions regarding the interpretation of the
concept/behavior structures. In this context their natural interpretation is that of “places”
or “landmarks” in the world. Most generally, they are “objects” in the agent’s Umwelt.
These objects afford certain manipulations. Agents learn these affordances via
the contingencies represented in the conceptual graph. Actually, the name
concept/behavior is chosen to suggest this generality. We see that the introduction of goals
imposes additional ordering and refinement of the concept/behaviors that represent the
metaphorical description of the environment. This is a natural incorporation of the
pragmatic constraints in metaphor generation.
In [35, 36] we proposed an algebraic formulation of the above informally presented
procedure which was partially inspired by [18]. We also proposed learning algorithms and
in the next subsection we present simulation results in the case of more realistic
environments.
3.1 Simulation Results

In these experiments, the simulated agent had a body and retina (Figure 9a), and was
capable of performing four elementary motor actions: go forward, go backward, go left,
and go right. These actions displace the agent by a fixed step, relative to its current
position in a two-dimensional environment (Figure 10a). The environment is populated by
obstacles and there is only one place where the food (goal) is to be found. Percepts
consisted of semicircular scans in front of the agent in 10 different directions, returning the
distance to obstacles in the respective directions (Figure 9b).


Fig. 9. a) Agent’s body and retina; b) one percept.

Thus, given the sensory readings in a particular direction, it is possible to decide whether
the next action from the schema to be performed is enabled or not. These
percepts are complemented with the outputs of two binary-valued sensors for food (goal)
and bump detection. Food is detected if it falls within the semicircle in front of the agent.
In these particular experimental runs, we used agents having only one inborn schema with
a length of 20 to 30 elementary actions (e.g. fffrrllffrrbllrffffllffff). So, the source domain
contained 2^n (where n is the length of the inborn schema) potential enabled schemas.
The learning algorithm used was very simple:

• try to execute the inborn schema;
• store the actually executed subschema (the enabled schema) complemented with the
full percepts at every step (i.e. the concept/behaviors);
• establish and store the link to the previous enabled schema;
• if food is detected, then propagate this information backwards as a number which
increases at each step back along the chain, reflecting the distance to the food;
• go to the first step.

Essentially, the agent stores triplets of the form:


link(enabled_schema+percepts_1, enabled_schema+percepts_2, distance_to_food)


which implicitly define an oriented graph whose nodes are these enabled schemas. So,
whenever the hunger drive is activated, the agent tries to execute its inborn schema, observes
the resulting enabled schema, thereby locating itself in the environment, and then follows the
neighboring node with the minimum distance_to_food. For environments (i.e. implicit
target domains) “similar” to the one given in Figure 10a, the metaphorization contained 150-
200 concept/behavior nodes. During its sojourn in this environment the agent builds the
concept/behavioral network. The hunger drive (that is, the urge to locate the food) is
activated periodically. As we can see in Figure 10b, the average number of steps
(elementary actions), expressed in terms of the number of executed subschemas, decreases as
the agent builds a more detailed metaphor network with respect to the goal position.
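A minimal sketch of this bookkeeping is given below; the data structures and function names
are our own choices for illustration, not the author's code. Nodes are assumed to be hashable
values (e.g. the enabled schema string together with its percepts).

```python
links = {}        # node -> set of successor nodes (from the stored link triplets)
distance = {}     # node -> best known number of enabled schemas to the food

def record_step(prev_node, node, food_detected):
    """Store one executed enabled schema and, on success, propagate the
    distance-to-food backwards along the recorded links."""
    if prev_node is not None:
        links.setdefault(prev_node, set()).add(node)
    if food_detected:
        distance[node] = 0
        frontier = [node]
        while frontier:
            current = frontier.pop()
            for src, dests in links.items():
                if current in dests and distance.get(src, 10**9) > distance[current] + 1:
                    distance[src] = distance[current] + 1
                    frontier.append(src)

def choose_next(node):
    """When the hunger drive is active, follow the successor closest to food."""
    return min(links.get(node, set()),
               key=lambda n: distance.get(n, 10**9), default=None)
```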
Although the main goal of this paper is to show the principled plausibility of this
metaphorizing-in the environment approach, we would like to give some brief comments
on the practical problems. The relation among the structure of the initial schema (its
length and the relative ordering of the elementary actions), the number and nature of the
elementary actions, the structure of the environment, and the number of the
concept/behaviors is a very complex one. If we would like to make some analytical
predictions about the number of concept/behaviors (and their usefulness in the
metaphorical description) that emerge in the interaction between given agent and
environment, we should somehow provide some formal description of the environment
structure. This may be some connectivity graph supplemented with metric information,

[Figure 10: (a) the simulated environment, showing the obstacles and the goal (food) place; (b) the learning curve, plotting the number of schema instances executed to find the food against the n-th activation of the hunger drive.]

Fig. 10. a) The environment of the simulated agent; b) learning curve: the average number of
steps to the goal decreases each time the hunger drive is activated.

as well as with some features discernible by the agent’s perceptual apparatus. We have done this [39] for the case of simple simulated environments, but the procedure is not applicable to complicated, real-world ones. Another issue we did not explicitly address in the paper is the choice of inborn schemas. For the time being we are working on applying genetic algorithms to solve this problem, i.e. to evolve “optimal” inborn schemas for a given agent, environmental niche, and goals. Although we have so far only carried out simulations, we are quite optimistic regarding the scalability of the methods proposed here, given the positive examples of relatively simple real-world learning agents (e.g. [24], [42]).
We conclude this section by explicating and justifying the use of the class of similarity-creating metaphors to describe our agent’s architecture and operation. In the process of internalizing the environment, the agent tries to describe its environment metaphorically in terms of its internal structure by creating similarities between the description and the environment. These similarities are, of course, similarities perceived from the agent’s point of view. For example, having inhabited some environment for a while and then being put in a different one, the only measure of similarity from the agent’s perspective would be how good the old metaphor is at locating the food in the new environment. We presented only one simple learning algorithm. There are many other ways of introducing other orderings among the enabled schemas, which would reflect yet other, more subtle “similarities” between the source and the implicit target domain. (For example, while performing the elementary actions the agent can be treated as traversing some finite state automaton. Repeating a sequence of elementary actions would lead the agent to enter a cycle; thus, we could group the enabled schemas according to the cycles they participate in, and use this grouping as a basis for building useful environment models; see the sketch below.)
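The cycle-based grouping mentioned in the parenthesis could, for instance, be sketched as follows; again this is only an illustration, assuming the hypothetical try_inborn_schema helper from the previous sketch and deterministic agent-environment dynamics.

# Illustrative sketch: group enabled schemas by the cycle they participate in.
def group_by_cycle(world, inborn_schema, max_steps=1000):
    visited = []               # sequence of concept/behavior nodes encountered
    first_seen = {}            # node -> index of its first occurrence
    for step in range(max_steps):
        enabled, percepts, _ = try_inborn_schema(world, inborn_schema)
        node = (enabled, percepts)
        if node in first_seen:                 # the agent has entered a cycle
            cycle = tuple(visited[first_seen[node]:])
            return {n: cycle for n in cycle}   # each node mapped to its cycle
        first_seen[node] = step
        visited.append(node)
    return {}                                  # no cycle found within max_steps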

4 Discussion and Concluding Remarks

The work described here originated in our research on the problem of environment representations in artificial and biological agents [37, 38, 39, 7, 41]. Among the main results was the concept of environment representation via the process of metaphorizing-it-in in terms of the agent’s inner structure (i.e. the agent’s particular embodiment). Various research threads scattered across diverse areas such as embodied and situated cognition, agency in AI, metaphor in language, and the like can easily fit this metaphorizing-in the environment framework. The work of Tani (e.g. [42]) comes closest to the spirit of our approach. The internal structure of his agent is represented via a recurrent neural network (RNN). The structure of the RNN represents, of course, the source domain. Mataric (e.g. [27]) proposes a biologically inspired (rat hippocampus) internal structure. Drescher’s agent [12] uses rather symbolic schema structures inspired by Piaget’s theory.
From the purely theoretical research we can mention the work of Indurkhya [18], who gives a rather detailed algebraic model of metaphorical reasoning, and the work
of Bickhard [4] regarding interactivism as a better philosophical stance for AI than representationalism.
Our work can be seen as an implementation of some of the results of the research on metaphor in natural languages: the mechanism we propose puts some flesh on the theoretical notion of image schemata [19, 20]; for example, the sets of enabled schema instances can be seen as basic-level categories upon which more elaborate environment models are founded (like natural language in linguistically competent agents; see [35] for a thorough treatment of this subject); concept/behavior networks can be regarded as blends [13, 43] incorporating structural constraints originating in the target and the source, and pragmatic constraints originating in the type of goal the agent is to achieve. We think that this is just the beginning of the application of insights from the vast body of metaphor research to the domain of agency research.

References

1. Arbib, M. A.: In Search of the Person, The University of Massachusetts Press (1985).
2. Bickhard, M. H.: Cognition, Convention, and Communication, Praeger Publishers
(1980).
3. Bickhard, M. H.: “Representational Content in Humans and Machines”, Journal of
Theoretical and Experimental Artificial Intelligence, 5 (1993a).
4. Bickhard, M. H.: “On Why Constructivism Does Not Yield Relativism”, Journal of
Theoretical and Experimental Artificial Intelligence, 5 (1993b).
5. Black, M.: “Metaphor” in M. Black Models and Metaphors, Cornell University Press,
Ithaca, NY; originally published in Proceedings of the Aristotelian Society, N.S. 55,
1954-55; Reprinted in M. Johnson (ed.) Philosophical Perspectives on Metaphor,
University of Minnesota Press, Minneapolis, Minn. (1981).
6. Black, M.: “More about Metaphor”, in A. Ortony (ed.) Metaphor and Thought,
Cambridge University Press, UK (1979).
7. Bozinovski, S., Stojanov, G., Bozinovska, L.: “Emotion, Embodiment, and
Consequence Driven Systems”, AAAI Fall Symposium, TR FS-96-02, Boston (1996).
8. Bozinovski, S.: Consequence Driven Systems, GOCMAR Publishers, Athol (1995).
9. Brooks, R. A., “A Robust Layered Control System for a Mobile Robot”, IEEE Journal
of Robotics and Automation, RA-2, April (1986).
10. Brooks, R. A.: “Intelligence Without Representation”, Artificial Intelligence, No. 47
(1991).
11. Dean, T., Angluin, D., Basye, K., Kaelbling, L. P.: “Uncertainty in Graph-Based Map
Learning”, in J. Connell and S. Mahadevan (eds.) Robot Learning (1992).
12. Drescher, G.: Made-Up Minds, MIT Press (1991).
13. Fauconnier, G., Turner, M.: “Conceptual Projection and Middle Spaces”, UCSD
Cognitive Sciences Technical Report 9401, San Diego (1994).
14. Hausman, C. R.: “Metaphors, Referents, and Individuality”, Journal of Aesthetics and
Art Criticism, Vol. 42 (1983).

15. Hausman, C. R.: A Discourse on Novelty and Creation, SUNY Press, Albany, NY
(1984).
16. Hausman, C. R.: Metaphor and Art: Interactionism and Reference in Verbal and
Nonverbal Art, Cambridge University Press, Cambridge, UK (1989).
17. Hofstadter, D. R., and the Fluid Analogies Research Group: Fluid Concepts and
Creative Analogies, BasicBooks, (1995).
18. Indurkhya, B.: Metaphor and Cognition, An Interactionist Approach, Kluwer
Academic Publishers, Boston (1992).
19. Johnson, M.: The Body in the Mind, Chicago University Press, Chicago (1987).
20. Lakoff, G.: Women, Fire, and Dangerous Things, The University of Chicago Press
(1987).
21. Maes, P.: Designing Autonomous Agents: Theory and Practice from Biology to
Engineering and Back, MIT Press, Cambridge (1991).
22. Mayer, R. E.: Thinking, Problem Solving, Cognition, W.H. Freeman and Company,
New York (1992).
23. Minsky, M.: "A Framework for Representing Knowledge", in A. Collins and E. Smith
(eds.) Readings in Cognitive Science, Morgan Kaufmann Publishers (1988).
24. Mataric, M.: “Navigating With a Rat Brain: A Neurobiologically-Inspired Model for
Robot Spatial Representation”, in J. A. Meyer & S. Wilson, eds. From Animals to
Animats, International Conference on Simulation of Adaptive Behavior, The MIT Press
(1990).
25. Mataric, M.: "Integration of Representation Into Goal-Driven Behavior-Based
Robots", in IEEE Transactions on Robotics and Automation, Vol. 8, No. 3 (1992).
26. Mataric, M.: “Integration of Representation Into Goal-Driven Behaviour-Based
Robots”, IEEE Transactions on Robotics and Automation, Vol. 8, No.3, (1992).
27. Mataric, M.: “Navigating With a Rat Brain: A Neurobiologically-Inspired Model for
Robot Spatial Representation”, in J. A. Meyer & S. Wilson, eds. From Animals to
Animats, International Conference on Simulation of Adaptive Behaviour, The MIT
Press, (1990).
28. Piaget, J.: Genetic Epistemology, Columbia, New York (1970).
29. Piaget, J., Inhelder, B.: Intellectual Development of Children (in Serbo-Croatian),
Zavod za udjbenike i nastavna sredstva, Beograd (1978).
30. Ricoeur, P.: Interpretation Theory: Discourse and the Surplus of Meaning, The Texas
Christian University Press, Fort Worth, Tex., (1976).
31. Ricoeur, P.: The Rule of Metaphor, University of Toronto Press, Toronto, Canada,
(1977).
32. Ricoeur, P.: “The Metaphorical Process as Cognition, Imagination, and Feeling”,
Critical Inquiry 5, No. 1, 1978; Reprinted in M. Johnson (ed.) Philosophical
Perspectives on Metaphor, University of Minnesota Press, Minneapolis, Minn. (1981).
33. Ricoeur, P. “Imagination et Metaphore”, Psychologie Medicale, Vol. 14, No. 12,
(1982).
34. Roth, I. (ed.): Introduction to Psychology, Vol. 1., LPA and The Open University,
London (1991).

35. Stojanov, G.: Expectancy Theory and Interpretation of EXG curves in the Context of
Biological and Machine Intelligence, PhD Thesis, ETF, Skopje (1997a).
36. Stojanov, G., Bozinovski, S., Trajkovski, G.: "Interactionist Expectative View on
Agency and Learning", IMACS Journal of Mathematics and Computers in Simulation,
North-Holland, No. 44 (1997b) 295-310.
37. Stojanov, G., Trajkovski, G., Bozinovski, S.: “The Status of Representation in
Behaviour Based Robotic Systems: The Problem and A Solution”, IEEE Conference
Systems, Man, and Cybernetics, Orlando (1997c).
38. Stojanov, G., Trajkovski, G., Bozinovski, S.: "Representation versus context: A false
dichotomy", 2nd ECCS Workshop on Context, Manchester (1997d).
39. Stojanov, G., Trajkovski, G.: “Spatial Representations for Mobile Robots: Detection
of Learnable and Unlearnable Environments”, Proceedings of the First Congress of
Mathematicians and Computer Scientists in Macedonia, Ohrid, Macedonia (1996).
40. Stojanov, G., Bozinovski, S., Simovska, V.: "AI (Re)discovers behaviorism and other
analogies", presented at the 3. Int. Congress on Behaviorism and Sciences of Behavior,
Yokohama (1996).
41. Stojanov, G., Stefanovski, S., Bozinovski, S.: “Expectancy Based Emergent
Environment Models for Autonomous Agents”, 5th International Symposium on
Automatic Control and Computer Science, Iasi, Romania (1995).
42. Tani, J.: “Model-Based Learning for Mobile Robot Navigation from Dynamical
System Perspective”, IEEE Transactions on Systems, Man, and Cybernetics 26(3)
(1996).
43. Turner, M.: “Conceptual Blending and Counterfactual Argument in the Social and
Behavioral Sciences”, in P. Tetlock and A. Belkin (eds.), Counterfactual Thought
Experiments in World Politics, Princeton University Press, Princeton (1996).
Embodiment and Interaction in Socially
Intelligent Life-Like Agents

Kerstin Dautenhahn

Department of Cybernetics
University of Reading, United Kingdom

Abstract. This chapter addresses embodied social interaction in life-like agents. Embodiment is discussed from both artificial intelligence and
psychology viewpoints. Different degrees of embodiment in biological,
virtual and robotic agents are discussed, given the example of a bottom-
up, behavior-oriented, dynamic control of virtual robots. A ‘dancing with
strangers’ experiment shows how the same principles can be applied to
physical robot-human interaction. We then discuss the issue of sociality
which differs in different academic communities with respect to which
roles are attributed to genes, memes, and the individual embodied agent.
We attempt to define social intelligence and integrate different viewpoints
in a hierarchy of social organization and control which could be applied
to both artificial and natural social systems. The project AURORA for
children with autism which addresses issues of both human and robotic
social agents is introduced. The conclusion points out challenges in re-
search on embodied socially intelligent life-like agents.

1 Introduction and Definitions

The discussions in this chapter on embodiment and sociality originate in the


author’s work on social agents, in particular autonomous mobile robots. This
work is based on the following working hypotheses:

1. Life and intelligence only develop inside a body,
2. which is adapted to the environment which the agent is living in.
3. Intelligence can only be studied with a complete system, embedded and cou-
pled to its environment.
4. Intelligence is linked to a social context. All intelligent agents are social
beings.

These hypotheses have been investigated by studying interactions between mobile robots and between humans and mobile robots ([9,24,74,28,10,11]). The
issue of robot-environment co-adaptation is addressed e.g. in [24], describing ex-
periments of a robot balancing on a seesaw. A specific environment, an artificial
ecosystem, namely a hilly landscape (first proposed by the author in [20]) has
been developed and studied in a number of experiments. A specific helping sce-
nario is described in [24]. Imitation as a cooperative behavior which enhances the


survival of a group of mobile robots is documented in [74]. An imitative ‘social bonding’ mechanism has been used for the study of grounding of communication
(robot-robot and robot-human) and is investigated in a number of publications
of Aude Billard and the author.
We hereby characterize social robotics as follows:

1. Agents are embodied.


2. Agents are individuals, part of a heterogeneous group (the members are not
identical but have individual features, like different sensors, different shapes
and mechanics, etc).
3. Agents can recognize and interact with each other and engage in social in-
teractions as a prerequisite to developing social relationships.
4. Agents have ‘histories’; they perceive and interpret the world in terms of
their own experiences.
5. Agents can explicitly communicate with each other. Communication is
grounded in imitation and interactions between agents, meaning is trans-
ferred between two agents by sharing the same context.
6. The individual agent contributes to the dynamics of the whole group (soci-
ety) as well as the society contributing to the individual.

Above we use the term ‘agent’ in order to account for different embodiments
of agents, and also allow the discussion of biological agents and software agents.
The issue of autonomy plays an important part in agent discussions. In [27]
the author defines autonomous agents as entities inhabiting a world, being able
to react and interact with the environment they are located in and with other
agents of the same and different kind (a variation of Franklin and Graesser’s
definition ([36])).
This chapter is divided as follows: section 2 discusses the general issue of
knowledge and memory in human society (section 2.1), and the specific issue of
autobiographic agents (section 2.2). Section 3 discusses embodiment in physical
(robotic) agents (section 3.1) and virtual agents (section 3.2). The latter section
shows a concrete example of behavior-oriented control which the author has
used in her work. The same programming approach, applied to an experiment
on robot-human interaction is presented in section 3.3. Section 4 discusses the
issue of social agents in more detail, relating it to sociobiology and evolution-
ary considerations on the origin of social behavior (section 4.1). Social software
agents are discussed in section 4.2. Such issues lead to an attempt to define
(artificial) social intelligence from the perspective of an individual (section 4.3),
as well as from the perspective of social organization and control (section 4.4).
Section 5 discusses a research project which studies how an interactive robot can
be used as a remedial tool for children with autism. In section 6 we come back to
the starting point of our investigations, namely how embodiment and meaning
apply to agent research.

2 Histories and Autobiographic Agents

2.1 Knowledge and Memory

Primate societies can be said to exhibit the most complex social relationships
which can be found in the animal world. The social position of an individual
within a primate society is neither innate nor strictly limited to a critical im-
printing period. Especially in human 20th-century societies social structures are
in an ongoing process of re-structuring. In a way one could say that the tendency
of making our non-social environment more predictable and reliable by means
of technological and cultural re-structuring and control has been accompanied
by the tendency that our social life is becoming more and more complex and
unpredictable, often due to the same technologies (e.g. electric power helps to
keep us warm and safe during winter while at the same time means of social
inter-networking could give rise to sociological and psychological changes of our
conception of personality and social relationships [88]).
Such degrees of complexity of social behavior of single humans as well as the
complexity of societies which emerge from interactions of groups of individuals
depend on having a good memory. Both a memory as part of the individual,
as well as a shared or ‘cultural memory’ for societies. Traditionally such is-
sues have not been considered in Artificial Intelligence (AI) or Artificial Life
(Alife) research. In the former the issue of discussion was less about memory
and more about knowledge. Memory (‘the hardware part’) was mostly regarded as less of a problem than knowledge (the ‘software part’: representations, algorithms).
The idea to extract knowledge from human experts and make it operational in
computer programs led to the development of professions like knowledge engi-
neer and products like (expert- or) knowledge-based systems. The knowledge
debate can best be exemplified by the Cyc-endeavour ([52]) which for more than
one decade has been trying to ‘computationalize’ common-sense knowledge. The
idea here is not to extract knowledge from single human beings but to trans-
fer encyclopedic (cultural) knowledge to a computer. In the recently emerging
internet-age the knowledge-debate has regained attention through technological
developments trying to cope with ‘community knowledge’.
In Alife research the distinction between the hardware and software levels is less clearly drawn. Evolutionary mechanisms are investigated both on the hardware and on the software side (see evolutionary robotics [41] and evolvable hardware [55]). These conceptions are closer to biology, where the ‘computational units’, e.g. neurons, are living, dynamic systems themselves, so that the
distinction between hardware and software is not useful. In the case of evolv-
ing software-agents the distinction becomes less clear. Nevertheless the question
when and whether to call software agents ‘life-like’ (if not to say ‘living’) is still
open.
A main research issue in Alife concerns the question how ‘intelligence’ and
‘cognition’ in artifacts can be defined and achieved. The question of how best
to approach cognitive or ‘intelligent’ behavior is still open. Here we find a broad
area of intersection between AI and Alife. The main difference in the ‘artificial life

roots of artificial intelligence’ ([80]) is the bottom-up approach, namely to ground


cognition in evolutionarily ‘older’ levels.1 A second main difference which is em-
phasized by that part of the Alife community which is working with hardware
systems (robots) is the concept of ‘embodiment’ (see section 3). In [13] Rodney
Brooks strongly argues against traditional AI techniques towards intelligence
and especially against the philosophy of ‘representation’. The behavior-oriented
robotics research area which has been mainly founded upon the conceptions de-
veloped in Rodney Brooks’ paper has therefore focused on reactive-behavior,
without an explicit memory functionality. As an alternative to the knowledge-
oriented AI systems, (reactive-) behavior-oriented Alife systems have been devel-
oped on the path towards the construction of intelligent systems. But in the same
way as AI knowledge-based systems could only perform well in a limited domain
or context (without ever becoming flexible, robust, general-purpose, i.e. human-
like, intelligent systems), current Alife systems have not yet crossed the border
towards autonomously surviving (life-like) creatures. From the current point of
view, Alife robots can do things AI robots could not, and vice versa.
No matter whether the relationship between AI and Alife results in competition or synergy, from all we have discussed so far we think that the aspect of memory, which is intensively discussed in hundreds of publications in cognitive science and psychology, deserves to be revisited in order to overcome the current behaviorist level (see [87]) in Alife robotics research.
Traditional computationalist approaches in computer science to memory are
strongly influenced by the data-base metaphor (using the storage-and-retrieval
concept). Even in cognitive science and those parts of artificial intelligence which
are aiming at modelling human cognition, the idea of a memory ‘module’ which
contains representations of concepts, words, etc. has been most influential and
has led to intensive work on the best way of encoding and manipulating these
(propositional or procedural) representations. The idea for memory that there
is some ‘entity’ (concept or pattern of neural activity) which has (within a cer-
tain range of precision) to be reproduced in the same ‘state’ as it was when it
has been stored is characteristic for these computational approaches to mem-
ory. Recent discussions in cognitive and neuropsychology outline potential alter-
natives, proposing dynamic, constructive and self-referential remembering pro-
cesses. Rosenfield ([72]) presented an approach to memory on the basis of clinical
case studies. Rosenfield’s main statements which are relevant for this paper are:
(1) There is no memory but the process of remembering. (2) Memories do not
consist of static items which are stored and retrieved but they result out of a
construction process. (3) The body is the point of reference for all remembering
events. (4) Body, time and the concept of ‘self’ are strongly interrelated. A similar
interpretation of human memory had already been published six decades earlier
by Bartlett ([4]) who favored using the term remembering instead of memory
(see [22] for further discussions on a dynamic memory approach.)
1 We use the term ‘older’ instead of ‘lower’ since the latter would imply ‘easier’, which they are definitely not. Especially these system levels, like robust navigation, ‘surviving’, etc., are often the harder engineering problems.

2.2 Autobiographic Agents

A dynamic account of human memory suggests that humans seem to integrate


and interpret new experiences on the basis of previous ones, e.g. see [4]. Previous
experiences are reconstructed with the actual body and concrete context as the
point of reference. In this way past and present are closely coupled. Humans
give explanations for their behavior on the basis of a story, a dynamically up-
dated and rewritten script, their autobiography. Believability of this story (to
both oneself and others) seems to be more crucial than ‘consistency’ or ‘cor-
rectness’. In order to account for this autobiographic aspect of the individual
I defined the concept of an autobiographic agent as an embodied agent which
dynamically reconstructs its individual ‘history’ (autobiography) during its life-
time [22]. Humans interpret interactions with reference to their ‘history’ and
bodily grounding in the world. A framework of a ‘historical’ account of Alife
systems has been developed together with Chrystopher Nehaniv, see e.g. [29,64].
The behavior and appearance of any biological agent can only be understood
with reference to its history. The skeletal elements of a bat’s wing, a dolphin’s
flipper, a cat’s leg and a human’s arm are homologous according to the basic
body plan of all mammals. Thus, discovering the evolutionary history furthers
understanding of the morphology and behavior of extant species. Part of the
history sometimes becomes visible in the ontogeny of an individual, e.g. the gill
pouches and the postanal tail of a 4-week-old human embryo are characteris-
tics of all vertebrate embryos. Thus, history comprises the evolutionary aspect
(phylogeny) as well as the developmental aspect (ontogeny) and the individual’s
experiences during its lifetime (see [43]). Applying the historical view to social
behavior means that an agent can only be understood when interpreted in its
context, considering past, present and future situations. This is particularly im-
portant for life-long learning human agents who are continuously learning about
themselves and their environment and are able to modify their goals and mo-
tivations. Using the notion of ‘story’ we might say that humans are constantly
telling and re-telling stories about themselves and others (see [95]). Humans are
autobiographic agents.
I suggested in [25] that social understanding depends on processes inside
an embodied system, namely based on empathy as an experiential, bodily phe-
nomenon of internal dynamics, and on a second process, the biographic re-
construction which enables the empathizing agent to relate a concrete commu-
nication situation to a complex biographical ‘story’ which helps it to interpret
and understand social interactions. Agents can be made more believable when
put into an ‘historical’ (story) context. But historical grounding of agents can
make them not only appear life-like, it can be a step towards embodied, social
understanding in artifacts themselves. Imagine:

Once upon a time, in the not so far future, robots and humans enjoy
spending their tea breaks together, sitting on the grass outside the office,
gossiping about the latest generation of intelligent coffee machines which
nobody cares for, debating on whether ‘losing one’s head’ is a suitable

judgement on a robot which fell in love with another robot not of his
own kind, and telling each other stories about their lives and living in a
multi-species society.

Bodily interaction with the real world is the easiest way to learn about the
world, because it directly provides meaning, context, the ‘right’ perspective, and
sensory feedback. Moreover, it gives information about the believability of the
world and the position of the agent within the world. The next section discusses
issues of embodiment and meaning in different environments.

3 Studying Embodiment and Meaning


3.1 Embodiment in Physical Robots: Social Robotics
Since the advantage of cooperative behavior in animals is quite obvious, much re-
search has already been invested within the Alife and behavior-oriented robotics
community in the study of robot group behavior. In some cases there has been a
fruitful symbiosis between biologists and engineers. We would like to give a few
examples.
For a few years activities have been under way to model multi-robot behav-
ior in terms of social-insect sociology. Some results in this area are presented
in [32,86,50,61]. Social-insect societies have long been studied by biologists so
that much data is available on their social organization. Moreover, they serve
well as good models for robot group behavior since, e.g. they show efficient
strategies of division of labour and collective behavior on the basis of local
communication and interaction between relatively simple and interchangeable
(‘robot-like’) units. Recent results on the organization of work in social insect
colonies are described in [40]. Especially in cases where large groups of robots
should be designed and controlled efficiently in order to build up and maintain
complex global structures, the biological metaphor of social-insect anonymous
societies (see section 4.4) seems to be promising.
Many studies into robot group behavior are done within the field of behavior-
oriented robotics and artificial life, focusing on how complex patterns of ‘social
behavior’ can emerge from local interaction rules in a group of homogeneous
robots. Such work is interesting in applications where robust collaborative be-
havior is required and where specific skills or ‘intelligence’ of single robust is not
required (e.g. floor-cleaning robots). The term ‘collective behavior’ is used for
such a distributed form of intelligence, social insect societies (e.g. ants, bees, ter-
mites) have been used as biological models. Deneubourg and his colleagues ([32]
give an impressive example where a group of robots ant-like robots collectively
‘solves’ a sorting task. Their work is based on a model of how ants behave, using
the principle of ‘stigmergy’ which is defined as “The production of a certain
behavior in agents as a consequence of the effects produced in the local environ-
ment by previous behavior”([6]). Mataric ([57]) gives an overview on designing
collective, autonomous (robotic) agents. Principles of collective behavior are usu-
ally applied to a group of homogeneous robots which do not recognize or treat

each other individually, i.e. they do not use any representations of other agents
or explicit communication. In contrast, the term ‘cooperation’ describes a form
of interaction which usually uses some form of more advanced communication.
“Specifically, any cooperative behaviors that require negotiation between agents
depend on directed communication in order to assign particular tasks” [57]. Dif-
ferent ‘roles’ between agents are for instance studied in [48], a flocking behavior
where one robot is the leader, but the role of the ‘leader’ is only temporarily assigned and depends on local information only. Moreover there is only one fairly
simple ‘task’ (staying together) which does not change.
Behavior-based research on the principle of stigmergy does not use explicit representations of goals; the dynamics of group behavior are emergent and self-organizing. The results of such behavior can be astonishing (e.g. see the building activities or feeding behavior of social insects), but they are different from the highly complex forms of social organization and cooperation which we find e.g. in mammal
societies (see hunting behavior of wolves or organization of human society), em-
ploying division of labour, individual ‘roles’ and tasks allocated to specific indi-
viduals, and as such based on hierarchical organization. Hierarchies in mammal
societies can be either fairly rigid or flexible, adapted to specific needs and chang-
ing environmental conditions. The basis of an individualized society is particular
relationships and explicit communication between individuals.
Another example of fruitful scientific collaboration between biological and
engineering disciplines is the ecological approach towards the study of self-
sufficiency and cooperation between a few robotic agents which has been inten-
sively studied by David McFarland and Luc Steels. The theoretical background
and experimental results are described in [60,81,83]. The biological framework
is based on concepts and mechanisms within a sociobiological background and
rooted in economics and game theoretical evolutionary dynamics. Thus, central
concepts in the design of the ecosystem, the robots, and the control programs
which implement the behavior of the robotic agents are self-sufficiency and util-
ity (see [59] for a comprehensive treatment of this framework). A self-sufficient
robot must maintain itself in a viable state for longer periods of time, so that it
must be able to keep track of its energy consumption and recharge itself. This
can be seen as the basic ‘selfish’ need of a robot agent in order to guarantee
its ‘survival’. In the scenario developed by McFarland and Steels this level is
connected to cooperative behavior in the sense that viability can only be en-
sured by cooperation (note that here the term cooperation is used by Steels and
McFarland although the robots do not explicitly communicate with each other).
A second robot in the ecosystem is necessary since parasites (lights) are taking
energy from the ecosystem (including the charging station), but the parasites
can temporarily be switched off by a robot bumping into them. The ecosystem
itself was set up so that a single robot alone (turn-taking between switching off
the parasites and recharging) could not survive.
It is interesting to note that McFarland very easily transferred and applied
sociobiological concepts to robot behavior. The development of robot designs
(the artificial evolution) is in these terms also interpreted in terms of marketing

strategies. This is also interesting insofar as a conceptual framework which has


been developed in order to describe the behavior of natural agents at a systems
level has, by using the robotic approach, been fed back to the component level as
guidelines for the synthesis of such systems, namely as specifications for computer
programs which control the robots.
An overview on approaches towards synthesizing and analyzing collective au-
tonomous agents is systematically given by Maja J. Mataric ([57]). She discusses
biologically inspired Alife approaches as well as engineering approaches from the
Distributed Artificial Intelligence domain. The distributed problem solving sub-
area deals mainly with centrally designed systems, global problems and built-in
cooperation strategies. The other subarea, multi-agent systems, comprises heterogeneous systems, is oriented towards locally designed agents, and deals with utility-maximizing strategies of co-existence. [77] gives an example of off-line
design of social laws for homogeneous multi-agent societies. Mataric’s own work
is more biologically motivated. She uses e.g. a basic behavior approach and
reinforcement learning in order to study robot group behavior ([56]).

Teacher-Learner Social Robotics Experiments. Grounding of communication and meaning in ‘social robots’ has recently attracted much attention. This
subsection discusses research which studies the grounding of communication in
robotic agents in a particular teacher-learner set-up developed by Aude Bil-
lard, [8], in joint work with the author. The learner uses the teacher as a model,
i.e. learning to communicate means in this case that the learner tries to achieve a
similar ‘interpretation’ of the environment as the teacher has, on the basis of the
learner’s own sensory-motor interactions. A simple imitative strategy (following
and keeping-contact, as the author proposed in [21]) is used as the social bond-
ing mechanism, and a vocabulary is learnt by associative learning. Along these
lines a number of experiments have been performed both in simulation and with
real physical agents, with different learning tasks and different agents, including
teaching between a human and a robot. The experiments are described in detail
in [9,12], and [10]. Learning to communicate occurs as part of a general neural
network architecture, DRAMA, developed by Aude Billard, [8,11].
A particular experiment ([9]) studied the usefulness of communication using
a teacher-learner situation in a ‘meaningful’ (hilly) environment, an environment
proposed ([20], [21]) as a scenario for social learning. In this experiment ([9]) a
specific scenario (‘mother-child’) is studied as an example for a situation in which
the ability to communicate is advantageous for an individual robot. The labels
‘mother’ and ‘child’ assigned by the experimenters were used in a metaphorical
sense since the learner and teacher robot had (from an observer point of view)
particular ‘social roles’: first the learner learns to associate certain ‘words’ that
the teacher ‘utters’ with the environmental context (e.g. the values of its incli-
nation sensors). In the next step the learner can use this information in order to
find the teacher when the teacher emits the appropriate ‘names’ of its current
location. The experiment uses a hilly landscape scenario (see section 1), and the

Fig. 1. The learner robot. It has to learn the teacher’s interpretations of ‘words’
on the basis of its own sensory inputs. Learning means here creating associations.

learner robot learns to associate names for ‘hill’ and ‘plane’ (see figures 1, 2, 3)
which are distinct features in its environment.
The behavioral architecture implements concepts of equilibrium and energy
potential in order to balance the internal dynamics of processes linked to in-
stinctive tendencies and individual learning. Results obtained were successful in
terms of the learning capacities, but they point out the limitation of using the
imitative following strategy as a means of learning. Unsuccessful or misleading
learning occurs due to the embodied nature of the agents (spatial displacement)
and the temporal delay in imitative behavior. These findings gave rise to a series
of further experiments which analyzed these limitations quantitatively and de-
termined bounds on environmental and learning parameters for successful learn-
ing [10], e.g. the impact of the parameter specifying the duration of short-term
memory which is correlated to the particular spatial distance (constraints due
to the embodiment) of the two agents.
One of the basic conclusions from these experiments was that general bounds
on parameters controlling social learning in the teacher-learner set-up can be
specified, but that the exact quantitative values of these parameters have to be
adjusted in the concrete experiments, e.g. adapted to the kind of robots, en-
vironment, and interactions which the experiments consist of. What does this
imply for the general context of (social) learning experiments of mobile robots? A
careful suggestion, based on the results so far, is that the fine-tuning of parame-
ters in experiments with embodied physical agents is not an undesired effect, and
that it is not only a matter of time until it can be overcome by a next and better

Fig. 2. The teacher (left) and the learner (right) robot in the initial position.
The robots are not identical, they have different shapes, plus sensori-motor char-
acteristics. We assume that the teacher robot ‘knows’ how to interpret the world,
i.e. it is emitting 2 different signals (bitstrings) by radio link communication for
moving on a plane and moving on a hill.

generation of a generic learning architecture. Rather, this could be an expression


of the intrinsic individual nature of embodied agents. Embodied agents are never
exactly the same, with respect to both morphology and behavior. This applies to
biological agents as well as robots, and ultimately goes back to the organization
of physical matter. Thus, the quest for a universal learning mechanism might
be misguided, embodied agents have to be designed carefully, following specific
guidelines and using qualitative knowledge on control and adaptation (compare
the ‘logic of life’ discussion in [26]). As long as robots cannot truly be evolved
(compare the evolution of virtual creatures by Karl Sims, [78]), robot evolution
has to be done by hand, in a process of synthesis. However, scientific investiga-
tions can yield guidelines to be discovered during the process of creation. Future
evolutionary steps, i.e. in a succession of robot-environment prototypes, can then
build on these results.
What about the degree of embodiment of the robots used in the experiments
described above? The robots were situated, since they completely depend on
on-line, real world sensor data which were used directly in a behavior-oriented
control architecture. The robots did not utilize any world model. The robots
were embedded, since robot and environment (social and non-social) were con-
sidered as one system, e.g. design and dynamic behavior had to be carefully
co-adapted. However, in comparison to natural living systems the robots have

Fig. 3. ‘Mother’ and ‘child’ on top of the hill.

a ‘weak’ status of embodiment. E.g. the body of the robot is static, the posi-
tion and characteristics of the sensors and actuators are modified and adapted
to the environment by hand, not by genuine development (compare with re-
cent studies on the evolution of robot morphology, e.g. [54]). The body (the
robot’s mechanical and electronical parts) is not ‘living’, and its state does not
depend on the internal dynamics of the control program. If the robot’s energy
supply is interrupted (the robot ‘dies’), the robot’s body still remains in the
same state. This is a fundamental difference to living systems. If the dynamics
(chemical-physiological processes) inside a cell stop, then the system dies, it loses
its structure, dissipates, in addition to being used by saprobes, and cannot be
reconstructed (revived), see [26].

3.2 Embodiment in Virtual Agents

This section illustrates the design of virtual robots in virtual worlds and discusses
the role of embodiment in virtual agents. To be concrete, the discussion is based
on the virtual laboratory INSIGHT developed by Simone Strippgen ([84,85]).
This environment uses a hilly landscape scenario with virtual robots which has
also been studied in robotic experiments ([74,21]). The environment may consist
of charging stations, areas with sand, water and trees, and other agents. IN-
SIGHT is a laboratory for experiments in an artificial ecosystem where different
environments, robots and behaviors can be designed. Visualization tools, and a

methodology for designing control programs facilitate experimentation and analysis. In order to survive the agents have to cope with the ecological constraints
(e.g. hills, energy-consuming surfaces like sand). The agents may have various
distance and proximity sensors (e.g. bumpers). Labels like ‘sand’ and ‘energy’
(attributed by the experimenter) are used analogously to their function in exper-
iments with real robots. For example, energy for INSIGHT agents is simulated:
when they run out of energy then they stop because such a behavior is specified
in the virtual environment.

Fig. 4. Experiments in INSIGHT. a) Environment with sand, water, trees, charg-
ing station and one agent. The two sensor cones for finding the charging station
are indicated (dashed lines). It shows that these sensors cover a relatively large
area of the environment. The light sensors (necessary to detect other agents) have
the same size. b) design of an agent: The head indicates the back-front axis. It
has a ring of 8 bumpers (quantity Bumper1,2,3,4,5,6,7,8) which are surrounding
the surface of the agent’s body, 2 sensors measuring distance to the charging
station (ChargingStation, CS1 and CS2), 3 sensors each for detecting sand and
water (Water1,2,3; Sand1,2,3), 2 inclination sensors for the forward-backward
and left-right orientation of the body axis (InclinationFB, InclinationLR), and 2
sensors sensitive to green light (SignalGreenLight1,2). Each agent has a green
‘light’ on top. c) an agent approaching a charging station.

Control programs in INSIGHT follow the so-called ‘dynamical systems approach’, which was developed by Luc Steels at the VUB AI Lab in Brussels [82].
Programs consist of a set of simple processes and a set of quantities: sensor
quantities, actuator quantities and internal quantities. Processes specify the ad-
ditive changes of quantities. In each iteration cycle the processes are executed
in parallel and the quantities updated synchronously.
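The PDL code itself is not reproduced in this chapter, but the process/quantity scheme just described can be illustrated with a small sketch; all class and function names below are our own, and the example processes are merely placeholders in the spirit of the processes listed in Tables 1 and 2 below.

# Illustrative sketch of a PDL-style control loop (names are ours, not PDL's API).
class PDLProgram:
    def __init__(self, quantities):
        self.q = dict(quantities)        # sensor, actuator and internal quantities
        self.processes = []              # list of (target_quantity, argument_function)

    def add_process(self, target, argument):
        self.processes.append((target, argument))

    def step(self, sensor_readings):
        self.q.update(sensor_readings)   # copy the current sensor quantities in
        deltas = {}
        for target, argument in self.processes:
            # all processes see the same (old) quantity values in one cycle
            deltas[target] = deltas.get(target, 0.0) + argument(self.q)
        for target, delta in deltas.items():
            self.q[target] += delta      # additive changes, applied synchronously

# Two placeholder processes, analogous in spirit to Tables 1 and 2:
program = PDLProgram({"Translate": 0.0, "Rotate": 0.0})
program.add_process("Translate", lambda q: (-q["Translate"] + 500.0) / 5.0)          # ReduceTranslation
program.add_process("Translate", lambda q: -q["Translate"] * q.get("Bumper3", 0.0))  # FrontCollision
program.add_process("Rotate",    lambda q: -q["Rotate"] / 5.0)                       # ReduceRotation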
A PDL program for an example agent exploring the environment and recharg-
ing can be described by two addvalue statements, one for specifying dynamic
changes to the Translation quantity, the other for modifying the Rotation quan-
tity. Tables 1 and 2 show these two processes which make up the control program. This gives an example of a bottom-up, behavior-oriented control program


for an autonomous agent which is exploring and surviving in its environment.
The overall behavior of the agent is the result of its shape, properties of its actu-
ators, internal state and sensor readings at a particular moment in time without
any hierarchical control architecture or internal model of the world. The be-
havior of the robot, given its control program, cannot be predicted reliably; the
only way to find out is to place the robot with its individual embodiment in
its environment and let it run. Thus, the behavior results from non-linear local
interactions between components of the robot-environment system (including
parts of the robot’s body, control program and environment).

An auxiliary quantity ‘Contact’ is used for process ‘StopCS’ which slows


down the translation of the agent when it is close to the charging station.
This should only happen when the agent is not engaged in obstacle avoidance
(value(Contact) == 0) behavior. The quantity ‘Contact’ represents the num-
ber of bumpers which are pushed in each iteration cycle. If the agent is located
right in the middle of the charging station (so that both charging station sen-
sor variable values equal zero) then the translation quantity is reduced to zero.
According to the PDL philosophy we only used addition, subtraction, multipli-
cation and division operations in the processes. In this way the arguments of
the addvalue statements had to be computationally simple, e.g. and or or rela-
tions had to be reduced to multiplications, etc. The programs were designed so
that the agents could survive in their habitat for a period of time, i.e. that the
agents could move around the landscape, find and enter charging stations, avoid
obstacles, avoid water and sand, react to other agents and hills.

Table 1: Quantity Translation

   Process               Argument
a  ReduceTranslation     ((-value(Translate) + 500.0)/5.0)
h1 LeftCollision         (-value(Translate) * value(Bumper1))
h2 LeftFrontCollision    (-value(Translate) * value(Bumper2))
h3 FrontCollision        (-value(Translate) * value(Bumper3))
h4 RightFrontCollision   (-value(Translate) * value(Bumper4))
h5 RightCollision        (-value(Translate) * value(Bumper5))
h6 RightBackCollision    (-value(Translate) * value(Bumper6))
h7 BackCollision         (-value(Translate) * value(Bumper7))
h8 LeftBackCollision     (-value(Translate) * value(Bumper8))
i  AvoidWater            ((value(Water1) + value(Water2) + value(Water3)) *
                         (-value(Translate))/5.0)
j  AvoidSand             ((value(Sand1) + value(Sand2) + value(Sand3)) *
                         (-value(Translate))/10.0)
k  StopCS                ((1.0 - value(Contact)) *
                         ((1.0 - value(CS1)) * (1.0 - value(CS1)) *
                         (1.0 - value(CS1)) * (1.0 - value(CS1)) *
                         (1.0 - value(CS1)) * (1.0 - value(CS1)) *
                         (-value(Translate)/2.0))) +
                         ((1.0 - value(Contact)) *
                         ((1.0 - value(CS2)) * (1.0 - value(CS2)) *
                         (1.0 - value(CS2)) * (1.0 - value(CS2)) *
                         (1.0 - value(CS2)) * (1.0 - value(CS2)) *
                         (-value(Translate)/2.0)))
m  NormalSpeedup         50.000

Table 2: Quantity Rotation

   Process               Argument
a  ReduceRotation        (-value(Rotate)/5.000)
b  FindC                 (5.000 * (value(SignalGreenLight1) - value(SignalGreenLight2)))
c  FindG                 (5.000 * (value(SignalBlueLight1) - value(SignalBlueLight2)))
d  AvoidC                (5.000 * (value(SignalGreenLight2) - value(SignalGreenLight1)))
e  AlignValleyLR         (-0.07 * value(InclinationLR))
f  AlignValleyFB         (0.16 * value(InclinationFB))
g  FindLS                (8.000 * (value(CS1) - value(CS2)))
h1 LeftCollision         (-12.0 * value(Bumper1))
h2 LeftFrontCollision    (-12.0 * value(Bumper2))
h3 FrontCollision        (-12.0 * value(Bumper3))
h4 RightFrontCollision   (-12.0 * value(Bumper4))
h5 RightCollision        (-12.0 * value(Bumper5))
h6 RightBackCollision    (-12.0 * value(Bumper6))
h7 BackCollision         (-12.0 * value(Bumper7))
h8 LeftBackCollision     (-12.0 * value(Bumper8))
i  AvoidWater            (15.000 * value(Water1) * (value(Water1) - value(Water2))) -
                         (15.000 * value(Water2) * (value(Water2) - value(Water1))) +
                         (25.000 * value(Water3) * (value(Water3) - value(Water1)) *
                         (value(Water3) - value(Water2)))
j  AvoidSand             (5.000 * value(Sand1) * (value(Sand1) - value(Sand2))) -
                         (5.000 * value(Sand2) * (value(Sand2) - value(Sand1))) +
                         (10.000 * value(Sand3) * (value(Sand3) - value(Sand1)) *
                         (value(Sand3) - value(Sand2)))

The environment INSIGHT has been described in order to give an example


of approaches to model the ‘embodiment’ of virtual agents in a virtual world.
To give another example, a commercially available robot simulator is Webots by
Cyberbotics (see http://www.cyberbotics.com/).
But can virtual, software or simulated agents be embodied? In section 1 we
consider embodiment a property of agents in social robotics research. Does this
mean that artificial agents which do not have a physical body cannot be embod-
ied? On a conceptual level there is no reason to restrict embodiment to the real
world, even if this is our ‘natural’ way of thinking. Recently, discussions have
started on what embodiment can mean to a software agent ([34], [51]), discussing
embodiment in terms of interactions at the agent-environment interface. Such
agent-environment couplings make sense for both software and robotic agents,
however it is not quite clear what embodiment can mean for simulated and soft-
ware agents and whether it is useful to apply the same criteria of embodiment to
physical and virtual/software agents. If virtual agents are simulations of physical
agents, e.g. the INSIGHT agents which can serve as simulations of real robots,
then realistic behavior has to be explicitly modelled. E.g. physical contact is
not provided by the simulation environment INSIGHT, it has to be modelled
explicitly. The INSIGHT agents do not ‘naturally’ possess a body boundary, so
without the specification of contact sensors around their body they could ‘cross’
through each other like ‘ghosts’. Thus, physical boundaries are realized in IN-
SIGHT by robot design and behavioral control instead of simulating physical
laws. This might appear ‘unnatural’ when the main purpose of a virtual world is
understood to simulate the physical world as close as possible, e.g. in order to use
the virtual world as a model for the real world. However, it allows alternative re-
alizations of embodiment (where embodiment is not ‘naturally given’ but has to

be defined and designed explicitly). Thus, virtual environments might provide


an interesting testbed for concepts and theories on embodiment and meaning
since they force us to be precise and explicit about concepts like ‘embodiment’
which are, in virtual environments, no longer ‘naturally’ given by the physics of
the world.

3.3 Dancing with Strangers - A Dynamical Systems Approach


Towards Robot-Human Interaction
This section outlines experiments which the author first implemented at the
VUB-AI Lab in Brussels and later re-implemented at the Humanoid Interaction
Laboratory, ETL, Japan.2 This work presents a dynamical systems approach towards
robot-human interaction, based on ideas previously developed and published
by the author in [25]. This section will outline the basic concepts behind this
approach, introducing the concept of temporal coordination as a ‘social feedback’
signal for reinforcement learning in robot-human interaction.

Experimental Set-Up. The experiments consist of one mobile robot (e.g. a VUB Lego robot, or a fischertechnik robot built by the author) and a human
with a stationary video camera pointing at her. The robot is controlled in a PDL-
like fashion as described in section 3.2. The camera image is processed on a PC,
movements are detected using a technique developed by Tony Belpaeme ([7]).
The basic idea is to calculate difference images between each pair of successive
image frames and then to calculate the centre of gravity for the difference image.
The difference image represents areas where changes of movement occurred. If
the environment in which the human moves is static then the difference image
is equivalent to areas where the human body moved. The centre of gravity then
shows the centre of the movement. This method for movement detection is com-
putationally simple, but only applies to a static camera and only if a distinct
area of main movements exists. If the human moves both arms simultaneously
then it is likely that the centre of gravity would be within the centre of the body.
Thus, the experiments required ‘controlled’ movements of parts of the body such
as hand movements or full body movements. For enhanced precision the experi-
ments report only on hand movements when the human is sitting in front of the
camera and moving her hand so that it covers a large area of the image.
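As an illustration of this movement-detection step (a rough approximation under our own assumptions, not Belpaeme's original implementation; the threshold value, in particular, is our own choice), the difference image and its centre of gravity could be computed as follows:

import numpy as np

def movement_centre(frame_prev, frame_curr, threshold=20):
    # Difference image between two successive greyscale frames: areas where
    # changes of movement occurred. Returns the centre of gravity (row, col)
    # of the changed pixels, or None if nothing moved.
    diff = np.abs(frame_curr.astype(int) - frame_prev.astype(int))
    moved = diff > threshold
    if not moved.any():
        return None
    rows, cols = np.nonzero(moved)
    return rows.mean(), cols.mean()

Tracking how this centre shifts from one difference image to the next yields the displacement that is classified, as described next, into the six movement categories.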
Changes in the centre of gravity between two successive difference images are
then used to classify the hand movements of the human into six categories: a)
moving horizontally from right to left or left to right, b) moving vertically up or
down the screen, c) moving the hand in circles either clockwise or anti-clockwise.
Information about the classification of the movements is sent to the robot via
radio-link.
The control program which runs on the mobile robot can run in two modes:
in the autonomous mode it repeatedly performs a sequence of movements (a
2 Thanks to Luc Steels, Tony Belpaeme, Luc Berthouze and Yasuo Kuniyoshi for
supporting the experiments.

movement repertoire) autonomously and depending on the feedback from the


human certain movements can be selected (see figure 7). The six possible inputs
(movements by the human) are mapped to four possible outputs (movements of
the robot): turning left, turning right, moving forward, moving backwards. In
the slave mode these mappings directly determine the robot’s movements.
Figure 5 shows the basic set-up of the experiments and the association matrix.
Figure 8 gives an example of the performance of the robot in slave-mode.
Due to programming according to the PDL philosophy (see section 3.2) move-
ment transitions do not occur abruptly but in each PDL iteration cycle the ac-
tivation of the motors is updated by addition or subtraction of small values. In
this way, if in slave mode the robot is turning left and the human intends to have
it turning right, the ‘correct’ input has to be given for a significant amount of
time, since the robot will first slow down, then stop and then reverse its direction
of movement until it finally is moving right.
As the author discusses in [25] the synchronization and coordination of move-
ments between humans and their environment seems to play a crucial role in the
development of children’s social skills. Hendriks-Jansen points out ([43], [44])
too, that getting the interaction dynamics right between infant and caretaker
seems to be a central step in the development of social skills. In [25] we dis-
cuss that in social understanding empathic resonance plays an important role,
a kind of synchronization in a psychological rather than movement-based sense.
The synchronization of bodies and minds, dancing, turn-taking (e.g. in a di-
alogue) and empathy, have in common that they require one to coordinate
one’s external and/or internal states with another agent, to become engaged in
the situation. The states need not be exactly the same, dancers in a group can
dance different movement patterns, but their states are temporally coordinated.
Moreover, dancing in a group is more than the sum of its parts, a dance is an
emergent pattern in which different individual dancers take part and synchronize
their movement with respect to each other and within the group as a whole.

Temporal Coordination of Movements. How can we study mobile robots


which become ‘engaged’ in a dialogue with a human? The set-up which this sec-
tion describes puts temporal coordination in the centre of the study, i.e. neither
attempted selection and matching of movements (like in attempted imitation,
see [65]), nor (socially) learning the correct action (see works on programming by
demonstration, [19], and imitation for software and robotic agents) is the focus
of attention, but studying the temporal relationship between the movements of
two agents. Temporal Coordination is represented as a weight associated to each
possible input/output pair in the association matrix (see figure 5). The weight
is activated if the two agents perform movements as indicated in the matrix
entries. The weight is increased if the weight was activated in two consecutive
timesteps. (Footnote 3: The association matrix and the updating of the weights is a
simple version of Hebbian learning in a neural network.) The weights in the
association matrix are used in the control program of the robot as numerical
factors which serve as 'motivation factors' for either the
movement repertoire ('global' option, only one motivation factor) or single move-
ments (‘select’ option, several motivation factors). The maximum value is 100
which means that the motor control commands are directly sent to the robot,
e.g. the commands to perform a sequence of movements. A motivation equal to
or below zero means that the robot will not move at all (global
option), or will not perform that particular movement (select option). Figure 6
shows the combinations of modes and options for running the experiments. In
the autonomous mode, movements whose associated values are equal to or less
than zero are skipped in the sequence of movements. In that situation this partic-
ular movement would therefore (from an observer point of view) disappear from
the robot's movement repertoire. To give a simple example, let us assume two
agents A and B which can show four and six different movements respectively, A1,
A2, ..., A4 and B1, B2, ..., B6. If during nine consecutive timesteps agent B shows
the sequence B1-B2-B3-B1-B2-B3-B1-B2-B3 while agent A shows A4-A4-A4-
A4-A4-A4-A4-A4-A4, then the temporal coordination between the movements
equals zero. B showing B4-B4-B4-B1-B1-B1-B1-B2-B2 results in an update of
the weights for A4/B4 (two updates), A4/B1 (three updates) and A4/B2 (one
update). Thus, it does not matter whether the movements of agent A and agent B
are the same; it only matters whether the current pairing (e.g. A1 and B4) is
maintained over consecutive timesteps. Note that the sequences A1-A2-A3 and
B2-B3-B4 are not temporally coordinated, although they might be considered as
mirror or imitated movements. This might appear counter-intuitive, but it results
from the segmentation of movements which is needed as input for the association
matrix. Inputs to the matrix represent movements during fractions of a second,
and are thus not 'behaviors' (extended over time, e.g. seconds) in the strict sense.
Parameters which control the generation of the input data for the association
matrix are therefore important features of the set-up. They were manually adapted to
the movements of the human.
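To make the update rule concrete, the following Python sketch implements an association matrix of this kind. The data structures, variable names and the clipping range are assumptions for illustration, and the +1.5/-0.5 increments are taken from the experiments reported below; this is not the original control code.

```python
# Sketch of the association matrix and its Hebbian-style weight update.
# The +1.5/-0.5 increments follow the experiments described in the text; the
# clipping range [-100, 100] and all names are assumptions.

HUMAN = ["left", "right", "up", "down", "circle+", "circle-"]   # 6 possible inputs
ROBOT = ["turn_left", "turn_right", "forward", "backward"]      # 4 possible outputs

weights = {(h, r): 100.0 for h in HUMAN for r in ROBOT}         # initialised at maximum
previous_pair = None


def update(human_move, robot_move):
    """One iteration cycle: a pairing maintained over two consecutive timesteps
    counts as temporal coordination and is rewarded; everything else decays."""
    global previous_pair
    pair = (human_move, robot_move)
    coordinated = (pair == previous_pair)
    for key in weights:
        if coordinated and key == pair:
            weights[key] = min(100.0, weights[key] + 1.5)
        else:
            weights[key] = max(-100.0, weights[key] - 0.5)
    previous_pair = pair


def motivation(robot_move):
    """'Select' option: the motivation factor of a single robot movement."""
    return max(weights[(h, robot_move)] for h in HUMAN)


def executable(robot_move):
    """A movement whose motivation is zero or below is skipped ('select' option)."""
    return motivation(robot_move) > 0
```

Applied to the nine-timestep example above (a constant A4 against B4-B4-B4-B1-B1-B1-B1-B2-B2), such an update would detect temporal coordination in six of the nine cycles: twice for the pairing A4/B4, three times for A4/B1 and once for A4/B2; against the sequence B1-B2-B3-B1-B2-B3-B1-B2-B3 it would detect none.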

Results. Figure 7 gives an example of an experiment in the autonomous mode
of the system. The robot autonomously performs and repeats a sequence of
movements, e.g. rotation left (series 1 in the diagram), rotation right (series 2),
translation forwards (series 3), translation backwards (series 4). Each movement
has a weight in the association matrix (select option). Here we show an example
where the duration of the movements, which is initially equal for all four move-
ments, changes over time. The weights are initialized with 100 (maximum) and
decrease by 0.5 in each iteration cycle if no temporal coordination between the
human's and the robot's movements is detected by the robot. If a temporal co-
ordination is detected then the weight is increased by 1.5 in that iteration cycle.
a) Global option. This shows a reference experiment where a global weight con-
trols the activation of the robot's movement repertoire. In this case the human
responds to the robot's movements in a non-synchronized way, namely by making
movements without paying attention to the robot. Thus, only accidental short
periods of temporal coordination interrupt the constant decrease of the global
motivation. In this situation the robot shows the sequence of
autonomous movements with constantly decreasing 'motivation', i.e. it slows
down and finally stops.

Fig. 5. The basic experimental set-up and the association matrix. [Diagram: a camera classifies the human's hand movements (left, right, up, down, circle+, circle-) as sensory input i; an association matrix of weights wij maps them to the robot's motor outputs j (rotation left/right, translation forward/backward); the classification is sent via radio-link (antenna) to the Lego- or FT-Bot. The mappings shown are those used for the slave mode.]

b) Select option. In this experiment the human pays
attention to the robot and reacts with a temporally synchronized movement to
a particular movement, e.g. here the human reacts with circular movements in
clockwise direction every time the robot rotates in anti-clockwise direction. In
this way anti-clockwise movement of the robot is reinforced, and the weights
for the execution of the other movements decrease. After 92 iteration cycles the
robot performs anti-clockwise rotations more frequently than any other move-
ment. The arrow indicates the continuation of the experiment, showing the time
window with iteration cycles 425-435, when the weight for anti-clockwise rota-
tion is still at its maximum value while all other weights have dropped below
zero.

Fig. 6. Modes and options used in the 'dancing with strangers' experiments. [Table: the two modes (Autonomous, Slave) are each combined with the two options (Global, Select).]

As a result, the robot will, as long as the human reacts with temporally
coordinated movements, continuously rotate in an anti-clockwise direction. The
human's appropriate reaction need not necessarily be clockwise rotation: hori-
zontal movements to the left, or any other movements which are linked to the
robot's anti-clockwise rotation in the association matrix, have
the same effect.
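A small numerical check of these dynamics, using the parameter values above and nothing else from the original implementation: a weight that is never reinforced falls from the maximum of 100 to zero after 200 iteration cycles, so by the time window around cycles 425-435 every movement the human did not respond to has long been skipped.

```python
# Toy check of the decay dynamics under the stated (assumed) parameters.
w, cycles = 100.0, 0
while w > 0:          # no temporal coordination detected in any cycle
    w -= 0.5
    cycles += 1
print(cycles)         # 200 cycles until the weight reaches zero
```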
Figure 8 gives an example of an experiment in the slave mode of the system.
Series 1-4 represent motivation factors associated to particular movements of
the robot: 1-2 stand for rotation (1: anti-clockwise, 2: clockwise), 3-4 stand
for translational movements (3: moving forwards, 4: backwards). All weights in
the association matrix are initialized with 100 (maximum) and decrease by 0.5
in each iteration cycle if no temporal coordination between the human’s and
the robot’s movements is detected by the robot. If a temporal coordination is
detected then the weight is increased by 1.5 in each iteration cycle. Since vertical
hand movements are not used in this sequence the weights for translational
movements drop monotonically, and series 3 and 4 cannot be distinguished. Due
to reactions of the human a particular movement of the robot is selected, in this
case turning to the left. The human starts with hand movements to the right
and left, points a, b, c and d in figure 8 indicate her changes of direction. At
point e she switches to circular movements in anti-clockwise direction. During
the ‘training’ period the weights for other movement tendencies drop to zero
while the robot’s tendency for anti-clockwise rotation increases to the maximum
value. At point f the human stops circular movements and starts to move her
hand from left to right. The weight for anti-clockwise rotation drops slightly while
the weight for clockwise rotation slowly increases. However, since the weights
for movements other than anti-clockwise rotation are close to zero, the robot
does not exhibit any visible movement. Thus, the movement repertoire of the
robot has been trained towards anti-clockwise rotation. Strictly speaking this
only applies to movements (different from anti-clockwise rotation) with a short
duration. If the human changes her preferred movements from anti-clockwise
rotation to clockwise rotation then this leads to a retraining of the robot. Of
course the learning mechanism could be changed so that, once a pattern has
been trained, the robot tends to memorize this movement. In the experiments
reported here we did not implement any such memory functionality.

Fig. 7. Autonomous Mode. [Plots of weight against time steps: a) Global option, the single global weight over time steps 0-280; b) Select option, the weights of the four movements (Series 1-4), including the continuation around iteration cycles 425-435.] See text for explanation.

Discussion. What have these experiments shown? We studied the temporal
coordination between a human and a mobile robot whose movement repertoire
changed depending on the reactions or feedback of the human.

Fig. 8. Slave Mode. [Plot of the four motivation weights (Series 1-4) against time steps 0-500; points a-g mark the changes in the human's hand movements.] See text for explanation.

A very
simple association matrix was used for training purposes; however, it turned
out in demonstrations of this system (see footnote 4) that it was the human rather
than the robot which was the learner in these experiments. In the slave mode humans very
quickly realized that the robot's movements were correlated with their own move-
ments and that the robot could be operated like a passive puppet-on-a-string
toy. However, the 'puppet' was sensitive to how long humans interacted with it
and how 'attentive' they were (e.g. adapting the speed of their own movements
to the robot's speed; this was necessary, for instance, when trying to change the robot's
movement from turning left to turning right, see above). A cooperative human
paid attention to the robot's movement and kept it moving; 'neglect' made the
robot slow down and finally stop. The robot could also be operated (in the select
option) so that it finally performed only those movement(s) to which the human
gave the longest response and attention. The robot therefore adapted to the hu-
man and 'personalized', i.e. after a while it reacted only to the human's 'favorite'
movement. This also occurred in the autonomous mode; however, there the human
could only select from a given repertoire of movements, i.e. the human could
shape the robot’s autonomous behavior. A cooperative human learnt quickly to
give the appropriate feedback in order to keep the robot moving. Depending on
the human's preference the robot then (in the autonomous mode) ended up per-
forming only one or a few different movements. Thus, the behavior of the robot
finally was typical of the human who interacted with it.

(Footnote 4: For instance at a workshop co-organized with Luc Steels, 7-14 September
1996 in Cortona, Italy (Cortona Konferenz - Naturwissenschaft und die Ganzheit des
Lebens, "Innen und Aussen" - "Inside/Outside").)
Potentially this method can be used to adapt the behavior of a robot to a
human’s individual needs and preferences, in particular if the ‘movements’ which
we used become complex behaviors and can be shaped individually. This process
is done in a purely non-symbolic way, without any reasoning involved beyond
defining an association matrix and detecting temporal coordination. More
sophisticated learning architectures could be based on such a system, e.g. for
the study of imitation ([38,10]). This becomes particularly attractive if the robot
has more degrees of freedom than the simple system we used in these robot-human
interaction experiments. This is important in areas where humans have
long periods of interaction with a robot, e.g. in service robotics (e.g. [91]).
Another aspect of robot-human interaction concerns believability: as [35]
shows, a robot with life-like appearance and responses furthers the motivation of
a human to interact with the robot. The dynamics of the robot-human interac-
tions change both the states of the robot and the human, and that influences the
overall interaction and the way the human interprets the robot. The following
section analyses in more detail levels of interaction and how robot behavior is
interpreted by a human observer.

Temporal Coordination and Believable Interaction. Let us consider the
situation when a human enters a room where a robot is located. Hypothetical
behaviors of the robot (R), and plausible interpretations by the human observer
and interaction partner (H), can occur depending on the following levels of in-
teraction:
1. R: the robot is not moving at all. H: the robot can be any object, it is not
interesting.
2. R: the robot moves randomly or in a manner not correlated with the reactions
of the human. H: The robot is likely to be attributed autonomy, but the
human might feel indifferent or afraid of the robot. The human might do
some ‘tests’ in order to see if the robot reacts to her, e.g. repeating certain
movements, approaching the robot, etc. After a while the human might lose
interest since she can neither influence nor control the robot.
3. The human is able to influence the behavior of the robot without paying
attention to the robot. For example, the robot increases and decreases the
speed of its movements depending on the human’s activities. The robot’s
movement repertoire itself remains unchanged.
4. R: the robot’s movements are temporally coordinated to the human’s move-
ments. H: the human realizes that she can influence the robot when perform-
ing appropriate movements; she can modify or 'train' its behavior individ-
ually. The relationship builds up and needs ‘attention’, but is not a priori
given. The robot is more likely to be accepted as an interaction partner.
5. See previous item with the following increase in interaction complexity: The
human is now able to shape the robot’s behavior, e.g. by means of machine
learning techniques.
In the author's view synchronization of movements can contribute to life-like
behavior just as appearance can. However, the predominant approach in robot-
human interaction has so far been to analyse the human's behavior into a symbolic
description which is then used to control the robot's behavior. Generally, body
movements are processed by computationally expensive vision routines which
extract information on position or gestures, rather than exploiting the dynamic
nature of the movements themselves. Temporal coordination, however, might
be a means to link the human's and the robot's dynamics in a way which appears
'natural' to humans.
The 'dancing' experiments described in this section were strongly inspired
by Simon Penny's PETIT MAL, an interesting example of a non-humanoid but
socially successful mobile robot [68]. In the terminology introduced above PE-
TIT MAL facilitates human-robot interactions of level 3. A double pendulum
structure gives the robot an 'interesting' (very smooth behavior transitions) and
at the same time unpredictable movement repertoire; pyro-electric and ultra-
sonic sensors enable the robot to react to humans by approaching or avoiding
them. The system has been running at numerous exhibitions and attracted much
attention despite its technological simplicity. The robot is a purely reactive
system without any learning or memory functionality; the complexity lies in
the balanced design of the system, and not in its hardware and software com-
ponents. Robot-human interactions with PETIT MAL generate interesting dy-
namics which cannot be explained or predicted from the behavior of the human
or the robot alone. This implementation at the intersection of interactive art
and robotics demonstrates the power of dynamics in human-robot social inter-
actions. Combining the learning and movement training techniques which the author
investigates with interesting designs like PETIT MAL suggests a direction for
building socially competent robots. This could complement research directions
which emphasize the complexity of the robot control architecture (e.g. [49]).

4 Social Matters

The term 'social' seems to have become a fashionable word in recent years.
It is often used in different communities when describing work on models, the-
ories or implementations which comprise interactions between at least two au-
tonomous systems. The word 'social' is used intensively in research on multi-
agent systems (MAS), distributed artificial intelligence (DAI), Alife, and robotics.
It has been used for a considerably longer time in research areas primarily dealing
with natural systems such as psychology, sociology, and biology. It would go beyond
the scope of this paper to discuss at length the historical and current use of the term social
in all these different research areas. Instead, we exemplify its use by discussing
distinct approaches to sociality. Particular emphasis is given to the role of the
individual in social modelling. We discuss issues which seem to be important
characteristics of this individual dimension. In order to account for the individ-
ual in social modelling we relate this to the concept of autobiographic agents
which keep up their individual 'history' (autobiography) during their life-time
(see section 2.2).
We propose as a first level beyond the individual’s self interest the social
control dynamics within a small group of individualized agents with emotional
bonding between its members. In socially integrated agents on this level com-
plex processes take place when genetic and memetic selfish interests emerging at
different levels of control structure mutually interact within the autobiographic
agent who does, by definition, try to construct and integrate all experiences on
the basis of his own embodied ‘history’. In our view these complex, dynamic in-
teractions within an embodied, autobiographic, socially integrated agent can ac-
count for the individuality, complexity and variability of human behavior which
cannot sufficiently be described by the selfishness of genes and memes only.

4.1 Natural Social Agents: Genes, Memes and the Role of the
Individual
Sociobiology can be defined as the science of investigating the factors of biological
adaptation of animal and human social behavior (according to [89], p. 1). In
his most influential book Sociobiology Edward O. Wilson argues for using the
term ‘social’ in an explicitly broad sense, “in order to prevent the exclusion
of many interesting phenomena” ([93]). One concept is basic to sociobiology:
gene selection, namely viewing genes and not the individual as a whole or the
species as the basic selectionist units. An important term in the sociobiological
vocabulary is selfishness, which means that genes or individuals behave only in
a way which tends to increase their own fitness. The principle of gene selection
is opposed to how 'classical' ethology views the evolution of species, with the
individual as the basic unit of selection. According to [94] the new paradigm of
sociobiology is that it takes Darwin's theory of evolution by natural selection and
transfers it to the level of genes.
Richard Dawkins's selfish-gene approach has influenced, across disciplines, the
this system ([30,31]).
“There is a river out of Eden, and it flows through time, not space. It is
a river of DNA - a river of information, not a river of bones and tissues:
a river of abstract instructions for building bodies, not a river of solid
bodies themselves. The information passes through bodies and affects
them, but it is not affected by them on its way through.” ([31])
Dawkins's definitions of an evolution based on information transfer and of
replicators (self-reproducing systems) as the unit of evolution have become very
attractive for computer scientists and the Artificial Life research direction, since
they seem to open up a path towards synthesizing life (or life-like qualities) without
the need and burden of rebuilding a body in all the phenomenological complexity
that natural bodies have. In Dawkins's philosophy the body is merely an expression of
selfish genes in order to produce more selfish genes. In order to explain the evo-
lution of human culture Dawkins introduced the concept of memes, representing
ideas, cognitive or behavioral patterns which are transmitted between individuals
by learning and imitation. These memes should follow the same selfish Darwinian
principles as genes. Human behavior and thinking, in this philosophy, are driven
and explainable by the selfishness of genes and memes.
Based on the sociobiological concept of selfishness many attempts have been
made to explain ‘altruism’ and cooperative behavior which obviously do exist in
human and other animal societies and seem to contradict the selfishness of genes.
Francis Heylighen reviews in [45] the most prominent models for the explanation
of altruism and cooperation, namely kin selection, group selection and reciprocal
altruism.
Kin selection, as the least controversial model, is based on inclusive fitness
and strictly follows the selfish gene dogma. Since an individual shares its genes
with its kin or offspring, this principle would lead to cooperation and altruism
which at best further the transmission of copies of one's genes to the next gener-
ation. The social organization of so-called social insects can be well explained by
this. In these cases of ‘ultrasociality’, e.g. when sisters are more closely related
to each other than they would be to possible offspring of their own, altruism
increases the inclusive fitness. Genetic and physiological mechanisms serve as
control structures, e.g. inhibiting the fertility of workers. Such social organiza-
tions and control structures can be found in insect and mammal species, namely
bees, ants, wasps, termites and African naked mole-rats ([76]).
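A standard formalization of this inclusive-fitness argument, not spelled out in the text but added here for concreteness, is Hamilton's rule: an altruistic trait can spread when r · B > C, where r is the genetic relatedness between actor and recipient, B the fitness benefit to the recipient, and C the fitness cost to the actor. In haplodiploid insects such as bees, ants and wasps, full sisters share on average r = 3/4 of their genes, more than the r = 1/2 they would share with their own offspring, which is the sense in which helping to raise sisters can increase inclusive fitness.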
In group selection, evolution should select at the level of the group and select
for group structures where cooperation and altruism lead to an increase of the
fitness of the whole group. This principle has been shown to be sensitive to infec-
tion by non-altruistic individuals ('free-riders') and therefore to be evolutionarily
unstable. This is the least accepted explanation for the evolution of cooperation.
Reciprocal altruism has been treated using the game theoretical approach
of Axelrod’s work on the evolution of cooperation in the Prisoner’s Dilemma
game ([1]) which shows how a symbiotic relationship between two organisms
can develop. The repeated Prisoner’s Dilemma models the fact that the same
two individuals often interact more than once. The TIT-FOR-TAT strategy has
become famous in this context. A lot of work in evolutionary biology has dis-
cussed this game-theoretical approach to account for strategies of cooperation
(see [39,67]).
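As a concrete, self-contained illustration of the repeated Prisoner's Dilemma setting discussed above, the following Python sketch plays TIT-FOR-TAT against itself and against an unconditional defector. The payoff values (T=5, R=3, P=1, S=0) are the conventional ones and, like the function names, are assumptions for illustration rather than details taken from the cited works.

```python
# Minimal iterated Prisoner's Dilemma with the TIT-FOR-TAT strategy.
# 'C' = cooperate, 'D' = defect; payoffs are the conventional (assumed) values.

PAYOFF = {('C', 'C'): (3, 3), ('C', 'D'): (0, 5),
          ('D', 'C'): (5, 0), ('D', 'D'): (1, 1)}

def tit_for_tat(history):
    """Cooperate first, then repeat the opponent's previous move."""
    return 'C' if not history else history[-1][1]

def always_defect(history):
    return 'D'

def play(strategy_a, strategy_b, rounds=10):
    history_a, history_b = [], []          # each entry: (own move, opponent's move)
    score_a = score_b = 0
    for _ in range(rounds):
        a, b = strategy_a(history_a), strategy_b(history_b)
        pa, pb = PAYOFF[(a, b)]
        score_a, score_b = score_a + pa, score_b + pb
        history_a.append((a, b))
        history_b.append((b, a))
    return score_a, score_b

print(play(tit_for_tat, tit_for_tat))      # (30, 30): mutual cooperation
print(play(tit_for_tat, always_defect))    # (9, 14): exploited only in round one
```

Against another TIT-FOR-TAT player the strategy settles into mutual cooperation; against an unconditional defector it is exploited only in the first round. This retaliating-but-forgiving behaviour is the intuition behind its fame in this context.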
Sociobiological models of social behavior are strongly influenced by game
theory and its use in evolutionary research (see [58]). Game theory was
originally developed in order to describe human economic behavior ([90]). The
main idea is to use a utility function which evaluates every strategy by a nu-
merical value. Participants in game theoretical interactions are supposed to act
'rationally' in the sense of choosing the strategy which provides the highest util-
ity. As Maynard Smith points out “Paradoxically, it has turned out that game
theory is more readily applied to biology than to the field of economic behavior
for which it was originally designed” ([58]). The game theoretical concepts in
economics of utility and human rationality are replaced in evolutionary biology
by Darwinian fitness and evolutionary stability. The latter seems to be more
tractable by game theory than the former. It is an interesting point that a
mathematical framework has turned out to be more
appropriate for describing the complex process of evolution than for the behavior
of those creatures who invented the framework.

In articles like [39] and [67], which model the social behavior of humans on
the basis of game theoretical approaches, it is mentioned that 'real persons' in
real life do not act only on the basis of rationality and that the game-theoretical
assumptions only apply in simple situations with few alternatives of choice.
[67] mentions "feelings of solidarity or selflessness" or "pressure of society" which
can underlie human behavior. But nevertheless the game-theoretical models are
used to explain cooperation and developments in human societies on the abstract
level of rational choice. Axelrod himself seemed to be aware of the limitations of
the explanatory power of game theory in modelling human behavior. In [1] he
dedicated a whole chapter to the 'social structure of cooperation'. He identified
four factors in social structure: labels, reputation, regulation and territoriality.
Thus, while still on the basis of rational choices, Axelrod nevertheless includes
the ‘human factor’ in the game, taking into account human individual and social
characteristics. He goes a step further in his subsequent book The Complexity
of Cooperation ([2]).

Francis Heylighen [45] doubts that reciprocal altruism can sufficiently ac-
count for cooperative behavior in large groups of individuals. In [46] he introduces
another model for the evolution of cooperation especially in human society. On
the basis of memes, which we described earlier, he discusses how selfishness at the
cultural level can lead to cooperation at the lower level of the individuals. In [47]
the idea of memetic evolution is discussed in the framework of metasystem tran-
sitions, namely the evolutionary integration and control of individual systems by
shared controls. The following social metasystem transitions are identified: uni-
cellular to multicellular organisms, solitary to social insects, and human sociality.
Social insects are a good example of well-integrated societies with genetically
determined shared controls. In the case of human societies, Heylighen discusses
mutual monitoring (in small, primary groups with close face-to-face contacts),
internalized restraint, legal control and market mechanisms as memetic control
structures which lead to cooperative behavior beyond the competitive level of
the individual. This has led to ambivalent sociality and weakly integrated social
metasystems.

This section was meant to give an overview of theories about the genetic and
memetic evolution of social systems. We wanted to discuss the terms selfishness,
memes, and control structures. We come back to these terms in section 4.4 where
we discuss them in the broader context of social organization and control.
4.2 Social Software Agents?


The research area of intelligent software agents addresses the design of software
agents which are generally characterized by more or less repeated and 'close'
contacts to human users. (For an overview see the Special Issue of Communications
of the ACM on Intelligent Agents, July 1994, Vol. 37(7), and the Special Issue of
AI Magazine on Agents, Summer 1998, Vol. 19(2).)
(increasing work efficiency), more comfortable or more pleasurable, e.g. helping
the user to search and navigate in large databases, adjust a programming or physical
environment to the actual or expected requirements of the human, or simply
entertain the human (computer games, virtual reality interactions, computer
generated movies). Thus, these agents have to represent, handle, adapt to and
learn the needs, desires and other human traits of ‘personality’. Even in the case
of ‘synthetic actors’, which do not have direct contact to any specific human,
the behavior of the agents has to satisfy the expectations of the audience. In
this way the agents themselves, in ‘coevolution’ with the human user, exhibit a
kind of ‘personality’. Keywords like ‘collaborating interface agents’, ‘believable
agents’, ‘synthetic characters’, and ‘interactive characters’ indicate the growing
interest in this research domain in modelling and synthesizing ‘individualized
agents’.
Of course, it should be noted that synthetic ‘individualized’ software agents
are not necessarily designed according to biological or psychological findings
about animal or human personality and ‘agency’. But even on a shallow level
and taking into account that humans can adapt to 'unnatural' ways of interac-
tion, human social competence and cognition play an important role. Especially
in entertainment applications there is moreover a need for 'complete' agents
showing a broad and 'life-like' repertoire of acting and interacting. The issue of
human-agent interaction has been studied much more intensively in the domain of
software agents than in the domain of hardware agents (robots). To some
extent this might be due to the technologies available. On the other hand, robot
group behavior is mostly thought of in the sense that robots should do something
for a human being and not in collaboration with a human (except for research on
robots for handicapped people, e.g. [92]). Therefore, it is not surprising that the
general philosophy of thinking about ‘social robots’ (e.g. in the field of service
robotics) is still dominated by ‘rational’ concepts, while software agents research
(which is technologically as ‘computationalistic’ as robot research, sometimes
using the same control architectures) is also concerned with ‘phenomenological’
concepts like emotions, character or personality ([69,42,79,5]).

4.3 Defining Social Intelligence


In [21] we argued for the need to study the development of social intelligence
for autonomous agents, focusing on robots. Our argumentation was twofold: (1)
social intelligence is a necessary prerequisite for scenarios in which groups of
autonomous agents should cooperatively (i.e. by using communication) solve a
given task or survive as a group, (2) social intelligence is supposed to be the
basis for intelligence as such in the evolution of primate species. According to
the social intelligence hypothesis primate intelligence “originally evolved to solve
social problems and was only later extended to problems outside the social do-
main” ([18]; see also [14], [15] for an overview of discussions along this line
of argumentation). For readers from the social science community the assump-
tion that social dynamics were an important (or primary) driving force for the
evolution of human intelligence might not at all seem new or provocative. More-
over, the Alife endeavour to construct artificially (social) intelligent agents along
this path seems to be straightforward. Nevertheless, in the Artificial Intelligence
community the concept of intelligence is still fundamentally shaped by ‘rational’
concerns like knowledge representation, planning and problem-solving. As an
example we like to cite a recent statement in [53] defining machine intelligence
as “intelligence is optimal problem solving in pursuit of specific goals under re-
source constraints” (explicitly avoiding any reference to human intelligence or
cognition).
In the author’s notion of social intelligence the directed interaction between
individuals is the focus of attention. In our view such communication situations
are based on synchronization processes which lead to both external coordination
of behaviors (including speech acts) and, on the internal, subjective, phenomeno-
logical side, to empathic understanding which can give rise to certain qualities
of social understanding and social learning (see [22], [25]).
We propose a definition of the term social intelligence as the individual’s
capability to develop and manage relationships between individualized, auto-
biographic agents which, by means of communication, build up shared social
interaction structures which help to integrate and manage the individual’s basic
(‘selfish’) interests in relationship to the interests of the social system at the next
higher level. The term artificial social intelligence then denotes an instantiation of so-
cial intelligence in artifacts. This definition of social intelligence refers to forms
of sociality which are typical of highly individualized societies (e.g. parrots,
whales, dolphins, primates), where individuals interact with each other as individuals,
rather than as members of an anonymous society. The definition therefore contrasts with
notions of swarm intelligence and stigmergy (see section 3.1).
In the next section we propose a layered system of control structures which
we find useful for describing social systems. As we will show, we consider most
relevant the first level beyond the individual’s self interest, characterized by so-
cial control dynamics within a small group of individualized agents with social
bonding between its members. On this level we assume the most complex inter-
actions between the genetic, memetic and the individual, experiential level.

4.4 Social Organization and Control


The natural evolution of socially living animals gives us two possible models,
namely anonymous and individualized societies. Social insects are the most
prominent example of anonymous societies. The group members do not recog-
nize each other as individuals but rather as group members ([26]). If we remove
a single bee from a hive, no search behavior is induced. The situation is quite
different in individualized societies, to which primate societies belong. Here
individual recognition gives rise to complex kinds of social interaction and the
development of various forms of social relationships. On the behavioral level
social bonding, attachment, alliances, dynamic (not genetically determined) hi-
erarchies, social learning, etc. are visible signs of individualized societies. The
evolution of language, spreading of traditions and the evolution of culture are
further developments of individualized societies.
Fig. 9 summarizes our conception of social systems based on concepts which
we described in the previous sections. As a starting point we consider the indi-
vidual, 'selfish' agent. The individual itself is integrated insofar as it consists of
numerous components and subsystems (cells, organs) whose survival is dependent
on the survival of the system at the higher level. If the individual dies, all its
subsystems die, too. In the case of eusocial agents (e.g. social insects and
naked mole-rats) a genetically determined control structure of a 'superorganism'
has emerged, a socially well-integrated system. The individual itself plays no
crucial role; social interactions are anonymous.
Many mammal species with long-lasting social relationships show an alterna-
tive path towards socially integrated systems. Primary groups, which typically
consist of family members and close friends, emerged with close and often long-
lasting individual relationships. We define primary groups as a network of ‘con-
specifics' whom the individual agent uses as a testbed and as a point of reference
for his social behavior. Members of this group need not necessarily be genetically
related to the agent. Social bonding is guaranteed by complex mechanisms of
individual recognition, emotional and sexual bonding. This level is the substrate
for the development of social intelligence (cf. section 4.3) where individuals build
up shared social interaction structures, which serve as control structures of the
system at this level. Even if these bonding mechanisms are based on genetic
predispositions, social relationships develop over time and are not static. The
role of the individual agent as a life-long learning individual and social learning
system becomes most obvious in human societies. In life-long learning systems
the individual viewpoint and the complexity of coping with the non-social and
social environments furthermore reinforce the development of 'individuality'.
We proposed in a previous section (2.2) to use the term ‘autobiographic agent’
to account for the aspect of re-interpreting remembered and experienced situa-
tions in reference to the agent’s embodied ‘history’.
Secondary and tertiary level groups emerge through additional, memetic control
structures. In contrast to Heylighen [47], we distinguish between simple market
mechanisms in secondary groups (trade and direct exchange of goods between
individuals) and complex market mechanisms in tertiary groups. The level of mu-
tual monitoring and (simple) market mechanisms is necessary in larger groups
of agents with division of labour and cooperation for the sake of survival of
the economic agents. This still happens by means of face-to-face interaction and
communication (the upper limit of the group size could probably be estimated
for humans as 150, which is according to [33] the cognitive limit on the number
of individuals with whom one person can maintain stable relationships, as a
function of brain size).

Fig. 9. Social organization and control. [Diagram: human (individualized) societies build up from the individual agent ('selfish' survival interests) through the primary group (socially integrated system, social bonding, 'social' autobiographic agents) and the secondary, face-to-face group (locally integrated system, mutual monitoring, market mechanisms, economic agents) to the tertiary group (non-integrated system, cultural agents); eusocial (anonymous) societies build up from the individually integrated system (cell, tissues, organs, organism) to eusocial agents forming a socially integrated system.]

Control structures in secondary groups are still based on
the needs of the individual agent. We distinguish this level from tertiary groups
where external references (legal control, religion, etc.) provide the control mech-
anisms. Complex market mechanisms, which can be found in human societies,
also play a role on this level. Here, the group size is potentially unlimited, espe-
cially if effective means of communication and rules for social interaction exist
(by means of language humans can handle large group sizes by categorization of
individuals into types and instructing others to obey certain rules of behavior
towards these types, see [33]).
An important point to mention here is that secondary and tertiary control
structures do not simply enslave or subsume the lower levels in the way the
organism as a system 'enslaves' its components (organs, body parts). The indi-
vidual, which as a social being is embedded in primary groups, does not depend
absolutely for its survival on the survival of a specific system at a higher level.
Of course, changes in political, religious or economic conditions can dramati-
cally change the lives of the primary groups. But the dependency is weaker and
more indirect than in the case of social insects or the organ-body relationships.
This independence of the individual and the primary group from higher levels
can be an advantage in cases of dramatic changes. (Disadvantages of such less
integrated systems, e.g. part-whole competitions, are discussed in [47].)
A central point is that secondary and tertiary levels have mutual exchanges
with the level of the social, autobiographic agent. In socially integrated agents
on the primary group level, complex processes can take place when genetic and
memetic factors which emerge at different levels of the control structure mutu-
ally interact within the autobiographic agent who tries to construct and integrate
all experiences on the basis of his own embodied ‘history’. Within the mind of the
agent all the influences from the primary, secondary and tertiary groups are taken
into account for the individual decision processes, referring them to the past ex-
periences and the current state of the body. The memes which are exchanged
(either directly via personal one-to-one contact or indirectly one-to-many by
means of cultural knowledge bases like books, television, World-Wide-Web) are
integrated within the individual’s processes of constructing reality, maintain-
ing a concept of self and re-telling the autobiography. Educational systems can
assist access to these sources of information (memes), but the knowledge
is constructed within the individual (see trends in learner-centered education
and design, [66], which stress life-long-learning and the need for engagement of
the user of educational tools). Since, as we described in the previous sections, no
two agents can have the same viewpoint and the same ‘history’ of individual and
‘memetic’ development, initial genetic variability is in this way fundamentally
enhanced on a cognitive and behavioral level.
These complex, dynamic interactions within an embodied, autobiographic,
socially integrated agent yield a unique, individual, dynamical pattern of ‘per-
sonality’ at the component level of social systems. This can account for the
individuality, complexity and variability of human behavior which in our view
are not sufficiently described by the selfishness of genes and memes only.
In [37] Liane Gabora discusses the origin and evolution of culture. She sees
culture as an evolutionary process and points out the analogies between the bi-
ological evolution of genes and the cultural evolution of memes which both “ex-
hibit the key features of evolution – adaptive exploration and transformation on
an information space through variation, selection, replication and transmission”.
In her view the creative process of generating new memes reflects the dynam-
ics of the entire society of interacting individuals hosting them. She presents a
scenario of how an individual infant becomes a meme-evolving machine via the
emergence of an autocatalytic network of sparse, distributed memories. In her
view, culture emerged with the first self-perpetuating, potentially creative stream
of thought in an individual's brain.
In this way Liane Gabora explicitly addressed the interdependencies of pro-
cesses taking place within the individual and memetic, cultural evolution in
societies. In our view this is an important step towards a framework of mod-
elling cultural phenomena by accounting for both component and systems level.
However, can we interpret humans as 'hosts' of memes (e.g. social knowledge)
in the way that Gabora sees humans as hosts of ideas, memes? As we discuss
in [25] social skills and knowledge are inseparable from the subjective, expe-
riential, phenomenological basis of social understanding, e.g. when memes are
interpreted and modified within an embodied system. Thus, only an integration
of the individual, social and cultural dimensions could sufficiently account for
the complexity of human social animals. Similar thoughts using the notion of
individual lifelines are elaborated by Steven Rose in [71].
An economic interpretation of figure 9 in terms of investment and pay-off
might speculate that evolution tried out two different strategies of investment:
investments into the control structure level (leading to integrated systems with
high complexity at the systems level but uniformity at the component level in
eusocial systems) versus investments into the complexity of the individual (lead-
ing to less-integrated systems on the systems level with strongly individualized
components in human society). Only the latter strategy which, as we mentioned
above, increased the number of variations well beyond the genetic level, has
shown to be an impressive source of creativity and flexibility.

5 The Project Aurora: Robots and Autism

In this section the project AURORA for children with autism, which addresses
issues of both human and robotic social agents, is introduced.
The main characteristics of autism are: 1) qualitatively impaired social re-
lationships, 2) impairment of communication skills and fantasy, 3) significantly
reduced repertoire of activities and interests (stereotypical behavior, fixation to
stable environments).
A variety of explanations of autism have been discussed, among them the
widely discussed 'theory of mind' model which conceives of autism as a cognitive
disorder ([3]), and an explanation which focuses on the interaction dynamics
between child and caretaker ([44]). Similarly, a lack of empathic processes is
suggested which prevents the child from developing 'normal' kinds of social action
and interaction ([25]). Supporting evidence suggests that not impairments of
mental concepts, but rather disorders of executive functions, namely functions
which are responsible for the control of thought and action, are primary to
autistic disorder ([73]).
The project studies how a mobile robot can become a ‘toy’, and a remedial
tool for getting children with autism interested in coordinated and synchronized
interactions with the environment. The project aims to demonstrate how social
robotics technology can increase the quality of life of disadvantaged children
who have problems in relating to the social world. Humans are the best models for
human social behavior, but their social behavior is very subtle, elaborate, and
largely unpredictable. Many children with autism are, however, interested in playing
with mechanical toys or computers.
The research goal is to develop a control architecture for a robotic platform,
so that the robot functions as an interactive 'actor' which, based on a basic
behavior repertoire, can express more complex 'stories' (e.g. sequences of move-
ments) depending on the interaction with a child, or a small group of children.
The careful use of recognition and communication techniques in human-robot
interaction and the development of an adequate story-telling ([75,29], [64]) con-
trol architecture using a behavior-oriented approach is the scientific challenge of
this project, and it can only be realized through a series of prototypes and their
evaluation in interaction with children with autism. The project is therefore an
ongoing long-term project.
It is however expected that the systems developed in the early phases will
already be useful as an interactive toy which can be used by the teaching staff
of schools of the British National Autistic Society (NAS) during their work with
children with autism.
The Aurora project
(http://www.cyber.rdg.ac.uk/people/kd/WWW/aurora.html) is done in col-
laboration with the National Autistic Society. We use the mobile robot platform
Labo-1, an Intelligent Indoor Mobile Robot Platform and a product of Applied
AI Systems, who support the project. Additional funding is provided by the UK
Engineering and Physical Sciences Research Council (EPSRC), GR/M62648.
The long-term goals of the project AURORA are twofold: 1) helping children
with autism in making the initial steps to bond with the (social) world, 2)
studying general issues of human-robot interface design with the human-in-the-
loop, in particular a) the dynamics of the perception-action loop in embodied
systems, with respect to both the robot and the human, b) the role of verbal
and non-verbal communication in making interactions ‘social’, c) the process of
adaptation, i.e. humans adapting to robots as social actors, and robots adapting
to individual cognitive needs and requirements of human social actors. Results
of this project are expected to advance research on embodiment and interaction
in socially intelligent life-like agents.
6 Conclusion

What is embodiment? In [23] embodiment is defined as follows: Embodiment
means the structural and dynamic coupling of an agent with its environment,
comprising external dynamics (the physical body embedded in the world) as well
as the phenomenological dimension, internal dynamics of experiencing and re-
experiencing of self and, via empathy, of others. Both kinds of dynamics are two
aspects emerging from the same state of being-in-the-world.
Recent discussions in the area of Embodied Artificial Intelligence (EAI, [70])
apply more readily to physical (biological and artificial) agents. The issue of
embodiment for digital agents is still controversial, and subject to the danger
of using metaphorical comparisons on a high level of abstraction which is not
relevant for concrete experiments.
What is meaning? The WWWebster Dictionary
(http://www.m-w.com/netdict.htm) defines ‘meaning’ as follows:

1. a : the thing one intends to convey especially by language, b : the thing that
is conveyed especially by language
2. something meant or intended
3. significant quality; especially : implication of a hidden or special significance
4. a : the logical connotation of a word or phrase, b : the logical denotation or
extension of a word or phrase

Which of these definitions can be applied to life-like agents? Definitions 1
and 2 seem to have most in common with the issues which we addressed in this
paper. However, 1 would exclude most existing robotic and software agents, since
they generally do not have human language. 2 seems to be the most applicable in
our context; the definition points towards the role of the human as designer of,
user of, and observer of agents. Thus, in these interpretations the agent can have
a meaning to the human, no matter how meaningless its behavior or appearance
is from the point of view of the agent. Talking about meaning then means
talking about humans, and their relationships to agents, instead of trying to
discover the introspective meaning of the world from an agent's point of view:
What is it like to be an agent? (Compare Thomas Nagel [63].) For an elaborated
discussion on the role of the human observer in designing social agents see [27].
human observer in designing social agents see [27].
What are challenges for future research on life-like social agents based on the
work discussed in this chapter?

– Historically grounded robots. How can robots become autobiographic agents?
The framework proposed by C. Nehaniv and the author ([29], [64]) might be
a promising approach.
– The role of embodiment in social interactions and cooperative behavior:
What is the role of the particular embodiment of an agent? How can we
conceptualize embodiment for different ‘species’ of agents? This work will
study virtual and robotic agents in social learning experiments.
– Imitation: Scaling up from simple imitative behaviors like pre-programmed
following (learning by imitation) towards 1) more complex forms of imita-
tion and imitating robots, 2) learning to imitate. The framework described
in [65] can help in evaluating attempts at imitation and in designing experi-
ments which study learning to imitate.
– Robot-Human communication: Instead of replacing humans, robots can have
the role of a ‘social mediator’, e.g. helping people to become engaged in
real world interactions. Here, robots would be socially intelligent therapeu-
tic tools. The issue of robot design plays an important role here (see
section 3.3).
– Based on considerations in section 4.1, mobile robots might be a powerful
tool to test models in human organization theory; a first approach, taken by
the author in joint work with Scott Moss, is described in [62]. Comparisons
between artificial and natural social structures and organizations ([16]) can
identify mechanisms and test assumptions on the nature of the agent ([17]).
Including robots in comparative studies could reveal the role of embodiment
and individual situated experience in such kinds of models.

Acknowledgements
My special thanks to Aude Billard, Chrystopher Nehaniv and Simone Strippgen
for discussions and collaborative work on issues which are discussed in this paper.
The thoughts presented in this paper are nevertheless the author’s own.

References
1. Robert Axelrod. The Evolution of Cooperation. Basic Books, Inc., Publishers,
1984. 126, 127
2. Robert Axelrod. The Complexity of Cooperation: Agent-based Model of Competi-
tion and Cooperation. Princeton University Press, 1997. 127
3. S. Baron-Cohen, A. M. Leslie, and U. Frith. Does the autistic child have a “theory
of mind”? Cognition, 21:37–46, 1985. 134
4. F. C. Bartlett. Remembering – A Study in Experimental and Social Psychology.
Cambridge University Press, 1932. 105, 106
5. Joseph Bates. The nature of characters in interactive worlds and the oz project.
In: Virtual Realities: Anthology of Industry and Culture, Carl Eugene Loeffler, ed.,
1993. 128
6. R. Beckers, O. E. Holland, and J. L. Deneubourg. From local actions to global
tasks: stigmergy and collective robotics. In R. A. Brooks and P. Maes, editors,
Artificial Life IV, Proc. of the Fourth International Workshop on the Synthesis
and Simulation of Living Systems, pages 181–189, 1994. 107
7. Tony Belpaeme. Tracking objects using active vision. Thesis, second licentiate in
applied informatics (shortened programme), academic year 1995-1996, Vrije Univer-
siteit Brussel, Belgium, 1996. 116
8. Aude Billard. Allo kazam, do you follow me? or learning to speak through imitation
for social robots. MSc thesis, DAI Technical Paper no. 43, Dept. of AI, University
of Edinburgh, 1996. 109
9. Aude Billard and Kerstin Dautenhahn. Grounding communication in situated,
social robots. In Proc. TIMR, Manchester, Towards Intelligent Mobile Robots
TIMR UK 97, Technical Report Series of the Department of Computer Science,
Manchester University, 1997. 102, 109
10. Aude Billard and Kerstin Dautenhahn. Grounding communication in autonomous
robots: an experimental study. Robotics and Autonomous Systems, special issue
on “Scientific Methods in Mobile Robotics”, 24(1-2):71–81, 1998. 102, 109, 110,
123
11. A. Billard, K. Dautenhahn, and G. Hayes. Experiments on human-robot commu-
nication with Robota, an imitative learning and communication doll robot. Tech-
nical Report CPM-98-38, Centre for Policy Modelling, Manchester Metropolitan
University, UK, 1998. 102, 109
12. Aude Billard and Gillian Hayes. Learning to communicate through imitation in
autonomous robots. In Proceedings of ICANN97, 7th International Conference on
Artificial Neural Networks, pages 763–768. Springer-Verlag, 1997. 109
13. Rodney A. Brooks. Intelligence without reason. In Proc. of the 1991 International
Joint Conference on Artificial Intelligence, pages 569–595, 1991. 105
14. R. Byrne. The Thinking Ape, Evolutionary Origins of Intelligence. Oxford Uni-
versity Press, 1995. 129
15. R. W. Byrne and A. Whiten. Machiavellian Intelligence. Clarendon Press, 1988.
129
16. Kathleen M. Carley. A comparison of artificial and human organizations. Journal
of Economic Behavior and Organization, 896:1–17, 1996. 136
17. Kathleen M. Carley and Allen Newell. The nature of the social agent. Journal of
Mathematical Sociology, 19(4):221–262, 1994. 136
18. D. L. Cheney and R. M. Seyfarth. Précis of how monkeys see the world. Behavioral
and Brain Sciences, 15:135–182, 1992. 129
19. A. Cypher, editor. Watch What I Do: Programming by Demonstration. MIT Press,
1993. 117
20. Kerstin Dautenhahn. Trying to imitate – a step towards releasing robots from social
isolation. In P. Gaussier and J.-D. Nicoud, editors, Proc. From Perception to Action
Conference, Lausanne, Switzerland, pages 290–301. IEEE Computer Society Press,
1994. 102, 109
21. Kerstin Dautenhahn. Getting to know each other – artificial social intelligence for
autonomous robots. Robotics and Autonomous Systems, 16:333–356, 1995. 109,
112, 128
22. Kerstin Dautenhahn. Embodiment in animals and artifacts. In Embodied Cognition
and Action, pages 27–32. AAAI Press, Technical report FS-96-02, 1996. 105, 106,
129
23. Kerstin Dautenhahn. Ants don’t have friends – thoughts on socially intelligent
agents. In Socially Intelligent Agents, pages 22–27. AAAI Press, Technical report
FS-97-02, 1997. 135
24. Kerstin Dautenhahn. Biologically inspired robotic experiments on interaction and
dynamic agent-environment couplings. In Proc. Workshop SOAVE’97, Selbstorga-
nization von Adaptivem Verhalten, Ilmenau, 23-24 September 1997, pages 14–24,
1997. 102
25. Kerstin Dautenhahn. I could be you – the phenomenological dimension of social
understanding. Cybernetics and Systems, 25(8):417–453, 1997. 106, 116, 117, 129,
133, 134
26. Kerstin Dautenhahn. The role of interactive conceptions of intelligence and life in
cognitive technology. In Jonathon P. Marsh, Chrystopher L. Nehaniv, and Barbara
Gorayska, editors, Proceedings of the Second International Conference on Cognitive
Technology, pages 33–43. IEEE Computer Society Press, 1997. 111, 112, 129
27. Kerstin Dautenhahn. The art of designing socially intelligent agents: science, fiction
and the human in the loop. Applied Artificial Intelligence Journal, Special Issue
on Socially Intelligent Agents, 12(7-8):573–617, 1998. 103, 135
28. Kerstin Dautenhahn, Peter McOwan, and Kevin Warwick. Robot neuroscience —
a cybernetics approach. In Leslie S. Smith and Alister Hamilton, editors, Neu-
romorphic Systems: Engineering Silicon from Neurobiology, pages 113–125. World
Scientific, 1998. 102
29. Kerstin Dautenhahn and Chrystopher Nehaniv. Artificial life and natural stories.
In Proc. Third International Symposium on Artificial Life and Robotics (AROB
III’98 - January 19-21, 1998, Beppu, Japan), volume 2, pages 435–439, 1998. 106,
134, 135
30. Richard Dawkins. The Selfish Gene. Oxford University Press, 1976. 125
31. Richard Dawkins. River Out of Eden. Basic Books, 1995. 125
32. J. L. Deneubourg, S. Goss, N. Franks, A. Sendova-Franks, C. Detrain, and
L. Chrétien. The dynamics of collective sorting: robot-like ants and ant-like robots.
In J. A. Meyer and S. W. Wilson, editors, From Animals to Animats, Proc. of the
First International Conference on simulation of adaptive behavior, pages 356–363,
1991. 107
33. R. I. M. Dunbar. Coevolution of neocortical size, group size and language in
humans. Behavioral and Brain Sciences, 16:681–735, 1993. 130, 132
34. O. Etzioni. Intelligence without robots: a reply to Brooks. AI Magazine, pages
7–13, 1993. 115
35. C. Breazeal (Ferrell). A motivational system for regulating human-robot interac-
tion. in Proceedings of AAAI98, Madison, WI, 1998. 123
36. Stan Franklin and Art Graesser. Is it an agent, or just a program?: A taxonomy for
autonomous agent. In Proceedings of the Third International Workshop on Agent
Theories, Architectures, and Languages, published as Intelligent Agents III, pages
21–35. Springer-Verlag, 1997. 103
37. Liane Gabora. The origin and evolution of culture and creativity. Journal of
Memetics, 1(1):29–57, 1997. 133
38. P. Gaussier, S. Moga, J. P. Banquet, and M. Quoy. From perception-action loops
to imitation processes: A bottom-up approach of learning by imitation. Applied
Artificial Intelligence Journal, Special Issue on Socially Intelligent Agents, 12(7-
8):701–729, 1998. 123
39. Natalie S. Glance and Bernardo A. Huberman. Das Schmarotzer-Dilemma. Spek-
trum der Wissenschaft, 5:36–41, 1994. 126, 127
40. Deborah M. Gordon. The organization of work in social insect colonies. Nature,
380:121–124, 1996. 107
41. I. Harvey, P. Husbands, and D. Cliff. Issues in evolutionary robotics. In J. A.
Meyer, H. Roitblat, and S. Wilson, editors, From Animals to Animats, Proc. of
the Second International Conference on Simulation of Adaptive Behavior, 1992.
104
42. Barbara Hayes-Roth, Robert van Gent, and Daniel Huber. Acting in character.
In Proc. AAAI Workshop on AI and Entertainment, Portland, OR, August 1996,
1996. 128
Embodiment and Interaction in Socially Intelligent Life-Like Agents 139

43. Horst Hendriks-Jansen. Catching Ourselves in the Act: Situated Activity, Interac-
tive Emergence, Evolution, and Human Thought. MIT Press, Cambridge, Mass.,
1996. 106, 117
44. Horst Hendriks-Jansen. The epistomology of autism: making a case for an embod-
ied, dynamic, and historical explanation. Cybernetics and Systems, 25(8):359–415,
1997. 117, 134
45. Francis Heylighen. Evolution, selfishness and cooperation. Journal of Ideas,
2(4):70–76, 1992. 126, 127
46. Francis Heylighen. ‘selfish’ memes and the evolution of cooperation. Journal of
Ideas, 2(4):77–84, 1992. 127
47. Francis Heylighen and Donald T. Campbell. Selection of organization at the social
level: obstacles and facilitators of metasystem transitions. World Futures, 45:181–
212, 1995. 127, 130, 132
48. Ian Kelly and David Keating. Flocking by the fusion of sonar and active infrared
sensors on physical autonomous mobile robots. In The Third Int. Conf. on Mecha-
tronics and Machine Vision in Practice. 1996, Guimaraes, Portugal, Volume 1,
pages 1–4, 1996. 108
49. Volker Klingspor, John Demiris, and Michael Kaiser. Human-robot-communication
and machine learning. Applied Artificial Intelligence Journal, 11:719–746, 1997.
124
50. C. R. Kube and H. Z. Zhang. Collective robotics: from social insects to robots.
Adaptive Behavior, 2(2):189–218, 1994. 107
51. Nicholas Kushmerick. Software agents and their bodies. Minds and Machines,
7(2):227–247, 1997. 115
52. Douglas B. Lenat and R. V. Guha. Building Large Knowledge-Based Systems.
Representation and Inference in the Cyc Project. Addison-Wesley Publishing Com-
pany, 1990. 104
53. Robert Levinson. General game-playing and reinforcement learning. Computa-
tional Intelligence, 12(1):155–176, 96. 129
54. Henrik Hautop Lund, John Hallam, and Wei-Po Lee. Evolving robot morphology.
In Proceedings of IEEE 4th International Conference on Evolutionary Computa-
tion. IEEE Press, 1997. 112
55. P. Marchal, C. Piguet, D. Mange, A. Stauffer, and S. Durand. Embryological
development on silicon. In R. A. Brooks and P. Maes, editors, Artificial Life IV,
Proc. of the Fourth International Workshop on the Synthesis and Simulation of
Living Systems, pages 365–370, 1994. 104
56. M. J. Mataric. Learning to behave socially. In J-A. Meyer D. Cliff, P. Husbands and
S. Wilson, editors, From Animals to Animats 3, Proc. of the Third International
Conference on Simulation of Adaptive Behavior, SAB-94, pages 453–462, 1994.
109
57. Maja J. Mataric. Issues and approaches in design of collective autonomous agents.
Robotics and Autonomous Systems, 16:321–331, 1995. 107, 108, 109
58. John Maynard Smith. Evolution and the Theory of Games. Cambridge University
Press, 1982. 126, 127
59. D. McFarland and T. Bosser. Intelligent Behavior in Animals and Robots. MIT
Press, 1993. 108
60. David McFarland. Towards robot cooperation. In D. Cliff, P. Husbands, J.-A.
Meyer, and S. W. Wilson, editors, From Animals to Animats 3, Proc. of the
Third International Conference on Simulation of Adaptive Behavior, pages 440–
444. IEEE Computer Society Press, 1994. 108
140 Kerstin Dautenhahn

61. R. Moller, D. Labrinos, R. Pfeifer, T. Labhart, and R. Wehner. Modeling ant


navigation with an autonomous agent. In R. Pfeifer, B. Blumberg, J.-A. Meyer,
and S. W. Wilson, editors, From Animals to Animats 5, Proc. of the Fourth In-
ternational Conference on Simulation of Adaptive Behavior, pages 185–194, 1998.
107
62. Scott Moss and Kerstin Dautenhahn. Hierarchical organisation of robots: a social
simulation study. In R. Zobel and D. Moeller, editors, Proceedings 12th Euro-
pean Simulation Multiconference ESM98, Manchester, United Kingdom June 16-
19, 1998, pages 400–404. SCS Society for Computer Simulation International, 1998.
136
63. Thomas Nagel. What it is like to be a bat? Philosophical Review, 83:435–450,
1974. 135
64. Chrystopher Nehaniv and Kerstin Dautenhahn. Embodiment and memories —
algebras of time and history for autobiographic agents. In Proceedings of 14th
European Meeting on Cybernetics and Systems Research EMCSR’98, pages 651–
656, 1998. 106, 134, 135
65. Chrystopher Nehaniv and Kerstin Dautenhahn. Mapping between dissimilar bod-
ies: Affordances and the algebraic foundations of imitation. In John Demiris and
Andreas Birk, editors, Proceedings European Workshop on Learning Robots 1998
(EWLR-7), Edinburgh, 20 July 1998, pages 64–72, 1998. 117, 136
66. Donald A. Norman and James C. Spohrer. Learner-centered education. Commu-
nications of the ACM, 39(4):24–27, 1996. 132
67. Martin A. Nowak, Robert M. May, and Karl Sigmund. The arithmetics of mutual
help. Scientific American, 6:50–55, 1995. 126, 127
68. Simon Penny. Embodied cultural agents: at the intersection of robotics, cognitive
science and interactive art. In Socially Intelligent Agents, pages 103–105. AAAI
Press, Technical report FS-97-02, 1997. 124
69. Paolo Petta and Robert Trappl. On the cognition of synthetic characters. In Robert
Trappl, editor, Proc. Cybernetics and Systems ’96, Vol. 2, pages 1165–1170, 1996.
128
70. Erich Prem. Epistemological aspects of embodied artificial intelligence. Cybernetics
and Systems, 28(5):iii–ix, 1997. 135
71. Steven Rose. Lifelines. Biology, Freedom, Determinism. Penguin Books, 1997. 133
72. I. Rosenfield. The Strange, Familiar, and Forgotten. An Anatomy of Conscious-
ness. Vintage Books, 1993. 105
73. James Russell. Autism as an Executive Disorder. Oxford University Press, 1997.
134
74. E. Schlottmann, D. Spenneberg, M. Pauer, T. Christaller, and K. Dautenhahn.
A modular design approach towards behavior oriented robotics. Technical report,
GMD Technical Report Nr. 1088, June 1997, GMD, Sankt Augustin, 1997. 102,
103, 112
75. Phoebe Sengers. Narrative intelligence. To appear in: Human Cognition and Social
Agent Technology, Ed. Kerstin Dautenhahn, John Benjamins Publishing Company,
1999. 134
76. Paul W. Sherman, Jennifer U.M. Jarvis, and Richard D. Alexander, editors. The
Biology of the Naked Mole-Rat. Princeton University Press, Princeton, N.J, 1991.
126
77. Yoav Shoham and Moshe Tennenholtz. On social laws for artificial agent societies:
off-line design. Artificial Intelligence, 73:231–252, 1995. 109
78. Karl Sims. Evolving 3d morphology and behavior by competition. Artificial Life,
1(1):353–372, 1995. 111
Embodiment and Interaction in Socially Intelligent Life-Like Agents 141

79. Aaron Sloman. What sort of control system is able to have a personality. In Robert
Trappl, editor, Proc. Workshop on Designing Personalities for Synthetic Actors,
Vienna, June 1995, 1995. 128
80. L. Steels. The artificial life roots of artificial intelligence. Artificial Life, 1(1):89–
125, 1994. 105
81. L. Steels. A case study in the behavior-oriented design of autonomous agents. In
D. Cliff, P. Husbands, J.-A. Meyer, and S.W. Wilson, editors, From Animals to
Animats 3, Proceedings of the Third International Conference on Simulation of
Adaptive Behavior, pages 445–452, Cambridge, MA, 1994. MIT Press/Bradford
Books. 108
82. Luc Steels. Building agents out of autonomous behavior systems. In L. Steels
and R. A. Brooks, editors, The “Artificial Life” Route to “Artificial Intelligence”:
Building Situated Embodied Agents. Lawrence Erlbaum, 1994. 113
83. Luc Steels, Peter Stuer, and Dany Vereertbrugghen. Issues in the physical reali-
sation of autonomous robotic agents. Manuscript, AI Memo, VUB Brussels, 1996.
108
84. Simone Strippgen. Insight: ein virtuelles Labor fuer Entwurf, Test und Analyse von
behaviour-basierten Agenten. Doctoral Dissertation, Department of Linguistics
and Literature, University of Bielefeld, 1996. 112
85. Simone Strippgen. Insight: A virtual laboratory for looking into behavior-based
autonomous agents. In W. L. Johnson, editor, Proceedings of the First International
Conference on Autonomous Agents. Marina del Rey, CA USA, February 5-8, 1997,
pages 474–475. ACM Press, 1997. 112
86. G. Theraulaz, S. Goss, J. Gervet, and L. J. Deneubourg. Task differentiation
in polistes wasp colonies: a model for self-organizing groups of robots. In J. A.
Meyer and S. W. Wilson, editors, From Animals to Animats, Proc. of the First
International Conference on simulation of adaptive behavior, pages 346–355, 1991.
107
87. John K. Tsotsos. Behaviorist intelligence and the scaling problem. Artificial In-
telligence, 75:135–160, 95. 105
88. Sherry Turkle. Life on the Screen, Identity in the Age of the Internet. Simon and
Schuster, 1995. 104
89. Eckart Voland. Grundriss der Soziobiologie. Gustav Fischer Verlag, Stuttgart,
Jena, 1993. 125
90. J. von Neumann and O. Morgenstern. Theory of Games and Economic Behaviour.
Princeton University Press, 1953. 126
91. D. M. Wilkes, A. Alford, R. T. Pack, T. Rogers, R. A. Peters II, and K. Kawa-
mura. Toward socially intelligent service robots. To appear in Applied Artificial
Intelligence Journal, vol. 1, no. 7, 1998. 123
92. D. M. Wilkes, R. T. Pack, A. Alford, and K. Kawamura. Hudl, a design philosophy
for socially intelligent service robots. In Socially Intelligent Agents, pages 140–145.
AAAI Press, Technical report FS-97-02, 1997. 128
93. Edward O. Wilson. Sociobiology. The Belknap Press of Harvard University Press,
Cambridge, Massachusetts and London, England, 1980. 125
94. Franz M. Wuketits. Die Entdeckung des Verhaltens. Wissenschaftliche Buchge-
sellschaft, Darmstadt, 1995. 125
95. Robert S. Wyer. Knowledge and Memory: The Real Story. Lawrence Erlbaum
Associates, Hillsdale, New Jersey, 1995. 106
An Implemented System for Metaphor-Based Reasoning,
With Special Application to Reasoning about Agents

John A. Barnden

School of Computer Science
University of Birmingham, Birmingham B15 2TT, U.K.
J.A.Barnden@cs.bham.ac.uk
WWW home page: http://www.cs.bham.ac.uk/~jab

This work was supported in part by grant number IRI-9101354 from the National
Science Foundation, U.S.A.

Abstract. An implemented system called ATT-Meta (named for propo-
sitional ATTitudes and Metaphor) is sketched. It performs a type of
metaphor-based reasoning. Although it relies on built-in knowledge of
specific metaphors, where a metaphor is a conceptual view of one topic
as another, it is flexible in allowing novel discourse manifestations of
those metaphors. The flexibility comes partly from semantic agnosticism
with regard to metaphor, in other words not insisting that metaphorical
utterances should always have metaphorical meanings. The metaphorical
reasoning is integrated into a general uncertain reasoning framework, en-
abling the system to cope with uncertainty in metaphor-based reasoning.
The research has focused on metaphors for mental states (though the al-
gorithms are not restricted in scope), and consequently throws light on
agent descriptions in natural language discourse, multi-agent scenarios,
personification of non-agents, and reasoning about agents’ metaphorical
thoughts. The system also naturally leads to an approach to chained
metaphor.

1 Introduction and Overview of ATT-Meta

First, some terminology. A metaphorical utterance is one that manifests (in-
stantiates) a metaphor, where a metaphor is a conceptual view of one topic
as another. Here I broadly follow Lakoff (e.g., Lakoff 1993). An example of a
metaphor is the view of the mind as a three-dimensional physical region. (We
call this metaphor MIND AS PHYSICAL SPACE.) Notice that, under this ter-
minology, a metaphor is the view itself, as opposed to some piece of natural
language that manifests the view. Such a piece of language might be “John be-
lieved in the recesses of his mind that ...,” in the case of MIND AS PHYSICAL
SPACE. When a metaphor is manifested in an utterance, the topic actually be-
ing discussed (John’s mind, in the example) is the tenor, and the topic it is
metaphorically cast as (physical space, in the example) is the vehicle.

The ATT-Meta reasoning system is aimed at the reasoning needed to extract
useful information from metaphorical utterances in mundane natural language
discourse. It is not currently capable of dealing with novel metaphors — rather,
it has pre-given knowledge of a specific set of metaphors — but it is specifically
designed to handle novel manifestations of the metaphors it does know about.
Its knowledge of any given metaphor consists mostly of a relatively small set of
very general “conversion rules” that can convert information about the vehicle
into information about the tenor, or vice versa. The degree of novelty the sys-
tem can handle in a manifestation of a metaphor is limited only by the amount
of knowledge it has about the vehicle and by the generality of the conversion
rules. Note also Lakoff & Turner’s (1989) persuasive claims that even in po-
etry metaphorical utterances are mostly manifestations of familiar, well-known
metaphors, although the manifestations are highly novel and the metaphors can
be mixed in novel ways.

ATT-Meta is merely a reasoning system, and does not itself deal with natural
language input directly. Rather, a user supplies hand-coded logic formulae that
are intended to couch the literal meaning of small discourse chunks (two or three
sentences). This will become clearer later in the paper.

The ATT-Meta research has concentrated on a specific type of metaphor,
namely metaphors for mental states (and processes), such as MIND AS PHYS-
ICAL SPACE. However, care has been taken to ensure that the principles and
algorithms implemented are not restricted to this special case. The present pa-
per will mainly use mental-state metaphors in examples, but the examples can
readily be adapted to other types of metaphor.

There are many mental-state metaphors apart from MIND AS PHYSICAL
SPACE. Some are as follows: IDEAS AS PHYSICAL OBJECTS, under which
ideas are cast as physical objects that have locations and can move about (either
outside a person, or inside a person’s mind conceived of as a space); COGNI-
TION AS VISION, as when understanding, realization, knowledge, etc. is cast
as vision; IDEAS AS INTERNAL UTTERANCES, which is manifested when
a person’s thoughts are described as internal speech or writing (internal speech
is not literally speech); and MIND PARTS AS PERSONS, under which a per-
son’s mind is cast as containing several sub-agents with their own thoughts,
emotions, etc. Many real-discourse examples of manifestations of metaphors for
mental states and processes can be found in the author’s databank on the web
(http://www.cs.bham.ac.uk/~jab/ATT-Meta/Databank).

The special case of mental states has particular relevance to the current
workshop, because of the workshop’s interest in the subject of intelligent agents
and societies of agents. There are many points of contact with this subject:-
(a) Mundane discourses, such as ordinary conversations and newspaper arti-
cles, often use metaphor in talking about mental states/processes of agents
(mainly people). Indeed, as with many abstract topics, as soon as anything
subtle or complex needs to be said, metaphor is practically essential.
(b) One commonly used metaphor for mental states, MIND PARTS AS PER-
SONS, casts the mind as containing a small society of sub-agents. Thus,
research into multi-agent situations can contribute to the study of metaphor
for mental states, as well as vice versa (cf. point (a)).
(c) Thirdly, one important research topic in cognitive science is self-deception
(see, e.g., Mele 1997), and, as I have argued elsewhere (Barnden 1997a),
metaphor for mental states (including MIND PARTS AS PERSONS) can
make a strong contribution to this area.
(d) Metaphors for mental states and processes are strongly connected to metaphors
for communication between agents, such as the CONDUIT metaphor, under
which communicational objects such as words are viewed as physical objects
travelling through a physical conduit (Reddy 1979).
(e) Even when an agent X’s mental states and processes are not themselves
metaphorically described, X itself may be thinking and reasoning metaphori-
cally about something. Note that this claim respects the idea that a metaphor
is a conceptual view that can be manifested in many different ways other
than natural language, such as in visual art, action, and thought. Thus, there
is a need for reasoning about agents’ metaphorical thoughts.
(f) Non-agents are often metaphorically cast as agents, i.e. personified, in mun-
dane discourse. Either implicitly or explicitly this raises the prospect of the
non-agent having mental states. An example of this is the sentence “My car
doesn’t want to start this morning.” To contrast this with (e), we can call
this reasoning about metaphorical agents’ thoughts.

Unusually for detailed technical treatments of metaphor, the ATT-Meta project
has given much attention to the question of uncertainty in reasoning. (The work
of Hobbs 1990 is the only other approach that gives comparable attention to
uncertainty.) Metaphor-based reasoning introduces special types of uncertainty.
Given an utterance, it is often not certain what particular metaphors are mani-
fested. But even given that a particular metaphor is manifested, the implications
of it for the tenor (which is, e.g., John’s mind) are themselves uncertain, and
may conflict with other lines of reasoning about the tenor. Those other lines
of reasoning, metaphor-based or not, are likely to be uncertain, in practice. A
further source of uncertainty is that the understander’s knowledge about the
vehicle of the metaphor (e.g., physical space) is itself uncertain. For instance,
mundane physical objects that are not close together generally do not physically
interact in any direct sense, but they may do so. An additional complication is
that two or more metaphors may be used simultaneously in relation to a given
tenor, further amplifying the uncertainties. ATT-Meta embodies partial treat-
ments of these various types of uncertainty. This metaphor-related uncertainty
handling is completely integrated into ATT-Meta’s treatment of uncertainty in
general.
ATT-Meta deals only in qualitative measures of uncertainty, as opposed to,
say, probabilistic measures. This is in part a simplification imposed to make the
project more manageable, and in part reflects a claim that qualitative uncer-
tainty is more appropriate for some purposes, notably some aspects of natural
language understanding. Arguing this matter is beyond the scope of the current
paper (but see Barnden 1998).
The plan of the rest of the paper is as follows. Section 2 presents the funda-
mental principles on which ATT-Meta’s metaphor-based reasoning works. Sec-
tion 3 very briefly sketches ATT-Meta’s basic reasoning facilities, irrespective of
metaphor. Section 4 explains how the metaphor principles are realized within
the basic reasoning framework. Section 5 comments briefly on ATT-Meta’s fa-
cilities for reasoning about agents’ beliefs and reasoning, again irrespective of
metaphor. Section 6 then combines the information from the previous two sec-
tions to indicate briefly how ATT-Meta could deal with reasoning about agents’
metaphorical thoughts (see (e) above) and reasoning about metaphorical agents’
thoughts (see (f) above). The Section also addresses chained metaphor. Section 7
concludes.
Further detail of the system and the attendant research can be found in
Barnden (1997, 1998), Barnden (in press) and Barnden et al. (1994a,b, 1996).

2 ATT-Meta’s Metaphor-Based Reasoning: Principles

Notoriously, metaphorical utterances can be difficult if not impossible to para-
phrase in non-metaphorical terms. Equally, it can be difficult if not impossible to
give them internal meaning representations that are not themselves metaphori-
cal. Consider, for instance, “One part of John was insisting that Sally was right.”
This manifests the metaphor of MIND PARTS AS PERSONS, where further-
more the mentioned part engages in natural language utterance (the insistence),
so that we also have IDEAS AS INTERNAL UTTERANCES being applied to
John. I claim that we simply do not know enough about how the mind works to
give a full, definite, detailed account of what was going on in John’s mind ac-
cording to the sentence. After all, what non-metaphorical account can be given
of some “part” of John “insisting” something? Rather, the utterance connotes
things such as that John had reasons both to believe that Sally was right and
to believe the opposite.
This particular connotation arises from the observation that someone gener-
ally insists something only when someone else has stated the opposite (although
there are other possible scenarios). So, the sentence suggests that some other
“part” of John stated, and therefore probably believed, that Sally was not right.
Then, because of the thoughts of the two sub-agents within John (the two parts),
we can infer that John had reasons to believe the mentioned things about Sally.
Some investigators may wish to call such an inference the underlying mean-
ing of the utterance, or at least to claim that it is part of the meaning. The
ATT-Meta research project has refrained from this step, which is after all only
terminological, and only explicitly countenances literal meanings for metaphor-
ical utterances. (The literal meaning of the above utterance is the ridiculous
claim that John literally had a part that literally insisted that Sally was right.)
However, the project presents no objection to the step. Thus, we can say that
ATT-Meta is “semantically agnostic” as regards metaphor. (The approach is
akin to but less extreme than that of Davidson 1979, which can be regarded as
semantically “atheist.”)
ATT-Meta’s approach is one of literal pretence. A literal-meaning represen-
tation for the metaphorical input utterance is constructed. The system then pre-
tends that this representation, however ridiculous, is true. Within the context of
this pretence, the system can do any reasoning that arises from its knowledge
of the vehicles of the metaphors involved. In our example, it can use knowledge
about interaction within groups of people, and knowledge about communicative
acts such as insistence. As a result of this knowledge, the system can infer that
the explicitly mentioned part of John believed (as well as insisted) that Sally was
right, and some other, unmentioned, part of John believed (as well as stated)
that Sally was not right. Suppose now that, as part of the system’s knowledge
of the MIND PARTS AS PERSONS metaphor, there is the knowledge that if
a “part” of someone believes something P, then the person has reasons to be-
lieve P. The system can now infer both that John had reasons to believe that
Sally was right and that John had reasons to believe that Sally was not right.
Note here that the key point is that the reasoning from the literal meaning of
the utterance, conducted within the pretence, links up with the just-mentioned
knowledge. That knowledge is itself of a very fundamental, general nature, and
does not, for instance, rely on the notion of insistence or any other sort of
communicative act. Any line of within-pretence inference that linked up with
that knowledge could lead to conclusions that John had reasons to believe certain
things. This is the way in which ATT-Meta can deal with novel manifestations
of metaphors. There is no need for it at all to have any knowledge of how
insistence by a “part” of a person maps to some non-metaphorically describable
feature of the person. Equally, an utterance that described a part as doing things
from which it can be inferred that the part insisted that Sally was right would
also lead to the same inferences as our example utterance (unless it also led
to contrary inferences by some route).
In sum, the ATT-Meta research has taken the line that it is a mistake to focus
on the notion of the underlying meaning of a metaphorical utterance, and has
concentrated instead on the literal meaning and the inferences that can be drawn
from it. This approach is the key to being able to deal flexibly with metaphorical
utterances.

3 ATT-Meta’s Basic Reasoning

ATT-Meta is a rule-based reasoning system that manipulates hypotheses (facts
or goals). In ATT-Meta, at any time any particular hypothesis H is tagged
with a certainty level, one of certain, presumed, suggested, possible or
certainly-not. The last just means that the negation of H is certain. Possible
just means that the negation of H is not certain but no evidence has yet been
found for H itself. Presumed means that H is a default: i.e., it is taken as a
working assumption, pending further evidence. Suggested means that there is
evidence for the hypothesis, but it is not strong enough to enable H to be a
working assumption.
ATT-Meta applies its rules in a backchaining style. It is given a reasoning
goal, and uses rules to generate subgoals. Goals can of course also be satisfied
by provided facts. When a rule application supports a hypothesis, it supplies a
level of certainty to it, calculated as the minimum of the rule’s own certainty
level and the levels picked up from the hypotheses satisfying the rule’s condition
part. When several rules support a hypothesis, the maximum of their certainty
contributions is taken.
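
This combination scheme can be pictured with a small Python sketch (purely illustrative; the scale, names and figures below are not ATT-Meta's actual code, and certainly-not is omitted since it amounts to the negation being certain):

LEVELS = ["possible", "suggested", "presumed", "certain"]   # weakest to strongest

def weaker(a, b):
    # minimum of two qualitative certainty levels
    return a if LEVELS.index(a) <= LEVELS.index(b) else b

def stronger(a, b):
    # maximum of two qualitative certainty levels
    return a if LEVELS.index(a) >= LEVELS.index(b) else b

def support_from_rule(rule_level, condition_levels):
    # a rule supports its conclusion at the minimum of its own level
    # and the levels of the hypotheses satisfying its conditions
    level = rule_level
    for c in condition_levels:
        level = weaker(level, c)
    return level

def combined_support(rule_supports):
    # when several rules support the same hypothesis, take the maximum
    level = "possible"
    for s in rule_supports:
        level = stronger(level, s)
    return level

s1 = support_from_rule("presumed", ["certain", "suggested"])  # -> "suggested"
s2 = support_from_rule("certain", ["presumed"])               # -> "presumed"
print(combined_support([s1, s2]))                             # prints "presumed"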
When both a hypothesis H and its negation –H are supported to level at
least presumed, conflict-resolution takes place. The most interesting case is when
both hypotheses are supported to level presumed. The system attempts to see
whether one hypothesis has more specific evidence than the other, so that it can
downgrade the certainty level of the other hypothesis. Specificity comparison is a
commonly used heuristic for conflict-resolution in AI (e.g., Delgrande & Schaub
1994, Hunter 1994, Loui 1987, Loui et al. 1993, Poole 1991, Yen et al. 1991),
although serious problems remain in coming up with adequate and practical
heuristics. ATT-Meta’s specificity comparison depends on what facts H and –H
rely on, and on derivability relationships between the hypotheses supporting H
and –H. If one hypothesis wins, it stays presumed and the other hypothesis is
downgraded to suggested. If neither wins, both are downgraded to suggested.
The scheme can deal with any amount of iterative defeat: for example, if “flen-
guins” are special penguins that can indeed fly, but ill flenguins once again cannot
fly, then the system will resolve the conflicts correctly for flenguins in general
and for ill flenguins.
The system contains a truth-maintenance-like mechanism for propagating
levels of certainty around. This can be complex because of the frequent appear-
ance of cycles in the rule-application graph. As a result, the system gradually
settles to a consistent set of certainty levels for its hypotheses.

4 ATT-Meta’s Metaphor-Based Reasoning: Implementation

Section 2 referred to reasoning taking place “within a pretence” that a metaphor-
ical utterance was literally true. To implement this, ATT-Meta constructs a
computational environment called a metaphorical pretence cocoon. The repre-
sentation of the literal meaning of the utterance, namely that a part PJ of John
insisted that Sally was right, is placed as a fact L inside this cocoon. Corre-
sponding to this, outside the cocoon, the system has a hypothesis (a fact) SL
that it itself (the system) is pretending that L holds. Also, the system has the
fact, outside the cocoon, that it is pretending that PJ is a person.
As usual, the system has a goal, such as the hypothesis that John believes that
Sally is right (recall the example in the second section of this paper). Assume the
system has a rule that if someone X has reasons to believe P then, presumably,
X believes P. (This is a default rule, so its conclusion can be defeated.) Thus,
one subgoal that arises is that John had reasons to believe that Sally was right.
Now, in the earlier Section we referred to the system’s knowledge about the
MIND PARTS AS PERSONS metaphor. The mentioned knowledge is couched
in the following rule:
IF I (the system) am pretending that part Y of agent X is a person AND I
am pretending that Y believes Q THEN (presumably) X has reasons to believe
Q.
Of course, this is a paraphrase of an imagined, formally expressed rule. We call
this a conversion rule, as it maps between pretence and reality. Because of the
subgoal that John had reasons to believe that Sally was right, the conversion
leads to the setting up of the subgoal that the system is pretending that PJ (the
mentioned part of John) believes that Sally is right. This subgoal is itself outside
the cocoon, but it automatically leads to the subgoal that PJ believes that
Sally is right, within the cocoon. This subgoal can then be inferred (as a default)
from the hypothesis that PJ stated that Sally was right, which itself can be
inferred (as a default) from the existing within-cocoon fact that PJ insisted that
Sally was right. Notice carefully that these last two steps are entirely within the
cocoon and merely use commonsense knowledge about real-life communication.
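
The flavour of such a conversion rule can be conveyed by the following illustrative Python rendering (the tuple encoding, predicate names and function are invented for exposition and collapse the separate pretending-that layer; ATT-Meta's actual rule language differs):

def mind_parts_as_persons_conversion(pretence_facts, goal):
    # Paraphrased rule: IF, in the pretence, part Y of agent X is a person
    # AND Y believes Q, THEN (presumably) X has reasons to believe Q.
    # Used backwards: given the goal ('has-reasons-to-believe', X, Q),
    # return the within-pretence subgoals that would support it.
    if goal[0] != "has-reasons-to-believe":
        return []
    _, x, q = goal
    subgoals = []
    for fact in pretence_facts:
        if fact[0] == "person-part" and fact[2] == x:   # ('person-part', Y, X)
            y = fact[1]
            subgoals.append(("believes", y, q))         # subgoal inside the cocoon
    return subgoals

# "One part of John was insisting that Sally was right": within the pretence,
# PJ is a person-part of John, so the outer goal about John generates the
# within-pretence subgoal that PJ believes that Sally was right.
pretence = [("person-part", "PJ", "John")]
print(mind_parts_as_persons_conversion(pretence, ("has-reasons-to-believe", "John", "sally-was-right")))
# prints [('believes', 'PJ', 'sally-was-right')]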
As well as the original goal (John believed that Sally was right) the system
also looks at the negation of this, and hence indirectly at the hypothesis that
John has reasons to believe that Sally was not right. This subgoal gets support
in a rather similar way to the above process, but it involves richer reasoning
within the cocoon.

4.1 Uncertainty in ATT-Meta’s Metaphorical Reasoning


ATT-Meta incorporates a handling, at least partial, of all the types of uncertainty
in metaphor-based reasoning that were mentioned in Section 1.
First, the system can be unsure whether a metaphor holds, by having merely
presumed as the level of certainty for a fact like the one above, to the effect that the
system pretends that part PJ of John is a person. This fact is then potentially
subject to defeat in the ordinary way.
Secondly, notice the “presumably” in the above conversion rule, indicating
that its certainty level is presumed. Thus, the rule is only a default rule. It is
possible for there to be evidence that is strong enough (e.g., specific enough) to
defeat a conclusion made by the rule. Conversely, although there may be evidence
against the conclusion of the rule, it may be weak enough to get defeated by the
evidence for that conclusion. Thus, whether a piece of metaphorical reasoning
overrides or fails to override other lines of reasoning about the tenor is a matter of
the peculiarities of the case at hand. Some authors (e.g., Lakoff 1994) assume that
in cases of conflict tenor information should override metaphor-based inferences,
but it appears that such assumptions are based on inadequate realization of the
fact that tenor information can itself be uncertain.
Finally, the reasoning within the cocoon is itself usually uncertain, since
commonsense knowledge rules are usually uncertain.

5 ATT-Meta’s Reasoning about Agents’ Beliefs and Reasoning

The ATT-Meta system has facilities for reasoning non-metaphorically about the
beliefs and reasoning acts of agents, including cases where those beliefs and
acts are themselves about the beliefs and reasoning of further agents, and so
forth. Although ATT-Meta can reason about beliefs in an ordinary rule-based
way, its main tool is simulative reasoning (e.g., Creary 1979, Konolige 1986 [but
called “attachment” there], Haas 1986, Ballim & Wilks 1991, Dinsmore 1991,
Hwang & Schubert 1993, Chalupsky 1993 and 1996, Attardi & Simi 1994; see
also related work in philosophy and psychology in Carruthers & Smith 1996,
Davies & Stone 1995). In attempting to show that agent X believes P from the
fact that X believes Q, the system puts P as a goal and Q as a fact in a simulation
cocoon for X, which is a special environment which is meant to reflect X’s own
reasoning processes. Reasoning from Q to P in the cocoon is alleged (by default)
to be reasoning by X. The reasoning within the cocoon can involve ordinary rule-
based reasoning and/or simulation of other agents. In particular, the reasoning
can be uncertain. Also, the result of the simulation of X is itself uncertain: even
if the simulation supports the hypothesis that X believes P, ordinary rule-based
reasoning may support the negation of this hypothesis more strongly.
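
Schematically (this is not ATT-Meta's own code; the miniature backchainer and rule format are invented), simulative reasoning amounts to running the ordinary reasoner inside a cocoon whose facts are the agent's known beliefs, with success yielding only a default-strength conclusion:

def backchain(goal, facts, rules):
    # a goal holds if it is a fact, or if some rule (conclusion, conditions)
    # concludes it and all of its conditions can themselves be shown
    if goal in facts:
        return True
    return any(concl == goal and all(backchain(c, facts, rules) for c in conds)
               for concl, conds in rules)

def believes_by_simulation(known_beliefs, goal_belief, rules):
    # simulation cocoon: the agent's known beliefs are the cocoon's facts;
    # reasoning done inside is ascribed to the agent only by default,
    # so success gives merely a "presumed" conclusion, not a certain one
    return "presumed" if backchain(goal_belief, set(known_beliefs), rules) else "possible"

# Example: from the fact that John believes it is raining, argue that John
# (presumably) believes the streets are wet.
commonsense_rules = [("streets-wet", ["raining"])]
print(believes_by_simulation({"raining"}, "streets-wet", commonsense_rules))  # "presumed"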

6 Interesting Nestings

In fact, simulation cocoons operate very similarly to metaphorical pretence co-
coons. Just as simulation cocoons can be nested within each other, to get the ef-
fect of reasoning about X’s reasoning about Y’s reasoning about ..., so metaphor-
ical pretence cocoons can be nested within each other, and either way round with
respect to simulation cocoons. We now look briefly at the uses for these three
further types of nesting. The four types of nesting arise from a general scheme
in ATT-Meta for nesting cocoons within each other to any depth and with any
mixture of types.
Nesting of metaphorical pretence cocoons within each other provides a treat-
ment of chained metaphor. Consider the sentence “The thought hung over him
like an angry cloud” (adapted from a real-text example). The thought is metaphor-
ically cast as a cloud, and the cloud is in turn metaphorically cast as an animate
being (because only animate beings can literally be angry). In ATT-Meta, this
would be handled by having a metaphorical cocoon for the second of those two
metaphorical steps nested within a cocoon for the first. That is, within the pre-
tence that the thought is a cloud there is a further pretence that the cloud is a
person.
Embedding of a metaphorical pretence cocoon within a simulation cocoon
handles a major aspect of point (e) in Section 1, namely reasoning about agents’
metaphorical reasoning. This would be needed for dealing with one interpretation
of the sentence “Mary believed that the thought hung over John like a cloud,”
viz the interpretation under which the metaphorical view of the thought as a
cloud is part of Mary’s own belief state. (But another interpretation is that the
metaphor is used only by the speaker, and not by Mary.)
Conversely, embedding of a simulation cocoon within a metaphorical pretence
cocoon handles a major aspect of point (f) in Section 1, namely reasoning about
metaphorical agents’ reasoning, as required for sentences like “My car doesn’t
want to wake up because it thinks it’s Sunday.” From the fact that the car
thinks it’s Sunday, we might want to infer that the car thinks people needn’t
wake up until some relatively late time. (That thought would then be a reason
for not wanting to wake up.) The car’s alleged reasoning would occur within a
simulation cocoon for the car, embedded within a metaphorical pretence cocoon
for the pretence that the car is a person.
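
One way such nestings might be organised is sketched below for the car example (illustrative only; the class and the particular facts are invented, not ATT-Meta data structures), with a simulation cocoon for the car embedded in a pretence cocoon in which the car is a person:

from dataclasses import dataclass, field
from typing import List

@dataclass
class Cocoon:
    kind: str                      # "pretence" (metaphorical) or "simulation" (belief)
    about: str                     # what is pretended, or whose reasoning is simulated
    facts: List[str] = field(default_factory=list)
    children: List["Cocoon"] = field(default_factory=list)

car_simulation = Cocoon(
    kind="simulation",
    about="the car",
    facts=["it is Sunday", "people needn't wake up early on Sunday"])

car_as_person = Cocoon(
    kind="pretence",
    about="the car is a person",
    facts=["the car thinks it is Sunday"],
    children=[car_simulation])

# Reasoning inside car_simulation (e.g. "so there is no need to wake up yet")
# is ascribed to the car within the pretence; conversion rules then carry a
# conclusion such as "the car has a reason for not wanting to wake up"
# back out to the level of the discourse.
print(car_as_person.children[0].facts)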

7 Conclusion

ATT-Meta is a preliminary, implemented demonstration that open-ended de-
scriptions of agents’ mental states (using familiar metaphors) can be handled
computationally. Such descriptions are widespread in mundane discourse but
have not been studied to any great extent in artificial intelligence and computa-
tional linguistics.
The research also shows that personification of non-agents and reasoning
about agents’ metaphorical thoughts can be handled by embedding metaphorical
and belief spaces inside each other, in a way closely related to the embedding
already needed to handle nested belief situations.
More broadly, the research supports the claim that the way to handle novel
manifestations of familiar metaphors is to abandon any insistence on translat-
ing the literal meaning of the utterance into tenor terms; instead, the aim of
processing is to extract useful inferences by whatever route possible, using an
arbitrary mix of within-vehicle and within-tenor reasoning.
ATT-Meta is a preliminary but implemented demonstration of how various
types of uncertainty in metaphor-based reasoning can be handled computation-
ally, and handled in a way that is fully integrated into a general framework for
uncertain reasoning.
References

Attardi, G. & Simi, M. (1994). Proofs in context. In J. Doyle, E. Sandewall & P. Torasso
(Eds), Principles of Knowledge Representation and Reasoning: Proceedings of the
Fourth International Conference, pp. 15–26. (Bonn, Germany, 24–27 May 1994.)
San Mateo, CA: Morgan Kaufmann.
Ballim, A. & Wilks, Y. (1991). Artificial believers: The ascription of belief. Hillsdale,
N.J.: Lawrence Erlbaum.
Barnden, J.A. (1997a). Deceived by metaphor. Behavioral and Brain Sciences, 20(1),
pp. 105–106. Invited Commentary on A.R. Mele’s “Real Self-Deception.”
Barnden, J.A. (1997b). Consciousness and common-sense metaphors of mind. In S.
O’Nuallain, P. McKevitt & E. Mac Aogain (Eds), Two Sciences of Mind: Readings
in Cognitive Science and Consciousness, pp. 311–340. Amsterdam/Philadelphia:
John Benjamins.
Barnden, J.A. (1998). Uncertain reasoning about agents’ beliefs and reasoning. Tech-
nical Report CSRP-98-11, School of Computer Science, The University of Birming-
ham, U.K. Invited submission to a special issue of Artificial Intelligence and Law ,
ed. E. Nissan.
Barnden, J.A. (in press). An AI system for metaphorical reasoning about mental states
in discourse. In Koenig, J-P. (Ed.), Conceptual Structure, Discourse, and Language
II. Stanford, CA: CSLI/Cambridge University Press.
Barnden, J.A., Helmreich, S., Iverson, E. & Stein, G.C. (1994a). An integrated imple-
mentation of simulative, uncertain and metaphorical reasoning about mental states.
In J. Doyle, E. Sandewall & P. Torasso (Eds), Principles of Knowledge Representa-
tion and Reasoning: Proceedings of the Fourth International Conference, pp. 27–38.
(Bonn, Germany, 24–27 May 1994.) San Mateo, CA: Morgan Kaufmann.
Barnden, J.A., Helmreich, S., Iverson, E. & Stein, G.C. (1994b). Combining simulative
and metaphor-based reasoning about beliefs. In Procs. 16th Annual Conference of
the Cognitive Science Society (Atlanta, Georgia, August 1994), pp. 21–26. Hillsdale,
N.J.: Lawrence Erlbaum.
Barnden, J.A., Helmreich, S., Iverson, E. & Stein, G.C. (1996). Artificial intelligence
and metaphors of mind: within-vehicle reasoning and its benefits. Metaphor and
Symbolic Activity, 11(2), pp. 101–123.
Carruthers, P. & Smith, P.K. (Eds). (1996). Theories of Theories of Mind. Cambridge,
UK: Cambridge University Press.
Chalupsky, H. (1993). Using hypothetical reasoning as a method for belief ascription.
J. Experimental and Theoretical Artificial Intelligence, 5 (2&3), pp. 119–133.
Chalupsky, H. (1996). Belief ascription by way of simulative reasoning. Ph.D. Disser-
tation, Department of Computer Science, State University of New York at Buffalo.
Creary, L. G. (1979). Propositional attitudes: Fregean representation and simulative
reasoning. Procs. 6th. Int. Joint Conf. on Artificial Intelligence (Tokyo), pp. 176–
181. Los Altos, CA: Morgan Kaufmann.
Davidson, D. (1979). What metaphors mean. In S. Sacks (Ed.), On Metaphor, pp. 29–
45. U. Chicago Press.
Davies, M & Stone, T. (Eds) (1995). Mental Simulation: Evaluations and Applications.
Oxford, U.K.: Blackwell.
Delgrande, J.P. & Schaub, T.H. (1994). A general approach to specificity in default
reasoning. In J. Doyle, E. Sandewall & P. Torasso (Eds), Principles of Knowledge
Representation and Reasoning: Proceedings of the Fourth International Conference,
pp. 146–157. (Bonn, Germany, 24–27 May 1994.) San Mateo, CA: Morgan Kauf-
mann.
Dinsmore, J. (1991). Partitioned Representations: A Study in mental Representation,
Language Processing and Linguistic Structure. Dordrecht: Kluwer Academic Pub-
lishers.
Haas, A.R. (1986). A syntactic theory of belief and action. Artificial Intelligence, 28,
pp. 245–292.
Hobbs, J.R. (1990). Literature and Cognition. CSLI Lecture Notes, No. 21, Center for
the Study of Language and Information, Stanford University.
Hunter, A. (1994). Defeasible reasoning with structured information. In J. Doyle, E.
Sandewall & P. Torasso (Eds), Principles of Knowledge Representation and Rea-
soning: Proceedings of the Fourth International Conference, pp. 281–292. (Bonn,
Germany, 24–27 May 1994.) San Mateo, CA: Morgan Kaufmann.
Hwang, C.H. & Schubert, L.K. (1993). Episodic logic: a comprehensive, natural repre-
sentation for language understanding. Minds & Machines, 3 (4), pp. 381–419.
Konolige, K. (1986). A deduction model of belief. London: Pitman. Los Altos: Morgan
Kaufmann.
Lakoff, G. (1993). The contemporary theory of metaphor. In A. Ortony (Ed.), Metaphor
and Thought, 2nd edition, pp. 202–251. New York and Cambridge, U.K.: Cambridge
University Press.
Lakoff, G. (1994). What is metaphor? In J.A. Barnden & K.J. Holyoak (Eds.), Advances
in Connectionist and Neural Computation Theory, Vol. 3: Analogy, Metaphor and
Reminding. Norwood, N.J.: Ablex Publishing Corp.
Lakoff, G. & Turner, M. (1989). More than Cool Reason: A Field Guide to Poetic
Metaphor. Chicago: University of Chicago Press.
Loui, R.P. (1987). Defeat among arguments: a system of defeasible inference. Compu-
tational Intelligence, 3, pp. 100–106.
Loui, R.P., Norman, J., Olson, J. & Merrill, A. (1993). A design for reasoning with
policies, precedents, and rationales. In Fourth International Conference on Artifi-
cial Intelligence and Law: Proceedings of the Conference, pp. 202–211. New York:
Association for Computing Machinery.
Mele, A.R. (1997). Real self-deception. Behavioral and Brain Sciences, 20 (1).
Poole, D. (1991). The effect of knowledge on belief: conditioning, specificity and the
lottery paradox in default reasoning. Artificial Intelligence, 49 , pp. 281–307.
Reddy, M.J. (1979). The conduit metaphor—a case of frame conflict in our language
about language. In A. Ortony (Ed.), Metaphor and Thought, Cambridge, UK: Cam-
bridge University Press.
Yen, J., Neches, R. & MacGregor, R. (1991). CLASP: Integrating term subsumption
systems and production systems. IEEE Trans. on Knowledge and Data Engineer-
ing, 3 (1), pp. 25–32.
GAIA: An Experimental Pedagogical Agent for
Exploring Multimodal Interaction

Tom Fenton-Kerr

New Technologies in Teaching and Learning Group (NeTTL)
The University of Sydney, NSW 2006, AUSTRALIA,
tfk@nettl.usyd.edu.au
http://nettl.usyd.edu.au

Abstract. This paper discusses GAIA (Graphic-Audio Interface Agent),
an experimental interface agent used in a pedagogical simulation pro-
gram, REM (the Re-mapping Europa Mission), where the learning task is
the discrimination of specific locations on a series of unlabelled maps. The
agent’s task is to enhance the learning experience by providing timely,
contextual clues mediated through a graphic/audio interface. Factors
that influence such an agent’s ability to provide effective help, such as
modes of agent representation, are discussed in the context of differing
uses requiring alternative mode choices. The experimental context is ex-
plored with an in-depth look at the REM program. The paper concludes
with comments on audio interfaces, suggestions for multimodal agent
design and likely future directions for multimodal agent interfaces.

Introduction
This paper is a preliminary case study of a multimodal interface agent (GAIA:
a graphic-audio interface agent) that makes use of text-to-speech (TTS) com-
munication to assist a user with a task requiring visual point discrimination in
a geographic map with minimal graphic features. The context for this interac-
tion is a work-in-progress prototype development called the Re-mapping Europa
Mission (REM), designed to provide a setting for exploring interface agent ac-
tivity. It uses a task-driven game metaphor to teach users the locations of key
cities on a series of unlabelled maps. REM’s development was partly influenced
by the Mercator project, a study by Gerber et al. (1992) that investigated ways
of developing expertise in map reading. Oviatt (1996) found that users show a
marked preference for multimodal input (i.e. speech, keyboard and gesture) when
interacting with on-screen maps. Although input issues are discussed later in this
paper, its main intent is to deal with pedagogical, cognitive and perceptual issues
concerning interface agent output.
A pedagogical software agent is an autonomous software process, which
occupies the space between human learners and a task to be learned. The
agent’s task is likely to involve offering some kind of proactive, intelligent as-
sistance (Rich, 1996) to aid task completion. Agent software programs are cur-
rently used in a diverse range of pedagogical settings. They occupy roles such
as sophisticated tutor assistants (Frasson et al., 1997; Johnson and Shaw, 1997;
Conati et al., 1997; Schank and Cleary, 1994) offering knowledge-based advice, or
as interface agents acting in a knowledge-free capacity, guiding the user towards
a pre-specified learning goal.
Learning programs that employ pedagogical agents differ from the more ubiq-
uitous information-rich/interactively poor programs in that they can offer re-
active and sometimes corrective responses to user input. ‘Dustin’ (discussed in Schank and Cleary’s Engines for Education, 1994), a language-
learning simulator developed at Northwestern University’s Institute for the
Learning Sciences, gives users access to an online Tutor that can log a user’s
input and provide suitable responses to keep the student on track. Schank and
Cleary (1994) also propose the use of Searching Agents that can enhance a user’s
understanding of a given topic by locating related information consisting of fur-
ther examples or explaining the general principles involved.
An agent’s effectiveness in providing useful help to a learner is likely to be
determined by factors such as the learning context itself, the chosen mode of com-
munication, and the appropriateness of the interactions that occur. Where the
learning context is a visual sequencing task, for example, an appropriate agent
mode might be an animated graphic that can use gesture and text to guide a
user through a specified sequence of actions. Other contexts make the use of
alternate modes such as audio or multimodal interfaces more appropriate. A pi-
ano tutor program, for example, may need to display a score and simultaneously
play a sequence of notes. In choosing the most suitable means of representing
an agent in a pedagogical setting, an instructional designer needs to look closely
at the learning objectives to be met, the scope of help being offered, and the
best means of communicating that help. Factors such as the teaching paradigm
employed (e.g. problem-based learning) and a learner’s prerequisite knowledge
may have a major influence on the final design choice.
The next section is a general discussion about agent representation in a ped-
agogical context. It is followed by an examination of the REM program in depth.
Design and implementation aspects of the interface agent’s role in assisting users
to locate key cities are then discussed. Finally, future multimodal interface de-
velopment in learning contexts such as second language acquisition programs
and any learning programs where interaction is not limited to graphic or text
modes is considered.

Representation of Pedagogical Interface Agents

Pedagogical interface agents provide a link between a learner and a computer-
based learning task. In fulfilling this task they need to be represented to a user
in some way. They may also need to assume a communication mode appropriate
to the task at hand in order to provide the best possible level of help.
Interface agents are often represented in animated graphic form (Rickel and
Johnson, 1997). They can make use of a range of simultaneous communication
modes such as text, TTS or recorded speech, and gestures (Kono et al., 1998).
Agents may also be equipped to deal with user input such as speech, variable
text, simple mouse clicks or their combinations (Rudnicky, 1993; Oviatt, 1996).
These highly complex interactions can give us a sense that they are implicitly
being orchestrated by an organized intelligence of some kind. Such perceptions
in turn demand a believable agent representation if we are to accept and act
on the advice being offered in a learning setting. Bates (1994) asserts that such
believability requires the incorporation of human-like traits such as emotion and
desire. Isbister (1995) believes that a user’s perception of agent intelligence is a
factor in creating believable interface characters.
In a sense, graphic anthropomorphic agents seek to model this perceived or-
ganized intelligence by manifesting it in a human-like physical form. On one level,
this acts to make such agents acceptably believable to a user and is therefore
likely to enhance the interaction in a positive way. Unfortunately, users can also
imbue anthropomorphic agents with abilities and intelligence way beyond their
true capability. Hofstadter (1995) calls this the ‘Eliza effect’ (after J. Weizenbaum’s ELIZA program written in the 1960s), defining it as ‘the
susceptibility of people to read far more understanding than is warranted into
strings of symbols - especially words - strung together by computers’. Although
Hofstadter is emphasizing the text mode here, the ‘Eliza effect’ can be seen in
almost all modes of human/computer interaction. King (1995) comments that
users perceive anthropomorphic representations of agents as having ‘intrinsic
qualities and abilities which the software controlling the agent cannot possibly
achieve.’ In a pedagogical setting this effect can have a negative impact on the
learning experience. Susceptible users are likely to have unrealistic expectations
of an agent’s potential to help them in a useful way. When the expected reve-
lations are not forthcoming, a user may ignore or trivialize any future help or
suggestions given out, even if it is obviously or logically in their interests to act
on such advice.
Graphic agents inevitably call attention to themselves when represented on a
screen. Where their task is to take the role of a magister, instructing a user about
physical aspects of a particular graphic through gesture, for example, a graphic
mode may be the most suitable. If the task is to interrupt some action that may
cause damage, such as inadvertently trashing files, then an agent expressing an
alert in graphic form is probably the best way of getting a user’s attention. One
situation where a graphic representation mode might not be the best choice is
where a user needs to give his or her attention to a task requiring visual discrim-
ination while receiving instructions or assistance from an agent. In a converse
sense, vision-impaired users might rely on an audio agent as a primary source of
both information and interactive feedback. Currently, driver navigation systems
frequently make use of audio agents to provide directions and warnings, freeing
the driver from the need to visually reference a map. Nagao and Rekimoto (1996)
have integrated an audio interface into their location-aware WalkNavi naviga-
tion/guidance system that can integrate linguistic and non-linguistic contexts in
real world situations. Their self-defined ‘augmented reality’ recognizes natural
language input and responds using various modes such as graphic maps, text
and TTS instructions or explanations.
The key element in all modes of agent communication seems to be consis-
tency of representation. Users need to know that any advice is coming from the
same reliable source. If graphic agents make frequent changes to their on-screen
physical form, a user can soon get confused about just ‘who’ is offering them
help. Conversely, audio interface agents that make use of a fairly consistent and
characteristic voice can provide certainty to the user about the source and re-
liability of their communication. Audio agents can, of course, vary parameters
such as volume, tempo, prosody and spatial displacement to modify or emphasize
speech. Such modifications are appropriate where there is an obvious need for
the expression of emotion (Picard, 1995) or intention, or simply to create believ-
able rapport-building conversation. (See ‘Future Developments’ for a discussion
of enhancements to audio-based interfaces.)
Where an interface agent makes use of multiple modes of communication,
believability seems to be retained where at least one mode maintains a consis-
tent form. GAIA, the agent from the REM program described in the following
section makes use of a multimodal (graphic/audio) approach, but uses a consis-
tent characteristic voice for communicating useful feedback, whether the agent’s
graphic form is visible or not.

REM - The Re-mapping Europa Mission

REM acts as a test-bed for implementing an interface agent and exploring its
interaction with a user. A game metaphor was chosen to provide a setting that
would (hopefully) be very engaging, but general enough to be easily mapped
onto other learning situations.

A Rationale for REM’s Design and Implementation

REM’s genesis is a synthesis of two concepts: The first is the type of interface
interaction that occurs in a computer flight simulation where the task is to land
a fighter on the deck of an aircraft carrier (an example is Graphic Simulation’s F/A-18 Hornet). Apart from flying the plane, a pilot
can seek the help of an audio interface agent (a ‘Landing Systems Officer’- LSO)
while attempting a landing. As the simulation progresses in real time, the LSO
gives audio instructions on whether the pilot is too low or high, too fast or slow,
and offers reminders about dropping the undercarriage and hook. As the pilot
is probably already suffering from cognitive overload just flying the plane, such
advice needs to be given in a mode that can be taken in and acted upon without
adding to the visual ‘clutter’ in any way. An audio interface seems to be the best
solution for instantaneous instructional delivery in this case.
The second concept relates to a toolkit for exploring agent designs, imple-
mented by Sloman and Poli (1995). The SIM-AGENT toolkit is ‘intended to

support exploration of design options for one or more agents interacting in dis-
crete time’ (ibid., p. 392). Sloman and Poli used the toolkit to conduct a number of simulation
experiments, some of which simulate cooperative behaviour between two agents
- the ‘Blind/Lazy Scenario’. In this scheme ‘there is a flat 2-D world inhab-
ited by two agents, a blind agent and a lazy one. The blind agent can move in
the world, can send messages and can receive messages sent by the lazy agent,
but cannot perceive where the other agent is. The lazy agent can perceive the
(roughly quantized) relative position of the blind agent and can send messages,
but cannot move’ (ibid., p. 401). The stated objective of the experiment is to see whether
rules can be evolved allowing for cooperative behaviour and resulting in the two
‘robots’ getting together. In a very general sense, the task maps loosely onto the
aircraft landing task described above, and the map-point approximation task
that drives REM.
REM’s design is an attempt to use elements of these concepts in a pedagogical
setting. Its interface design is predicated on the idea that the primary means of
instruction or help be available in a single (audio) modality. The user takes
the part of the ‘blind’ agent described above, receiving audio instructions (or
stereophonic audio tones - described below) from the ‘lazy’ agent, played by
GAIA. It should be noted that GAIA represents a simulation of an artificially
intelligent (AI) interface agent. REM’s intent is not to develop new approaches
in AI architectures, but rather to provide a setting where agents using simulated
AI techniques can be implemented to explore instructional delivery issues in
completion of a learning task.

REM’s Architecture

REM has existed in three different forms since its inception. An early prototype
built in HyperCard on the Mac OS using Plaintalk 1.5 TTS was ported to an
NT 4.0 system and re-programmed for cross-platform use in Macromedia Direc-
tor’s Lingo language. The current web-based version uses elements of Microsoft’s
Agent software, executed in JavaScript and VBScript, to drive the agent’s inter-
action, including text and TTS output. Geographic information currently sup-
plied by a simulated ‘database agent’ embedded within the HTML page script,
will come from a true relational database, accessed by GAIA as required, in the
next version of REM.
User input in REM consists of basic navigation, filling in forms (for per-
sonalized feedback from GAIA), and mouse clicks on the map window. A fairly
straightforward algorithm captures mouse location information, determines
country and proximity-to-target, then builds GAIA’s spoken response as a con-
catenated TTS string. Graphic events, such as moving the cartographic analysis
tool to the click location, and map upgrades are handled in a similar fashion.
A future version will require coordination of multimedia output such as video,
enhanced audio and prosodic TTS.
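To make this flow concrete, the following Python sketch illustrates one way such a click-handling routine might be organized. The names (database_agent.country_at, ui.move_cat_tool, tts.speak) and the data layout are hypothetical stand-ins, not REM's actual JavaScript/VBScript components.

import math
from dataclasses import dataclass

@dataclass
class City:
    name: str
    country: str
    x: int             # map pixel coordinates of the city
    y: int
    locus_radius: int  # radius of the target's locus circle, in pixels

def on_map_click(x, y, target, database_agent, ui, tts):
    """Capture the click location, determine country and proximity, then speak."""
    country = database_agent.country_at(x, y)          # simulated 'database agent' lookup
    distance = math.hypot(x - target.x, y - target.y)  # proximity to the target city

    ui.move_cat_tool(x, y)                             # graphic event: reposition the CAT
    if distance <= target.locus_radius:
        ui.upgrade_map(target)                         # add borders and minor towns

    # GAIA's spoken response is concatenated from short phrase fragments.
    response = " ".join([
        f"You clicked in {country}.",
        f"The target, {target.name}, is in {target.country}.",
        "You have found it!" if distance <= target.locus_radius
        else "Keep looking near your last click.",
    ])
    tts.speak(response)                                # passed to the TTS engine as one utterance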

Scenario: REM’s game scenario involves a post-conflict Europe of the future
(2062 AD). Details of the conflict itself are left deliberately vague in the game’s
introduction. REM takes the role of a peacekeeping authority with the task of re-
constructing areas devastated in the conflict. An initial part of the reconstruction
process is to locate and identify urban areas, provincial boundaries and original
country borders with appropriate labels. The hostilities have had the unfortunate
result of destroying large amounts of mapping data, but the REM authority has
managed to recover some topographic maps with no text labels of any kind,
and with black dots delineating urban concentrations. A user takes the role of a
‘cartographic analyst’ whose task is to locate and label cities chosen from a list
by clicking on the map.

Fig. 1. REM’s map page after successful location of Paris with Cartographic
Analysis Tool (CAT) visible, and Geo-agent GAIA in a separate window.

Getting Help: Users can access help from two sub-modules:

1. A ‘cartographic analysis tool’ (CAT) acts as a multi-purpose graphic aid
with the following functions:
(a) to provide a graticule for making direct geometric readings. Click co-
ordinates are indicated in text form.
(b) to provide additional information about the current region under inspec-
tion such as demographic data and contextual graphics or video.
(c) to signal successful location of the target city with text prompts and
graphics.
2. GAIA, the game’s animated ‘geo-agent’ provides help in the form of TTS
feedback on the proximity of each mouse click to the target city and relevant
country names. GAIA is represented as an animated graphic in a separate
window from the main program, which can be shown or hidden by the user.

Description of Agent Interaction

GAIA’s interaction with a user varies according to the current task. TTS com-
munication was determined to be a promising mode for communicating reactive
feedback to user response, which mainly consists of mouse clicks on a map. Once
a target city has been chosen, GAIA’s task is to provide immediate, personalized
feedback. By consulting a ‘database agent’, GAIA can advise a user of the cor-
rect country name for a target city, then provide appropriate advice on whether
a user has clicked inside the country borders, in addition to guiding a user to the
correct city location. Task success elicits congratulatory remarks and a prompt
to locate further cities on the map. Map details including borders and minor
towns in the surrounding terrain are then added. When all listed cities within a
country have been located, a border outline flashes briefly to signify completion.
Secondary confirmation is provided textually by the CAT and in spoken form
by GAIA.
As users can click anywhere on the map, GAIA needs to be able to contextu-
alize responses accordingly. Where the target city is Paris, for example, a click
to the east of Spain would probably elicit the following: ‘That’s in the Mediter-
ranean and it’s too far right, too low’. GAIA’s response to a click near Calais
might be ‘Yes, that’s France, but that’s a little too high, a little too far left’.
GAIA distinguishes between an absolute ‘too far left/right’ and fuzzy descriptors
such as ‘a little too high/low’, depending on user input. Clicking within a target
city’s locus circle elicits randomized responses such as ‘You’re really warm now!’
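One possible way to realize this phrasing logic is sketched below in Python; the thresholds, phrase lists and function names are assumptions for exposition, not taken from REM's actual implementation.

import random

WARM_RESPONSES = ["You're really warm now!", "Getting very close!", "Almost there!"]

def axis_phrase(offset, neg_word, pos_word, far_threshold=120):
    """Absolute wording ('too far left') for large errors, fuzzy wording for small ones."""
    word = neg_word if offset < 0 else pos_word
    if abs(offset) >= far_threshold:
        return f"too far {word}" if word in ("left", "right") else f"too {word}"
    return f"a little too {word}"

def proximity_feedback(dx, dy, region_comment, locus_radius=15):
    """dx, dy: click position minus target position, in map pixels (y grows downwards)."""
    if dx * dx + dy * dy <= locus_radius ** 2:
        return random.choice(WARM_RESPONSES)      # inside the target city's locus circle
    horizontal = axis_phrase(dx, "left", "right")
    vertical = axis_phrase(dy, "high", "low")
    return f"{region_comment} {horizontal}, {vertical}."

# Example: a click well to the east of and slightly below the target city.
print(proximity_feedback(200, 90, "That's in the Mediterranean and it's"))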

Alpha Testing of REM

Although the REM program (with a rather more complex learning objective -
currently in development) has yet to be formally evaluated for pedagogical effec-
tiveness, two alpha tests of the system were conducted at the end of 1997. The
first, with the purpose of evaluating the viability of REM running on different
platforms, was carried out by volunteer NeTTL staff. Versions of the program
were run successfully on both NT 4.0 and Mac OS systems, making use of differ-
ent shell programs and TTS engines. The results showed consistency in program
execution and graphic displays but variability in the quality of the spoken out-
put produced on each system. As GAIA represents a female assistant, her ‘audio
presence’ relies on the availability of realistic female TTS voices. At the time of
writing, the MacinTalk Pro high-end female voices (Victoria and Agnes) seem to
provide a better representation for our purposes than the female voices (Lernout
and Hauspie’s TTS) used in the NT 4.0 OS version.
The second alpha test was a ‘dry run’ of two experiments designed to evaluate
the effectiveness of different modes of audio feedback to a user, using volunteer
testers as subjects. In the first experiment, feedback was provided through head-
phones in the form of a variable audio tone coupled with left/right stereophonic
input. Testers were asked to locate target cities by moving a mouse over an un-
labelled map in response to a rising or falling tone that could also pan from one
ear to the other. Successful trials were indicated by location of the point that
produced the highest tone that was simultaneously perceived as ‘most central’
(i.e. ‘localization in the vertical median plane’ - Hendrix, 1994:12), coinciding
with the target. No form of spoken or textual feedback was available until a
target city had been successfully located. The testers were not temporally con-
strained in any way (although they will be in the formal evaluation), but were asked
to find the target ‘as quickly as possible’. Results of this preliminary experiment
support the idea that where the task is a straightforward procedural one, the
‘tonal feedback’ approach is a very efficient way of quickly locating a fixed point
on an unlabelled map.
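One possible mouse-to-audio mapping consistent with this description is sketched in Python below; the exact mapping, frequency range and pan law used in the experiment are not specified in the text, so the values here are illustrative assumptions.

def tone_parameters(dx, dy, max_offset=400.0, min_freq=220.0, max_freq=880.0):
    """
    dx, dy: mouse position minus target position, in map pixels.
    Returns (frequency_hz, pan), where pan runs from -1.0 (full left) to +1.0 (full right).
    The tone is highest when the pointer is on the target and centred when there is
    no horizontal error, matching the 'highest and most central' success criterion.
    """
    distance = min((dx * dx + dy * dy) ** 0.5, max_offset)
    closeness = 1.0 - distance / max_offset        # 1.0 on target, 0.0 at max_offset or beyond
    frequency = min_freq + closeness * (max_freq - min_freq)
    pan = max(-1.0, min(1.0, dx / max_offset))     # which ear means 'move left' is a design choice
    return frequency, pan

# Example: pointer 200 px right of and 90 px below the target city.
print(tone_parameters(200, 90))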
The second preliminary experiment made use of GAIA as the only means of
(TTS) feedback, (apart from testers who were able to fortuitously click on a tar-
get city without needing the benefit of any guidance). Once again, testers were
asked to find the target as quickly as possible, relying on GAIA to provide infor-
mation about the clicked location and accurate hints about where to click next.
The graphic representation of the agent could be shown or hidden, according
to user preference. Results from this preliminary experiment indicate that users
can easily accomplish the same target-location task as that described above by
following spoken instructions, albeit at a noticeably slower rate compared to the
‘tonal feedback’ approach. This is hardly surprising, given the relative simplicity
of the task. In this alpha phase we were more interested in tester attitudes to each
experiment than in making quantitative comparisons of the time taken to locate
a given target. A frequent comment made by testers was that although GAIA’s
feedback was slower than the tonal approach, the agent provided additional, use-
ful geographic information that the first experiment was unable to supply. The
meta-level pedagogical aim here is learning about key European cities so GAIA’s
inclusion of incidental geographic feedback should help a learner to assimilate
new knowledge in an appropriate, contextually relevant form.
Volunteer tester feedback provided some useful insights into how formal eval-
uation of the system might be carried out. The alpha tests were not designed
to provide evaluative data as such, but they have been able to suggest some
ways of evaluating the effectiveness of different modes of interface agents
used in pedagogical settings. A comparative study that contrasts different modes
of agent feedback is planned for the formal evaluation phase.

Factors in Human-Agent Interfacing

Subject interactions with the program during the alpha test phase also allowed
us to make some tentative general suppositions regarding the human/agent in-
terface:

1. Where the primary mode of agent communication is through TTS and where
deictic or gestural information is not exploited graphically, animated graphic
representation of an interface agent is less important and may well be dis-
tracting and/or superfluous in tasks such as the current one.
2. Using a log-in that captures a user name can help to establish a basic rap-
port between the agent and a user. Additional benefits may include accurate
tracking of a user’s input, and providing a user with feedback on past per-
formance.
3. Agents need to be flexible enough in their communication to offer contextual
advice according to the relative accuracy of user input.
4. Randomizing an audio agent’s spoken responses can help to keep conversa-
tion novel and engaging for a user.
Agent characterization and personalized responses seem to be important fac-
tors in making a task enjoyable and easy to learn. In TTS mode, REM’s design
requires a user to listen to GAIA’s instructions in order to infer the next step,
which places the graphic representation of the interface agent in a secondary
role compared to the audio presentation. Future plans for the beta-testing phase
include the addition of contextual ambient sounds and music, and a range of
cultural graphics, videos and demographic information.

Future Developments in Multimodal Pedagogical Agents


The exploitation of audio interface designs in learning programs is certain to form
a large part of any future developments in computer-based learning. Until rela-
tively recently, speech recognition (SR) and TTS technologies were expensive and
notoriously difficult to implement. With access to SR engines that offer continu-
ous speech recognition and both speaker-independent and trainable recognition
modes, the scope for the design of engaging multimodal learning programs has
increased enormously. Flexible TTS engines mean that audio agents will exploit
prosody, volume and speed of delivery to make their communication with users
more effective. Campbell’s (1996) CHATR Speech Re-Sequencing System in-
dexes phones and their prosodic characteristics to give highly authentic concate-
nated speech synthesis independent of both language and individual speakers.
Prevost (1995) proposes a monologue generation system that models prosodic
output where intonational contours are based on discourse context, building on a
theory of discourse structure proposed by Grosz and Sidner (1986). By modeling
contrastive stress, audio agents could produce realistic intonational phrasing in
their output, enhancing their communicative abilities. Picard (1995:18) believes
such ‘affective’ (emotion-driven) enhancements will allow computers to commu-
nicate in a more natural and social way with humans. Hendrix’s (1994) study
looked at ways of determining and enhancing audio ‘presence’ in virtual envi-
ronments. Enhancements that exploit spatialized sound could amplify an agent’s
utility where the learning context is a virtual space. Creager (1994) discusses the
use of speech interfaces to create a ‘mental dialogue’ between a student and ed-
ucational material, adding pace and narrative to the presentation.
Where a range of media forms part of a learning process, multimodal in-
terface agents will be able to choose the most appropriate mode or modes for
interaction and dynamically construct content-rich dialogs for communication.
Language learning is one area where multimodal agents seem to have a bright
future. Traditionally, tape-based learning meant that lessons had to be recorded
and accessed in a sequential form. This offered learners little control over the
delivery of information and forced them into a passive learning mode. Dynami-
cally constructed audio mediated by an interface agent and delivered by digital
means opens up the field to many new teaching and learning approaches in
second language acquisition.
Future versions of REM and its offshoot developments will make use of several
of the audio enhancements discussed above. A development currently in progress
is the creation of a concatenated-speech synthesis engine based on a locally-
recorded corpus. The aim is to provide an audio agent with both pre-recorded
speech phrases and matched TTS output in a language learning context.

Conclusions

The REM prototype continues to serve as a vehicle for discovering effective
ways of integrating multimodal agents into a learning context. We plan to extend
the application to other learning contexts requiring visual or aural discrimination
and expect to discover more principles and practical approaches along the way.
We anticipate that the governing factor in choosing appropriate communication
modes of agent interaction will largely depend on the context of each learning
task.

References
1. Bates, J., The Role of Emotion in Believable Agents. Communications of the ACM,
Special Issue on Agents (1994).
2. Campbell, N., CHATR: A High-Definition Speech Re-Sequencing System. Proceed-
ings of the 3rd ASA/ASJ Joint Meeting, Hawaii, Dec. 23-28 (1996).
3. Conati, C., Gertner, A., VanLehn, K and Druzdzel, M. J., On-Line Student Mod-
eling for Coached Problem Solving Using Bayesian Networks. Proceedings of the
Sixth International Conference on User Modeling (UM-97), Sardinia, Italy (1997).
4. Creager, W., Simulated Conversations: Speech as an Educational Tool. In The
Future of Speech and Audio in the Interface: A CHI’94 Workshop, Arons, B. and
Mynatt, E., co-convenors. SIGCHI Bulletin, Vol. 26, No. 4, October (1994) 44–48.
5. Frasson, C., Mengelle, T. and Aimeur, E., Using Pedagogical Agents in a Multi-
strategic Intelligent Tutoring System, Proceedings of the Workshop on Pedagogi-
cal Agents, World Conference on Artificial Intelligence in Education (AI-ED’97),
Kobe, Japan (1997).
6. Gerber, R., Lidstone, J. and Nason, R., Modelling Expertise in Map Reading:
Beginnings. International Research in Geographical and Environmental Education,
Volume 1, No. 1 (1992) 31–43.
7. Grosz, B. and Sidner, C., Attention, Intentions and the Structure of Discourse.
Computational Linguistics, Vol. 12, No. 3 (1986) 175–204.
8. Hendrix, C., Exploratory Studies on the Sense of Presence in Virtual Environ-
ments as a Function of Visual and Auditory Display Parameters. M.S.E. Thesis
submitted to the University of Washington (1994).
9. Hofstadter, D., Fluid Concepts and Creative Analogies. The Penguin Press, London
(1992) 157.
10. Isbister, K., Perceived Intelligence and the Design of Computer Characters. M.A
Thesis, Lifelike Computer Characters Conference, Snowbird, Utah, Sept. (1995).
11. Johnson, W. L. and Shaw E., Using Agents to Overcome Deficiencies in Web-Based
CourseWare. Proceedings of the Workshop on Intelligent Educational Systems on
the World Wide Web, 8th World Conference of the AIED Society, Kobe, Japan,
August (1997).
12. King, W., Anthropomorphic Agents: Friend, Foe, or Folly. Technical Memorandum
M-95-1, University of Washington (1995).
13. Kono, Y., Yano, T., Ikeda, T., Chino, T., Suzuki K. and Kanazawa, H., An Inter-
face Agent System Employing an ATMS-based Multimodal Input Interpretation
Method. To appear in the Journal of the Japanese Society for Artificial Intelli-
gence, Vol. 13, No. 2 (in Japanese) (1998).
14. Nagao, K. and Rekimoto, J., Agent Augmented Reality: A Software Agent Meets
the Real World. Proceedings of the Second International Conference of Multiagent
Systems (ICMAS-96) (1996).
15. Oviatt, S., Multimodal Interfaces for Dynamic Interactive Maps. Proceedings of
the Conference on Human Factors in Computing Systems (CHI’96), ACM Press,
New York. (1996) 95–102.
16. Picard, R.W., Affective Computing. MIT Press, Cambridge, Mass. 1997.
17. Prevost, S., Contextual Aspects of Prosody in Monologue Generation. Work-
shop Proceedings, Context in Natural Language Processing (IJCAI-95), Montreal
(1995).
18. Rich, C., Window Sharing with Collaborative Interface Agents. SIGCHI Bulletin,
Vol. 28, No. 1, January (1996).
19. Rickel, J. and Johnson W. L., Intelligent Tutoring in Virtual Reality: A Preliminary
Report. Proceedings of the Eighth World Conference on AI in Education, Kobe,
Japan, August (1997).
20. Rudnicky, A.I., Mode Preference in a Simple Data-retrieval Task. Proceedings of
the ARPA Workshop on Human Language Technology, San Mateo (1993) 364–369.
21. Schank, R., and Cleary, C., Engines for Education. Lawrence Erlbaum Associates
(1996).
22. Sloman, A. and Poli, R., SIM-AGENT: A Toolkit for Exploring Agent Designs.
In Wooldridge, M., Muller, J., and Tambe, M., editors, Intelligent Agents II: Pro-
ceedings of the IJCAI ’95 Workshop ATAL, August, 1995. Springer-Verlag, Berlin
(1996) 392–407.
When Agents Meet Cross-Cultural Metaphor: Can
They Be Equipped to Parse and Generate It?

Patricia O’Neill-Brown
U.S. Department of Commerce, Manager, Japan Technology Program, 14th & Constitution
Ave. NW, Washington, DC
PONeillBrown@doc.gov

Abstract. There is a growing awareness in the natural language processing
community that metaphor is ubiquitous in language and thought, that it is not
“rare” or “special,” and as such, ought best be accounted for within a general
theory of meaning. The computing environment does not escape metaphor’s
ubiquitous hold: it is, at all levels, metaphorical. Systems are being designed by
a diverse array of individuals who come to the programming task with differing
views of metaphorical meaning attribution. Thrown into the mix are users, who
also exhibit such diversity. This paper presents the findings of a study that
demonstrates that second language (L2) learners have difficulty understanding
and producing L2 metaphor and argues that as agents step into and attempt to
operate in diverse environments, they will encounter stumbling blocks in
effectively interacting with the environment, other agents and humans if not
equipped with adaptive communicative features.

1 The Ubiquity of Metaphor in Language and Thought

Metaphor is central to language and thought. Therefore, any system that attempts to
handle communicative acts must account for metaphor. Going back to The Philosophy
of Rhetoric (1936), Richards asserts that “human cognition is basically metaphoric in
nature rather than primarily literal, that the metaphors of our language actually derive
from an interaction of thoughts” and that “metaphor is not a cosmetic rhetorical
device or a stylistic ornament, but is an omnipresent principle of thought” (Johnson
1981:18-19). Similarly, Black held the view that metaphorical statements are not
replaceable by literal statements of comparison (Black 1962:31-37). It was not until
Reddy (1979) and then Lakoff and Johnson’s landmark study, Metaphors We Live By
(1980), that these views could be supported by data. These works demonstrate,
through copious examples, that metaphor shapes and influences our everyday
experience.
Once it was accepted that metaphor is ubiquitous in language and thought,
metaphor could be cast within a general theory of meaning. Indeed, the computational
models which have most effectively dealt with metaphor are those that have treated
metaphor in this manner. Due to the ubiquity of metaphor in language, therefore, it is
not a matter of “if agents encounter metaphor in a system,” or of agents perhaps
desiring to take advantage of metaphorical means, but rather a necessity that agents be
able to parse and generate metaphor. Therefore, the questions become, “when, how,
and what forms will the metaphors take and how will the agent respond to as well as
produce them?” One type of metaphor that agents will have to handle is cross-cultural
metaphor.

2 Metaphor as Language and Culture Specific


In Metaphors We Live By, Lakoff and Johnson analyze metaphors which are a part
of our everyday language, and demonstrate how they structure our ordinary
conceptual systems (Lakoff and Johnson 1980:139). The metaphors examined are
those that occur in American English, and therefore, the concepts of the culture they
discuss are the concepts of the culture shared by speakers of American English.
Lakoff and Johnson's work has prompted others to inquire into metaphor in other
languages. The findings of such research suggest that not only is metaphor
ubiquitous in other languages, but that metaphor is uniquely structured across
languages.
According to the cognitive linguists like Sweetser, cognition is a key, if not the
primary element in how language is created, structured and organized (Sweetser
1990). If cognition is a key factor in determining how categories are shaped, then the
idea of only an objective reality defining and shaping categories does not hold.
Instead, what we observe is human beings and human beings in diverse cultures
perceiving, counting, characterizing and categorizing things in their cultures in their
own unique ways. This means that every culture has linguistic structures that are
unique to it.
This turns out to be the case. For instance, the denotative values of English
structures may equate with the same denotations of Chinese structures, while their
connotative values may not show a one-to-one equivalence. For example, in English,
there exists the conceptual metaphor that “being happy is being off the ground,” as in:

1. I was flying high.


2. She was on cloud nine.
3. After the exam, I was walking on air for days.
(Yu 1995:73).

If we take a look at Chinese, the structures, Qing-fu (“light and floating”); qing-
piao (“light and drifting”); and piao-fu (“drifting and floating”), denote the idea of
“being off the ground,” yet connote not happiness, but rather, complacency, pride and
a lack of self-control, conjuring up for the native speaker of Chinese, the concept of
frivolity and superficiality. It is true that sometimes “being off the ground” in Chinese
is equated with the state of being happy. The expression Teng yun jia wu (“ride on
a cloud”) sometimes is used to describe happiness about major progress or success.
However, the “being off the ground” Chinese metaphors have both positive and
negative values, whereas the English “being off the ground” metaphors are positive,
and not negative. Hence, this is a case that demonstrates that there is not necessarily a
one-to-one equivalency of metaphorical meaning across languages.
Similarly, Mohawk contains metaphors that have no English metaphorical
equivalents. An example of a metonymic structure found in Mohawk that does not
appear in English is:

Teyahsútha
te-ye-yahs-út-ha
DU-she-cross-attach, put onto-ASP
she attaches a cross
She is Catholic.
(Bonvillain 1989:187)

In Chagga, a Bantu language of Tanzania, the metaphoric domains for talking
about lust and sex are “eating” and “heat” (Emantatian 1995:169). While these
domains are also used in English to express sexual metaphors, Chagga has encodings
within these domains which do not appear in English. A case in point is that in
Chagga, a man can identify a woman as any food and apply the attributes of food to
her. A man can say of a woman:

nékesúka chá ngéra


she tastes like stale mbege.
She's no fun as a sexual partner.

Japanese has metaphors that do not have metaphorical counterparts in English. For
instance, the Japanese verb, nagareru, in one of its literal senses means “to flow,” as
in “the river flows,” but when it is used metaphorically, it can mean metaphorically
“passed;” “drenched;” “spread” or “forfeited”:

1. Kono machi ni utsuri sunde kara itsu no ma ni ka goju nen ijo no gabbi ga
nagaremashita.
Since I moved to this town, before I knew it, more than 50 years’ time had
passed.

2. Kabi ni nagareta seikatsu o keiken shite shimau to shisso na kurashi ga deki
kuku naru.
If you end up experiencing a life drenched in luxury, it gets to be hard to be able
to lead a simple life.

3. Miyako no chikaku ni subarashi to ko ga iru rashi to iu uwasa ga
nagareta.
The rumor spread that an outstanding potter seemed to be living near the capital.

4. Asu made ni o kane o shichiya ni motte ika nai to kamera ga nagarete shimau.
If you don’t bring the money to the pawn shop by tomorrow, the camera will
wind up being forfeited.

The English verb, “to flow,” has none of these metaphorical senses.
2.1 Second Language Learners and Metaphor in the Second Language (L2)

Research has shown that generally, L2 learners cannot understand L2 metaphor
when they encounter it, particularly metaphors which do not have counterparts in the
first language (L1) (Danesi 1992; Irujo 1986, 1993). Studies have indicated that L2
learners conceptualize the metaphoric domain of the L2 to be exactly the same as the
L1 (Tanaka and Abe 1984). The research described here has confirmed that this is the
case with L2 learners of Japanese. It has been found through an experiment, that L2
learners of Japanese have difficulty understanding Japanese metaphor. Furthermore,
the experiment proves that when provided with the type of instruction developed for
the experiment, which teaches subjects how to decode metaphor, more success at
understanding metaphor in the L2 is achieved.

2.2 A Computational Model for Understanding and Generating Japanese Metaphor

There have been several implementations in natural language understanding
systems that handle the metaphorical along with the literal senses of words,
principally Way (1991), Martin (1990), and Veronis and Ide (1995). In these models,
metaphor is not considered special. Some rely on an analysis of core meaning (Martin
1990), while others rely on context alone (Veronis and Ide 1995). The approach taken
here is a core meaning plus context approach.
The task in the experiment reported on here was for subjects to correctly provide
the metaphorical and literal senses of Japanese verbs (O'Neill-Brown 1998). Correctly
providing the sense meant that they provided the English equivalent which 1)
accurately captured the Japanese concept and 2) made sense in English. Verbs were
selected since verbs tend to be the most polysemous of lexical items, and therefore,
tend to be fertile for bearing the metaphorical senses of structures. Indeed, for
English, the verb has been found to be relatively “mutable,” meaning that its reading
is in large part determined by the types of objects it is paired with (Gentner and
France 1988). The research carried out for purposes of the experiment reported on
here also revealed that Japanese verbs have the same sort of “mutability.”
The test was administered to a group of eighteen L2 learners of Japanese, ranging from
beginning (less than one year) to advanced (more than four years) levels of Japanese,
and the scores were analyzed with a one-way Within-Subject Analysis of Variance (ANOVA). The Within-Subject
ANOVA is very common in the field of language acquisition, since what is
commonly tested for in the field is whether students demonstrate differences in a “no
instruction” versus “instruction” condition. This is what was tested for in this
experiment. This is a significant question to language acquisition theorists because the
paradigm in the field is to question whether any form of direct learning is necessary.
Language acquisition theorists, believing that language is best “acquired” and not
learned, have to see hard evidence before they are convinced that language instruction
is necessary.
The exercises in the experiment consisted of two exercises for each of the seven
verbs taught. The first exercise, the control condition, provided the Japanese sentence
and prompted the learner for the correct reading of the verb. The second exercise, the
experimental condition, contained instruction based on the model developed here for
understanding Japanese metaphor to assist learners in decoding the metaphorical
statements. To control for carryover effects into Exercise 2 from Exercise 1 for each
verb, different sentence examples were used. In other words, if the same sentence
examples were used for both Exercise 1 and Exercise 2 for each verb, then subjects
could have had an advantage on Exercise 2 since they had already been exposed to the
same sentence examples immediately before, in Exercise 1. The research question was
whether the students performed better on Exercise 1 or Exercise 2. The experiment
took the form of a web-based program and can be found at
http://www.mntva.com/pobtest.
These are examples of the exercises in the control condition. This is for the verb,
ataru, which, in its literal, prototypical sense means “to target.”

1. Terebi ni osareppanashi no eiga da ga ko no sakuhin wa atari ni atatte
renjitsu ooiri manin da.
Translation for verb, atari ni atatte :

2. Eigo no jikan ni yomi ga atatta.


Translation for verb, atatta:

Acceptable answers for question 1 would have been “was a hit,” the sentence, in
English, reading as “Movies are always crowded out by television, but this production
was a hit and day after day the theaters were packed.” For question 2, an acceptable
answer would have been “fell upon me,” the sentence reading, “During the English
period, the reading fell upon me.”
It was predicted that when asked for a metaphorical sense in the control condition,
subjects would provide the prototypical sense of the verb, which is typically a literal
sense. This turned out to be the case. Especially for the beginners, in the control
condition, the default did seem to be to answer with the literal, most prototypical
sense of the verb. All of the subjects had higher combined total scores for Exercise 2,
the experimental condition, than Exercise 1, the control condition, as shown in Table
1. Almost all, except in a few instances, had higher scores for the individual Exercise
2 than Exercise 1. The analysis of variance (ANOVA) showed that the effect of
instruction was significant, F=117.05, p = 0.0. This experiment revealed that all
differences among means were significant, p < .05. This demonstrates that second
language learners of Japanese do not exhibit metaphorical competence in Japanese,
and therefore, require instruction.

Table 1. Means and Standard Deviations of the Percentage Correct by Test Condition

Condition        M      SD
Control          18.3   20.2
Experimental     76.5   10.5
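For readers unfamiliar with the design, the following Python sketch shows how a one-way within-subject (repeated-measures) F statistic of this kind can be computed. The per-subject scores in the example are invented for illustration; the study reports only the group means, standard deviations and the F value.

def repeated_measures_f(scores):
    """scores: list of (control, experimental) percent-correct pairs, one per subject."""
    n = len(scores)                       # subjects
    k = len(scores[0])                    # conditions (here 2)
    grand = sum(sum(row) for row in scores) / (n * k)
    cond_means = [sum(row[j] for row in scores) / n for j in range(k)]
    subj_means = [sum(row) / k for row in scores]

    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)     # between-condition SS
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)     # between-subject SS
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_cond - ss_subj                     # condition x subject residual

    df_cond, df_error = k - 1, (k - 1) * (n - 1)
    return (ss_cond / df_cond) / (ss_error / df_error)

# Invented example data (not the study's raw scores):
example = [(10, 70), (25, 80), (5, 65), (30, 85), (20, 75)]
print(round(repeated_measures_f(example), 2))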

The instruction helps the subject to acquire the conceptual structuring of Japanese
metaphor. The method combines a core meaning plus a context approach. The
method 1) presents the core meaning of the word under study to the student; 2)
provides a sentence with the word in it; 3) and asks the student to think of the core
meaning and the other words surrounding it in the sentence to generate a mental
picture of the situation to 4) arrive at the lexical meaning of the word in question.
The context operates on a dual level: the context of the sentence and the context of the
image that the subject conjures up of the situation.
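Schematically, this four-step procedure can be rendered as the Python sketch below; the data structure and wording are paraphrases for illustration, not the format used in the actual lessons.

LESSON = {
    "verb": "ataru",
    "general_meaning": ("someone or something directs attention to, or puts itself at, "
                        "a particular point (the object or goal)"),
    "main_meaning": "to target",
}

def lesson_steps(lesson, sentence):
    """Yield the prompts a student works through for one example sentence."""
    yield f"1. Core ('general') meaning of {lesson['verb']}: {lesson['general_meaning']}."
    yield f"2. Example sentence: {sentence}"
    yield ("3. Think of the general meaning together with the other words in the "
           "sentence, and picture the situation they describe.")
    yield f"4. Now give the English sense of {lesson['verb']} that fits this sentence."

for step in lesson_steps(LESSON, "Kaze ga yoku ataru umizoi no michi o aruite iru to ..."):
    print(step)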
In this way, the learner is led to a place for understanding the meaning of Japanese
words as it is understood by native speakers. In the end, the second language learner
and the first language learner may have the same conceptual understanding of
meaning. However, especially in the beginning phases of learning, L2 learners must
engage in different processes and employ different strategies for arriving at meaning.
For instance, the L2 learner, if not immersed in the language, must conjure up images
to obtain understanding, like the method employed here, which requires the student to
visually simulate the real world situation. This is in contrast to the first language
learner, who, always immersed in the environment, already has the “images” there.
Utilizing the lesson for the first verb to illustrate how the method used in exercise
2 operates, the method begins by introducing the Japanese verb and explaining what
the core meaning of that verb is, which we refer to as the general meaning of the verb.
The reason for choosing the term “the general meaning” as opposed to “core
meaning” is that students are often intimidated by linguistic terminology and a
more generic term is less distracting.

“In this lesson, we will examine the verb ataru in more detail. Ataru has many
meanings, but they are related to one another in a general way. In this lesson we'll show
you how the meanings are related. In general, when you use the verb ataru what
you are doing is conceptualizing a situation in which someone or something is
directing attention to or putting themselves or itself, either physically or mentally,
at a particular point, which can be thought of as the object or the goal. The object
or goal can be a person or a thing.”

The next step is to define for the student the prototypical sense of the verb, which we
refer to in the lesson as the main meaning, for the same reason we refer to general
meaning as opposed to core meaning. Included is a sample sentence with the English
translation:

“The main meaning of ataru, which you may already know, is 'to target.' Here's an
example sentence with this meaning of ataru in it:
1. Zenryoku o agete teki ni atatta.
We targeted the enemy with all our might.

The next stage in the process is to explain to the student how the general meaning of
the verb fits in with the main meaning, using a concrete example:

“Let's look at how our explanation of the general meaning of ataru fits in with the
main meaning of ataru. Remember that we said that in general, when you use the
verb ataru what you are doing is conceptualizing a situation in which someone or
something is directing attention to or putting themselves or itself, either physically
or mentally, at a particular point, which can be thought of as the object or the goal.
The object or goal can be a person or a thing. How does this general meaning fit in
with one of the scenarios covered by the main meaning, which we saw in the
example sentence? Let's look at it this way. When you are ataruing your enemy,
what are you doing? Remember, you are directing your attention to a particular
point and putting yourself at the point something else is at, either physically or
mentally. If you are in one place and your enemy is in another place, and you're
bringing yourself to the enemy, or making the enemy the object of something, what
are you doing? What you are doing is targeting your enemy.”

Using an example of a non-prototypical sense of the verb, it is then explained to
the student that if s/he assigns the main sense to that verb, the sentence will not make
sense. To help the students understand how to come up with the correct sense, we
explain to them that what they have to do is see how that specific example fits in with
the main sense:

“However, every time you see ataru used in a sentence, the main meaning, ‘to
target,’ is not always used. So how can we tell what the meaning is? We can tell by
thinking of the general meaning of ataru and seeing how the parts of the sentence
fit in with this particular sense of ataru. Let's take an example.

2. Kaze ga yoku ataru umizoi no michi o aruite iru to suna ga me ni haitte kuru.

In this sentence if we were to say that its meaning is, ‘When you walk along the
coastal road where the wind targets, the sand gets in your eyes,’ 'the wind targets'
sounds funny in English. So what do we have to do to come up with a better
translation?”

The next step in our lesson is to introduce the students to the procedure of thinking
of what the general meaning of the verb is, then looking at what type of noun the verb
is paired with to determine correct sense:

“We have to look at the type of object that is being used with the verb ataru. The
type of object linked with the verb determines the interpretation you're going to be
thinking of when you see the verb ataru in a particular sentence. This is an
important principle to remember.

So you have to plug in the specific object the verb is being used with and see how it
fits in with one of the scenarios covered by the general meaning of ataru to
determine the correct meaning.

We would ask ourselves these questions: When the wind atarus a road, what is the
wind doing? Let's think. In your mind you should imagine what is happening when
the wind is at the same point that the road is or what it means when the wind is
putting itself at the same point that the road is at.”

Now, the student is brought back to the sentence and asked to think what the verb
would mean in the context of the sentence, in this way, arming her with additional
clues for deciding what the verb means via the other words in the sentence. The
lesson continues as follows:

“We could say tentatively that in this sentence, 'hit' would be the best meaning for
ataru. Then we'd have to ask ourselves if this would be the best translation for
the sentence. Let's see if it is. The sentence, again, is:

Kaze ga yoku ataru umizoi no michi o aruite iru to suna ga me ni haitte kuru.

Let's plug in 'hits' to see if it makes sense in this sentence.

When you walk along the coastal road where the wind hits, the sand gets in your
eyes.

We would determine that 'hits' does make sense in this sentence. We can see that
'hits' fits in with the general meaning of ataru. When one object hits another, the
two of them are in the same place.”

The student is stepped through two more examples, reinforcing the procedure for
arriving at correct meaning. They are then asked to work through the rest of the
exercises themselves. Here are sample questions for ataru:

3. Go shujin ni hara o tate takara to itte kodomo ni atari chirasu mono ja
arimasen.
She shouldn't ______ the children just because she is angry at her husband.

Think about what it means for a mother to atari chirasu her children. Plugging in
the general meaning of the verb, the one that is doing the atari chirasuing is
making the other person the object of something. What would we say that she is
doing to her children?

Translation for verb, atari chirasu:

4. Kono shigoto wa atareba tai shita mono da.


If this business ______, it will be something big.

When business is ataru, what does this mean? What's happening is that the
business meets a particular point, which can be considered the goal. Think about
what the goal of the people running a business would be.

Translation for verb, atareba:

For question 3, an acceptable answer is “take it out on” and for question 4, something
like, “makes it big,” “takes off” or “is successful” are acceptable.
The instructional method does not merely call upon the learner to memorize
Japanese metaphors by rote; rather, the learner is required to embrace a connectionist
approach to lexical acquisition. They are called upon to embody an understanding of
the core meaning of a word, and then dynamically determine its lexical meaning in
context. The L2 learner does not have in place the understanding that there is such a
thing as a core meaning holding the literal and the metaphorical together, or how to,
starting from the core, arrive at the meaning of a lexical item in a context. The
instructional method described here, which has been demonstrated to be effective, is
novel, since second language instructors typically do not take a connectionist
approach to teaching the L2 lexicon. Furthermore, this method has the potential to
enable students to recognize metaphor on their own. After they had been through
both exercises for about three or four verbs, several of the subjects started to perform
better on Exercise 1, though still not better than on Exercise 2.

3 Implications for the Development of Agents which Produce and Generate Metaphor

The findings of this study have implications for the consideration of the design of
agents that can understand and produce metaphor. It is a given that as humans have
problems with understanding cross-cultural metaphor, agents communicating with
other agents from systems will have problems communicating with agents that are
embodied in different metaphorical systems. Agents will have the same challenges as
L2 learners. As they are being designed by people with diverse backgrounds
immersed in different metaphorical systems, cross-cultural agent communication
stands the chance of being strained at all levels if agents do not have some form of
adaptable communicative feature. Whether the agents exchange, interpret or deliver
text, speech or visual icons, it is a linguistic message. Here, language is taken to mean not
only the “words”—the verbal and textual message of exchanges—but all
actions and items involved in the speech act that are used to encode and decode
messages.
The findings reported on here indicate that the programming involved in producing
an agent that is capable of understanding and generating metaphor may not be a mind-
boggling task requiring an omniscient programmer. Whether or not an agent capable
of understanding and generating metaphor can be achieved is a question that is bound to
crop up at this stage in the evolutionary path of agent development. After all, the issue
of “how much context is necessary for understanding” has been a major question in
the field of Artificial Intelligence (AI) since its earliest days. It was once thought in
the field of Artificial Intelligence that one had to account for “the full extent of world
knowledge” for understanding human actions, such as language. There are still those
that claim that models for metaphor understanding must depend on “total” context
and “consider the full extent of background world knowledge all at once” (Veale
1998). However, the instructional method for understanding metaphor described here
shows that the sentence level context plus the image conjured up by it is sufficient for
understanding. The results obtained suggest that on the contrary, total context is not
necessary, just a slice of it, and they further indicate what that slice is.
The results of this study demonstrate that metaphor 1) can be accounted for within
a general theory of meaning and 2) is learnable. Understanding how the second
language learner acquires the lexicon of the L2 brings us near to the development of
viable computational approaches for producing metaphorically competent agents. Just
as the processes for lexical acquisition must be made explicit to the second language
learner, processes for parsing and generation, including the parsing and generation of
metaphor, must be made explicit to a computational entity such as an agent.
The method employed here takes a connectionist approach to the lexicon—
something called a core meaning underlies the metaphorical and literal senses of
words, and lexical meaning is determined on the fly in the context of a situation. In
other words, the learner must memorize some content—the core meaning of a word—
which remains steady—and then step through a procedure for dynamically
understanding the lexical meaning of that word in a context. This method is
representative of the “flexible computing” approach. Checking core meaning and
relating it to other sentential constituents to conjure up an image to arrive at meaning
is procedural. Flexibility comes into the model in the sense that any verb, any
sentence, and any context can be processed. The context is not pre-composed—it is
built and computed on the fly. In addition, the method for understanding Japanese
metaphor is flexible in that it can be applied to languages other than Japanese.
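As a rough illustration of this ‘core meaning plus context’ style of sense resolution, consider the Python sketch below; the semantic-type labels and candidate senses are invented for the example, and a real system would derive them from a lexicon rather than a hard-coded table.

CORE = "X puts itself, or its attention, at the point where Y is"  # core meaning of ataru

# Candidate English senses of 'ataru', keyed by the semantic types of its companions.
SENSES = {
    "natural_force_vs_place":     "to hit",
    "production_vs_audience":     "to be a hit",
    "person_vs_person_in_anger":  "to take it out on",
    "agent_vs_enemy":             "to target",
}

def resolve_sense(subject_type, object_type):
    """Pick the sense whose scenario fits how the subject and object share a point."""
    key = f"{subject_type}_vs_{object_type}"
    return SENSES.get(
        key,
        f"unresolved: apply the core meaning ({CORE}) to a {subject_type} and a {object_type}",
    )

print(resolve_sense("natural_force", "place"))   # -> 'to hit'   (the wind hits the road)
print(resolve_sense("production", "audience"))   # -> 'to be a hit'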
Understanding Japanese metaphor involved developing a method for deriving core
meaning (not described here). This method was informed by a procedure used by
Brugman (1983) to derive core meaning for an English structure, as well as the
phenomena of “the verb mutability effect” uncovered by Gentner and France (1988).
The method was top-down, consisting of taking in and analyzing sentence after
sentence to determine core meaning. In turn, this method was used to develop the
instructional technique; essentially, the procedure in reverse. As described, the
method was bottom-up, starting from the core and generating up, capturing the lexical
meaning through the context of the sentence and the image conjured up by it. By
extension, a “flexible computing” approach similar to the methods for understanding
and producing metaphor described here may be a viable way for developing agents
that are capable of recognizing and generating metaphor.

References
1. Black, M.: Metaphor. In: Models and Metaphors: Studies in Language and Philosophy.
Cornell University Press, Ithaca, New York (1962) 25-47.
2. Bonvillain, N.: Noun Incorporation and Metaphor: Semantic Process in Akwesasne Mohawk.
In: Anthropological Linguistics (1989) 31:3-4.
3. Brugman, C.: Story of Over. Indiana University Linguistics Club, Bloomington Indiana
(1983).
4. Danesi, M.: Metaphorical Competence in Second Language Acquisition Research and
Language Teaching: The Neglected Dimension. In: Alatis, J. (ed.): Georgetown University
Round Table on Languages and Linguistics. Georgetown University Press, Washington,
D.C. (1992).
5. Dirven, R.: Metaphor as a Basic Means for Extending the Lexicon. In: Wolf, P., Dirven, R.
(eds.): The Ubiquity of Metaphor in Language and Thought, 85-119. John Benjamins,
Amsterdam (1985).
6. Emantatian, M.: Metaphor and the Expression of Emotion: The Value of Cross-Cultural
Perspectives. In: Metaphor and Symbolic Activity (1995) 10(3):163-182.
7. Gentner, D., France, I.M.: The Verb Mutability Effect: Studies of the Combinatorial
Semantics of Nouns and Verbs. In: Small, S. (ed.): Lexical Ambiguity Resolution:
Perspectives from Psycholinguistics, Neuropsychology and Artificial Intelligence. Morgan
Kaufmann, San Mateo (1988) 343-382.
8. Hayashi, C.: Dictionary of Japanese Usage Examples. Kyoikusha, Tokyo (1986).
9. Irujo, S.: Don't Put your Leg in Your Mouth: Transfer in the Acquisition of Idioms in Second
Language. In: TESOL Quarterly (1986) 20(2).
10. ___________.: Steering Clear: Avoidance in the Production of Idioms. In: International
Review of Applied Linguistics in Language Teaching (1993).
11. Johnson, M.(ed.): Philosophical Perspectives on Metaphor. University of Minnesota Press,
Minneapolis (1981).
12. Lakoff, G., Johnson, M.: Metaphors We Live By. The University of Chicago Press,
Chicago (1980).
13. Lane, D.M.: HyperStat Online http://www.ruf.rice.edu/~lane/hyperstat/B131558.html;
http://www.ruf.rice.edu/~lane/hyperstat/B131018.html.
14. Martin, J.M.: A Computational Model of Metaphor Interpretation. Academic Press, San
Diego (1990).
15. O'Neill-Brown, P.: A Computational Method for Understanding and Teaching Japanese
Metaphor. Ph.D Dissertation. Georgetown University, Washington, D.C. (1998).
16. Reddy, M.J.: The Conduit Metaphor: A Case Frame of Conflict in our Language About
Language. In: Ortony, A. (ed.): Metaphor and Thought. 2nd edn. Cambridge University
Press, Cambridge (1993) 164-201.
17. Richards, I.A.: The Philosophy of Rhetoric. Oxford University Press, London (1936).
18. Sweetser, E.: From Etymology to Pragmatics: Metaphorical and Cultural Aspects of
Semantic Structure. Cambridge University Press, Cambridge (1990).
19. Tanaka, S., Abe, H.: Conditions on Interlingual Semantic Transfer. In: On TESOL '84: A
Brave New World for TESOL. Teachers of English to Speakers of Other Languages,
Washington, D.C. (1985) 101-120.
20. Veale, T.: Literature Review at the Metaphor Home Page.
http://www.compapp.dcu.ie/~tonyv/trinity/way.html;
http://www.compapp.dcu.ie/~tonyv/trinity/martin.html (1998).
21. Veronis, J., Ide, N.: Large Neural Networks for the Resolution of Ambiguity. In: Saint-
Dizier, P. and Viegas, E. (eds.): Computational Lexical Semantics. Cambridge University
Press, Cambridge (1995) 251-269.
22. Way, E.C.: Knowledge Representation and Metaphor. Kluwer, Boston (1991).
23. Yu, N.: Metaphorical Expressions of Anger and Happiness in English and Chinese. In:
Metaphor and Symbolic Activity. (1995) 10(2):73.
Imitation and Mechanisms of Joint Attention:
A Developmental Structure for Building Social
Skills on a Humanoid Robot

Brian Scassellati

MIT Artificial Intelligence Lab


545 Technology Square
Cambridge MA 02139, USA
scaz@ai.mit.edu
http://www.ai.mit.edu/people/scaz/

Abstract. Adults are extremely adept at recognizing social cues, such
as eye direction or pointing gestures, that establish the basis of joint
attention. These skills serve as the developmental basis for more com-
plex forms of metaphor and analogy by allowing an infant to ground
shared experiences and by assisting in the development of more complex
communication skills. In this chapter, we review some of the evidence
for the developmental course of these joint attention skills from develop-
mental psychology, from disorders of social development such as autism,
and from the evolutionary development of these social skills. We also
describe an on-going research program aimed at testing existing mod-
els of joint attention development by building a human-like robot which
communicates naturally with humans using joint attention.
Our group has constructed an upper-torso humanoid robot, called Cog,
in part to investigate how to build intelligent robotic systems by following
a developmental progression of skills similar to that observed in human
development. Just as a child learns social skills and conventions through
interactions with its parents, our robot will learn to interact with people
using natural social communication. We further consider the critical role
that imitation plays in bootstrapping a system from simple visual behav-
iors to more complex social skills. We will present data from a face and
eye finding system that serves as the basis of this developmental chain,
and an example of how this system can imitate the head movements of
an individual.

1 Motivation

One of the critical precursors to social learning in human development is the


ability to selectively attend to an object of mutual interest. Humans have a
large repertoire of social cues, such as gaze direction, pointing gestures, and
postural cues, that all indicate to an observer which object is currently under
consideration. These abilities, collectively named mechanisms of joint (or shared)
attention, are vital to the normal development of social skills in children. Joint
attention to objects and events in the world serves as the initial mechanism
for infants to share experiences with others and to negotiate shared meanings.
Joint attention is also a mechanism for allowing infants to leverage the skills and
knowledge of an adult caretaker in order to learn about their environment, in
part by allowing the infant to manipulate the behavior of the caretaker and in
part by providing a basis for more complex forms of social communication such
as language and gestures.
Joint attention has been investigated by researchers in a variety of fields.
Experts in child development are interested in these skills as part of the normal
developmental course that infants acquire extremely rapidly, and in a stereotyped
sequence (Scaife & Bruner 1975, Moore & Dunham 1995). Additional work on
the etiology and behavioral manifestations of developmental disorders such as
autism and Asperger’s syndrome have focused on disruptions to joint attention
mechanisms and demonstrated how vital these skills are in our social world
(Cohen & Volkmar 1997, Baron-Cohen 1995). Philosophers have been interested
in joint attention both as an explanation for issues of contextual grounding
and as a precursor to a theory of other minds (Whiten 1991, Dennett 1991).
Evolutionary psychologists and primatologists have focused on the evolution of
these simple social skills throughout the animal kingdom as a means of evaluating
both the presence of theory of mind and as a measure of social functioning
(Povinelli & Preuss 1995, Hauser 1996, Premack 1988).
We have approached joint attention from a slightly different perspective:
the construction of human-like robots that exhibit these social skills (Scassel-
lati 1996). This approach focuses first on the construction of useful real-world
systems that can both recognize and produce normal human social cues, and
second on the evaluation of the complex models of joint attention developed by
other disciplines.
Building machines that can recognize human social cues will provide a flex-
ibility and robustness that current systems lack. While the past few decades
have seen increasingly complex machine learning systems, the systems we have
constructed have failed to approach the flexibility, robustness, and versatility
that humans display. There have been successful systems for extracting envi-
ronmental invariants and exploring static environments, but there have been
few attempts at building systems that learn by interacting with people using
natural, social cues. With advances in embodied systems research, we can now
build systems that are robust enough, safe enough, and stable enough to allow
machines to interact with humans in a learning environment. Constructing a
machine that can recognize the social cues from a human observer allows for
more natural human-machine interaction and creates possibilities for machines
to learn by directly observing untrained human instructors. We believe that by
using a developmental program to build social capabilities we will be able to
achieve a wide range of natural interactions with untrained observers (Brooks,
Ferrell, Irie, Kemp, Marjanovic, Scassellati & Williamson 1998).
Robotics also offers a unique tool to developmental psychology and related
disciplines in evaluating complex interaction models. By implementing these
models in a real-world system, we provide a test bed for manipulating the be-
havioral progression. With an implemented developmental model, we can test
alternative learning and environmental conditions in order to evaluate alterna-
tive intervention and teaching techniques. This investigation of joint attention
asks questions about the development and origins of the complex non-verbal
communication skills that humans so easily master: What is the progression of
skills that humans must acquire to engage in shared attention? When something
goes wrong in this development, as it seems to do in autism, what problems can
occur, and what hope do we have for correcting these problems? What parts of
this complex interplay can be seen in other primates, and what can we learn
about the basis of communication from these comparisons? With a robotic im-
plementation of the theoretical models, we can further these investigations in
previously unavailable directions.
However, building a robot with the complete social skills of a human is a
Herculean task that still resides in the realm of science fiction and not artificial
intelligence. In order to build a successful implementation, we must decompose
the monolithic “social skills module” into manageable pieces. The remainder of
this chapter will be devoted to building a rough consensus of evidence from work
on autism and Asperger’s syndrome, from developmental psychology, and from
evolutionary studies on how this decomposition can best be accomplished. From
this rough consensus, we will outline a program for building a robot that can
recognize and generate simple joint attention behaviors. Finally, we will describe
some of the preliminary steps we have taken with one humanoid robot to build
this developmental program.

2 A Developmental Model of Joint Attention

To build complex social skills, we must have a decomposition of simpler behavioral skills that can be implemented and tested on our robotic system. This
section will first describe why we believe that a decomposition is possible, based
upon evidence from developmental psychology, abnormal psychology, and evolu-
tionary psychology. By studying the way that nature has decomposed this task,
we hope not only to find ways of breaking our computational problem into man-
ageable pieces, but also to explore some of the theories of human development.
We then focus on one module-based decomposition of joint attention skills. With
this as a theoretical basis, we then begin to develop a task-based decomposition
which can be implemented and tested on a robotic system.

2.1 Evidence that Decomposition is Possible

The most relevant studies for our purposes have occurred as developmental and
evolutionary investigations of “theory of mind” (see Whiten (1991) for a collec-
tion of these studies). The most important finding, repeated in many different
forms, is that the mechanisms of joint attention are not a single monolithic sys-
tem. Evidence from childhood development shows that not all mechanisms for
joint attention are present from birth, and there is a stereotypic progression of
skills that occurs in all infants at roughly the same rate (Hobson 1993). For
example, infants are always sensitive to eye direction before they can interpret
and generate pointing gestures.
There are also developmental disorders, such as autism, that limit and frac-
ture the components of this system (Frith 1990). Autism is a pervasive devel-
opmental disorder of unknown etiology that is diagnosed by a set of behav-
ioral criteria centered around abnormal social and communicative skills (DSM
1994, ICD 1993). Individuals with autism tend to have normal sensory and mo-
tor skills, but have difficulty with certain socially relevant tasks. For example,
autistic individuals fail to make appropriate eye contact, and while they can rec-
ognize where a person is looking, they often fail to grasp the implications of this
information. While the deficits of autism certainly cover many other cognitive
abilities, some researchers believe that the missing mechanisms of joint attention
may be critical to the other deficiencies (Baron-Cohen 1995). In comparison to
other mental retardation and developmental disorders (like Williams and Down syndromes), the social deficiencies of autism are quite specific (Karmiloff-Smith,
Klima, Bellugi, Grant & Baron-Cohen 1995).
Evidence from research into the social skills of other animals has also indi-
cated that joint attention can be decomposed into a set of subskills. The same
ontogenetic progression of joint attention skills that is evident in human infants
can also be seen as an evolutionary progression in which the increasingly complex
set of skills can be mapped to animals that are increasingly closer to humans on
a phylogenetic scale (Povinelli & Preuss 1995). For example, skills that infants
acquire early in life, such as sensitivity to eye direction, have been demonstrated
in relatively simple vertebrates, such as snakes (Burghardt & Greene 1990), while
skills that are acquired later tend to appear only in the primates (Whiten 1991).

2.2 A Module-Based Decomposition


As the basis for our implementation of joint attention, we begin with a develop-
mental model from Baron-Cohen (1995). Baron-Cohen’s model gives a coherent
account of the observed developmental stages of joint attention behaviors in both
normal and blind children, the observed deficiencies in joint attention of children
with autism, and a partial explanation of the observed abilities of primates on
joint attention tasks.
Baron-Cohen describes four Fodorian modules: the eye-direction detector
(EDD), the intentionality detector (ID), the shared attention module (SAM),
and the theory-of-mind module (TOMM). In brief, the eye-direction detector
locates eye-like shapes and extrapolates the object that they are focused upon
while the intentionality detector attributes desires and goals to objects that ap-
pear to move under their own volition. The outputs of these two modules (EDD
and ID) are used by the shared attention module to generate representations
and behaviors that link attentional states in the observer to attentional states
in the observed. Finally, the theory-of-mind module acts on the output of SAM
to predict the thoughts and actions of the observed individual.
(Figure 1 panels: Stage #1: Mutual Gaze; Stage #2: Gaze Following; Stage #3: Imperative Pointing; Stage #4: Declarative Pointing.)

Fig. 1. A four-part task-based decomposition of joint attention skills. The capabilities for maintaining mutual gaze lead to the ability of gaze following. Imperative pointing skills, combined with gaze following, result in declarative pointing. For further information, see section 2.3.

This module-based description is a useful analysis tool, but does not provide
sufficient detail for a robotic implementation. To build a portion of joint behav-
ior skills, we require a set of observable behaviors that can be used to evaluate
the performance of the system incrementally. We require a task-level decom-
position of necessary skills and the developmental mechanisms that provide for
transition between stages. Our current work is on identifying and implementing
a developmental account of one possible skill decomposition, an account which
relies heavily upon imitation.

2.3 A Task-Based Decomposition


The task-based skill decomposition that we are pursuing can be broken down
into four stages: maintaining eye contact, gaze following, imperative pointing,
and declarative pointing. Figure 1 shows simple cartoon illustrations of these
four skills. The smaller figure on the left in each cartoon represents the novice
and the larger figure on the right represents the caretaker. In terms of Baron-
Cohen’s model, we are implementing a vertical slice of behaviors from parts of
EDD, ID, and SAM that additionally matches the observed phylogeny of these
skills.
The first step in producing mechanisms of joint attention is the recognition
and maintenance of eye contact. Many animals have been shown to be extremely
sensitive to eyes that are directed at them, including reptiles like the hognosed
snake (Burghardt & Greene 1990), avians like the chicken (Scaife 1976) and the
plover (Ristau 1991), and all primates (Cheney & Seyfarth 1990). Identifying
whether or not something is looking at you provides an obvious evolutionary
advantage in escaping predators, but in many mammals, especially primates, the
recognition that another is looking at you carries social significance. In monkeys,
eye contact is significant for maintaining a social dominance hierarchy (Cheney
& Seyfarth 1990). In humans, the reliance on eye contact as a social cue is
even more striking. Infants have a strong preference for looking at human faces
and eyes, and maintain (and thus recognize) eye contact within the first three
months. Maintenance of eye contact will be the testable behavioral goal for a
system in this stage.
The second step is to engage in joint attention through gaze following. Gaze
following is the rapid alternation between looking at the eyes of the individual
and looking at the distal object of their attention. While many animals are sen-
sitive to eyes that are gazing directly at them, only primates show the capability
to extrapolate from the direction of gaze to a distal object, and only the great
apes will extrapolate to an object that is outside their immediate field of view
(Povinelli & Preuss 1995).1 This evolutionary progression is also mirrored in the
ontogeny of social skills. At least by the age of three months, human infants dis-
play maintenance (and thus recognition) of eye contact. However, it is not until
nine months that children begin to exhibit gaze following, and not until eighteen
months that children will follow gaze outside their field of view (Baron-Cohen
1995). Gaze following is an extremely useful imitative gesture which serves to
focus the child’s attention on the same object that the caregiver is attending to.
This simplest form of joint attention is believed to be critical for social scaffolding (Thelen & Smith 1994), development of theory of mind (Baron-Cohen 1995),
and providing shared meaning for learning language (Wood, Bruner & Ross
1976). This functional imitation appears simple, but a complete implementation
of gaze following involves many separate proficiencies. Imitation is a developing
research area in the computational sciences (for excellent examples, see Dautenhahn 1994, Hayes & Demiris 1994, Dautenhahn 1997).
The third step in our account is imperative pointing. Imperative pointing is
a gesture used to obtain an object that is out of reach by pointing at that object.
This behavior is first seen in human children at about nine months of age (Baron-
Cohen 1995), and occurs in many monkeys (Cheney & Seyfarth 1990). However,
there is nothing particular to the infant’s behavior that is different from a simple
reach – the infant is initially as likely to perform imperative pointing when the
caretaker is attending to the infant as when the caretaker is looking in the other
direction or when the caretaker is not present. The caregiver’s interpretation of
the infant's gesture provides the shared meaning. Over time, the infant learns when
the gesture is appropriate. One can imagine the child learning this behavior
through simple reinforcement. The reaching motion of the infant is interpreted
by the adult as a request for a specific object, which the adult then acquires
[Footnote 1: The terms "monkey" and "ape" are not to be used interchangeably. Apes include orangutans, gorillas, bonobos, chimpanzees, and humans. All apes are monkeys, but not all monkeys are apes.]
and provides to the child. The acquisition of the desired object serves as positive
reinforcement for the contextual setting that preceded the reward (the reaching
action in the presence of the attentive caretaker). Generation of this behavior is
then a simple extension of a primitive reaching behavior.
The fourth step is the advent of declarative pointing. Declarative pointing is
characterized by an extended arm and index finger designed to draw attention
to a distal object. Unlike imperative pointing, it is not necessarily a request
for an object; children often use declarative pointing to draw attention to ob-
jects that are clearly outside their reach, such as the sun or an airplane passing
overhead. Declarative pointing also only occurs under specific social conditions;
children do not point unless there is someone to observe their action. We propose
that imitation is a critical factor in the ontogeny of declarative pointing. This
is an appealing speculation from both an ontogenetic and a phylogenetic standpoint. From an ontogenetic perspective, declarative pointing begins to emerge at
approximately 12 months in human infants, which is also the same time that
other complex imitative behaviors such as pretend play begin to emerge. From
the phylogenetic perspective, declarative pointing has not been identified in any
non-human primate (Premack 1988). This also corresponds to the phylogeny of
imitation; no non-human primate has ever been documented to display imitative
behavior under general conditions (Hauser 1996). We propose that the child first
learns to recognize the declarative pointing gestures of the adult and then imi-
tates those gestures in order to produce declarative pointing. The recognition of
pointing gestures builds upon the competencies of gaze following and imperative
pointing; the infrastructure for extrapolation from a body cue is already present
from gaze following, it need only be applied to a new domain. The generation of
declarative pointing gestures requires the same motor capabilities as imperative
pointing, but it must be utilized in specific social circumstances. By imitating
the successful pointing gestures of other individuals, the child can learn to make
use of similar gestures.

3 Implementing Joint Attention

To build a system that can both recognize and produce the joint attention skills
outlined above, we require a system with both human-like sensory systems and
motor abilities. The Cog project at the MIT Artificial Intelligence Laboratory
has been constructing an upper-torso humanoid robot, called Cog, in part to
investigate how to build intelligent robotic systems by following a developmental
progression of skills similar to that observed in human development (Brooks &
Stein 1994, Brooks et al. 1998). In the past two years, a basic repertoire of
perceptual capabilities and sensory-motor skills has been implemented on the
robot (see Brooks et al. (1998) for a review).
The humanoid robot Cog has twenty-one degrees of freedom to approximate
human movement, and a variety of sensory systems that approximate human
senses, including visual, vestibular, auditory, and tactile senses.
Fig. 2. Images obtained from the peripheral (top) and foveal (bottom) cameras on Cog.
The peripheral image is used for detecting salient objects worthy of visual attention,
while the foveal image is used to obtain high resolution detail of those objects.

Cog's visual system is designed to mimic some of the capabilities of the human visual system, including binocularity and space-variant sensing (Scassellati 1998a). To allow for both a wide field of view and high resolution vision, there are two cameras per
eye, one which captures a wide-angle view of the periphery (approximately 110◦
field of view) and one which captures a narrow-angle view of the central (foveal)
area (approximately 20◦ field of view with the same resolution), as shown in
Figure 2. Two additional copies of this active vision system are used as desktop
development platforms, and were used to collect some of the data reported in
the following sections. While there are minor differences between the platforms,
these differences are not important to the work reported here. Cog also has a
three degree of freedom neck and a pair of human-like arms. Each arm has six
compliant degrees of freedom, each of which is powered by a series elastic actu-
ator (Pratt & Williamson 1995) which provides a sensible “natural” behavior: if
it is disturbed, or hits an obstacle, the arm simply deflects out of the way.

3.1 Implementing Maintenance of Eye Contact

Implementing the first stage in our developmental framework, recognizing and responding to eye contact, requires mostly perceptual abilities. We require at
least that the robot be capable of (1) finding faces, (2) determining the location
of the eye within the face, and (3) determining if the eye is looking at the robot.
The only necessary motor abilities are to maintain a fixation point.
(Figure 3 block diagram components: Frame Grabber, Prefilter, Face Detector, Motion Detector.)
Fig. 3. Block diagram for the pre-filtering stage of face detection. The pre-filter selects
target locations based upon motion information and past history. The pre-filter allows
face detection to occur at 20 Hz with little accuracy loss.

Many computational methods of face detection on static images have been investigated by the machine vision community, for example (Sung & Poggio 1994, Rowley, Baluja & Kanade 1995). However, these methods are computationally intensive, and current implementations do not operate in real time. A simpler strategy for finding faces can operate in real time and produce good results under dynamic conditions (Scassellati 1998b). The strategy that we use
is based on the ratio-template method of object detection reported by Sinha
(1994). In summary, finding a face is accomplished with the following five steps:

1. Use a motion-based pre-filter to identify potential face locations in the peripheral image.
2. Use a ratio-template based face detector to identify target faces.
3. Saccade to the target using a learned sensory-motor mapping.
4. Convert the location in the peripheral image to a foveal location using a
learned mapping.
5. Extract the image of the eye from the foveal image.

A short summary of these steps appears below, and additional details can be
found in Scassellati (1998b).
To identify face locations, the peripheral image is converted to grayscale and
passed through a pre-filter stage (see Figure 3). The pre-filter allows us to search
only locations that are likely to contain a face, greatly improving the speed of
the detection step. The pre-filter selects a location as a potential target if it has
had motion in the last 4 frames, was a detected face in the last 5 frames, or has
not been evaluated in 3 seconds. A combination of the pre-filter and some early-
rejection optimizations allows us to detect faces at 20 Hz with little accuracy
loss.
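To make the pre-filter concrete, a minimal Python sketch follows. The timing parameters (motion within the last 4 frames, a face within the last 5 frames, re-evaluation every 3 seconds) come from the description above; the grid granularity, class, and method names are hypothetical and are not taken from the implemented system.

import time

class PreFilter:
    """Select the grid cells worth passing to the face detector.

    A cell is a candidate if it showed motion within the last 4 frames, held a
    detected face within the last 5 frames, or has not been evaluated for 3 s.
    """

    def __init__(self, grid_shape=(16, 16), refresh_seconds=3.0):
        self.grid_shape = grid_shape
        self.refresh_seconds = refresh_seconds
        self.last_motion = {}   # cell -> frame index of the last observed motion
        self.last_face = {}     # cell -> frame index of the last face detection
        self.last_checked = {}  # cell -> wall-clock time of the last evaluation

    def note_face(self, cell, frame_index):
        """Record that the detector found a face in this cell."""
        self.last_face[cell] = frame_index

    def candidates(self, frame_index, moving_cells):
        """Return the cells to evaluate for the current frame."""
        now = time.time()
        selected = []
        for row in range(self.grid_shape[0]):
            for col in range(self.grid_shape[1]):
                cell = (row, col)
                if cell in moving_cells:
                    self.last_motion[cell] = frame_index
                recently_moved = frame_index - self.last_motion.get(cell, -100) <= 4
                recently_face = frame_index - self.last_face.get(cell, -100) <= 5
                stale = now - self.last_checked.get(cell, 0.0) >= self.refresh_seconds
                if recently_moved or recently_face or stale:
                    selected.append(cell)
                    self.last_checked[cell] = now
        return selected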
Face detection is done with a method called “ratio templates” designed to
recognize frontal views of faces under varying lighting conditions (Sinha 1996).
Fig. 4. A ratio template for face detection. The template is composed of 16 regions
(the gray boxes) and 23 relations (shown by arrows).

A ratio template is composed of a number of regions and a number of relations, as shown in Figure 4. Overlaying the template with a grayscale image location, each region is convolved with the grayscale image to give the average grayscale
value for that region. Relations are comparisons between region values, such
as “the left forehead is brighter than the left temple.” In Figure 4, each arrow
indicates a relation, with the head of the arrow denoting the lesser value. The
match metric is the number of satisfied relations; the more matches, the higher
the probability of a face.
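A minimal sketch of the ratio-template match metric is given below. Only the counting of satisfied brightness relations follows the description above; the region boxes and relation pairs shown in the usage comment are placeholders, not the actual 16-region, 23-relation template used on Cog.

import numpy as np

def region_mean(gray, top, left, height, width):
    """Average grayscale value of one template region."""
    return float(gray[top:top + height, left:left + width].mean())

def ratio_template_score(gray, regions, relations):
    """Count the satisfied brightness relations for one candidate window.

    regions:   maps a region name to its (top, left, height, width) box
               inside the candidate window.
    relations: list of (brighter, darker) region-name pairs, e.g.
               ("left_forehead", "left_temple").
    The score is the number of satisfied relations; the more matches, the
    more face-like the window.
    """
    means = {name: region_mean(gray, *box) for name, box in regions.items()}
    return sum(1 for brighter, darker in relations if means[brighter] > means[darker])

# Hypothetical usage on a small candidate window:
# regions = {"left_forehead": (0, 0, 4, 8), "left_temple": (4, 0, 4, 4)}
# relations = [("left_forehead", "left_temple")]
# score = ratio_template_score(window, regions, relations)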
Once a face has been detected, the face location is converted into a motor
command to center the face in the peripheral image. To maintain portability
between the development platforms and to ensure accuracy in the sensory-motor
behaviors, we require that all of our sensory-motor behaviors be learned by
on-line adaptive algorithms (Brooks et al. 1998). The mapping between image
locations and the motor commands necessary to foveate that target is called a
saccade map. This map is implemented as a 17 × 17 interpolated lookup table,
which is trained by the following algorithm:
1. Initialize with a linear map obtained from self-calibration.
2. Randomly select a visual target.
3. Saccade using the current map.
4. Find the target in the post-saccade image using correlation.
5. Update the saccade map based on L2 error.
6. Go to step 2.
The system converges to an average of less than one pixel of error per saccade
after 2000 trials (1.5 hours). More information on this technique can be found
in Marjanović, Scassellati & Williamson (1996).
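The following is a runnable toy version of this learning loop, in which a hidden linear gain stands in for the real camera, eye motors, and correlation-based target search. The 17 x 17 grid and the 2000-trial budget follow the text; the image size, gains, and learning rate are illustrative assumptions.

import numpy as np

GRID, IMG = 17, 128
TRUE_GAIN = np.array([0.021, 0.018])   # hidden motor units per pixel of offset (simulated)
rng = np.random.default_rng(0)

def cell_of(x, y):
    """Nearest entry of the 17 x 17 lookup table for pixel (x, y)."""
    return (int(round(y / (IMG - 1) * (GRID - 1))),
            int(round(x / (IMG - 1) * (GRID - 1))))

def post_saccade_error(target, command):
    """Residual pixel offset of the target after the saccade (simulated camera).

    In the real system this error comes from correlating the pre-saccade
    target patch with the post-saccade image.
    """
    offset = np.array(target, dtype=float) - IMG / 2.0
    return offset - command / TRUE_GAIN

# Step 1: initialize with a rough linear map.
saccade_map = np.zeros((GRID, GRID, 2))
for r in range(GRID):
    for c in range(GRID):
        saccade_map[r, c] = ((c - GRID // 2) * 0.02, (r - GRID // 2) * 0.02)

for trial in range(2000):
    target = rng.integers(0, IMG, size=2)              # step 2: random visual target (x, y)
    r, c = cell_of(*target)
    command = saccade_map[r, c].copy()                 # step 3: saccade with the current map
    error_px = post_saccade_error(target, command)     # step 4: correlation-based error
    saccade_map[r, c] += 0.01 * error_px               # step 5: reduce the residual (L2) error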
Because humans are rarely motionless, after the active vision system has
saccaded to the face, we first verify the location of the face in the peripheral
image. The face and eye locations from the template in the peripheral camera
(Figure 5 block diagram components: Face Detector, Saccade Map, Motor Control, Peripheral-to-Foveal Map, Foveal Grabber.)

Fig. 5. Block diagram for finding eyes and faces. Once a target face has been located,
the system must saccade to that location, verify that the face is still present, and then
map the position of the eye from the face template onto a position in the foveal image.

are then mapped into foveal camera coordinates using a second learned mapping.
The mapping from foveal to peripheral pixel locations can be seen as an attempt
to find both the difference in scales between the images and the difference in
pixel offset. In other words, we need to estimate four parameters: the row and
column scale factor that we must apply to the foveal image to match the scale
of the peripheral image, and the row and column offset that must be applied to
the foveal image within the peripheral image. This mapping can be learned in
two steps. First, the scale factors are estimated using active vision techniques:
while moving the motor at a constant speed, we measure the optic flow of both
cameras. The ratio of the flow rates is the ratio of the image sizes. Second, we use
correlation to find the offsets. The foveal image is scaled down by the discovered
scale factors, and then correlated with the peripheral image to find the best
match location.
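A minimal sketch of these two calibration steps follows, assuming the mean optic-flow magnitudes have already been measured during the constant-speed movement. Nearest-pixel downsampling and a brute-force correlation search are used purely for clarity; the implemented system's resampling and matching are presumably more careful.

import numpy as np

def estimate_scale(peripheral_flow, foveal_flow):
    """Ratio of mean optic-flow magnitudes while the eye moves at constant speed.

    The foveal camera magnifies the scene, so its flow is faster; the ratio of
    the two flow rates gives the relative image scale (the first step above).
    """
    return peripheral_flow / foveal_flow

def find_offset(peripheral, foveal, scale_row, scale_col):
    """Second step: scale the foveal image down and correlate it against the
    peripheral image to find the best-matching (row, col) offset."""
    stride_r = max(1, int(round(1.0 / scale_row)))
    stride_c = max(1, int(round(1.0 / scale_col)))
    small = foveal[::stride_r, ::stride_c].astype(float)
    small -= small.mean()
    h, w = small.shape
    best, best_score = (0, 0), -np.inf
    for r in range(peripheral.shape[0] - h + 1):
        for c in range(peripheral.shape[1] - w + 1):
            patch = peripheral[r:r + h, c:c + w].astype(float)
            score = np.sum((patch - patch.mean()) * small)
            if score > best_score:
                best, best_score = (r, c), score
    return best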
Once this mapping has been learned, whenever a face is foveated we can ex-
tract the image of the eye from the foveal image (see Figure 5). This extracted
image is then ready for further processing. The left image of Figure 6 shows
the result of the face detection routines on a typical grayscale image before the
saccade. The right image of Figure 6 shows the extracted subimage of the eye
that was obtained after saccading to the target face. Additional examples of
successful detections on a variety of faces can be seen in Figure 7. This method
achieves good results in a dynamic real-world environment; in a total of 140
trials distributed between 7 subjects, the system extracted a foveal image that
contained an eye on 131 trials (94% accuracy). Of the missed trials, two resulted
from an incorrect face identification (a face was falsely detected in the back-
ground clutter), and seven resulted from either an inaccurate saccade or motion
of the subject (Scassellati 1998b).
In order to accurately recognize whether or not the caregiver is looking at
the robot, we must take into account both the position of the eye within the
head and the position of the head with respect to the body. Work on extracting
the location of the pupil within the eye and the position of the head on the body
has begun, but is still in progress.
Fig. 6. A successfully detected face and eye. The 128x128 grayscale image was captured
by the active vision system, and then processed by the pre-filtering and ratio template
detection routines. One face was found within the peripheral image, shown at left. The
right subimage was then extracted from the foveal image using a learned peripheral-
to-foveal mapping.

3.2 Implementing Gaze Following


Once our system is capable of detecting eye contact, we require three additional
subskills to achieve gaze following: extracting the angle of gaze, extrapolating
the angle of gaze to a distal object, and motor routines for alternating between
the distal object and the caregiver. Extracting angle of gaze is a generalization of
detecting someone gazing at you, and requires the skills noted in the preceding
section. Extrapolation of the angle of gaze can be more difficult. By a geometric
analysis of this task, we would need to determine not only the angle of gaze, but
also the degree of vergence of the observer’s eyes to find the distal object. How-
ever, the ontogeny of gaze following in human children demonstrates a simpler
strategy.
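To make the geometric analysis concrete, the sketch below triangulates a fixation point from the two eyes' positions and gaze directions in a single horizontal plane; the vergence between the two gaze lines fixes the depth. This is only an illustration of the geometry, not part of the implemented system, which, as described next, initially sidesteps the full geometric solution.

import numpy as np

def fixation_point(left_eye, right_eye, left_angle, right_angle):
    """Intersect the two gaze rays to locate the distal object.

    Eye positions are 2-D points in a head-centred horizontal plane and the
    angles give each eye's gaze direction in that plane.  If the gaze lines
    are (nearly) parallel the system is singular and no finite fixation point
    exists.
    """
    d_left = np.array([np.cos(left_angle), np.sin(left_angle)])
    d_right = np.array([np.cos(right_angle), np.sin(right_angle)])
    # Solve left_eye + t * d_left == right_eye + s * d_right for (t, s).
    A = np.column_stack((d_left, -d_right))
    b = np.array(right_eye, dtype=float) - np.array(left_eye, dtype=float)
    t, s = np.linalg.solve(A, b)
    return np.array(left_eye, dtype=float) + t * d_left

# Example: eyes 6 cm apart, both verging on a point roughly 30 cm ahead:
# fixation_point((-0.03, 0.0), (0.03, 0.0), 1.47, 1.67)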
Butterworth (1991) has shown that at approximately 6 months, infants will
begin to follow a caregiver’s gaze to the correct side of the body, that is, the
child can distinguish between the caretaker looking to the left and the caretaker
looking to the right (see Figure 8). Over the next three months, their accuracy
increases so that they can roughly determine the angle of gaze. At 9 months, the
child will track from the caregiver’s eyes along the angle of gaze until a salient
object is encountered. Even if the actual object of attention is further along
the angle of gaze, the child is somehow “stuck” on the first object encountered
along that path. Butterworth labels this the “ecological” mechanism of joint
visual attention, since it is the nature of the environment itself that completes
the action. It is not until 12 months that the child will reliably attend to the
distal object regardless of its order in the scan path. This “geometric” stage
indicates that the infant can successfully determine not only the angle of gaze but also the vergence.
Fig. 7. Additional examples of successful face and eye detections. The system locates
faces in the peripheral camera, saccades to that position, and then extracts the eye
image from the foveal camera. The position of the eye is inexact, in part because the
human subjects are not motionless.

However, even at this stage, infants will only exhibit gaze following if the distal object is within their field of view. They will not turn to
look behind them, even if the angle of gaze from the caretaker would warrant
such an action. Around 18 months, the infant begins to enter a “representational”
stage in which it will follow gaze angles outside its own field of view, that is,
it somehow represents the angle of gaze and the presence of objects outside its
own view.
Implementing this progression for a robotic system provides a simple means
of bootstrapping behaviors. The capabilities used in detecting and maintaining
eye contact can be extended to provide a rough angle of gaze. By tracking along
this angle of gaze, and watching for objects that have salient color, intensity, or
motion, we can mimic the ecological strategy. From an ecological mechanism,
we can refine the algorithms for determining gaze and add mechanisms for de-
termining vergence. A rough geometric strategy can then be implemented, and
later refined through feedback from the caretaker. A representational strategy
requires the ability to maintain information on salient objects that are outside
of the field of view including information on their appearance, location, size,
and salient properties.
(Figure 8 panels: 6 months: sensitivity to field; 9 months: ecological stage; 12 months: geometric stage; 18 months: representational stage.)

Fig. 8. Proposed developmental progression of gaze following, adapted from Butterworth (1991). At 6 months, infants show sensitivity only to the side that the caretaker
is gazing. At 9 months, infants show a particular strategy of scanning along the line
of gaze for salient objects. By one year, the child can recognize the vergence of the
caretaker’s eyes to localize the distal target, but will not orient if that object is outside
the field of view until 18 months of age.

The implementation of this strategy requires us to make assumptions about the important properties of objects that must be included in a representational structure, a topic beyond the scope of this chapter.
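As a sketch of the ecological stage described above, the following routine scans along the estimated line of gaze and stops at the first sufficiently salient point. Only the stop-at-the-first-salient-object strategy is taken from the account above; the saliency map, threshold, and step size are assumptions for illustration.

import numpy as np

def ecological_gaze_target(saliency, eye_xy, gaze_angle, step=2.0):
    """Follow the line of gaze and stop at the first salient point.

    saliency:   2-D map combining colour, intensity, and motion cues.
    eye_xy:     caregiver's eye position in image coordinates (x, y).
    gaze_angle: estimated direction of gaze in the image plane.
    Returns the (x, y) of the first sufficiently salient point along the gaze
    line, or None if the ray leaves the field of view first.
    """
    direction = np.array([np.cos(gaze_angle), np.sin(gaze_angle)])
    point = np.array(eye_xy, dtype=float)
    threshold = saliency.mean() + 2.0 * saliency.std()
    while True:
        point += step * direction
        x, y = int(round(point[0])), int(round(point[1]))
        if not (0 <= y < saliency.shape[0] and 0 <= x < saliency.shape[1]):
            return None              # beyond the field of view: the "representational" stage is needed
        if saliency[y, x] > threshold:
            return (x, y)            # first salient object wins, as in the ecological stage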

3.3 Implementing Imperative Pointing


Implementing imperative pointing is accomplished by implementing the more
generic task of reaching to a visual target. Children pass through a developmen-
tal progression of reaching skills (Diamond 1990). The first stage in this progres-
sion appears around the fifth month and is characterized by a very stereotyped
reach which always initiates from a position close to the child’s eyes and moves
ballistically along an angle of gaze directly toward the target object. Should the
infant miss with the first attempt, the arm is withdrawn to the starting position
and the attempt is repeated.
To achieve this stage of reaching on our robotic system, we have utilized the
foveation behavior obtained from the first step in order to train the arm where to
reach (Marjanović et al. 1996). To reach to a visual target, the robot must learn
the mapping from retinal image coordinates x = (x, y) to the head-centered gaze
coordinates of the eye motors e = (pan, tilt) and then to the coordinates of the
arm motors α = (α0, ..., α5) (see Figure 9). The saccade map S : x → e relates
positions in the camera image with the motor commands necessary to foveate
the eye at that location. Our task then becomes to learn the ballistic movement
mapping head-centered coordinates e to arm-centered coordinates α. To simplify
(Figure 9 block diagram: Identify Visual Target (retinal coordinates) -> Saccade Map -> Foveate Target (gaze coordinates) -> Ballistic Map -> Generate Reach (arm primitive coordinates); Image Correlation and Motion Detection provide the training signals.)

Fig. 9. Reaching to a visual target is the product of two subskills: foveating a target
and generating a ballistic reach from that eye position. Image correlation can be used
to train a saccade map which transforms retinal coordinates into gaze coordinates (eye
positions). This saccade map can then be used in conjunction with motion detection
to train a ballistic map which transforms gaze coordinates into a ballistic reach.

the dimensionality problems involved in controlling a six degree-of-freedom arm, arm positions are specified as a linear combination of basis posture primitives.
The ballistic mapping B : e → α is constructed by an on-line learning
algorithm that compares motor command signals with visual motion feedback
clues to localize the arm in visual space. Once the saccade map has been trained,
we can utilize that mapping to generate error signals for attempted reaches (see
Figure 10). By tracking the moving arm, we can obtain its final position in image
coordinates. The vector from the tip of the arm in the image to the center of
the image is the visual error signal, which can be converted into an error in gaze
coordinates using the saccade mapping. The gaze coordinates can then be used
to train a forward and inverse model of the ballistic map using a distal supervised
learning technique (Jordan & Rumelhart 1992). A single learning trial proceeds
as follows:
1. Locate a visual target.
2. Saccade to that target using the learned saccade map.
3. Convert the eye position to a ballistic reach using the ballistic map.
4. As the arm moves, use motion detection to locate the end of the arm.
5. Use the saccade map to convert the error signal from image coordinates into
gaze positions, which can be used to train the ballistic map.
6. Withdraw the arm, and repeat.
This learning algorithm operates continually, in real time, and in an unstructured
“real-world” environment without using explicit world coordinates or complex
kinematics. This technique successfully trains a reaching behavior within ap-
proximately three hours of self-supervised training.
Fig. 10. Generation of error signals from a single reaching trial. Once a visual target
is foveated, the gaze coordinates are transformed into a ballistic reach by the ballistic
map. By observing the position of the moving hand, we can obtain a reaching error
signal in image coordinates, which can be converted back into gaze coordinates using
the saccade map.

Video clips of Cog reaching to a visual target are available from http://www.ai.mit.edu/projects/cog/, and additional details on this method can be found in Marjanović et al. (1996).
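The structure of a learning trial can be illustrated with the runnable toy below. A hidden linear map H stands in for the arm, the camera, and the motion-based hand tracker, returning the hand's position in gaze coordinates, and a plain error-driven update stands in for the forward and inverse models of the distal supervised learner. The two-dimensional arm, the gains, and the linearity are all simplifying assumptions.

import numpy as np

rng = np.random.default_rng(1)
H = np.array([[1.2, 0.3],
              [-0.2, 0.9]])     # hidden "arm + camera": reach command -> hand position in gaze coords
B = np.zeros((2, 2))            # learned ballistic map  B: gaze coordinates -> reach command

for trial in range(3000):
    e = rng.uniform(-1.0, 1.0, size=2)     # steps 1-2: foveate a target; e = (pan, tilt)
    alpha = B @ e                          # step 3: generate a ballistic reach from the map
    hand_gaze = H @ alpha                  # step 4: track the hand; the saccade map gives gaze coords
    error_gaze = e - hand_gaze             # step 5: reach error expressed in gaze coordinates
    B += 0.1 * np.outer(error_gaze, e)     # train the map; step 6: withdraw the arm and repeat

# After training, B approximately inverts H, so a reach generated from the gaze
# coordinates of a foveated target lands the (simulated) hand on that target.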

3.4 Implementing Declarative Pointing

The task of recognizing a declarative pointing gesture can be seen as the appli-
cation of the geometric and representational mechanisms for gaze following to
a new initial stimulus. Instead of extrapolating from the vector formed by the
angle of gaze to achieve a distal object, we extrapolate the vector formed by
the position of the arm with respect to the body. This requires a rudimentary
gesture recognition system, but otherwise utilizes the same mechanisms.
We have proposed that producing declarative pointing gestures relies upon
the imitation of declarative pointing in an appropriate social context. We have
not yet begun to focus on the problems involved in recognizing these contexts,
but we have begun to build systems capable of simple mimicry. By adding a
tracking mechanism to the output of the face detector and then classifying these
outputs, we have been able to have the system mimic yes/no head nods of the
caregiver, that is, when the caretaker nods yes, the robot responds by nodding yes
(see Figure 11). The face detection module produces a stream of face locations
at 20Hz. An attentional marker is attached to the most salient face stimulus,
and the location of that marker is tracked from frame to frame.
Fig. 11. Images captured from a videotape of the robot imitating head nods. The upper
two images show the robot imitating head nods from a human caretaker. The output of
the face detector is used to drive fixed yes/no nodding responses in the robot. The face
detector also picks out the face from stuffed animals, and will also mimic their actions.
The original video clips are available at http://www.ai.mit.edu/projects/cog/.

If the position of the marker changes drastically, or if no face is determined to be salient, then
the tracking routine resets and waits for a new face to be acquired. Otherwise,
the position of the attentional marker over time represents the motion of the
face stimulus. The motion of the attentional marker for a fixed-duration window
is classified into one of three static classes: a yes class, a no class, and a no-
motion class. Two metrics are used to classify the motion, the cumulative sum
of the displacements between frames (the relative displacement over the time
window) and the cumulative sum of the absolute values of the displacements
(the total distance traveled by the marker). If the horizontal total trip distance
exceeds a threshold (indicating some motion), and if the horizontal cumulative
displacement is below a threshold (indicating that the motion was back and
forth around a mean), and if the horizontal total distance exceeds the vertical
total distance, then we classify the motion as part of the no class. Otherwise,
if the vertical cumulative total trip distance exceeds a threshold (indicating
some motion), and if the vertical cumulative displacement is below a threshold
(indicating that the motion was up and down around a mean), then we classify
the motion as part of the yes class. All other motion types default to the no-
motion class. These simple classes then drive fixed-action patterns for moving
the head and eyes in a yes or no nodding motion. While this is a very simple
form of imitation, it is highly selective. Merely producing horizontal or vertical movement is not sufficient for the head to mimic the action – the movement
must come from a face-like object. Video clips of this imitation, as well as further
documentation, are available from http://www.ai.mit.edu/projects/cog/.
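A compact sketch of the motion classifier is given below. The two per-axis metrics (cumulative signed displacement and total trip distance) and the three output classes follow the description above; the threshold values and the way the window of positions is handed to the function are illustrative assumptions.

import numpy as np

def classify_nod(positions, motion_threshold=30.0, drift_threshold=10.0):
    """Classify a fixed-duration window of tracked face positions.

    positions: sequence of (x, y) locations of the attentional marker.
    Returns "yes", "no", or "no-motion".  Thresholds are in pixels and are
    illustrative assumptions.
    """
    pts = np.asarray(positions, dtype=float)
    if len(pts) < 2:
        return "no-motion"
    deltas = np.diff(pts, axis=0)
    drift = np.abs(deltas.sum(axis=0))     # |cumulative signed displacement| per axis (x, y)
    trip = np.abs(deltas).sum(axis=0)      # total trip distance per axis (x, y)
    # Horizontal back-and-forth motion that dominates vertical motion -> "no".
    if trip[0] > motion_threshold and drift[0] < drift_threshold and trip[0] > trip[1]:
        return "no"
    # Vertical back-and-forth motion -> "yes".
    if trip[1] > motion_threshold and drift[1] < drift_threshold:
        return "yes"
    return "no-motion"

# The resulting class then triggers the fixed-action pattern that nods or
# shakes the robot's head and eyes.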

4 Conclusion

Guided by evidence from developmental psychology, from disorders of social development such as autism, and from the evolutionary development of these
skills, we have described a task-based decomposition of joint attention skills.
Our implementation of this developmental progression is still in progress, but
our initial results with finding faces and eyes, and with the imitation of simple
head movements, suggest that this decomposition may be a useful mechanism for
building social skills for human-like robots. If this implementation is successful,
we can then begin to use the skills that our robot has acquired in order to test
the developmental models that inspired our program. A robotic implementation
will provide a new tool for investigating complex interactive models that has not
been previously available.

5 Acknowledgements

Support for this project is provided in part by an ONR/ARPA Vision MURI Grant (No. N00014-95-1-0600). The author receives support from a National
Defense Science and Engineering Graduate Fellowship.
The author wishes to thank the members of the Cog group for their contribu-
tions to this work: Rod Brooks, Cynthia Breazeal (Ferrell), Robert Irie, Charles
Kemp, Matthew Marjanovic, and Matthew Williamson.

References
Baron-Cohen, S. (1995), Mindblindness, MIT Press.
Brooks, R. & Stein, L. A. (1994), ‘Building Brains for Bodies’, Autonomous Robots
1:1, 7–25.
Brooks, R. A., Ferrell, C., Irie, R., Kemp, C. C., Marjanovic, M., Scassellati, B. &
Williamson, M. (1998), Alternative Essences of Intelligence, in ‘Proceedings of
the Fifteenth National Conference on Artificial Intelligence (AAAI-98)’, AAAI
Press.
Burghardt, G. M. & Greene, H. W. (1990), ‘Predator Simulation and Duration of Death
Feigning in Neonate Hognose Snakes’, Animal Behaviour 36(6), 1842–1843.
Butterworth, G. (1991), The Ontogeny and Phylogeny of Joint Visual Attention, in
A. Whiten, ed., ‘Natural Theories of Mind’, Blackwell.
Cheney, D. L. & Seyfarth, R. M. (1990), How Monkeys See the World, University of
Chicago Press.
Cohen, D. J. & Volkmar, F. R., eds (1997), Handbook of Autism and Pervasive Devel-
opmental Disorders, second edn, John Wiley & Sons, Inc.
Dautenhahn, K. (1994), Trying to Imitate — A Step Towards Releasing Robots from Social Isolation, in ‘Proc. From Perception to Action Conference (Lausanne,
Switzerland, Sept 7-9, 1994)’, IEEE Computer Society Press, pp. 290–301.
Dautenhahn, K. (1997), ‘I could be you — the phenomenological dimension of social
understanding’, Cybernetics and Systems 25(8), 417–453.
Dennett, D. C. (1991), Consciousness Explained, Little, Brown, & Company.
Diamond, A. (1990), Developmental Time Course in Human Infants and Infant Mon-
keys, and the Neural Bases, of Inhibitory Control in Reaching, in ‘Development
and Neural Bases of Higher Cognitive Functions’, Vol. 608, New York Academy
of Sciences, pp. 637–676.
DSM (1994), ‘Diagnostic and Statistical Manual of Mental Disorders’, American Psy-
chiatric Association, Washington DC.
Frith, U. (1990), Autism : Explaining the Enigma, Basil Blackwell.
Hauser, M. D. (1996), Evolution of Communication, MIT Press.
Hayes, G. & Demiris, J. (1994), A Robot Controller Using Learning by Imitation,
in A. Borkowski & J. L. Crowley, eds, ‘Proc. 2nd International Symposium on
Intelligent Robotic Systems’, Grenoble, France: LIFTA-IMAG, pp. 198–204.
Hobson, R. P. (1993), Autism and the Development of Mind, Erlbaum.
ICD (1993), ‘The ICD-10 Classification of Mental and Behavioral Disorders: Diagnostic
Criteria for Research’, World Health Organization (WHO), Geneva.
Jordan, M. I. & Rumelhart, D. E. (1992), ‘Forward Models: supervised learning with
a distal teacher’, Cognitive Science 16, 307–354.
Karmiloff-Smith, A., Klima, E., Bellugi, U., Grant, J. & Baron-Cohen, S. (1995), ‘Is
there a social module? Language, face processing, and theory of mind in individ-
uals with Williams Syndrome’, Journal of Cognitive Neuroscience 7:2, 196–208.
Marjanović, M., Scassellati, B. & Williamson, M. (1996), Self-Taught Visually-Guided
Pointing for a Humanoid Robot, in ‘From Animals to Animats 4: Proceedings of
the Fourth International Conference on Simulation of Adaptive Behavior (SAB-
96)’, Bradford Books, pp. 35–44.
Moore, C. & Dunham, P. J., eds (1995), Joint Attention: Its Origins and Role in
Development, Erlbaum.
Povinelli, D. J. & Preuss, T. M. (1995), ‘Theory of Mind: evolutionary history of a
cognitive specialization’, Trends in Neuroscience.
Pratt, G. A. & Williamson, M. M. (1995), Series Elastic Actuators, in ‘Proceedings
of the IEEE/RSJ International Conference on Intelligent Robots and Systems
(IROS-95)’, Vol. 1, Pittsburg, PA, pp. 399–406.
Premack, D. (1988), "Does the chimpanzee have a theory of mind?" revisited, in
R. Byrne & A. Whiten, eds, ‘Machiavellian Intelligence: Social Expertise and
the Evolution of Intellect in Monkeys, Apes, and Humans.’, Oxford University
Press.
Ristau, C. A. (1991), Before Mindreading: Attention, Purposes and Deception in
Birds?, in A. Whiten, ed., ‘Natural Theories of Mind’, Blackwell.
Rowley, H., Baluja, S. & Kanade, T. (1995), Human Face Detection in Visual Scenes,
Technical Report CMU-CS-95-158, Carnegie Mellon University.
Scaife, M. (1976), ‘The response to eye-like shapes by birds. II. The importance of
staring, pairedness, and shape.’, Animal Behavior 24, 200–206.
Scaife, M. & Bruner, J. (1975), ‘The capacity for joint visual attention in the infant.’,
Nature 253, 265–266.
Scassellati, B. (1996), Mechanisms of Shared Attention for a Humanoid Robot, in
‘Embodied Cognition and Action: Papers from the 1996 AAAI Fall Symposium’,
AAAI Press.
Scassellati, B. (1998a), A Binocular, Foveated Active Vision System, Technical Report 1628, MIT Artificial Intelligence Lab Memo.
Scassellati, B. (1998b), Finding Eyes and Faces with a Foveated Vision System, in ‘Pro-
ceedings of the Fifteenth National Conference on Artificial Intelligence (AAAI-
98)’, AAAI Press.
Sinha, P. (1994), ‘Object Recognition via Image Invariants: A Case Study’, Investigative
Ophthalmology and Visual Science 35, 1735–1740.
Sinha, P. (1996), Perceiving and recognizing three-dimensional forms, PhD thesis, Mas-
sachusetts Institute of Technology.
Sung, K.-K. & Poggio, T. (1994), Example-based Learning for View-based Human Face
Detection, Technical Report 1521, MIT Artificial Intelligence Lab Memo.
Thelen, E. & Smith, L. (1994), A Dynamic Systems Approach to the Development of
Cognition and Action, MIT Press, Cambridge, MA.
Whiten, A., ed. (1991), Natural Theories of Mind, Blackwell.
Wood, D., Bruner, J. S. & Ross, G. (1976), ‘The role of tutoring in problem-solving’,
Journal of Child Psychology and Psychiatry 17, 89–100.
Figures of Speech, a Way to Acquire Language

Anneli Kauppinen

University of Helsinki and Helsinki Institute of Technology


PL 166 00181 Helsinki Finland
Anneli.Kauppinen@helsinki.fi

Abstract. The main aim of this study is to discuss the assumption that analogy
and imitation may be crucial principles in the acquisition of language. The
manifestations of the acquisition process are called figures of speech in this
study. These memorized entities carry some elements of former contexts in
them. Figures of speech may be identical (deferred) imitations, but parts or the outline of a former utterance may also be repeated in them. A tendency for some speech functions to be acquired as figures was found in this study. The findings point toward a constructive-type grammar, in which
pragmatics and semantics are an integral part of the structure.

1 Introduction

The significance of imitation in language acquisition was disputed in Transformational Grammar in the 1970s. In some present studies its role is,
however, being reconsidered. The Piagetian term “deferred imitation” has been cited
also in the context of acquiring language structures. Instead of speaking about
imitating in global terms we could, however, pay attention to the units of language
acquisition. Many structures are actually acquired as formulaic utterances or, as I
would like to call them, figures of speech recorded in long-term memory (Wong
Fillmore 1979, Peters 1983). It is possible to explain a great deal of language learning
by means of a flexible analogical pattern, a kind of schema or representation
(Johnson-Laird 1983). To keep to Piaget's original idea and expand it, representations
are defined in this study to be both bodily movements and (speech) utterances.
It is assumed that the structures of language are for a child an important way to
break away from the symbiosis with the mother. But whose "speech" is represented in
the structures the child begins to use? Is it something predestined by biology?
Locke (1995) puts forward an interesting supposition: there are two different
mechanisms in the language acquisition process. A specialization of social cognition
(SSC) is warmly interactive and facilitates learning by rote. A grammatical analysis module (GAM) deals in rules and representations. According to Locke the SSC is working from birth, and the GAM begins to operate between 20 and 30
months. The SSC activates the GAM, and the two tend to be coordinated.
We can look at the language acquisition problem from Bakhtin’s (1990) and
Voloshinov’s (1973) [1929] point of view, too. According to them language is
acquired in social context. Language structures are for the child a way to become a
subject and the structures are acquired in dialog. Voloshinov argues that a linguistic
structure is always orientated in the context. Expression organizes the experience, and
there is no linguistic creativity without a dialogic element in the structure.
It is typical for both adults and children to use memorized sequences in their
speech (Tannen 1989, Hopper 1993). The best-known examples are idioms and small-talk phrases. But according to Fillmore, Kay, and O'Connor
(1988) “the realm of idiomaticity in a language includes a great deal that is
productive, highly structured and worthy of serious grammatical investigation.” Thus,
we have some reason to suppose that memorizing may be a much more important
language acquiring and using strategy than has been assumed.
There are many kinds of “deferred imitations”. They may be structures or
figures which occur repeatedly in speech. Some of the repetitions or imitations
are identical, but mostly there is some variation compared to the original model. Some
of them are formal idioms, as Kay and Fillmore (1997) have termed utterances which
carry semantic and pragmatic features in their structure (as What are you doing
naked? / What are you doing without money?). Children, and supposedly adults too,
learn these kinds of structures analogically. The borderline between idioms,
formulaic utterances, and the so called productive ones is very fuzzy. There is also
an interesting connection between constructive grammar and classical rhetoric. Turner
(1997) has pointed out many similarities between the notions of construct and figure
(or schema) in old rhetoric.

2 Databases and Procedures

The current data are based on longitudinal diaries and tape recordings of a Finnish
boy, Teemu, in everyday situations (T-data). This database is compared with
recordings made in the University of Oulu and the University of Helsinki on about
twenty Finnish children.
When going through T-data, I found a tendency of some categories and speech
functions to be acquired by deferred imitation/repetition. This tendency was checked
by studying the other databases. Imitation is, however, not a fully adequate name for
this process, because the repeated chunks of speech may be variable. At the level of
syntax some of them are analogically acquired frames with open slots, and they may
be coupled with other structures. In addition, there are many examples of analogy in
the acquisition of semantics.
3 Analogy in Early Semantics

Piaget’s theory of deferred imitation entails analogic thinking. For instance, in order
to imitate an opening and closing box with his or her hands, a child has had to
realize the functional similarities between these two events. From another viewpoint
these two courses of events have an iconic connection. A child’s utterance (speech,
movements, rhythm) is, from this perspective, an icon which stands for the
movements of the box. In T-data, the functional similarity is an important ground for
early semantic inference.
In T’s speech the word wov-wov stood at age 1;3 to 1;7 (year; months) for
different animals from cows to birds, both toys and real ones. Such examples are
known in the child language literature. The word ovi ‘door’ was used also for a
wooden cover. The verb avata ‘open’ was used also for ‘uncover’ and ‘peel’, as
'please, open this orange'. At the age of 1;9 T. had one utterance "piillon" (< piiloon
'to a hidden place') for four kinds of activities. He said "piillon", when 1) putting a
paper roll into a stick-type stand, 2) putting a folder into its cover, 3) folding a paper
sheet with a drawing in it, and 4) crawling under an adult’s bent knees. One word for
all these activities is an indication of the child's way of connecting these functionally similar activities analogically.

4 Compliance Figures and Acquired Arguments

Many prohibitions directed to the child by adults are at first memorized as formulaic utterances at about two years of age. In language acquisition research, there are
many examples of adults’ prohibitions switched to new contexts by children (Clark
1977, Painter 1984, Clancy 1985, Katis 1997). I call them compliance figures,
because children direct the imitated prohibitions towards themselves in private
speech, as Teemu in (1):

(1) Teemu (2;2) takes an orange and a knife and says to himself:
Älä itte kuori, älä itte kuori.
'Don't peel by yourself, don't peel by yourself'

Such deferred imitations (1) represent an adult's voice, or to put it in Bakhtin's words: another's voice. These figures are connected to the contexts in which they
have been used by adults with the affective meanings attached to them. Later on,
the figures (or parts of them) can become flexible enough to be used as expressions
of the child’s own intentions.
When comparing the memorized prohibitions in different cultures, English (Clark and Painter), Japanese (Clancy), and Greek (Katis), it is possible to see that they represent the different figures of speech adults use for bringing up their children. A
good indication of these differences are the early examples of conditional complex
sentences in Japanese studied by Clancy (1985) and in Greek studied by Katis
(1997). These examples at the age of about two, or even earlier, are exceptions to the
acquisition order supposed to be dependent on cognitive complexity. The Japanese and Greek examples indicate that the acquisition order is determined not only by the
complexity of the structures but also by their functions. Some types of utterances are
for children more essential than others because of their purpose. The use of
compliance figures belongs to the SSC-type learning described by Locke. But as
Locke emphasizes, SSC and GAM tend to work together. This neurologically
grounded theory leads also to an important conclusion: pragmatic meanings can be
included in the syntactic structures from the beginning. Actually, due to this particular
property, they have to be called figures of speech, not mere abstract structures.
It is a typical feature of the compliance figures that they are switched to a new
context parallel to the original situation. Thus, we can conclude that before analyzing
the syntactic structure of the utterance, the child is able to use it in a relevant context,
which also works as a trigger for this utterance. It seems to me that the whole figure,
the affect attached to it and the relevant context are the most important elements in the
use of this kind of figures.
This principle can be applied also to other types of utterances. In T-data, there
are examples of acquired arguments switched to relevant or nearly relevant places in
a conversation. In example (2) the mother forbids the child: älä ota mun kynnää 'don't take my pen'. The mother had argued for this refusal (or this kind of refusal) in earlier conversations by saying 'mother will cry'. This is used by the child as a delayed imitation. On the grounds of this earlier language experience, he is able to make a
collaborative construction in the conversation (see Lerner 1991).

2;0.12
(2) Mother: Älä ota mun kynnää
Teemu : äiti ikkee

Mother: 'Don't take my pen
Teemu: mummy will cry'

In this example (2) the important trigger of the argument is the preceding turn.
The argument 'mummy will cry' is comparable to the compliance figures, because it
represents the adult's, not the child's own voice.
There are also other examples of this kind of acquired argument, used as delayed
imitations. Teemu's parents had earlier sometimes justified their prohibitions by
saying sitte iltapäivällä, '[not now but] later in the afternoon'. During the age 2;5.17 -
2;8.18 there are eight examples of different conversations including the imitated
argument sitte iltapäivällä by Teemu, as in example (3). There is also another
imitated argument in this conversation (3): sitte pannan laastari, '[don't worry] we'll
put a band-aid on'.

(3) 2;6.7
Mother: Älä pelleile manteli suussa, tullee pipi.
Teemu: Sitte pannaan laastari
Mother: Ei se auta.
Teemu: Sitte iltapäivällä auttaa
Mother: 'Don't fool around with an almond in your mouth, it will hurt.
Teemu: Then we'll put a band-aid on.
Mother: It doesn't help.
Teemu: Later in the afternoon it will help.'

There is a difference between the figures in examples (1) and (2) compared to
example (3). The interpretation of the arguments 'then we'll put a band-aid on' and
'later in the afternoon it will help' in (3) is persuasive. They are imitated and they
reflect the adult-way of argumentation, but the figures are taken to work for the
child's own intentions. It seems to me that the child is somehow testing the effect of
this kind of argumentation. As we can see, these figures (3) are not in an entirely relevant context, but the examples lend support to the notion that "delayed
imitations" can be used productively in conversations.

5 The Definitions of Things

The definitions of everyday objects are important speech topics for children. By
describing the functions of the things around them children prepare themselves for
future practical tasks. These definitions occur in T-data in some repetitive forms, as relative clauses, figures with the modal verb voi 'can', and 'if-then' structures. The
child uses these formulas analogically: keeping up the “frame”, but varying the words.
Two examples of the formula (Pro)noun+ADE can verb+INF, 'It is possible to do
something with X' / 'One/You can do something with X', are presented in (4a),
(4b). The adessive case has an instrumental function in them. The structure is generic,
without any subject. This kind of figure is a usual way to define things, and therefore it has been acquired analogically from adult speech.

(4a) 2;4.18
Teemu: Mikä tää o-n?
What this be+3SG
'What is this?'

Adult: Kamera
camera
'[It is] a camera'

-> Teemu: Si-llä voi naksauttaa
it + ADE can (SG 3) click (INF)
'One can click with it’
(4b) 2;6.12
Teemu says to himself:
Lumilapio-lla voi ottaa lun-ta
snow shovel +ADE can (SG 3) take (INF) snow+PRT
‘One can take snow with the snow shovel’
________________________________________________________
Table 1. The first variations of the formula saa(n)ko ottaa ('may I take') in T-data. The only exceptions from adult conventionality in Table 1 are the phonological deviations (the partitive forms "mevuja" and "meuja" for mehua, "ieluja" for viulua, and the verb "aako" for saako).

Formula: saa-n-ko minä ottaa (may+1SG+NTRG I take (INF)) 'May I take?'

AGE (years;months.days)   UTTERANCE                          OBJECT GLOSS / TRANSLATION
1;11.14    Aako ottaa mevu-ja?              juice+PRT
1;11.18    Aako ottaa meu-ja?               'May [I] take juice?'
1;11.18    Aako ottaa ielu-ja?              violin+PRT; 'May [I] take a violin?'
1;11.18    Aako ottaa ääke-ttä?             medicine+PRT; 'May [I] take medicine?'
1;11.18    Saako kävellä?                   'May [I] walk?'
2;0.12     Saako ottaa?                     'May [I] take?'
2;0.12     Saako mennä, jooko?              'May [I] go, O.K.?'
2;0.12     Saako ottaa issä-ä?              more+PRT; 'May [I] take more?'
2;0.12     Saako ottaa, yh-e, rurino-i-ta?  one+PRT raisin+PL+PRT; 'May [I] take one, raisins?'
2;3.5      Saa-n-ko minä lissä-ä?           more+PRT; 'May I [take] more?'
2;3.5      Saa-n-ko minä si-tä keksi-i?     it+PRT cracker+PRT; 'May I [have] that cracker?'
________________________________________________________

6 Utterances of Will and Permission

Permissions are important for a child in coping with everyday social situations.
Apparently for this reason some utterances of will and permission are acquired as
figures of speech in T-data. One example of this acquisition process is represented in
table 1. It can be seen that the formula, or collocation, saako ottaa, 'may [I] take',
becomes little by little more flexible and variable. All of these syntactic structures are
conventional in adult language. The generic structure saako ottaa, without a 1SG
suffix or 1SG pronoun, mostly gets the inclusive interpretation 'may I take?' in
adult speech, too. The kernel of the formula (s)aako ottaa takes different object
complements, all of them in the partitive case (see table 1). Later, at the age of 2;3.5, the
first utterances with an explicit 1SG suffix and pronoun appear, but the infinitive is
dropped. The first person pronoun takes the place of the infinitive, and therefore the
outline and the rhythm of the utterances are preserved.

7 Utterances with the Conditional Verb Forms

The conditional verb forms are acquired and memorized in figures of speech
(formulaic utterances) in T-data. There are in total about 670 occurrences of utterances
including conditional verb forms in the database. They can be grouped into 36
different formulas. These formulas are frequent also in the other Finnish databases
analyzed for this research. The main functions of the conditional utterances are
request, imagining, and planning (Kauppinen, forthcoming). The first occurrences
of the conditional verb forms appear in T-data at the age of 2;0. All the occurrences
during 2;0 - 2;1 represent one figure, the semi-idiomatic question 'What would PRON
be?' By the age of 2;4 there have been 35 occurrences of the conditional verb forms, all
of which belong to 4 figures. The findings suggest that the child
does not acquire distinct verb forms but rather some figures of speech that include
conditional verb forms. In other words, he acquires some means to request, imagine and plan.
Each figure is a way to plan and handle everyday situations.
Compared to many Indo-European languages, the Finnish conditional verb form
can be said to have the functions of both subjunctive and conditional verbs (Kauppinen
1996). In most languages these two verb categories have specific contexts, such as
conditional verbs in the apodosis, and subjunctive forms for example in protasis, final,
and concessive clauses (Bybee et al. 1994; Fillmore et al. 1988). For this reason the
configuration of these verb categories is an essential feature of them, and therefore
it belongs to adult speech routines, too.
It is possible to see conditional sentences (with indicative verb forms, too) as a
figure of speech or a rhythmic pattern, as Ellis (1995) has put it. Conditional
sentences have their specific senses in different languages. It is not possible to learn
some patterns of logical argumentation without learning the 'if - then' formula, a
kind of representation (see also Johnson-Laird 1983). Conditional structures have in
addition some other special senses, e.g. threatening in many languages (If you don't
eat, I'll - -). Many examples of child language suggest that this affective meaning is
acquired together with the conditional figure of speech.

7.1 Figures in Pretend Play

The imaginative function of speech appears early. When Teemu, at the age of 1;4, crawled
on all fours and imitated a dog, he was a conscious pretender. When he, at the age of
two, calls a piece of wood a "gun" or puts a stick into his mouth and pretends to smoke,
the whole pretending pattern is an analogical representation of earlier experiences. In
pretend play the child moves to another mental space which he knows to be
different from reality. According to Vygotsky (1978) the essence of pretend play
is analogical, not symbolic. Pretend plays are planned structures, the parts of
which are, e.g., emplotment, enactment, and underscoring (Giffin 1984), as in the
pretend play by Aino, aged 5, and Eeva, aged 2;6:

(5) Eeva: nää läht-is avaruut-een.                              [EMPLOTMENT]
          these go+CON outer space+ILL
          '[Let's pretend that] these ones went to outer space.'

          avaruude-ssa on hirveän kylmä hui hui                 [UNDERSCORING]
          space+INE be 3SG terribly cold brr, brr
          'It is terribly cold in outer space, brr brr'

    Aino: siel on niin kylmä                                    [UNDERSCORING]
          there be 3SG so cold
          'It is so cold there.'

          siel tuli-s lisko-j-a.                                [EMPLOTMENT]
          there come+CON lizard+PL+PRT
          '[Let's pretend that] lizards came there.'

    Eeva: joo                                                   [EMPLOTMENT]
          yeah

    Aino: Avaruude-ssa tule-e lisko-j-a                         [UNDERSCORING]
          space+INE come+3SG lizard+PL+PRT
          'In outer space there come lizards.'

    Eeva (pretending to be the child, in high pitch):
          Apua, lisko-ja. Äiti, mu-a pelotta-a                  [ENACTMENT]
          help lizard+PL+PRT Mummy me+PRT be afraid+3SG
          'Help me! Lizards! Mummy, I am afraid!'
Emplotment is a space builder – in this example (5) by accident also literally (see
Fauconnier 1985). In Finnish the prototypical, grammaticized form of emplotment is
a main clause with the conditional verb form. Neither lexical reinforcement by a
modal verb nor a 'let's pretend' type of explication is needed for space building in the
emplotment. Instead, the space is built by the mere conditional affix of the main verb
(Kauppinen 1996).

The functional equivalents of this utterance type in many Indo-European
languages are main clauses with past tense or subjunctive/conjunctive verb forms.
Emplotment utterances typical of children in some Indo-European languages are e.g.
examples (6a - d).

(6a) Italian
Io sono il marito, e tu eri [IMPF] la mia moglie. (Bates 1976)

(6b) English
You were mother and she didn’t want you to go. (Lodge 1978)

(6c) Dutch
Ik was [IMPF] de vader en ik ging [IMPF] een diepe kuil graven. (Kaper 1980)

(6d) German
Dies ist ein Pferd und das wäre [KONJUNKTIV] der Stall. (Kaper 1980).

The examples (6a - d) indicate that there is a prototypical figure of speech specialized
for the emplotment of the pretend play in many languages. It has a specific space
builder function. In the studied languages the characteristic verb forms in the
emplotment of the pretend play seem to be subjunctive/conjunctive in nature.

7.2 Pretending and Conditionality - Different Figures

The morphosyntactic form of emplotment has been supposed to be characteristic of
children in different languages, especially because of the use of the past tense in the play
planning function. The equivalent Finnish structure with the subjunctive-type
conditional verb form also sounds somewhat unconventional. The comparative
investigation, however, shows that Finnish adult speakers use this kind of figure, but
not as frequently as children (Kauppinen 1996; 1998). Musatti and Orsolini (1993)
put forward an equivalent assumption concerning the imperfect in Italian adult
language.
With the subjunctive-type verb form in their structure, the emplotment figures
come close to protases, the only difference being the connective 'if' at the beginning
of the figure. Let us compare examples (7a) and (7b):

     SPACE BUILDER                  ACTION IN THE MENTAL SPACE

(7a) Mä ol-isi-n äiti               "Lapsi, tule syö-m-ään." "Nam, nam."
     I be+CON+1SG mother            child come+IMP eat+INF+ILL
     'I was/were mother             "Child, come to eat." (C. is eating:) "Yum, yum."'

(7b) Jos mä ol-isi-n äiti, tarjoa-isi-n lapse-lle hyvä-ä ruoka-a.
     if I be+CON+1SG mother offer+CON+1SG child+ALL good+PRT food+PRT
     'If I were mother, I would offer delicious food for the child.'

An essential difference between the structures is that in (7a) the built mental space is
realized immediately as action, turn taking, and underscoring. In the conditional
complex sentence (7b), instead, there is a distance between the space builder and
action, and the indication of the distance is the connective 'if' (see Werth 1997).
These two structures (7a) and (7b) also include different contextual and pragmatic
senses. This is an important point in language acquisition. The logic of pretend
play is not conditional logic; it is grounded in open possibilities and negotiation.
Children also prefer pretense-type structures because of the possibility of immediate
action (see Kauppinen 1996). Action also precedes speech in human ontogeny.
Conditional complex sentences are common measures of children's
inference ability in language acquisition research. It is typically assumed that the
ability of logical inference and the language structures the children use are directly
connected to each other (e.g. Bowerman 1986). The theory of figures of speech,
instead, takes into account the child's need and will to use an utterance because of its
sense in the social context in question. On the basis of this view, it is possible to
explain the "exceptions" to the assumed acquisition order of language structures. It
also clarifies why children, such as Teemu, may favour pretending figures over
conditional complex sentences.

8 Conclusions

There are plenty of examples of deferred imitation in language acquisition, if we give
it a broad interpretation. There are many features of analogy in the above-mentioned
examples. Imitation and analogy concern not only linguistic structures but also
movements, activities and contexts as a whole. Analogical acquisition is functional,
discourse bound and also sensitive to affect. The formulaic utterances are at first
typically rigid (deferred) imitations, but little by little they become more flexible and
variable. They may be coupled with other structures. Some of them are analogically
acquired frames with open slots.
The current study suggests that children do not always analyze structures,
but instead catch some figures of speech in their contexts, and after that they may
transfer them to equivalent contexts and vary them. The results suggest that both
the adult utterances and the adult sense in them are acquired at the same time by
children. The child may be sensitive to the relevant context before he or she
understands the logical connection between the memorized utterance and the
context. Speech is a collection of habits, and habits entail imitation and repetition.
Actually, it is not easy to make a clear distinction between habit and grammar.
The function of figures in language acquisition supports the idea of the
polyphony of speech, discussed mostly in the context of literary research (Bakhtin
1990). As a social phenomenon, language is always a blending of many voices. An
utterance is not an individual product, but it originates in conversations and grows in
them. So there cannot be any original space where the child's own experiences and
thoughts would be uttered in pure form. On the other hand, the child learns little by
little to choose the structures which best meet his or her intentions.
Some speech functions tended to be acquired as figures of speech in the T-data,
and examples of this tendency were found also in other databases. Compliance figures
have been found in many languages. The structures with conditional verb form have
this tendency also in the other Finnish databases. The utterances of prohibition, will,
permission, request, imagining, planning, as well as definitions have something in
common. All of them can be called coping structures. The figures of speech seem to
be a strategy for managing everyday situations. It is also remarkable that many
phenomena described in this study have equivalents in adult language.

9 Epilogue: Figures and Images

The theory of figures of speech leads to construction grammar, which, in turn,
entails that not only words but also linguistic structures may have connotations.
While finishing this article I noticed the news about Nokia's unfortunate
advertising campaign. The Finnish telecommunications company had marketed its
mobile phones with different covers in Germany, using the phrase Jedem das Seine,
freely translated: 'for everyone what he deserves'. Unfortunately this utterance had been
the slogan on the gate of the Buchenwald concentration camp. Many Jewish organizations
protested, because they considered it to be an insult to Jews. This reaction was a total
surprise for Nokia Mobile Phones. The campaign had been planned by a German
advertising agency. Nokia apologized and had all the marketing material containing
this phrase destroyed.
This case clarifies the idea of constructs and figures. Jedem das Seine is a
construct in which the background is essential from the point of view of
interpretation, including the affective sense carried with it. The words in the phrase are
common in the German language without any special affect, but grouped as a figure,
which millions of Jews have learned on the gate of Buchenwald, they have acquired an
indelible stigma. If the same words had appeared in a slightly different structure, this
impression would have been avoided. The figure carries its contexts with it.

Abbreviations
ADE adessive 'at', 'on', 'with' (instrumental)
ALL allative 'onto', 'to'
CON conditional affix
INE inessive 'in'
INF infinitive
ILL illative 'into'
IMP imperative
IMPF imperfect verb form
INTRG interrogative morpheme
PL plural
PRON pronoun
PRT partitive case
SG singular

References

1. Bakhtin, M. M.: The Dialogic Imagination. In: Four Essays by Michael Bakhtin. Edited by
Michael Holquist. Translated by Caryl Emerson and Michael Holquist. University of
Texas Press Austin (1990).
2. Bates, E.: Language and Context: The Acquisition of Pragmatics. Academic Press New
York (1976).
3. Bowerman, M.: First steps in acquiring conditionals. In: Traugott, E.C., Meulen, A.,
Reilly, J. S., Ferguson, C. A. (eds.): On Conditionals. Cambridge University Press
Cambridge (1986).
4. Bybee, J., Perkins R., Pagliuca W.: The Evolution of Grammar. Tense, Aspect, and
Modality in the Languages of the World. The University of Chicago Press Chicago
(1994).
5. Clancy, P. M.: The Acquisition of Japanese. In: Slobin, D. I. (ed.): The Crosslinguistic
Study of Language Acquisition. Volume 1: The Data. Lawrence Erlbaum Associates
Hillsdale, New Jersey (1985).
6. Clark, R.: What’s the use of imitation. Journal of Child Language 4 (1977) 341-358.
7. Ellis, R. D.: The imagist approach to inferential thought patterns: The crucial role of
rhythm pattern recognition. Pragmatics & Cognition. Vol. 3, No. 1, 75-109 (1995).
8. Fauconnier, G.: Mental Spaces: Aspects of Meaning Construction in Natural Language. A
Bradford Book Cambridge (1985).
9. Fillmore, C. J., Kay, P., O’Connor M. C.: Regularity and Idiomaticity in grammatical
constructions: The case of “Let alone”. Language. Volume 64 (3) 501-538 (1988).
10. Giffin, H.: The Coordination of meaning in the Creation of a Shared Make-Believe
Reality. In: Bretherton, I. (ed.) Symbolic Play. The Development of Social
Understanding. Academic Press, INC. Orlando. 73—100 (1984).
11. Hopper, P.: Emergent Grammar. — Proceedings of the Thirteenth Annual Meeting.
Berkeley Linguistic Society (1987).
12. Johnson-Laird, P. N.: Mental Models. Towards a Cognitive Science of Language,
Inference, and Consciousness. Cambridge University Press Cambridge (1983).
13. Kaper, W.: The use of the past tense in games of pretending. Journal of Child Language
7. 213—215 (1980).
14. Katis, D. :The emergence of conditionals in child language: Are they really so late? In:
Athanasiadou, A., Dirven R. (eds.): On Conditionals Again. Current Issues in Linguistic
Theory 143 John Benjamins Amsterdam/Philadelphia (1997).
15. Kauppinen, A.: The Italian imperfetto compared to the Finnish conditional verb form —
evidence from child language. Journal of Pragmatics 26, 109-136 (1996).
16. Kauppinen, A.: Puhekuviot, tilanteen ja rakenteen liitto. [Figures of speech, a union of
situation and structure.] Suomalaisen Kirjallisuuden Seura Helsinki (1998).
17. Kauppinen, A.: Acquisition of the Finnish conditional verb forms in formulaic utterances.
In: Hiraga, M., Sinha, C., Wilcox S.(eds.): Cognitive Linguistics 95, Vol. 3: Cultural,
Psychological, and Typological approaches. John Benjamins. (forthcoming).
18. Kay, P., Fillmore, C. J.: Grammatical Constructions and Linguistic Generalizations: the
`What's X doing Y?’ Construction , <http://www.icsi.berkeley.edu/~fillmore/concon.html>
1997 (Read 27th March 97).
19. Lerner, G. H.: On the syntax of sentences-in-progress. Language in Society 20, 441—458
(1991).
20. Locke, J. L.: Development of the Capacity for Spoken Language. In: Fletcher, P.,
MacWhinney, B. (eds.): The Handbook of Child Language. Basil Blackwell
Ltd., Cambridge (1995).
21. Lodge, K. R.: The use of the past tense in games of pretend. Journal of Child Language 6
(1978).

22. Musatti, T., Orsolini, M.: Uses of past forms in the social pretend play of Italian children.
Journal of Child Language 20, 619—639 (1993).
23. Painter, C.: Into the Mother Tongue: A Case Study in Early Language Development.
Frances Pinter London (1984).
24. Peters, A. M.: The Units of Language Acquisition. Cambridge University Press Cambridge
(1983).
25. Tannen, D.: Talking Voices. Repetition, Dialogue, and Imagery in Conversational
Discourse. Studies in Interactional Sociolinguistics 6. Cambridge University Press
Cambridge (1989)
26. Turner, M.: Figure. Manuscript. Forthcoming in: Cacciari, Gibbs, Katz, Turner:
Figurative Language and Thought. Oxford University Press (1997).
27. Voloshinov, V. N.: Marxism and the Philosophy of Language. Translated by Ladislav
Matejka and J. R. Titunik. Studies in Language 1. Seminar Press New York 1973.
Originally published in Russian under the title Marksism i filosofija jazyka, Leningrad
1929.
28. Vygotsky, L.S.: Mind in Society. The Development of Higher Psychological Processes.
Edited by Michael Cole, Vera John-Steiner, Sylvia Scribner, Ellen Souberman. Harvard
University Press Cambridge (1978).
29. Werth, P.: Conditionality as Cognitive Distance. In: Athanasiadou, A., Dirven, R. (eds.):
On Conditionals Again. Current Issues in Linguistic Theory 143. John Benjamins
Amsterdam/Philadelphia (1997).
30. Wertsch, J. V.: The Semiotic Mediation of Mental Life: L. S. Vygotsky and M. M.
Bakhtin. In: Mertz, E., Parmentier., R. J. (eds.): Semiotic Mediation. Sociocultural and
Psychological Perspectives. pp. 49—71. (1985).
31. Wong Fillmore, L.: Individual differences in second language acquisition. In: Fillmore, C.
J., Kempler, D., Wang, W. S-Y. (eds.): Individual Differences in Language Ability and
Language Behaviour. Academic Press New York (1979).
“Meaning” through Clustering by
Self-Organisation of Spatial and Temporal
Information

Ulrich Nehmzow

Department of Computer Science


Manchester University, Manchester M13 9PL, United Kingdom
ulrich@cs.man.ac.uk

Abstract. This paper presents an episodic mapping mechanism used
for the self-localisation of autonomous mobile robots. A two layer self
organising neural network classifies perceptual and episodic information
to identify “perceptual landmarks” (and thus the robot’s position in the
world) uniquely. Through this process relevant information is obtained
from the temporal flow of ambiguous and redundant sensory information,
such that meaningful internal representations of the robot’s environment
emerge through an unsupervised process of self-organisation, construct-
ing an analogy to the real world.

1 Introduction
1.1 Motivation: Transfer of Redundant Sensory Perceptions to
Meaningful Localisation Information
The ability to move towards specific, identified places within the environment is
one of the most important competences for a mobile robot.
While tasks that require only either random exploration or following canoni-
cal paths (such as paths marked by induction loops, beacons, or other markers)
can be achieved by low-level behaviour-based control (see e.g. [1]), more complex
navigation tasks for mobile robots (such as for example delivery tasks) require
the robot to determine its current position, the goal position, and the required
motion between the two.
Apart from ‘staying operational’ (involving obstacle avoidance and staying
within the machine’s operational limits) navigation requires 1) the ability to
construct a mapping between observed features and an internal representation
(“mapbuilding”), and 2) the interpretation of this mapping (“map interpreta-
tion”).
Following Webster’s definition of “analogy” as a “resemblance in some partic-
ulars between things otherwise unlike” [14], such mapbuilding is the construction
of an analogy between the real world and a robot’s perception of it.
In order to use its representation of the world at all, the robot must know
where it is in representation space: localisation is the most fundamental compo-
nent of interpreting mappings. Unless the robot can identify its current position
on the map, no path planning and hence no navigation can be performed.


To achieve self-localisation, the only information available to a mobile robot
is the temporal flow of (redundant, noisy and ambiguous) sensory perception.
The topic of this paper is how this temporal flow of perceptions can be used
to generate stable and meaningful representations of space that can be used for
mobile robot self-localisation. We present a localisation mechanism based on an
“episodic mapping” paradigm that allows a mobile robot to identify its current
position within the mapping of the environment, based on current and preceding
perceptions of the world. Using an entropy-based quality metric we compare the
episodic mapping with a static mapping paradigm ([9,11,10] (i.e. a paradigm that
does not exploit temporal information), and demonstrate that episodic mapping
produces measurably better localisation performance than static mapping.

1.2 Related Work

The mapping used here is similar in many ways to hippocampal mappings found
in rats. In particular, place cells1 in the rat’s hippocampus can be likened to
activity patterns observed in the self-organising feature maps used here. There
have been a number of implementations of robot navigation systems that simu-
late such place cells, notably the work of Burgess, Recce and O’Keefe [2,3], but
also of others [8,9].
Self-organising mechanisms for static sensor signal clustering have been used
for robot localisation before [9,11,6,16]: the current sensory perception of a mo-
bile robot is clustered through an unsupervised, self-organising artificial neu-
ral network, and the network’s excitation pattern is then taken to indicate the
robot’s current position in perceptual space. If no perceptual aliasing (ambigu-
ous sensory perceptions) were present, this would then also identify the robot’s
position in the world unambiguously. Contrary to work discussed in this paper,
however, no information about perception over time was encoded in these cases.
Regarding the use of episodic information as input to a self-organising struc-
ture, some work has been done using such information in the input to a single
layer Kohonen network [10]. The work discussed here differs from that approach,
in that here we use a second Kohonen network that clusters the already clus-
tered sensory information encoded in the first layer network, rather than using
sequences of raw sensory data.
There is also related work in the area of assessment of robot performance [12].
Notably the work of Lee and Recce is relevant to the experiments reported here.
In their case [7] however, mapbuilding performance was measured against a
hand-crafted “optimal” performance (i.e. an absolute comparison). In contrast
to this, we perform quantitative comparisons between two algorithms performing
under identical conditions (i.e. comparing against a relative standard).
1
Recordings from single units in and around the rat hippocampus show strong cor-
relation between a freely moving animal’s location and cell firing [2]. Certain cells
only fire when the rat is in a restricted portion of its environment.

1.3 “Anchoring” through Exteroception

We contend that any navigation system that is to be used on a real robot, be-
yond the immediate vicinity of a “home location”, and over extended periods of
time, has to be anchored in exteroception2 , for the following reasons. Proprio-
ceptive systems are subject to uncorrectable drift errors, which means that the
anchor points of such navigation systems will change over time and introduce
navigation errors that are not correctable through proprioception alone. Only
through calibration using exteroception can this error be removed (a recent ex-
ample of such a system is presented in [15]). Drift errors are inherent and not
an engineering problem - more precise wheel encoders will simply mean that
the proprioception-based navigation system will function correctly for a longer
period of time. Eventually, however, it will fail, which is our main reason for
investigating landmark-based robot self-localisation.

2 The Episodic Mapping Algorithm

2.1 Introduction

The fundamental principle behind an episodic mapping mechanism is to take
into account both the sensory perceptions (perceptual signatures) at the robot’s
current location, as well as a history of the robot’s past perceptions. This allows
the disambiguation of two locations with identical perceptual signatures, if the
perceptions preceding those two locations differ. A localisation system based on
this method has successfully been used at Manchester for localisation when the
robot has got completely lost [4].

2.2 Why Topological Mappings?

There are two main shortcomings of any episodic mapping mechanism: firstly,
it is dependent upon robot motion along a fixed path (or a few fixed paths),
because a unique and repeatable sequence of perceptions is required to identify
a location. Secondly, localisation is affected by “freak perceptions”3 for a much
longer time than in a navigation system based on the current perception only,
because any erroneous (freak) perception is retained for n timesteps, where n
is the number of past perceptions used for localisation. Such freak perceptions
do not normally occur in computer simulations, but they occur frequently when
a real robot interacts with the real world, because of sensor properties (e.g.
specular reflection of sonar signals), sensor noise, or electronic noise.
The episodic mapping algorithm proposed here specifically addresses this
question of how to cope with freak perceptions when using an episodic mapping
2
Sensory stimuli impinging on the robot from the outside, as opposed to propriocep-
tion (using internal sensory stimuli).
3
Spurious sensory perceptions caused by intermittent processes such as specular re-
flections or sensor crosstalk.

mechanism. Because the mapping algorithm is based on topological (i.e. simi-
larity preserving) mappings of both sensory perceptions and episodes of percep-
tions, freak perceptions will affect localisation performance to a far lesser degree
than they would in systems that are dependent on uncorrupted perception of
each one of n sensory images.

2.3 Topological Clustering Using Self-Organising Feature Maps

To cluster incoming sensory information, in both the static and the episodic
mapbuilding paradigm, a self-organising feature map [5] was used.
The self-organising feature map (SOFM) is an example of an artificial neural
network that performs a topological clustering of its input data using an unsu-
pervised learning mechanism. The network consists of one layer of cells typically
arranged as a two dimensional grid. Figure 1 shows the basic structure of such
a network.

Fig. 1. Structure of the SOFM: the input vector i is presented to a two-dimensional
grid of output units j, each connected to the input through a weight vector (elements
wjn) and producing an output oj.

The input vector i, containing sensory information, is presented to each out-


put unit j of the network. Each unit j has a normalised weight vector wj .
The continuous-valued output oj of unit j is found by calculating the
weighted sum of its inputs, given by:

    o_j = \sum_{k=1}^{n} w_{jk} i_k = \mathbf{w}_j \cdot \mathbf{i},      (1)

with n being the number of elements in the input vector and the weight vectors.
The initial state of the network uses randomised values for the weights. There-
fore, when a stimulus is presented to the network, one cell of the network will
respond more strongly than the others to that particular input vector (see equa-
tion 1).
The weight vector of this “winning” unit as well as those of the eight neigh-
bouring units are then changed according to equation 2:

    \Delta w_{jk} = \alpha \, (i_k - w_{jk})

and

    w_{jk}(t+1) = w_{jk}(t) + \Delta w_{jk}      (2)

where α is the learning rate. Typical values for this parameter are in the range
0.2 - 0.5. A value of α = 0.25 (constant over time) was used in the experiments
presented here. Weight vectors are normalised again after being adjusted.
As this process continues the network organises into a state whereby dis-
similar input vectors/patterns map onto different regions of the network, whilst
similar patterns are clustered together in groups: a topological map of the input
space develops.
When the network has settled, distinct physical locations will map onto dis-
tinct regions of the network4 , whilst similar perceptual patterns cluster together
in a region. To achieve this, no symbolic representations have been created, and
the robot is mapping its environment “as it sees it”.
In this way, regions of the network can be seen as representing ‘perceptual
landmarks’ within the robot’s environment, and map response can then be used
for localisation.

2.4 First Layer: Static Mapping

The mapbuilding component used in the static mapbuilding paradigm is a two-
dimensional SOFM of m x m units (m=9 or m=12 in our experiments). The input
to the SOFM consists of the robot’s 16 infrared sensor readings. Whenever the
robot has moved more than 25 cm, a 16-element input vector, containing the
raw sensor readings from the robot’s infrared sensors, is generated. The robot’s
turret maintains a constant orientation throughout the experiment to eliminate
any influence of the robot’s current orientation at a particular location, resulting
in a unique sensory perception at each location, irrespective of the angle at which
the robot approached that location. Note that the 16-element input vector does
not actually convey much information about the current location of the robot. A
coarse input vector such as this was deliberately chosen for these investigations
to produce perceptual aliasing — the aim here is to find ways of achieving
localisation even under very difficult circumstances.
As the robot moves through its environment, controlled by the operator,
sensor readings are obtained and input vectors fed into the SOFM. The net-
work clusters these perceptions according to their similarity and frequency of
occurrence.
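A hypothetical driving loop for this first layer, building on the SOFM sketch above,
might look as follows; the function name and the normalisation of the input vector are
assumptions, while the 25 cm trigger, the 16 infrared readings and the 12x12 map size
come from the text.

# Hypothetical use of the SOFM sketch above for the static (first-layer) mapping.
sofm1 = SOFM(m=12, n_inputs=16)

def on_robot_moved_25cm(infrared_readings):
    """Called whenever the robot has moved more than 25 cm; the 16 raw infrared
    readings form the input vector (turret orientation is kept constant)."""
    i_vec = normalise(np.asarray(infrared_readings, dtype=float))
    return sofm1.train_step(i_vec)      # the returned unit is the excitation centre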
4
Provided these distinct physical locations also have distinct perceptual signatures,
i.e. no perceptual aliasing occurs.

Fig. 2. The static mapping mechanism: the SOFM of m x m units clusters the
current sensory perception (the 16 raw infrared sensor readings) and thus generates
the static mapping.

2.5 Second Layer: Episodic Mapping

The episodic mapping paradigm uses two layers of self-organising feature maps
(see figure 3). Layer one is the layer described in subsection 2.4.
Layer two is also a two-dimensional SOFM of k x k units (k=9 or k=12 in
our experiments); it is trained using an input vector of length m^2. All
elements of this vector are set to zero, apart from the last τ centres of excitation
of layer one, which are set to “1”. The value of the (“history”) parameter τ was
varied in our experiments.

Fig. 3. The episodic mapbuilding mechanism: the first layer SOFM (m x m units,
fed with the 16 raw infrared sensor readings) clusters the current sensory perception;
an input vector of m^2 elements feeds the second layer SOFM (k x k units), which
clusters the last τ perceptions and thus generates the episodic mapping.

Precedence relationships between these excitation centres are not encoded.
This means that the second layer of the dynamic mapbuilder — the output
layer — uses information about the perceptual signature of the current location
as well as temporal cues (in this case τ perceptions of the robot before it arrived
at the current location), but no precedence relationships between perceptions.
The second layer SOFM performs a clustering of the last τ excitation centres
observed in layer 1 in a topology-preserving manner. As the output of layer 1
is a topological mapping of all sixteen infrared sensor signals and the output
of layer 2 is again a topological map, the response of the episodic mapping
system is far less sensitive to freak sensory perceptions than a mapping system
that uses episodes of raw sensory data as input (such as the system presented
in [10]). This is a desirable property, because freak sensor signals occur regularly
in real robotic systems, and mapping systems dependent on constant and reliable
sensory perception therefore tend to be brittle.
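As an illustration of the second layer's input encoding, the sketch below (function and
variable names are assumptions) builds the m^2-element vector from the last τ
first-layer excitation centres; in line with the description above, only membership in
the episode is encoded, not the order of the centres.

from collections import deque

def episodic_input(winner_history, m, tau):
    """winner_history: (row, col) winners of the first-layer m x m SOFM, oldest first.
    Returns an m*m vector that is zero everywhere except at the last tau centres."""
    vec = [0.0] * (m * m)
    for r, c in list(winner_history)[-tau:]:
        vec[r * m + c] = 1.0            # precedence between centres is not encoded
    return vec

# Usage: keep a bounded history of first-layer winners as the robot moves.
history = deque(maxlen=20)
history.append((3, 7))                  # hypothetical excitation centres
history.append((3, 8))
layer2_input = episodic_input(history, m=12, tau=2)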

3 Measuring Localisation Performance

3.1 An Entropy-Based Measure of Mapbuilding and Localisation Performance

To assess the quality of mappings achieved by the episodic mapping mechanism,
and to quantify the influence of individual parameters, we used the following
entropy-based measure of performance.
First, a contingency table such as the one shown in table 1 (appendix A)
was obtained by logging map response5 versus robot location6. To obtain the
contingency table, physical space and map space were partitioned into regions
(“bins”), and whenever the robot was anywhere within a region of space, ob-
taining a map response from a particular map region, the corresponding field of
the contingency table was incremented. For example, table 1 shows that the first
category of map response was perceived 24 times in total, 18 times at location A,
and 6 times at location C.
In the experiments reported here, binning was performed by dividing the
physical space into units of equal size. This method results in a uniform spatial
resolution of the localisation algorithm — a desirable property — but has the
disadvantage of resulting in bin entries of different magnitude, because the robot
occupies some regions of space more often than others. This unequal distribution
can lead to computational problems in the contingency table analysis discussed
below.
The alternative would be to construct bin sizes according to duration of
travel in each bin. This would result in a more even distribution of data points
5
“Map response” denotes the most active unit of the second layer SOFM.
6
The robot’s physical location was obtained through dead reckoning. Over relatively
short distances, and logging only regions of space, rather than precise location, this
is accurate enough for the experiments described here.

per bin, but has the disadvantage that spatial resolution is no longer uniform over
physical space. For this reason we adopted a spatially uniform binning method.
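A sketch of the spatially uniform binning and of the contingency-table update might
look as follows; the grid dimensions, bounds and names are assumptions (the
experiments used between 6 and 16 location bins and between 9 and 16 map-response
bins), and only the equal-sized cells and the increment-per-visit bookkeeping follow
the text.

import numpy as np

def uniform_bin(value, lo, hi, n_bins):
    """Index of the equal-sized bin into which `value` falls."""
    idx = int((value - lo) / (hi - lo) * n_bins)
    return min(max(idx, 0), n_bins - 1)

def location_bin(x, y, bounds, cols, rows):
    """Spatially uniform binning of the (x, y) position into cols x rows cells."""
    x_min, y_min, x_max, y_max = bounds
    return uniform_bin(y, y_min, y_max, rows) * cols + uniform_bin(x, x_min, x_max, cols)

# Rows of the table index map-response regions, columns index location bins.
contingency = np.zeros((16, 15), dtype=int)

def log_event(response_region, x, y, bounds=(0.0, 0.0, 2.87, 4.30), cols=5, rows=3):
    # response_region: region of the second-layer map containing the winning unit;
    # the bounds and the 5x3 layout of the 15 location bins are assumed values.
    contingency[response_region, location_bin(x, y, bounds, cols, rows)] += 1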
Entropy-based measures were then used to determine the strength of associ-
ation between map response and robot location. The entropy (or average infor-
mation) provides a measure of the probability that a particular signal is received,
given particular contextual information. For example, if the system’s response to
a particular stimulus is known with absolute certainty (probability 1), then the
entropy (i.e. average information) of having perceived that particular stimulus
is obviously 0.
For localisation, entropy can serve as a quality metric in the following way: if
any response R of the localisation system corresponds with exactly one location L
in the physical world, then the entropy H(L|R) is zero for that case (the “perfect”
map). The larger H(L|R), the larger the uncertainty that the robot is at a
particular location, given some system response R.
H(L|R) is defined as follows [13]:

    H(L|R) = - \sum_{l,r} p_{l,r} \ln \frac{p_{l,r}}{p_{l\cdot}}      (3)

with

    p_{l,r} = \frac{N_{l,r}}{N}      (4)

and

    p_{l\cdot} = \frac{N_{l\cdot}}{N} .      (5)
N is the total number of events recorded in the contingency table, Nl,r is
the number of occurrences of response r at location l, and Nl. is the number of
occurrences of any response at location l.
H(L|R) can therefore be used as a metric to determine the suitability of the
obtained mapping for localisation. If H(L|R) is zero, perfect localisation can be
achieved, i.e. a particular system response R will indicate with absolute certainty
where the robot is in the world. If H(L|R) is non-zero, some ambiguity regarding
the robot’s current location exists, the larger H(L|R), the larger the ambiguity.
This measure allows quantitative comparison of two or more mapping
paradigms under identical experimental circumstances. In par-
ticular, bin sizes must be identical for all experiments. In other words, the metric
allows the comparison between mapping systems, but does not provide an abso-
lute standard which is experiment-independent. This quality metric is a useful
measure for the experiments presented in this paper, because the fundamen-
tal question asked is which of two mapping paradigms performs better, under
identical experimental conditions.
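A sketch of this metric, computed directly from a contingency table stored with map
responses as rows and physical locations as columns (as in appendix A); the function
name is an assumption, and empty cells are simply skipped.

import numpy as np

def localisation_entropy(table):
    """H(L|R) of equations (3)-(5); returns 0 for a 'perfect' map."""
    N = np.asarray(table, dtype=float)
    total = N.sum()
    p_lr = N / total                       # p_{l,r}, equation (4)
    p_l = N.sum(axis=0) / total            # p_{l.}: any response at location l, equation (5)
    h = 0.0
    for r in range(N.shape[0]):
        for l in range(N.shape[1]):
            if p_lr[r, l] > 0:             # empty cells contribute nothing
                h -= p_lr[r, l] * np.log(p_lr[r, l] / p_l[l])
    return h

# A response that always occurs at exactly one location gives zero entropy:
print(localisation_entropy([[10, 0], [0, 5]]))   # -> 0.0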

3.2 Evaluation of Results

In all experiments reported here, we used the quality metric defined in subsec-
tion 3.1 to determine the quality of a mapping: the lower the entropy H(L | R),
the higher the map quality. A ‘perfect’ map has an entropy H(L | R) of zero.
In all our experiments, the “static mapping”, using a single layer self-
organising feature map (see subsection 2.4), and the “episodic mapping”, using
a twin layer self-organising feature map (see subsection 2.5), are directly com-
pared.
robot localisation [9,11,10]. The question is: does the episodic mapping paradigm
produce better maps, with respect to the criterion discussed in section 3.1?

4 Experiments

4.1 Experimental Method

In order to be able to modify individual parameters and to examine the results
under identical conditions (without resorting to computer simulations), we used
recorded sensor data obtained by manually driving a Nomad 200 mobile robot
(see figure 4) through two different environments. Both the static and the episodic
mapping schemes were then applied to the same data. This ensures that the input
data to each mapping scheme is identical throughout all experiments.

Fig. 4. The Manchester Nomad 200 mobile robot, “FortyTwo”. The robot has
sixteen sonar sensors, sixteen infrared sensors, camera, compass, tactile sensors
and onboard odometry sensors.

Furthermore, experimental parameters such as network size and bin size were
varied, to determine their influence upon localisation performance.

4.2 Experiments in Environment 1

Experimental procedure Here, the robot was manually driven along a (more
or less) fixed path in an environment containing brick walls, cloth screens and
cardboard boxes. The whole route was traversed six times, and 366 data points
in total were obtained, containing the robot's 16 infrared sensor readings and its location
in (x,y) coordinates (see figure 5).
Of the 366 data points, 120 were used for the initial training of the networks7 ,
i.e. the mapbuilding phase, and the remaining 246 data points were used for the
evaluation of the localisation ability.

Fig. 5. Actual trajectory taken by robot in environment 1 (left), and accumu-
lated infrared sensor readings (right, environment 1 “as the robot sees it”).
Dimensions in the diagram are in units of 2.5mm.

Localisation performance was then assessed using the performance metric
described in subsection 3.1.

Experimental results in environment 1 Using the data obtained in environ-
ment 1, we compared three different implementations of the episodic mapping
mechanism with the static mapping mechanism. In all cases, the episodic map-
ping mechanism outperformed the static mapping mechanism.
In the first experiment, we used self organising feature maps of 12x12 units.
To construct the contingency table for this experiment, the network output space
was partitioned into 16 bins, and the physical space in which the robot operated
was partitioned into 15 bins (see figure 6).
The results obtained are shown in figure 7.
It shows that for all values of τ up to τ = 7 episodic mapping produces
a lower H(L | R) than static mapping, that is, better localisation performance
(the case τ = 1 has been included for verification of the experimental results: a
history length of 1 means that only the current perception is taken into account.
7
The first layer network only was trained with the first 20 data points, the remaining
100 data points were used to train both nets.

Fig. 6. Partitioning of environment 1 into 6, 12 and 15 location bins respectively
(in the six-bin partitioning the regions are labelled A-F). Dimensions are given in
units of 2.5mm.

This is essentially identical to the static case, with the difference that in the
static case the sixteen sonar readings provide more information than the one
excitation centre of layer 1 that is used as input to layer 2 in episodic mapping.
The expected result, therefore, is that episodic mapping with τ = 1 always
produces slightly worse results than static mapping — as is indeed the outcome
in all experiments bar experiment 4, where both methods produce very similar
results).
The conclusion to draw from this experiment, then, is that episodic mapping
produces better localisation performance than static mapping, up to a certain
maximum value of τ , indicating that too much episodic information is confusing,
rather than helpful.
In the second experiment, the spatial resolution (i.e. the localisation preci-
sion) was reduced to 12 bins. As would be expected, localisation performance
improved (because there is less opportunity for error). The difference to experi-
ment 1, however, was small (figure 8), and essentially the findings of experiment 2
confirm those of experiment 1.
In the final experiment conducted in environment 1, we used smaller net-
works, and reduced the bin size further. The results of this experiment are
shown in figure 9; a discussion of results follows in section 4.4. The contingency
table for this experiment is shown in table 1 in appendix A. Again, the earlier
findings are confirmed. In this case, episodic mapping produces better localisa-
tion performance than static mapping in all cases, regardless of the value of τ .

Fig. 7. Experiment 1. Results obtained in environment 1, using 12x12 networks,
partitioned into 16 bins. The physical space of 2.87m x 4.30m was divided into
15 bins (see figure 6). The single layer network achieves H(L|R)=1.49 in this
experiment (indicated by horizontal line).

Fig. 8. Experiment 2. Results obtained in environment 1, using 12x12 networks,
partitioned into 16 bins. The physical space of 2.87m x 4.30m was divided into
12 bins (see figure 6). The single layer network achieves H(L|R)=1.44 in this
experiment (indicated by horizontal line).

Fig. 9. Experiment 3. Results obtained in environment 1, using a 9x9 network,
partitioned into 9 bins. The physical space of 2.87m x 4.30m was divided into 6
bins (see figure 6). The single layer network achieves H(L|R)=1.45 in this exper-
iment (indicated by horizontal line).

4.3 Experiments in Environment 2

Experimental procedure For a second set of experiments, a route in a differ-
ent environment containing cluttered furniture (desks, chairs), brick walls, and
open space was traversed nine times, and 456 data points in total were obtained
by manually driving the robot. 160 of these data points were used for training
the networks8; the remaining 296 data points were used to evaluate localisation
performance.
Environment 2 was less structured than environment 1, in that it contained
a larger variety of perceptually distinct objects, and more clutter. It was also
bigger, and the robot’s path in it is longer than in environment 1. Figure 10
shows the robot’s path through this environment, and the robot’s perception of
it.

Experimental results in environment 2 In the first experiment conducted
in environment 2, we used networks of 9x9 units, dividing both network space
and physical space into nine regions (figure 11) to build the contingency table.
The results of this experiment are shown in figure 12. As in the experiments
in environment 1, episodic mapping produces better localisation performance
than static mapping, in this case for all values of τ (the subsequent experiments
in environment 2 confirm, however, that there is a maximum useful length of τ ,
beyond which the performance of episodic mapping decreases).
We then evaluated localisation performance using 12x12 unit networks, at
increased spatial resolution. The results are shown in figure 13. Here, the maxi-
mum useful value of τ is eight previous perceptions, with optimum performance
for τ = 4. And again, episodic mapping outperforms static mapping by a wide
margin.
Finally, we reduced localisation precision to nine distinct regions, and ob-
tained the results shown in figure 14. The contingency table for this experiment
is shown in table 3 in appendix A. This final experiment confirms earlier findings
about the performance of both mechanisms, and the existence of a maximum
useful value of τ .
8
The first 20 data points were used for training the first layer network alone.

Fig. 10. Robot trajectory in environment 2 (left) and accumulated infrared sen-
sor readings obtained by the robot in environment 2 (right, environment 2 “as
the robot sees it”). Dimensions are in units of 2.5mm.

Fig. 11. Partitioning of environment 2 into 9 and 16 location bins respectively
(in the nine-bin partitioning the regions are labelled A-I). Dimensions are given
in units of 2.5mm.

4.4 Discussion of Results

Analysis of contingency tables using the entropy-based metric In both
environments, and for all experiments conducted, episodic mapping outperforms
static mapping, provided not too long a history is incorporated into the input
vector. Here, for all 2 ≤ τ ≤ 7 episodic mapping performs better than static
mapping. The explanation for this observation is obvious: Exploiting temporal
information does help to disambiguate sensory perceptions obtained in different
locations, but including too much episodic information confuses, rather than
aids, and localisation performance decreases as a consequence.

Fig. 12. Experiment 4. Results obtained in environment 2, using 9x9 networks,
partitioned into 9 bins. The physical space of 3.37m x 3.36m was divided into
9 bins (see figure 11). The single layer network achieves H(L|R)=1.67 in this
experiment (indicated by horizontal line).

The optimum value of τ is dependent on bin sizes, but lies between 3 and 5
in most cases. Note that the optimum value can be determined in real time by
the robot itself, as the contingency table and H(L | R) are available to the robot.
The choice of network size and bin size appears to be non-critical. In all cases
episodic mapping can outperform static mapping.
The performance metric introduced in section 3.1 can be applied to any
mapbuilding system that generates categorical data, and therefore provides a
tool for comparing different paradigms, as well as for determining the influence of
any process parameters, independent of the actual paradigm used.

Direct analysis of the contingency tables The statistical measure defined
in section 3.1 determines the strength of the correlation between map response
and the robot’s physical location. Besides taking this measure as an indication
of localisation performance, we also looked at the contingency tables directly to
determine how useful both static and episodic mapping would actually be for
localisation.
The four contingency tables of experiments 3 and 6 (see appendix A) are
redrawn in figure 15, here indicating entries which comprise more than 30% and
more than 50% of a row’s entries graphically.
When a particular map response (rows in contingency table) is obtained, good
localisation is possible if along one row one field accounts for the majority of the
entries along the entire row: this field indicates the physical location (columns
in contingency table) the robot is at.
As can be seen from figure 15, episodic mapping produces more “decisive”
contingency tables. In the contingency tables for static mapping, for many rows
no column attracts a large number of entries — in these cases no unambiguous
localisation is possible — whereas episodic mapping generates exactly one strong
candidate in every column (with very few exceptions).

Fig. 13. Experiment 5. Results obtained in environment 2, using 12x12 networks,
partitioned into 16 bins. The physical space of 3.37m x 3.36m was divided into
16 bins (see figure 11). The single layer network achieves H(L|R)=1.71 in this
experiment (indicated by horizontal line).

This visual analysis of the contingency tables therefore confirms the findings
of subsection 4.4.

5 Summary and Conclusion


The ability to navigate is of paramount importance for autonomous mobile
robots for every task that requires goal-directed motion (i.e. all tasks that cannot
be achieved by random movement). Localisation, i.e. establishing one’s position
within a frame of reference (a “map”) is the most fundamental competence re-
quired for navigation. Mobile robot navigation, and in particular self localisation,
is a hard problem, because meaningful information about the robot’s position in
space has to be obtained from noisy, redundant and ambiguous data. In addition,
the problems of self-localisation are compounded by perceptual aliasing, the fact
that most sensory perceptions of a mobile robot are not uniquely associated with
exactly one position in the real world (there is usually no one-to-one mapping
between physical location and sensory perception).
This paper addresses these problems by presenting a localisation mechanism
for autonomous mobile robots that uses spatial and episodic information to es-
tablish the robot’s position in the world. In the first (static mapping) stage of
the process, raw sensory perceptions of the robot are processed using an unsu-
pervised, self-organising clustering technique. The last τ perceptions of this first
layer are then clustered again to encode episodic information. Through this unsu-
pervised, self-organising process meaningful internal representations of a mobile
robot’s environment (“mappings”) emerge, without any external intervention.
This process can be interpreted as the emergence of an analogy, as the inter-
nal representation resembles the robot’s environment in some particulars (i.e. the
robot’s sensory perception of the world is represented), without actually being
identical to the real world [14].

Fig. 14. Experiment 6. Results obtained in environment 2, using 12x12 net-
works, partitioned into 16 bins. The physical space of 3.37m x 3.36m was divided
into 9 bins (see figure 11). The single layer network achieves H(L|R)=1.47 in this
experiment (indicated by horizontal line).

An entropy-based quality metric was used to compare the two localisation
paradigms, and to determine the influence of individual process parameters upon
the final map quality. It could be shown that, provided not too much episodic
information is included (i.e. using a short term memory, rather than a medium
or long term one), episodic mapping outperforms static mapping, irrespective of
experimental parameters such as bin sizes or history length.
The main advantage of episodic topological mappings as presented here is that
this method is less sensitive to freak sensor signals than an episodic mapping
scheme that uses raw sensor data. This is advantageous for actual robot local-
isation, because freak sensor signals are a common occurrence when real robots
are used.
There are a number of unanswered questions, subject to future research. We
have shown that a maximum useful episode length exists, beyond which episodic
mapping produces worse results than static mapping. The information of what
constitutes the optimal episode length τ is actually available to the algorithm
through the computation of the entropy H(L | R); it is therefore conceivable that
the robot determines the optimal episode length automatically. This approach
is subject to ongoing research.
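One way such an automatic choice could be sketched (all names here are assumptions
building on the hypothetical helpers above, not the ongoing work referred to): train an
episodic mapping for each candidate history length on the same data, log its
contingency table, and keep the τ with the lowest H(L | R).

def choose_history_length(candidate_taus, build_mapping, evaluate_table):
    """Return the tau whose episodic mapping yields the lowest entropy H(L|R).

    build_mapping(tau)     -> a trained episodic mapper (hypothetical callable)
    evaluate_table(mapper) -> contingency table logged on the evaluation data
    """
    best_tau, best_h = None, float("inf")
    for tau in candidate_taus:
        h = localisation_entropy(evaluate_table(build_mapping(tau)))
        if h < best_h:
            best_tau, best_h = tau, h
    return best_tau

# e.g. best_tau = choose_history_length(range(2, 8), build_mapping, evaluate_table)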
It is also conceivable that the optimal episode length τ can be used to char-
acterise robot-environment interaction: if identical mapping mechanisms and
identical robots are used in different environments, presumably the environment
with a higher H(L | R) is the more “difficult” environment, containing fewer
perceptual cues regarding the robot’s position. Characterising environments is
desirable to allow independent replication of mobile robotics experiments [12].
Lastly, although we use previous perceptions for the episodic mapping, we
do not encode the temporal ordering of those perceptions. It is likely that using
this additional information would produce even better mappings. Again, this is
subject to ongoing research.

[Figure 15 comprises four panels (episodic and static mapping for experiment 3, and
episodic and static mapping for experiment 6), each plotting map response against
physical location, with cells shaded according to whether they contain >=50%, >30%,
or <=30% of a row's responses.]

Fig. 15. Assessment of the suitability of contingency tables for robot self locali-
sation. Comparing the tables for static and episodic mapping demonstrates that
episodic mapping provides a clearer correlation between physical location and
map response. (The table values are shown in appendix A.)

References

1. Rodney A. Brooks, A Robust Layered Control System for a Mobile Robot, MIT AI
Memo No. 864, September 1985. 209
2. N. Burgess and J. O’Keefe, Neuronal computations underlying the firing of place
cells and their role in navigation, Hippocampus 7:749-762 (1996). 210
3. N. Burgess, J. O’Keefe and M. Recce, Using hippocampal ‘place cells’ for navi-
gation, exploiting phase coding, in Hanson, Giles and Cowan (eds.), Advances in
neural information processing systems 5, Morgan Kaufmann 1993. 210
4. Tom Duckett and Ulrich Nehmzow, Mobile Robot Self-Localization and Measure-
ment of Performance in Middle-Scale Environments, J. Robotics and Autonomous
Systems, Vol 24, Nos. 1-2, 1998. 211
5. Teuvo Kohonen, Self Organization and Associative Memory, Springer Verlag,
Berlin, Heidelberg, New York, 2nd edition, 1988. 212
6. Andreas Kurz, Constructing maps for mobile robot navigation based on ultrasonic
range data, IEEE Trans Systems, Man and Cybernetics B, Vol 26 No 2 1996. 210
7. David Charles Lee, The map-building and exploration strategies of a simple sonar-
equipped mobile robot; an experimental quantitative evaluation, PhD thesis, Uni-
versity College London, 1995. 210
8. Maja Mataric, Navigating with a Rat Brain: A Neurobiologically-Inspired Model
for Robot Spatial Representation, in Jean-Arcady Meyer and Stuart Wilson (eds.),
From Animals to Animats, MIT Press 1991. 210

9. Ulrich Nehmzow and Tim Smithers, Mapbuilding using Self-Organising Networks,


in: Jean-Arcady Meyer and Stewart Wilson (eds.), From Animals to Animats, MIT
Press, Cambridge Mass. and London, England, 1991, pp. 152-159. 210, 217
10. Ulrich Nehmzow, Tim Smithers and John Hallam, Location Recognition in a Mo-
bile Robot using Self-Organising Feature Maps, in G. Schmidt (ed.), Information
Processing in Autonomous Mobile Robots, Springer Verlag, Berlin, Heidelberg, New
York, 1991, pp. 267-277. 210, 215, 217
11. Ulrich Nehmzow and Tim Smithers, Using Motor Actions for Location Recognition,
in F. Varela and P. Bourgine (eds.), Toward a Practice of Autonomous Systems,
MIT Press, Cambridge Mass. and London, England, 1992, pp. 96-104. 210, 217
12. Ulrich Nehmzow and Michael Recce (eds.), Scientific Methods in Mobile Robotics,
Special Issue of J. Robotics and Autonomous Systems, Vol 24, Nos. 1-2, 1998. 210, 225
13. W. Press, S. Teukolsky, W. Vetterling and B. Flannery, Numerical recipes in C,
Cambridge University Press, 1992. 216
14. Merriam-Webster online dictionary, http://www.m-w.com/. 209, 224
15. B. Yamauchi and R. Beer, Spatial learning for navigation in dynamic environments,
IEEE Trans Systems, Man and Cybernetics B, Vol 26 No 3, 1996. 211
16. U. Zimmer, Self-localisation in dynamic environments, submitted to IEEE/SOFT
Workshop BIES’95, Tokyo, May 95. 210

A Contingency Tables

                                 Physical Location
Map response      A          B          C          D          E          F
     1        18 (67%)    0 (0%)     6 (33%)    0 (0%)     0 (0%)     0 (0%)
     2         0 (0%)     0 (0%)     3 (33%)    0 (0%)     6 (67%)    0 (0%)
     3         2 (9%)     0 (0%)     8 (35%)    0 (0%)     0 (0%)    13 (57%)
     4         1 (3%)     0 (0%)     7 (24%)    0 (0%)     8 (28%)   13 (45%)
     5         1 (2%)     1 (2%)     0 (0%)     9 (18%)   12 (24%)   27 (54%)
     6        19 (43%)    4 (9%)    13 (30%)    3 (7%)     5 (11%)    0 (0%)
     7         0 (0%)     0 (0%)     1 (25%)    3 (75%)    0 (0%)     0 (0%)
     8         0 (0%)     0 (0%)     0 (0%)    11 (79%)    0 (0%)     3 (21%)
     9        16 (55%)    1 (3%)     0 (0%)     9 (31%)    0 (0%)     3 (10%)

Table 1. Contingency table for experiment 3 (episodic mapping, τ = 11). The
eight situations in which first and second candidate location (given a particular
map response) differ by more than 150% are shown in heavy type. Percentages
of localisation obtained given response are shown in brackets.

                                 Physical Location
Map response      A          B          C          D          E          F
     1        14 (44%)    0 (0%)     4 (13%)    6 (10%)    3 (9%)     5 (16%)
     2        11 (50%)    2 (9%)     1 (5%)     3 (14%)    2 (9%)     3 (14%)
     3        17 (53%)    2 (6%)     4 (13%)    5 (16%)    1 (3%)     3 (9%)
     4         3 (11%)    0 (0%)     8 (30%)    4 (15%)    5 (19%)    7 (26%)
     5         8 (30%)    1 (4%)     3 (11%)    6 (22%)    2 (7%)     7 (26%)
     6         5 (21%)    1 (4%)     4 (17%)    4 (17%)    4 (17%)    6 (25%)
     7         2 (7%)     0 (0%)     6 (21%)    3 (11%)    3 (11%)   14 (50%)
     8         1 (4%)     0 (0%)     3 (11%)   14 (52%)    6 (22%)    3 (11%)
     9         2 (8%)     0 (0%)     6 (23%)    1 (4%)     5 (19%)   12 (46%)

Table 2. Contingency table for experiment 3, static mapping. The six situations
in which first and second candidate location (given a particular map response)
differ by more than 150% are shown in heavy type. Percentages of localisation
given response are shown in brackets. See also table 1.

                                       Physical Location
Map response    A        B        C       D        E        F        G       H        I
     1        0 (0)    0 (0)    0 (0)   0 (0)    0 (0)    0 (0)    0 (0)   3 (25)   9 (75)
     2        0 (0)    0 (0)    0 (0)   0 (0)    0 (0)    9 (64)   0 (0)   0 (0)    5 (36)
     3        0 (0)    7 (39)   0 (0)   0 (0)    5 (28)   3 (17)   0 (0)   2 (11)   1 (6)
     4        3 (16)   2 (11)   0 (0)   2 (11)   5 (26)   5 (26)   0 (0)   1 (5)    1 (5)
     5        2 (67)   1 (33)   0 (0)   0 (0)    0 (0)    0 (0)    0 (0)   0 (0)    0 (0)
     6        0 (0)    5        0 (0)   3 (13)  10 (44)   3 (13)   0 (0)   2 (9)    0 (0)
     7        0 (0)    3 (100)  0 (0)   0 (0)    0 (0)    0 (0)    0 (0)   0 (0)    0 (0)
     8        5 (50)   5 (50)   0 (0)   0 (0)    0 (0)    0 (0)    0 (0)   0 (0)    0 (0)
     9       23 (58)   2 (5)    0 (0)  13 (33)   1 (3)    1 (3)    0 (0)   0 (0)    0 (0)
    10        3 (14)   8 (38)   0 (0)   0 (0)    0 (0)    6 (29)   0 (0)   0 (0)    4 (19)
    11        2 (13)   0 (0)    0 (0)   0 (0)    4 (27)   8 (50)   0 (0)   0 (0)    0 (0)
    12        0 (0)    7 (54)   0 (0)   0 (0)    1 (8)    5 (39)   0 (0)   0 (0)    0 (0)
    13        0 (0)    0 (0)    0 (0)   1 (5)    8 (40)   3 (15)   0 (0)   4 (20)   4 (20)
    14        0 (0)    0 (0)    0 (0)   0 (0)    0 (0)    5 (57)   0 (0)   2 (22)   2 (22)
    15        6 (29)   0 (0)    0 (0)   4 (19)   3 (14)   1 (5)    0 (0)   0 (0)    7 (33)
    16        4 (11)   0 (0)    0 (0)   1 (3)    6 (17)   0 (0)    0 (0)   7 (19)  18 (50)

Table 3. Contingency table for experiment 6 (episodic mapping, τ = 4). Map
responses that indicate a localisation with more than 150% lead over the second
candidate location are shown in heavy type. Percentages of localisation given
response are shown in brackets. Compare with table 4, which shows the results
obtained in the same experiment, but using the single layer network map.

                                       Physical Location
Map response    A        B        C       D        E        F        G       H        I
     1       10 (27)   3 (8)    0 (0)   4 (11)   3 (8)    2 (5)    0 (0)   4 (11)  11 (30)
     2        4 (20)   1 (5)    0 (0)   0 (0)    7 (35)   4 (20)   0 (0)   1 (5)    2 (10)
     3        8 (35)   2 (9)    0 (0)   0 (0)    2 (9)    7 (30)   0 (0)   0 (0)    4 (17)
     4        0 (0)    4 (16)   0 (0)   3 (12)   5 (20)   6 (24)   0 (0)   4 (16)   3 (12)
     5        1 (3)    0 (0)    0 (0)   3 (9)   10 (30)   1 (3)    0 (0)   3 (9)   15 (46)
     6        2 (13)   2 (13)   0 (0)   0 (0)    1 (7)    6 (40)   0 (0)   0 (0)    4 (27)
     7        2 (25)   3 (38)   0 (0)   0 (0)    0 (0)    0 (0)    0 (0)   2 (25)   1 (13)
     8        1 (9)    2 (18)   0 (0)   0 (0)    2 (18)   0 (0)    0 (0)   3 (27)   3 (27)
     9       10 (56)   2 (11)   0 (0)   2 (11)   1 (6)    0 (0)    0 (0)   2 (11)   1 (6)
    10        1 (10)   6 (60)   0 (0)   0 (0)    0 (0)    2 (20)   0 (0)   0 (0)    1 (10)
    11        2 (25)   5 (63)   0 (0)   0 (0)    1 (13)   0 (0)    0 (0)   0 (0)    0 (0)
    12        2 (50)   1 (25)   0 (0)   0 (0)    0 (0)    0 (0)    0 (0)   0 (0)    1 (25)
    13        3 (12)   0 (0)    0 (0)  10 (39)   4 (15)   8 (31)   0 (0)   1 (4)    0 (0)
    14        0 (0)    8 (42)   0 (0)   0 (0)    0 (0)    8 (42)   0 (0)   0 (0)    3 (16)
    15        6 (32)   5 (26)   0 (0)   0 (0)    4 (21)   3 (16)   0 (0)   0 (0)    1 (5)
    16        5 (25)   4 (20)   0 (0)   2 (10)   4 (20)   3 (15)   0 (0)   1 (5)    1 (5)

Table 4. Contingency table for experiment 6, static mapping. Map responses
that indicate localisation with more than 150% lead over the second candidate
location are shown in bold. Percentages of localisation given response are shown
in brackets. See also table 3.
Conceptual Mappings from Spatial Motion to Time:
Analysis of English and Japanese
Kazuko Shinohara

Otsuma Women’s University,


2-7-1 Karakida, Tama-shi, Tokyo 206-0035, Japan

Abstract. In the metaphorical mapping from spatial motion to time, the path
schema is preserved but other source domain structures are constrained by the
target domain structure. There are at least four constraints: (1) the Front-Back
Constraint, (2) the Straight Path Constraint, (3) Restriction on Manner
Information, and (4) Exclusion of Cause, Circumstance, and Resultant State.
This is demonstrated by analyzing English and Japanese motion-time
metaphors. Thus it is shown that target domain structures play an important
role in determining the elements of information preserved in metaphorical
mappings which are regarded as “unidirectional.”

1 Framework
This study adopts the cognitive semantic theory of metaphor, originated and
developed by Lakoff and Johnson. In this theory, metaphor is defined as conceptual
mapping from the source domain to the target domain, and the image-schematic
structure of the source domain is said to be preserved in metaphorical mappings, as
seen in the following descriptions.

The definition of “metaphor”:


Metaphor is the basic mechanism by which abstract concepts are understood in terms
of more concrete concepts. Metaphors are conceptual mappings from structures in
one conceptual domain (the source domain) to structures in another domain (the target
domain). Lakoff (1993a: 28)

The Invariance Principle:


Metaphorical mappings preserve the cognitive topology (that is, the image-schema
structure) of the source domain, in a way consistent with the inherent structure of the
target domain. Lakoff (1993b: 215)

The Invariance Principle implies that the target domain structures are not totally
constructed by the mapping of the source domain structures, but they have their own
inherent structures which can restrict the mapping itself. The image schema of the
source domain, however, is said to be mapped to the target domain, therefore there
should be some kind of unidirectional mapping from the source domain to the target
domain.
The concept of image schema is taken from Johnson (1987). Since Johnson does not give
a short definition, his descriptions are combined into the following working definition
of my own.


Working Definition of “Image Schema”:


A recurrent, dynamic pattern, shape, and regularity of our perceptual interactions and
motor programs that gives coherence and structure to our experience, consisting of a
small number of parts and relations by virtue of which it can structure indefinitely
many perceptions, images and events.

It is presupposed that the Path Schema (an image schema which consists of a source,
a goal, and a sequence of contiguous locations connecting the source and the goal) is
the one preserved in the TIME AS MOTION metaphor. The Path Schema includes
nothing other than these elements, and no information about lexicalization (whether
each element of the concept is lexically expressed or not).
The structure of the source domain (spatial motion) is analyzed in terms of Talmy’s
(1985) Motion Event Frame, and the constraints on the mapping from spatial motion
to time are specified in relation to this frame.

2 Constraints on the Motion-Time Mapping


Preceding studies have shown that both English and Japanese have the TIME AS
MOTION metaphor, which maps the concept of spatial motion to passing of time, and
that there are two sub-metaphors: the TIME IS A MOVING OBJECT metaphor and
the TIME IS A LINE ALONG WHICH OBSERVERS MOVE metaphor in both
languages. The former is the metaphor which conceptualizes time as something that
moves and humans as observers of the motion of time; the latter is the metaphor which
conceptualizes humans as moving objects and time as some landscape where humans
move. English examples are: “The time will come when .....” (TIME IS A MOVING
OBJECT), “We are approaching the end of the year.” (TIME IS A LINE ALONG
WHICH OBSERVERS MOVE), and so forth. These submetaphors and their
examples are discussed by Lakoff and Johnson (1980), Johnson (1987), Lakoff
(1993b), Yamanashi (1995), Yamaguchi (1995), Shinohara (1996), and others. It has
also been claimed that, in this metaphor, what is mapped is the Path Schema.

The new findings of this study are:


(a) The fact that some source-domain structures other than the Path Schema are also
mapped in the TIME IS A MOVING OBJECT metaphor.
(b) The mapping of these extra-image-schematic structures is restricted (that is,
they are partial mappings).
(c) The restrictions are summarized as four constraints, which are discussed in
sections 2.1-2.4.

These partial mappings are seen in the mappings of specific information concerning
the elements of the Motion Event Frame (Talmy (1985) with a slight revision of my
own). The Motion Event Frame consists of the following elements.

[Motion Event Frame]


1. The Central Elements
(i) Figure (the moving object)
(ii) Ground (the reference-object with respect to which the motion is
conceptualized)
(iii) Path (the course followed or site occupied by the figure object with
respect to the Ground object),
(iv) Motion
2. The Non-Central Elements
(v) Manner (the way in which the Figure moves)
(vi) Cause, Circumstance, and Resultant State

When the source domain structures other than the elements of the Path Schema
(source, goal, and contiguous locations connecting the source and the goal) are
examined, some of them are found to be preserved in the target domain, while others
are not preserved. These partial mappings are analyzed in this study as the following
four constraints.

2.1 The Front-Back Constraint

Spatial orientation of motion (one aspect of the Path of motion in the central elements
of the Motion Event Frame) is one of the extra-image-schematic structures, since the
Path Schema includes no information about it. The spatial orientations which can be
mapped to time are basically restricted to front and back. Other spatial orientations
such as up-down, right-left, north-south, and others are rejected in motion-time
mappings, except in some idiomatic expressions using up-down orientation. This
constraint is found both in English and Japanese, as seen in the following examples
(asterisk indicates that it is an incorrect, inappropriate use).

(e.g. 1) a. John died ten days before [after / *to the right of / *to the left of /
*to the south of / *above / *below] his wedding.

b. John wa kekkonshiki no tooka mae [ato / *migi / *hidari /


*minami /* ue / *shita] ni shinda. (=(1a))

The spatial orientation (front-back) and the temporal orientation (future-past) are
mapped in terms of two reference points: the observer and the time. There are four
logically possible patterns of Future/Past assignment to the Front-Back slots for the
two reference points.

              Observer                 Time
           Front      Back        Front      Back

   (a)     Future     Past        Past       Future
   (b)     Past       Future      Past       Future
   (c)     Past       Future      Future     Past
   (d)     Future     Past        Future     Past

Fig. 1 Four patterns of Future-Past assignment to the Front-Back axis.

These four patterns can be regarded as typological parameters of the structure of
the concept of time in human languages, provided that each of the four patterns is
attested in at least one language (though this study does not deal with this
typological question).
It is clear from existing examples that English and Japanese select the same
parameter (a) above. That is, the observer is facing the future and the time is facing
the past. These are illustrated by the following examples. (Japanese has the same kind
of pairs.)

(e.g. 2) a. In the weeks ahead of us ...... (future)


b. That's all behind us now. (past)
c. Coming up in the weeks ahead.... (future)
d. For some time back ..... (past)

(e.g. 3) a. In the following weeks .... (future)


b. In the preceding weeks ..... (past)
c. John left behind schedule. (future)
d. Paul came ahead of schedule. (past)
(Lakoff and Johnson 1980: 41-2, Lakoff 1990: 56, Yamaguchi 1995: 205, Shinohara's
italics)

Apparently contradictory expressions like ‘We are looking forward to the following
weeks’ or ‘San nen mae o furikaeru (three years front ACC look-back)’ can be
explained in terms of these dual reference points and parameters of assignment of
orientation. (There can be other languages which select (b), (c), or (d). Malagasy is a
candidate for (b).)

2.2 The Straight Path Constraint

The shape of the path of motion is also an extra-image-schematic structure. The use
of nonstraight paths is restricted to a considerable extent both in English and Japanese,
though this is not an absolute constraint. Cyclic time is possible in both languages, but
the application of cyclic (nonstraight) paths is not free. It seems that the cyclic path is
available only when some repetitious experience is involved.

(e.g. 4) a. Time passed [*zigzagged / *circled] by.


b. Toki ga sugite [*dakooshite / *mawatte] itta. (=(4a))

Neutral expressions of time (including no repetitious experience) are thus restricted


to straight motion.

(e.g. 5) a. Leap year [*3:17 PM / *the end of the world] came around.
b. Uruudoshi [*gogo 3-ji 17-fun / *sekai no owari] ga megutte kita.
(=(5a))

Time expressions which imply some repetitious experience allow nonstraight


motion, but otherwise it is inappropriate to use verbs of nonstraight motion in this kind
of expression.
The restriction of the use of cyclic path in the motion-time metaphor to repetitious
experiences may be because the concept of cyclic time is motivated by our repetitious
experiences, especially those of natural phenomena.

2.3 Restriction on Manner Information

Manner of motion is another kind of extra-image-schematic information. It is
not totally excluded from the mappings, but is restricted in a consistent way. Since
English and Japanese differ in their dominant conflation patterns (English is a
“Motion+Manner”-type language, while Japanese is a “Motion+Path”-type language
according to Talmy’s (1985) typology), English has a far greater number of
“Motion+Manner Verbs” than Japanese. That is, English has a dominant set of
motion verbs which conflate the concept of “motion” itself and that of “manner” (the
way in which the object moves), while Japanese has a dominant set of motion verbs
which conflate the concept of “motion” and that of “path.” By analyzing 168 English
Motion+Manner Verbs and 13 Japanese Motion+Manner Verbs plus 64 Japanese
compound verbs of [Motion+Manner Verb] + [Motion+Path Verb] type, some
common characteristics of the Motion+Manner Verbs which are compatible with time
metaphors were found. The verbs examined are listed at the end of this paper.
Verbs which are used without the sense of inappropriateness in the TIME IS A
MOVING OBJECT metaphor are:

English: flow, fly, crawl, creep, dash, hurry, march, run, rush, sneak, roll,
slide, slip, glide
Japanese: nagareru (flow), ?hashiri-saru (run-leave), tobi-saru (fly-leave),
nagare-saru (flow-leave), kake-nukeru (run through), shinobi-yoru
(sneak-approach)
These verbs imply at least one of the aspects (a) saliently high or low speed, (b) motion
which is unnoticeable to the observer, (c) motion with regular rhythm, or (d) invariable,
smooth motion, as shown in Fig. 2 (English) and Fig. 3 (Japanese).

                speed   unnoticeable   regular   invariable
                        motion         rhythm    motion

   flow         -       -              -         +
   fly          +h      -              -         +-
   crawl        +l      +-             -         -
   creep        +l      +              -         -
   dash         +h      -              -         -
   hurry        +h      -              -         -
   march        -       -              +         -
   run          +h      -              -         -
   rush         +h      -              -         -
   sneak        +l      +              -         -
   roll         -       -              +-        +
   slide        -       +              -         +
   slip         -       +              -         -
   glide        -       +              -         +

Fig. 2 [+] indicates that the verb has the implication, [-] indicates otherwise. [+-]
indicates that both cases are possible depending on context. [+h] means “high
speed” and [+l] means “low speed.”

                    speed   unnoticeable   regular   invariable
                            motion         rhythm    motion

   nagareru         -       -              -         +
   ?hashiri-saru    +h      -              -         -
   tobi-saru        +h      -              -         -
   nagare-saru      -       -              -         +
   kake-nukeru      +h      -              -         -
   shinobi-yoru     +l      +              -         -

Fig. 3 (As for representation, see Fig. 2).

Thus:
Positive Factors concerning manner of motion are:
(a) speed (saliently high or saliently low)
(b) unnoticeable motion
(c) invariable motion
(d) regular rhythm

There are also some negative factors for Motion+Manner Verbs in the TIME IS A
MOVING OBJECT metaphor. See Fig. 4.

             limb   instrument   speed   unnoticeable   regular   invariable
             motion                       motion         rhythm    motion

   fly       +      -            +h      -              -         +-
   crawl     +      -            +l      +-             -         -
   run       +      -            +h      -              -         -
   *swim     +      -            -       -              -         -
   *shuffle  +      -            -       -              -         -
   *walk     +      -            -       -              -         -
   *skip     +      -            -       -              -         -
   *limp     +      -            -       -              -         -
   *cruise   -      +            -       -              -         -
   *canoe    -      +            -       -              -         -
   *jet      -      +            +h      -              -         -
   *rocket   -      +            +h      -              -         -

Fig. 4 (As for representation, see Fig. 2.)

As seen in Fig. 4, implication of “limb motion” functions as a negative factor if the


verb has none of the positive factors (swim, shuffle, walk, skip, and limp in Fig. 4),
while if the verb has one or more positive factors, the verb is an appropriate one in this
metaphor (fly, crawl, and run in Fig. 4). By contrast, implication of “instrument” is an
absolute negative factor, since implication of a positive factor does not save the verb if
it has “instrument” aspect (jet and rocket in Fig. 4). Likewise, some other absolute
negative factors are detected by examining other motion verbs: “sound emission”
(e.g., bang, gurgle, rattle and others), “up-down or random motion” (e.g., climb, prowl
and others), “specific circumstance of motion” (e.g., swim, wade, plow and others),
“plural figures” (e.g., troop). Expressions like ‘Time climbed on,’ ‘Time helicoptered
away,’ ‘Time wriggled on,’ ‘Time rattled by,’ or ‘Time swam by’ are far less
appropriate (or even inappropriate) because these verbs have negative factors.
Among these negative factors, only “limb motion” is rendered ineffective by the
implication of one or more positive factors. The other five are always effective as
negative factors.

Thus:
The negative factors conditioning the use of Motion+Manner Verbs in the TIME IS A
MOVING OBJECT metaphor are:
(a) up-down or random (non-front-back) motion
(b) implication of the type of instrument used
(c) implication of sound emission
(d) salient motion of limbs or body-internal motion
(e) implication of specified circumstances of motion
(f) motion of plural figures.

While English has at least 14 Motion+Manner Verbs which are often used in the
TIME IS A MOVING OBJECT metaphor, Japanese has only 6 Motion+Manner
Verbs which can be used for the TIME IS A MOVING OBJECT metaphor. They are
‘nagareru (flow),’ ‘hashiri-saru (run-leave),’ ‘tobi-saru (fly-leave),’ ‘nagare-saru
(flow-leave),’ ‘kake-nukeru (run-go through),’ and ‘shinobi-yoru (hide-approach).’
Except ‘nagareru,’ all of them are compound verbs which are formed by
[Motion+Manner Verb] + [Motion+Path Verb]. The above positive and negative
factors, however, seem to be common in English and Japanese.

The major difference between English and Japanese concerning this metaphor is
seen in the pattern of expressing manner of motion. The striking difference is that
English allows the Motion+Manner Verbs which have one or more positive factors but
not negative factors (except limb motion) to be used in single forms, in most cases
accompanied by Path expressions such as ‘by,’ ‘on,’ or ‘away,’ while Japanese allows
only one single verb (‘nagareru’ (flow)) and requires other Motion+Manner verbs
such as ‘tobu (fly),’ ‘hashiru / kakeru (run),’ ‘hau (crawl / creep),’ ‘suberu (glide /
slide)’ or ‘korogaru (roll)’ to be accompanied by a Motion+Path Verb or by a simile
marker ‘yooni (as if)’ plus Motion+Path Verb like ‘sugiru (pass)’ or ‘sugite iku (pass
go).’ This difference seems to be due to the difference in lexicalization patterns
between English and Japanese. Verbs like ‘fly,’ ‘run,’ ‘crawl,’ or ‘creep’ (and the
counterparts in Japanese) basically denote an action, which prototypically implies
change of place (these are called “Motion-Propelling Action Verbs” by Kageyama
(1997)). In these verbs, Manner information is attributed to the action itself, not to the
motion. In order to denote change of place, these English verbs require, in most
cases, Path information expressed mostly by adverbs or prepositional phrases, since
English is a “Motion+Manner”-type language. By contrast, since Japanese is a
“Motion+Path”-type language, it does not regularly use Path expressions outside the
verbs; that is, basic Path information is conflated in verbs. Thus, when temporal
motion is expressed by a Motion-Propelling Action Verb in Japanese, the Path
information is attached to the expression by the use of a compound verb or by
attaching ‘yooni (as if)’ and a Motion+Path Verb.
In spite of this difference, it is clear that English and Japanese share the fundamental
constraints on motion-time mappings. The difference is seen only in the patterns of
lexical realization, which are consistent with the major patterns of lexicalization of the
Motion Event Frame.

2.4 Exclusion of Cause, Circumstance, and Resultant State

The elements grouped under (vi) in the Motion Event Frame (Cause, Circumstance, and
Resultant State) are consistently excluded from motion-time mappings both in English and
Japanese. Thus, the expressions like ‘Time blew off’ (meaning ‘Time passed
quickly’), ‘Time wore wings to the past’ (meaning ‘Time flew away’), ‘The
examination day stuck to next Wednesday’ (meaning ‘The examination day came as
near as next Wednesday’), and so on are rejected.

3 Conclusion
In motion-time mappings, the aspects of spatial motion such as orientation (front-
back, up-down, right-left, north-south, etc.), the shape of the path (straight, curve,
circular, zigzag, etc.), and manner of motion (‘run,’ ‘fly,’ ‘creep,’ ‘wiggle,’ etc.) are
only partially mapped. The same constraints are found in English and Japanese.
Since these constraints concern extra-image-schematic structures of the source
domain, it is concluded that the partial mappings are seen outside the image schema in
the TIME IS A MOVING OBJECT metaphor. The Path Schema is preserved, since
these constraints do not affect the mappings of this image schema. See Fig. 5.

Rejection of mapping seems to be caused by the structure of the target domain


concept (the concept of passing of time) and our basic experiences.
(i) The Front-Back Constraint seems to be motivated by our experience of basic
direction of motion. Our asymmetrical body with inherent front and back, and our
bodily structure designed to move in the direction of the front, mark the front-back
axis as the most basic, important one for human beings. The front-back axis is the
only purely one-dimensional direction, and this one-dimensional nature accords with
the one-dimensionality of time.
(ii) The Straight Path Constraint seems to come from the correspondence between time
and the ordinal structure of events, or our mental process of dealing with perception,
cognition, or memory. If we can assume that the conceptual structure of time emerges
from the ordinal structure inherent in our mental process and the consequent ordinal
recognition of events, it is understood that the structure of time is most naturally
represented as one-dimensional structure.

< Spatial Motion > < Temporal Motion >

C B A A’ B’ C’

Fig. 5
A --> A’ : The Path Schema is completely mapped.
B --> B’ : The part of the conceptual structure of spatial motion
which is allowed by the constraints is mapped.
C --> C’ : The rest of the conceptual structure of spatial motion
(rejected by the constraints) is not mapped.

(iii) The positive and the negative factors concerning the TIME IS A MOVING
OBJECT metaphor are also understood as motivated by the structure of the concept of
time. Speed (high or low) and unnoticeable motion (our unawareness of the passing
of time) are our subjective feelings about time projected to the motion of time. The
other two positive factors reflect our conception of time as passing constantly,
incessantly, or invariably in always the same manner. The
negative factors, which must not be mapped to time, can also be explained in terms of
the conceptual structure of time. "Up-down or random motion" is excluded by the
Front-Back constraint, and the other negative factors ("instrument used," "sound
emission," "salient bodily motion," "specified circumstance," and "plural figure") are
also explained by the conceptual structure of time, which we assume to lack such
elements.
(iv) Exclusion of Cause, Circumstance and Resultant State is also motivated
conceptually. These elements are rejected because our concept of time tells us that
there can be no agent acting on the motion of time and thus causing time to move, that
time is engaged in no other activities than motion itself, and that time undergoes no
durative change of state caused by its motion.

Thus, the constraints are experientially, cognitively, or conceptually motivated.


They are not arbitrary conventions with no relation to human experience. These
motivated constraints suggest that some part of the conceptual structure of time may
be universal to human beings. As clarified in this paper, English and Japanese have
striking similarities in the structure of the TIME AS MOTION metaphor. Considering
that English and Japanese are genetically and areally remote to a considerable degree,
and that they differ in their dominant lexicalization patterns of motion events, these
similarities must be attributed to the universal structure of human conceptualization of
time, that is, the universal structure of the space-time metaphor. Yet the fact that
English and Japanese differ in some part suggests that the space-time metaphor, when
expressed in language, can be affected and constrained by the grammatical and lexico-
semantic structure of the language.
To summarize in plain words, we conceptualize time as something similar to spatial
motion, something that is structured in terms of the structure of spatial motion, but not
all of the aspects of spatial motion are mapped to the concept of time. The structure
of the concept of time plays an important role in restricting this mapping. This study
discussed some of the constraints on such partial mappings, whereby some aspects of
the relationship between the concept of spatial motion and that of time were clarified.

Appendix

Motion verbs examined in this study.

Asterisk indicates that it is inappropriate to use the verb in expressions like “Time
________ by (away, on, etc.).” Question marks indicate that the use of the verb is not
totally inappropriate but it is somewhat strange or it needs some special context
(judged by two to five native speakers).

1. List of Motion+Manner Verbs (English) (168)

(a) Verbs of Motion by spontaneous (internal) cause.


?amble, ?bowl, *burst, ?canter, *clamber, *climb, crawl, creep, dash, *flit, fly,
?gallop, ?hasten, *hike, ?hobble, *hop, hurry, ?inch, *jog, *jump, ??lag, *leap, *limp,
?lumber, ?lurch, march, ?mosey, ?nip, ?pad, *parade, *plod, *plow, *pop, *prowl,
??race, *ramble, *roam, *rove, run, rush, ??saunter, *scramble, ??scud, ?scurry,
*scuffle, ?scuttle, ??shamble, ??shuffle, *skim, *skip, *slouch, sneak, *soar, speed,
??stagger, *stalk, *stray, ?stride, *stroll, *strut, *stumble, *swagger, ?sweep, *swim,
??tear, ?tiptoe, *toil, *toddle, *totter, *tramp, *trek, *troop, ?trot, *trudge, *vault,
*waddle, *wade, *walk, *wander, ?zip

(b) Verbs of Motion by unconscious (external) cause.


*bounce, *bound, *coil, ??drift, *float, flow, glide, *meander, ??revolve, roll, slide,
slip, slither, *swing, *tumble, *whirl, *wind

(c) Verbs of Motion with the type of instrument used.


*cruise, *drive, *fly (by plane), *ride, *row, ??sail

Verbs derived from nouns of instruments.


*bicycle, *bike, *boat, *bus, *cab, *canoe, *chariot, *cycle, *dogsled, *ferry,
*helicopter, *jeep, *jet, *oar, *paddle, *pedal, *raft, *rocket, *skate, *ski, *sled,
*sleigh, *taxi, *yacht

(d) Verbs of sound emission


*babble, *bang, *beat, *beep, *burr, ??buzz, *chatter, *clash, *clatter, *hiss,
*gurgle, *rattle, ??roar, *rumble, *screech, *shriek, *splash, *thump, *whistle,
??zoom

(e) Verbs of dancing


*boogie, *dance, *jig, *jive, *polka, *rumba, *samba, *tango, *waltz

(f) Verbs of body internal motion


*buck, *fidget, *kick, *rock, *teeter, *twitch, *waggle, *wiggle, *wobble, *wriggle

2. List of Motion+Manner Verbs (Japanese) (14 single verbs and 63


compound verbs)

(I) Single Motion+Manner Verbs.


*aruku (walk), *hashiru (run), *haneru (leap), *hau (crawl), *kakeru (run), *moguru
(dive), *oyogu (swim), *tobu (fly), *tobu (jump), *chiru (scatter), *korogaru (roll),
nagareru (flow), *suberu (slide), *mau (dance),

(II) Compound Verbs : [ V1(Manner) + V2(Path) ]


*aruki-mawaru (walk around), *ayumi-deru (walk out), *ayumi-saru (walk-leave),
*hai-agaru (crawl up), *hai-deru (crawl out), *hai-mawaru (crawl around), *hai-
modoru (crawl back), *hai-oriru (crawl down), *hane-agaru (leap up), *hane-mawaru
(leap around), *hane-modoru (leap back), *hashiri-deru (run out), *hashiri-komu (run
into), *hashiri-mawaru (run around), *hashiri-oriru (run down), ?hashiri-saru (run-
leave), *kake-agaru (run up), *kake-komu (run into), *kake-mawaru (run around),
*kake-meguru (run around), *kake-modoru (run back), *kake-noboru (run up), kake-
nukeru (run through), *kake-oriru (run down), *korogari-deru (roll out), *korogari-
komu (roll into), *koroge-mawaru (roll around), *korogari-modoru (roll back),
*korogari-nukeru (roll through), *korogari-ochiru (roll-fall), *korogari-oriru (roll
down), *korogari-saru (roll-leave), *mai-agaru (dance up), *mai-komu (dance into),
*mai-modoru (dance back), *mai-ochiru (dance-fall), *mai-oriru (dance down),
*suberi-komu (slide into), *nagare-deru (flow out), *nagare-komu (flow into),
*nagare-kudaru (flow down), *nagare-ochiru (flow-fall), nagare-saru (flow-leave),
*nagare-tsuku (flow-arrive), *nige-daru (sneak away), *oyogi-mawaru (swim around),
*oyogi-saru (swim-leave), *oyogi-tsuku (swim-arrive), shinobi-yoru (sneak-
approach), *suberi-deru (slide out), *suberi-komu (slide into), *suberi-ochiru (slide-
fall), *suberi-oriru (slide down), *tobi-agaru (jump up), *tobi-dasu (jump out), *tobi-
deru (jump out), *tobi-koeru (jump over), *tobi-komu (jump into), *tobi-mawaru
(jump/fly around), *tobi-oriru (jump down), tobi-saru (fly away)

(III) Compound Verbs : [ V1(Manner) + V2(Manner) ]


*mai-chiru (dance-scatter), *mai-tobu (dance-fly)

References
1. Johnson, M., The Body in the Mind: The Bodily Basis of Meaning, Imagination,
and Reason. Chicago: The University of Chicago Press (1987).
2. Kageyama, T., Nichieigo dooshi no imi to bumpoo (Meaning and grammar of
Japanese and English verbs). Handout for presentation at Summer Special
Lectures, Tokyo Gengo Kenkyuujo (1997).
3. Lakoff, G., The Invariance Hypothesis: Is abstract reason based on image-
schemas? Cognitive Linguistics 1 (1990), 39-74.
4. Lakoff, G., The syntax of metaphorical semantic roles. In J. Pustejovsky (Ed.),
Semantics and the Lexicon. Dordrecht: Kluwer (1993a), pp. 27-36.
5. Lakoff, G., The contemporary theory of metaphor. In A. Ortony (Ed.), Metaphor
and Thought, Second ed. Cambridge: Cambridge University Press (1993b), pp.
202-251.
6. Lakoff, G., Johnson, M.: Metaphors We Live By. Chicago: The University of
Chicago Press (1980).
7. Shinohara, K., Invariance and override in space-time metaphor. ICU English
Studies 5 (1996), 39-56.
8. Talmy, L., Lexicalization patterns: semantic structure in lexical forms. In T.
Shopen (Ed.), Language Typology and Syntactic Description, vol. 3, Cambridge:
Cambridge University Press (1985), pp. 57-149.
9. Yamaguchi, K., Cognitive approach to temporal expressions in Japanese and
English. Proceedings of TACL summer institute of linguistics (1995), 203-214.
10. Yamanashi, M., Ninchi Bumpooron (Cognitive Grammar). Tokyo: Hitsuji
Shoboo (1995).
An Introduction to Algebraic Semiotics,
with Application to User Interface Design
Joseph Goguen
Dept. Computer Science & Engineering, Univ. of California at San Diego
Abstract: This paper introduces a new approach to user interface design and
other areas, called algebraic semiotics. The approach is based on a notion of
sign, which allows complex hierarchical structure and incorporates the insight
(emphasized by Saussure) that signs come in systems, and should be studied
at that level, rather than individually. A user interface can be considered as a
representation of the underlying functionality to which it provides access, and
thus user interface design can be considered a craft of constructing such repre-
sentations, where both the interface and the underlying functionality are con-
sidered as (structured) sign systems. In this setting, representations appear as
mappings, or morphisms, between sign systems, which should preserve as much
structure as possible. This motivates developing a calculus having systematic
ways to combine signs, sign systems, and representations. One important mode
of composition is blending, introduced by Fauconnier and Turner; we relate this
to certain concepts from the very abstract area of mathematics called category
theory. Applications for algebraic semiotics include not only user interface design,
but also cognitive linguistics, especially metaphor theory and cognitive poetics.
The main contribution of this paper is the precision it can bring to such areas.
Building on an insight from computer science, that discrete structures can be
described by algebraic theories, sign systems are defined to be algebraic theo-
ries with extra structure, and semiotic morphisms are defined to be mappings
of algebraic theories that (to some extent) preserve the extra structure. As an
aid for practical design, we show that the quality of representations is closely
related to the preservation properties of semiotic morphisms; these measures of
quality also provide the orderings needed by our category theoretic formulation
of blending.

1 Introduction
Analogy, metaphor, representation and user interface have much in common:
each involves signs, meaning, one or more people, and some context, including
culture; moreover each can be looked at dually from either a design or a use
perspective. Recent research in several disciplines is converging on a general
area that includes the four topics in the first sentence above; these disciplines
include (aspects of) sociology, cognitive linguistics, computer science, literary
criticism, user interface design, psychology, semiotics, and philosophy. Of these,
semiotics takes perhaps the most general view, although much of the research in
this area has been rather vague. A goal of the research reported here is to develop

a mathematically precise theory of semiotics, called algebraic semiotics, that


avoids the error of reification, that is, of identifying its abstractions with real
world phenomena, making only the more modest claim of developing potentially
useful models. This paper focuses on applications to user interface design, but
the mathematical formalism also applies to the other areas mentioned above,
especially metaphor theory and cognitive poetics, within cognitive linguistics.
The job of user interface designers is to build good metaphors (representa-
tions, translations, etc.). In this area, the domains to be represented are often
very clear, though prosaic, the designers are often engineers, the intended users
are often mass market consumers, and quality can often be tested, e.g., by lab-
oratory experiments and statistics. Therefore user interface design provides a
good laboratory for studying the general area that we have identified. It is inter-
esting to contrast user interface design with (say) poetry, where the objects of
interest are unique brilliant creations, and analysis is difficult (but rewarding).
Nevertheless, they have much in common, including the applicability of semiotic
morphisms and blends.
User interface designers have long wanted the same capability as electrical
and mechanical engineers to make models and reason about them, instead of
having to build prototypes and test them, because proper experiments can be
both time consuming and expensive. Clearly this requires an effective under-
standing of what user interfaces are and what makes some better than others. A
major difference from the more established engineering disciplines is that social
factors must be taken into account in setting up the models. Therefore purely
mechanistic procedures are unlikely to be achieved in the near future. My claims
are that user interfaces are representations, that their quality is determined by
what they preserve, and that this can be an effective basis for design.
User interface issues are exceedingly common, despite a persistent tendency
to ignore them, to downplay their importance, or to minimize their difficulty.
A coffee cup is an interface between the coffee and the coffee drinker; questions
like thickness, volume, and handle shape are interface design issues. A book can
be considered a user interface to its content. Buildings can be seen as providing
interfaces to users who want to navigate within them, e.g., a directory in the
lobby, buttons outside and inside the elevators, "exit" signs, doorknobs, stair-
ways, and even corridors (you make choices with your body, not your mouse).
A technical research paper can be seen as a user interface, that to succeed must
take account of its intended user community. Returning to the obvious, medical
instruments have user interfaces (for doctors, nurses, and even patients) that
can have extreme consequences if badly designed. By perhaps stretching a bit,
almost anything can be seen as a user interface; doing so will highlight certain
issues of design and representation that might otherwise remain obscure, though
of course it will not include all possible relevant issues.
User interface issues are also important in mathematics, and have been given
particular attention in relation to choice of notation and to education. As Leibniz
put it,

In signs, one sees an advantage for discovery that is greatest when they
express the exact nature of a thing briefly and, as it were, picture it;
then, indeed, the labor of thought is wonderfully diminished.
A good example is the difference in plane geometry between doing proofs with
diagrams and doing proofs with axioms (see also Appendix D). The above quota-
tion also draws attention to signs and their use, and indeed, our previous discus-
sion about coffee cups, elevator buttons, etc. can be re-expressed very nicely in
the language of semiotics, which is the study of signs. Signs are everywhere: not
just icons on computer screens and corporate logos on T-shirts or racing cars,
but more significantly, the organization of signs is the very nature of language,
natural human language both spoken and written, artificial computer languages,
and visual languages, as in architecture and art, both fine and popular, including
cinema.
We will see that the following ideas are basic to our general theory:
- Signs appear as members of sign systems1, not in isolation.
- Most signs are complex objects, constructed from other, lower level signs.
- Sign systems are better viewed as theories (that is, as declarations for
symbols plus sentences, called "axioms," that restrict their use) than as
(set-based) models.
- Representations in general, and user interfaces in particular, are "morphisms"
(mappings) between sign systems.
Charles Sanders Peirce [49], a nineteenth century logician and philosopher
working in Boston, coined the word "semiotics" and introduced many of its basic
concepts. He emphasized that meanings are not directly attached to signs, but
that instead, signs mediate meaning, through events (or processes) of semiosis,
each involving a signifier (i.e., a sign), a signified (an "object" of some kind,
e.g., an idea), and an interpretant2 that links these two; these three things
are often called the semiotic triad, and occur wherever there is meaning. Signs,
meanings, and referents only exist for a particular semiosis, which must include
its social context; therefore meaning is always embedded and embodied. In gen-
eral, the signi ed is not given, but must be inferred by some person or persons
involved. Designers work in the reverse direction, creating signs for a given sig-
ni ed. Peirce's approach may sound simple, but it is very di erent from more
common and naive approaches, such as the use of denotational semantics for
programming languages. Peirce's theory of signs is not a representational theory
of meaning, in which a sign has a denotation; instead, the interpretant makes
it a relational theory of meaning. Peirce's important notions of icon, index and
symbol are discussed below in Section 4. In addition, we use the term signal for
a physical configuration that may or may not be a sign.
1 There is a difficulty with terminology here: the phrase "semiotic system" sounds too
broad, while "sign system" may sound too narrow, since it is intended to include
(descriptions of) conceptual spaces, as well as systems of physical signs.
2 This is Peirce's original terminology; "interpretant" should not be confused with
"interpreter," as it refers to the link itself.

Ferdinand de Saussure [54] was a late nineteenth century Swiss linguist,


whose work inspired the recent French structuralist and poststructuralist move-
ments; if he were around today, he might not wish to be called a semioticist.
Nevertheless, he had an important insight that is perhaps most clearly expressed
using the language of signs: it is that signs are always parts of sign systems. He
also gave perhaps the first, and certainly one of the most influential examples of
a sign system, with his theory that phonemes, the smallest recognized units of
spoken language, are organized into systems of binary oppositions3, which may
be thought of as features. More complex linguistic signs are then constructed
from lower level signs: words (\lexemes") are sequences of phonemes; sentences
are sequences of words; and tense, gender, number etc. are indicated by various
syntactic features. (Recent research qualifies and modifies this classical view in
various ways, but it is a useful model, still widely used in linguistics.)
Composing signs from other signs is a fundamental strategy for managing the
complexity of non-trivial communication, regarding complex signs at one level
as individual signs at a higher level. This is illustrated by the linguistic levels
discussed above. A simple computer graphics example might have as its levels
pixels (individual "dots" on the screen), characters, words, simple geometrical
figures, and windows, which are collections of signs at lower levels plus other
windows; each sign at each level has attributes for location and size, and perhaps
for color and intensity. This whole/part hierarchical structure puts each sign in
a context of other signs with which it forms still higher level signs. Note the
recursivity in the definition of windows.
More recent uses of sign systems, for example in the classic literary study
S/Z by Roland Barthes [4], tend to be less rigid than the linguistics of Saussure
or the anthropology of Levi-Strauss. Instead of binary oppositions, there are
multi-valued, even continuous, scales; instead of constructing higher level signs
by sequential composition, there are more complex relations of interpenetration
and influence; and perhaps most importantly, there is a much greater sensitivity
to context. Indeed, the "structuralist" tendency of classical semiotics has been
severely criticized by the post-structuralist and deconstructionist schools for its
limited ability to deal with context. Although Lyotard, Derrida, Baudrillard, and
the rest are surely correct in such criticisms, there is a danger of throwing out
the baby of structure with the dirty bathwater of decontextualization. Although
meaning, as human experience, certainly does not confine itself to rigid systems
of features, however complexly structured, it is equally undeniable that we see
structure everywhere, and not least in language.
3 A sign system that has just one element can't convey any information, because there
are no differences. For example, imagine if Paul Revere, in describing how lamps in
the church tower would indicate British invasion plans for Boston, had said "One
if by land and one if by sea." instead of "One if by land and two if by sea." More
technically, with just one sign, the Shannon information content of a message is zero.
If there are two or more signs in a system, there must be some systematic way to
distinguish among them. Or as Gregory Bateson said, information is a difference
that makes a difference.

Structure is part of our experience, and though seemingly more abstract than
immediate sensations, emotions, evaluations, etc., there is strong evidence that
it too plays a crucial role in the formation of such experiences (e.g., consider how
movies are structured). Context, which for spoken language would include the
speaker, can be at least as important for meaning as the signs involved. For an
extreme example, "Yes" can mean almost anything given an appropriate context.
Moreover, work in artificial intelligence has found contextual cues essential for
disambiguation in speech understanding, machine vision, and elsewhere.
The vowel systems of various accents within the same language show that
the same sign system can be realized in different ways; let us call these different
models of the sign system. For computer scientists, it may be helpful to view
sign systems as abstract data types, because this already includes the idea
that the same information can be represented in different ways; for example,
dates, times, and sports scores each have multiple familiar representations. The
Greek, Roman and Cyrillic alphabets show that the sets underlying models can
overlap; this example also shows that a signal that is meaningful in one sign
system may not be in another, even though they share a medium. The same
signal in a different alphabet is a different sign, because it is in a different sign
system. The vowel system example also shows that different models of the same
sign system can use exactly the same signals in different ways; therefore it is how
elements are used that makes the models different, not the elements themselves.
Here are some further useful concepts:
- A medium expresses dimensions within which signs can vary; for example,
standard TV is a two dimensional pixel array with certain possible ranges of
intensity and color, plus a monophonic audio channel with a certain possible
range of frequency, etc.
- A genre is a collection of conventions for using a medium; these can be
seen as further delimiting a sign system. For example, the daily newspaper
is a genre within the medium of multisection collections of large size pages.
Soap operas are a genre for TV. Obviously, genres have subgenres; e.g., soap
operas about rich families.
- Multimedia are characterized by multiple simultaneous perceptual chan-
nels. So TV is multimedia, and so (in a weak sense) are cartoons, as well as
books with pictures.
- Interactive media allow inputs as well as outputs. So PCs are (potentially)
interactive multimedia. The web provides (at least one) genre within this
medium; email is another.
We can even say that a book is interactive, because users can mark and turn
pages, and can go to any page they wish; indices, glossaries, etc. are also used
in an interactive manner. Many museums have interactive multimedia exhibits,
and every museum is interactive in a more prosaic sense.
This paper proposes a precise framework for studying sign systems and their
representations, as well as for studying what makes some representations better
than others, and how to combine representations. The framework is intended
for application to aspects of communication and cognition, such as designing

and understanding interfaces, coordinating information in different media, and


choosing effective representations in, e.g., natural language, video clips, inter-
active media, etc. One goal is to get a calculational approach to user interface
design, like that found in other engineering disciplines. Although our official
name for this approach is algebraic semiotics, it might also be called struc-
tural semiotics to emphasize that meaning is structural, or (in its philosophic
guise) even morphic semiotics, to emphasize that meaning is dynamic, con-
textual, embodied and social. In a sense, this paper proposes a general theory
of meaning, although it denies the possibility of traditional context-independent
meaning. The social nature of information is discussed in [21], using ideas from
ethnomethodology [53].
Familiarity with (a little bit of) OBJ3 and algebraic specification is needed
for the examples in Appendix A, and familiarity with basic category theory is
needed for Appendix B; references for these two topics are [28] and [33, 16, 17],
respectively. Most philosophical discussion has been banished to Appendix C,
while Appendix D is an essay on the social nature of proofs, which provides more
concrete illustrations of some points in Appendix C.

1.1 Semiotic Morphisms


One of the great insights of twentieth century mathematics, with consequences
that are still unfolding, is that structure preserving morphisms are often at least
as important as the structures themselves. For example, linear algebra is more
fundamentally concerned with linear maps (often represented by matrices) be-
tween vector spaces, than with vector spaces themselves (though the latter are of
course not to be despised); without giving details, there are also computable func-
tions in recursion theory, embeddings and tangent maps in geometry, analytic
and meromorphic functions in complex analysis, continuous maps in topology,
and much more, all of them structure preserving maps.
This conceptual revolution took a more definite and systematic form with
the invention of category theory in the early 1940's by Eilenberg and Mac Lane;
see [41]. Technical developments within category theory have in turn spurred
further and deeper uses of morphisms within mathematics, and more recently in
applied fields like computer science. This process has not ceased, and applications
continue to inspire new theory, such as the 3/2-categories and 3/2-pushouts that are
discussed in Appendix B of this paper.
Semiotics has escaped this particular revolution, probably in part due to
its increasing alienation from formalization during the relevant period. But I
claim there is much to be gained from this unlikely marriage of semiotics and
category theory (with cognitive linguistics as bridesmaid), not the least of which
is a theory of representation that can be applied to topics of current interest,
like user interface design, metaphor theory, and natural language understanding.
The essential idea is that interfaces, representations, metaphors, interpretations,
etc. are morphisms from one sign system to another.
A user interface for a computer system can be seen as a semiotic morphism
from (the theory of) the underlying abstract machine (what the system does)

to a sign system for windows, buttons, menus, etc. [31]. A web browser can
be seen as a map from html (plus JavaScript, etc.) into the capabilities of a
particular computer on which it is running4. Metaphors can be seen as semiotic
morphisms from one system of concepts to another [10, 12, 58]. A given text
(spoken utterance, etc.) can be seen as the image under a morphism from some
(usually unknown) structure into the sign system of written English (or spoken
English, or whatever). Conversely, we may be given some situation, and want to
find the best way to describe it in natural language, or in some other medium or
combination of media, such as text with photos, or cartoon sequences, or video,
or online hypertext or hypermedia [27].
In these and many other cases, representations are signs in one system that
relate systematically to signs in another system. Generally it is just as fruitless
to study representations of single signs as to study single isolated signs. For rep-
resentations also occur in systems, just as signs do: usually there are systematic
regularities in how signs of one system are represented as signs of another. Let
us use the notation M : S1 → S2 for a morphism from sign system S1 to sign
system S2. Of course, in all but the most trivial cases, there is no unique mor-
phism S1 → S2. Think, for example, of the difficulties of translating from one
language to another. Moreover in general, morphisms are partial, that is, not
defined for all the signs in the source system; some signs may be untranslatable,
or at least, not translated by a given morphism.
Here are some very simple examples. Let N1 be the familiar decimal Arabic
numerals and let N2 be the Roman numerals. Then there is a natural morphism
M : N1 → N2 but it is undefined for Arabic 0, since the Romans did not have
the concept of zero. We can also consider transliterations between the English
and Greek alphabets: then certain letters just don't map. Similarly, Scandinavian
alphabets make some distinctions that the English alphabet does not; Chinese
and Sanskrit raise still other problems. Ciphers (i.e., "secret codes") are also
representations, simple in their input and output alphabets, but deliberately
complex in their algorithmic construction.
Further examples and details about the systematic organization of signs are
discussed later, but it should now be clear that an ambitious enterprise is being
proposed, taking a wide interpretation of the notion of sign, and treating sign
systems and their morphisms with great rigor. However, because this enterprise is
still at an early stage, our examples cannot be both complex and detailed. Hoping
that readers will forgive the ambition and effrontery of combining such diverse
elements, I acknowledge the deep indebtedness of this work to its precedents,
and hope to have the help of readers of this paper in developing its potential.

4 These two examples highlight the important but subtle point that theory morphisms
go in the opposite direction from the maps of models that they induce; this duality
is explained at an abstract level by the theory of institutions [24], but is well outside
the scope of this paper.

1.2 Some Related Work


An adequate survey of related work in semiotics, cognitive science, linguistics,
user interface design, literary criticism, etc. would consume many volumes. We
have already mentioned the work of Peirce [49] and Saussure [54], whose influence
is pervasive; this brief subsection only sketches a few especially closely related
items of more recent vintage. First is joint work with Linde begun more than 15
years ago [27], which contains the seeds for the main ideas of this paper. Analo-
gies and le names were studied by Gentner [14] and Carroll [8] respectively,
using set-based formalisms that can capture structure, but without axioms, lev-
els or constructors; Sacks' ethnomethodological notion of \category system" [52]
seems similar to our notion of sign system, but is very informal. We build on
work of cognitive linguists Lako , Johnson and others [40] on metaphors, and
Fauconnier and Turner's exciting proposal of blending (also called \conceptual
integration") as a fundamental cognitive operation for metaphor, grammar, etc.
[11, 12]; see also [58]. Shneiderman [55] is a good textbook on user interface de-
sign, and Norman [48] gives a good overview of broader design issues. Latour
[43] gives a fascinating case study of design emphasizing the importance of so-
cial context, and [38] contains a number of case studies in the area of computer
systems design. Andersen [2] has done some fascinating work applying semiotics
and catastrophe theory to the design of interactive systems.

1.3 On Formalization
Sapir said all systems leak; he was referring to the fact that no grammatical
system has ever successfully captured a real natural language, but it is natural
to generalize his slogan to the formalization of any complex natural sign system.
There are always "loose ends"; some deep reasons for this, having to do with the
social nature of communication, are discussed in [21]. Thus we cannot expect our
semiotic models to be perfect. However, a precise description that is somewhat
wrong is better than a description so vague that no one can tell if it's wrong.
We do not seek to formalize actual living meanings, but rather to express our
partial understandings more exactly. Precision is also needed to build computer
programs that use the theory. I do not believe that meaning in the human sense
can be captured by formal sign systems; however, human analysts can note the
extent to which the meanings that they see in some sign system are preserved by
different representations. Thus we seek to formalize particular understandings
of analysts, without claiming that such understandings are necessarily correct,
or have some ideal kind of Platonic existence.

Acknowledgements
The proofs in Appendix B were provided by Grigore Rosu, and the basic
definitions were worked out in collaboration with Grigore Rosu and Razvan
Diaconescu. Further results on 3/2-colimits should eventually appear in a separate
paper. I wish to thank the students in my Winter 1998 class CSE 271 on user
interface design, for their patience, enthusiasm, and questions. I also thank Gilles
Fauconnier, Masako Hiraga, and Mark Turner for their valuable comments on
earlier drafts of this paper, and Michael Reddy for intensifying my interest in
metaphor, as I supervised his PhD thesis at the University of Chicago.

2 Sign Systems
Sign systems usually have a classification of signs into certain sorts^5, and some
rules for combining signs of appropriate sorts to get a new sign of another sort;
we call these rules the constructors of the system. Constructors may have
parameters. For example, a "cat" sign on a computer screen may have parameters
for the size and location of its upper lefthand corner; changing these values does
not change the identity of the cat.
Constructors may have what we call priority: a primary constructor has
greatest priority; secondary constructors have less priority than the primary
constructor but more than any non-primary or non-secondary constructor;
tertiary constructors, etc. follow the same pattern. Priority is a partial ordering,
not total. Experiments of Goguen and Linde [27] (testing subjects after multimedia
instruction in various formats about a simple electronic device) support
the hypothesis that the reasoning discourse type [32] has a primary constructor
that conjoins reasons supporting a statement^6.
Semiotics should focus on the structure of sign systems rather than on ad
hoc properties of individual signs and their settings, just as modern biology
focuses on molecular structures like DNA rather than on descriptive classification
schemes. For example, formalizing the handwritten letter "a" (or the spoken
sound "ah") in isolation is both far harder and less useful than formalizing
relations between written letters and words (or phonemes and spoken words).
It is natural to think of a sign system as a set of signs, grouped into sorts
and levels, not necessarily disjoint, with "constructor" functions at each level
that build new signs from old ones. But such a set-based approach does not
capture the openness of sign systems, that there might be other signs we don't
yet know about, or haven't wanted to include, because we are always involved in
constructing only partial understandings. It is therefore preferable to view sign
systems as theories rather than as pre-given set theoretic objects. This motivates the
following:
Definition 1: A sign system S consists of:
1. a set S of sorts for signs, not necessarily disjoint;
2. a partial ordering on S, called the subsort relation and denoted ≤;
3. a set V of data sorts, for information about signs, such as colors, locations, and truth values;
4. a partial ordering of sorts by level, such that data sorts are lower than sign sorts, and such that there is a unique sort of maximal level, called the top sort;
5. a set Cn of level n constructors used to build level n signs from signs at levels n or less, and written c : s1 ... sk d1 ... dl → s, indicating that its ith argument must have sort si, its jth parameter data sort dj, and its result sort is s; constants c : → s are also allowed;
6. a priority (partial) ordering on each Cn;
7. some relations and functions on signs; and
8. a set A of sentences (in the sense of logic), called axioms, that constrain the possible signs. □

^5 We deliberately avoid the more familiar word "type" because it has had so many different uses in computer science. The so called parts of speech in syntax, such as noun and verb, are one example of sorts in the sense that we intend.
^6 The primary constructor of a given discourse type is its "default" constructor, i.e., the constructor assumed when there is no explicit marker in the text. In narrative, if one sentence follows another we assume they are connected by a sequence constructor; this is called the narrative presupposition [39].
We can illustrate some parts of this definition with a very simple time of day
sign system. It has just one sort, namely time, and just two constructors, one
the constant time 0 (for midnight), and the other a successor operation s, where
for a time t, s(t) is the next minute. There are no subsorts, data sorts, levels, or
priorities. But there is one important axiom,
   s^1440(t) = t ,
where s^1440 indicates 1440 applications of s, or more prosaically^7,
   s^1440(0) = 0 .
These axioms capture the cyclic nature of time over a day; any reasonable
representation for time of day must satisfy this condition. Let's denote this sign
system TOD.
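As a purely illustrative aside (the paper itself works with order sorted equational theories, not Python), the TOD theory and its cyclicity axiom can be sketched concretely; the names TOD and check_cyclicity below are invented for this sketch.

# Informal sketch of the TOD sign system: one sort, two constructors
# (the constant 0 and the successor s), and the axiom s^1440(t) = t.
TOD = {
    "sorts": ["time"],
    "constructors": {"zero": ([], "time"), "s": (["time"], "time")},
    "axiom": "s^1440(t) = t",
}

def check_cyclicity(succ, zero, period=1440):
    """Check the cyclicity axiom in a candidate model, starting from zero."""
    t = zero
    for _ in range(period):
        t = succ(t)
    return t == zero

# A reachable model: minutes-since-midnight as integers mod 1440.
print(check_cyclicity(lambda t: (t + 1) % 1440, 0))   # True
# A model that violates the axiom: plain integers with no wraparound.
print(check_cyclicity(lambda t: t + 1, 0))            # False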
An example illustrating some further parts of Definition 1 is a 24 line by
80 character display for a simple line-oriented text editor. The main sorts of
interest here are: char (for character), line, and window. The sort char has two
important subsorts: alphanum (alphanumeric) and spec (special); and alphanum
has subsorts alpha and num. Among the special characters should be characters
for space and punctuation, including comma, period, etc. The subsort relations
involved here have the following graph,
            char
           /    \
    alphanum     spec
      /    \
  alpha    num
where of course alpha and num are also subsorts of char. These sorts have levels
in a natural way: window is the most important and therefore^8 has level 1, line
has level 2, char has level 3, alphanum and spec have level 4, and alpha and
num have level 5 (or we could give all subsorts of char level 4, or even 3; such
choices can be a bit arbitrary until they are forced by some definite application).
There are various choices for the constructors of this sign system. Since
lines are strings of characters, one choice is an operation _ that concatenates a
character with a line to get a longer line, and another operation, also denoted
_, that concatenates a line and a window to get another window; there must
also be constant constructors for the empty line and the empty window. (The
constraints on the lengths of lines and windows are given by axioms that are
discussed below.) For each sort, the concatenation operations have priority over
the constant operations.

^7 An additional assumption that is explained later is needed to show the equivalence of these two axioms.
^8 This assumes the ordering of sorts by level takes 1 as the maximum level.
This editor also has data sorts for fixed data types that are used in an
auxiliary way in describing its signs: these include at least the natural numbers,
and possibly colors, fonts, etc., depending on the capabilities we want to give our
little editor. Functions include windowidth and windowlength, and there could
also be predicates for the subsorts, such as a numeric predicate on characters.
Then the constraints of length can be expressed by the following axioms:
   (∀ L : line) windowidth(L) ≤ 80 .
   (∀ W : window) windowlength(W) ≤ 24 .
Let us denote this sign system W.
If we want to study how texts can be displayed in this window, we should
define a sign system for texts. One simple way to do this has sorts char, word,
sent (sentence), and text, in addition to the data sorts and the subsorts of
char as in W above; the sort text is level 1, sent level 2, word level 3, and
char level 4. There are several choices for constructors, one of which defines any
concatenation of alphanumeric characters to be a word, any concatenation of
words to be a sentence, and any concatenation of sentences to be a text. Let us
denote this sign system TXT. Clearly there are many different ways to display
texts in a window, and each one is a different semiotic morphism; we will see
some of these later.
A somewhat different sign system is given by simple parsed sentences, i.e.,
sentences with their "part of speech" (or syntactic category) explicitly given.
The most familiar way to describe these is probably with a context free grammar
like that below, where S, NP, VP, N, Det, V, PP and P stand for sentence, noun
phrase, verb phrase, noun, determiner, verb, prepositional phrase, and preposition,
respectively:

S -> NP VP
NP -> N
NP -> Det N
VP -> V
VP -> V PP
PP -> P NP
.....
The "parts of speech" S, NP, VP, etc. are the sorts of this sign system, and the
rules are its constructors. For example, the first rule says that a sentence can
be constructed from a NP and a VP. There should also be some constants of the
various sorts, such as
N -> time
N -> arrow
V -> flies
Det -> an
Det -> the
P -> like
......

There is a systematic way to view context free rules as operations that "construct"
things from their parts (introduced in [15]), which in this case gives the
following:
sen : NP VP -> S
nnp : N -> NP
np : Det N -> NP
vvp : V -> VP
vp : V PP -> VP
pp : P NP -> PP
.....
time : -> N
flies : -> V
.....

It is a more elegant use of the machinery we have available to regard N as a
subsort of NP, and V as a subsort of VP, than to have monadic operations N ->
NP and V -> VP. Let's call the resulting sign system PS. It gives what computer
scientists call abstract syntax for sentences, without saying how they are to be
realized. We can of course still get "real" sentences, such as "time flies like an
arrow", but this traditional linear form fails to show the syntactic structure,
which is typically done using trees, as in
               S
             /   \
           NP     VP
           |     /  \
           N    V    PP
           |    |   /  \
         time flies P    NP
                    |   /  \
                  like an   arrow

Another approach is to view a sentence as a "term" involving the operations
above (terms are compositions of constructors); here's how our little sentence
looks in that notation:
sen(time, vp(flies, pp(like, np(an, arrow)))) .


So called bracket (or bracket-with-subscript) notation, as used in linguistics, also
shows syntactic structure; it is surely a bit harder to read, and looks like this:
[[time]N[[flies]V[[like]P[[an]Det[arrow]N ]NP ]PP]VP ]S .
(Another example of bracket notation appears in Section 4). In this setting, we
can also use equations to express constraints on sentences, for example, that the
number of the subject and of the verb agree (i.e., both are singular or both are
plural).
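For readers who like a concrete rendering, the following sketch shows the abstract syntax term for our little sentence and two of its concrete realizations (linear text and labelled brackets). The nested-tuple encoding and the functions linear and brackets are inventions of this sketch, not part of the paper's formalism; note that the bracket labels here are constructor names rather than the usual syntactic categories.

term = ("sen",
        ("N", "time"),
        ("vp", ("V", "flies"),
               ("pp", ("P", "like"),
                      ("np", ("Det", "an"), ("N", "arrow")))))

def linear(t):
    """Render a term as a plain sentence (the structure is lost)."""
    if len(t) == 2 and isinstance(t[1], str):      # a lexical constant
        return t[1]
    return " ".join(linear(child) for child in t[1:])

def brackets(t):
    """Render a term in labelled-bracket notation (the structure is kept)."""
    if len(t) == 2 and isinstance(t[1], str):
        return f"[{t[1]}]{t[0]}"
    return "[" + "".join(brackets(child) for child in t[1:]) + "]" + t[0]

print(linear(term))     # time flies like an arrow
print(brackets(term))   # bracketed form, e.g. [[time]N[[flies]V...]vp]sen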
Each of these concrete ways to realize abstract syntax (trees, terms, bracket
notation, and lists) can be considered to give a model of the sign system,
providing a set of signs for each sort, and operations on those sets which build new
signs from old ones. You might have expected this to be the definition of sign
system, instead of what we gave, which is a language for talking about such
models. Our sign systems are theories rather than models. The distinction is that a
model provides concrete interpretations for the things in the theory: sorts are
interpreted as sets; constant symbols are interpreted as elements; constructors
are interpreted as functions, etc. This allows much flexibility for what can be a
model.
We often wish to exclude models where two different terms denote the same
thing; otherwise, for example, two different times of day might be represented
the same way^9. This is called the no confusion condition; more precisely, it
says that if two terms cannot be proved equal using axioms in the theory, then
they must denote different elements in the model. Also it is often desirable to
restrict to models where all signs are denoted by terms in their theory; these
are called the reachable models^10. An important point brought out in the next
section is that semiotic morphisms do the same conceptual work as models, but
in a way that is more convenient for many purposes.
It has been shown that any computable model can be defined using only
equations as axioms^11. Therefore we lose no generality by using equational logic
for examples, as has been advocated in the situated abstract data type approach
described in [20]. More precisely, our examples (in the appendices) use order
sorted equational logic over a fixed data algebra [18, 29], although the reader
does not need to be familiar with the technicalities of this logic.
What Fauconnier and Turner [10, 12] call conceptual spaces are also sign
systems, of a rather simple kind, where there are (usually) no constructors except
constants, and where in addition there are some relations defined among these
constants. Typical conceptual spaces are little theories of some everyday concept,
including only as much detail as is needed to analyze some particular text. For
example, a theory of houses might have constants house, owner and resident,
with relations own and live-in making the obvious assertions. Similarly, a boat
theory might have constants boat, owner and passenger, with relations own and
ride. These two spaces are illustrated in Figure 1. No sorts are shown, but for
this simple example, one is enough, say Thing. That a relation such as own holds
of two things is given by a line in the figure, and in the corresponding logical
theory is given by an axiom, e.g., own(owner,house). It is usually assumed that
relation instances that are not shown (such as ride(boat,owner)) do not hold,
i.e., are false (one way to formalize this, which is related to the so called frame
problem in artificial intelligence, is given in Chapter 8 of [23]). Let us call this
the default negation assumption. But sometimes whether or not a relation
holds may be unknown. Humans generally do a good job of figuring all this
out, using what is called "common sense". However, the deductions involved can
actually be extremely complex; some hints of this complexity may be found in
the discussion of the blending examples in Section 5.

Fig. 1. Two Simple Conceptual Spaces. (The figure shows a house space with constants owner, resident, and house, and relation instances own(owner, house) and live-in(resident, house), alongside a boat space with constants owner, passenger, and boat, and relation instances own(owner, boat) and ride(passenger, boat).)

^9 But see the 12 hour analog clock representation denoted A below, where this seems to happen.
^10 This is also called the no junk condition. It is now an interesting exercise to prove that the two different equations previously given for expressing the cyclic nature of days are equivalent for reachable models.
^11 Although an additional technical condition called initiality is needed; see [47] for a survey of this and related results.
Formalism and representation feature in much recent work in sociology of
science, with many fascinating examples. For example, Latour [42] shows how
representation by cartographic maps was essential for European colonization,
and Bowers [6] discusses the politics of formalism, including CSCW systems. Latour
leaves representation undefined, while Bowers has a slightly formal notion
of formalism. I believe that such discussions could be given greater precision by
using the framework proposed in this paper.

3 Semiotic Morphisms
The purpose of semiotic morphisms^12 is to provide a way to describe the
movement (mapping, translation, interpretation, representation) of signs in one
system to signs in another system. This is intended to include metaphors as well
as representations in the more familiar user interface design sense. Generating a
good icon, file name, explanation or metaphor, or arranging text and graphics
together in an appropriate way, each involves moving signs from one system to
another. Just as we defined sign systems to be theories rather than models, so
their morphisms are between theories, translating from the language of one sign
system to the language of another, instead of just translating the concrete signs
in the models. This may sound a bit indirect, but it has important advantages
over a model based approach; moreover, theories and their morphisms determine
models and their mappings.

^12 Although the root "morph" of the noun "morphism" means "form," this word has recently also become a verb meaning "to change form."
A good semiotic morphism should preserve as much of the structure in its
source sign system as possible. Certainly it should map sorts to sorts, subsorts
to subsorts, data sorts to data sorts, constants to constants, constructors to
constructors, etc. But it turns out that in many real world examples, not everything
is preserved. So these must all be partial maps. Axioms should also be preserved
-- but again in practice, sometimes not all axioms are preserved.
Definition 2: Given sign systems S1, S2, a semiotic morphism M : S1 → S2, from S1 to S2, consists of the following partial functions (all denoted M):
1. sorts of S1 → sorts of S2,
2. constructors of S1 → constructors of S2, and
3. predicates and functions of S1 → predicates and functions of S2,
such that
1. if s ≤ s′ then M(s) ≤ M(s′),
2. if c : s1 ... sk → s is a constructor (or function) of S1, then (if defined) M(c) : M(s1) ... M(sk) → M(s) is a constructor (or function) of S2,
3. if p : s1 ... sk is a predicate of S1, then (if defined) M(p) : M(s1) ... M(sk) is a predicate of S2, and
4. M is the identity on all sorts and operations for data in S1.
More generally, a semiotic morphism can map source system constructors and predicates to compound terms defined in the target system^13. □
A semiotic morphism S1 → S2 gives representations in S2 for signs in S1. If
we know how a semiotic morphism maps constructors, then we can compute
how it maps complex signs. For example, if M(a) = a′, M(b) = b′, M(c)(x, y) =
c′(x, y + 1) + 1, and M(f)(x, y) = x + y + 1, then
   M(c(a, f(3, b))) = c′(a′, b′ + 5) + 1 .
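How a map given only on constructors extends to complex signs can be sketched by a small structural recursion; the dictionary encoding and the function extend below are assumptions of this sketch, and the outer "+1" of M(c) from the text is dropped so that the target constructor can remain symbolic.

def extend(M):
    """Lift a map on constants and constructors to a map on all terms."""
    def apply(term):
        if isinstance(term, tuple):                 # a compound sign
            op, *args = term
            return M[op](*[apply(a) for a in args])
        return M.get(term, term)                    # constants map; data values pass through
    return apply

# The example from the text (with the outer +1 of M(c) omitted):
M = {
    "a": "a'",
    "b": 20,                                        # a numeric stand-in for b'
    "c": lambda x, y: ("c'", x, y + 1),
    "f": lambda x, y: x + y + 1,
}
m = extend(M)
print(m(("c", "a", ("f", 3, "b"))))                 # ("c'", "a'", 25), i.e. c'(a', b'+5)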
We now consider some examples. First, suppose we want to represent time
of day, TOD, in the little window, W. Clearly there are many ways to do this;
each of them must map the sort time to the sort window, map the constructor
0 to some string of (less than 25) strings of (less than 81) characters, and map
the constructor s to a function sending each such string of strings to some other
string of strings. There isn't anything else to preserve in this very simple example
except the axiom, which however is very important here.
Recall that the items of abstract syntax in TOD are strings of up to 1439 s's
followed by a single 0. One simple representation just maps these strings directly
to strings of strings of s's plus a final 0, such that the total number of s's is the
same; this is a kind of unary notation. Let N(t) be the number of s's in some
t from TOD. Let Q(t) and R(t) be the quotient and remainder after dividing
N(t) by 80. Then there will be Q(t) lines of 80 s's followed by one line of R(t)
s's and a final 0. This is guaranteed to fit in our window because Q(1439) = 17
is less than 24, and R(t) + 1 ≤ 80. For humans, this representation is so detailed
that it is more or less analog: I think after getting familiar with it, a user would
have a "feel" for the approximate number of (these strange 80 minute) hours in
a window and of minutes in the last line, just from its appearance. Let us call
this representation U. Figure 2 shows the time that we would call "1:15 pm" in
it.

^13 This is illustrated by M(c) in the example just below this definition.

ssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssss
ssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssss
ssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssss
ssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssss
ssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssss
ssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssss
ssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssss
ssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssss
ssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssss
sssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssss0

Fig. 2. A Strange \Unary" Clock
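The arithmetic behind U can be made concrete in a few lines; the function unary_clock is hypothetical and serves only to illustrate the Q(t) and R(t) computation.

def unary_clock(minutes):
    """Render minutes-after-midnight in the 'strange unary' representation U."""
    q, r = divmod(minutes, 80)
    lines = ["s" * 80] * q + ["s" * r + "0"]
    return "\n".join(lines)

print(unary_clock(13 * 60 + 15))                    # the time called "1:15 pm"
assert unary_clock(1439).count("\n") + 1 <= 24      # always fits the 24-line window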

Another obvious but naive representation just displays N(t) in decimal notation,
giving a string of 1 to 4 decimal digits. This is very different from our usual
representations; but we could imagine a culture that divides its days into 14
"hours" each having 100 minutes, except the last hour, which only has 40 (this
is less strange than what we do with our months, with their varying numbers
of days!). Here N(0) is 0, and s just adds 1, except that s(1439) = 0. Figure
3 shows quarter after one in the afternoon in this representation; the last two
digits give the number of minutes, and those to the left of that give the number
of "hours". Let us call this representation N.

795
Fig. 3. A Naive Digital Clock

A more familiar representation is constructed as follows: Let N1 and N2 be
the quotient and remainder of N divided by 60, both in base 10, with 0's added
in front if necessary so that each has exactly 2 digits. Now form the string of
characters "N1:N2". This is the so-called "military" representation of time;
let's denote it by M. Then M(0) = 00:00, and of course you know how s goes.
Figure 4 shows our usual afternoon time in a slight variant of this representation.
Notice that this representation has been defined as a composition of N with a
re-representation of TOD to itself.

13 15
Fig. 4. A Military Time Clock

The spoken variant of military time has the form "N1 hundred N2 hours" (unless
N2 = 00, in which case N2 is omitted). The use of "hundred" and "hours" may
seem odd here, because it isn't hundreds and it isn't hours! -- but at least it's
clear -- and that's the point. Part of this clarity stems from the phonology: the
aspirated "h" sound at the beginning of "hundred" and "hour" does not occur
in any of the numerals, and hence makes a good separator, especially over radio,
where it gets exaggerated.
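The morphism M and its spoken variant can be sketched as plain functions on minutes after midnight; the function names are invented for the illustration, and the spoken form is only an approximation of actual usage.

def military(minutes):
    h, m = divmod(minutes, 60)
    return f"{h:02d}:{m:02d}"

def spoken_military(minutes):
    h, m = divmod(minutes, 60)
    return f"{h:02d} hundred hours" if m == 0 else f"{h:02d} hundred {m:02d} hours"

print(military(0))              # 00:00
print(military(13 * 60 + 15))   # 13:15
print(spoken_military(795))     # 13 hundred 15 hours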
Readers should now be able to construct other representations of time as
semiotic morphisms, including even the "analog" representation of a clock face.
Here 0 has both hands up, and satisfaction of the axiom follows because something
even stronger is true, namely s(719) = 0, which is built into the circular nature
of this geometrical representation. This is an example where no confusion seems
to fail -- but does it really?^14 Let's call this representation A.
Now let's consider displaying parsed sentences, from the sign system PS, in
the window W; this means constructing a semiotic morphism E from PS to W.
One issue is placing spaces so that words are always separated. The designer
will also have to do something about words that want to go past the end of a
line; choices include wrapping around, hyphenating, and filling in with spaces;
the limit of 24 lines will also pose problems. This can clearly get complex. The
designer may also want to consider more sophisticated issues, such as automatic
insertion of commas to show some of the syntactic structure.

Fig. 5. A Conceptual Space Morphism. (A space with constants owner, user, and object and relations own(owner, object) and use(user, object) is mapped to a space with constants landlord, renter, and house and relations own(landlord, house) and rent(renter, house).)

Morphisms of conceptual spaces are relatively simple; we map sorts to sorts,
constants to constants and relations to relations, so as to preserve the sorts
of constants and relations. Because assertions that relations hold are given by
axioms, preservation of axioms implies preservation of the corresponding relation
instances, positive or negative. If no axiom is given for some relation instance
in the source theory, then either value is possible, and since there is nothing to
preserve, the same holds for the target theory. The pairs of constants, one from
the source and one from the target, that are mapped by a semiotic morphism
are called connectors by Fauconnier and Turner [10, 12]. For example, there is
a connector from object to house (though we have not drawn it, as Fauconnier
and Turner would).

^14 Think about why a clock modulo 720 might work, but (say) modulo 120 or modulo 300 would not work; what about modulo 360? Why? Hint: consider the context in which the representation is typically used. Answers to these exercises appear in Section 4.
A natural task in the experiments of [27] is to describe the condition of
four lights on the face of an experimental device. This involves constructing a
sequence of clauses arranged in narrative order, and can be seen as a semiotic
morphism from the sign system of lights to that of English; a typical sentence
would (in part) have the form "the first light is on, the second is off, ...".
For another example, a source sign system might contain instructions for
repairing some piece of equipment, with target sign system a non-color graphics
screen plus a speech chip. We can then ask how to generate instructional material
using these particular capabilities in the most effective way.
Star [56] introduced the notion of boundary object, which (for our purposes)
can be seen as a sign system that is interpreted in different ways by
different social groups. These interpretations can be seen as semiotic morphisms
from the sign system of the boundary object into the more specific sign systems
of each group. For example, bird sightings are taken in different ways by amateur
and professional ornithologists; the former make lists, emphasizing rare birds in
friendly competition with other amateurs, whereas the latter integrate sightings
with much larger datasets to construct migration patterns, population densities,
long term trends, etc. A sighting may first appear as a field notebook entry, and
then move into the two very different contexts. (A similar example is given in
[56].)

3.1 First Steps Towards a Calculus of Representation


Semiotic morphisms can be composed; this is important because it is very
common to compose representations, for example, in processes of iterative design,
where details are added in progressive stages. The composition of semiotic
morphisms satisfies some simple equations, which show that sign systems together
with semiotic morphisms form what is called a category in mathematics;
surprisingly, this is enough to define some important concepts and derive their main
properties (much more information on the category of sign systems is given in
Appendix B).
Given semiotic morphisms M : S1 → S2 and M′ : S2 → S3, their
composition, which is denoted^15 M ; M′ : S1 → S3, is formed just by composing
the component functions of each morphism, not forgetting that these are partial
functions, so that the result is also partial. For example, the sort set function
for M ; M′ is the composite of the sort set functions of M and M′. Then it is
easy to see that M ; M′ is also a semiotic morphism, and that if in addition we
are given M″ : S3 → S4, then
   (M ; M′) ; M″ = M ; (M′ ; M″) ,
i.e., composition of semiotic morphisms is associative. Moreover, for every sign
system S, there is an identity semiotic morphism 1_S : S → S, which is defined
to have as its component functions the identities on each set from S. This trivial
morphism is rarely useful in practical design, but it plays an important role in
the theory, just as do identity elements in algebra. For any M : S1 → S2, the
following equations are satisfied:
   1_{S1} ; M = M ,
   M ; 1_{S2} = M .

^15 We use the symbol ";" for composition, to indicate the opposite order from that associated with the usual symbol "∘" -- that is, "f ; g" means first do f, then do g, as with commands in a programming language.
Composition of semiotic morphisms can be used to factor design problems,
i.e., to separate concerns into different stages. For example, designing a text
editor using the window W involves constructing a semiotic morphism E from
TXT to W. In addition to the issues already mentioned for displaying sentences,
we need to mark the boundaries of sentences, e.g., with a period followed by two
spaces at the end. Issues that concern separating units, e.g., adding spaces and
periods, can be separated from issues that concern the size of the window, e.g.,
hyphenation, by "factoring" the morphism E into two morphisms, E1 : TXT →
STG and E2 : STG → W, where STG is a sign system of strings of characters,
and E is the composition E1 ; E2.
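The factoring idea can be illustrated by composing two partial maps on sort names; the dictionaries below are a drastically simplified stand-in for the full morphisms E1 and E2, and the function compose is an assumption of this sketch.

def compose(f, g):
    """Return the partial composition f;g (first f, then g) of two partial maps."""
    return {x: g[f[x]] for x in f if f[x] in g}

E1 = {"text": "string", "sent": "string", "word": "string", "char": "char"}
E2 = {"string": "window", "char": "char"}

print(compose(E1, E2))
# {'text': 'window', 'sent': 'window', 'word': 'window', 'char': 'char'}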
Composition and identity of semiotic morphisms allow us to define the notion
of isomorphism for sign systems, as a morphism M : S1 → S2 such that
there exists an inverse morphism M′ : S2 → S1 such that M ; M′ = 1_{S1} and
M′ ; M = 1_{S2}. Isomorphic sign systems define exactly the same structure, and
have the same models. Remarkably, it now follows from just the associative and
identity laws that the inverse is unique if it exists, and that the relation of
isomorphy (i.e., of being isomorphic) on sign systems is reflexive, symmetric
and transitive. If we write ≅ for the isomorphy relation, then these facts may be
written
   S ≅ S ,
   S1 ≅ S2 implies S2 ≅ S1 ,
   S1 ≅ S2 and S2 ≅ S3 implies S1 ≅ S3 .
Moreover, if we denote the inverse of M by M^{-1}, then the following equations
are also easily proved:
   (1_S)^{-1} = 1_S ,
   (M^{-1})^{-1} = M , and
   (M ; M′)^{-1} = M′^{-1} ; M^{-1} .
The last equation particularly has some not entirely trivial content, and can be
useful in thinking about the composition of isomorphism representations.
4 The Quality of Semiotic Morphisms


The goal of user interface design is to produce high quality representations;
unfortunately, it has not been very clear how to determine quality. Also, as in
other areas of engineering, design is subject to constraints, and typically involves
tradeoffs, i.e., compromises between competing measures of success, such as cost,
size, complexity and response time. Limits on human capability for dealing with
complex displays imply that some information may have to be compressed,
deleted, or moved elsewhere. This in turn implies that we need to understand
priorities on what should be preserved.
In determining what makes one representation better than another, the entire
structure of the sign systems involved should be considered. The structure that
is preserved by semiotic morphisms provides an important way to compare their
quality. First, notice that because a semiotic morphism M : S1 → S2 need not
be total, some signs in S1 may have no representation in S2 ; moreover, some
of the complex internal structure of signs in S1 could be lost. This might seem
undesirable, but if representations in S2 get too complex, they will not be useful
in practice. For example, if S1 is English sentences and S2 is bracket notation,
then the representation (from the data of [27])
[[[[the]Det [light]N ]NP [[on]Prep [[the]Det [left ]N ]NP ]PP ]NP [[comes]V [on]Part ]VP ]Sent
is not as useful for human communication as the linear representation would be.
In fact, we very often want what Latour [42] calls a re-representation, which con-
centrates or abstracts information. For example, the representation "[NP PP]VP"
is more useful than that above for some purposes, precisely because it omits some
information. Statistics, such as the mean and median of a population, are also
re-representations in this sense, as are cartographic maps.
Peirce [49] introduced a well-known three-fold classification of signs into icon,
index, and symbol. These terms have precise technical meanings that differ from
their everyday use. Peirce defined an icon as a "sign which refers to the Object
that it denotes merely by virtue of characters of its own ... such as a lead-pencil
streak representing a geometrical line." In contrast, a sign x is an index for an
object y if x and y are regularly connected, in the sense "that always or usually
when there is an x, there is also a y in some more or less exactly specifiable
spatio-temporal relation to the x in question" [1]. "Such, for instance, is a piece
of mould with a bullet-hole in it as sign of a shot" [49]. In this example, the
spatio-temporal relation is a causal one, which applies with great generality.
However many indices only work in very particular spatio-temporal contexts,
e.g., the use of first names for persons. Finally, Peirce defines a symbol as a
"sign which is constituted a sign merely or mainly by the fact that it is used
and understood as such." In addition, we use the term signal for a physical
configuration that may or may not be a sign.
Thus, an iconic representation preserves some important properties of signified
signs; for a semiotic morphism, these might appear as axioms and/or data
valued functions (for which the word "attribute" is commonly used in the object
oriented community). An indexical representation participates in some larger
situation (i.e., theory) within which we can deduce the connection between the
signified and signifying signs. For a symbol, there is no such more basic
relationship between source and target signs.
For purposes of design, other things being equal, there is a natural ordering to
these three kinds of sign: icons are better than indices, and indices are better than
symbols. However, things are not always equal. For example, base 1 notation for
natural numbers is iconic, e.g., 4 is represented as ||||, 3 as |||, and we get
their sum just by copying and appending,
   |||| + ||| = ||||||| ,
which is iconic. But base one notation is very inefficient for representing large
numbers. With Arabic numerals, the use of 1 for "one" is iconic (one stroke), but
the others are symbolic^16. Using the blank character for "zero" would be iconic,
but of course this would undermine the positional aspect of decimal notation
and introduce ambiguities. Chinese notation for several of the small numerals is
iconic.
Peirce's three classes of sign overlap, so some signs will be hard to classify.
Also, complex situations may involve all three kinds of sign, interacting in
complex ways; indeed, different aspects of the same sign can be iconic, indexical, and
symbolic. It is often necessary to consider the context of a sign, e.g., how it is
used in practice, and of course its relation to other signs in the same system. See
[19, 35] for further examples and discussion, the former mainly from computer
science, and the latter mainly from language.
The following definition gives some precise ways to compare the quality of
representations:
Definition 3: Given a semiotic morphism M : S1 → S2, then:
(1) M is level preserving iff the partial ordering on levels is preserved by M, in the sense that if sort s is lower level than sort s′ in S1, then M(s) has lower (or equal) level than M(s′) in S2.
(2) M is priority preserving iff c < c′ in S1 implies M(c) < M(c′) in S2.
(3) M is axiom preserving iff for each axiom a of S1, its translation M(a) to S2 is a logical consequence of the axioms in S2.
(4) Given also M′ : S1 → S2, then M′ is (at least) as defined as M, written M ≤ M′, iff for each constructor c of S1, M′(c) is defined whenever M(c) is.
(5) Given also M′ : S1 → S2, then M′ preserves all axioms that M does, written M ≤ M′, iff whenever M preserves an axiom, then so does M′.
(6) Given also M′ : S1 → S2, then M′ is (at least) as inclusive as M iff M(x) = x implies M′(x) = x for each sign x of S1.
(7) Given also M′ : S1 → S2, then M′ preserves (at least) as much content as M, written M ≤ M′, iff M′ is as defined as M and M′ preserves every selector that M does, where a morphism M : S1 → S2 preserves a selector f1 of S1 iff there is a selector f2 for S2 such that for every sign x of S1 where M is defined, then f2(M(x)) = f1(x), where
(8) a selector for a sign system S is a function f : s → d, where s is a sign sort and d a data sort of S, such that there are axioms A′ such that adding f and A′ to S is consistent and defines a unique value f(x) for each sign x of sort s. For example, each parameter of a constructor has a corresponding selector to extract its value. □

^16 Though there is a trick for regarding several of the small Arabic numerals as symbolic.
The intuition for (7) is that content is preserved if there is some way to retrieve
each data value of the source sign from its image in the target sign system; the
definition of selector in (8) is unfortunately rather technical.
It may be that neither M nor M′ preserves strictly more than the other; for
example, M might preserve more constructors while M′ preserves more content.
Also, each of these orderings is itself partial, not total. Still other orderings
on morphisms than those defined above may be useful for some applications;
for example, special measures may be important at certain levels of some sign
systems, such as phonological complexity (which is the effort of pronunciation)
for spoken language. In general, specific "designer orderings", which combine
various preservation properties in a specific way, may be needed to reflect the
design tradeoffs of specific applications (e.g., see the end of Appendix B). As a
result of this, given sign systems S1, S2, we can assume a partial ordering on the
collection of semiotic morphisms from S1 to S2, as is needed for the 3/2-categories
of Appendix B.
We can see some of the complexities involved in comparing the quality of
representations by considering simple examples where there is not very much
structure to preserve. For example, in the time of day representations, simplicity,
uniformity, and precision of the display are important: the naive decimal
representation N lacks uniformity in the size of its "hours"; the strange unary
representation U lacks precision (at least for humans who refuse to count very
carefully) as well as simplicity and (to an extent) uniformity. The representations
that are most straightforward mathematically may not be very close to
the ones we use every day; for example, military time and analog clock time
require mathematically more complex operations for their definition than do the
decimal and strange unary representations.
Now let's consider the no confusion condition in regard to the cyclicity of
clocks. The military clock M satisfies the condition. But for the standard 12 hour
analog clock A, we could say it is only "half satisfied," because one extra bit, as
found on most watches, to indicate "am" or "pm," is necessary and sufficient to
avoid confusion. Of course this extra bit is often available just by looking out the
window to see if it's day or night; but if you lived underground for a few weeks
with just a 12 hour clock and no other information (such as radio), you might
well lose track of that one bit. A 6 hour analog clock would satisfy the no
confusion condition only "one quarter," because two extra bits are needed. A 5
or 7 or 17 hour analog clock would be much worse, because these numbers are
relatively prime to 24. An alternative way to talk about this is to say that a
selector for the number of elapsed minutes from midnight can be defined that
is not preserved. So although the general rule is that the more preservation the
better, sometimes we can recover lost information some other way, and then less
preservation may be better, because it allows for a more compact representation.
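The "one lost bit" of the 12 hour analog clock can be seen in a two-line computation; the functions clock_face and recover are illustrative assumptions of this sketch, not part of the formal development.

def clock_face(minutes):
    """Map minutes-after-midnight to a clock-face position (minutes past 12)."""
    return minutes % 720

assert clock_face(75) == clock_face(75 + 720)       # 1:15 am and 1:15 pm collide

def recover(face, pm_bit):
    """One extra am/pm bit restores the lost selector."""
    return face + (720 if pm_bit else 0)

assert recover(clock_face(795), pm_bit=True) == 795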
For another example, let's consider representing (abstract) texts as strings,
i.e., let's consider semiotic morphisms M : TXT → STG. The sign system
TXT has sorts for sentences, words, and characters, while the sign system STG
only has sorts for strings and characters. Because characters are a data sort, any
morphism M : TXT → STG must preserve the sort char, and there is also no
choice about how to map the other sorts of TXT: they must all go to the sort
string. The top level constructor of TXT forms texts by concatenating
sentences, while its second level constructor concatenates words to form sentences,
and its third level constructor concatenates characters to form words. Since the
only constructor for STG concatenates characters to form strings, the obvious
thing to do is map each concatenation of TXT to the concatenation of STG.
However, the sign resulting from a text would now be just one huge ugly string
which "mushes" everything together. As we know, it is usual to insert spaces
between words, and a period and two spaces after each sentence. It is easy to
define a morphism that does this, though it is more complex than the "mushing"
representation.
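The two morphisms just described can be sketched with texts simplified to lists of word lists; the sample text, the function names, and this encoding are all inventions of the sketch.

text = [["time", "flies", "like", "an", "arrow"], ["so", "do", "fruit", "flies"]]

def mush(text):
    """The 'mushing' morphism: concatenate everything with no separators."""
    return "".join(w for sent in text for w in sent)

def readable(text):
    """Insert spaces between words, and a period plus two spaces after sentences."""
    return "".join(" ".join(sent) + ".  " for sent in text).rstrip()

print(mush(text))       # timeflieslikeanarrowsodofruitflies
print(readable(text))   # time flies like an arrow.  so do fruit flies.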
Both these morphisms preserve the structure of TXT. But what would it
mean for a morphism M : TXT → STG not to preserve this structure? There
are many possibilities, including dropping some characters, words, and/or
sentences, and permuting them in a random order. Phenomena like these will clearly
produce a low quality display.
Experiments reported in [27] show that preserving high levels is more
important than preserving priorities, which in turn is more important than
preserving content. They also show a strong tendency to preserve higher levels at
the expense of lower levels when some structure must be dropped. This may be
surprising, because of emphasis by cognitive psychologists on the "basic level" of
lexical concepts (e.g., Rosch [50, 51]). For natural language, the sentential level
was long considered to be basic, but research like that of [27] shows that the
discourse level is higher in our technical sense, and thus more important. This
is consistent with the important general principle that structure has priority
over content, i.e., form is more important than content (if something must be
sacrificed to limit the complexity of the display).
Much more detailed empirical work is needed to determine more precisely the
tradeoffs among various preservation and other optimality criteria for semiotic
morphisms. A start is being made by assembling a collection of examples of bad
design arising from failures of semiotic morphisms to fully preserve structure in
the "world-famous"^17 UC San Diego Semiotic Zoo. Although not all the
explanations are available yet, the animals can be visited at any hour of the day or
night, at
   http://www.cs.ucsd.edu/users/goguen/zoo/
where much additional information (and some bad jokes about zoos) can also be
found. Most of the exhibits there involve color and/or interactive graphics, and
so cannot easily be discussed in this traditional medium of print.

^17 For some reason, the real San Diego Zoo, which really is world famous, almost always precedes its name with "world-famous," with the hyphen.
The tatami project at UCSD is applying semiotic morphisms and their orderings
to design the user interface of a system that supports cooperative distributed
proofs over the world wide web [31, 25]. We found that certain ways we had used
to represent proofs were not semiotic morphisms, which then led us to construct
better representations; we also used semiotic morphisms to determine aspects of
window layout, button location, etc. Details can be found especially in [22, 25],
and of course on the project website
   http://www.cs.ucsd.edu/groups/tatami/
which should always have the very latest information.

5 Blending, Ambiguity and Pushouts

Fauconnier and Turner [10, 12] study the "blending" of conceptual spaces, to
obtain new spaces that combine the parts of the input spaces. Blends are common
in natural language, for example, in words like "houseboat" and "roadkill," and
in phrases like "artificial life" and "computer virus," as well as in metaphors
that have more than one strand (as is usually the case).
The most basic kind of blend may be visualized using the diagram below,
where I1 and I2 are called the inputs, G the generic, and B the blend^18.
More precisely, we define a blend of sign systems I1 and I2 over G (using
given semiotic morphisms G → I1 and G → I2) to be a sign system B with
morphisms I1 → B, I2 → B, and G → B, which are all called injections, such
that the diagram weakly commutes, in the sense that both the compositions
G → I1 → B and G → I2 → B are weakly equal to the morphism G → B,
in the sense that each sign in G gets mapped to the same sign in B under
them, provided that both morphisms are defined on it^19. It follows that the
compositions G → I1 → B and G → I2 → B are also weakly equal when G → B
is totally defined, but not necessarily otherwise. The special case where all sign
systems are conceptual spaces is called a conceptual blend. In general, we
should expect the morphisms to the blend to preserve as much as possible from
the inputs and generic.
^18 The form of this diagram is "upside down" from that used by Fauconnier and Turner, in that our arrows go up, with the generic G on the bottom, and the blend B on the top; this is consistent with the metaphor (or "image scheme" [40]) that "up is more" as well as with conventions for drawing such diagrams in mathematics. Also, Fauconnier and Turner do not include the map G → B.
^19 Strict commutativity, which is usually called just commutativity, means that the compositions are strictly equal.
(Diagram: a diamond of morphisms with the blend B at the top, the inputs I1 and I2 on the left and right, and the generic G at the bottom; arrows run upward from G to I1 and to I2, from I1 and I2 to B, and directly from G to B.)
Mathematically, it is more perspicuous to think of blending the two morphisms
ai : G → Ii than the two spaces I1, I2, and for this reason we will sometimes use
the notation a1 ⋄ a2 to stand for an arbitrary blend of a1 and a2; this will be
especially helpful in writing formulae for our calculus of blending.
Blends have applications in computer interface design, some of which are
described in [31]. For a simple example, suppose we want to display both
temperature and time of day on the same device. This is an example of the product
of sign systems: if TMP is a sign system for temperature, then the sign system
for our device is TOD × TMP. Before giving the technical definition, let 1 denote
the "trivial" sign system that has only one sort (its top sort) and no operations
(except those for data). Now given sign systems S1 and S2, their product,
denoted S1 × S2, is the blend of S1 and S2 over 1 with the obvious (and only)
morphisms 1 → Si, formed by taking the disjoint union^20 of S1 and S2, and
then identifying their top sorts to get a new sort called the product sort. Both
injections are injective and both triangles strictly commute.
It is not hard to prove some simple properties of product, including the
following, where S, S1, S2, S3 are arbitrary sign systems,
   S × 1 ≅ S ,
   1 × S ≅ S ,
   S1 × S2 ≅ S2 × S1 ,
   S1 × (S2 × S3) ≅ (S1 × S2) × S3 .
These are only a modest addition to our calculus of representation, but the
notion of product becomes more interesting later on, when extended from sign
systems to representations. Forms of the commutative and identity laws also
hold for blends, and may be written as
   a1 ⋄ a2 = a2 ⋄ a1 ,
   a ⋄ 1G = a ,
   1G ⋄ a = a ,
where the first should be read as saying that any blend of a1, a2 is also a blend
of a2, a1, and the next two as saying that one blend of any space with its generic
space is the space itself.

^20 This involves renaming sorts and operations, if necessary, so that there are no overlaps except for the data sorts and operations. Thus this blend is a sort of "amalgamated sum" of its two inputs (this phrase is used in algebraic topology, among other places). Due to the duality between theories and models (as formalized in the theory of institutions [24]), this corresponds to taking products of models.
Before doing a slightly more complex example in some detail, we generalize
the concept of blend to a labeled graph, with sign systems on its nodes and
morphisms on its edges, such that if e is an edge from n0 to n1 , then the mor-
phism on e has as its source the sign system on n0 and as its target the one
on n1 . We will call this labeled graph the base graph. Some morphisms in the
base graph may be designated as auxiliary^21, indicating that the relationships
that they embody do not need to be preserved. Then a blend for a given base
graph is some sign system, together with a morphism called an injection to it
from each sign system in the graph, such that any triangle of morphisms involv-
ing two injections and one non-auxiliary morphism in the base graph weakly
commutes. The exclusion of auxiliary morphisms is important, because commu-
tativity should not be expected for auxiliary information; this is illustrated in
the example below. The base graph for the basic kind of blend considered at the
beginning of this section has a \V" shape; let us use the term V-blends for this
case. Also, let us call a node in the base graph auxiliary if all morphisms to
and from it in the base graph are auxiliary^22.
Appendix B develops the above ideas more precisely, and puts blending in the
rich mathematical framework of category theory, relating V-blends to what are
called "pushouts", and the more general blend of a base graph to what are called
colimits. In addition, Appendix B develops a special kind of category, called a
3/2-category, and shows that (what we there call) 3/2-pushouts and 3/2-colimits give
blends that are "best possible" in a certain precise sense that involves ordering
semiotic morphisms by quality, e.g., that they should be as defined as possible,
should preserve as many axioms as possible, and should be as inclusive as possible
(see Definition 3).
We now show several ways to blend spaces for the words "house" and "boat";
see Figure 6, in which the generic space is auxiliary. We do not aspire to great
accuracy in linguistic modeling here; certainly much more detail could be added to
the various spaces, and some details could be challenged^23. Our interest is rather
to illustrate the mathematical machinery introduced in this section with a simple,
intuitive example. The generic space has three constants, object, medium,
and person, plus two relations, on and use. The house input has constants for
house, land, and resident; these are mapped onto by object, medium, and
person from the generic space, respectively; the relations are live-in and on,
where the first is mapped onto by use, and where the house is on land. Similarly,
the boat input space has constants for boat, water, and passenger, which
are mapped onto by object, medium, and person, respectively; and it has
relations ride and on, where the first is mapped onto by use, and where the boat
is on water. In forming a blend, there is a conflict between being on water and
being on land, and for "houseboat", water wins. Here all triangles commute.
The blend for boathouse chooses land instead of water. But the most interesting
things to notice about the boathouse blend are that the boat becomes the
resident, and that this leads to a non-commutative triangle of morphisms on the
right side.

^21 More technically, it is the edges that are designated as auxiliary, because it is possible that the same morphism appears on more than one edge, where not all instances of it are auxiliary.
^22 I thank Grigore Rosu for the suggestion to generalize from auxiliary nodes to auxiliary edges.
^23 This is consistent with our belief that unique best possible theories do not exist for most real world concepts [21].

Fig. 6. Two Different Blends of Two Input Spaces. (The generic space, with constants object, medium, and person and relations use and on, maps into the house space (resident, land, house; live-in, on) and the boat space (passenger, water, boat; ride, on). The houseboat blend, labeled hsbt, has the resident living in an object that is on water; the boathouse blend, labeled bths, has the boat as resident of an object that is on land.)
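A very rough computational sketch of the houseboat blend follows, with conceptual spaces reduced to sets of constants and relation instances; this encoding, the function blend, and the explicit conflict-resolution step are all assumptions of the sketch and not the 3/2-categorical construction of Appendix B.

# Conceptual spaces as constants plus relation instances.
house = {"constants": {"house", "land", "resident"},
         "relations": {("live-in", "resident", "house"), ("on", "house", "land")}}
boat = {"constants": {"boat", "water", "passenger"},
        "relations": {("ride", "passenger", "boat"), ("on", "boat", "water")}}

def blend(space1, space2, ident):
    """Merge space2 into space1, renaming constants via `ident` (constants
    identified because they share a generic preimage)."""
    rename = lambda c: ident.get(c, c)
    return {
        "constants": space1["constants"] | {rename(c) for c in space2["constants"]},
        "relations": space1["relations"]
                     | {(r, rename(a), rename(b)) for (r, a, b) in space2["relations"]},
    }

# "houseboat": identify boat with house and passenger with resident (images of
# the generic object and person), but keep land and water distinct.
houseboat = blend(house, boat, {"boat": "house", "passenger": "resident"})
# The raw union contains both on(house,land) and on(house,water); for
# "houseboat" the water relation wins, so the land relation is dropped.
houseboat["relations"].discard(("on", "house", "land"))
print(sorted(houseboat["relations"]))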
There are also some other, more surprising, blends for these two conceptual
spaces: one gives a boat for transporting houses, and another gives an amphibious
house! See Figure 7. The first blend (to the left in Figure 7) is dual to houseboat:
instead of the boat ending up in the house, the house ends up on the boat; there's
nothing strange about this except that we don't have any established word for
it, and it doesn't correspond to anything in (most people's) experience^24. The
second blend (to the right in Figure 7) is more exotic, since the resulting object
can be either on land or on water, and the user both rides and lives in it. Although
no such thing exists in our world now, we can easily imagine some mad engineer
trying to build one. Now it is interesting to see which triangles commute for each
of these, and then to compare the naturalness of each blend with its degree of
commutativity. The left triangle of the first blend fails to commute (again just
dual to "boathouse"). For the second, although both its triangles commute, the
situation here is actually worse than if they didn't, because the injections fail
to preserve some of the relevant structure, namely the (implicit) negations of
relation instances, such as that the boat is not on land.

^24 But we can easily imagine a construction project on an island where prefabricated houses are transported by boat.

Fig. 7. Two Less Familiar Blends. (To the left, a blend in which the house is on the boat, which moves on water; to the right, an amphibious blend, also labeled hsbt, whose object can be on land or on water, and whose user both rides and lives in it.)
The above is a good illustration of the very important fact that blends are not
unique. Ambiguity and its resolution are pervasive in natural language
understanding. A word, phrase or sentence with an "obvious" meaning in one context,
or in isolation, can have a very different meaning in another context. What is
amazing is that we resolve ambiguities so effortlessly that we aren't even aware
that they existed, so that it takes some effort to discover the other possibilities
that were passed over so easily! For another example, Appendix A constructs a
context in which the old aphorism "Time flies like an arrow" undergoes a drastic
change of meaning, and also gives a formal specification of the conceptual spaces
involved, using the OBJ system [28] to compute the blend, parse the sentence,
and then evaluate it to reveal the "meaning". A different way to illustrate the
ambiguity of blends can be seen in the beautiful analyses done by Hiraga [37,
36] of haiku by the great Japanese poet Basho; she shows that several different
blends coexist for these haiku, and argues that this is a deliberate exploitation
of ambiguity as a poetic device.
Ambiguity also plays an interesting role in so called "oxymorons" (like "mil-
itary intelligence"): these involve two different blends of two given words, one of
which has a standard meaning, and the other of which has some kind of conflict
in it. The second meaning only arises because the word "oxymoron" has been
introduced, and this deliberate creation of a surprising ambiguity is what makes
these a form of humor. For "military intelligence" the standard meaning is an
agency that gathers intelligence (i.e., information, especially secret information)
for military purposes, while the second, conflictual meaning is something like
"stupid smartness", playing off the common (but incorrect) prejudice that the
military are stupid, plus the more usual meaning of intelligence. A lot of hu-
mor seems to have a similar character: an informal survey of cartoons in the
local newspaper found that more than half of the intendedly humorous cartoons
achieved their effect by recontextualization, through blending a given concep-
tual space with some new conceptual space, to give some parts of the old one
surprising new meanings.
Semiotic morphisms can also arise when signs should have some additional
structure in order to be considered "good". For example, typical recent Holly-
wood movies have a three act structure with two specific "plot points" that move
the action from one act into the next; let's call this the "Syd Field" structure
after an author who advocates it [13]. Blending this with the "film medium"
structure (consisting of shots, scenes, angles, etc.) gives a precise sign system
that can help with understanding certain aspects of films. (This is a rather dif-
ferent approach to applying semiotics to cinema than that of the large literature
found in the semiotics of film, e.g., [9], but is still compatible with it.)
Now let's consider products of representations. For example, we might have
representations M1 and M2 for time and temperature that we want to use
to realize the sign system TOD × TMP, where say M1 : TOD → S1 and M2 : TMP →
S2. Then what we want is a semiotic morphism M1 × M2 : TOD × TMP → S1 × S2,
defined to be M1 on TOD and M2 on TMP, except that the product sort of each
source maps to that of its target. We can now prove the following laws, analogous
to those for products of sign systems, where M, M1, M2, M3, M4 are semiotic
morphisms, 1 now denotes the identity semiotic morphism on the trivial
sign system 1, and ≅ now refers to a consistent family of isomorphisms,
one for each choice of the morphisms involved (in the technical language of
category theory, natural isomorphisms):
   M × 1 ≅ M
   1 × M ≅ M
   M1 × M2 ≅ M2 × M1
   M1 × (M2 × M3) ≅ (M1 × M2) × M3
   (M1 × M2) ; (M3 × M4) ≅ (M1 ; M3) × (M2 ; M4) .
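For instance (an elementary check, not from the paper), taking M1 : A1 → B1,
M2 : A2 → B2, M3 : B1 → C1 and M4 : B2 → C2, both sides of the last law are
semiotic morphisms from A1 × A2 to C1 × C2:
\[ (M_1 \times M_2)\,;\,(M_3 \times M_4) \;\cong\; (M_1\,;\,M_3)\times(M_2\,;\,M_4) . \]
The law is thus the familiar interchange between product and composition: pairing
first and then composing agrees, up to isomorphism, with composing first and then
pairing.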
It is clear that a great deal more could be done along these lines, for example,
giving laws for the more general forms of blending. This introductory paper does
not seem the right place for such a development, but a few more laws are found
in Appendix B.
A traditional view is that a metaphor is a mapping from one cognitive space
to another, as in the formalization of Gentner [14]. However, the work of Fau-
connier and Turner [12] suggests a different view, in which the existence of such
a "cross space mapping" between two input spaces is a special asymmetric con-
dition that may occur if one input space dominates the other in the blend. In
general, there may be more than two input spaces, and the information about
links between the contents of these spaces is distributed among the injections in
a complex way that cannot be summarized in any single map.
6 Discussion
This paper has introduced algebraic semiotics, a new approach to user inter-
face design, cognitive linguistics, and other areas, based on a notion of sign
allowing complex hierarchical structure, thus elaborating Saussure's insight that
signs come in systems. Representations are mappings, or morphisms, between
sign systems, and a user interface is considered a representation of the underlying
functionality to which it provides access. This motivates a calculus for combining
signs, sign systems, and representations. One important mode of composition is
blending, introduced by Fauconnier and Turner, which is related to certain con-
cepts from category theory. The main contribution of this paper is the precision
that its approach can bring to applications. Building on an insight from com-
puter science, that discrete structures can be described by algebraic theories,
sign systems are defined as algebraic theories with some extra structure, and
semiotic morphisms are defined as mappings of algebraic theories that pre-
serve the extra structure to some extent; the quality of representations was found
to correlate with the degree to which structure is preserved.
When one sees concrete examples of sign systems like graphical user inter-
faces, it is easy to believe that these sign systems "really exist". It is amazing
how quickly and easily we see signs as actually existing with all their structure
"out there" in the "real world". Nevertheless, what "really exists" (in the sense
of physics) are the photons coming off the screen; the structure that we see is our
own construction. This paper provides a way to describe and study perceived
regularities, as modeled by sign systems, without claiming that these regularities
correspond to real objects, let alone that best possible descriptions exist for any
given phenomenon. This is consistent with ordinary engineering practice, which
constructs models for bridges, aircraft wings, audio amplifiers, etc. that are good
enough for the practical purpose at hand, without claiming that the models are
the reality, and indeed, with a deep awareness, based on practical experience,
that the models are definitely not adequate in certain respects, some known and
some unknown. (For example, Hooke's law for the length of a spring as a function
of the weight it is holding fails if the weight is too heavy, because the spring will
be damaged.) Another advantage of our approach is that it enables us to
avoid a lot of distracting philosophical problems, e.g., having to do with the
doctrine of realism.
The use of morphisms of theories for representations instead of morphisms of
models relates to the above point, in that we tend to think of models as finally
grounding the representation process in something "real", whereas morphisms
never claim more than to be re-representations, which may add more detail, but
do not exhaust all of the possibilities for description.
William Burroughs said language is a virus [7], meaning (for example) that
peculiarities of accent, vocabulary, attitude, disposition, confusion, neurosis, etc.
are contagious, and tend to spread within communities. Mikhail Bakhtin [3] em-
phasized that language is never a single homogeneous system, using the word
"heteroglossia". Paraphrasing Burroughs in the light of Bakhtin, we might say
that language is an ecology of interacting viruses. So despite our use of formal
mathematical description techniques, we should not expect such a realm to be
characterized by formal modernist order, but rather to exhibit multiple species
of interacting chaotic evolution: signs and interpretations are co-evolving co-
emergent social phenomena that are too complex to be fully described; order
appears in our multiple, partial descriptions, and such descriptions are what
we can formalize. But these descriptions should never be confused with "real-
ity". In contrast with situation theory [5], we do not consider that signs and
representations are pre-existing residents of some Platonic heaven, but instead
claim that they arise in a context of social interaction. (Further philosophical
discussion related to this appears in Appendix C, where in brief, we find that
realism is difficult to reconcile with the practice of engineering design, and that
phenomenology is more congenial.)
The dynamic aspect of sign systems that emerges from the above discussion
brings out an important limitation of the formal apparatus introduced in this
paper: it does not address the history-sensitive aspects that are needed for many
applications to user interface design. For example, most text editors have an
undo command that takes one back to the state before the last command was
executed. By a fortunate coincidence, a recent advance in algebraic semantics
called hidden algebra [29, 30, 18] provides exactly the technical apparatus that is
needed for this extension, by using hidden sorts for internal states. The extension
is actually very straightforward mathematically, but to develop the methodology
for its application will require some further work.
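To make the undo example a little more concrete, here is a rough sketch in
ordinary OBJ notation (with hypothetical names, and using an explicit visible sort
of states rather than the hidden sorts that hidden algebra would provide):

th EDITOR is sorts State Cmd .
  op init : -> State .
  op exec : Cmd State -> State .
  op undo : State -> State .
  var C : Cmd .
  var S : State .
  eq undo(exec(C,S)) = S .
endth

The single equation says that undo returns the state before the last command was
executed; in hidden algebra, State would instead be a hidden sort, and such
equations would only be required to hold behaviorally.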
There are many other dynamic aspects of sign systems. Real world sign sys-
tems evolve; for example, in natural languages, words change their meanings,
new words are added, old words disappear, syntax changes, and of course the
huge contextual background changes, as social experience changes. In yet another
important kind of dynamics, a listener or reader (or "user") constructs meanings
dynamically and incrementally, in real time. How this happens is a very diffi-
cult problem, about which little information is directly available. It is however
clear that no simple algorithm based on just the structure of the sign systems
involved can be used to compute meanings, because even for the simple blend of
two conceptual spaces, selection among the manifold possibilities is governed by
external contextual factors in a very complex way, crucially including the val-
ues of the person doing the blend. Moreover, the perception that the blend is in
some way coherent seems to be at least as important as any of the more mechan-
ical measures of optimality. This paper does not attempt to solve these difficult
problems, but only the simpler problem of providing a precise language for de-
scribing structural aspects of particular understandings. In a dynamic context,
these static descriptions will be snapshots of evolving structures.
Another area that needs further work is "higher order" signification, which
concerns explicit references to meaning; one approach to this problem is to pro-
vide some form of "meta-spaces." Meaning is one of the deepest and most difficult
of all subjects, and it should not be thought that the explorations in the present
paper are more than early steps down one particular path into a great jungle.
References
1. William P. Alston. Sign and symbol. In Paul Edwards, editor, Encyclopaedia of
Philosophy, Volume 7, pages 437-441. Macmillan, Free Press, 1967. In 8 volumes;
republished 1972 in 4 books.
2. Peter B. Andersen. Dynamic logic. Kodikas, 18(4):249-275, 1995.
3. Mikhail Bakhtin. The Dialogic Imagination: Four Essays. University of Texas at
Austin, 1981.
4. Roland Barthes. S/Z: An Essay. Hill and Wang, 1974. Trans.
Richard Miller.
5. Jon Barwise and John Perry. Situations and Attitudes. MIT (Bradford), 1983.
6. John Bowers. The politics of formalism. In Martin Lea, editor, Contexts of
Computer-Mediated Communication. Harvester Wheatsheaf, 1992.
7. William S. Burroughs. The Adding Machine: Selected Essays. Arcade, 1986.
8. John Carroll. Learning, using, and designing filenames and command paradigms.
Behavior and Information Technology, 1(4):327-246, 1982.
9. Alain Cohen. Blade Runner: Aesthetics of agonistics and the law of response. Il
Cannocchiale, 3:43-58, 1996.
10. Gilles Fauconnier and Mark Turner. Conceptual projection and middle spaces.
Technical Report 9401, University of California at San Diego, 1994. Dept. of
Cognitive Science.
11. Gilles Fauconnier and Mark Turner. Blending as a central process of grammar. In
Adele E. Goldberg, editor, Conceptual Structure, Discourse and Language, pages
113-129. CSLI, 1996.
12. Gilles Fauconnier and Mark Turner. Conceptual integration networks. Cognitive
Science, 22(2):133-187, 1998.
13. Syd Field. Screenplay: The Foundations of Screenwriting. Dell, 1982. Third edition.
14. Dedre Gentner. Structure-mapping: A theoretical framework for analogy. Cogni-
tive Science, 7(2):155-170, 1983.
15. Joseph Goguen. Semantics of computation. In Ernest Manes, editor, Proceedings,
First International Symposium on Category Theory Applied to Computation and
Control, pages 151-163. Springer, 1975. (San Francisco, February 1974.) Lecture
Notes in Computer Science, Volume 25.
16. Joseph Goguen. What is unification? A categorical view of substitution, equa-
tion and solution. In Maurice Nivat and Hassan Aït-Kaci, editors, Resolution of
Equations in Algebraic Structures, Volume 1: Algebraic Techniques, pages 217-261.
Academic, 1989.
17. Joseph Goguen. A categorical manifesto. Mathematical Structures in Computer
Science, 1(1):49-67, March 1991.
18. Joseph Goguen. Types as theories. In George Michael Reed, Andrew William
Roscoe, and Ralph F. Wachter, editors, Topology and Category Theory in Computer
Science, pages 357-390. Oxford, 1991. Proceedings of a Conference held at Oxford,
June 1989.
19. Joseph Goguen. On notation (a sketch of the paper). In Boris Magnus-
son, Bertrand Meyer, and Jean-Francois Perrot, editors, TOOLS 10: Tech-
nology of Object-Oriented Languages and Systems, pages 5-10. Prentice-
Hall, 1993. The extended version of this paper may be obtained from
http://www.cs.ucsd.edu/users/goguen/ps/notn.ps.gz.
20. Joseph Goguen. Requirements engineering as the reconciliation of social and tech-
nical issues. In Marina Jirotka and Joseph Goguen, editors, Requirements Engi-
neering: Social and Technical Issues, pages 165-200. Academic, 1994.
21. Joseph Goguen. Towards a social, ethical theory of information. In Geoffrey
Bowker, Leigh Star, William Turner, and Les Gasser, editors, Social Science, Tech-
nical Systems and Cooperative Work: Beyond the Great Divide, pages 27-56. Erl-
baum, 1997.
22. Joseph Goguen. Social and semiotic analyses for theorem prover user interface
design, submitted for publication 1998.
23. Joseph Goguen. Theorem Proving and Algebra. MIT, to appear.
24. Joseph Goguen and Rod Burstall. Institutions: Abstract model theory for specifi-
cation and programming. Journal of the Association for Computing Machinery,
39(1):95-146, January 1992.
25. Joseph Goguen, Kai Lin, Akira Mori, Grigore Rosu, and Akiyoshi Sato. Dis-
tributed cooperative formal methods tools. In Michael Lowry, editor, Proceedings,
Automated Software Engineering, pages 55-62. IEEE, 1997.
26. Joseph Goguen, Kai Lin, Akira Mori, Grigore Rosu, and Akiyoshi Sato. Tools for
distributed cooperative design and validation. In Proceedings, CafeOBJ Sympo-
sium. Japan Advanced Institute for Science and Technology, 1998. Nomuzu, Japan,
April 1998.
27. Joseph Goguen and Charlotte Linde. Optimal structures for multi-media instruc-
tion. Technical report, SRI International, 1984. To Oce of Naval Research,
Psychological Sciences Division.
28. Joseph Goguen and Grant Malcolm. Algebraic Semantics of Imperative Programs.
MIT, 1996.
29. Joseph Goguen and Grant Malcolm. A hidden agenda. Technical Report CS97-
538, UCSD, Dept. Computer Science & Eng., May 1997. To appear in special issue
of Theoretical Computer Science on Algebraic Engineering, edited by Chrystopher
Nehaniv and Masami Ito. Early abstract in Proc., Conf. Intelligent Systems: A
Semiotic Perspective, Vol. I, ed. J. Albus, A. Meystel and R. Quintero, Nat. Inst.
Science & Technology (Gaithersburg MD, 20-23 October 1996), pages 159-167.
30. Joseph Goguen and Grant Malcolm. Hidden coinduction: Behavioral correctness
proofs for objects. Mathematical Structures in Computer Science, to appear 1999.
31. Joseph Goguen, Akira Mori, and Kai Lin. Algebraic semiotics, ProofWebs and dis-
tributed cooperative proving. In Yves Bartot, editor, Proceedings, User Interfaces
for Theorem Provers, pages 25-34. INRIA, 1997. (Sophia Antipolis, 1-2 September
1997).
32. Joseph Goguen, James Weiner, and Charlotte Linde. Reasoning and natural ex-
planation. International Journal of Man-Machine Studies, 19:521-559, 1983.
33. Robert Goldblatt. Topoi, the Categorial Analysis of Logic. North-Holland, 1979.
34. Martin Heidegger. Being and Time. Blackwell, 1962. Translated by John Mac-
quarrie and Edward Robinson from Sein und Zeit, Niemeyer, 1927.
35. Masako K. Hiraga. Diagrams and metaphors: Iconic aspects in language. Journal
of Pragmatics, 22:5-21, 1994.
36. Masako K. Hiraga. Rough seas and the milky way: `Blending' in a haiku text.
In Plenary Working Papers in Computation for Metaphors, Analogy and Agents,
pages 17-23. University of Aizu, 1998. Technical Report 98-1-005, Graduate School
of Computer Science and Engineering.
37. Masako K. Hiraga. `Blending' and an interpretation of haiku: A cognitive approach.
Poetics Today, to appear 1998.
38. Marina Jirotka and Joseph Goguen. Requirements Engineering: Social and Tech-
nical Issues. Academic, 1994.
39. William Labov. The transformation of experience in narrative syntax. In Language
in the Inner City, pages 354-396. University of Pennsylvania, 1972.
40. George Lakoff and Mark Johnson. Metaphors We Live By. Chicago, 1980.
41. Saunders Mac Lane. Categories for the Working Mathematician. Springer, 1971.
42. Bruno Latour. Science in Action. Open, 1987.
43. Bruno Latour. Aramis, or the Love of Technology. Harvard, 1996.
44. John Lechte. Fifty Key Contemporary Thinkers. Routledge, 1994.
45. Eric Livingston. The Ethnomethodology of Mathematics. Routledge & Kegan Paul,
1987.
46. Grant Malcolm and Joseph Goguen. Signs and representations: Semiotics for user
interface design. In Ray Paton and Irene Nielson, editors, Visual Representations
and Interpretations. Springer Workshops in Computing, 1998. Proceedings of an
international workshop held in Liverpool.
47. Jose Meseguer and Joseph Goguen. Initiality, induction and computability. In
Maurice Nivat and John Reynolds, editors, Algebraic Methods in Semantics, pages
459-541. Cambridge, 1985.
48. Donald A. Norman. The Design of Everyday Things. Doubleday, 1988.
49. Charles Sanders Peirce. Collected Papers. Harvard, 1965. In 6 volumes; see
especially Volume 2: Elements of Logic.
50. Eleanor Rosch. On the internal structure of perceptual and semantic categories.
In T.M. Moore, editor, Cognitive Development and the Acquisition of Language.
Academic, 1973.
51. Eleanor Rosch. Cognitive reference points. Cognitive Psychology, 7, 1975.
52. Harvey Sacks. On the analyzability of stories by children. In John Gumpertz and
Del Hymes, editors, Directions in Sociolinguistics, pages 325-345. Holt, Rinehart
and Winston, 1972.
53. Harvey Sacks. Lectures on Conversation. Blackwell, 1992. Edited by Gail Jefferson.
54. Ferdinand de Saussure. Course in General Linguistics. Duckworth, 1976. Trans-
lated by Roy Harris.
55. Ben Shneiderman. Designing the User Interface. Addison Wesley, 1997.
56. Susan Leigh Star. The structure of ill-structured solutions: Boundary objects and
heterogeneous problem-solving. In Les Gasser and Michael Huhns, editors, Dis-
tributed Artificial Intelligence, volume 2, pages 37-54. Pitman, 1989.
57. Lucy Suchman. Plans and Situated Actions: The Problem of Human-machine Com-
munication. Cambridge, 1987.
58. Mark Turner. The Literary Mind. Oxford, 1997.

A Two Examples in OBJ3


Consider the following "science fiction" fragment, which constructs a context in
which the old aphorism "Time flies like an arrow" undergoes a drastic change of
meaning:
   A gravity kink forced the ship enough off course that realtime was needed
   to calculate corrections. Taking realtime in a wormhole creates a local
   space-time vector, and time flies were already buzzing about, making the
   corrections even harder. "They hang onto any vector they can find out
   here," Randi said. "Time flies like an arrow. We may never get out."
Here the original verb "flies" becomes the subject; the original subject "time"
now modifies "flies"; the preposition "like" becomes the verb; and "arrow" be-
comes the object of "like". The only word that doesn't change its syntactic role
is the lowly article "an"! How does this happen? The "local space-time vector"
(whatever that is) prepares the reader for "an arrow", and then "time flies" are
introduced explicitly. These two conceptual spaces blend into another, where our
sentence gets its new interpretation; they share a subspace where a ship takes
realtime in a wormhole.
We describe these three conceptual spaces, form a blend, and then parse and
evaluate our sentence using the OBJ language (for more on OBJ and its under-
lying theory, see [28]), which is especially suitable because of its rich facilities for
combining theories. The keyword pair th...endth delimits OBJ modules that
introduce \theories" which allow any model that satis es the axioms. The two
\pr SHIP" lines indicate importation of the theory SHIP in such a way that it is
shared; + tells OBJ to form a blend (which is actually their colimit in the sense of
Appendix B below), which is then named POUT as part of the make...endm con-
struct, which just builds and names a module. Predicates appear as Bool(ean)
valued functions. Finally, red tells OBJ to parse what follows, apply equations
as left to right rewrite rules, and then print the nal result (if there is one):
th SHIP is sort Thing .
ops (the ship) wormhole vector : -> Thing .
op _in_ : Thing Thing -> Bool .
op _makes_ : Thing Thing -> Bool .
eq the ship in wormhole = true .
var X : Thing .
cq X makes vector = true if X in wormhole .
endth

th FLIES is pr SHIP .
op time flies : -> Thing .
ops (_like_)(_buzz around_) :
Thing Thing -> Bool .
eq time flies buzz around the ship = true .
var X : Thing .
cq time flies like X = true if X == vector .
endth

th ARROW is pr SHIP .
op an arrow : -> Thing .
eq an arrow = vector .
endth

make POUT is FLIES + ARROW . endm

red the ship makes an arrow .


red time flies like an arrow .

Of course, as an understanding of the text, this formal system is grossly over-
simplified; however, it is precise. These sign systems have only two levels (for
"words" and "sentences"), no priority, and few signs; the morphisms are just
inclusions. Probably the hardest part, that an arrow is a vector, has been just
posited, because OBJ cannot do this kind of "selection" process, although it is
well suited for defining and blending sign systems, and for parsing and evaluating
expressions. Here is OBJ's output from the above:
\|||||||||||||||||/
--- Welcome to OBJ3 ---
/|||||||||||||||||\
OBJ3 version 2.04 built 1994 Feb 28 Mon 15:07:40
Copyright 1988,1989,1991 SRI International
1997 Jan 18 Sat 22:25:11
OBJ>
==========================================
obj SHIP
==========================================
obj FLIES
==========================================
obj ARROW
==========================================
make POUT
==========================================
reduce in POUT : the ship makes an arrow
rewrites: 3
result Bool: true
==========================================
reduce in POUT : time flies like an arrow
rewrites: 3
result Bool: true
OBJ> Bye.

This shows OBJ parsing both sentences and then "understanding" that they are
"true"; note that neither sentence parses outside the blend. I hope the reader
is as pleased as the author at how easy all this is. (It took about 15 minutes to
write the code, and less than a second for OBJ to process it, most of which is
spent on input-output, rather than on processing the various declarations and
doing the 6 applications of rewrite rules.) Of course, we could get the usual
understanding of the sentence by evaluating it in a different context.

We now consider a somewhat more complex example, a proof that one
metaphor is better than another, under certain assumptions. The assumptions
are given in the five theories, the metaphors in the two views, and the proof in the
four reductions. The first metaphor, "The internet is an information tornado,"
comes from a press release from the Federal Communications Commission, while
the second, "The internet is an information volcano," comes from a poster that
the author of this paper prepared for a course on material in this paper at UCSD.
The keyword "us" (from "using") indicates importation by copying rather than
sharing, and *(op A to B) indicates a renaming of the operation A to become
B.

th COMMON is
sorts Agent Effect .
op effect : Agent Agent -> Effect .
ops hurt nil helped : -> Effect .
endth

th PROCESS is us COMMON .
sort Volume .
ops subject process : -> Agent .
op flow : Agent Agent -> Volume .
ops low medium high huge : -> Volume .
endth

th INTERNET is us (PROCESS *(op subject to user)
                           *(op process to internet)).
eq flow(internet,user) = huge .
eq flow(user,internet) = low .
eq effect(internet,user) = hurt .
endth

th VOLCANO is us (PROCESS *(op subject to victim)
                          *(op process to volcano)).
eq flow(volcano,victim) = huge .
eq flow(victim,volcano) = low .
eq effect(volcano,victim) = hurt .
endth

th TORNADO is us (PROCESS *(op subject to victim)
                          *(op process to tornado)).
eq flow(tornado,victim) = low .
eq flow(victim,tornado) = huge .
eq effect(tornado,victim) = hurt .
endth

*** The internet is an information tornado.


view TORNADO from TORNADO to INTERNET is
op victim to user .
op tornado to internet .
endv

th TESTT is us (TORNADO + INTERNET). endth


red flow(victim,tornado) == flow(user,internet).
red flow(tornado,victim) == flow(internet,user).

*** The internet is an information volcano.


view VOLCANO from VOLCANO to INTERNET is
op victim to user .
op volcano to internet .
endv

th TESTV is us (VOLCANO + INTERNET). endth


red flow(victim,volcano) == flow(user,internet).


red flow(volcano,victim) == flow(internet,user).

The OBJ3 output from this shows that the first two reductions give false and
the second two give true. This means that the first semiotic morphism does
not preserve the axioms (which concern the flow of material between the user
and the object, either tornado or volcano), while the second morphism does,
which implies that the second metaphor is better than the first with respect to
preserving these axioms. (On the other hand, the tornado metaphor resonates
with many common phrases such as "winds of change," which are part of our
culture, whereas we have less collective experience and associated language for
volcanos.)

B Categories, Blends, Pushouts, 3/2-Categories and 3/2-Pushouts

Although this appendix is written under the assumption that readers already
know some basic category theory (see [33, 16, 17] for relatively gentle introductions
to some basic ideas of category theory; there are also many other papers and many
other books), it is nonetheless essentially self-contained, though terse, in order to
fix notation for the new material. The essential intuition
behind categories is that they capture mathematical structures; for example,
sets, groups, vector spaces, and automata, along with their structure preserving
morphisms, each form a category, and their morphisms are an essential part of
the picture.
Definition 4: A category C consists of: a collection, denoted |C|, of objects;
for each pair A, B of objects, a set C(A, B) of morphisms (also called arrows
or maps) from A to B; for each object A, a morphism 1A from A to A called the
identity at A; and for each three objects A, B, C, an operation called compo-
sition, C(A, B) × C(B, C) → C(A, C), denoted ";", such that f;(g;h) = (f;g);h
and f;1A = f and 1A;g = g whenever these compositions are defined. We write
f : A → B when f ∈ C(A, B), and call A the source and B the target of f. □
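As an elementary standard example (not specific to this paper), sets and total
functions form a category Set under composition in the diagrammatic order used
here:
\[ (f;g)(x) = g(f(x)), \qquad 1_A(x) = x . \]
Then for f : A → B both identity laws 1A ; f = f and f ; 1B = f hold, and
f ; (g;h) = (f;g) ; h, since both sides send x to h(g(f(x))).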
Results in the body of this paper show that sign systems with semiotic mor-
phisms form a category. We will review the notions of pushout, cone and col-
imit for ordinary categories, relate this to blending, and then consider the more
general setting of 3/2-categories, which captures more of the phenomenology of
blending.
The intuition for colimits is that they put some components together, iden-
tifying as little as possible, with nothing left over, and with nothing essentially
new added [17]. This suggests that colimits should give some kind of optimal
blend. We will see that there are problems with this, so that the traditional
categorical notions are not quite appropriate for blending. Nevertheless, they
provide a good place to begin our journey of formalization.
Definition 5: Given a category C, a V in C is a pair ai : G → Ii (i = 1, 2) of
morphisms, and a cone with apex B over a V a1, a2 is a pair bi : Ii → B (i =
1, 2) of morphisms; then a1, a2 and b1, b2 together are said to form a diamond
(or a square). The cone (or its diamond) commutes iff a1;b1 = a2;b2, and is
a pushout iff given any other commutative cone ci : Ii → C over a1, a2, there
is a unique arrow u : B → C such that bi;u = ci for i = 1, 2.
A diagram D in a category C is a directed graph with its nodes labeled by
objects from C and its edges labeled by arrows from C, such that if an arrow
f : Di → Dj labels an edge e : i → j, then the source node i of e is labeled by
Di and the target node j of e is labeled by Dj. A cone over D is an object B,
called its apex, together with an arrow bi : Di → B, called an injection, from
each object of D to B, and is commutative iff for each f : Di → Dj in D, we
have bi = f;bj (these equations are called triangles below, after the corresponding
three-node commutative diagrams). A colimit of D is a commutative cone bi : Di → B
over D such that if ci : Di → C is any other commutative cone over D, then there
is a unique u : B → C such that bi;u = ci for all nodes i of D (these equations may
also be called "triangles"). □
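As another elementary standard example (again not specific to sign systems), in
the category of sets the pushout of a V of inclusions G ⊆ A and G ⊆ B glues A and
B together along G:
\[ A +_G B \;=\; (A \sqcup B)/\!\sim, \qquad a \sim b \ \text{ iff } a \text{ and } b \text{ are images of the same } g \in G , \]
with the injections the evident maps into the quotient. Nothing is identified except
what G forces, and nothing new is added, which is the sense in which colimits are
candidates for optimal blends.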
Pushouts are the special case of colimits where the diagram is a V. However
there seems to be a discrepancy in the definitions, because pushouts are not
required to have an arrow G → B. But when the diagram is a V, this missing
arrow is automatically provided by the morphism a1;b1 = a2;b2.
There is a short proof that any two colimits of a diagram D are isomorphic.
Let the cones be bi : Di → B and b'i : Di → B'. Then there are unique arrows
u : B → B' and v : B' → B satisfying the appropriate triangles, and there are
also unique arrows B → B and B' → B' satisfying their appropriate triangles,
namely the respective identities 1B and 1B'; but u;v and v;u also satisfy the
same triangles; so by uniqueness, u;v = 1B and v;u = 1B'.

Following the suggestion of Section 5 that blends are commutative cones,
it follows that colimits should be some kind of optimal blend. For example,
the "houseboat" blend of "house" and "boat" is a colimit. But the fact that
colimits are only determined up to isomorphism seems inconsistent with this,
because the names attached to the elements in a blend are important; that
is, isomorphic cones do not represent the same blend. This differs from the
situation in group theory or topology, where it is enough to characterize an object
up to isomorphism. However the requirement (also motivated by the examples
in Section 5) that the injections should be inclusions to as great an extent as
possible, causes the actual names of elements to be captured by blends, and thus
eliminates the apparent inconsistency.
Another problem with defining blends to be commutative cones is that, as
shown in Section 5, not all blends actually have fully commutative cones; for
"house" and "boat", only the "houseboat" blend has all its triangles commu-
tative. But as suggested there, the notion of auxiliary morphism solves this
problem. The auxiliary morphisms in D are those whose triangles are not
required to commute; these morphisms can be removed from D, to yield another
diagram D' having the same nodes as D. Commutative cones over D' are then
cones over D that commute except possibly over the auxiliary morphisms. Now
we can also form a colimit of D', to get a "best possible" such cone over D. It
therefore makes sense to define a blend to be a commutative cone over a diagram
with the auxiliary morphisms removed.
One advantage of formalization is that it makes it possible to prove general
laws, in this case, laws about blends based on general results from category
theory, such as that "the pushout of a pushout is a pushout." This result suggests
proving that "the blend of a blend is a blend," so that compositionality of the
kind of optimal blends given by pushouts follows from the above quoted result
about pushouts. The meaning of these assertions will be clearer if we refer to
the following diagram:
[Diagram: two stacked diamonds sharing the edge b2, with the V formed by a2, a3 at the bottom, its blend b2, b3 above it, a1 on the upper left, and the blend c2, c3 at the top.]
Here we assume that b2, b3 is a blend of a2, a3, and c2, c3 is a blend of a1, b2, i.e.,
that a2;b2 = a3;b3 and a1;c2 = b2;c3; then the claim is that c2, b3;c3 is a blend
of a2;a1, a3, which follows because a2;a1;c2 = a3;b3;c3. Using the notation
a2 ◊ a3 for an arbitrary blend of a2, a3, we can write this result rather nicely in
the form
   a1 ◊ (a2 ◊ a3) = (a2 ; a1) ◊ a3 ,
taking advantage of a convention that a1 ◊ (a2 ◊ a3) indicates blending a1 with
the left injection of (a2 ◊ a3) (the top left edge of its diamond).
The pushout composition result (proved e.g. in [33, 41]) states that if b2, b3 is
a pushout of a2, a3, and c2, c3 is a pushout of a1, b2, then c2, b3;c3 is a pushout
of a2;a1, a3. If we write a2 ⋈ a3 for the pushout of a2, a3, then this result can
also be written neatly, as
   a1 ⋈ (a2 ⋈ a3) = (a2 ; a1) ⋈ a3 .
We can also place a second blend (or pushout) on top of b3 instead of b2;
corresponding results then follow by symmetry, and after some renaming of
arrows can be written as follows:
   (a1 ◊ a2) ◊ a3 = a1 ◊ (a2 ; a3) .
   (a1 ⋈ a2) ⋈ a3 = a1 ⋈ (a2 ; a3) .
We can further generalize to any pattern of diamonds: if they all commute, then
so does the outside figure; and if they are all pushouts, then so is the outside
figure. Another very general result from category theory says that the colimit
of any connected diagram can be built from pushouts of its parts. Taken all
together, these results give a good deal of calculational power for blending.
Now it's time to broaden our framework. The category of sign systems with
semiotic morphisms has some additional structure over that of a category: it is
an ordered category, because of the orderings by quality of representation that
can be put on its morphisms. This extra structure gives a richer framework
for considering blends; I believe this approach captures what Fauconnier and
Turner have called "emergent" structure, without needing any other machinery.
Moreover, all the usual categorical compositionality results about pushouts and
colimits extend to 3/2-categories.
Definition 6: A 3/2-category (in the literature, similar structures have been
called "one and a half" categories, because they are half way between ordinary,
"one dimensional", categories and the more general "two dimensional" categories)
is a category C such that each set C(A, B) is partially ordered, composition
preserves the orderings, and identities are maximal. □
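An elementary example (not from the paper): sets and partial functions form a
3/2-category when each hom-set is ordered by extension,
\[ f \le g \iff \mathrm{dom}(f) \subseteq \mathrm{dom}(g) \ \text{ and } \ g(x) = f(x) \ \text{ for all } x \in \mathrm{dom}(f) ; \]
composition preserves this ordering, and the identities, being total, are maximal.
For sign systems, the ordering of semiotic morphisms by quality of representation
plays the analogous role.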
Because we are concerned here with ordered categories, a somewhat different
notion of pushout is appropriate, and for this notion, the uniqueness property is
(fortunately!) lost:
Definition 7: Given a V, ai : G → Ii (i = 1, 2) in a 3/2-category C, a cone b1, b2
over a1, a2 is consistent iff there exists some d : G → B such that a1;b1 ≤ d
and a2;b2 ≤ d, and is a 3/2-pushout iff given any consistent cone ci : Ii → C
over a1, a2, the set
   { h : B → C | b1;h ≤ c1 and b2;h ≤ c2 }
has a maximum element. □
Proposition 8: The composition of two 3/2-pushouts is also a 3/2-pushout.
Proof: Let b1, b2 be a 3/2-pushout of a1, a2, and let c1, c2 be a 3/2-pushout of a3, b1;
we will show that c1, b2;c2 is a 3/2-pushout of a1;a3, a2.
[Diagram: the two stacked 3/2-pushout diamonds, with a1, a2 at the bottom, b1, b2 above them, a3 and c1, c2 on top, a cone d1, d2 with apex A, and arrows into A for the maximum elements constructed in the proof.]
Suppose d1, d2 together with a1;a3 and a2 form a consistent diamond. Then
a3;d1 and d2 with a1, a2 also form a consistent diamond, and because b1, b2 is a
3/2-pushout for a1, a2, the set {g | b1;g ≤ a3;d1 and b2;g ≤ d2} has a maximum
element, which we denote h'. Note that d1, h' with a3, b1 form a consistent diamond.
Then because c1, c2 is a 3/2-pushout of a3, b1, the set {g | c1;g ≤ d1 and c2;g ≤ h'}
has a maximum element, which we denote h. We now claim that the following two
sets are equal:
   M1 = {g | c1;g ≤ d1 and c2;g ≤ h'} , and
   M2 = {g | c1;g ≤ d1 and b2;c2;g ≤ d2} .
First let g ∈ M1. Then b2;(c2;g) ≤ b2;h' ≤ d2. Therefore g ∈ M2. Conversely,
suppose g ∈ M2; then all we have to prove is that c2;g ≤ h'. Because b2;(c2;g) ≤
d2 and b1;(c2;g) = (a3;c1);g ≤ a3;d1, and because h' is the maximum element
satisfying the inequalities above, we get c2;g ≤ h'. Therefore M1 = M2, which
implies they have the same maximum, namely h. □
However, unlike the situation for ordinary pushouts, the composition of consis-
tent diamonds need not be consistent, and two different 3/2-pushouts need not be
isomorphic; this means that ambiguity is natural in this setting. The following
is another compositionality result for 3/2-pushouts:
Proposition 9: In the diagram below, if the four small squares are 3/2-pushouts,
then so is the large outside square.
[Diagram: a large square assembled from four small 3/2-pushout squares, with edges a1, a2 at the bottom, b1, b2, b3, b4 in the middle, c1, c2, c3, c4 above, and d1, d2 at the top.]



Proof: Applying Proposition 8 twice gives two 3/2-pushouts shown below,
[Diagram: a diamond with edges a1;b1 and a2 at the bottom, b4 and b3;c2 as injections, and c1 and c4;d2 stacked on top]
and applying Proposition 8 once more gives us that the big square is a 3/2-pushout. □
Passing from V's to arbitrary diagrams of morphisms generalizes 3/2-pushouts
to 3/2-colimits, and provides what seems a natural way to blend complex inter-
connections of meanings. The notion of consistent diamond extends naturally to
arbitrary diagrams, as follows:
Definition 10: Let D be a diagram. Then a family γi (i ∈ |D|) of morphisms
is D-consistent iff a;γj ≤ γi whenever there is a morphism a : i → j in D.
Similarly, given J ⊆ |D|, we say a family of morphisms γi (i ∈ J) is D-consistent
iff it extends to a D-consistent family γi (i ∈ |D|). □
Fact 11: A diamond a1, a2, b1, b2 is consistent if and only if {b1, b2} is {a1, a2}-
consistent.
Proof: If the diamond is consistent then there is some d such that a1;b1 ≤ d
and a2;b2 ≤ d. But then {b1, b2, d} is {a1, a2}-consistent, i.e., {b1, b2} is {a1, a2}-
consistent. Conversely, if {b1, b2} is {a1, a2}-consistent, then some d exists such
that {b1, b2, d} is {a1, a2}-consistent, which says exactly that a1;b1 ≤ d and
a2;b2 ≤ d, i.e., that the diamond is consistent. □
Definition 12: Let D be a diagram. Then a family βi (i ∈ |D|) is a 3/2-colimit of
D iff it is a cone and for any D-consistent family γi (i ∈ |D|), the set
   { h | βi;h ≤ γi for each i ∈ |D| }
has a maximum element. □
The following is another typical result that extends from ordinary colimits
to 3/2-colimits:
Theorem 13: Let a W diagram consist of two V's connected at the middle
top. If D is a W diagram, then a 3/2-colimit of D is obtained by taking a 3/2-pushout
of each V, and then taking a 3/2-pushout of those two pushouts, as shown below.
Proof: Let D contain the morphisms a1, a2, a3, a4, let b1, b2 be a 3/2-pushout of
a1, a2, let b3, b4 be a 3/2-pushout of a3, a4, and let c1, c2 be a 3/2-pushout of b2, b3.
Then we must show that the family of morphisms
   { b1;c1, a2;b2;c1, b2;c1, a3;b3;c2, b4;c2 }
is a 3/2-colimit of D.
[Diagram: the W diagram with edges a1, a2, a3, a4 at the bottom, the 3/2-pushouts b1, b2 and b3, b4 above them, the 3/2-pushout c1, c2 on top, a D-consistent family d1, ..., d5 with apex A, and the arrows h1, h2, h into A.]
Let {d1, d2, d3, d4, d5} be a D-consistent family. Then d1 and d3 with a1, a2 form
a consistent diamond (because a1;d1 ≤ d2 and a2;d3 ≤ d2), and because b1, b2
is a 3/2-pushout, we deduce that there exists h1 (as the maximum of a set of
morphisms) such that b1;h1 ≤ d1 and b2;h1 ≤ d3. Similarly there exists h2
such that b3;h2 ≤ d3 and b4;h2 ≤ d5. Now note that h1, h2 with b2, b3 give a
consistent diamond (because there is d3 such that b2;h1 ≤ d3 and b3;h2 ≤ d3).
We next claim that the following two sets are equal:
   M1 = {h | c1;h ≤ h1 and c2;h ≤ h2} , and
   M2 = {h | (b1;c1);h ≤ d1 and (b2;c1);h ≤ d3 and (b4;c2);h ≤ d5} .
(The corresponding inequalities for d2 and d4 are omitted from M2 because they
are implicit.) First we show M1 ⊆ M2. If h ∈ M1 then
   - (b1;c1);h = b1;(c1;h) ≤ b1;h1 ≤ d1 ,
   - (b2;c1);h = b2;(c1;h) ≤ b2;h1 ≤ d3 ,
   - (b4;c2);h = b4;(c2;h) ≤ b4;h2 ≤ d5 .
This implies h ∈ M2. Conversely, if h ∈ M2 then b1;(c1;h) = (b1;c1);h ≤ d1 and
b2;(c1;h) = (b2;c1);h ≤ d3. Then by maximality of h1, we get c1;h ≤ h1. In a
similar way, we can show c2;h ≤ h2, and thus h ∈ M1. Therefore M1 = M2,
which implies that these sets have the same maximum element. □
Extending our pushout notation ⋈ to 3/2-categories, the above result can be
rather neatly written in the form
   (a1 ⋈ a2) ⋈ (a3 ⋈ a4) = Colim(W) ,
where W is the bottom part of the diagram, with edges labeled a1, a2, a3, a4.
A generalization of the above result implies that 3/2-pushouts can be used to
compute the 3/2-colimit of any connected diagram. Observe that the notion of
auxiliary morphism carries over to the framework of 3/2-categories without any
change.
It is natural to use the conditions that morphisms should be as defined as
possible, should preserve as many axioms as possible, and should be as inclusive
as possible, to define a quality ordering (see Definition 3). More precisely, given
morphisms f, g : A → B between conceptual spaces A, B, let us define f ≤ g
iff g preserves as much content as f, preserves all axioms that f does, and is as
inclusive as f. Although more work should be done to determine whether this
particular "designer ordering" really works the best for this particular applica-
tion, the situation with respect to our house and boat example from Section 5
really is quite satisfying, in that the most natural blend is an ordinary pushout,
all the other good blends are 3/2-pushouts, and various blends that fail to preserve
as much structure as they could are not any kind of pushout.

C Some Philosophical Issues


The research program of which this paper is a part is primarily concerned with
practical applications, and the goal of this paper is to provide some of the theory
that is needed to support such applications. By contrast, most work in semiotics
has had a much more philosophical focus. As a result, a great deal of philosophical
discussion could be generated concerning the heretical approach of this paper.
This appendix confines itself to just a few points that seem to have some practical
significance.
Today humanists of nearly all schools reject the notion that some kind of
"Cartesian coordinates" can be imposed on experience, despite partial evidence
to the contrary from fields like linguistics and music. This rejection is understand-
able as a reaction to the scientistic reductionism that nearly always accompanies
projects to impose structure on experience. Such tendencies are deeply ingrained
in Western civilization, going back at least to Pythagoras and Plato. But evi-
dence from a wide range of fields now makes it clear that traditional reductionism
has serious limitations. The following are brief descriptions of some better known
examples:
1. Work on mechanical speech recognition has shown that contextual informa-
tion is essential for determining what phoneme some raw acoustic waveform
represents (if anything); this contextual information may include not just
prior but also subsequent speech, a profile for the individual speaker (accent,
eccentricities, etc.), the topic of discourse, and much more, up to arbitrary
shared cultural knowledge.
2. In music, the same acoustic event in a different context can have a radically
different impact, ranging from ugly and incongruous, to great beauty and
elegance. Moreover, the background of the listener is crucial; for example,
naive listeners have little chance of appreciating the subtleties and beauties
of Cecil Taylor or Ornette Coleman, however familiar with theories of psycho-
acoustics and harmony they might be.
3. Similar things happen in cinema and poetry, and indeed any art or craft,
from architecture and interior design, to basket weaving, pottery, and flower
arranging. Often a great deal of cultural context is needed to appreciate
(in any deep sense) a single artifact; buildings, rooms, baskets and pots are
used by ordinary people in their ordinary lives, as part of the complex social
fabric. The "Gucci" label on a purse is not lovely in itself, but nonetheless
it has a meaning to those who go out of their way to acquire it. A brightly
colored postmodern bank building in Lisbon has a complex cultural meaning
that does not transfer to Paris, London, or New York.
4. Despite the stunning success of applying simple atomic theory to basic molec-
ular chemistry, physics has found it necessary to postulate nonlocalized quan-
tum fields to explain many important phenomena, some of which appear even
in applied chemistry, to say nothing of more rarefied areas.
5. Metamathematics has had great success in formalizing mathematics, and in
studying what is provable. But its greatest successes have been results, like
Gödel's incompleteness theorem, that demonstrate the limitations of formal-
ization. Moreover, formal proofs lack the comprehensibility, and the human
interest, of well done informal proofs. See Appendix D for more discussion
along these lines, demonstrating the importance of context for making proofs
\come alive."
Returning now to our main point, there is a justifiable opposition to totalizing
reductionist structuralist systems, while at the same time, there is the utterly
pervasive presence of structured signs. What are we to do about this seemingly
contradictory situation?
Two alternatives have been most explored, each with some valuable results.
The rst is to pursue the quest for structure, digging deeper wherever it seems to
work, and avoiding the (very many) areas where things just seem too slippery to
admit much precision. This inevitably results in a partial view, which is open to
criticism in various ways (as post-structuralism has criticized the structuralism
of Saussure, Levi-Strauss, Barthes, etc.). The second alternative is to abandon
structure and work with intuitive experiences and descriptions (some currently
fashionable words are "rich," "nuanced," "textured," and "postmodern"). This
too inevitably results in a partial view, which in the extreme avoids criticism by
refusing to be pinned down, even to the extent of using inconsistent, incoher-
ent language. Though both are extreme positions, it seems difficult to find a
clear, consistent, defensible middle ground. (A general reference for continental
philosophy is [44].)
It seems to me that ethnomethodology provides some valuable hints on a
way out of this impasse. Often presented as a principled criticism of traditional
sociology, especially its normative category schemes (gender, race, status, etc.),
ethnomethodology can perhaps better be seen positively as an approach to un-
derstanding social phenomena (such as signs!) by seeing how members of some
group come to see those sign as present. Thus, ethnomethodology wants to know
what categories the members of a social group use, and what methods they use
to determine instances of those categories. This requires careful attention to
real social interaction, and avoids the Platonist assumption that the categories
have a pre-given existence "in nature." Rather, we see how members of a group
achieve categorization in actual practice, without having to give these either the
categories or their instances any status other than what has been achieved in
a particular way at a particular time. The branch of ethnomethodology called
conversation analysis has taken a rather radical approach to the social context
of language, showing that even simple features such as whose turn it is to speak
are always negotiated in real time by actual social groups [52, 53], and should
not be considered as given. Words like \rei cation" and \transcendentalizing"
are used to describe approaches that take the opposite view. (Of course, any one
paragraph description of ethnomethodology is necessarily a gross oversimplifica-
tion; more information may be found in [57] and [21] among many other places,
some of which may be very difficult to read.)
Although this paper is not the place to discuss it, phenomenology has also
been an important influence on our formulation of a philosophical foundation for
semiotics, particularly in its insistence that the only possible starting point is
the ground of our own actual experience, with all metaphysical principles firmly
bracketed.
The sign, object, interpretant triad of classical semiotics (Peirce, Morris, Eco,
etc.) presupposes an objective world, whereas our morphic semiotics is consistent
with the view that mind (usually unconsciously) constructs models by selecting
and blending (abstractions from) immediate and past experience, using (e.g.)
templates derived from embodied motion [40], so that what we see as "objects"
are actually parts of these models. This does not deny that a "world" exists,
but it does deny that we experience it directly. As Heidegger observed, we come
closest to experiencing "reality" when our models break down [34]. Similarly, we
may reinterpret the syntax, semantics, pragmatics triad of classical semiotics,
by claiming that its instances can probably be better understood through the
use of semiotic morphisms.
The above ideas suggest various ways to avoid the extremes of mindless
reductionism and mindless holism. The most straightforward approach is to ad-
mit that while each individual analysis no doubt has biases and limitations, it
nonetheless embodies certain structures, values, insights, etc. A given analysis,
if it is clear, coherent and consistent, can be formalized, and may have some
value as such; for example, its limitations will be easier to spot. Such an anal-
ysis should not pretend to be objective, factual, complete, universal, or even
self-contained; it is a momentary snapshot of a partial understanding of one (or
more) interested party, and of course, can only be understood by other interested
parties who have a more or less comparable background. It has frozen out the
fluid processes of interpretation that actually produced the understanding.
The previous paragraph may claim too little, because sometimes analyses
can have great impact, with broad acceptance, important applications, etc., e.g.,
Newtonian mechanics. (We should not forget that, according to today's science,
Newtonian mechanics, despite its tremendous utility, is not a correct physical
theory, but only a practical approximation that holds within certain, not entirely
well specified, limits.) However this paper is not the place to try to understand
why some analyses may work better than others in some given social context.
It is enough for our purposes that analyses exist, exhibit structure, and can be
formalized, without requiring a totalizing, reductionist, or realist stance.

D What is a Proof?
Mathematicians talk of "proofs" as real things. But all we can ever actually find
in the real world of actual experience are proof events, or "provings", each of
which is a social interaction occurring at a particular time and place, involv-
ing particular people, who have particular skills as members of an appropriate
mathematical social community.
A proof event minimally involves a "proof observer" with the relevant back-
ground and interest, and some mediating physical objects, such as spoken words,
gestures, hand written formulae, 3D models, or printed words, diagrams or for-
mulae. But none of these can be a "proof" by itself, because each must be
interpreted in order to come alive as a proof event.
The efficacy of some proof events depends on the marks that constitute a
diagram being seen to be drawn in a certain order; e.g., Euclidean geometric
proofs, and commutative diagrams in algebra; in some cases, the order may not
be easily inferred from just the diagram. Therefore we must generalize from proof
objects to proof processes, such as diagrams being drawn, movies being shown,
and Java applets being executed.
Mathematicians habitually and professionally reify, and it seems that what
they call proofs are idealized Platonic "mathematical objects," like numbers,
that cannot be found anywhere on this earth. So let us agree to go along with
this confusion (I almost wrote "joke") and call any object or process a "proof" if
it effectively mediates a proof event, not forgetting that an appropriate context
is also needed. Then perhaps surprisingly, almost anything can be a proof! For
example, 3 geese joining a group of 7 geese ying north is a proof that 7 + 3
= 10, to an appropriate observer. Peirce's notion of semiosis takes a cognitive
view of examples like this, placing emphasis on a sign having a relation to an
interpretation.
Notice that a proof event can have many different outcomes. For a mathe-
matician engaged in proving, the most satisfactory outcome is that all partici-
pants agree that "a proof has been given." Other outcomes may be that most
are more or less convinced, but want to see some further details; or they may
agree that the result is probably true, but believe there are significant gaps; or
they may think that the proof is bad and the result is false. And of course, some
observers may be lost or confused. In real provings, outcomes are not always
just `true' or `false'. Moreover, a group of proof observers need not agree among
themselves, in which case there may not be any definite socially negotiated "out-
come" at all!
Going a little further, the distinction between a proof giver and a proof
observer is often artificial or problematic; for example, a group of mathematicians
working collaboratively on a proof may argue among themselves about whether
or not some given person has contributed substantively to "the proof". Hence
we should speak of "proof participants", however they happen to be distributed
in space and time, and be aware that the nature of their participation is subject
to social negotiation, like everything else.
The above deconstruction of "proofs" as objectively existing real things is
only the first part of a more complex story. In addition to a proof object (or
process), certain practices (also called methods) are needed to establish an in-
terpretation of a proof object as a proof event. For example, to interpret the
flying geese as a proof about addition requires a practice of counting. This runs
counter to the tendency, in mathematics as well as in literature and linguistics,
to insist on the "primacy of the text", ignoring the practices required to bring
texts to life, as well as the communities that embody those practices.
In fact, practices and their communities are at least as important as proof
objects; in particular, it is clear that they are indispensable for interpreting some
experience as a proof; if you can't count, then you can't see goose patterns as
proofs, and if you haven't been taught about the numerals `7', `3', `10', then you
can't explain your proof to the decimal digit speaking community. Of course, this
line of thought takes us further from the objective certainties that mathematics
likes to claim, but if we look at the history of mathematics, it is clear that there
have been many different communities of proving practice; for example, what
we call "mathematical rigor" is a relatively very new viewpoint, and even within
it, there are various competing schools, including formalists, intuitionists and
constructivists, each of which itself has many variants. Moreover, the availability
of calculators and computers is even now once more changing mathematical
practice.
Mathematical logic restricts attention to small sets of simple mechanical
methods, called rules of inference, and claims that all proofs can be constructed
as finite sequences of applications of such rules. While this approach is appro-
priate for foundational studies, and has been interesting and valuable in many
ways, it is far from capturing the great diversity and vital living quality of natural
proofs.
Unfortunately, we lack the detailed studies that would reveal the full richness
of mathematical practice, but it is already clear that proof participants bring a
tremendous variety of resources to bear on proof objects (see [45] for an excellent
discussion). For example, a discussion among a group of mathematicians at a
blackboard will typically involve the integration of writing, drawing, talking
and gesturing in real time multimedia interaction. In at least some cases, this
interaction has a high-level "narrative" structure, in which sequentially organized
proof parts are interleaved with evaluation and motivation in complex ways.
Aristotle said "Drama is conflict", meaning that the dramatic interest, or excitement, of a play comes from conflict, that is, from obstacles and difficulties. Anyone who has done mathematics knows that many difficulties arise. But the way proofs are typically presented hides those difficulties, showing only the specialized bulldozers, grenades, torpedoes, etc. that were built to eradicate them. Thus reading a conventional proof can be a highly alienating experience, since it is difficult or impossible to understand why these particular weapons have been deployed. No wonder the public's typical response to mathematics is something like "I don't understand it. I can't do it. I don't like it". I believe that mathematicians' systematic elision of conflict must take a significant part of the blame for this. (Note the military metaphor used above; it is suggestive, and also very common in mathematical discourse.)
So-called "natural deduction" (due to Gentzen) is a proof structure with some advantages, but it is very far from "natural" in the sense of being what provers do in natural settings; natural deduction presents proofs in a purely top-down manner, so that, for example, lemmas cannot be proved before they are used. We need to move beyond the extreme poverty of the proof structures that are traditional in mathematical logic, by developing more flexible and inclusive structures. A first step towards accommodating conflict in proofs might be to allow alternative proofs that are incomplete, or even incorrect. For example, to show why a lemma is needed, it is helpful to first show how the proof fails without it; or to show why transfinite induction is needed, it may help to show how ordinary induction fails. A history of attempts to build a proof records conflicts, and hence reintroduces drama, which can make proofs more interesting and less alienating. Of course, we should not go too far with this; no proof reader will
want to see all the small errors a proof author makes, e.g., bad syntax, failure
to check hypotheses of a theorem before applying it, etc. As in a good movie,
conflict introduction should be carefully structured and carefully timed, so that
the clarity of the narrative line is not lost, but actually enhanced. The tatami
system, which embodies many of these ideas, is described in [25, 26], and more
detail on the application of ideas in this paper to that system can be found in
[31, 22]; for a less formal introduction to some of the ideas of algebraic semiotics,
see also [46].
The narrative structures of natural proofs seem to have much in common
with cinema: there is a hierarchical structuring (of acts, scenes, shots in cinema,
and of proof parts in mathematics); there are flashbacks and flashforwards; there
is a rich use of multimedia; etc. The traditional formal languages for proofs are
also very impoverished in the mechanisms they provide for structuring proofs
into parts, and for explaining these structures and parts. Probably we could learn
much about how to better structure proofs by studying movies, because a movie
must present a complex world, involving the parallel lives of many people, as a
linear sequence of scenes, in a way that holds audience interest, e.g., see [9]. No
doubt there are many other exciting areas for further exploration in our quest
to improve the understandability of proofs. Success in this quest could have a
significant impact on mathematics education, given the impending pervasiveness
of computers in schools, and the mounting frustration with current mathematical
education practices.
(The essay in this appendix was in part inspired by remarks of Eric Liv-
ingston, whom I wish to thank, though I may still have got it wrong. The re-
marks on narrative draw on detailed studies by the sociolinguist William Labov
[39]. See [21] for some related discussion and background.)
An Algebraic Approach to Modeling Creativity of Metaphor

Bipin Indurkhya

Department of Computer Science
Tokyo University of Agriculture and Technology
2-24-16 Nakacho, Koganei, Tokyo 184-8588, Japan
bipin@cc.tuat.ac.jp

Abstract. In this article we consider the problem of creative metaphors — that is, those metaphors that induce new ontologies and new struc-
tures on an object or a situation, thereby creating new perspectives —
and how they might be modeled formally. We argue that to address this
problem we need to fix the model, and study how different theories orga-
nize the model differently. We briefly present some algebraic mechanisms
that can be used to formalize this intuition, and discuss some of their
implications. Then we provide a few examples to illustrate our approach.
Finally, we emphasize that our proposed mechanisms are meant to sup-
plement the existing algebraic approaches to formalizing metaphor, and
are not suggested as a replacement.

1 Introduction: Creativity in Metaphor


The creative aspect of metaphor that we focus on here concerns the phenomenon
of gaining a new perspective on or an insight into an object or a situation. This
kind of creativity in problem solving has been studied by Gordon (1961, 1965),
Koestler (1964) and Schön (1963; 1979), among others. For example, Schön re-
counts how the idea that a paintbrush might be viewed as a pump led to a new
ontology and new structure for the painting process, which in turn led to an
improved synthetic-fiber paintbrush.
More recent psychological research has also demonstrated this aspect of cre-
ativity in understanding metaphorical juxtaposition in poetry (Gineste,
Indurkhya & Scart-Lhomme 1997; Nueckles & Janetzko 1997; Tourangeau &
Rips 1991). The key point here is that a metaphor involved in this kind of
creativity is not based on some existing similarities between its two objects or
situations, but, if the metaphor is successful, creates the similarities. For ex-
ample, people usually do not see any similarity between the ocean and a harp,
but Stephen Spender's beautiful poem Seascape draws on compelling imagery,
where the sunlight playing on the ocean waves is compared to the strumming of
harp strings.
In an explanatory model of this process that we have articulated elsewhere
(Indurkhya 1992, 1997a), it is argued that such metaphors work by changing
the representation of the object or situation that is the topic of the metaphor.

Moreover, this process is constrained by the intrinsic nature of the object or the situation, which resists arbitrary changes of representation. In our model, this intrinsic nature of the object is taken to be either the sensory-motor data set corresponding to the object, if the object is perceptually available; or the imagery and episodic data (retrieved from memory) corresponding to the object, when the object is not perceptually available. Indeed, imagery and episodic memory have been known to play a key role in understanding certain metaphors (Marschark, Katz & Paivio 1983; Paivio 1979) — a claim that has been strengthened by recent neurolinguistic research (Bottini et al. 1994; Burgess and Chiarello 1996) — to the extent that some researchers argue that metaphors are essentially grounded in perception (Dent-Read and Szokolszky 1993).
In this article we outline an approach to formalizing these ideas using alge-
braic notions. The article is organized as follows. In the next section we motivate
the need to introduce certain non-standard algebraic mechanisms to formalize
our intuitions, and describe these mechanisms briefly. In Section 3, we discuss
how we apply these mechanisms to approach the creativity of metaphor, and
in Section 4 we present some examples to illustrate our approach. Finally, in
Section 5, we remark on how our ideas relate to the existing research, and in
Section 6, we conclude by summarizing the main points of the paper. We assume
familiarity with some elementary algebraic notions. The discussion throughout
is kept focused on the motivation for certain formal mechanisms, and so it has
an informal tone, and definitions, theorems, etc. have been left out.

2 Outline of an Algebraic Approach

Classical model theory studies the properties of and relations between different
models of a given theory. A similar approach is used in most other formaliza-
tions of semiotics (Goguen 1997). This situation is depicted in Figure 1 below.
However, to understand creativity of metaphor, we need to reverse our stand-
point and consider different theories of the same model. For example, in the
painting-as-pumping metaphor mentioned above, one would like to see how the
pumping theory restructures the painting model. In the Seascape example, we
would like to be able to describe how the harp and its related concepts (which
could be considered a theory) restructure the experiential datum (the model) of
the ocean. This situation is depicted in Figure 2.
To avoid the confusion between two senses of ‘model’: one referring to mod-
eling creativity in metaphor, and the other to the model of a theory, we will
henceforth use the term environment to refer to the model of a theory. Thus,
Figure 1 should be read as ‘Focus on multiple environments of a theory’ and
Figure 2 as ‘Focus on multiple theories of an environment’. We believe that in
order to model creativity of metaphor we must focus on Figure 2, and study how
different theories can conceptualize the same environment differently.
[Figures 1 and 2 (schematic): Figure 1 shows one theory (sign system) with multiple models (environments) 1, 2, ..., N; Figure 2 shows one model (environment) with multiple theories 1, 2, ..., N.]
Fig. 1: Focus on multiple models of a theory. Fig. 2: Focus on multiple theories of a model.

Now in formalizing the environment (model), we need to keep in mind the following two points: (1) it should have an autonomous structure (that resists
arbitrary restructuring); and (2) it should allow multiple restructurings. The sec-
ond point has two further implications: (a) the structure should be rich enough
so that any object has many potential structures; and (b) it should not be struc-
tured too strongly a priori — meaning that we should not predetermine the set
of primitives, the sorts, the levels of abstractions, and so on. Intuitively, the mo-
tivation behind these requirements is as follows. Different languages and cultures
— different semiotic systems — have different ways of describing (structuring)
any given experience or situation. The sorts, categories, even which objects are
considered as primitives and which as composites can vary considerably from
one semiotic system to another. So if all these choices have already been prede-
termined in an environment, then there will be little possibility of restructuring
it in novel ways.
With all these factors in mind, the approach we propose is to formalize the
environment as an algebra: that is, a set of objects and a set of operators over
it. Now the term ‘structure’ here refers to how an object can be decomposed
into its parts; or, to put it in other words, how an object can be composed from
its components by applying certain operators. This sense of ‘structure’ is quite
similar to the way it is used in most AI knowledge representation languages, KL-
One, for example (see also Brachman & Levesque 1985). Notice, however, that
in this sense a structure becomes a term of the algebra, and the term algebra
contains all possible structures in the environment.
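To make this concrete, here is a small illustrative sketch in Python (our own gloss, not part of the paper's formalism; the class name, the toy string domain and the 'concat' operator are all invented): an unsorted environment algebra is a carrier set of objects plus named operators, and a 'structure' is a term over them.

    # A minimal sketch of an unsorted environment algebra: a carrier set of
    # objects plus named operators of fixed arity, with an explicit 'error'
    # object standing in for undefined applications (as suggested in the text).
    ERROR = "error"

    class EnvironmentAlgebra:
        def __init__(self, objects, operators):
            self.objects = set(objects) | {ERROR}
            self.operators = operators  # name -> (arity, function)

        def apply(self, op, *args):
            arity, fn = self.operators[op]
            if len(args) != arity or any(a not in self.objects for a in args):
                return ERROR
            result = fn(*args)
            return result if result in self.objects else ERROR

        def eval_term(self, term):
            # A 'structure' is a term: either an object, or (operator, subterm, ...).
            if not isinstance(term, tuple):
                return term if term in self.objects else ERROR
            op, *subterms = term
            return self.apply(op, *(self.eval_term(t) for t in subterms))

    # Toy domain: short strings over {a, b}, with a concatenation operator.
    env = EnvironmentAlgebra(
        objects=["a", "b", "ab", "ba", "abba"],
        operators={"concat": (2, lambda x, y: x + y)},
    )
    # Two different terms (structures) generating the same object 'abba':
    t1 = ("concat", ("concat", "a", "b"), ("concat", "b", "a"))
    t2 = ("concat", "ab", "ba")
    assert env.eval_term(t1) == env.eval_term(t2) == "abba"

The two terms t1 and t2 generate the same object by different decompositions, which is exactly the sense in which any object has many potential structures.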
A few other comments seem to be in order here. First of all, for the reasons
mentioned above, we choose not to put any sorts in the algebra. Though, obviously, not all the operators need be defined on all the objects, we can take care of that by having one or more 'undefined' or 'error' objects in the algebra.
Secondly, the objects are not assigned any predetermined level of complexity.
In fact, we expect circularity: meaning that situations where, for example, an
object A generates B, B generates C, and C generates A are allowed. In these
cases, there is no fixed set of primitives. If A is taken as a primitive, then B takes
level-2 complexity, and C level-3. But if C is taken as a primitive, then A takes level-2 complexity and B level-3. This characteristic allows us to model cognitive
interactions as “closed systems that are, at the same time, open to exchanges
with the environment.” (Piaget 1967, pp. 154–58.)
Another point worth emphasizing is that we deliberately choose to not put
any predicates in the algebra for the environment because we feel that operators
(corresponding to actions of an agent) are more primitive than relations (Piaget
1953), and all relations can be broken down into some sequence of operators.
However, in some applications, as we will see in the example of legal reasoning
in Section 4, it may be more convenient to allow predicates and relations in the
environment algebra.
Having formalized the environment like this, a theory can be formalized sim-
ilarly as an algebra. Here, however, we allow sorts, complexity-levels, a prede-
termined set of primitives, ordering, and other structures or restrictions as may
seem appropriate: perhaps similar to a sign system of Goguen (1997).
Now a cognitive (or semantic) relation is formed by connecting the objects
and operators of the theory algebra to the objects and operators of the environ-
ment algebra. As the environment algebra does not have any sorts, complexity-
levels, etc., only the arity of the operators needs to be preserved. Notice first of all
that we allow the two algebras to have different signatures. Secondly, we allow a
cognitive relation to be a many-to-many relation, but it can be turned into a func-
tion by grouping the environment algebra appropriately. (See Indurkhya 1992,
Chap. 6, for details.) Finally, though the structure-preserving property, which we
refer to as coherency, is the ideal for cognitive relations, a more useful notion
for cognitive modeling is that of local coherency, that is, coherency within some
restricted subalgebras of the theory and the environment.
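Continuing the same hypothetical sketch, a cognitive relation could be held as a set of symbol pairs, with arity preservation as the only structural check (since the environment algebra carries no sorts); the pairing shown is an invented fragment of the pumping/painting example, not from the paper.

    # Sketch: a cognitive relation links theory symbols to environment symbols.
    # It may be many-to-many; the only check made here is that paired operators
    # have equal arities, as the environment algebra has no sorts or levels.
    def respects_arities(relation, theory_ops, env_ops):
        # relation: set of (theory_symbol, environment_symbol) pairs.
        # theory_ops / env_ops: dicts mapping operator names to arities;
        # symbols absent from a dict are treated as plain objects.
        for t_sym, e_sym in relation:
            t_arity, e_arity = theory_ops.get(t_sym), env_ops.get(e_sym)
            if (t_arity is None) != (e_arity is None):
                return False          # an operator paired with an object
            if t_arity is not None and t_arity != e_arity:
                return False          # operators of different arities
        return True

    # Hypothetical fragment of the pumping/painting pairing:
    relation = {("pump", "draw-paint-through-spaces"), ("fluid", "paint")}
    print(respects_arities(relation, {"pump": 1}, {"draw-paint-through-spaces": 1}))  # True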
A cognitive relation induces a structure in the environment that reflects the
structure of the theory: we can say that the environment is structured by the
theory. A different theory would structure the environment differently. Though
both these structures may look very different, they are both, nonetheless, con-
strained by the autonomous structure of the environment. Any incoherency that
is detected by the agent must be countered by either modifying the cognitive
relation (thereby changing the ontology of the environment as seen from within
the theory) or by changing the structure of the theory. We emphasize again that
the autonomous structure of the environment cannot be changed by the agent,
though it can be organized in different ways by different theories.

3 Formalizing Creativity of Metaphors

Many theories and their cognitive relations are inherited, biologically or cul-
turally, or learned as we grow up. We can dub them as conventional cognitive
relations. These cognitive relations structure our environment in various ways,
and it is this structured environment that we live in and interact with. How-
ever, in certain situations, it becomes necessary to form new cognitive relations.
A prime example of such situations is metaphor. In metaphor, a new cognitive
relation is created between a theory and an environment. Usually, the vehicle theory interacts with the topic environment, but often the process is mediated
by the topic theory.
Not all metaphors result in a new perspective and a new representation. Actu-
ally, many metaphors can be understood by constructing some mapping between
the topic and the vehicle theories, as in semiotic morphisms of Goguen (1997).
However, for some metaphors, no such mappings can be found (there are no existing similarities). In such cases, it becomes necessary to conceptualize the topic environment anew — as if it were encountered for the first time — using the concepts from the vehicle theory. In this interaction — and we must emphasize that the result of the interaction is determined in part by the structure of the topic environment, and in part by the structure of the vehicle theory — a new structure of the topic environment emerges (if the process is successful). For example, in projecting the pumping theory onto the painting process, a new ontology for the paintbrush emerged, in which the space between the fibres played a key role.
Thus, the process underlying creative metaphor becomes that of instantiat-
ing a new cognitive relation between a theory and an environment, such that
it preserves the structure of each. This new cognitive relation restructures the
environment, and as a result, new attributes of the environment may emerge and
new information about the environment may become apparent. For example, in
restructuring the painting environment by the pumping theory, the part of the paint-
brush where it bends away from the surface being painted becomes very crucial,
and the part of the paintbrush which is already in contact with the surface fades
into irrelevance. Or in understanding the ocean-as-a-harp metaphor, new per-
ceptual similarities between the ocean and the harp emerge — similarities that
were lost when the two were viewed from the conceptual level via their respec-
tive conventional theories — and one gets a glimpse of an alternative semiotic
system in which the two would be semantically very close, and even be assigned
the same category.
Here, an interesting result can be obtained by generalizing the first isomor-
phism theorem (Cohn 1981, p. 60; Mal’cev 1973, pp. 47–8) for certain cognitive
relations by taking into account the change of signature (see Indurkhya 1992,
Chap. 6). The first isomorphism theorem essentially says that any homomor-
phism from a source algebra to a target algebra can be factored into a unique
isomorphism. The trick is to first take the kernel of the source algebra, which
means grouping the elements of the source algebra as follows: if any two ele-
ments map to the same element of the target algebra then they are put in the
same group. Secondly, we limit the target algebra to its subalgebra that is the
range of the homomorphism. That is, if a certain element of the target algebra
is such that no element of the source algebra maps into it, then that element
is not included in the subalgebra. After these two steps one finds that there
exists an isomorphism between the kernel of the source algebra and the ‘range’
subalgebra of the target. Moreover, this isomorphism is unique, so that different
homomorphisms factor into different isomorphisms. In other words, every isomorphism factored by this process carries a unique stamp of the homomorphism from which it was derived.
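For reference, the classical statement being generalized can be written as follows (an added gloss in standard universal-algebra notation, not text from the paper): for a homomorphism h from a source algebra A to a target algebra B,

    \[
      \ker h \;=\; \{(a,a') \in A \times A \mid h(a) = h(a')\}, \qquad
      A/\ker h \;\cong\; h(A) \subseteq B,
    \]
    \[
      h \;=\; \iota \circ \bar{h} \circ \pi ,
    \]

where \pi : A \to A/\ker h is the quotient map (the grouping step), \bar{h} is the induced isomorphism (unique for the given h), and \iota : h(A) \to B is the inclusion of the 'range' subalgebra (the second step described above).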
The mechanism underlying the first isomorphism theorem corresponds to a frequently used cognitive process, and failing to realize this has resulted in some
needless controversy over whether metaphors ought to be formalized as a rela-
tion, a homomorphism or an isomorphism. For example, Max Black (1962, 1979)
proposed that underlying every metaphor is an isomorphism between its topic
and its vehicle, and many scholars have chided him for positing too strong a
requirement.
To realize the cognitive correlate of the first isomorphism theorem, consider
how we use the map of a city. Obviously, the map does not represent everything
in the city. (There is a charming story by Borges precisely on this theme.) Yet,
in using the map, one gives the city an ontology or a representation where parts
of the city are grouped together and are seen as primitives: two lanes of a street,
the sidewalks, and the shops and buildings along the street are all seen as a unit
and correspond to a line on the map. In using the map, one acts as if it were
isomorphic to the city, even though the street is not painted orange, but the
line on the map is, and the vehicles and the people on the street are nowhere to
be found on the map. Thus, the operations of taking a subalgebra and forming
groupings (as in taking the kernel) play an important role in modeling cognitive
interaction.
If we assume that a cognitive agent can be aware of its environment only
as far as it is represented in a theory, then we can also provide an explanatory
model of how new features can be created by metaphor (Indurkhya 1998).
The approach outlined here has some other applications as well, and we
would like to mention one of them briefly. Consider the prototype effect, which
is demonstrated by Eleanor Rosch in her prolific work on human categorization
(Rosch 1977). According to it, categories have a radial structure, with certain
members occupying a more central position than others. (See also Lakoff 1987).
To model this phenomenon, we have to realize that the environment does not
have a preassigned set of primitives. Which objects are considered as primitives
depends on the structure given to it by the cognitive relation. As the objects in
an algebra are structured by its operators, if we deem a certain subset of objects
of the algebra to be primitive (prototype), and assign a measure function that
assigns a ‘distance’ to every other object depending on the length of the shortest
description of that object using only the primitives, then a kind of radial struc-
ture (Lakoff 1987, Chap. 6) emerges. For example, in the Dyirbal classification
system discussed by Lakoff (1987, p. 100), the category Balan includes women,
fire and dangerous things. If women are considered as primitives, then danger-
ous things become distant members of the category, because the derivation from
women to dangerous things is a long one: going from women to sun, then to fire,
finally arriving at dangerous things. On the other hand, if fire is considered a
primitive, then dangerous things become more central members of the category
but women become more distant members.
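As a rough illustrative sketch of the distance measure just described (our own, with invented derivation links that merely echo the women-to-sun-to-fire-to-dangerous-things chain, not data from Lakoff), the radial distance of a category member can be computed as the length of its shortest derivation from the chosen primitives:

    from collections import deque

    # One-step derivation links between category members (hypothetical links,
    # loosely echoing the chain women -> sun -> fire -> dangerous things).
    derives = {
        "women": ["sun"],
        "sun": ["fire"],
        "fire": ["dangerous-things"],
    }

    def radial_distance(primitives, target):
        # Length of the shortest derivation of target from the chosen primitives.
        frontier = deque((p, 0) for p in primitives)
        seen = set(primitives)
        while frontier:
            member, dist = frontier.popleft()
            if member == target:
                return dist
            for nxt in derives.get(member, []):
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, dist + 1))
        return None  # not derivable from these primitives

    # With 'women' as primitive, dangerous things are distant members (3 steps);
    # with 'fire' as primitive, they are central members (1 step).
    print(radial_distance({"women"}, "dangerous-things"))  # 3
    print(radial_distance({"fire"}, "dangerous-things"))   # 1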
4 Some Examples

We now present a few examples to illustrate our approach. The first example
is from the Copycat domain pioneered by Hofstadter (1984), which concerns
proportional analogy problems between letter strings, as in:

abc : abd :: pqr : ?? (1)

This domain may seem rather simple at first but in fact, as Hofstadter has
shown, a number of rich and complex analogies can be drawn in it. In particular,
the Copycat domain is quite suitable for demonstrating the context effect, ac-
cording to which an object needs to be represented differently depending on the
context, thereby revealing the limitations of fixed-representation approaches. For
instance, in the analogy problems (2) and (3) below, the first term of the analogy
(abba) is the same, but it needs to be given a different representation to solve
each problem: for analogy (2), abba needs to be represented as a symmetrical
object, with the string ab, reflected and appended to itself; and for analogy (3)
it needs to be seen as an iterative structure, namely two copies of b, flanked by
the same object, namely a, on either side.

abba : abab :: pqrrqp : ?? (2)


abba : abbbbba :: pqrrpq : ?? (3)

In order to model this context effect in our approach, we take Leeuwenberg's Structural Information Theory [SIT henceforth] (Leeuwenberg 1971) as
the starting point. In SIT, a certain way of expressing different representations
(also known as ‘gestalts’) of a pattern in terms of iteration, symmetry and alter-
nation operators is defined. Then a measure called ‘information load’ is defined
on every representation. According to SIT, for any given pattern, the represen-
tation with the minimum information load is the preferred gestalt. (See also Van
der Helm and Leeuwenberg 1991.)
In integrating SIT within our algebraic approach, we extend SIT in two
significant ways. One is to allow domain-dependent operators to participate in
the gestalt representations. For example, in the Copycat domain, the operators
‘successor’ and ‘predecessor’ play a key role, so that an object like ‘abcd’ can be
seen to have an iterative structure where the operator ‘successor’ is applied at
each iteration.
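A tiny hypothetical sketch of such a domain-dependent gestalt (this is neither the Copycat program nor the paper's own encoding; the function names are ours): a string is described as repeated application of the 'successor' operator to a starting letter.

    # Domain-dependent operator for the letter-string domain: alphabetic successor.
    def successor(ch):
        return "a" if ch == "z" else chr(ord(ch) + 1)

    def iteration_gestalt(start, length):
        # Gestalt: start with a letter and repeatedly append the successor of
        # the previous letter, e.g. ('a', 3) -> 'abc', ('p', 3) -> 'pqr'.
        s = start
        while len(s) < length:
            s += successor(s[-1])
        return s

    # 'abc' and 'pqr' share the same iterative description, differing only in
    # the starting letter, which is one reason the analogy abc : abd :: pqr : ? is easy.
    assert iteration_gestalt("a", 3) == "abc"
    assert iteration_gestalt("p", 3) == "pqr"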
Secondly, whereas SIT only accounts for the preferred gestalts of patterns in
isolation, we incorporate context effect by taking into consideration the complex-
ity of representation algebras also, which can be simply measured by counting
the number of elements and the number of operators in it. For example, applying
the information load criterion, the preferred gestalt for ‘abba’ is the one that
sees a symmetry structure in it. However, when this object is considered together
with ‘abbbbba’, as in the analogy (3) above, we must also take into account the
complexity of the representation algebra that generates the gestalts for both.
Though ‘abbbbba’ can also be written as a symmetry structure — albeit with
odd symmetry, for it has a pivot point in the middle ‘b’ — the representation
algebras that generate the minimum information load gestalts for each of these
terms individually have mostly different elements, and so when we combine them
to get the representation algebra that can generate both the terms, the complex-
ity of the resulting algebra is almost cumulative. However, if we represent ‘abba’
and 'abbbbba' as iterative structures, then their individual representation algebras have a high degree of overlap, so that the complexity of the combined
representation algebra remains almost the same. Fuller details of our approach
can be found in Dastani, Indurkhya and Scha (1997), and Dastani (1998).
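The complexity comparison described above can be sketched as follows (a purely illustrative toy with invented element and operator names; the actual encoding is in the cited papers): a representation algebra is reduced to the sets of elements and operators it uses, complexity is their count, and combining two algebras takes unions.

    def complexity(algebra):
        # algebra = (elements, operators); complexity is simply the number of
        # elements plus the number of operators, as suggested in the text.
        elements, operators = algebra
        return len(elements) + len(operators)

    def combine(a, b):
        # Smallest algebra able to generate both terms: element-wise union.
        return (a[0] | b[0], a[1] | b[1])

    # Invented toy encodings, for illustration only.  Symmetric readings of
    # 'abba' and 'abbbbba' share few elements, so the combined algebra grows
    # almost cumulatively; iterative readings ('n copies of b flanked by a')
    # overlap heavily, so the combined algebra is barely larger than either.
    sym_abba, sym_abbbbba = ({"ab"}, {"reflect"}), ({"abb", "b"}, {"pivot-reflect"})
    iter_abba, iter_abbbbba = ({"a", "b", 2}, {"repeat", "flank"}), ({"a", "b", 5}, {"repeat", "flank"})

    print(complexity(sym_abba), complexity(sym_abbbbba),
          complexity(combine(sym_abba, sym_abbbbba)))    # 2 3 5  (fully cumulative)
    print(complexity(iter_abba), complexity(iter_abbbbba),
          complexity(combine(iter_abba, iter_abbbbba)))  # 5 5 6  (little growth)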

[Figure 3: two rows of geometric figures arranged as proportional analogies A : B :: C : D.]
Fig. 3: Two examples of proportional analogy relations A is to B as C is to D involving geometric figures. Notice that the terms A and B are the same in each example, yet different figures for the C term force a different way of decomposing figures A and B.

This approach can be further illustrated by considering the creation of similarity in proportional analogies involving geometric figures. In the two propor-
tional analogy relations shown in Fig. 3, figures A and B are the same, yet
they must be seen differently, or described differently, for understanding each
example. People can comprehend them easily, but analogy systems based on
mappings between fixed representations cannot account for them. The reason is
that in fixed-representation systems, one must first choose how each figure is
represented or described. If figure A is described as a triangle put on top of
another inverted triangle, then the upper analogy relation in Fig. 3 can be com-
prehended but not the lower one. If, on the other hand, figure A is described
as a hexagon with an outside facing equilateral triangle on each of its six sides,
then the lower analogy relation in Fig. 3 can be understood, but not the upper
one. Notice that if we describe figures A and B in terms of line segments/arcs
(or pixels), then neither of the analogies can be comprehended, for the ontology
of various closed figures, like ‘triangle’, and their structural configurations are
essential to understanding the analogies.
What seems necessary here is to provide a sufficiently low-level description
of the figures (say, in terms of line segments and arcs), and a rich repertoire of
operators and gestalts that allow one to build different higher-level structured
representations from these low-level descriptions. For the examples in Fig. 3, we
need the gestalts of ‘triangle’, ‘hexagon’, ‘ellipse’, etc.; and operators like ‘invert’
(turn upside down), 'juxtapose', 'rotate-clockwise', and so on. A structured rep-
resentation using these gestalts and operators essentially shows how the figure
can be constructed from the line segments and arcs.1 Needless to say, there are
many ways to construct each figure, so there are many corresponding structured
representations.
Thus the heart of the problem, in this approach, lies in searching for a struc-
tured representation that is most appropriate in a given context. As representa-
tions correspond to algebraic terms, this means we must find suitable representa-
tion algebras for each of the figures — where ‘suitability’ must take into account
complexity of representation algebras, complexity of representations, existence
of an isomorphic mapping between representation algebras, and the complexity
of this mapping. We must emphasize two somewhat unusual aspects of our ap-
proach here. One is that we require a mapping between representation algebras,
and not between representations themselves, to capture the analogical relation.
The reason for this is that a mapping between representation algebras is more
robust with respect to trivial changes of representation — such as ones arising
from symmetry or transitivity of operators. The second distinctive feature is that
we require an isomorphism rather than a homomorphism. However, as explained
above in Section 3, this by no means constitutes a limitation of our approach;
on the contrary, it focuses attention on the isomorphism underlying each homo-
morphism. (See Indurkhya 1991 for a further elaboration of these issues and a
formally worked out example.)
The next example we would like to present, taken from Indurkhya (1997b),
concerns modeling a certain kind of creative argument in legal reasoning. Very
briefly, the example is about a college professor, Weissman, who deducted the
expenses of maintaining an office at home from his taxable income. A precedent
that was helpful to Weissman’s arguments was the case of a concert violinist,
Drucker, who was allowed to claim home-office deduction for keeping a studio
at home where he practiced. However, the Revenue Service tried to distinguish
Weissman from Drucker on the grounds that Drucker’s employer provided no
space for practice, which is obviously required of a musician, whereas Weissman’s
employer provided an office (a shared one). The judges, however, ruled that
Weissman's employer provided no suitable space for carrying out his required duties (the office, being a shared one, was not safe for keeping books and other research material), just as Drucker's employer provided no suitable space for Drucker to practice.

Footnote 1: It should be noted here that the algebra corresponding to this domain would be like the algebraic specification of any drawing or graphics program such as Superpaint. In any such graphics program, the user can create various objects on the screen, group them in certain ways to create different gestalts, and apply a variety of operations on them.
The key issue in modeling this argument is how to specialize the category 'no space provided by the employer' to 'no suitable space provided by the employer', because the former distinguishes Weissman from Drucker, but the latter category allows Drucker to be applied to Weissman. We have argued that the new category can be obtained from other precedents. In this example, there was another precedent, Cousino, a high-school teacher who was denied home-office tax deduction, because the judges argued that his employer provided him a suitable space for each task for which he was responsible. A very interesting aspect of this example is that Cousino and Drucker, when individually applied to Weissman, each lead to a decision against Weissman; but when Cousino is used to reinterpret Drucker, and the reinterpreted Drucker is then applied to Weissman, a decision in favor of Weissman can be obtained.
In modeling this argument in our approach, the environment level is as-
sociated with the facts of a case, and the model or theory level is associated
with the rationale for the decision of the case (Hunter and Indurkhya 1998).
For example, facts of the Cousino case would include: 'employer-of (Cousino) = XYZ', 'high-school (XYZ)', 'responsible (Cousino, teach)', 'responsible (Cousino, grade-papers)', 'provided (Cousino, XYZ, classroom)', 'provided (Cousino, XYZ, staff-room)', 'suitable-for (classroom, teaching)', 'suitable-for (staff-room, grade-papers)', etc. Notice that because the facts are themselves composed of linguistic
and abstract categories, we need to allow predicates and relations in the envi-
ronment algebra.
The rationale of the case, in this example, would consist of a complex term
(we mean algebraic term here) ‘employer provided suitable space for the tasks for
which the employee is responsible'. As this is a precedent that has already been
decided, the terms of the rationale level would already be connected to the facts
level (meaning that a cognitive relation exists). This already shows the grouping
phenomenon, and how the facts level seems isomorphic to the rationale level.
The object ‘tasks’ at the rationale level is connected to different objects at the
facts level, including ‘teach’, ‘grade-papers’, ‘prepare-lessons’, ‘talk-to-parents’,
etc. So all these activities are grouped together and are seen as a unit from the
rationale level. Also, many facts at the facts level are not considered relevant,
and so are not connected to anything at the rationale level. Nonetheless, it is
necessary to keep these facts, for they may become necessary in reinterpreting
the Cousino case, which is precisely what happens when Cousino is applied to
reinterpret Drucker.
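A small illustrative sketch of this grouping (our own rendering; the fact and rationale symbols only paraphrase the example) keeps the facts level as ground assertions and the cognitive relation as a map from rationale-level objects to groups of facts-level objects:

    # Facts level: ground assertions from the Cousino case (abbreviated and
    # paraphrased; the tuples below are our own rendering of the example).
    facts = {
        ("employer-of", "Cousino", "XYZ"),
        ("responsible", "Cousino", "teach"),
        ("responsible", "Cousino", "grade-papers"),
        ("provided", "Cousino", "XYZ", "classroom"),
        ("provided", "Cousino", "XYZ", "staff-room"),
        ("suitable-for", "classroom", "teach"),
        ("suitable-for", "staff-room", "grade-papers"),
    }

    # Rationale level: the cognitive relation groups facts-level objects under
    # rationale-level objects; many facts are simply left unconnected.
    cognitive_relation = {
        "employee": {"Cousino"},
        "tasks": {"teach", "grade-papers"},
        "spaces": {"classroom", "staff-room"},
    }

    def rationale_holds(facts, relation):
        # 'Employer provided a suitable space for every task for which the
        # employee is responsible', checked against the grouped facts.
        tasks = {f[2] for f in facts
                 if f[0] == "responsible" and f[1] in relation["employee"]}
        provided = {f[3] for f in facts if f[0] == "provided"}
        return all(
            any(space in provided and ("suitable-for", space, task) in facts
                for space in relation["spaces"])
            for task in tasks)

    print(rationale_holds(facts, cognitive_relation))  # True: the rationale held, so the deduction was denied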
In applying the rationale of Cousino — which contains the gestalt ‘suitable
space’ — to the facts of Drucker, a new rationale and a new cognitive relation
between the rationale and the facts levels of the Drucker case emerges. Using
this new rationale, the facts of the Weissman case can also be organized in such
a way that a decision favorable to Weissman can be obtained, and moreover,
Drucker can be cited as a precedent to support this argument.
Our final example concerns linguistic metaphor, and is taken from a certain
translation of the Bible. As Stephen is persecuted for spreading the teachings of
Jesus, he rebukes his persecutors:

“You stiff-necked people, uncircumcised in heart and ears, you always resist the Holy Spirit. As your fathers did, so do you.” (Acts 7:51. The
Oxford Annotated Bible with the Apocrypha. Revised Standard Version.
Oxford University Press, 1965.)

The phrase we would like to focus on is ‘uncircumcised in heart and ears’. Now
several gestalt descriptions (or algebraic terms) can be associated with ‘circum-
cised’: for example ‘surgically removing prepuce’, ‘purify spiritually’, etc. Note
that these descriptions themselves contain gestalts like ‘prepuce’, ‘purify’, which
can be further decomposed into other gestalts. However, at some point, we have
to try to interpret the gestalt descriptions by finding similar operations in the
context of ears and heart. For example, ‘surgically remove’ is an operation ap-
plied to ‘prepuce’, so we have to find a similar operation that can be applied
to some part of the ear. This process may require creating imagery for ear (and
possibly for circumcision as well) using perceptual knowledge about it. Perhaps
the gestalt that is easiest to interpret is ‘purify’ or ‘cleanse’, which means ‘un-
circumcised’ would correspond to ‘unclean’ (negation operation is applied). But
‘unclean’ for ears could suggest ears plugged up by earwax, for example, so that
the person cannot hear the message.
Finding the right gestalt of ‘uncircumcised’ to interpret in the context of
‘heart’ is more complex, because ‘heart’ itself is used metaphorically, not for
the physical organ that pumps blood, but for feelings and understanding. Here
one can perhaps construct an image where something that is unclean cannot
receive new ideas or impressions (e.g. adding a new tint to the dirty water),
and the person with the unclean heart does not see what is the truth according
to Stephen. There may also be the association that as circumcision requires a
surgical procedure, something drastic needs to be done to purify the heart.
We should add that all this analysis is done from a viewpoint that is outside
of the Bible, for when viewed within the Bible, circumcision is a dead or a
conventional metaphor (e.g. ‘Circumcise yourselves to the Lord’. Jeremiah 4:4.)
Also, in some other translations a more literal approach is taken:

“ ‘How stubborn you are!’ Stephen went on to say, ‘How heathen your
hearts, how deaf you are to God’s message! You are just like your an-
cestors: you too have always resisted the Holy Spirit!’ ” (Acts 7:51. The
Good News Bible. The Bible in Today’s English Version translated and
published by the United Bible Societies, 1976.)

5 Related Research

In the last twenty years or so there has been much interest in metaphor, and
many researchers from different disciplines have approached the problem from
various angles. Our approach outlined here is based on the insights of Max Black (1962; 1979) and Nelson Goodman (1978), among others. However, because these ideas were not spelled out precisely, they have often been misunderstood. We already mentioned above that Black has been unfairly criticized for claiming that there is an isomorphism underlying every metaphor. Black has also been inconsistent on the symmetry of metaphor: at times suggesting that metaphors may be symmetrical, while in most places his account is clearly asymmetrical. This again has caused some needless misunderstanding (see, for example, Lakoff & Turner 1989, pp. 131–133). Our approach towards formalizing their insights and extending them further, we hope, dispels many of these misunderstandings.
The research on metaphor and its role in organizing our conceptual sys-
tem has received a huge impetus from the work of George Lakoff and his col-
leagues (Lakoff & Johnson 1980; Lakoff 1987). While the empirical data they
have amassed to demonstrate how metaphors pervade our everyday life and
discourse are indeed impressive, their attempts to explain how a metaphor can
reorganize the topic and create new features in it are fraught with contradictions.
In some places they claim that certain topic domains derive their structure pri-
marily through metaphors, and they do not have a pre-metaphorical structure.
At other places they imply that the topic constrains the possible metaphorical
mappings and creation of feature slots. (See also Indurkhya 1992, pp. 78–84,
pp. 124–127.) We believe that our formal approach clearly resolves this appar-
ent paradox of how metaphor can restructure the topic, and yet it is not the
case that anything goes.2
More recently, Gilles Fauconnier and Mark Turner have introduced a the-
ory of conceptual blending (see, for example, Turner & Fauconnier 1995), which
introduces a multiple space model. However, their theory works primarily with
concepts, showing how concepts from many spaces blend together to produce
metaphorical meanings. While we acknowledge that the multiple-space model
does indeed come close to the way real-world metaphors work, we also feel that
it is crucial to involve the object or the situation (what we have been calling the
environment) in the interaction. Without incorporating this orthogonal compo-
nent, we believe, the creativity of metaphor cannot be accounted for satisfacto-
rily. Thus, in our view the approach presented here supplements the conceptual
blending theory, and in the future we expect to broaden it by considering how
multiple environments and multiple theories interact together to produce new
meanings.

Footnote 2: On the formal side, Goguen (1997) has embarked on an ambitious project to develop a formal framework for systems of signs and their representations. However, we believe that the mechanisms proposed here would have to be incorporated in the semiotic morphisms of Goguen in order to be able to account for creativity in metaphor. We must add, though, that this kind of creative restructuring is neither always required nor always desirable. Therefore, there may well be many situations where semiotic morphisms without restructuring would work just fine. But a more comprehensive framework would have to allow the possibility of restructuring.
Finally, a very different approach to modeling creativity of analogy and metaphor is taken by Doug Hofstadter and his colleagues (Hofstadter 1995).
By cleverly designing a number of seemingly simple microdomains that capture
the creative aspects of analogy and metaphor in their full complexity, they have
focused right on the crux of the problem, and have built computational systems
to model creativity of metaphor. Though they have deliberately eschewed any
formalization of their ideas, their computational systems are a kind of formal
system. Nonetheless, some of their underlying principles are not clear and it is
difficult to glean them from their description of the systems. For example, a
key concept used in many of Hofstadter’s systems is that of ‘temperature’. The
lower the temperature, the better the analogy is supposed to be. However, it
is not clear at all how the temperature is computed: its underlying principles
are not made explicit. A formal approach such as the one outlined here allows
such hidden principles to be articulated explicitly. For example, in our model of
proportional analogy described in Section 4, we adapt Leeuwenberg’s concept
of information load (Leeuwenberg 1971) to articulate the goodness of analogy.
Thus, we feel that our formal approach fills an important niche left open by
Hofstadter and his colleagues’ research.

6 Conclusions
In this paper we have focused on the problem of how metaphor can restructure an
object or a situation, and create new perspectives on it. With this goal in mind,
we outlined some algebraic mechanisms that can be used to model creativity
and restructuring of metaphor. Needless to say, the approach presented here is
merely a step towards a fuller understanding of the creativity of metaphor. First
of all, the model, as it is, needs to be elaborated considerably, and computa-
tional mechanisms need to be developed to implement its different mechanisms.
For example, elsewhere (Indurkhya 1997b) we have suggested a blackboard archi-
tecture for modeling interaction between a cognitive model and an environment
in the domain of legal reasoning. Secondly, the approach needs to be expanded
to incorporate language, communication between agents, and so on. Obviously,
all these issues will keep us busy for years to come.

References
Black, M. (1962). Metaphor. In M. Black Models and Metaphors, Cornell University
Press, Ithaca, NY, pp. 25–47.
Black, M. (1979). More about Metaphor. In A. Ortony (ed.) Metaphor and Thought,
Cambridge University Press, Cambridge, UK, pp. 19–45.
Bottini, G., Corcoran, R., Sterzi, R., Paulesu, E., Schenone, P., Scarpa, P., Frackowiak,
R.S.J., and Frith, C.D. (1994). The role of the right hemisphere in the interpretation
of figurative aspects of language: A positron emission tomography activation study.
Brain, 117, pp. 1241–1253.
Brachman, R.J. and Levesque, H.J. (eds.) (1985). Readings in Knowledge Representa-
tion. Morgan Kaufmann, San Mateo, California.
Burgess, C., and Chiarello, C. (1996). Neurocognitive Mechanisms Underlying Metaphor Comprehension and Other Figurative Language. Metaphor and Symbolic
Activity, 11, No. 1, pp. 67–84.
Cohn, P.M. (1981). Universal Algebra. Revised edition, D. Reidel, Dordrecht, The
Netherlands.
Dastani, M. (1998). Languages of Perception. Ph.D. dissertation. Institute for Logic, Lan-
guage and Computation, (ILLC Dissertation Series 1998-05), Univ. of Amsterdam,
Amsterdam.
Dastani, M., Indurkhya, B., and Scha, R. (1997). An Algebraic Approach to Modeling
Analogical Projection in Pattern Perception. In T. Veale (ed.) Proceedings of Mind
II: Computational Models of Creative Cognition, Dublin, Ireland, September 15–17,
1997.
Dent-Read, C., and Szokolszky, A. (1993). Where do metaphors come from? Metaphor
and Symbolic Activity, 8(3), pp. 227–242.
Gineste, M.-D., Indurkhya, B., and Scart-Lhomme, V. (1997). Mental Representations
in Understanding Metaphors. Notes et Documents Limsi No. 97–02, LIMSI-CNRS,
BP 133, F-91403, Orsay, Cedex, France.
Goguen, J. (1997). Semiotic Morphisms. Technical Report TR-CS97-553, Dept. of Com-
puter Science & Engineering, Univ. of California at San Diego. San Diego, Calif.
Goodman, N. (1978). Ways of Worldmaking. Hackett, Indianapolis.
Gordon, W.J.J. (1961). Synectics: The Development of Creative Capacity. Harper &
Row, New York.
Gordon, W.J.J. (1965). The Metaphorical Way of Knowing. In G. Kepes (ed.) Educa-
tion of Vision. George Braziller, New York, pp. 96–103.
Hofstadter, D. (1984). The Copycat Project: An Experiment in Nondeterminism and
Creative Analogies. AI Memo 755. Artificial Intelligence Laboratory, MIT, Cam-
bridge: Mass.
Hofstadter, D., and The Fluid Analogies Research Group (1995). Fluid Concepts and
Creative Analogies. Basic Books, New York.
Hunter, D. and Indurkhya B. (1998). ‘Don’t Think, but Look!’ A Gestalt Interaction-
ist Approach to Legal Thinking. In K. Holyoak, D. Gentner and B. Kokinov (eds.)
Advances in Analogy Research: Integration of Theory and Data from the Cognitive,
Computational and Neural Sciences. NBU series in Cognitive Science, New Bulgarian
University, Sofia, pp. 345–353.
Indurkhya, B. (1991). On the Role of Interpretive Analogy in Learning. New Generation
Computing 8, pp. 385–402.
Indurkhya, B. (1992). Metaphor and Cognition: An Interactionist Approach. Kluwer
Academic Publishers, Dordrecht, The Netherlands.
Indurkhya, B. (1997a). Metaphor as Change of Representation: An Artificial Intelli-
gence Perspective. Journal of Experimental and Theoretical Artificial Intelligence 9,
pp. 1–36.
Indurkhya, B. (1997b). On Modeling Creativity in Legal Reasoning. Proceedings of the
Sixth International Conference on AI and Law, Melbourne, Australia, June 30–July
3, 1997, pp. 180–189.
Indurkhya, B. (1998). On Creation of Features and Change of Representation. Journal
of the Japanese Cognitive Science Society 5, No.2 (June 1998), pp. 43-56.
Koestler, A. (1964). The Act of Creation. Hutchinsons of London. 2nd Danube edition
(1976).
Lakoff, G. (1987). Women, Fire and Dangerous Things.
Univ. of Chicago Press, Chicago.
Lakoff, G. and Johnson, M. (1980). Metaphors We Live By. Univ. of Chicago Press,
Chicago.
Lakoff, G. and Turner, M. (1989). More than Cool Reason: A Field Guide to Poetic Metaphor. Univ. of Chicago Press, Chicago.
Leeuwenberg, E. (1971). A Perceptual Coding Language for Visual and Auditory Pat-
terns. American Journal of Psychology 84, pp. 307–349.
Mal’cev, A.I., (1973). Algebraic Systems. B.D. Seckler & A.P. Doohovskoy (trans.).
Springer-Verlag, Berlin, Germany.
Marschark, M., Katz, A. and Paivio, A. (1983). Dimensions of Metaphors. Journal of
Psycholinguistic Research 12, pp. 17–40.
Nueckles, M. and Janetzko, D. (1997). The Role of Semantic Similarity in the Com-
prehension of Metaphor. In Proceedings of the Nineteenth Annual Conference of the
Cognitive Science Society, Lawrence Erlbaum Associates, Hillsdale, New Jersey, pp.
578–583.
Paivio, A. (1979). Imagery and Verbal Processes. Lawrence Erlbaum Associates, Hillsdale, NJ.
Piaget, J. (1953). Logic and Psychology. Manchester University Press, Manchester, UK.
Piaget, J. (1967). Biology and Knowledge. B. Walsh (trans.) (1971). Univ. of Chicago
Press, Chicago.
Rosch, E. (1977). Human Categorization. In N. Warren (ed.) Studies in Cross-Cultural
Psychology: Vol. 1. Academic Press, London, pp. 1–49.
Schön, D.A. (1963). Displacement of Concepts. Humanities Press, New York.
Schön, D.A. (1979). Generative Metaphor: A Perspective on Problem-Setting in Social
Policy. In A. Ortony (ed.) Metaphor and Thought. Cambridge Univ. Press, Cambridge,
UK, pp. 254–283.
Tourangeau, R. and Rips, L. (1991). Understanding and Appreciating Metaphors. Cog-
nition 11, pp. 203–244.
Turner, M. and Fauconnier, G. (1995). Conceptual Integration and Formal Expression.
Metaphor and Symbolic Activity, 10(3), pp. 183–204.
Van der Helm, P. and Leeuwenberg, E. (1991). Accessibility: A Criterion for Regularity
and Hierarchy in Visual Pattern Code. Journal of Mathematical Psychology, 35, 151–
213.
Metaphor and Human-Computer Interaction: A Model
Based Approach

J.L. Alty and R.P. Knott

LUTCHI Research Centre
Dept. of Computer Studies
Loughborough University
Loughborough, Leicestershire, LE11 3TU, UK
{j.l.alty & r.p.knott}@lboro.ac.uk

Abstract. The role of metaphor in the interface design process is examined and
the importance of formal approaches for characterizing metaphor is stressed.
Two mathematical models of metaphor are put forward - a model based upon a
set approach and a model based upon functional decomposition. The set-based
model has proved to be useful in the design process, enabling designers to identify problem areas and possible improvement areas. The more detailed functional model mirrors the set approach and is still under development; however, the main ideas are outlined.

1 Computer Applications and Interfaces

The interface between a human being and a computer application consists of a set of
interface objects which map onto objects in the underlying computer system and
whose manipulation instructs the system to perform certain functions. The state of
these interface objects also reflects the current system state and provides
communication between system and user. Recently there has been more emphasis on
graphical user interfaces enabling designers to provide realistic interface controls
which can be “directly manipulated” (Shneiderman, 1978). This shifts the emphasis to
“doing” rather than linguistic reasoning when solving interface problems, resulting in
new interest in the use of metaphor. Two of the most ubiquitous metaphors used have
been the “Desktop Metaphor”, where many housekeeping functions are mapped to the
manipulation of papers on a desktop, and the "Windows Metaphor", whereby users
have views onto different applications. These metaphors have been successful in
allowing users to manage files and to control many applications simultaneously.

Carroll & Mack (1985) state that 'metaphors can facilitate active learning ... by
providing clues for abductive and adductive inferences through which learners
construct procedural knowledge of the computer’. The selection and application of
existing models of familiar objects and experiences allow users to comprehend novel
situations. Lakoff & Johnson (1980) claim that all learning is metaphoric in nature.

2 What Is Metaphor?

Literary theory characterizes the role of metaphor as the presentation of one idea in
terms of another, such that understanding of the first idea is transformed. From the
fusion of the two ideas, a new one is created. Richards (1936) has proposed a
nomenclature in which he defines the original idea as the ‘tenor’ and the second idea
imported to modify or transform the tenor as the ‘vehicle’. The use of metaphor must
involve some form of transformation; otherwise the construction is simply an analogy
or juxtaposition and not a metaphor. Metaphors draw incomplete parallels between
unlike things, emphasizing some qualities and suppressing others (Lakoff & Johnson,
1980).

The mismatches are an important part of metaphor. One thinks of those lines from
Auden made famous in the film "Four Weddings and a Funeral":

“The stars are not wanted now: put out every one;
Pack up the moon and dismantle the sun;
Pour away the ocean and sweep up the wood;
For nothing now can ever come to any good.”

The mismatches are huge (“pack up the moon”, “dismantle the sun”), but the images
are powerful.

Using Richards’ terms, in the design of the Apple Macintosh interface, the real-world
desktop acts as a vehicle in order to transform the tenor, in this case the operating
system of the computer. Thus a metaphor requires three concepts; the Tenor, the
Vehicle and the transformation between them.

Although there have been many papers on the use of metaphor at the interface, there
has been a lack of formal design approaches. A mathematical approach to metaphor
representation is mentioned in Kuhn et al. (1991), but is not developed.

3 Metaphoric Interfaces

When designers use a metaphor at the interface, they have to carefully design a set of
interface objects with which to represent the observable states of the system. These
objects have a dual function. They present the state of the system to the users and
inform them of system changes. At the same time they must provide a set of actions
through which the user can initiate changes in the system state. In Graphical User Interfaces (GUIs) these actions correspond to mouse clicks, dragging, etc.
Usually, each metaphor at the interface relates to a single application (or even a sub-
task such as cut-and-paste). If several applications are running there may be several
concurrent metaphors at the interface, one for each active application and others for
system functions. A user, however, is usually only concerned with one application at
a time. Note however that this application could be the operating system itself and
that some metaphors may apply across all applications.

4 The Model of Anderson et al. (1995)

Anderson and co-workers have put forward a model which has proved very useful
in investigating metaphoric mapping issues. The model is shown in Figure 1.

[Figure 1: a Venn-style diagram of features of the system (S) and features of the vehicle (M), dividing the feature space into the regions S+M+, S+M-, S-M+ and S-M-.]
Fig. 1. The Anderson et al. (1995) Pragmatic Model

The four areas (in what might be considered a Venn Diagram) are:
S+M+ → features in the system supported by the Metaphor,
S+M- → features in the system not supported by the Metaphor,
S-M+ → features implied by the Metaphor but not provided by the system,
S-M- → features not implied by the Metaphor nor supported by the system.

Anderson et al. (1994) used this model to investigate the importance of the concept of "conceptual baggage" - the proportion of S-M+ to S+M+ features (that is, those features of the metaphor which do not map to system functionality, compared with those which do). Anderson et al. found empirical evidence that conceptual baggage did play an important role in the overall effectiveness of metaphor at the interface. In the process control area, conceptual baggage is an important issue since it could lead operators into erroneous conclusions about the process.
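Read as plain sets, the model supports a direct calculation; the following sketch (our own illustration with made-up feature names, not data from Anderson et al.) computes the regions and the conceptual-baggage ratio:

    # Features of the system (S) and features implied by the metaphor/vehicle (M),
    # as plain sets of feature names (hypothetical feature names for illustration).
    S = {"show-availability", "initiate-call", "end-call"}
    M = {"show-availability", "initiate-call", "knock", "lock", "leave-note"}

    regions = {
        "S+M+": S & M,   # system features supported by the metaphor
        "S+M-": S - M,   # system features the metaphor does not suggest
        "S-M+": M - S,   # metaphor features the system does not provide
    }                    # (S-M- is everything outside both sets)

    # Conceptual baggage: the proportion of S-M+ to S+M+ features.
    baggage = len(regions["S-M+"]) / len(regions["S+M+"])
    print(regions["S-M+"], baggage)  # 3 such features; ratio 3/2 = 1.5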
5 A Case Study of Design: The DOORS System

5.1 Description of the System Functionality

The prototype system used in these investigations was designed to act as an interface
to an office-based integrated digital broadband telecommunications infrastructure.
More specifically, the system was designed to broadcast the availability state of all
users of the system at any given point in time, and to enable users to make point to
point audio-visual connections. Each user of the system was represented as a
graphical icon which was available to all other users of the system. Communication
between users of the system was initiated via these icons which were also used to
display the availability state of the particular user. In order to provide an adequate
simulation of such technology, the system, known as DOORS (MITS 1994a), was
developed to utilize the audio-visual infrastructure and controlling software (Gaver et
al., 1992) available at Rank Xerox Research Centre, Cambridge.

Preliminary analysis of office-based communications during early design suggested that a person's availability can generally be allocated to one of three states:
1. Available for communication.
2. Busy but interruptible.
3. Not available for communication (Anderson, 1994).

5.2 Description of Vehicle-System Pairings

In order to describe the relationships between system and vehicle for each of the three
pairings, it was necessary to explore the features of each of the vehicles with respect
to the proposed system functionality. Techniques suggested by Carroll et al. (1988)
were used to consider the mappings between vehicle and system at the levels of
‘tasks’, ‘methods’ and ‘appearances’ in a representative set of scenarios. The results
of this analysis were set in the context of the above model so that it was possible to
allocate attributes of the vehicle-system pairing to one of the four categories in
Anderson's model. The ease and immediacy of the allocation process formed the
basis of the characterization of each vehicle-system pairing. For example, office
doors immediately provided a wide range of possible attributes pertinent to the
initiation of point to point audio-visual connections, compared to the attributes
associated with dogs.

5.2.1 Office Doors

The first vehicle-system pairing adopted the office door as a vehicle for representing
the availability of a user. Specifically, an open door corresponded to ‘available for
communication’, a partially open door to ‘busy but interruptible’ and finally a closed
door to ‘not available for communication’. The characterization of the relationship
between this vehicle and the system is shown in Figure 2.

[Figure omitted: relative overlap of the sets ‘Features of vehicle’ and ‘Features of system’ for this pairing.]

Fig. 2. Characterization of Office doors/System pairing

In order that equivalent vehicle-system pairings could be constructed, the
functionality underlying the interface was kept relatively simple. As a result of this
strategy, and the fact that the office door is a very rich vehicle in this particular
context, there were a great number of features of the vehicle that were not supported
by the system. The system functionality, for example, does not allow doors to be
locked. Thus it can be seen in Figure 2 that the proportion of S-M+ features compared
to S+M+ features was relatively high. In addition, most of the system functionality
was accounted for by features of the vehicle. From this characterization, certain
predictions about the patterns of subject performance of this system could be
expected. Firstly it could be expected that subjects would find the system easy to use
even if they had not encountered it before, not only because the metaphor seems
contextually relevant, but also because the ratio of S+M- features to S+M+ features is
quite low. For the same reason it could be expected that subjects would quickly
explore the system and successfully utilize the underlying functionality. However, it
would be predicted that over time subjects would become frustrated that features they
might expect to be present in the context of this system were not in fact supported as
the conceptual baggage of this particular vehicle-system pairing is quite high. Office
doors was therefore considered a rich and appropriate vehicle in the context of this
pairing.

5.2.2 Dogs

The second vehicle-system pairing adopted the dog as a vehicle for representing the
availability of a user. Specifically, an attentive dog corresponded to ‘available for
communication’, a digging dog to ‘busy but interruptible’ and finally a sleeping dog
to ‘not available for communication’. The characterization of the relationship between
this vehicle and the system is shown in Figure 3.

In this pairing, as in the previous case, there were also a great number of
potentially relevant features of the vehicle that were not supported by the system. For
example, dogs could not be trained to allow communications from specified people.
Thus it can be seen that the proportion of S-M+ features compared to S+M+ features
was relatively high. Again, there was considerable conceptual baggage. However, it
can be seen that very little of the system functionality was accounted for by features
of this vehicle. Such a characterization would lead to different predictions about the
patterns of user performance. Firstly it would be expected that initially subjects would
not find the system intuitive, not only because the metaphor seems less contextually
relevant, but also because the ratio of S+M- features to S+M+ features was
comparatively high. Dogs was therefore considered to be a rich but inappropriate
vehicle in the context of this pairing.

[Figure omitted: relative overlap of the sets ‘Features of vehicle’ and ‘Features of system’ for this pairing.]

Fig. 3. Characterization of Dogs/System pairing

5.2.3 Traffic Lights

The third vehicle-system pairing adopted the traffic light as a vehicle for representing
the availability of a user. Specifically, a green light corresponded to ‘available for
communication’, an amber light to ‘busy but interruptible’ and finally a red light to
‘not available for communication’. The characterization of the relationship between
this vehicle and the system is shown in Figure 4.

In this pairing it can be seen that there were few potentially relevant features of the
vehicle that were not supported by the system. Thus the proportion of S-M+ features
compared to S+M+ features was relatively low. In this instance, there was
considerably less conceptual baggage than in the previous two situations. As was the
case with the dog, it can be seen that very little of the system functionality was
accounted for by features of the vehicle. This characterization would lead to further
predictions about the patterns of subject performance. Firstly it would be expected
that subjects would not initially find the system intuitive, not only because the
metaphor seems less contextually relevant, but also because the ratio of S+M- features
to S+M+ features would be quite high. For the same reason it would be expected that
even if subjects do explore the system and become familiar with the functionality, the
boundary between S+M- and S+M+ features will be apparent. Finally, owing to the
predicted lack of conceptual baggage it would be expected that the subjects would be
better able to distinguish between S-M+ features and S+M+ features associated with
this vehicle-system pairing. Traffic Lights was therefore considered to be a sparse
vehicle with limited appropriateness in the context of this pairing.

[Figure omitted: relative overlap of the sets ‘Features of vehicle’ and ‘Features of system’ for this pairing.]

Fig. 4. Characterization of Traffic Lights/System pairing

5.3 Experimental Results

An experiment was designed and carried out to investigate the viability of the model
by utilizing the interface metaphors Office doors, Dogs and Traffic Lights. In order to
compare and contrast the effects of each of the vehicle-system pairings, three
independent groups of subjects undertook the same task that required usage of
identical underlying telecommunications services. Experimental data was collected
using a combination of verbal protocol, activity capture using video and questionnaire
techniques. This section will focus on the data generated by the questionnaire and will
outline some preliminary findings.

It is clear from the results that the intuitive nature of the Office Door interface
metaphor caused the subjects to make incorrect assumptions concerning the nature of
the underlying system functionality.

This would imply that subjects were confident that they were able to distinguish
functionality that was in the system but not covered by the vehicle, from functionality
that was covered by the vehicle, when in fact this was found not to be the case. The
subjects exhibited a misplaced sense of confidence about their answers due to the
richness and contextual relevance of this vehicle, which had the effect of masking the
boundary of the mapping between vehicle and system. It would seem therefore that
the Office doors vehicle, while providing a contextually rich set of resources, brought
a considerable amount of conceptual baggage to this particular vehicle-system
pairing. The effect of this baggage was exacerbated by the relative simplicity of the
underlying system functionality.

In the case of Dogs, subjects were better able to identify system functionality that was
not supported by the vehicle, than functionality that was suggested by the vehicle but
was not present in the system.

In contrast to the Office doors vehicle, it would seem that Dogs provided a rich set of
resources that were largely inappropriate in the context of this particular vehicle-
system pairing. This is indicated by the fact that subjects reported a need for a manual
explaining the representations of system state at the start of the task. Thus, whilst a
degree of conceptual baggage could be expected, the lack of contextual relevance
caused the effect to be reduced.

Finally in the case of Traffic Lights, subjects were better able to identify system
functionality that was supported by the vehicle than functionality that was suggested
by the vehicle but was not present in the system.

In addition this last result indicates that the vehicle maps only to a small part of the
system functionality causing subjects to be aware of the boundary between the two.
Subjects did not find this vehicle at all intuitive as is indicated by the fact that the
majority of them expressed a need for a manual to explain the representations of
system state. Once the subjects became aware of the mapping between vehicle and
system, actual understanding of the interactions was superior to that in either of the
other two vehicle-system pairings. The Traffic Lights vehicle then, did not provide a
rich set of resources. However the resources it did provide mapped tightly to a small
subset of the system functionality. Consequently the effect of this vehicle’s inherent
conceptual baggage was not as marked as in either of the other vehicle-system
pairings.

6 Extending the Model


Whilst the above model has proved useful in examining issues concerning the
relationships between tenor and vehicle, in real world systems there is an additional
important object which is related directly to both - the actual interface (which we will
call V), implemented and presented to the operator. Norman (1990) has referred, in an
informal way, to this correspondence between the designer's, user's and system models.
The model of Anderson et al. can easily be extended to include this aspect. As a
consequence, there is a complex relationship between S, M and V which is
diagrammatically shown in Figure 5.

The additional component increases the number of distinct areas to eight, namely
S+M+V+, S+M+V-, S+M-V+, S+M-V-, S-M+V+, S-M+V-, S-M-V+ and S-M-V-,
where, of course, each area in the Anderson model subsumes two areas in our new
model (e.g. S+M+ = {S+M+V+} + {S+M+V-}).

It is important to understand how to discuss the model. The metaphor M should be
thought of as the way a target user would expect the system functionality S to be
manipulated given a metaphor M. The area V represents the ways in which the
designer actually chose to implement the metaphor. There are likely, of course, to be
areas of conflict between the user’s model of the metaphor and that of the designer,
causing possible dissonance in the user.

[Figure omitted: Venn diagram of the three sets S, M and V, showing the eight regions S+M+V+, S+M+V-, S+M-V+, S+M-V-, S-M+V+, S-M+V-, S-M-V+ and S-M-V-.]

Fig. 5. The Revised Model taking the Interface into Account

The meanings of the different areas are as follows:


S+M+V+. These are implementations of the metaphor in the interface which
successfully map onto system functionality. We call these Proper Implementations
of the metaphor. The system does what the user expects of the metaphor.

S-M-V-. These are operations and objects in the world which are of no interest to
us. We call these Irrelevant Mappings.

S+M-V+. These are operations which are implemented at the interface, do map to
system functionality, but either have no metaphoric interpretation or have an incorrect
metaphoric interpretation. We call these Metaphor Inconsistencies. This is an area of
dissonance. The designer has implemented a function not consistent with the
metaphor. A classic example of this is dragging the disk icon into the waste bin in the
Macintosh interface. The metaphor would suggest that the disk will be deleted, or
trashed, whereas the functionality is ejection of the disk from the system.

S+M-V-. These are operations available in the system but not implemented in the
interface, nor suggested by the metaphor. We call these External Functions to this
Metaphor. These will usually be functions covered by other metaphors.

S+M+V-. These are operations which are available in the system, which the
metaphor suggests can be done, but which are not implemented. We call these Missed
Opportunities or Implementation Exclusions. These are usually caused by a narrow
interpretation of the metaphor. For example, the “doors” metaphor used by Anderson
et al. provided the user with an indication of the availability of another party on a
communication link. An “open” door meant available, a “closed” door, not available,
and a door, which was “ajar”, meant possibly interruptible. The doors were merely
signaling intention. They were not connected in any way to access security. Thus, if
users closed their doors this did not prevent interruption (though it might have done in
a more embracing interpretation of the metaphor).

S-M+V+. These are operations which are consistent with the Metaphor and are
implemented but have no corresponding functionality. We call these Metaphoric
Surface Operations. These usually correspond to operations which have no system
interpretation but are useful and consistent with the metaphor. An example would be
tidying the desktop by dragging file icons.

S-M-V+. These are implementations in the interface which are neither in the
system functionality nor in the metaphor. They are a kind of interface housekeeping
with no direct relevance to the metaphor. We call these Non-Metaphoric Surface
Operations. Examples of this type of operation would be changing the size or
color of an icon, or a font size (user tailoring of the objects in an interface).

S-M+V-. These are operations which are suggested by the metaphor but which are
neither in the system nor the interface. This is essentially what is meant by
“conceptual baggage”. The user is erroneously led to believe that something can be
done within the metaphor, when in fact it is not implemented and does not map onto any
system functionality. This is the area of Conceptual Baggage discussed earlier. A
good example of this is the use of the “clipboard” in the Macintosh interface. With a
normal clipboard, a user can successively clip additional documents to the board
(hence the clip); the board acts as a last-in/first-out storage system. However, in the
Macintosh implementation, a second clip overwrites the first.
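
Read as a decision table over three Boolean attributes, the eight areas can be named mechanically. The sketch below is an illustrative rendering of the classification just described (the example calls use the Macintosh disk-eject and clipboard cases mentioned above; nothing beyond the set membership is modelled).

    # Membership of S (system functionality), M (metaphor), V (interface implementation).
    CATEGORY = {
        (True,  True,  True ): "Proper Implementation",
        (True,  True,  False): "Missed Opportunity / Implementation Exclusion",
        (True,  False, True ): "Metaphor Inconsistency",
        (True,  False, False): "External Function to this Metaphor",
        (False, True,  True ): "Metaphoric Surface Operation",
        (False, True,  False): "Conceptual Baggage",
        (False, False, True ): "Non-Metaphoric Surface Operation",
        (False, False, False): "Irrelevant Mapping",
    }

    def classify(in_system, in_metaphor, in_interface):
        return CATEGORY[(in_system, in_metaphor, in_interface)]

    print(classify(True, False, True))   # dragging the disk icon to the waste bin
    print(classify(False, True, False))  # clipping several items to the clipboard
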

7 The Usefulness of the Model Based Approach

Designers, when implementing metaphors at the interface, should first examine the
objects and manipulations defined in the set V (i.e. what they have actually
implemented). These manipulations can be divided into four discrete subsets:

S+M+V+ These are “correct” implementations
S+M-V+ These are Metaphor implementation inconsistencies
S-M+V+ These are surface functions only
S-M-V+ These are tailoring operations only

The first two are very important. Two other important areas are S-M+V- and S+M+V-
(Conceptual Baggage and Missed Opportunities), neither of which is in the
implementation. Conversion of S-M+V- to S+M+V+ is a powerful design tool.
Furthermore, adding interface support for S+M+V- manipulations can strengthen the
metaphor and make the interface easier to learn and remember. Such situations are often
the result of system extensions, or of a lack of thought about the implementation.

8 A More Detailed Look at the Functionality of the Interface

The underlying system (tenor) will exist in a number of possible unique states s1, s2,
...sn. System behavior is characterized by movements between these states. Today it is
common for a user to have several applications active at the same time, each having
its own state. The system state is the aggregation of these active application states and
the basic underlying system. If the user has applications A1, A2, …, Am active and
application Ai can have possible states ai,1, ai,2, …, ai,r, then at any one time the total set
of possible system states is

S = ∏i ( si × ∏j ∏k aj,k ).

We can represent a typical element, ê, of S as ê=(si, a1,i1, a2,i2,… ar,ir). State changes
result from direct user actions, system responses to them, or external events. A user
may initiate a copy command, which moves the system through state changes which
write information to disk. The initial change was initiated by the user, but later state
changes result from system actions, some visible, some not.
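
The aggregation can be illustrated with a tiny enumeration (all state names below are invented): the total set of system states is the product of the underlying system states with the state sets of the active applications, and a typical element is a tuple containing one state from each.

    from itertools import product

    system_states = ["s1", "s2"]                 # states of the basic underlying system (invented)
    app_states = [
        ["a1,1", "a1,2"],                        # possible states of application A1
        ["a2,1", "a2,2", "a2,3"],                # possible states of application A2
    ]

    # Every combination of an underlying system state with one state per application.
    S = list(product(system_states, *app_states))
    print(len(S))   # 2 * 2 * 3 = 12 possible system states
    print(S[0])     # a typical element, e.g. ('s1', 'a1,1', 'a2,1')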

Although a user may control a number of applications at the same time, at any
particular moment the user will only concentrate on one of them, so for the rest of our
discussion we will assume there is a current, single application, or set of system
functions Ai.

When a designer uses a metaphor to implement functionality in the interface there
will be transformations which can be classified into three distinct subsets:

1. the set of transformations which initiate state changes in the underlying system,
2. the set of transformations at the interface which cause changes in the system, and
3. the set of transformations induced or predicted in the user’s mental model of
the metaphor.

8.1 The Functionality of the System

The system designer has to specify some functionality for the underlying system and
for the user interface. We define a system function f, which acts upon a non-empty
set of states Si ⊂ S and produces a non-empty set of states Sj ⊂ S (illustrated in Figure
6).
f: Si → Sj
Both the subject and object of this function must be subsets of S, rather than elements,
since the same function will be applicable not only in many different states of the
chosen application, but also for almost all possible states of the other active
applications. An example of a function is that of opening a new file. This can be done
at many states in the system.

[Figure omitted: a system function f mapping a set of states Si within S to a set of states Sj within S.]

Fig. 6. A System Function f.

The functionality of the system is the set F of all system functions f. F represents the
functionality of the underlying system for which we wish to build an interface and is
equivalent to S in Figure 5.

8.2 A Set of Transformations at the Interface

At any time, the set of objects representing the vehicle (or interface) for Ai is in one of
a number of object states O ={ o1, o2, ..., ok}. Each object state ok represents a
configuration of the actual interface objects which the user can manipulate. Each
object state can be transformed into another object state through manipulation of the
real objects at the interface. This corresponds to the area V in Figure 5. There will be
many possible transformations between the elements of O. Each oi affords some
actions at the interface, which, if initiated, would move the vehicle into some new
state oj.

Consequently, we define an interface implementation function g, which acts upon an
object state oi and produces a final state oj:
g: oi → oj.

The functionality of the interface implementation is the set G of all interface
functions g.

8.3 The Set of Transformations in the User’s Mental Model

Finally we describe a metaphoric model in the user's mind. The metaphorical model is
similar to the set of interface objects and manipulations in the implementation. At any
time, the user's mind (in relation to the computer application) is in one of a number of
mental states U = {u1, u2, u3, …, un}. Each mental state ui represents a configuration of
objects in the user's mental model. Each mental state can be transformed into another
mental state through manipulations in the mental model (corresponding to M in
Figure 5). Each ui affords some mental actions which, if initiated, move the mental
model into a new state uj.

A metaphor function u acts upon a mental model state ui and produces a final state
uj:
u: ui → uj.
The functionality of the metaphor is the set M of all metaphor functions u.

There may be a difference between the designer's view of a metaphor and the user's
view. Indeed this can be a cause of difficulty in interface design. This is best solved
by the designer carefully checking the metaphors used against the target user
population. Thus we assume that the designer's metaphor and the user population's
metaphor will agree, and this is the metaphor described above.

9 The Mapping between the Metaphor and the Functionality

Bringing the three functional mappings together we obtain a complete mapping as shown
in Figure 7.

There is a mapping φ which represents the relationship between the metaphor objects
in the mental model and the interface objects. There is also an associated mapping υ
from the elements of M to the set V.

Clearly we require that if u ∈ dom(υ) and u maps ui to uj, then if φ(ui) = oi and φ(uj) = oj,
υ(u) maps oi to oj. The implementation then reflects the user’s expectations for all such
mappings.

Similarly, there is a mapping θ from the set of interface object states to the set S such
that if oi is some object state, then it must correspond to some state ai, k of the
application Ai. Let êi be the element of S corresponding to this state. Then we define
θ(oi) = êi.

Also there is a mapping ω: G → F such that if d is a function in G (d ∈ G) and d maps
oi to oj, then ω(d) maps θ(oi) to θ(oj) in S. Moreover, these relationships must
combine appropriately. That is, all the diagrams in Figure 7 must be commutative. The
interface designer has then successfully mapped the user's expectations onto the system
functionality via the interface in a consistent way.

[Figure omitted: a three-level commutative diagram. At the top, the set of metaphor objects (mental models), where u maps ui to uj; in the middle, the set of interface objects, where d = υ(u) maps oi to oj, linked to the mental states by φ; at the bottom, the set of system functions, where f = ω(d) maps θ(oi) to θ(oj) within the state sets Si and Sj, linked to the interface objects by θ.]

Fig. 7. The Complete Model

Referring to Figure 7, there will be elements of M which are not in the domain of
υ; these form the set M+V-. The image of υ is the set M+V+. The elements in V
but not in dom(ω) form the set S-V+, while the image Im(ω) is the set S+V+.
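
As a hedged sketch of this consistency condition, the dictionaries below stand in for φ, υ, θ and ω on a toy example (all names are invented); the check simply verifies that, for a metaphor function u, the corresponding interface function and system function move the mapped states in step, i.e. that the diagram commutes.

    # Toy stand-ins for the mappings of Figure 7; all names are invented.
    phi   = {"u1": "o1", "u2": "o2"}        # φ: mental states -> interface object states
    theta = {"o1": "e1", "o2": "e2"}        # θ: interface object states -> system states

    u = ("u1", "u2")                        # a metaphor function u: u1 -> u2, as a (source, target) pair
    upsilon = {u: ("o1", "o2")}             # υ: metaphor functions -> interface functions
    omega = {("o1", "o2"): ("e1", "e2")}    # ω: interface functions -> system functions

    def commutes(u):
        ui, uj = u
        oi, oj = upsilon[u]                 # d = υ(u)
        ei, ej = omega[(oi, oj)]            # f = ω(d)
        # Both squares of the diagram must commute for the implementation
        # to reflect the user's expectations.
        return (phi[ui], phi[uj]) == (oi, oj) and (theta[oi], theta[oj]) == (ei, ej)

    print(commutes(u))                      # True for this consistent toy assignment
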

10 Conclusions

We have shown two different set-based models relating functionality, interface
implementation and metaphor. Our extension of Anderson's model has proved itself
immediately useful in assisting in the interface design process. By examining
S+M+V+, S+M-V+, S-M+V- and S+M+V- the interface designer can critically
examine implementations for current weaknesses and possible improvements.

Converting S-M+V- and S+M+V- elements to S+M+V+ can be a powerful driver for
extending interface functionality in novel ways.

A second, more detailed functional model is an attempt to describe what is going on
at a finer level of detail, and allows the designer to examine finer differences between
implementations. This representation needs more work, but we hope that we will
eventually be able to use it to examine ill-defined concepts such as “mixed
metaphors”, “hierarchical metaphors” and “families of metaphors”.

References

1. Anderson, B. (1994). Cognitive Anthropology and user-centered system design: How to
enter an office. LUTCHI Internal Report No. 94/M/LUTCHI/0162, Loughborough
University, UK.
2. Anderson, B., Smyth, M., Knott, R.P., Bergan, M., Bergan, J., and Alty, J.L., (1994),
Minimising conceptual baggage: Making choices about metaphor, In G Cockton, S.
Draper and G. Weir (Eds.), People and Computers IX, Proc. of HCI’94, pp 179 - 194,
Cambridge University Press.
3. Auden, W.H., Collected Poems, Twelve Songs, IX, Faber and Faber Ltd.
4. Carroll, J. M. & Mack, R. L. (1985). Metaphor, Computing Systems and Active Learning,
Int. Jour. of Man Machine Studies , Vol. 22, No. 1, 39-57.
5. Carroll, J.M., Mack, R.L, Kellogg, W.A., (1988). Interface metaphors and user interface
design. In Helander, M (ed.) Handbook of Human-Computer Interaction, North-Holland,
Amsterdam.
6. Gaver, W., Moran, T., MacLean, A., Lovstrand, L., Dourish, P., Carter, K., & Buxton, W.
(1992). Realising a Video Environment: EuroPARC’s RAVE System. Proc. ACM
Conference on Human Factors in Computing Systems CHI ‘92, Monterey, Ca., May 1992.
7. Kuhn, W., Jackson, J.P., and Frank, A.U., (1991), Specifying metaphors algebraically,
SIGCHI Bulletin, Vol. 32, No. 1, pp 58 - 60.
8. Lakoff, G. & Johnson, M. (1980). Metaphors We Live By. University of Chicago Press,
Chicago.
9. MITS (1994a). Deliverable D6, Review of Demonstrators. RACE deliverable, No
R2094/NOKIA/D6/DS//L/006/b1.
10. Norman, D. A. The Design of Everyday Things, Doubleday, New York, 1990.
11. Richards, I.A. (1936). The Philosophy of Rhetoric. Oxford University Press.
12. Shneiderman, B., (1983), “Direct Manipulation: A Step beyond Programming
Languages”, IEEE Computer, August 1983, pp 57 - 62.
Empirical Modelling
and the
Foundations of Artificial Intelligence

Meurig Beynon

Department of Computer Science
University of Warwick, Coventry CV4 7AL, UK

Abstract. This paper proposes Empirical Modelling (EM) as a possible
foundation for AI research outside the logicist framework. EM offers
principles for constructing physical models, typically computer-based,
by construing experience in terms of three fundamental concepts: ob-
servables, dependency and agency. EM is discussed in the context of
critiques of logicism drawn from a variety of sources, with particular
reference to the five foundational issues raised by Kirsh in his paper
Foundations of AI: the Big Issues (AI, 47:3-30, 1991), William James’s
Essays on Radical Empiricism (Bison Books, 1996), and the controversy
surrounding formal definitions for primitive concepts such as metaphor
and agent that are recognised as fundamental for AI. EM principles are
motivated and illustrated with reference to a historic railway accident
that occurred at the Clayton Tunnel in 1861.
The principal thesis of the paper is that logicist and non-logicist ap-
proaches to AI presume radically different ontologies. Specifically, EM
points to a fundamental framework for AI in which experimentally guided
construction of physical artefacts is the primary mode of knowledge rep-
resentation. In this context, propositional knowledge is associated with
phenomena that are perceived as circumscribed and reliable from an
objective ‘third-person’ perspective. The essential need to incorporate
subjective ‘first-person’ elements in an account of AI, and the role that
commitment plays in attaching an objective meaning to phenomena, are
seen to preclude a hybrid approach to AI in the conventional sense.

1 Introduction

More than ten years have elapsed since McDermott’s celebrated renunciation of
logicism in AI first appeared [59]. The status of neat and scruffy approaches to
AI remains controversial, and there has been limited progress towards the two
complementary goals that might make the most decisive impact on the argument:
Goal L (“The Logicist Goal”): Develop sophisticated symbolic models with
powerful applications.
Goal NL (“The Non-Logicist Goal”): Identify general principles for application
development outside the logicist framework.


By way of illustration, Goal L would be served if the aspirations of Lenat and
Feigenbaum’s experiment in knowledge representation [57] were to be realised,
and Goal NL by the discovery of general principles for constructing reactive
systems of agents sufficient to account (in particular) for the achievements of
Rodney Brooks and his collaborators at MIT [32,31].
A cynic might argue that neither of these goals has significant academic or
commercial implications. Whether or not logicism delivers significant practical
applications, the logicist view of AI is firmly entrenched in the curriculum of
computer science, and will remain so whilst there is perceived to be no aca-
demically respectable alternative to mathematical foundations based on logic
and rules. And whether or not there are any fundamental principles to account
for the achievements of scruffy AI, those techniques that are most effective in
practice will attract commercial interest and exploitation.
More considered reflection suggests that progress towards one or other of
the goals set out above potentially has serious implications in both academic
and commercial terms. As Brooks argues in [31], AI has been very influential in
shaping the development of computational paradigms and knowledge represen-
tation techniques, and its agenda is increasingly relevant to modern computing
applications. The limitations of traditional foundations for computer science are
becoming topical in many key areas. Recent contributions on this theme include,
for instance, West’s discussion of hermeneutic computing [76], Wegner’s propos-
als for extensions to the Turing model of computation [75], and the analysis
of information systems development by Hirschheim et al. [52]. Related concerns
include the future for database concepts beyond relational and object-oriented
frameworks, and Fred Brooks’s reiteration of his call for principles with concep-
tual integrity to address the problems of software development [30]. A logicist
framework that fails to engage with the agenda of modern practical comput-
ing calls into question the integrity of AI and computer science as academic
disciplines. Computing practice that has no coherent guiding principles is un-
likely to deliver well-engineered products or to exploit the full potential of new
technology.
The aim of this paper is to consider the potential of Empirical Modelling
(EM), developed by the author and his collaborators at Warwick over several
years, as a broader foundation for AI and Computer Science. (See our website:
http://www.dcs.warwick.ac.uk/pub/research/modelling for further details of the
Empirical Modelling Project.) By way of clarification, it should be noted that the
term ‘agent’ has a distinctive meaning in EM that has been developed largely
independently of the associations that ‘agent-oriented’ now has in Computer
Science. By way of disclaimer, this paper aims to expose a fundamental difference
in preoccupation between the logicist and non-logicist perspectives, and should
be interpreted as questioning the potential significance rather than the intrinsic
merit and interest of logicist researches. For instance, automatic techniques for
truth maintenance and belief revision are a legitimate way to ameliorate the
effects of adopting a closed-world perspective, but this does not address the
fundamental problem raised in section 2.3 below.

The paper is in three main sections. Section 2 contrasts logicist and non-
logicist perspectives on intelligence with reference to a typical IQ puzzle and to
the analysis of a historic railway accident. Section 3 introduces EM principles
and techniques, and illustrates their potential significance for railway accident
investigation. Section 4 discusses the new foundational perspective on AI that
EM affords with particular reference to the work of William James on Radical
Empiricism, of David Gooding on the empirical roots of science, of Mark Turner
on the roots of language and of Rodney Brooks on robotics.

2 Perspectives on Intelligence

In [55], Kirsh discusses the foundations of AI with reference to five issues:

– Core AI is the study of conceptualization and should begin with knowledge-level
theories.
– Cognition can be studied as a disembodied process without solving the
grounding problem.
– Cognition can be effectively described in propositional terms.
– Cognition can be studied separately from learning.
– A uniform architecture underlies virtually all cognition.

Kirsh identifies these as assumptions typically associated with a logicist viewpoint.
EM promotes an alternative viewpoint on intelligence. In particular, it
takes a different stance on each of these five foundational issues.
The essential distinction concerns the way in which a system is construed to
operate. As discussed in Kirsh [55], the logicist aims at a mathematical structure
of objects, functions and relations close enough to the real world for a system to
achieve its purposes, and construes the system as “acting as if it were inferring”.
In EM, in contrast, a construal makes more explicit reference to human agency,
can have a more subjective character, and be more loosely concerned with specific
goals. A central idea of EM is that physical artefacts are needed to communicate
such construals, but its general import can be expressed as: “so far as I/we can
judge from previous experience, and subject to exceptional behaviour for which
there is no pre-conceived explanation, the system is acting as if it were composed
of the following family of autonomous agents, each responding to the following
observables, and exercising the following privileges to change their values in the
context of the following dependencies between observables”.
A construal in EM has a number of key features:

– It is empirically established: it is informed by past experience and is subject
to modification in the light of future experience.
– It is experientially mediated: the interaction in which each agent engages is
represented metaphorically via a physical artefact, typically computer-based.
– The choice of agents is pragmatic: what is deemed to be an agent may be
shaped by the context for our investigation of the system.
– It only accounts for changes of state in the system to a limited degree: the
future states of the system are not circumscribed, there may be singular
states in which conflicting values are attributed to observables, and there
are no guarantees of reliable response or progress.
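
Purely as a caricature of this vocabulary (this is not EM's own notation or tooling, and it anticipates the railway example discussed in Section 2), one can picture a construal as a set of observables, dependencies that are re-evaluated whenever an observable changes, and agents identified with the privileges they have to redefine particular observables.

    # A caricature of observables, dependency and agency; not EM notation.
    observables = {"signal": "danger", "flag": "none"}

    def indication_seen_by_driver():
        # A dependency: what the driver perceives is derived from other observables.
        if observables["flag"] == "red":
            return "stop"
        return "proceed with caution" if observables["signal"] == "danger" else "all clear"

    def signalman_waves(colour):
        # An agent's privilege: the signalman may redefine the 'flag' observable.
        observables["flag"] = colour

    signalman_waves("red")
    print(indication_seen_by_driver())   # the dependency now yields 'stop'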

Construals in logicism and in EM are associated with radically different ontologies
and epistemologies. This ontological distinction is highlighted when, as
is sometimes appropriate, EM is used to develop models of a logicist character.
The critical point in this development is the point at which the developer shifts
perspective from “my experience so far suggests that this pattern of interaction
between agents occurs reliably and this appears to conform to the experience of
others also” to “a particular behavioural pattern within the system is described
objectively and precisely by the following logical model whose construction is
predicated upon the assumption that this pattern occurs reliably”.
Two examples of interpretations of intelligence will be used to highlight this
difference in perspective, and to motivate the more detailed discussion and anal-
ysis that follows.

2.1 A Classic Intelligence Test

The problem posed in Box 1 illustrates one popular view of intelligence that has
much in common with the logicist perspective as portrayed in [55]. It is drawn
from a publication by Mensa, a society whose membership comprises people with
a high “intelligence quotient”.

The Captain of the darts team needs 72 to win. Before throwing a dart, he remarks
that (coincidentally) 72 is the product of the ages of his three daughters. After
throwing one dart, he remarks that (coincidentally) the score for the dart he has
just thrown is the sum of the ages of his daughters. Fred, his opponent, observes at
this point that he does not know the ages of the Captain’s daughters. “I’ll give you
a clue”, says the Captain. My eldest daughter is called Vanessa. “I see”, says Fred.
“Now I know their ages.”

Table 1. A Mensa Intelligence Test

The solution to this problem centres on the fact that factorisations of 72 into 3
factors are disambiguated by the sum of factors but for the pair of factorisations:

72 = 3 * 3 * 8 = 6 * 6 * 2.

By observing that he does not know the ages of the daughters, Fred discloses
to the solver that one or other of these factorisations of 72 is the required one.
(Note that, to make his observation, Fred does not need to know—as we as
solvers do—that no other pair of factorisations of 72 into three yields the same
sum, since he knows that the Captain has scored 14.) When he knows there is
an eldest daughter, he knows that the ages of the daughters are 3, 3 and 8.
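
The arithmetic behind the puzzle can be checked mechanically. The sketch below (an illustration added here, not part of the original account) enumerates the factorisations of 72 into three factors and groups them by their sum; the only ambiguous sum is 14, which is exactly what Fred's remark reveals to the solver.

    from collections import defaultdict
    from itertools import combinations_with_replacement

    # Group the unordered factorisations a*b*c = 72 by their sum a+b+c.
    by_sum = defaultdict(list)
    for a, b, c in combinations_with_replacement(range(1, 73), 3):
        if a * b * c == 72:
            by_sum[a + b + c].append((a, b, c))

    ambiguous = {s: fs for s, fs in by_sum.items() if len(fs) > 1}
    print(ambiguous)  # {14: [(2, 6, 6), (3, 3, 8)]}
    # Knowing there is a single eldest daughter rules out (2, 6, 6): the ages are 3, 3 and 8.
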
This puzzle illustrates several ingredients of logicism discussed in [55]. The
problem is contrived around a mathematical model in the poser’s mind. The
casual and artificial way in which the abstract problem is placed in a real-world
context echoes the modularity of ‘inventing conceptualizations’ and ‘grounding
concepts’ presumed in logicism [55]. Embodiment plays a contrived role in the
problem. The issue of psychological realism is not addressed. It is assumed that
Fred exercises instantaneous—or at least very rapid—inference skills on-line,
whilst “knowing the ages of the daughters” is an abstract concept, unconnected
with being able to associate an age with a daughter who might turn up at the
darts match. Nor is indexicality respected. In order to draw any inferences, a
single Mensa-like persona must be imposed on the agents in the puzzle (the
Captain and Fred) and on the poser and solver also.
The remarkable thing about problems of this nature is that the IQ-literate
reader adopts the conventions of the problem poser so readily. Why should we
regard problem-solving of this nature as intelligent? Perhaps because it involves
being able to see through the contrived presentation to make ingenious abstract
inferences, discounting the commonsense obstacles to deduction (cf. Naur’s anal-
ysis of logical deduction in Sherlock Holmes stories [63]: “truth and logical infer-
ence in human affairs is a matter of the way in which these affairs are described”).
To some degree, facility in making abstractions is a quality of intelligence.
Some commonsense facts about the world must be taken for granted to make
sense of the problem. For example, a game of darts takes place on such a timescale
that the ages of the children are fixed for its duration. 14 is a legitimate score
for one dart. Yet the puzzle is posed so artificially that it is almost a parody of
intelligence.
A complementary mental skill is far less well-represented in logicism. This is
the ability to transpose the problem imaginatively so as to disclose the implicit
presumptions about the relationship between the abstract and the real-world el-
ements. Imagination of this kind can subvert the intelligence test. A suspension
of disbelief is needed in supposing that the Captain and Fred are mathemati-
cally adept and sober enough to factorise 72 in their heads whilst simultaneously
taking turns at darts, or that Fred determines the ages of the children because
of an inference rather than because he remembers Vanessa’s age. In some con-
texts, especially where creativity or design are concerned, such questioning of
the premises of a problem is essential, but it is out-of-place in the world of Mensa
problems. The intended world model is closed and preconceived.
The Mensa problem is an example of the kind of challenge that might be
addressed by an intelligence inference engine. It might not be easy to meet, as
it involves some meta-level reasoning. This is illustrated by the fact that if Fred
said he knew the ages of the daughters before he was told the name of the eldest,
no inference could be drawn.
Though logicism is not primarily concerned with artificial tests of intelligence
of this nature, it can be seen as construing intelligence in similar terms. It involves
establishing a formal relationship between the world and a logical model similar
to that between the mathematical model and the darts match scenario, such
that intelligent behaviour can be viewed as if it were inference of the kind used
in solving the intelligence test.
Empirical Modelling techniques address the broader view of intelligence that
encompasses creativity and imagination. They are not particularly well-suited for
exercises in inference masquerading as commonsense problems, but have direct
relevance to real-life scenarios in which abstract explanations are sought.

2.2 The Clayton Tunnel Railway Accident

Fig. 1. Signalman Killick’s view of the Clayton Tunnel

The following discussion refers to a 19th century railway accident [67] that is
described in Box 2 and illustrated in Figure 1. In analysing the accident (e.g. as
in conducting an accident inquiry), the significance of embodiment is particularly
clear. To assess the behaviour of the human agents, it is essential to take account
of psychological and experiential matters. How big was the red flag? How was
it displayed? Did the drivers and signalman have normal sight? How far away
could oncoming trains be seen? These are perceptual matters, which taken in
conjunction with knowledge about how fast trains travelled and how closely they
followed each other, help us to gauge the performance of human agents. There
are also conceptual matters, to be considered in the light of the training given

The Clayton Tunnel Disaster August 25th 1861


Three heavy trains leave Brighton for London Victoria on a fine Sunday morning.
They are all scheduled to pass through the Clayton Tunnel—the first railway tunnel
to be protected by a telegraph protocol designed to prevent two trains being in the
tunnel at once. Elsewhere, safe operation is to be guaranteed by a time interval
system, whereby consecutive trains run at least 5 minutes apart. On this occasion,
the time intervals between the three trains on their departure from Brighton are 3
and 4 minutes.
There is a signal box at each end of the tunnel. The North Box is operated by Brown
and the South by Killick. K has been working for 24 hours continuously. In his cabin,
he has a clock, an alarm bell, a single needle telegraph and a handwheel with which to
operate a signal 350 yards down the line. He also has red (stop) and white (go) flags
for use in emergency. The telegraph has a dial with three indications: NEUTRAL,
OCCUPIED and CLEAR.
When K sends a train into the tunnel, he sends an OCCUPIED signal to B. Before
he sends another train, he sends an IS LINE CLEAR? request to B, to which B can
respond CLEAR when the next train has emerged from the North end of the tunnel.
The dial at one end of the telegraph only displays OCCUPIED or CLEAR when the
appropriate key is being pressed at the other—it otherwise displays NEUTRAL.
The distant signal is to be interpreted by a train driver either as all clear or as
proceed with caution. The signal is designed to return to proceed with caution as a
train passes it, but if this automatic mechanism fails, it rings the alarm in K’s cabin.
The accident
When train 1 passed K and entered the tunnel the automatic signal failed to work.
The alarm rang in K’s cabin. K first sent an OCCUPIED message to B, but then
found that train 2 had passed the defective signal before he managed to reset it. K
picked up the red flag and displayed it to Scott, the driver of train 2, just as his
engine was entering the tunnel. He again sent an OCCUPIED signal to B.
K did not know whether train 1 was still in the tunnel. Nor did he know whether S
had seen his red flag. He sent an IS LINE CLEAR? signal to B. At that moment,
B saw train 1 emerge from the tunnel, and responded CLEAR. Train 3 was now
proceeding with caution towards the tunnel, and K signalled all clear to the driver
with his white flag.
But S had seen the red flag. He stopped in the tunnel and cautiously reversed his
train to find out what was wrong from K.
Train 3 ran into the rear of Train 2 after travelling 250 yards into the tunnel, pro-
pelling Train 2 forwards for 50 yards. The chimney of the engine of Train 3 hit
the roof of the tunnel 24 feet above. In all 23 passengers were killed and 176 were
seriously injured.

Table 2. An Account of the Clayton Tunnel Railway Accident

to drivers and signalmen. It is reasonable to expect that a responsible driver can
interpret a red flag as a signal for danger, and make this inference at the speed
of thought (cf. the implausibly rapid inferences that Fred must make in his darts
match). The process of identifying and actively checking the state of the signal
also has a conceptual component.
Issues of this nature have to be viewed with reference to the particular en-
vironment, such as the weather conditions. In this context, whether the speed
of the train was “too fast” is a matter of pragmatics rather than mathemat-
ics. The need to think in egocentric indexical terms is self-evident. None of the
human agents has a comprehensive view of the system. Without at least being
able to acquire some representative experience of what signalman Killick’s task
involved, it is hard to make a fair judgement about his degree of responsibility
for the accident, and to assess the relevance of his having worked for 24 hours
at a stretch.
In the railway accident scenario, unlike the Mensa problem, the interaction
between conceptual worlds and the real world is very subtle. Ironically, the prac-
tical measures designed to protect against the dangers of a breakdown in the
tunnel also generated the conceptual framework that led to the disaster. Driver
Scott’s decision to reverse his train arose from the fiction that a train may have
broken down in the tunnel ahead. Had he had another misconception, such as
that Killick had waved a white flag, there would have been no accident, and
normal operation would shortly have been resumed. In the real world, there are
degrees of physical interaction between trains that fall short of the catastrophe
that actually occurred, some of which might even have entailed no disruption to
the railway system. It is hard to envisage how logicist models could address the
range of ways in which what is informally viewed as inconsistency can be mani-
fest. Drastic colocation of trains is a particularly striking example of embodied
inconsistency. After this event, there is, in some sense, no longer a model.

2.3 A Logicist Model of the Railway Accident?


Kirsh [55] suggests that a theory of AI is concerned with specifying the knowledge
that underpins a particular cognitive skill. On this basis, accounting for the
Clayton Tunnel Disaster is an exercise of significant intrinsic importance that
can be seen as a challenge for a theory of AI. This exercise involves understanding
the contributions made by all the human agents in the accident scenario. As part
of this process, it would be necessary to place the accident in a generic context,
so as to see the actual events in relation to normal operation, and to explore the
probable outcomes had the circumstances been different. For instance, there are
closely related scenarios in which no accident occurs, or the crash is less violent,
and there are myriad factors that could have had a significant influence, such as
the reaction times of drivers and signalmen, the effectiveness of braking on the
trains, and the geography of the tunnel.
If our objective is to understand the Clayton Tunnel Railway accident in these
terms, there appear to be significant problems in constructing a logicist model.
To observe logicist principles, it seems that the goal of understanding the acci-
dent should lead to the identification of a closed-world model that encompasses
the accident scenario and is adequate for all purposes of explanation. Modern
railway practice demonstrates that—at least in principle — a closed-world model
can be effective in this role, accounting for the highly complex interactions in
the railway system within a robust generic conceptual framework.
There are three challenges in particular that are met in conceiving railway
system operation in closed-world terms. They are concerned with obtaining guar-
antees, so far as this is possible, on the following points:

– All human activities are framed around objective knowledge and skills.
– All significant operations are based on highly reliable assumptions.
– Practice does not depend on the specific features of particular environments.

In the analysis of the Clayton Tunnel accident, it is hard to see how to
construct a logicist model to meet these requirements.

The need to deal with first person concerns. One possible subgoal for an
investigator might be reconstructing the mechanics of the accident. A mathe-
matical model could be developed in terms of such factors as the mass, position,
velocity, acceleration, braking efficiency of the trains and friction and gradient
in the environment. In this model, agency would manifest itself as changes in
acceleration due to manipulation of the throttle and brake.
An alternative model might be aimed at reconstructing the sequence of sig-
nificant events. This could be built around an analysis of the protocols for inter-
action between the signalmen and the drivers, e.g. using a mathematical model
for concurrency such as process algebra or calculus. Such a model would register
the communications between the human agents as abstract events, and enable
their possible patterns of synchronisation to be analysed.
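
By way of a hedged illustration of what such an event-level model captures, the sketch below replays the communications of Box 2 as an abstract trace (the event names are invented paraphrases of the account above) and tracks tunnel occupancy; it records the unsafe interleaving, but says nothing about flags, fatigue or what driver Scott believed.

    # Abstract event trace of the Box 2 account; names are invented paraphrases.
    trace = [
        ("train1_enters_tunnel", +1),   # signal fails, alarm rings, K sends OCCUPIED
        ("train2_enters_tunnel", +1),   # passes the defective signal; K waves the red flag
        ("train1_emerges",       -1),   # B sees this and answers CLEAR to IS LINE CLEAR?
        ("train3_enters_tunnel", +1),   # K, trusting CLEAR, waves the white flag
    ]

    occupancy = 0
    for event, delta in trace:
        occupancy += delta
        print(f"{event:22s} trains in tunnel: {occupancy}")
    # Ends with two trains in the tunnel, although the protocol as operated reported CLEAR.
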
From each perspective, the result is a self-contained closed-world model of
the accident. That is to say, both models can be developed to the point where,
relative to their subgoal, there is apparently no need to make further reference to
the physical context in which the accident took place. In accounting for the crash,
the mechanical model can give insight into the influence of technological factors
and perhaps supply objective information about the train drivers’ actions. The
protocol model can likewise clarify what communication took place, and help to
assess its significance.
In practice, both perspectives are too deficient in psychological terms to be
helpful to an accident inquiry in making judgements about responsibility. Both
models create objective “third person” accounts that help to clarify exactly what
an external observer might have seen, and put this observation in the context of
other possible scenarios. Neither gives us insight into how the experiences of the
human agents and the physical embodiments of mechanical agents contributed
to the accident.
To construct a logicist model that is adequate for understanding the railway
accident would certainly require more sophisticated mathematics. What form
should such a model take? It would have to model agents so as to take sufficient
account of mechanics and how communication between agents is synchronised. It
would also have to characterise the interactions between agents in propositional
terms in a way that took sufficient account of psychological factors.

The need to deal with provisional and unreliable insight. Understanding
the railway accident involves construing the disaster in the context of day-to-day
operation of the railway system. This process of construal has no counterpart in
the context of the IQ test above. The Mensa problem is posed with a particular
construal in mind: we have to assume that the Captain and Fred act on the basis
of inference, oblivious to other commonsense factors, such as personal knowledge
Fred might have of the Captain’s family. In contrast, the construal of the accident
admits no ‘right answer’. In this respect, it is far more representative of the
challenge to knowledge representation involved in intelligent system design.
In considering how the accident is construed, it is important to recognise
the scientific and cultural prejudices that can operate. It may appear obvious
that we know what agencies and observables need to be taken into account. It
is presumably irrelevant whether Killick had a moustache, or wore a red shirt,
or that it was Sunday rather than Saturday morning. From an empiricist stand-
point, such concerns are not to be absolutely dismissed. Had Killick worn a red
shirt, it might have prevented the driver from spotting the red flag in time to
give an acknowledgement. There were doubtless many people who speculated
on whether the disaster was an act of divine retribution on those who sought
pleasure or conducted work on the Sabbath.
It may be argued that the operation of the railway system was predicated
on a particular construal, but this in itself is no justification for adopting the
same construal in analysing the accident. No doubt the conduct of people during
the Great Plague of London was guided to some degree by a construal of how
disease spread and could be avoided. Contemporary medical knowledge leads us
to analyse the events retrospectively from an entirely different perspective.
It may be tempting to suppose that modern science can be made adequate to
the task of construing the entire context of the accident in closed-world terms.
Extreme forms of logicism seem to combine a narrow reductionism with a blind
faith in the power of propositions to frame the world. It is clear that ergonomic
issues to do with human interaction played a part in the Clayton Tunnel Disaster,
but our insight into such issues is even now far from being a science fit for the
logicist. Nor does the success of modern railways require any such insight: it
relies on confining railway system operation to territories of knowledge that are
empirically safe.

The need to deal with the particular context. In considering the accident
scenario, it is often necessary to speculate on the precise characteristics of the
environment for the accident. Sufficient detail has been retained in the account
of the accident given above to convey the impression of the richness of the
context surrounding the crash. The trains apparently leave Brighton just a few
minutes apart; Killick is fatigued; the trains are heavy; it is a Sunday morning.
These details may or may not be relevant. Whatever details we include, it seems
that language cannot do justice to what we need to know when we probe the
circumstances of the accident.

Did Killick have to leave the cabin in order to wave the flag? What was the
exact distance between the signal and the cabin, and how much longer would it
have to have been for Scott to see the flag? Was Scott supposed to acknowledge
seeing the flag? Did his train have a whistle? All these issues require reference
to the real situation, and are concerned with the specific characteristics of the
particular time and place.
The explanation of particular events can also invoke observables in ways that
cannot be preconceived. In the particular scenario of the Clayton Tunnel crash,
the signalman needed to know whether the driver—several hundred yards away
in the tunnel—had seen the red flag. Perhaps other accident scenarios in which
there was no violation of agreed practice would throw up different examples of
rogue observables that were never considered by the designers of the protocols
or the pioneers of railway and communications technology.
From the above discussion, modelling in the logicist tradition is seen to be
intimately connected with identifying contexts in the world that are stable with
respect to preconceived patterns of interaction. Validating that such a context
has been identified is a pragmatic and empirical matter about which no absolute
guarantees can be given. The observables that feature in these worlds, though
not necessarily statically predetermined, have to come and go according to pre-
conceived patterns. The agents that operate in these worlds must perform their
actions in a manner that respects preconceived integrity constraints. These are
the characterisations of closed worlds and circumscribed agency.

3 Empirical Modelling
The preceding discussion argues the need for an alternative to logicism as a
framework for modelling. Accident investigation demands something other than
closed-world modelling. In particular, it suggests a specific agenda: modelling
from a first-person perspective, with partial and provisional knowledge, and
with reference to a specific context. To respect the need to consult the world
in the process of model-building, the modelling process should also be situated:
it should take place in or as if in the context of the situation to which it refers.
Empirical Modelling, here introduced and illustrated with reference to the Clay-
ton Tunnel Accident scenario, has been conceived with this agenda in mind.

3.1 Orientation
The context for the Empirical Modelling Project is supplied by what Brödner [28]
has identified as a conflict between two engineering cultures:
One position, . . . the “closed world” paradigm, suggests that all real-
world phenomena, the properties and relations of its objects, can ulti-
mately, and at least in principle, be transformed by human cognition
into objectified, explicitly stated, propositional knowledge.
The counterposition, . . . the “open development” paradigm . . .
contests the completeness of this knowledge. In contrast, it assumes the
primary existence of practical experience, a body of tacit knowledge
grown with a person’s acting in the world. This can be transformed into
explicit theoretical knowledge under specific circumstances and to a prin-
cipally limited extent only . . . Human interaction with the environment,
thus, unfolds a dialectic of form and process through which practical
experience is partly formalized and objectified as language, tools or ma-
chines (i.e. form) the use of which, in turn, produces new experience (i.e.
process) as basis for further objectification.
This conflict has both abstract and practical aspects and significance.
Brödner attributes “huge productivity problems and failures of AI attempts”
to the dominant influence of the closed world paradigm, and adds that “what
appears to be a philosophical struggle turns out to be of the highest practical
relevance”. The conflict is not confined to the “neat vs. scruffy” debate in AI.
It is also manifest in Computer Science as a tension between principles and
pragmatism that is a source of several unresolved controversies: declarative vs.
procedural programming; relational vs. object-oriented databases; formal vs. in-
formal methods of software development.
Three key problems, drawn from different areas of computing, have had a
seminal influence on our research:

Is there a universal framework for multi-paradigm programming?


Birtwistle et al. (1967) [26] introduced the object abstraction to represent program-
ming as a form of modelling. Backus (1979) [6] argued the need for a history-
sensitive mode of programming with the virtues of declarative programming.
Neither programme has generated an entirely satisfactory programming style
and their objectives do not seem to be convergent. The modern agenda for com-
puting has to address paradigms for more general applications such as parallel
programming, end-user programming and visual programming. This concern is
reflected in trends towards varieties of agent-oriented programming [77,61] and
the use of spreadsheets to aid interaction and interpretation in environments for
end-user programming [62].

What principles are needed to address complex systems engineering?


Brooks [29] expresses scepticism about most of the current techniques to sup-
port the development of large software systems and contends that we have yet
to understand the essence of the problem. Formal methods (such as Chandy and
Misra [36]) are effective for closely circumscribed problems. Cohen and Stew-
art [38] identify fundamental limitations that are encountered in rigorous math-
ematical modelling for complex systems. Fashionable pragmatic approaches to
software development in an object-oriented idiom (such as Rumbaugh [68]) at
some point have to make an uneasy transition between objects as real-world
representations and as programming abstractions. Harel’s response to Brooks’s
challenge [50,49] invokes the computer both as machine (in commending for-
mal operational semantics) and as instrument (in advocating the use of visual
formalisms).

What paradigm for data modelling can support modern applications?


Kent [54] devotes an entire book to the problems that beset the classical database
models, and tentatively concludes that there is probably no adequate formal
modelling system for representing information on computers. Codd’s relational
model [37] offers a formal approach that has had a profound impact on com-
mercial database systems. New requirements (knowledge-based systems for de-
sign and Integrated Project Support Environments [33], databases for graphics
and multi-media, interfaces via direct manipulation, spreadsheets or constraint
techniques [40]) have exposed the limitations of the pure relational model. The
conflict of cultures pervades the current controversy [5,39,72] concerning the rel-
ative merits of relational and object-oriented database models. This controversy
highlights the need for alternative methods of modelling that associate form and
content in new ways.
In each of these problem areas, there is controversy surrounding formal and
pragmatic approaches. Our thesis is that fundamental progress in solving these
problems can be made only by resolving Brödner’s conflict of cultures, developing
fundamental principles to complement the closed-world culture. This motivates
a radical change in perspective on computer-based modelling.
The Empirical Modelling Project combines abstract investigations and schol-
arship with practical development of software tools and case studies. EM is a
proposal for modelling in an open development paradigm that has emerged from
our extensive investigation of principles and case-studies directed at solving the
three key problems. The choice of the epithet empirical is suggested by the fact
that features of a model are typically determined incrementally in the manner
of experimental science, and that circumscribed closed-world models can only
be derived through explicit acts of commitment on the part of the modeller.
Over the last ten years, well over a hundred students have had experience of
EM, of whom many have contributed to the research through project work at
both undergraduate and postgraduate level. The scope of EM is indicated by
the diversity of the notations and software tools we have developed, by the wide
range of case studies in modelling that have been addressed and by the many
areas of application represented. It is this empirical evidence that informs the
discussions which follow.

3.2 Empirical Modelling Principles


The main principles and tools of EM will be discussed and sketchily illustrated
with reference to the Clayton Tunnel railway accident. This model is a case-
study currently under development by Pi-Hwa Sun, a research student in the
Empirical Modelling research group. Details of the tools and notations used to
construct the model are omitted, and the emphasis is on the conceptual processes
surrounding its construction. For more technical details, the interested reader
may consult the EM website and other references cited in [4,9].
EM is concerned with representing the processes that lead to the discovery
of concepts. It differs from a logicist approach in its emphasis upon how con-
cepts are discovered in a psychological sense (cf. [55]). In EM, the discovery
process relies upon embodiment in an essential way, and artefacts are seen as
indispensable for its representation. The experiential intuitions that inform the
construction of such artefacts are here described informally. Practical experience
is perhaps the best way to gain a deeper appreciation of EM principles.
The important intuitions on which EM draws are the experience of momen-
tary state (as in “the current situation”), and that of an identifiable pattern of
state transitions (as in “a phenomenon”). In the context of the Clayton Tunnel
illustration, Figure 1 depicts a particular situation. A phenomenon might be “a
train passing through the tunnel”; another might be “a train approaching the
tunnel whilst the alarm is ringing”. In EM, an artefact is used to model ex-
perimental interaction in a situation, with a view to identifying and construing
phenomena associated with this situation.
Construal in EM is relative to the egocentric perspective of a particular agent.
Whereas most computational modelling is aimed at realising a system behaviour,
the primary focus of EM is on modelling the way that an agent’s construal
of a situation develops and how subsequently the conception of a system may
emerge. The computer model serves to represent a situation, and transformations
associated with the contemplation of this situation. In this context, the computer
is being used not to compute a result but to represent a state metaphorically,
in much the same way that a physical artefact (such as a scale model, or VR
reconstruction of a historic building) can be used as a prototype. The term
‘computer artefact’ is used to convey this emphasis.
The interpretation of computer artefact adopted here is unusual, and merits
amplification. It derives from inviting the human interpreter to view the com-
puter as a physical object open to interaction, observation and experiment in
abstractly the same way as any other physical object in our environment. Such
a view contrasts with the conception of a computer as negotiating input and
output according to a preconceived schema for interpretation, in order to perform a
preconceived function. This contrast is much sharper than is suggested simply
by considering what are often termed the non-functional aspects of the computer
operation, such as speed, user convenience and visual effect. The computer arte-
fact is experienced without reference to specific function, and its state is not
to be conceived as meaningful only in relation to a predefined abstract pattern
of behaviour (e.g. as in the states of a finite state machine). The meaning and
significance of the state of the artefact is instead to be acquired through a prim-
itive process of conflating experiences of the artefact and of the external world
(cf. the blending to which Turner refers [73,74]). In this negotiation of meaning,
there is no necessary presumption that transitions between states in the artefact
reflect familiar objective external behaviours. Rather, like a physical object, the
artefact manifests itself in its current state, and my conception of this state is
informed by my previous experience, expectations and construal of the situa-
tion. By this token, changes to the state of the computer artefact reflect what
the human observer deems to be the case: for instance, that one-and-the-same
object is now in a different state, or that I now take a different view of this
one-and-the-same object.
The framework for construal in EM can be illustrated with reference to an
investigation into the Clayton Tunnel Accident. In interpreting the operation
of the railway system in the vicinity of the Clayton Tunnel, the investigator
will need to identify many different agents and construe the system from their
perspectives. The most important perspective is that of an external observer.
Initially, the focus is upon how interaction with the model is shaped so as to
imitate the experience that an investigator can or in principle might be able to
get from conducting experiments in the context of the actual railway system. It
will subsequently be clear that the same principles that guide the investigator
in developing a construal can also be applied to its constituent agents.
The idea of ‘contemplating a particular situation’ is illustrated in Figure 1.
It is appropriate to think of the investigator as engaged in situated modelling, so
that the computer artefact depicted in Figure 1 is placed in the actual environ-
ment it is meant to represent. The modelling activity is intended to ensure that
the current state of the computer artefact is a good metaphor for the current
situation as conceived by the designer. The criterion for goodness is part of the
EM concept. What matters is that there is a perceived similarity between the
artefact and its referent, a similarity that is recognised through observation and
interaction with both (cf. Dreyfus’s view [43] that human cognition works by
‘similarity recognition’).
Realism and real-time modelling are not the significant issues. The principal
concern is whether observation of the system can be successfully construed: is
there in the abstract a way to account for any observed and conceivable changes
in state in the system? By way of illustration, Newtonian mechanics can be an
excellent way to construe the motion of a train, but it does not of itself deliver
a photorealistic real-time animation. In EM, the role of the artefact is to model
the way in which the investigator’s construal of the system evolves. Changes in
the state of the system and changes in the investigator’s view of the system are
equally significant. Changes of both types are represented in the artefact, and
are only distinguished through interpretation. For example, a change to the state
of the artefact depicted in Figure 1 could reflect a change in the situation (e.g.
the movement of a train), or a change in the investigator’s understanding (e.g.
the realisation that the location of the signal was inaccurately recorded, that
the resetting of the signal was influenced by the weight of the train, or that the
colour of Killick’s shirt needed to be taken into account).

Fundamental abstractions for EM from a psychological perspective.


EM offers certain principles that guide the analysis of the real-world situation
and the construction of its metaphorical representation. The psychological plau-
sibility of these principles is important. Many issues raised by Kirsh in [55] are
significant here, and these will be highlighted in the exposition. For instance, in a
convincing account of intelligence, the identification of an object—in the generic
or particular sense—cannot be taken for granted. There should be a difference
between regarding an agent as having ‘a symbol in a declarative’ and assuming
it to have a concept. A psychologically convincing account of knowledge must
offer principles for determining how we extend our concepts to new domains.
EM does not attempt to address the explicit mechanisms by which conceptual
developments of this nature are shaped in a human agent’s mind. For instance,
no consideration is given to the kind of learning processes that are described
by neural networks, and might conceivably be persuasive models of brain func-
tion. EM simply acknowledges the fact that objects come to be recognised, that
concepts are developed and that connections between different situations are
established as a result of repeated observation, interaction and experiment. The
aim is to develop computer artefacts that can represent the implications of these
processes faithfully.
The most elusive but fundamental aspect of the EM approach is its emphasis
on modelling a state or situation. This is not to be interpreted as referring to
abstract computational state, but to something resembling a ‘state of mind’ that
derives its meaning from a relationship between a human agent and an external
focus of interest and attention. This emphasis accords with Kirsh’s concern about
the embodiment of cognitive skills: “The real problem must be defined relative
to the world-for-the-agent. The world-for-the-agent changes despite the world-
in-itself remaining constant.”. Two of the most important aspects of capturing
states of mind are:
– respecting the exceptionally rich and often unexpected associations between
different situations in transitions between states of mind
– faithfully registering what is directly apprehended as opposed to what might
in principle be accessible.
By way of illustration, an accident investigator might well conceive all kinds
of variants of the situation in Figure 1 from within one state of mind. Suppose
that the brakes on Train 3 had failed, that Killick had mislaid the red flag,
that Train 1 had broken down in the tunnel, that Train 2 had whistled as it
reversed from the tunnel, that a different protocol or different railway tunnel
had been involved. The significant feature of these variants is the looseness of
their relationship to each other: they are not necessarily part of one and the
same passage of observation of the railway system (e.g. “the brakes did not
fail”); they may not be possible behaviours of the actual system (e.g. “Train 2
was not equipped to whistle”); they can involve transposing events into a totally
different context.
The importance of correctly associating observations within one state is also
illustrated in the accident scenario. The synchronisation between displaying and
seeing the red flag matters crucially. ‘Seeing the red flag’ and ‘recognising the
potential hazard’ ahead are indivisibly linked in Scott’s experience. The analysis
of the accident would be influenced if this communication of danger were em-
bodied in a different way (cf. “stop if the number displayed on the signalman’s
card is prime”).
There are teasing philosophical issues to be addressed in this connection.
What is objective and subjective about the synchronisation of agent actions?
In [69], Russell poses a conundrum that concerns establishing the time at which
a murder on a train was committed from valid but inconsistent testimony about
synchronisation of events by observers on and off the train. What is the dis-
tinction between percept and concept? The psychological subtlety of this issue is
well-illustrated by this extract from Railway Regulations of 1840 [67]: “A Signal
Ball will be seen at the entrance to Reading Station when the Line is right for
the Train to go in. If the Ball is not visible the Train must not pass it.”. Such an
injunction to respond to what is not perceived only makes sense in the context
of an expectation that the ball might be seen.
An appropriate philosophical perspective for EM will be considered later. In
practice, EM takes a pragmatic stance. Where a logicist model has to address
the matter of inconsistency and incompleteness of knowledge explicitly, if only
by invoking meta-level mechanisms, EM aims at faithful metaphorical represen-
tation of situations as they are—or are construed to be—experienced. There is
no expectation that EM should generate abstract accounts of phenomena that
are complete and self-contained. In resolving singularities that arise in inter-
preting its artefacts, there is always the possibility of recourse to the mind that
is construing a phenomenon, and to further experimental investigation of the
phenomenon itself.

Basic concepts of EM: construing phenomena. The basic concepts of EM
are observable, dependency and agency. In the first instance, it is essential to
interpret these concepts as egocentrically defined: they are the elements of a
particular agent’s construal of its experience, and are appropriately described
with reference to personal commonsense experience.
An observable is a characteristic of my environment to which I can attribute
an identity. An observation of an observable returns a current value. ‘Current
value’ here refers to the value that I would “as of now”—that is to say, in
my current state of mind—attribute to the observable. An observable may not
always be present, but may disappear, and perhaps later return.
The state of the world for me, as of now, is represented by a collection of
observables with particular values. Observables can be physical or abstract in
nature: the corner of the table, the volume of the room, the status of my bank
account, my ownership of a house.
I might be able to see an observable, or sense it directly in some other fashion;
I might have to perform an experimental procedure to determine its current
value, or consult an instrument; I might need to invoke social or legal conventions;
I might need to use an acquired skill.
Observables are organised in my experience because they are present and
absent at the same time, as potential agents, and because their values are cor-
related in change, through patterns of dependency. Dependency patterns are
fundamental to the perception and recognition of observables, and determine
when they can be deemed to have integrity as an object. Dependency relations
need not respect object boundaries.
Observables, dependency and agency are the focus for two activities: an anal-
ysis of my experience, and the construction of a computer artefact to represent
this experience metaphorically.
In analysing my experience, I adopt a stance similar to that of an experi-
mental scientist. Repeated observation of a phenomenon leads me to ascribe
identity to particular characteristic elements. To some extent, this attribution
stems from the perceived continuity of my observation (e.g. this is the same key-
board that I have been using all the while I have been typing this sentence), but
it may stem from a more subtle presumption of conjunction (e.g. this is the same
keyboard I was using last week, though I have not been present to confirm this),
or another conceptual continuity (as e.g. when I have bought a new computer:
that was and this is my keyboard). The integrities that can be identified in this
way are observables.
Because the characterisation of observables in EM is experiential and em-
pirical, it is open to a much wider interpretation than a conventional use of
the term. When driver Scott sees the red flag, there is no physical perception
of the danger of entering the tunnel—indeed, there is no immediate physical
danger to be perceived. Nonetheless, the context for displaying the red flag has
been established indirectly with reference to expected experience. Danger, of
itself invisible, even absent, is present as a conceptual observable concomitant
with the red flag. To construe the accident, the investigator must take account
of the fictional obstruction in the tunnel that Scott infers when the red flag
is seen. And, to deconstruct Scott’s concept yet more comprehensively, though
Scott could not see even a real obstruction in the tunnel, yet an extrapolation
from his recollected experience potentially traces the path from the mouth of
the tunnel to the point of collision with this invisible imaginary obstacle.
The idea of dependency is illustrated in the concomitance of ‘red flag’ and
‘danger’ as observables. Other examples of dependencies include: the electrical
linkage between the telegraphs, whereby the state of a button in one signal box
is indivisibly coupled to the state of a dial in another, the mechanical linkage
that enables Killick to reset the distant signal, and the mechanism that causes
the alarm to sound whilst the signal has not yet been reset.
Dependencies play a very significant part in the construal of a phenomenon.
They are particularly intimately connected with the role that invoking agents
plays in accounting for system behaviour. A dependency is not merely a con-
straint upon the relationship between observables but an observation concerning
how the act of changing one particular observable is perceived to change other
observables predictably and indivisibly. This concept relies essentially upon some
element of agency such as the investigator invokes in conducting experiments—
if perhaps only “thought experiments”—with the railway system. In empirical
terms, dependency is a means of associating changes to observations in the sys-
tem into causal clusters: the needle moved because—rather than simply at the
same time as—the button was pressed.
In investigating a phenomenon, dependency at a higher level of abstraction
associates clusters of observables into agents that are empirically identified as
instigators of state-change. In a commonsense interpretation of the railway sce-
nario, the agency is taken so much for granted that it may seem perverse to probe
its psychological origins, but there are good reasons to do so. The observables
that I—in the role of external observer—introduce in construing a phenomenon
may enable me to describe the corporate effect of many interacting agents, but
there is a proper distinction to be made between my observables and theirs.
There will also typically be certain actions that cannot be attributed to any
identifiable agent (e.g. “acts of God”, such as a landslide in the tunnel). My
status as observer is reflected in the passive mode in which my construal is ex-
pressed in terms of observed actions, possibly attributed to agents, and their
perceived effect upon the system state.
My construals are potentially personal, subjective and provisional. What
I understand to be the current state will change subject to what kind of phe-
nomenon I am investigating. Judgements about observables, dependency, agency
and integrity are pragmatic and empirical matters, about which I can presume
no absolute knowledge. By way of illustration, the status of observables asso-
ciated with trains that have been involved in an accident is obscure. For some
purposes (spreadsheet update, timeless experiment), I may be uninterested in
how long it takes for the current value to be registered or to be determined. De-
pendencies amongst observables in the current state reflect the character of my
interaction with the environment: e.g. ‘nothing’ can intervene in the updating
of a spreadsheet; in a certain context buying a house and signing a document
are indivisible by convention; a vehicle will knock me over whether or not I can
experimentally determine its exact speed ‘in time’.
The above characterisation of construals is the central abstract contribution
of EM, to be discussed in a broader philosophical context below. The central
practical contribution of EM concerns the construction of artefacts that are
intimately connected with developing construals. This is the focus of the next
subsection.

Basic concepts of EM: constructing artefacts. In EM, construals cannot
be adequately represented using a formal language: they must be represented
by physical artefacts. This representation relies on the perceived correspondence
between states and interactions with the artefact, as mediated by its own ob-
servables, and those associated with the situation to which it refers.
In practice, the process of construing phenomena is closely bound up with
constructing artefacts of just this kind. Construal and artefact construction are
symbiotic processes that are interleaved and may even be conflated. Devices that
are used to demonstrate the integrity of an observable (e.g. an electrical current)
evolve into devices that can associate a value with an observable (e.g. an amme-
ter), and then become themselves an integral part of a system (e.g. a dynamo-
driven anemometer in an aircraft). As these historical examples illustrate, not all
artefacts used in the process of construal have been computer-based, but their
construction has been restricted by the recalcitrance of physical objects. The vi-
ability and topicality of EM stem from the fact that modern computer-related
technology can be the basis for artefacts whose characteristics are no longer so
tightly constrained.
Construal in EM can be viewed as associating a pattern of observables, depen-
dencies and agents with a given physical phenomenon. EM techniques and tools
also serve a dual role: constructing physical artefacts to realise given patterns
of observables, dependency and agency. A key role in this construction process
is played by dependency-maintenance that combines the updating mechanism
underlying a spreadsheet with perceptualisation. One technique for this involves
the use of definitive (definition-based) notations [24].
A definitive notation is used to formulate a family of definitions of variables
(a definitive script) whose semantics is loosely similar to the acyclic network of
dependencies behind the cells of a spreadsheet. The values of variables on the left-
hand side in a definitive script are updated whenever the value of a variable that
appears on the right-hand side is updated. This updating process is conceptually
atomic in nature: it is used to model dependencies between the observables
represented by the variables in the script. A visualisation is typically attached to
each variable in a script, and the visual representation is also updated indivisibly
when the value of the variable changes. Definitive notations are distinguished by
the kind of visual elements and operators that can be used in definitions.
Definitive scripts are a basis for representing construals. In typical use, the
variables in a script represent observables, and the definitions dependencies.
A script can then represent a particular state, and actions performed in this
state can be represented by redefining one or more variables in the script or by
introducing a new definition.
The use of two definitive notations in combination is illustrated in Figure 1.
One notation is used to define the screen layout and textual annotations, the
other to maintain simple line drawings. By using such notations, it is easy to
represent the kinds of dependencies that have been identified above. For instance,
the dial displays occupied whilst the appropriate button is depressed.
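To convey the flavour of this dependency-maintenance more concretely, the following is a minimal sketch in Python of the spreadsheet-like semantics just described. It is an illustrative reconstruction only: the class name Script, the observable names button_pressed and dial, and the propagation scheme are assumptions made for the sketch, and it does not reproduce the actual Eden or definitive-notation code used in the Clayton Tunnel model.

    # Minimal sketch of spreadsheet-like dependency maintenance (assumes an
    # acyclic dependency network, as in a definitive script).
    class Script:
        def __init__(self):
            self.values = {}        # current value of each observable
            self.definitions = {}   # observable -> (formula, source observables)

        def define(self, name, formula, sources=()):
            """Introduce or replace a definition; dependents are updated indivisibly."""
            self.definitions[name] = (formula, tuple(sources))
            self._update(name)

        def redefine(self, name, value):
            """Model an action by an agent or the modeller: redefine an observable."""
            self.define(name, lambda: value)

        def _update(self, changed):
            formula, sources = self.definitions[changed]
            self.values[changed] = formula(*[self.values[s] for s in sources])
            for name, (_, srcs) in self.definitions.items():
                if changed in srcs:          # propagate along the dependency network
                    self._update(name)

    s = Script()
    s.redefine("button_pressed", False)
    # the dependency noted above: the dial reads 'occupied' whilst the button is depressed
    s.define("dial", lambda b: "occupied" if b else "clear", ["button_pressed"])
    s.redefine("button_pressed", True)   # the signalman presses the button ...
    print(s.values["dial"])              # ... and the dial indivisibly shows 'occupied'

In such a sketch, a redefinition plays exactly the role of an action performed in the represented state, whether that action is attributed to an agent in the scenario or to the modeller.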
If a phenomenon admits an effective construal in the sense introduced above,
we can expect counterparts of the transitions that are conceived in exploring
the system to be realisable by possible redefinitions in the artefact. In practice,
the possible redefinitions do not respect semantic boundaries. For instance, in
Figure 1, they may relate to modifying the visualisation (e.g. using a dotted
line to represent the track in the tunnel), emulating the actions of agents in
the scenario (e.g. resetting the signal), or fantasising about possible scenarios
(e.g. changing the location of the signal). This accords with the view of the
investigator as resembling an experimental scientist, who, within one and the
same environment, can select the phenomenon to be studied, decide upon the
viewpoint and procedures for observation, adjust the apparatus and develop
instruments.
In practical use, a definitive script can be used judiciously so that all inter-
action is initiated and interpreted with discretion by the human investigator.
For the script to serve a richer purpose than that considered in Naur’s account
of constructed models [63], there must be interaction and interpretation that is
not preconceived. Ways of framing particular modes of interaction that are less
open-ended are nonetheless useful. For example, the actions that are attributed
to agents need to be identified, and the different categories of action available
to the investigator discriminated.
A special-purpose notation, named LSD, has been introduced for this pur-
pose. (The LSD notation was initially motivated by a study of the Specifica-
tion and Description Language SDL—widely used in the telecommunications
industry—hence its name.) The manner in which an agent is construed to act is
declared by classifying the observables through which its actions are mediated.
This classification reflects the ways in which real-world observables can be ac-
cessed by an experimenter. Certain observables can be directly observed (these
are termed oracles), some can be changed (handles), but this change is subject
to observed dependencies (derivates) and is generally possible or meaningful pro-
vided that certain conditions hold (such conditional actions are expressed as a
protocol that comprises privileges to act). It may also be appropriate for a con-
strual to take account of attributes associated with the experimenter (states).
For instance, the status of certain observations and actions may be affected by
the experimenter’s location.
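By way of illustration only, an LSD-style account of the signalman might classify his observables roughly as follows. The concrete observable names and guards are assumptions made for this sketch, which is written as a plain Python structure rather than in LSD syntax and does not reproduce the actual LSD account used in the Clayton Tunnel study.

    # Approximate, illustrative classification of the signalman's observables in
    # the spirit of an LSD account; not actual LSD syntax.
    signalman = {
        "state":    ["location"],                          # attributes of the agent itself
        "oracle":   ["alarm_ringing", "needle_reading",    # observables he can consult
                     "train_at_tunnel_mouth"],
        "handle":   ["signal_setting", "flag_shown",       # observables he can change
                     "telegraph_button"],
        "derivate": ["alarm_ringing = train_passed_signal and not signal_reset"],
        "protocol": [                                      # privileges to act, with guards
            ("train_at_tunnel_mouth and not signal_reset", "flag_shown = 'red'"),
            ("location == 'cabin'",                        "telegraph_button = 'pressed'"),
        ],
    }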
An LSD account of an agent can be used in a wide variety of contexts. It
can represent what I personally can observe and change in a given situation.
Alternatively, it can express what I believe to be the role of an agent other than
myself, either from my perspective or from its own. In an appropriate context, it
can also be used to specify an agent’s behaviour (cf. the LSD Engine developed
by Adzhiev and Rikhlinsky [3]). These three perspectives on agency are discussed
in more detail in section 4.
When construing a complex phenomenon, the presence of several agents leads
to potential ambiguity about which perspective is being invoked. For this rea-
son, LSD accounts do not necessarily lead directly to operational models of
phenomena. It is not in general possible to develop a faithful computer model
of behaviour that can be executed fully automatically; the intervention of the
modeller in the role of super-agent is needed to emulate non-deterministic in-
teraction, to resolve ambiguity about the current state of the system, and to
arbitrate where the actions of agents conflict. Special tools have been developed
for this purpose: they include the Abstract Definitive Machine (ADM) [21], and
the distributed variant of the Eden interpreter [23] that has been used to generate
Figure 1.

3.3 Characteristics of the EM Construal Process

There are many important respects in which the principles of EM, as described
above, engage with the fundamental issues raised by Kirsh in [55]: it is first-
person centred, it is not primarily language-based, but experientially-based; it
involves embodied interaction and experiment; it addresses conceptualization in
psychological terms; it is concerned with intentionality and meaning rather than
logical consequence.
As Kirsh remarks: logicists see “inventing conceptualizations” and “ground-
ing concepts” as modular. The most fundamental shift in perspective in EM con-
cerns the nature of the relationship between the artefact and the phenomenon it
represents. To say that the artefact metaphorically represents the phenomenon
suggests an abstract conceptual correspondence in the spirit of Forbus, Gen-
tner [44,46], Campbell and Wolstencroft [35]. What is entailed is quite differ-
ent in character: a correlation between two experiences, one of which is gained
through experiment in the world, and the other through experimental redefi-
nition in the script. The presence of dependencies between observables is the
psychological mechanism by means of which this correlation leads to a perceived
correspondence between observables in the artefact and in the world.
Notice that this process can only be appreciated from a first-person perspec-
tive. Only I have simultaneous access to experience of the artefact and the world.
This accords with the account of metaphor that is given by Turner in [74]. In
its most primitive terms, metaphor is a blending of two experiences within one
mind, not the abstract process of establishing a correspondence between abstract
structure that is analysed in depth in [35].
In combination, dependency and interaction expose identities and drive the
conceptualization process. The way in which pictorial elements in Figure 1 are
linked through dependencies in the script is crucial in being able to connect
them with the world. What is more, the psychological process of making these
connections depends upon being able to exercise a powerful form of agency: that
of being able to perform or invoke actions similar to those that an experimenter
might perform to test hypotheses about the identity and status of observables (cf.
Smith’s remark [70] that “agents are what matter for semantical connection”).
It is to be expected that pressing the button affects the dial; that applying the
brake will slow the train down; that setting the signal to caution will cause
the driver to brake. Such experiments lead us to introduce new concepts and
observables: for example, to recognise the need to consider the precise point at
which the engine triggers the treadle to reset the signal, and to introduce a visual
counterpart to the artefact.
A proper appreciation of the experiential basis of the correspondence between
computer artefact and real-world referent underlies the distinction between a
definitive script and a family of predicates. It is only by virtue of this relationship
that it makes sense to regard the variables in a script as referring directly to
particular external observables, and the script itself as representing a specific
state. In this way, EM addresses Kirsh’s concern [55]: that model theory is a
theory of logical consequence, not of intentionality or meaning, and that it does
not single out one model or family of models as the intended models. In keeping
with the proposals of Smith [70], the relationship between form and content
in EM is altogether more dynamic and more intimate than in model theory.
Sculpting the form-content relation that binds the artefact to its referent is an
integral part of the EM process.
In conjunction with its LSD interface, a definitive script is well-suited to
modelling the environment in terms of actions that are afforded, and agents’
dispositions to behave. In this context, the use of dependency-maintenance in
mediating the effects of agent actions to other agents and to the external observer
is very significant—it circumvents the “ugly semantics” that, as Kirsh observes,
stem from the fact that “in stating the role an informational state plays in a
system’s dispositions to behave we characteristically need to mention myriad
other states” [55].
It is also possible to see how EM can address “tuning the perceptual system
to action-relevant changes” both from the perspective of the investigator, and
from that of other agents, though practical development of EM is still required to
realise its full potential in this respect. This is a strong motivation for developing
tools to support the design of artefacts that provide an interface that makes bet-
ter use of human perceptual and manipulative skills. Issues of this nature arise
in designing definitive notations. For instance, in devising a definitive notation
for geometric modelling, only an empirical evaluation can determine what sort
of functional dependencies in a script are best suited to giving the user con-
trol of geometric features. Exploring the potential of computer-based technology
in constructing instruments in this way is another area of application for EM
techniques (cf. [47]).
A framework that deals with such experiential matters effectively must still
respect the important caveats about displacing logical concepts from an ac-
count of intelligence that are also raised in Kirsh [55]. For example: how can
the physical fact that the pen is on the desk be seen as the structured facts
|the pen|^|is on|^|the desk|? how can ‘the pen is on the desk’ and ‘the
pen is matte black’ be seen to entail ‘the matte black pen is on the desk’ ?
Within the EM framework, it is possible to record conceptual relationships
that, though they cannot be explicitly perceived, express the expected results
of experimental procedures. Naively, the object is a pen because you can write
with it; it is on the desk because it can be lifted from the desk but moves with
the desk in a characteristic way. The informality of such criteria does not perturb
the empiricist, for whom such assertions enjoy no absolute status of truth in a
closed world. The pen may be replaced by a pen-shaped sculpture that has been
welded to the table when my back is turned. Interpreting a red flag as a sign of
danger illustrates how such conceptual mechanisms can be used.
As for conjoining observations made in the same state, this is the most com-
monplace activity in constructing a definitive script. To illustrate that yet more
sophisticated logic sometimes has to be invoked in scripts, it is only
necessary to consider the complexity of the interlocking mechanisms that were
constructed in the railways of the early twentieth century. To represent the in-
divisible relationships between levers, signals and track points in such a system
would involve intricate logical dependencies. Similar considerations apply to the
general mathematical functions that can arise on the right-hand side of a defini-
tion. The process of surveying land was radically transformed by the discovery
and tabulation of trigonometric functions that could be used to reduce the num-
ber of explicit measurements needed to make a map.
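For instance, a single trigonometric dependency of this kind can, under assumed measurements, replace a direct measurement altogether; the figures in the following sketch are invented purely for illustration.

    # Illustrative only: distance to a landmark derived from a measured baseline
    # and two sighted angles by the sine rule, with no direct measurement of it.
    import math
    baseline = 1000.0                                        # metres, measured directly
    alpha, beta = math.radians(62.0), math.radians(55.0)     # angles sighted from each end
    gamma = math.pi - alpha - beta                           # angle subtended at the landmark
    distance = baseline * math.sin(beta) / math.sin(gamma)   # sine rule
    print(round(distance, 1), "m")                           # distance from the first station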
What is not in keeping with EM principles is the indiscriminate use of log-
ical and functional abstractions that is illustrated in the scenario of the Mensa
problem, and in impure variants of logic and functional programming: the agent
that is presumed to know the logical implications of the elementary propositions
it knows, no matter whether this is computationally feasible; negation as failure;
lazy evaluation as a general-purpose semantics for interaction. The account of
heapsort in [10] demonstrates how procedural issues can be appropriately ad-
dressed using EM principles, and also illustrates appropriate use of relatively
sophisticated logical concepts (such as whether the heap condition is valid at a
node) within a definitive script.
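As a small illustration of the kind of logical concept that can legitimately appear on the right-hand side of a definition, the following sketch, which is not taken from [10], expresses the validity of the heap condition at a node of a list-represented heap as a boolean observable:

    # Illustrative only: the heap condition at node i of a list-represented heap,
    # the sort of predicate that might define a boolean observable in a script.
    def heap_ok(a, i):
        left, right = 2 * i + 1, 2 * i + 2
        ok = True
        if left < len(a):
            ok = ok and a[i] >= a[left]
        if right < len(a):
            ok = ok and a[i] >= a[right]
        return ok

    print(heap_ok([9, 5, 7, 1], 0))   # True: 9 >= 5 and 9 >= 7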
It remains to consider two foundational issues raised by Kirsh: the extent to
which “the kinematics of cognition are language-like”, and the claim that “learning
can be added later”.
The distinctive qualities of EM as a non-logicist approach stem from the
representational significance of the computer artefact. Changes to the artefact
can record the adjunction of new observables or the identification of new de-
pendencies. Whilst the interaction with the artefact is open to extension and
revision in this manner, the semantics of the model is fluid and arguably cannot
be expressed in propositional terms. Modification of the artefact offers a way
of representing learning “as a change in capacities behaviourally or functionally
classified” [55]. This enfranchises elements of the learning process that are non-
linguistic in character, and that are arguably concerned with private processes
of incremental understanding that can only be represented with the help of an
artefact (cf. [9]).
To illustrate the kind of learning activity that can in principle be supported
using EM, consider two educational simulations that could be readily derived
from the artefact depicted in Figure 1.
One such simulation could be used to assess the robustness of the proto-
cols and equipment used in the Clayton Tunnel scenario, perhaps to give some
insight into the pressures under which employees such as Killick worked. The
user’s objective would be to perform Killick’s specified protocol, with a view to
sending as many trains safely through the tunnel as rapidly as possible subject
to progressively more frequent signal failure and ever more frequently arriving
trains. The insight gained in this scenario is qualitative in nature and could nat-
urally lead to a closer examination and more faithful simulation of ergonomic
issues, such as train visibility and the time needed to reset the signal or deploy
the emergency flags.
An alternative simulation exercise might involve playing the role of a driver
unfamiliar with the signalling protocol who is expected to infer the conventions
from experience. Being able to infer the significance of the signal protocol de-
pends on how often a hazardous encounter with a train occurs after the signal
has been ignored on proceed with caution. This is a function of train frequency,
engine failure, and what protocols other drivers in the model are observing. In
this case, the skill to be learned is associated with apprehending dependency
and recognising agency.
These illustrations strongly suggest that the development of intelligent sys-
tems cannot be divorced from the learning process. As Kirsh indicates [55], if
learning is understood as “acquiring knowledge of a domain”, there is an issue
over whether two creatures with slightly different physical attributes who had
learnt the same task in behaviourally different ways could be said to have learnt
the same thing. From an EM perspective, it is simplistic to separate abstract
knowledge from the experimental contexts in which this knowledge is demon-
strated. By way of illustration, it is certainly possible for two pianists to give
performances that cannot be distinguished by a listener, but to have learnt the
instrument in radically different ways. For instance, one may rely totally on a
musical score, whilst the other cannot read a score in real-time, but plays flu-
ently from memory. In this case, there are experiments to distinguish between
the two performers, such as switching off the lights.
It may be that, from the ideal perspective to which logicism aspires, the
development of safe railway systems should have been a process that involved
immensely careful empirical analysis and formal specification leading directly
to precise implementation. Railway history, even to the present day, makes this
scenario seem entirely implausible. Systems such as the Clayton Tunnel telegraph
were experiments that served to disclose the discrepancy between abstract and
embodied signalling protocols, and the essential role that experience plays in (as
far as possible) confining agents, observables and interactions to what has been
preconceived.
The convergence of technology and operating conventions towards interac-
tion in a closed-world environment is the recurrent theme behind the evolution of
modern railways. Technology can both assist and subvert this process. Had Kil-
lick and Scott been able to communicate by mobile phone, the Clayton Tunnel
disaster could have been averted. But as recent experience on British railways
confirms, new technologies potentially introduce new hazards.
The ontological and epistemological stance of logicism is appropriate in those
regions of experience where the empirical foundation for a closed-world model
has been established. It is inappropriate prior to this point, and (as is well-
recognised) insufficient to express experiential aspects of interaction that can
be very significant. Paradoxically, the realm over which logicism rules is one in
which in some respects intelligence is least taxed. The autonomy and initiative
that Killick and Scott exercised to such tragic effect is precisely what had to be
eliminated from the railway system in order to enhance safety.

4 The Implications of EM
Having discussed the character and significance of EM from a practical view-
point, it remains to return to the broad agenda set out in the introduction. This
section discusses how EM contributes towards three key objectives:
– giving a perspective on logicist and non-logicist approaches;
– providing a conceptual foundation for AI broader than logicism;
– providing a context for existing practice in “scruffy” AI.
There is, in particular, an important need to provide a more coherent frame-
work from which to view the diverse applications of EM that have been studied,
and to understand their significance and implications. To this end, this section
draws on critiques of logicism from many different sources and perspectives.

4.1 Critical Perspectives on Logicism

There have been many criticisms of the logicist position. Where AI is concerned,
the sources include Rodney Brooks [31,32], Brian Smith [70,71], Mark Turner [73]
and Peter Naur [63]. Other relevant philosophical ideas are drawn from William
James’s ideas on Radical Empiricism, first collected for publication shortly af-
ter his death in 1910 [53], and from more contemporary work of Gooding [47]
and Hirschheim et al. [52] on methodological issues in science and information sys-
tems development respectively. These indicate that the controversy surrounding
a logicist viewpoint is neither new, nor confined to AI and computer science.
Gooding’s analysis of Faraday’s work is motivated by disclosing simplistic as-
sumptions about the relationship between scientific theory and practical experi-
ment. William James addressed similar issues in his attacks upon the rationalist
viewpoint on experience. Hirschheim [52] is concerned with information system
design as involving the development of social communication systems. This ar-
guably places the design of such systems outside the paradigm for Computer
Science proposed in Denning et al. [41]. For instance, it raises issues such as shared
meaning, and the management of ambiguity, inconsistencies and conflict in sys-
tem specifications.
Common themes that arise in these writings include:

– the role and limitations of language;
– the importance of agency, and of societies of agents;
– the significance of artefacts and constructed models;
– perception and action, observation and experience;
– the importance of situated and empirical activities;
– the significance of metaphor and analogy;
– the relationship between private and public knowledge.

A particularly significant challenge is the development of an appropriate com-
putational paradigm for AI. Brian Smith [70] has sought such a paradigm for
some time, and identified many of its essential characteristics. An important
theme in Smith’s work is the connection between computability and physical re-
alisability. This endorses the philosophical position of Brooks [31,32], and points
to the difficulty of carrying out AI research without constructing physical mod-
els. It also argues for an emphasis upon empirical elements such as perception,
action, observation and experience. Metaphor, analogy and agency have an es-
sential role to play here.
The issue of whether we can address the many different aspects of a non-
logicist position represented in the work of Brooks, James, Smith and Hirschheim
without compromising conceptual integrity is particularly problematic. This is
illustrated by the breadth of interpretations of the term ‘agent’ in these contexts:
Brooks refers to robots and humans as agents, and to the layers of the subsump-
tion architecture as resembling a family of competing agents such as Minsky
describes in [61]; James discusses agents as pragmatically identified elements in
a particular causal account of our experience [53]; Smith declares that “agents
are what matter for semantical connections” [70]; Hirschheim [52] is concerned
with systems analysis that embraces machines, organisms, social and psychic
systems, each of which represents agency of different character, and with design
activities in which many human agents and different viewpoints are involved.
A detailed account of the relationship between EM and the work of these
authors is beyond the scope of the paper, and the main emphasis will be upon
giving an integrated view of where seminal issues have been or potentially can
be addressed by EM. For instance: computer programming for AI is discussed
in the context of Smith’s Two Lessons in Logic in [7,20]; some of the issues
for information systems design raised by Hirschheim et al. in [52] have been con-
sidered in [17,18,64], and some relating to concurrent engineering in [1]; the
status of constructed models in knowledge representation, as examined by Naur
in [63], is discussed in [9,12]; the prospects for applying EM principles to meet
Kent’s challenges for data modelling [54] are considered in [11,45]; layers of in-
telligence serving a similar role to those in Brooks’s subsumption architecture
are introduced in [14,4]; the ideas of William James [53] and Gooding [47] on
the relationship between theory and experiment are developed in [9,10,8].

4.2 A Framework for EM


EM has been represented in this paper as primarily a first-person activity. The
process of identifying similarities between two experiences is essentially an activ-
ity for one mind. Psychologically, it makes good sense for the primitive cognitive
elements to belong to the first-person, but the bias of philosophy has tended to
be towards explaining cognition with reference to third-person primitives.
If it is appropriate to ground experience through the perception of depen-
dency in the way that has been described in this paper, then a new ontology is
needed for many conventional concepts. EM imputes greater autonomy to the
isolated learner, provided only that they have interactive access to physical arte-
facts. This makes it possible to view familiar concepts, such as objective reality,
conventional theories and language, from a new perspective. Empirical activity
and the recognition of dependencies are the means to account for these sophis-
ticated communal concepts from the primitive base of private experience. Such
a development is outlined in [15].
The aim of this section is to elaborate on the implications of this recon-
struction process in three respects. There are useful links to be explored with
other research. The conceptual reorganisation supplies a framework in which
to classify the many practical applications of interaction with artefacts in EM.
The reconstruction offers a new perspective on fundamental concepts such as
agent, metaphor and intelligence that are arguably difficult to formalise satis-
factorily [58,35].

EM: the first person perspective. Under some interpretations, Kant’s fa-
mous dictum: “sensation without conception is blind” might serve as a motto
for the logicist. An appropriate motto for EM might be that of the anonymous
little girl who, on being told—by a logicist, no doubt—to be sure of her meaning
before she spoke, said: “How can I know what I think till I see what I say?” [65].
This epitomises the first-person variant of EM that has been described above:
the dialogue between me and myself, in which understanding is construction
followed by reconstruction in the light of experience of what I have constructed.
First-person activities in EM have centred on interface development [22,10] and
conceptual design [2].

EM: the second person perspective. Other applications of EM are con-
cerned with the projection of agency from first to second person. The essential
principle behind this projection is that through experiment and perception of
dependency I can identify families of observables (constituting a ‘you’) who can
be construed as acting in ways congruent to my own. This congruence is repre-
sented in the same way that I represent my own experience with reference to a
suitable physical artefact. The projection process is conceptually simplest where
the ‘you’ is another person, but this not necessary. Many non-human entities can
be construed as agents in this manner: experimental science relies on just such
principles to construct instruments that make interactions we can only conceive
perceptible in an empirically reliable and coherent fashion.
Once a system is construed as having two or more agents, there can be
ambiguity about the viewpoint on system state. Where there are several LSD
accounts from different viewpoints, the integrity of observables becomes an issue.
To an external observer of a system, this can manifest in many forms of conflict
between the actions of agents within a system. It can also be associated with
legitimate inconsistency, as in the case of Russell’s two observers, who see the
same observables but have a different perception of the events. These two second-
person variants of EM are represented in case-studies in concurrent systems
modelling [16] and concurrent engineering [1] respectively.

EM: the third person perspective. One of the most complex and subtle pro-
cesses that can operate in an EM framework is the transition to the third-person
perspective. The observables that can be viewed from a third-person perspec-
tive are those elements of our experience that empirically appear to be common
to all other human agents, subject to what is deemed to be the norm (cf. the
presumptions surrounding the Mensa problem above). The identification of such
observables is associated with interaction between ourselves and other human
agents in a common environment. Objectivity is empirically shaped concurrently
by our private experience, and our experience of other people’s responses.
The extent to which objective third-person observables dominate our public
agenda can obscure the sophistication of the social conventions they require. In
matters such as observing the number of items in a collection, and confirming its
objective status, complex protocols are involved: eating an item is not permitted,
interaction with the environment must be such that every item is observed and
none is duplicated, and co-operation and honest reporting of observation are needed
to reach consensus.
Underpinning third-person observables are repeatable contexts for reliable
interaction, and associated behaviours of different degrees of locality and sophis-
tication. In this context, locality refers to the extent to which a pattern of activity
embraces all the agents in an environment and constrains the meaningful modes
of observation. Counting techniques provide examples of behaviours that are typ-
ically local in this sense - they involve few agents, and are applied in the context
of otherwise uncircumscribed interaction. Conventional computer programming
typically presumes a closely circumscribed context, in which human-computer
interaction is subject to global behavioural constraints (as in sequential interac-
tion between a single user and computer), and the main preoccupation is with
objective changes of state (such as are represented in reliable computer operation
and universally accepted conventions for interpretation of input-output state).

Relating EM to conventional computer-based modelling. In principle, EM can be the prelude to the construction of a reactive system and more spe-
cialised forms of computer programming [13]. To understand the significance of
using EM in this fashion, it is essential to appreciate that modelling behaviours
is not a primitive concept in EM. In EM terms, a behaviour is a sophisticated
abstraction that involves the attribution of an identity to a pattern of state tran-
sitions. For instance, a train passing through the Clayton Tunnel is associated
with a sequence of observations that have some perceived reliability and in-
tegrity. In the context of Figure 1, the accident investigator will need to exercise
all manner of changes of state of mind that are unrelated to the characteristic
motion of the train. Observing a train in motion through the tunnel is choos-
ing to generate a particular kind of experience from the artefact, one in which
observation of position, velocity, acceleration and a clock are all involved. What
is more, it is only these abstract observables that are significant: the identity of
the train and all the particulars of the agents are immaterial.
In a classical approach to computational modelling of behaviour, the com-
puter implementation begins from the premise that the empirical activities as-
sociated with circumscribing agency and identifying a closed world have already
taken place. For instance, elementary physics declares how positions, velocities
and acceleration are to be measured, and specifies the laws of motion that govern
them. The construction of a computer simulation then involves:

– choosing a mathematical representation of the observables;
– specifying a computational model for the states and transitions;
– attaching a visualisation to display the behaviour to the computer user.

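For concreteness, the fragment below is a minimal sketch, in Python, of a conventional closed-world simulation built in the order just listed. All names (TrainState, step, display) and numerical details are invented for illustration; nothing here is drawn from the Empirical Modelling tools.

```python
# Hypothetical sketch of a conventional closed-world simulation, built in the
# order listed above: (1) a mathematical representation of the observables,
# (2) a computational model of states and transitions, (3) a visualisation
# attached at the end. Names and constants are illustrative only.

from dataclasses import dataclass

# (1) Mathematical representation of the observables.
@dataclass
class TrainState:
    position: float   # metres along the track
    velocity: float   # metres per second
    time: float       # seconds since the start of the simulation

# (2) Computational model of states and transitions (fixed laws of motion).
def step(state: TrainState, acceleration: float, dt: float = 1.0) -> TrainState:
    return TrainState(
        position=state.position + state.velocity * dt,
        velocity=state.velocity + acceleration * dt,
        time=state.time + dt,
    )

# (3) Visualisation attached to display the behaviour to the computer user.
def display(state: TrainState) -> None:
    print(f"t={state.time:5.1f}s  x={state.position:8.1f}m  v={state.velocity:5.1f}m/s")

state = TrainState(position=0.0, velocity=20.0, time=0.0)
for _ in range(5):
    display(state)
    state = step(state, acceleration=0.0)
```

Note that in such a model the observables and the sources of agency are fixed before a line of code is written; it is precisely this presumption that the EM approach, discussed next, defers.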
In constructing a closed-world model using the EM approach, these three issues are addressed in reverse order. The modelling process begins with the construc-
tion of an artefact whose primary function is to supply a visual metaphor for
experience of its referent. Interaction with this artefact has an open-ended and
experiential character, but leads to the identification of stable patterns of interaction involving specific observables and circumscribed agency. Within the
computational framework appropriate for EM, the modeller has unrestricted in-
teraction with this artefact, so that its closed-world quality is respected only
subject to discretion on the part of the modeller. It is nevertheless possible to realise
the closed-world behaviour using a conventional computational model. At
this stage, a formal mathematical representation for the appropriate observables
has been developed.
The development of a closed-world model is only possible subject to being
able to construe a phenomenon in terms of patterns of interaction that can
be entirely preconceived. In such a model, there is no uncircumscribed agency.
This makes optimisations possible. For example, when we know that we are
concerned with a train travelling into the tunnel, rather than having to consider
the possibility of a train being placed in the tunnel, it is possible to specify that
the train becomes invisible as it enters the tunnel. This is computationally much
more efficient than maintaining a dependency between the visibility of the train
and its abstract location. From an EM perspective, this change of representation
is significant, as the model is then specific to particular contexts. In particular,
certain “what if?” scenarios are excluded.
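The trade-off described in this paragraph can be illustrated with a small, hypothetical Python fragment (names and tunnel coordinates are invented, and this is not the notation of the EM tools): the first version maintains the dependency between visibility and abstract location, while the second hard-codes the preconceived event, gaining efficiency at the cost of excluding "what if?" states.

```python
# Hypothetical contrast between a maintained dependency and the optimised,
# context-specific representation discussed above. Names are illustrative only.

TUNNEL_START, TUNNEL_END = 500.0, 900.0   # assumed tunnel extent in metres

# Dependency-maintained version: visibility is always (re)derived from the
# abstract location, so unanticipated states (e.g. a train simply placed
# inside the tunnel) are still rendered correctly.
def visible_by_dependency(position: float) -> bool:
    return not (TUNNEL_START <= position <= TUNNEL_END)

# Optimised version: visibility is toggled by the single preconceived event
# "the train enters the tunnel", which is computationally cheaper but only
# valid for that particular pattern of interaction.
class OptimisedTrainView:
    def __init__(self) -> None:
        self.visible = True

    def on_enter_tunnel(self) -> None:
        self.visible = False

    def on_leave_tunnel(self) -> None:
        self.visible = True
```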

Multi-agent modelling in EM. Typical applications combine EM activity in first, second and third-person modes. This subsection reviews such applications
within a multi-agent framework.
In the EM framework, artefacts are most appropriately classified according
to the number of agents they involve. Informally, each agent represents a source
of potential state change that cannot be—or is not as yet—circumscribed. By
this criterion, the modeller is an archetypal agent. The variety of computational
abstractions and applications is reflected in the number and nature of agents
involved and whether their activity is or is not circumscribed. In a concurrent
engineering framework [1], or a commercial spreadsheet-based programming en-
vironment [62], there are many modeller agents. In programming a reactive sys-
tem [50], there are many agents whose interactions are yet to be circumscribed.
Traditional sequential programming can conveniently be regarded as involving
three agents: the programmer (uncircumscribed), the user (to be circumscribed)
and the computer (circumscribed). Conventional formal systems are embedded
within this framework as closed-world models in which there are no uncircumscribed
agents. Requirements analysis and formal specification of systems are processes
by which we pass from agent-oriented models with uncircumscribed elements to
circumscribed models.
EM typically generates a family of views of a concurrent system [1]. In many
contexts, the modeller seeks a particular view—that of the objective observer
who has a comprehensive insight into the global behaviour of a system. Such a
view can only be developed in general (if indeed it can be developed at all) by a
very complex process of evolution in which empirical knowledge has a crucial role.
In principle, EM makes it possible to represent such an evolution by distinguish-
ing between asserting what is observed and asserting what is believed (cf. [2]).
The computational forum for this representation is provided by the ADM [21],
in which the modeller can prescribe the privileges of agents and retain total dis-
cretion over how these privileges are exercised. The evolution process converges
if and when the modeller has specified a set of agents, privileges and criteria
for reliability of agent response that realise the observed or intended behaviour.
System implementation is then represented in this framework as replacement of
certain agents in the model by appropriate physical devices.
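A schematic sketch of this idea, with invented names and in Python rather than the ADM's own notation, might represent each agent by a set of guarded privileges whose firing remains subject to the modeller's discretion:

```python
# Schematic sketch (invented names, not ADM notation) of the idea that the
# modeller prescribes each agent's privileges -- the state changes it is
# permitted to make -- while retaining discretion over whether they fire.

import random

state = {"signal": "red", "train_in_tunnel": False}

# Each privilege pairs a guard (when the action is meaningful) with an action.
privileges = {
    "signalman": [(lambda s: s["signal"] == "red" and not s["train_in_tunnel"],
                   lambda s: s.update(signal="green"))],
    "driver":    [(lambda s: s["signal"] == "green",
                   lambda s: s.update(train_in_tunnel=True))],
}

def run_step(modeller_allows):
    """One round of agent interaction, subject to the modeller's discretion."""
    for agent, actions in privileges.items():
        for guard, action in actions:
            if guard(state) and modeller_allows(agent):
                action(state)

# The modeller here exercises discretion arbitrarily; system implementation
# would correspond to fixing reliable criteria for when privileges fire.
run_step(modeller_allows=lambda agent: random.random() < 0.5)
print(state)
```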
More generally, EM can be applied in a concurrent engineering context [1],
where independent views may be subject to conflict, as in Gruber’s shadow
box experiment [47]. To account for the process by which such views might be
reconciled through arbitration and management requires a hierarchical model
for agent interaction in which an agent at one level acts in the role of the human
modeller in relation to those at the level below [1]. The associated “dialectic
of form and process” is specified in terms of commitments on the part of the
modeller agents similar in character to those involved in system implementation.
Our investigation of a concurrent framework for EM of this nature remains at
an early stage, but has direct relevance to requirements analysis [18] and has
been used to devise simulations of insect behaviour illustrating Minsky’s Society
of Mind paradigm [56].

4.3 EM and the World of Pure Experience


The philosophical writings with perhaps the greatest relevance for first-person
EM can be found in the work of William James. The connection between his
agenda and that of the Empirical Modelling Project is clear from this preamble
to “A World of Pure Experience” [53]:
It is difficult not to notice a curious unrest in the philosophic atmosphere
of the time, a loosening of the old landmarks, a softening of oppositions,
a mutual borrowing from one another on the part of systems anciently
closed, and an interest in new suggestion, however vague, as if the one
thing sure were the inadequacy of the extant school-solutions. The dis-
satisfaction with these seems due for the most part to a feeling that they
are too abstract and academic. Life is confused and superabundant, and
what the younger generation appears to crave is more of the tempera-
ment of life in its philosophy, even though it were at some cost of logical
rigor and of formal purity.
James’s philosophic attitude of Radical Empiricism has a yet more intimate
affinity with EM. In some respects, this is best explained with reference to the
status of an EM artefact as embodying interaction that cannot be expressed in
a formal language. There is a practical communication difficulty familiar to all
who have tried to describe the experience of interacting with definitive scripts.
To make mischievous use of a blend that Turner would appreciate [74], it might
be said that some of the most significant and subtle experiential aspects of EM
interaction are nowhere more eloquently expressed than in James’s essays.

Radical Empiricism and first person EM. In traditional empiricism, the content of what is empirically given is characterised as ‘discrete sensory partic-
ulars’ [53]. On this basis, driver Scott’s perception of danger on seeing the red
flag cannot be viewed as a given. By taking this view, traditional empiricists
found themselves in very much the same context as the modern logicist: they
sought to derive fundamental features of commonsense experience from discrete
sensory particulars by analytical processes. Bird [25], in his philosophical com-
mentary on James, summarises James’s criticism of this stance: “Though the
aim of empiricists was to identify the humblest, most basic, particular elements
of the content of our experience they were driven in this way to deploy resources
which were of a highly abstract, theoretical kind.”
In Radical Empiricism, James takes the view that the given primitives have
to be recognised as typically more than discrete sensory inputs. In particular,
though traditional empiricism recognises disjunctions as empirically given, and
acknowledges separations between elements of our experience, it does not respect
the significant conjunctive relations that also pervade experience: “the relations
between things, conjunctive as well as disjunctive, are just as much matters of
experience, neither more nor less so, than the things themselves”. Amongst these
conjunctive relations, James cites identities, continuous transition, and “the most
intimate relation . . . the relation experienced between terms that form states of
mind”.
These ideas are quite coherent when transposed from James’s commentary on
Pure Experience to first-person interaction with a definitive script. Variables in
a script represent observables with identities. The changes to which their values
are subject through interaction illustrate continuous transition in James’s sense
(“this is the same observable, but, in the new context in which I am now con-
templating it, it has another value”). New observables that are introduced into
a particular script are recruited to the same state of mind (“I have observed—in
addition—that Killick has a red shirt”).
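As a rough analogue only (ordinary Python, not the definitive notations used in EM), the toy 'script' below shows observables as named definitions, redefinition as continuous transition of the same observable, and the free adjunction of a new observable; the names are invented for illustration.

```python
# A toy analogue of a definitive script: variables stand for observables with
# identities, definitions express dependencies, and new observables can be
# adjoined at any time. Illustrative only; not the EM tools' notation.

class Script:
    def __init__(self):
        self.defs = {}                       # observable name -> value or formula

    def define(self, name, formula):
        """Add or redefine an observable; dependents are re-evaluated on demand."""
        self.defs[name] = formula

    def value(self, name):
        formula = self.defs[name]
        return formula(self) if callable(formula) else formula

script = Script()
script.define("signal_colour", "red")
script.define("danger", lambda s: s.value("signal_colour") == "red")

print(script.value("danger"))            # True: derived from the current state

script.define("signal_colour", "green")  # same observable, new value in a new context
print(script.value("danger"))            # False: the dependency propagates

script.define("killick_shirt", "red")    # a new observable freely adjoined to the state
```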
James’s views on the primitive status of identities are particularly interesting
in the context of the history of variables in mathematics (cf. [19]). For James,
“conception disintegrates experience utterly”. It is in this spirit that the arith-
metisation of geometry in the nineteenth century subverted the idea of a variable
as a changing quantity. The historical importance of the relationship between
dependency and the old-fashioned variable is highlighted by the Russian historian
Medvedev [60]: “In this mechanical picture of the world the essential, one might
even say definitive event was the concept of a law as a dependence between
variable quantities”. (The translator’s pun was presumably unintentional!)
The open-ended and indiscriminate way in which new observables and dif-
ferent viewpoints can be adjoined to a definitive script distinguishes EM from
conventional computer-based modelling in a striking fashion. It has its direct
counterpart in experience in the way in which it is possible to embellish so freely
on the observation of any particular state.
Two quotations serve to highlight the conceptual problems that confront the
logicist who insists on interpreting observations as if they were elementary propo-
sitions. The first quotation is from James’s contemporary F. H. Bradley [27], a strong rationalist opponent of James’s views:
. . . mere experience . . . furnishes no consistent view. [The direct products
of experience] I find that my intellect rejects because they contradict
themselves. They offer a complex of diversities conjoined in a way which
it feels is not its way and which it can not repeat as its own . . . For to
be satisfied, my intellect must understand, and it can not understand a
congeries [i.e. an aggregate] in the lump.
This quotation illustrates Bradley’s disposition towards separating obser-
vations in an attempt to rationalise the world of sensations. Viewed from his
perspective, conjunctions of such diversity appear to be contradictions.
James’s response to Bradley is a rich source of reflections on how observa-
tions are experienced. It has incidental interest as a comment on the subject of
‘consciousness’, currently so topical in popular theories of AI:
To be ‘conscious’ means not simply to be, but to be reported, known, to
have awareness of one’s being added to that being . . . The difficulty of
understanding what happens here is . . . not a logical difficulty: there is no
contradiction involved. It is an ontological difficulty rather. Experiences
come on an enormous scale, and if we take them all together, they come in
a chaos of incommensurable relations that we can not straighten out. We
have to abstract different groups of them, and handle these separately if
we are to talk of them at all. But how the experiences ever get themselves
made, or why their characters and relations are just such as appear, we
can not begin to understand.
James’s characterisation of Bradley’s difficulties as ontological rather than
logical accords with the account of definitive and logical variables above. In our
experience of manipulating and interpreting definitive scripts of several hun-
dred variables, definitions can seem to be chaotically organised, and do indeed
typically combine observables that are incommensurable (e.g. colours, lengths,
strings, booleans). There is no natural ordering for the definitions in a script, and
it is typically necessary to abstract different groups to interpret them effectively.
This process of abstraction serves both to organise the script (e.g. to separate
the graphical elements in the screen display into windows, lines and text), and
to create experimental environments in which to study particular features of the
script and its referent in isolation (e.g. to examine the operation of the telegraph
outside the particular context of Figure 1). New features such as dependencies
certainly emerge in the EM process through empirical investigation, but EM
does not consider the underlying activity in the human brain.
Other prominent themes in EM are also represented in James. As Naur has
also remarked [63], James pioneered the experiential view of knowledge, and
recognised the importance of identifying concepts in psychological terms.
At the root of much of James’s reflection is the idea of associating experi-
ences within one mind. For him, knowledge is first apprehended as “[one] expe-
rience that knows another”. The idea of ‘bringing together experiences in one
mind’ has much in common with the concept of blending that has been ex-
plored by Turner and others [74]. The ontological stance that James and Turner
adopt is consonant with EM: the foundations of intelligence are to be sought in
first-person experience, not in third-person abstractions. It is in this spirit that
Turner regards the formal definitions of metaphor by logicists (cf. [35]), and the
grammatical structure of a language, as sophisticated abstractions rather than
primitive building blocks of human intellect. For Turner, the blend and the story
(which to Bradley would doubtless have seemed so ‘contradictory’ in nature),
are simple experientially-centred primitives.

Beyond the first-person. James’s central focus in [53] is on a first-person perspective. Where a rationalist approach introduces the absolute as a com-
mon point of reference in discussing such personal matters as the perception of
metaphor, James identifies a fundamental discontinuity “when I seek to make
a transition from an experience of my own to one of yours . . . I have to get on
and off again, to pass from the thing lived to another thing only conceived” (cf.
the strong distinction between ‘I’ and ‘you’ observed in Japanese, where certain
verbs, such as forms of ‘to want’, can only be used in the first-person [51]).
In EM, the discontinuities between first, second and third person are in the
first instance bridged by interaction with artefacts. All this is mediated by my
own experience, even though this experience is of congruences between experi-
ences that are first private, then of you, me and artefacts, and then of patterns
that can be construed as reliably executed protocols involving many of us to-
gether with our artefacts (cf. [15]).
James’s essays do not develop the ontology of second and third person in these
terms to the full extent that is needed to account for EM modelling activities
as a whole. His views nevertheless seem to endorse the perspective that EM
commends. Central to his position is a pragmatic stance on the classification of
experience: “subjectivity and objectivity are affairs not of what an experience is
aboriginally made of, but of its classification”.
As remarked above, many semantic categories of action are expressed by re-
definition in a definitive script, but the distinction between them is potentially a
matter of interpretation. It is possible that a feature first introduced into an arte-
fact to improve the visual effect subsequently acquires a meaning in the referent.
For instance, the point at which the engine triggers the treadle to reset the signal
may be introduced as an abstract concept and subsequently be associated with
a physical mechanism. The history of experimental science illustrates a similar
recategorisation of experience [48]. For instance, in early microscopy, it was diffi-
cult to distinguish between optical imperfections and features of the object under
examination. Likewise, in studying the Earth’s magnetic field, there was at one
time controversy over whether the Earth’s field changed over time, or whether
earlier measurements were inaccurate. These aspects of the process of construal
go beyond the challenge to the adequacy of a logicist theory cited by Kirsh
in [55]: viz. that it should be stated relative to “the actual subject-independent
properties of the domain”. From an EM perspective, such properties can only be defined with reference to experiential criteria.
In EM, the computational setting of the ADM can be seen as a forum within
which reality and agency are pragmatically shaped. In a discussion that is di-
rectly relevant to agency in EM, James considers how “the real facts of activity”
should be construed. To paraphrase his account, he questions whether agency
resides in: “a consciousness of wider time-span than ours”, or “‘ideas’ strug-
gling with one another, [so that] prevalence of one set of them is the action”,
or “nerve-cells [so that] the resultant motor discharges are the acts achieved”.
In arbitrating between these hypotheses, he advocates pragmatism: “no philo-
sophic knowledge of the general nature and constitution of tendencies, or of the
relation of larger to smaller ones, can help us to predict which of all the various
competing tendencies that interest us in this universe are likeliest to prevail”.
Drawing on previous research in EM, it is easy to envisage an environment for
many designers concurrently interacting with a ‘virtual prototype’ for a complex
reactive system (cf. [2]). Each designer has a personal library of scripts, each of
which is more or less loosely associated with modelling a particular aspect of
the virtual prototype. Corporate actions are decided through arbitration and
mediated through dependencies and circumscribed ‘intelligent’ agents. In this
scenario, the status of the virtual prototype is similar to that of an objective
reality in Radical Empiricism. The agendas, conceptions and perceptions of the
individual designers are represented via their private scripts, but can only be
conceived by the external observer as an incomprehensible and incoherently dis-
tributed resource to be organised pragmatically in myriad ways for construction
and experiment.
Such an image is consistent with James’s description of his World of Pure
Experience [53]:

Taken as it does appear, our universe is to a large extent chaotic. No one single type of connection runs through all the experiences that compose
it. . . . space-relations fail to connect minds . . . Causes and purposes ob-
tain only among special series of facts. The self-relation seems extremely
limited and does not link two different selves together. On the face of it,
if you should liken the universe of absolute idealism to an aquarium, a
crystal globe in which goldfish are swimming, you would have to com-
pare the empiricist universe to something more like one of those dried
human heads with which the Dyaks of Borneo deck their lodges. The
skull forms a solid nucleus; but innumerable feathers, leaves, strings,
beads, and loose appendages of every description float and dangle from
it, and, save that they terminate in it, seem to have nothing to do with
one another. Even so my experiences and yours float and dangle, terminat-
ing, it is true, in a nucleus of common perception, but for the most part
out of sight and irrelevant and unimaginable to one another.

In the light of this quotation, it is more understandable that Bradley deemed the products of raw experience to be contradictory. For him, as a rationalist:
“Truth . . . must be assumed ‘consistent’. Immediate experience has to be broken into subjects and qualities, terms and relations, to be understood as truth at
all.” Inconsistency between the viewpoints of different designers may be explica-
ble in logicist terms, but—as the above extract reveals—James also anticipated
Minsky’s concept of conflicting agency within one mind [61]. Perhaps, as Kirsh
remarks in [55], it is simply old-fashioned and parochial to hope for a logic-based
denotational semantics for distributed AI systems.

4.4 Language from an EM Perspective


In Experiment and the Making of Meaning, Gooding writes:

Most received philosophies of science focus so exclusively on the literary world of representations that they cannot begin to address the philo-
sophical problems arising from the interaction of these worlds: empirical
access as a source of knowledge, meaning and reference, and, of course,
realism.

Through its emphasis on first-person interaction with artefacts, EM is well-placed to provide an experiential perspective complementary to the literary
world of representations. To meet Gooding’s challenge, it needs to do more than
this: to explain the profound significance of language from within the framework
of EM.

Language and learning in EM. The transition from first-person to third-person world in EM is associated with the empiricist perspective on learning set
out in Box 3 (see [9]). As explained in [9], Box 3 is not intended to prescribe a
learning process, but to indicate the nature of the empirical processes that con-
struct a bridge from the realm of private experience to that of public knowledge.
In Box 3, formal language is represented as relying upon the activities that are
associated with the transition from first to second person, and from first to third.
An informal account of how the use of language is related to other more primi-
tive activities in Box 3 is given in [15]. This can be seen as parallel to the work
of Turner [73] and Gooding [47] on what Quine terms “semantic ascent” [66].
A relevant perspective on language and computer-based modelling is given
by Hirschheim et al. [52], who consider the influence of philosophical paradigms
on Information Systems Development (ISD). Following Burrell and Morgan [34],
they identify four major paradigms that can be informally parametrised accord-
ing to the stance they take on objectivism and subjectivism and on order and
conflict. From the preceding discussion, it is apparent that EM does not respect
this classification. That is to say, the products of EM can combine elements from
any one of these quadrants, and EM activity can instigate transitions across any
of these boundaries. In broad terms, however, concurrent engineering in EM
originates in subjectivity and conflict (associated with neohumanism), and, like
logicism, aspires to construct models that are ordered and objective (associated
with functionalism).

private experience / empirical / experiential

interaction with artefacts: identification of persistent features and contexts
practical knowledge: correlations between artefacts, acquisition of skills
identification of dependencies and postulation of independent agency
identification of generic patterns of interaction and stimulus-response mechanisms
non-verbal communication through interaction in a common environment
phenomenological uses of language
identification of common experience and objective knowledge
symbolic representations and formal languages: public conventions for interpretation

public knowledge / theoretical / formal

Table 3. An Empiricist Perspective on Learning

Traditional ISD has been centred on the functionalist paradigm. In so far as logicism is essentially suited to specifying models that are ordered and objective,
modern trends away from functionalism are non-logicist in spirit. Functionalism
is charged with “treating the social world as if it were the natural world”. Where
subjectivism is dominant, the primacy of language is regarded as the only re-
ality, whence reality is seen to be socially constructed.
In his construal of Faraday’s scientific work [47], Gooding is concerned to
account for both the scientist’s interaction with others and with nature. For
Gooding, an exclusively literary account of science can only be a caricature. In
his view, modern philosophy of science “lacks a plausible theory of observation”.
The principles of EM can be interpreted as inverting the priorities of func-
tionalism, so as to “treat the natural world as if it were the social world”. This
cannot be seen as leading to socially constructed reality, since it gives inter-
action with artefacts such significance. In this respect, EM seems well-oriented
towards Gooding’s requirements, denying the primacy of language, and perhaps
also offering prospects as a theory of observation.
In claiming that language is not primitive, EM is in the tradition of the re-
search of Brooks (“Intelligence without Representation” [32]), Turner [73], and
James [53]. It is a claim that is widely challenged by philosophers. The contem-
porary philosopher Bird’s criticism of James expresses typical concerns [25]:
We are left . . . with a puzzle about the role or sense of ‘pure experience’.
It is evidently of great importance in James’s account, and yet also to-
tally inarticulate. . . . [cf.] Wittgenstein ‘a nothing would do as well as
something about which nothing can be said’. For James’s pure experi-
ence has to be such that nothing can be said about it, if it is to fulfil
the role for which it is cast. . . . Without some ability to characterise the
experiences we have no means of determining their identity, and even
no clear means of assessing James’s central claim that we are presented
with conjunctive relations in experience as well as atomic sensations.

EM is offered as a possible framework in which to address such concerns. The use of artefacts in representation can be seen as a variation on Brooks’s theme
of using the real world as its own model [32]. This is the device by which circum-
scription and commitment is avoided. Logicist representations presume circum-
scription and commitment are avoided. Logicist representations presume circum-
Such models can be derived from EM artefacts in the interests of efficiency and
optimisation, but only at the cost of restricting to a closed-world functionality.
As Gooding remarks, agency—in the sense represented by first-person agency
in EM—is absent from a scientific theory. To use Smith’s words [70], mathemat-
ical modelling is promiscuous in character—the semantics of a model is not
influenced by what formal abstractions are used in its specification, be they
objects, relations, functions or agents. A plausible argument can be made that
whatever specific behaviour can be observed in an EM artefact can be realised
using a conventional program, and that on this basis there is no fundamental on-
tological distinction to be made. The most significant issue here is that, because
of its status as a construal, an EM artefact resembles a single point in a space
of referents within what the modeller deems to be a semantic neighbourhood.
The problems of formalising unbounded determinism [42] are also relevant: the
fact that the outcome of ‘choosing a positive integer’ can always be realised post
hoc by ‘choosing a positive integer not greater than a specified bound’ does not
mean that there is no distinction to be made between these two modes of choice.

The semantics of language from the EM perspective. Turner’s thesis that “parable precedes grammar” [73] argues for an experiential framework within
which to interpret language. EM has many advantages over logicism as a foun-
dational framework for exploring this thesis. Making connections between the
primitive elements of EM and informal uses of language is a more profitable
activity than trying to express EM in formal terms.
The first, second and third person perspectives of EM provide different view-
points on language. In first-person EM, words, as represented by the variables in
a definitive script, serve as identifiers for observables. This resembles the interpre-
tation of language that was initially proposed by Wittgenstein, and subsequently
rejected in favour of a more sophisticated theory. In second-person EM, words
figure as references to variables that represent observables of mutual interest to
agents. The interpretation of such observables is negotiated in one of two ways:
within a concurrent system through the shaping of communication protocols be-
tween agents, and in the interaction between concurrent designers through social
convention. Third-person EM is the province to which formal language refers.

In studying natural language semantics, it seems appropriate to recognise that one word can be viewed from all three perspectives. In this way, there
appear to be natural bifurcations in the meaning of certain words. From a first-
person perspective, the word time can be used for timely (“now is the time
for me to act”); from the third-person perspective, it refers to the objective
time as on a clock. In the first-person, real means authentically experienced, and
in the third-person objectively existing. In the first-person, state means present
immediate context (“what a state I’m in!”), and in the third-person abstract
point in preconceived pattern of activity (“once things get to this state, there’s
no escape”).
From this viewpoint on language, it seems simplistic to assign discrete or
narrow interpretations to words. Just as EM activity can migrate from personal,
particular, and provisional worlds to the public, general and certain domain, so
it seems can the meaning of a word. Turner’s account of metaphor [74] begins
with a private experience that involves the blending of two spaces. To the logi-
cist [35], metaphor is expressed as a relationship between abstract structures
that cannot necessarily be directly apprehended. To make psychological sense
of the formal concept of metaphor, it is essential to trace its derivation from a
private apprehension. EM supplies the appropriate context for this process.
The subtlety of the transformation of viewpoints that is characteristic of
EM is not adequately represented by identifying first, second and third person
perspectives. In developing EM in an engineering design context, it has been
essential to stress the physical nature of the artefacts, and to invoke physical re-
alisability in generalising classical foundations. This emphasis is consonant with
Brooks’s concern for engaging with physical devices in AI, and with Smith’s the-
sis that “formality reduces to physical realisability” [70]. But this is not all that
is required to represent our experience; as Turner has argued in [73], blending
is a process that of its essence operates in the literary mind, and first engages
with the products of our personal imagination. In accounting for individual in-
sights and skills, there may be no clear objective physical points of reference. It
is in this spirit that EM activity shifts independently along the private-public,
particular-general, provisional-certain axes. The meaning of words can likewise
migrate freely within this space.
The very concepts of learning and intelligence are profoundly connected with
this process of migration. This accounts for the difficulty in formalising concepts
such as agent [58] and metaphor [35] in a way that accords them the status of
fundamental concepts of AI. An agent manifests first as an object-like collection
of observables with integrity; then as an object that appears to be associated with
characteristic potential changes of state. In some circumstances, the behaviour of
an agent can apparently be so successfully circumscribed that its effect on state
can be captured in a mathematical model. In this context, the concept of agent-
as-object is hardly discriminating enough to be of interest, whilst that of ‘totally
circumscribed agent’ forfeits the essential autonomy of agency. Yet, within an
EM process, one and the same entity can migrate from one viewpoint to the
other. What is more, in the migration process, it becomes essential to construe
the entity as an agent whose role can only be represented through recourse to
first-person agency. To create the circumscribed closed world, it is essential to
pass through the experimental realm.

5 Conclusion
Brooks has argued in [31,32] that significant progress towards the principal goals
of AI research—building intelligent systems and understanding intelligence—
demands a fundamental shift of perspective that rules out what is commonly
understood to be a hybrid logicist / non-logicist approach. This paper endorses
this view, contending that logicism relies upon relating the empirical and the
rational in a way that bars access to the primitive elements of experience that
inform intelligence. EM suggests a broader philosophical framework within which
theories are associated with circumscribed and reliably occurring patterns of
experience. The empirical processes that lead towards the identification and
formulation of such theories surely require human intelligence. The application of
such theories, taken in isolation, is associated with rule-based activity as divorced
from human intelligence as the execution of a computer program. Intelligence
itself lives and operates in experience that eludes and transcends theory.

Acknowledgments
I am much indebted to all the contributors to the Empirical Modelling Project,
and to Dominic Gehring, Theodora Polenta and Patrick Sun in particular, for
their valuable philosophical, theoretical and practical input. Most of all, I am
indebted to Steve Russ, whose constructive criticism and ideas have been crucial
in identifying the essential character of EM. I also wish to thank Mike Luck for
several useful references and feedback. The idea of relating EM to first-, second-
and third-person perspectives owes much to several workshop participants, no-
tably Joseph Goguen, Kerstin Dautenhahn and Chrystopher Nehaniv. I have
also been much encouraged and influenced by Mark Turner’s exciting ideas on
blending and the roots of language. I am especially grateful to the Programme
Committee and the Workshop sponsors for their generous invitation and finan-
cial support.

References
1. V. D. Adzhiev, W. M. Beynon, A. J. Cartwright, and Y. P. Yung. A computational
model for multi-agent interaction in concurrent engineering. In Proc. CEEDA’94,
pages 227–232. Bournemouth University, 1994. 348, 349, 351, 351, 352, 352
2. V. D. Adzhiev, W. M. Beynon, A. J. Cartwright, and Y. P. Yung. A new computer-
based tool for conceptual design. In Proc. Workshop Computer Tools for Concep-
tual Design. University of Lancaster, 1994. 349, 352, 356
3. V.D. Adzhiev and A. Rikhlinsky. The LSD engine. Technical report, Moscow
Engineering Physics Institute, 1997. 342
4. J. A. Allderidge, W. M. Beynon, R. I. Cartwright, and Y. P. Yung. Enabling
technologies for empirical modelling in graphics. Research Report 329, Department
of Computer Science, University of Warwick, 1997. 334, 348
5. M. D. Atkinson et al. The object-oriented database manifesto. In Proc Int
Conf on Deductive and Object-Oriented Databases, pages 40–57, 1989. 334
6. J. Backus. Can programming be liberated from the Von Neumann style? Com-
munications of the ACM, 21(8):613–641, 1978. 333
7. W. M. Beynon. Programming principles for the semantics of the semantics of
programs. Research Report 205, Department of Computer Science, University of
Warwick, February 1992. 348
8. W. M. Beynon. Agent-oriented modelling and the explanation of behaviour. In
Proc. International Workshop Shape Modelling Parallelism, Interactivity and Ap-
plications. University of Aizu, Japan, September 1994. 348
9. W. M. Beynon. Empirical modelling for educational technology. In Proc Cognitive
Technology ’97, IEEE, pages 54–68, 1997. 334, 345, 348, 348, 357, 357
10. W. M. Beynon. Modelling state in mind and machine. Research Report 337,
Department of Computer Science, University of Warwick, 1998. 345, 348, 349
11. W. M. Beynon, A. J. Cartwright, and Y. P. Yung. Databases from an agent-oriented
perspective. Research Report 278, Department of Computer Science, University of
Warwick, January 1994. 348
12. W. M. Beynon and R. I. Cartwright. Empirical modelling principles for cognitive
artefacts. In Proc. IEE Colloquium: Design Systems with Users in Mind: The Role
of Cognitive Artefacts, December 1995. 348
13. W. M. Beynon and R. I. Cartwright. Empirical modelling principles in application
development for the disabled. In Proc. IEE Colloquium Computers in the Service
of Mankind: Helping the Disabled, March 1997. 350
14. W. M. Beynon and M. S. Joy. Computer programming for noughts and crosses:
New frontiers. In Proc. PPIG94, pages 27–37. Open University, January 1994. 348
15. W. M. Beynon, P. E. Ness, and S. Russ. Worlds before and beyond words. Research
Report 331, Department of Computer Science, University of Warwick, 1995. 348,
355, 357
16. W. M. Beynon, M. T. Norris, R. A. Orr, and M. D. Slade. Definitive specification
of concurrent systems. In Proc. UKIT 1990, IEE Conference Publications 316,
pages 52–57, 1990. 349
17. W. M. Beynon, M.T. Norris, S.B. Russ, M.D. Slade, Y. P. Yung, and Y.W. Yung.
Software construction using definitions: An illustrative example. Research Report
147, Department of Computer Science, University of Warwick, September 1989.
348
18. W. M. Beynon and S. Russ. Empirical modelling for requirements. Research
Report 277, Department of Computer Science, University of Warwick, September
1994. 348, 352
19. W. M. Beynon and S. B. Russ. Variables in mathematics and computer science.
Research Report 141, Department of Computer Science, University of Warwick,
1989. 353
20. W. M. Beynon and S. B. Russ. The interpretation of states: a new foundation for
computation? Technical report, University of Warwick, February 1992. 348
21. W. M. Beynon, M. D. Slade, and Y. W. Yung. Parallel computation in definitive
models. In Proc. CONPAR88, pages 359–367, June 1988. 342, 352
22. W. M. Beynon and Y. P. Yung. Definitive interfaces as a visualization mechanism.
In Proc. GI90, pages 285–292, 1990. 349
23. W. M. Beynon and Y. W. Yung. Implementing a definitive notation for interactive
graphics. In New Trends in Computer Graphics, pages 456–468. Springer-Verlag,
1988. also University of Warwick Computer Science Research Report 111. 342
24. W.M. Beynon. Definitive notations for interaction. In Proc. HCI’85. Cambridge
University Press, 1985. 341
25. G. Bird. William James. Routledge and Kegan Paul, 1986. 353, 358
26. G. Birtwistle et al. Simula Begin. Chartwell-Bratt, 1979. 333
27. F.H. Bradley. Appearance and Reality. Oxford University Press, 9th edition, 1930.
354
28. P. Brödner. The two cultures in engineering. In Skill, Technology and Enlighten-
ment, pages 249–260. Springer-Verlag, 1995. 332
29. F. P. Brooks. No silver bullet: Essence and accidents of software engineering. IEEE
Computer, 20(4):10–19, 1987. 333
30. F. P. Brooks. The Mythical Man-Month Revisited: Essays on Software Engineering.
Addison-Wesley, 1995. 323
31. R. A. Brooks. Intelligence without reason. In Proc. IJCAI-91, pages 569–595,
1991. 323, 323, 347, 347, 361
32. R. A. Brooks. Intelligence without representation. Artificial Intelligence, 47:139–
159, 1991. 323, 347, 347, 358, 359, 361
33. A. W. Brown. Object-oriented Databases: Applications in Software Engineering.
McGraw-Hill, 1991. 334
34. G. Burrell and G. Morgan. Sociological Paradigms and Organizational Analysis.
Heinemann, London, 1979. 357
35. J.A. Campbell and J. Wolstencroft. Structure and significance of analogical rea-
soning. AI in Medicine, 8(2):103–118, 1996. 343, 343, 348, 355, 360, 360
36. K. M. Chandy and J. Misra. Parallel Program Design: a Foundation. Addison-
Wesley, 1988. 333
37. E. F. Codd. The relational model for large shared data banks. Communications
of the ACM, 13(6):377–387, 1970. 334
38. J. Cohen and I. Stewart. The Collapse of Chaos: Finding Simplicity in a Complex
World. Viking Penguin, 1994. 333
39. C. J. Date and H. Darwen. The third database manifesto. Database Programming
and Design, 8(1), 1995. 334
40. S.V. Denneheuvel. Constraint-solving on Database Systems: Design and Imple-
mentation of the Rule Language RL/1. CWI Amsterdam, 1991. 334
41. P. Denning et al. Computing as a discipline. Communications of the ACM,
40(5):9–23, 1997. 347
42. E.W. Dijkstra. A Discipline of Programming. Prentice Hall, 1976. 359
43. H.L. Dreyfus. What Computers Still Can’t Do: A Critique of Artificial Reason.
MIT press, 1992. 336
44. K. Forbus, D. Gentner, A. B. Markman, and R. W. Ferguson. Analogy just looks
like high level perception: Why a domain-general approach to analogical mapping is
right. Journal of Experimental and Theoretical Artificial Intelligence, 10(2):231–
257, 1998. 343
45. D. K. Gehring, Y. P. Yung, R. I. Cartwright, W. M. Beynon, and A. J. Cartwright.
Higher-order constructs for interactive graphics. In Proc. Eurographics UK Chap-
ter, 14th Annual Conference, pages 179–192, 1996. 348
46. D. Gentner. Structure-mapping: A theoretical framework for analogy. Cognitive
Science, 3:155–170, 1983. 343
47. D. Gooding. Experiment and the Making of Meaning. Kluwer, 1990. 344, 347,
348, 352, 357, 358
48. I. Hacking. Representing and Intervening: Introductory Topics in the Philosophy
of Natural Science. Cambridge University Press, 1983. 355
49. D. Harel. On visual formalisms. Communications of the ACM, 31(5):514–530, May 1988. 333
50. D. Harel. Biting the silver bullet: Towards a brighter future for software develop-
ment. IEEE Computer, January 1992. 333, 351
51. M. Hiraga. Personal communication. 355
52. R. Hirschheim, H. K. Klein, and K. Lyytinen. Information Systems Development
and Data Modelling: Conceptual and Philosophical Foundations. Cambridge Uni-
versity Press, 1995. 323, 347, 347, 348, 348, 357
53. W. James. Essays in Radical Empiricism. Bison Books, 1996. 347, 348, 348, 352,
353, 355, 356, 358
54. W. Kent. Data and Reality. North-Holland, 1978. 334, 348
55. D. Kirsh. Foundations of AI: the big issues. Artificial Intelligence, 47:3–30, 1991.
324, 324, 325, 326, 326, 329, 334, 336, 342, 343, 344, 344, 345, 346, 355, 357
56. N.S. Lam. Agent-oriented modelling and societies of agents. Master’s thesis, De-
partment of Computer Science, University of Warwick, September 1993. 352
57. D.B. Lenat and E.A. Feigenbaum. On the thresholds of knowledge. Artificial
Intelligence, 47(1):185–250, 1991. 323
58. M. Luck and M. d’Inverno. A formal framework for agency and autonomy. In
Proc. 1st Inter. Conf. on Multi-Agent Systems, pages 254–260. MIT Press, 1995.
348, 360
59. D. McDermott. A critique of pure reason. Computational Intelligence, 3:151–160, 1987. 322
60. F. Medvedev. Scenes from the History of Real Functions, volume 7 of Science
Networks - Historical Studies. Birkhäuser-Verlag, 1991. 353
61. M. Minsky. The Society of Mind. Picador, London, 1988. 333, 348, 357
62. B. Nardi. A Small Matter of Programming: Perspectives on End User Computing.
MIT Press, 1993. 333, 351
63. P. Naur. Knowing and the Mystique of Logic and Rules. Kluwer Academic Pub-
lishers, 1995. 326, 341, 347, 348, 354
64. P. E. Ness. Creative Software Development — An Empirical Modelling Framework.
PhD thesis, Department of Computer Science, University of Warwick, September
1997. 348
65. A. Partington, editor. The Oxford Dictionary of Quotations. Oxford University
Press, 1992. 349
66. W.V. Quine. Word and Object. MIT Press, 1960. 357
67. L.T.C. Rolt. Red for Danger. Pan Books, 4th edition, 1982. 327, 338
68. J. Rumbaugh et al. Object-Oriented Modeling and Design. Prentice-Hall, 1991.
333
69. B. Russell. The ABC of Relativity. George Allen and Unwin, 1969. 337
70. B. C. Smith. Two lessons in logic. Computational Intelligence, 3:214–218, 1987.
343, 343, 347, 347, 348, 359, 360
71. B.C. Smith. The owl and the electric encyclopaedia. Artificial Intelligence, 47:251–
288, 1991. 347
72. M. Stonebraker et al. The third generation database system manifesto. ACM
SIGMOD Record, 19(3), 1990. 334
73. M. Turner. The Literary Mind. Oxford University Press, 1996. 347, 357, 358, 359,
360
74. M. Turner. Forging connections. In this volume, 1998. 343, 352, 355, 360
75. P. Wegner. Why interaction is more powerful than algorithms. Communications
of the ACM, 40(5):80–91, 1997. 323
76. D. West. Hermeneutic computer science. Communications of the ACM, 40(4):115–116, April 1996. 323
77. M. Wooldridge and N. R. Jennings. Intelligent agents: Theory and practice. Knowl-
edge Engineering Review, 10(2):115–152, 1995. 333
Communication as an Emergent Metaphor for Neuronal
Operation
Slawomir J. Nasuto¹, Kerstin Dautenhahn¹ and Mark Bishop²

Department of Cybernetics
University of Reading, Reading, RG2 6AE, UK
¹ {sjn, kd}@cyber.rdg.ac.uk
² J.M.Bishop@reading.ac.uk

Abstract. The conventional computational description of brain operations has to be understood in a metaphorical sense. In this paper arguments supporting the
claim that this metaphor is too restrictive are presented. A new metaphor more
accurately describing recently discovered emergent characteristics of neuron
functionality is proposed and its implications are discussed. A connectionist
system fitting the new paradigm is presented and its use for attention modelling
briefly outlined.

1 Introduction

One of the important roles of metaphor in science is to facilitate understanding of complex phenomena. Metaphors should describe phenomena in an intuitively
understandable way that captures their essential features. We argue that a description
of single neurons as computational devices does not capture the information
processing complexity of real neurons and argue that describing them in terms of
communication could provide a better alternative metaphor. These claims are
supported by recent discoveries showing complex neuronal behaviour and by
fundamental limitations of established connectionist cognitive models. We suggest
that real neurons operate on richer information than provided by a single real number
and therefore their operation cannot be adequately described in a standard Euclidean
setting. Recent findings in neurobiology suggest that, instead of modelling the neuron
as a logical or numerical function, it could be described as a communication device.
The prevailing view in neuroscience is that neurons are simple computational
devices, summing up their inputs and calculating a non-linear output function.
Information is encoded in the mean firing rate of neurons which exhibit narrow
specialisation - they are devoted to processing a particular type of input information.
Further, richly interconnected networks of such neurons learn via adjusting inter-
connection weights. In the literature there exist numerous examples of learning rules
and architectures, with varying degrees of biological plausibility.

Almost from the very beginning of connectionism, researchers were fascinated by computational capabilities of such devices [1,2].
The revival of connectionism in the mid-eighties featured increased interest in
analysing the properties of such networks [3], as well as in applying them to numerous
practical problems [4]. At the same time the same devices were proposed as models of
cognition capable of explaining both higher level mental processes [5] and low level
information processing in the brain [6].
However, these promises were based on the assumption that the computational
model captures all the important characteristics of real biological neurons with respect
to information processing. We will indicate in this article that very recent advances in
neuroscience appear to invalidate this assumption. Neurons are much more complex
than was originally thought and thus networks of oversimplified model neurons are
orders of magnitude below the complexity of real neuronal systems. From this it follows
that current neural network ‘technological solutions’ capture only superficial
properties of biological networks and further, that such networks may be incapable of
providing a satisfactory explanation of our mental abilities.
We propose to complement the description of a single neuron as a computational
device with an alternative, more ’natural’ metaphor: we hypothesise that a neuron can
be better and more naturally described in terms of communication rather than purely
computation. We hope that shifting the paradigm will result in escaping from the local
minimum caused by treating neurons and their networks merely as computational
devices. This should allow us to build better models of the brain’s functionality and to
build devices that reflect more accurately its characteristics. We will present a simple
connectionist model, NEural STochastic diffusion search netwORk (NESTOR), fitting
well in this new paradigm and will show that its properties make it interesting from
both the technological and brain modelling perspectives.
In a recent paper [7], Selman et al. posed some challenge problems for Artificial
Intelligence. In particular Rodney Brooks suggested revising the conventional
McCulloch-Pitts neuron model and investigation of the potential implications (with
respect to our understanding of biological learning) of new neuron models based on
recent biological data. Further, Selman claimed that the supremacy of standard
heuristic, domain-specific search methods of Artificial Intelligence needs to be revised
and suggested that recent investigation of fast general purpose search procedures has
opened a promising alternative avenue. Furthermore, in the same paper Horvitz posed
the development of richer models of attention as an important problem, as all
cognitive tasks “... require costly resources” and “controlling the allocation of
computational resources can be a critical issue in maximising the value of a situated
system’s behaviour.”
We claim that the new network presented herein addresses all three challenges
posed in the above review paper [7], as it is isomorphic in operation to Stochastic
Diffusion Search, a fast, generic probabilistic search procedure which automatically
allocates information processing resources to search tasks.

2 Computational Metaphor

The emergence of connectionism is based on the belief that neurons can be treated as
simple computational devices [1]. Further, the view that information is encoded as the
mean firing rate of neurons has been a base assumption of all the sciences related to
brain modelling. The initial boolean McCulloch-Pitts model neuron was quickly
extended to allow for analogue computations.
The most commonly used framework for connectionist information representation
and processing is a subspace of a Euclidean space. Learning in this framework is
equivalent to extracting an appropriate mapping from the sets of existing data. Most
learning algorithms perform computations which adjust neuron interconnection
weights according to some rule, adjustment in a given time step being a function of a
training example. Weight updates are successively aggregated until the network
reaches an equilibrium in which no adjustments are made (or alternatively stopping
before the equilibrium, if designed to avoid overfitting). In any case knowledge about
the whole training set is stored in final weights. This means that the network does not
possess any internal representation of the (potentially complex) relationships between
training examples. Such information exists only as a distribution of weight values. We
do not consider representations of arity-zero predicates (e.g. those present in
NETtalk [8]) as sufficient for representation of complex relationships. These
limitations result in poor internal knowledge representation making it difficult to
interpret and analyse the network in terms of causal relationships. In particular it is
difficult to imagine how such a system could develop symbolic representation and
logical inference (cf. the symbolic/connectionist divide). Such deficiencies in the
representation of complex knowledge by neural networks have long been recog-
nised [9,10,11].
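As a minimal illustration of the framework being criticised here (not a model the paper endorses), the Python sketch below shows a single rate-coded unit whose learning consists solely of aggregating per-example weight adjustments; all names and constants are invented for illustration.

```python
# Minimal sketch of the standard connectionist framework described above: a
# single "neuron" computes a non-linear function of a weighted sum, and
# learning aggregates small weight adjustments, one per training example.
# Illustrative only.

import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def output(weights, inputs):
    # Weighted sum followed by a non-linear output function.
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)))

# A simple mapping (logical OR, with a bias input) represented as vectors in a
# Euclidean space -- the representational assumption questioned in the text.
examples = [([0, 0, 1], 0), ([0, 1, 1], 1), ([1, 0, 1], 1), ([1, 1, 1], 1)]

weights = [random.uniform(-0.5, 0.5) for _ in range(3)]
rate = 0.5

for _ in range(2000):                      # aggregate per-example adjustments
    x, target = random.choice(examples)
    y = output(weights, x)
    error = target - y
    weights = [w + rate * error * y * (1 - y) * xi for w, xi in zip(weights, x)]

# All knowledge of the training set now resides only in the weight values.
print([round(output(weights, x), 2) for x, _ in examples])
```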
The way in which data are processed by a single model neuron is partially
responsible for these difficulties. The algebraic operations that it performs on input
vectors are perfectly admissible in Euclidean space but do not necessarily make sense
in terms of the data represented by these vectors. Weighted sums of quantities,
averages etc., may be undefined for objects and relations of the real world, which are
nevertheless represented and learned by structures and mechanisms relying heavily on
such operations. This is connected with a more fundamental problem missed by the
connectionist community - the world (and relationships between objects in it) is
fundamentally non-linear. Classical neural networks are capable of discovering non-
linear, continuous mappings between objects or events but nevertheless they are
restricted by operating on representations embedded in linear, continuous structures
(Euclidean space is by definition a finite-dimensional linear vector space equipped
with the standard metric). Of course it is possible in principle that knowledge from some
domain can be represented in terms of a Euclidean space. Nevertheless, it seems that
only in extremely simple or artificial problems will the appropriate space be of small
dimensionality; in real-life problems spaces of very high dimensionality are to be
expected. Moreover, even if embedded in a Euclidean space, the actual set representing
a particular domain need not be a linear subspace, or even a connected subset of it.
Yet these are among the topological properties required for the correct
operation of classical neural nets. There are no general methods of coping with such
situations in connectionism. Methods that appear to be of some use in such cases seem
to be freezing some weights (or restricting their range) or using a 'mixture of
experts' or gated network [12]. However, there is no principled way of deciding how
to perform the former. Mixture-of-experts models appear to be a better solution, as
individual experts could in principle explore different regions of a high-dimensional
space, so that their proper co-operation could result in satisfactory behaviour. However,
such architectures need to be individually tailored to particular problems. Undoubtedly
there is some degree of modularity in the brain; however, it is not clear that the brain's
operation is based solely on a rigid modularity principle. In fact we will argue in the
next section that biological evidence seems to suggest that this view is at least
incomplete and needs revision.
We feel that many of the difficulties outlined above follow from the underlying
interpretation of neuron functioning in computational terms, which results in entirely
numerical manipulation of knowledge by neural networks. This scheme seems too
restrictive.
Even in computational neuroscience, existing models of neurons describe them as
geometric points, although neglecting their geometric properties (treating
dendrites and axons as merely passive transmission cables) makes such models very
abstract and may strip them of some information-processing properties. In most
technical applications of neural networks the abstraction is even higher - axonic and
dendritic arborisations are completely neglected - hence they cannot in principle
model the complex information processing taking place in these arbors [13].
We think that brain functioning is best described in terms of non-linear
dynamics, but this means that processing of information is equivalent to some form of
temporal evolution of activity. The latter, however, may depend crucially on the geometric
properties of neurons, as these properties obviously influence neuron activities and
thus whole networks. Friston [14] stressed this point at the systemic level when he
pointed to the importance of appropriate connections between and within regions -
but this is exactly the geometric (or topological) property which affects the dynamics
of the whole system. Qualitatively the same reasoning is valid for single neurons.
Undoubtedly, model neurons which do not take into account geometrical effects
perform some processing, but it is not clear what this processing has to do with the
dynamics of real neurons. It follows that networks of such neurons perform their
operations in some abstract time not related to the real time of biological networks
(We are not even sure whether time is an appropriate notion in this context; in the case
of feedforward nets 'algorithmic steps' would probably be more appropriate.) This
concerns not only classical feedforward nets which are closest to classical algorithmic
processing but also many other networks with more interesting dynamical behaviour,
(e.g. Hopfield or other attractor networks).
Of course one can resort to compartmental models but then it is apparent that the
description of single neurons becomes so complex that we have to use numerical
methods to determine their behaviour. If we want to perform any form of analytical
investigation then we are bound to simpler models.
Relationships between real-life objects or events are often far too complex for
Euclidean spaces and smooth mappings between them to be the most appropriate
representations. In reality it is usually the case that objects are comparable only to
some objects in the world, but not to all; in other words, one cannot equip them with a
'natural' ordering relation. Representing objects in a Euclidean space imposes a
serious restriction, because vectors can always be compared to each other by means of
a metric; data can in this case be ordered and compared regardless of any real-life
constraints. Moreover, variables are often intrinsically discrete or qualitative in nature
and in this case again Euclidean space does not seem to be a particularly good choice.
Networks implement parameterised mappings and they operate in a way implicitly
based on the Euclidean space representation assumption - they extract information
contained in distances and use it for updates of weight vectors. In other words,
distances contained in data are translated into distances of consecutive weight vectors.
This would be fine if the external world could be described in terms of Euclidean
space; however, it becomes a problem if we need to choose a new definition of
distance each time a new piece of information arrives. Potentially, new information can
give a new context to previously learnt information, with the result that concepts
which previously seemed unrelated now become close. Perhaps this means that
our world model should be dynamic - changing each time we change the definition of
a distance? However, weight space remains constant - with Euclidean distance and
fixed dimensionality. Thus the overall performance of classical networks relies heavily
on their underlying model of the external world. In other words, it is not the networks
that are 'smart'; it is the choice of the world model that matters. Networks need to
obtain 'appropriate' data in order to 'learn', but this amounts to choosing a static
model of the world, and in such a situation networks can indeed perform well. Our
feeling is that, to a limited extent, a similar situation occurs in very low-level sensory
processing in the brain, where only the statistical consistency of the external world
matters. However, as soon as top-down information starts to interact with
bottom-up processing, the semantic meaning of objects becomes significant and this
can often violate the assumption of static world representations.
It follows that classical neural networks are well equipped only for tasks in which
they process numerical data whose relationships can be well reflected by Euclidean
distance. In other words classical connectionism can be reasonably well applied to the
same category of problems which could be dealt with by various regression methods
from statistics. Moreover, as classical neural nets in fact offer the same explanatory
power as regression, they can therefore be regarded as its non-linear counterpart. It is,
however, doubtful whether non-linear regression constitutes a satisfactory (or the most
general) model of fundamental information processing in natural neural systems.
Another problem follows from the rigidity of neurons’ actions in current
connectionist models. The homogeneity of neurons and their responses is the rule
rather than the exception. All neurons perform the same action regardless of individual
conditions or context. In reality, as we argue in the next section, neurons may
condition their response on the particular context, set by their immediate
surroundings, past behaviour and current input etc. Thus, although in principle
identical, they may behave as different individuals because their behaviour can be a
function of both morphology and context. Hence, in a sense, the way conventional
neural networks operate resembles symbolic systems - both have built in rigid
behaviour and operate in an a priori determined way. Taking different ‘histories’ into
account would allow for context-sensitive behaviour of neurons - in effect, for the
existence of heterogeneous neuron populations.
Standard nets are surprisingly close to classical symbolic systems although they
operate in different domains: the latter operating on discrete, and the former on
continuous spaces. The difference between the two paradigms in fact lies in the nature
of representations they act upon, and not so much in the mode of operation. Symbolic
systems manipulate whole symbols at once, whereas neural nets usually employ sub-
symbolic representations in their calculations. However, both execute programs,
which in the case of neural networks simply prescribe how to update the interconnection
weights in the network. Furthermore, in practice neural networks have very well
defined input and output neurons which, together with their training set, can be
considered a closed system relaxing to its steady state. In modular networks each of
the 'expert' nets operates in a similar fashion, with well-defined inputs and outputs
and designed, restricted intercommunication between modules. Although many
researchers have postulated a modular structure for the brain [15], with distinct
functional areas being black boxes, more recently some [16,17] have realised that the
brain operates rather like an open system: due to the ever-changing conditions, a
system with extensive connectivity between areas and no fixed input and output. The
above taxonomy resembles a similar distinction between algorithmic and interactive
systems in computer science, the latter possessing many interesting properties [18].

3 Biological Evidence

Recent advances in neuroscience provide us with evidence that neurons are much
more complex than previously thought [19]. In particular it has been hypothesised that
neurons can select input depending on its spatial location on the dendritic tree or its temporal
structure [19,20,21]. Some neurobiologists suggest that synapses can remember the
history of their activation or, alternatively, that whole neurons discriminate spatial
and/or temporal patterns of activity [21].
Various authors have postulated spike encoding of information in the brain
[22,23,24]. The speed of information processing in some cortical areas, the small
number of spikes emitted by many neurons in response to cognitive tasks [25,26,27],
together with very random behaviour of neurons in vivo [28], suggest that neurons
would not be able to reliably estimate mean firing rate in the time available. Recent
results suggest that firing events of single neurons are reproducible with very high
reliability and interspike intervals encode much more information than firing
rates [29]. Others found that neurons in isolation can produce, under artificial
stimulation, very regular firing with high reproducibility, suggesting that the
apparent irregularity of firing in vivo may follow from interneuronal interactions or
may be stimulus dependent [30].
The use of interspike interval coding enables richer and more structured
information to be transmitted and processed by neurons. The same mean firing rate
corresponds to a combinatorial number of interspike interval arrangements in a spike
train. What would previously be interpreted as a single number can carry much more
information in temporal coding. Moreover, temporal coding enables the system to
encode unambiguously more information than is possible with a simple mean firing
rate. Different parts of a spike train can encode qualitatively different information. All
these possibilities have been excluded in the classical view of neural information
processing. Even though a McCulloch-Pitts neuron is sufficient for the production of
spike trains, spike trains by themselves do not solve the binding problem (i.e. they do
not explain the mechanism responsible for the integration of the features constituting
an object, which are processed in a spatially and temporally distributed manner). However,
nothing would be gained, except possibly processing speed, if mean firing rate
encoding were merely replaced by temporal encoding, as the underlying
framework of knowledge representation and processing would still mix qualitatively
different information by simple algebraic operations.
The irregular pattern of neuron activity in vivo [28] is inconsistent with temporal
integration of excitatory post-synaptic potentials (EPSPs) assumed in classical model
neurons. It also introduces huge amounts of noise, thus making any task to be
performed by neurons, were they unable to differentially select their input, extremely
difficult. On the other hand, perhaps there is a reason for this irregular neuronal
behaviour. If neurons are coincidence detectors rather than temporal
integrators [19,22], then the randomness of neuron firing is an asset rather than a
liability.
One of the most difficult and as yet unresolved problems of computational
neuroscience is that of binding distinct features of the same object into a coherent
percept. However, in [31], Nelson postulates that it is the traditional view
'transmission first, processing later', that introduces the binding problem. On Nelson's
alternative view, processing cannot be separated from transmission and, when entangled
with transmission performed by neural assemblies spanning multiple neuronal areas, the
binding problem becomes non-existent [32].

4 Communication Metaphor

The brain's computational capabilities have to be understood in a metaphorical sense
only. All matter, from the simplest particles to the most complex living organisms,
undergoes physical processes which, in most sciences, are not given any special
interpretation.
However, when it comes to nervous systems the situation changes abruptly. In
neuroscience, and what follows in connectionism, it is assumed that neurons and their
systems possess special computational capabilities, which are not attributed to other,
even the most complex, biological substances (e.g. DNA). This is a very
anthropomorphic viewpoint because, by definition, computation is an intentional
notion and it assumes existence of some demon that able to interpret it. Thus we claim
that the very assumption of computational capabilities of real neurons leads to
homuncular theories of mind. In our opinion, to say that neurons perform
computations is equivalent to saying that, e.g., a spring extended by a moderate force
computes, according to Hooke's law, how much it should deform. We need to stress
that our stance does not imply that one should abandon using computational tools for
modelling and analysing the brain. However, one should be aware of their limitations.
On the other hand, although also metaphorical, treating neurons as communicating
with each other captures their complex (and to us fundamental) capability of
modifying behaviour depending on the context. Our claim is that communication as
biological information processing could describe complex neuronal operations more
compactly and provide us with an intuitive understanding of the meaning of these
operations (although we do not claim that this meaning would be accessible to single
neurons).
Although interpreting neurons as simple numerical or logical functions greatly
simplifies their description, it nevertheless introduces problems at higher levels of
neural organisation. Moreover, recent neurobiological evidence supports our claim
that the idea of neurons being simple computational devices has to be reconsidered.
We argue that communication better describes neuron functionality than
computation. In contrast to computation, communication is not a merely
anthropomorphic projection on reality. Even relatively simple organisms communicate
with each other or with the environment. This ability is essential for their survival and
it seems indispensable for more complex interactions and social behaviour of higher
species. The role of communication in human development and in social interactions
cannot be overestimated [33]. It seems therefore that communication is a common
process used by living systems on all levels of their organisation.
In our opinion the most fundamental qualitative properties of neurons postulated
recently are their capability to select different parts of converging signals and the
capability of choosing which signals to consider in the first place. Thus neurons can be
said to communicate simple events to each other and to select the information which they
process or transmit further. The selection procedure could be based on criteria
dependent on properties of previous signals, such as where and when the information
arrived. This would account for neurons' spatio-temporal filtering capacity. It would
also explain the amount of noise observed in the brain and the apparent contrast
between the reliability of neural firing in vitro and its randomness in vivo. What is
meaningful information for one neuron can be just noise for another. Moreover, such
noise would not impair the functionality of neurons capable of responding to selected
information.
One could object to our proposal using the parsimony principle - why introduce an
extra level of complexity if it has been shown that networks of simple neurons can
perform many of the tasks attributed to biological networks? However, we argue that
such a position addresses a purely abstract problem, which may have nothing to do
with brain modelling. What it is possible to compute with artificial neurons is, in
principle, a mathematical problem; how the same functionality is achieved in the brain
is another matter. The information processing capacity of dendritic trees is a scientific
fact not merely a conjecture. Instead of computational parsimony we propose an
‘economical’ one: the brain facilitates the survival of its owner and for that purpose
uses all available resources to process information.

5 Architecture of NESTOR

Taking into account the above considerations we adopt a model neuron that inherently
operates on rich information (encoded in spike trains) rather than a simple mean firing
rate. Our neuron simply accepts information for processing dependent on conditions
imposed by a previously accepted spike train. It compares corresponding parts of the
spike trains and, depending on the result, further distributes the other parts. Thus
neurons do not perform any numerical operations on the obtained information - they
forward its unchanged parts to other neurons. Their power relies on the capability to
select appropriate information from the incoming input depending on the context set
by their history and the activity of other neurons.
Although we define a single neuron as a functional unit in our architecture we are
aware that the debate on what constitutes such a unit is far from being resolved. We
based this assumption on our interpretation of neurobiological evidence. However, we
realise that even among neuroscientists there is no agreement as to what constitutes
such an elementary functional unit (proposals range from systems of neurons or
microcircuits [34], through single neurons [35], to single synapses [13]). In fact it is
possible that qualitatively similar functional units might be found on different levels of
brain organisation.
In the characteristics of this simple model neuron we have tried to capture what we
consider to be fundamental properties of neurons. Although our model neurons are
also dimensionless, nevertheless in their information processing characteristics we
included what might follow for real neurons from their geometric properties (namely
ability to distinguish their inputs - spatio-temporal filtering).
A network of such model neurons was proposed in [36]. The NEural STochastic
diffusion search netwORk (NESTOR) consists of an artificial retina, a layer of fully
connected matching neurons and retinotopically organised memory neurons. Matching
neurons are fully connected to both retina and memory neurons.
It is important to note that matching neurons obtain both ascending and descending
inputs. Thus their operation is influenced by both bottom-up and top-down
information. As Mumford [16] notices, systems which depend on interaction between
feedforward and feedback loops are quite distinct from models based on Marr’s
feedforward theory of vision.
The information processed by neurons is encoded by a spike train consisting of two
qualitatively different parts - a tag determined by the relative position of the receptor
on the artificial retina and a feature signalled by that receptor. The neurons operate by
introducing time delays and acting as spatiotemporal coincidence detectors.
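The paper specifies a principle here rather than an algorithm, so the following is only an
illustrative sketch under our own assumptions: spike trains are reduced to (tag, feature)
pairs, and a neuron simply compares the tag of an incoming train with the tag of the train
it last accepted, forwarding the train unchanged when they coincide. All names and the
particular matching rule are ours, not part of the NESTOR specification.

```python
from dataclasses import dataclass

@dataclass
class SpikeTrain:
    tag: int        # e.g. relative position of the receptor on the artificial retina
    feature: str    # e.g. the microfeature signalled by that receptor

class MatchingNeuron:
    """Sketch of a non-numerical neuron: it performs no arithmetic on its input.
    It compares the tag of an incoming train with the tag of the previously
    accepted train (its context) and, if they coincide, forwards the train
    unchanged; otherwise the input is ignored (it is noise for this neuron)."""
    def __init__(self):
        self.context = None   # the previously accepted spike train sets the context

    def receive(self, train: SpikeTrain):
        if self.context is None or train.tag == self.context.tag:
            self.context = train
            return train      # forwarded unchanged to other neurons
        return None           # meaningful for some other neuron, noise for this one

# What is information for one neuron is noise for another.
n = MatchingNeuron()
n.receive(SpikeTrain(tag=3, feature='edge'))
print(n.receive(SpikeTrain(tag=3, feature='corner')))   # forwarded
print(n.receive(SpikeTrain(tag=7, feature='corner')))   # None: ignored
```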
Although we exclusively used a temporal coding, we do not mean to imply that
firing rates do not convey any information in the brain. This choice was made
for simplicity of exposition and because in our simplified architecture it is not
important how the information about the stimulus is encoded. What is important is the
possibility of conveying more information in spike trains than would be possible if
information were only encoded in a single number (the mean firing rate). As far as we are
aware there are no really convincing arguments for eliminating one of the possible
encodings and in fact both codes might be used in the brain - mean firing for stimulus
encoding and temporal structure of spike trains for tagging relevant information.
NESTOR uses a dynamic assembly encoding for the target. Finding it in the search
space results in the onset of time-locked activity of the assembly. Different features of the
same object are bound by their relative position in the search space, and
synchronisation of activity within the assembly may follow as a result of binding. Thus
binding in the network is achieved by using additional information contained in tags.
Effectively NESTOR implements the Stochastic Diffusion Search (SDS) [37] - a
matching algorithm whose operation depends on co-operation and competition of
agents which were realised here as model neurons. Therefore in the next section we
will describe the network operation in terms of the underlying generic mechanism of
SDS.

6 Stochastic Diffusion Search

SDS consists of a number of simple agents acting independently but whose collective
behaviour locates the best-fit to a predefined target within the specified search space.
Figure 1 illustrates the operation of SDS on an example search space consisting of a
string of digits with the target - a pattern ‘371’ - being exactly instantiated in the
search space.
It is assumed that both the target and the search space are constructed out of a
known set of basic microfeatures (e.g. bitmap pixel intensities, intensity gradients,
phonemes etc.). The task of the system is to solve the best-fit matching problem - to
locate the target or, if it does not exist, its best instantiation in the search space. Initially
each agent samples an arbitrary position in the search space, checking whether some
microfeature at that position matches the corresponding microfeature of the target. If
this is the case, the agent becomes active; otherwise it is inactive. Activity
distinguishes agents which are more likely to point to a correct position from the rest.
Next, in a diffusion phase, each inactive agent chooses at random another agent for
communication. If the chosen agent is active, then its position in the search space will
be copied by the inactive agent. If, on the other hand, the chosen agent is also inactive
then the choosing agent will reallocate itself to an arbitrary position in the search
space.
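A minimal, self-contained sketch of one possible implementation of the test and diffusion
phases just described may help (a toy version under our own simplifications: microfeatures
are single characters, an inactive agent may happen to pick itself during diffusion, and the
number of iterations is fixed; compare the string-of-digits example of Fig. 1 below):

```python
import random

def sds(search_space, target, n_agents=5, n_iterations=100):
    """Toy Stochastic Diffusion Search over a string, e.g. locating '371'.
    Each agent holds a hypothesis: a candidate start position in the search space."""
    last = len(search_space) - len(target) + 1
    positions = [random.randrange(last) for _ in range(n_agents)]
    active = [False] * n_agents
    for _ in range(n_iterations):
        # Test phase: each agent checks one randomly chosen microfeature of the target.
        for i, pos in enumerate(positions):
            j = random.randrange(len(target))
            active[i] = (search_space[pos + j] == target[j])
        # Diffusion phase: each inactive agent consults a randomly chosen agent.
        for i in range(n_agents):
            if not active[i]:
                other = random.randrange(n_agents)
                if active[other]:
                    positions[i] = positions[other]        # copy the active agent's hypothesis
                else:
                    positions[i] = random.randrange(last)  # reallocate at random
    return positions, active

# With no noise and an exact match present, most agents should cluster on index 3.
random.seed(0)
print(sds("859371264", "371"))
```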
This procedure iterates until SDS reaches an equilibrium state, where a maximal
stable population of active agents will point towards a common position in the search
space. In the most general case the convergence of SDS has to be interpreted in a
statistical sense [38]. The population supporting the solution will fluctuate and the
identities of particular agents in this population will change, but nevertheless the system as a whole
will exhibit a deterministic behaviour. From such competition and co-operation
between weakly randomly coupled agents emerges the deterministic behaviour of
SDS. This self-organisation in response to an external stimulus incoming from the
search space is one of the most important properties of SDS.

Fig. 1. SDS consisting of five agents searching the string of digits for the pattern '371'. Active
agents point to corresponding features with solid arrows; inactive agents are connected to the
last checked features by dashed lines. Agents pointing to the correct position are encircled by
ovals. The first number in each agent denotes the position of the potential solution and the
second number the relative position of the checked microfeature.

The time complexity of SDS was analysed in [39] and shown to be sublinear in the
absence of noise when a perfect match is present. Further work has confirmed
that this characteristic also holds in more general conditions. As noted in [39] this
performance is achieved without using heuristic strategies, in contrast to the best
deterministic one- and two-dimensional string searching algorithms or their extensions
to tree matching [40], which at best achieve linear time.

7 Attention Modelling with NESTOR

Conventional models of visual attention are based on concepts of separate feature
maps, which are composed of neurons selective to the appropriate feature only [41].
However, recent research [42] suggests that in most visual cortical areas neurons
respond to almost any feature, implying a multiplexing problem. Moreover, a
majority of cells responding to a particular feature often reside outside the area
supposed to be responsible for extracting this feature from the scene.
Information processing by assemblies spanned by intercommunicating neurons
from distant areas of the brain has already been postulated [32] as the fundamental
operation mode of the brain. This view, together with findings on long range
interactions resulting in receptive fields spanning multiple cortical areas [43], in fact
reduces the division of the cortex into many separate areas to a mere neuroanatomical
taxonomy. It also supports the hypothesis that local interactions are not the most
important feature of real biological networks. The most recent findings suggest that,
contrary to the assumptions of some researchers [41], attention may operate at all
levels of the visual system, with the expectation of the whole system directly influencing
cell receptive fields and, as a result, information processing by single neurons (for an
excellent exposition see [44] and references therein).
These findings are qualitatively reflected in the architecture of NESTOR. Although
network architecture and neuron properties only very approximately correspond to the
architecture of the visual system and properties of real neurons, nevertheless, in the
light of the cited evidence, we think that it is an interesting candidate for modelling
visual attention.
The formation of a dynamic assembly representing the best fit to the target
corresponds to an attentional mechanism allocating available resources to the desired
object.
The analysis of properties of our model suggests that both parallel and serial
attention may be just different facets of one mechanism. Parallel processing is
performed by individual neurons and serial attention emerges as a result of formation
of an assembly and its shifts between interesting objects in the search space.

8 Conclusions

Much new evidence is emerging from the neuroscience literature. It points to the
neuron as a complex device, acting as a spatio-temporal filter probably processing
much richer information than originally assumed. At the same time our understanding
of information processing in the brain has to be revised on the systems level. Research
suggests that communication should not be disentangled from computation, thus
bringing into question the usefulness of ‘control-theoretic’ like models based on
clearly defined separate functional units.
We claim that this new evidence suggests supplementing the oversimplistic
McCulloch-Pitts neuron model by models taking into account such a communication
metaphor. It seems more accurate and natural to describe emergent neuron operations
in terms of communication - a vital process for all living organisms - exhibiting
'computations' only as a means of implementing neuron functionality in biological
hardware. In this way we avoid several problems lurking behind the computational
metaphor, such as homunculus theories of mind and the binding problem.
We propose a particular model neuron and discuss a network of such neurons
(NESTOR) effectively equivalent to the Stochastic Diffusion Search. NESTOR shows
all the interesting properties of SDS and moreover we think that it serves as an
interesting model of visual attention. The behaviour of neurons in our model is
context-sensitive and the architecture allows extension to heterogeneous neural
populations.
Although the model advanced in this paper is based solely on exploring the
communication metaphor we argue that it shows interesting information processing
capabilities - fast search for the global optimum solution to a given problem and
automatic allocation of resources, maintaining in parallel exploration and exploitation
of the search space.
In this article we focus on the implications of communication for the information
processing of single neurons, which enables us to make the first steps in the analysis,
analogous to advances in the analysis of purely computational models. However, we are
aware that the model proposed here occupies the opposite end, with respect to the
McCulloch-Pitts model, of an entire spectrum of alternatives. It seems reasonable that
the most realistic model neurons would enjoy properties of both the computational
McCulloch-Pitts model and our communication-based model. Nonetheless we hope that
adopting a communication metaphor will result in more adequate models of the brain
being developed, eventually helping us to better exploit the brain’s strengths and avoid
its weaknesses in building artificial systems which aim to mimic brain functionality.

Acknowledgments

The authors would like to thank an anonymous referee for critical comments which
helped us to refine and improve our paper.

References

1. McCulloch, W.S., Pitts, W.: A logical calculus of the ideas immanent in nervous activity. Bulletin of
Mathematical Biophysics 5 (1943) 115-133.
2. Rosenblatt, F.: Principles of Neurodynamics. Spartan Books, Washington DC (1962)
3. Poggio, T., Girosi, F.: Networks for approximation and learning. Proceedings of the IEEE 78
(1990) 1481-1497.
4. Haykin, S.: Neural Networks: A Comprehensive Foundation. Macmillan, New York (1994)
5. Rumelhart, D. E., McClelland, J.L. (eds.): Parallel Distributed Processing. Explorations in
the Microstructure of Cognition, MIT Press, Cambridge MA (1986).
6. Fukushima, K.: Neocognitron: A hierarchical neural network capable of visual pattern
recognition. Neural Networks 1 (1988) 119-130.
7. Selman, B. et al.: Challenge Problems for Artificial Intelligence. Proceedings of AAAI-96,
National Conference on Artificial Intelligence, AAAI Press, 1996.
8. Sejnowski, T.J., Rosenberg, C.R.: Parallel networks that learn to pronounce English text.
Complex Systems 1 (1987) 145-168.
9. Fodor, J., Pylyshyn, Z.W.: Connectionism and Cognitive Architecture: A Critical Analysis.
In: Boden, M.A. (ed.): The Philosophy of Artificial Intelligence, Oxford University Press
(1990).
10. Barnden, J., Pollack, J. (eds.): High-Level Connectionist Models, Ablex: Norwood, NJ,
(1990).
11. Pinker, S., Prince, A.: On Language and Connectionism: Analysis of a Parallel Distributed
Processing Model of Language Acquisition. In: Pinker, S., Mahler, J. (eds.): Connections
and Symbols, MIT Press, Cambridge MA, (1988).
12. Jordan, M.I., Jacobs, R.A.: Hierarchical mixtures of experts and the EM algorithm. MIT
Comp. Cog. Sci. Tech. Report 9301 (1993).
13. Shepherd, G.M.: The Synaptic Organisation of the Brain. Oxford University Press, London
Toronto (1974).
14. Friston, K.J.: Transients, Metastability, and Neuronal Dynamics. Neuroimage 5 (1997) 164-
171.
15. Fodor, J.A.: The Modularity of Mind. MIT Press (1983).
16. Mumford, D.: Neural Architectures for Pattern-theoretic Problems. In: Koch, Ch., Davies,
J.L. (eds.): Large Scale Neuronal Theories of the Brain. The MIT Press, London, England
(1994).
17. Farah, M.: Neuropsychological inference with an interactive brain: A critique of the locality
assumption. Behavioural and Brain Sciences (1993).
18. Wegner, P.: Why Interaction is More Powerful than Algorithms. CACM May (1997).
19. Koch, C.: Computation and the single neuron. Nature 385 (1997) 207-210.
20. Barlow, H.: Intraneuronal information processing, directional selectivity and memory for
spatio-temporal sequences. Network: Computation in Neural Systems 7 (1996) 251-259.
21. Granger, R., et al.: Non-Hebbian properties of long-term potentiation enable high-capacity
encoding of temporal sequences. Proc. Natl. Acad. Sci. USA Oct (1991) 10104-10108.
22. Thomson, A.M.: More Than Just Frequency Detectors ?. Science 275 Jan (1997) 179-180.
23. Sejnowski, T.J.: Time for a new neural code?, Nature 376 (1995) 21-22.
24. Koenig, P., et al.: Integrator or coincidence detector? The role of the cortical neuron
revisited. Trends Neurosci. 19(4) (1996) 130-137.
25. Perrett, D.I., et al.: Visual neurons responsive to faces in the monkey temporal cortex.
Experimental Brain Research 47 (1982) 329-342.
26. Rolls, E.T., Tovee, M.J.: Processing speed in the cerebral cortex and the neurophysiology of
visual backward masking. Proc. Roy. Soc. B 257 (1994) 9-15.
27. Thorpe, S.J., Imbert, M.: Biological constraints on connectionist modelling. In: Pfeifer, R.,
et al. (eds.): Connectionism in Perspective. Elsevier (1989).
28. Softky, W.R., Koch, Ch.: The highly irregular firing of cortical cells is inconsistent with
temporal integration of random EPSP. J. of Neurosci. 13 (1993) 334-350.
29. Berry, M. J., et al.: The structure and precision of retinal spike trains. Proc. Natl. Acad. Sci.
USA 94 (1997) 5411-5416.
30. Mainen, Z.F., Sejnowski, T.J.: Reliability of spike timing in neocortical neurons. Science
268 (1995) 1503-1506.
31. Nelson, J.I.: Visual Scene Perception: Neurophysiology. In: Arbib, M.A. (ed.): The
Handbook of Brain Theory and Neural Networks. MIT Press: Cambridge MA (1995).
32. Nelson, J.I.: Binding in the Visual System. In: Arbib, M.A. (Ed.): The Handbook of Brain
Theory and Neural Networks, MIT Press, Cambridge MA (1995).
33. Brown, R.: Social Psychology. Free Press, New York (1965).
34. Douglas, R.J., Martin, K.A.C.: Exploring cortical microcircuits. In: McKenna, Davis,
Zornetzer, (eds.): Single Neuron Computation. Academic Press (1992).
35. Barlow, H.B.: Single units and sensation: A neuron doctrine for perceptual psychology?.
Perception 1 (1972) 371-394.
36. Nasuto, S.J., Bishop, J.M.: Bivariate Processing with Spiking Neuron Stochastic Diffusion
Search Network. Neural Processing Letters (at review).
37. Bishop, J.M.: Stochastic Searching Networks. Proc. 1st IEE Conf. Artificial Neural
Networks, pp. 329-331, London (1989).
38. Nasuto, S.J., Bishop, J.M.: Convergence Analysis of a Stochastic Diffusion Search. Parallel
Algorithms and Applications (in press).
39. Nasuto, S.J., Bishop, J.M, Lauria, S.: Time Complexity Analysis of Stochastic Diffusion
Search, Proc. Neural Computation Conf., Vienna, Austria (1998).
40. van Leeuwen, J. (ed.): Handbook of Theoretical Computer Science. MIT Press: Amsterdam
(1990).
41. Treisman, A.: Features and Objects: The fourteenth Bartlett Memorial Lecture. The
Quarterly Journal of Experimental Psychology 40A(2) (1988) 201-237.
42. Cowey, A.: Cortical Visual Areas and the Neurobiology of Higher Visual Processes. In:
Farah, M.J., Ratcliff, G. (eds.): The Neuropsychology of High-Level Vision. LEA Publishers
(1994).
43. Spillmann, L., Werner, J.S.: Long range interactions in visual perception. Trends Neurosci.
19(10) (1996) 428-434.
44. McCrone, J.: Wild minds. New Scientist 13 Dec (1997) 26-30.
The Second Person – Meaning and Metaphors

Chrystopher L. Nehaniv

Cybernetics and Software Systems Group


University of Aizu, Aizu-Wakamatsu City, Fukushima 965-8580, Japan
nehaniv@u-aizu.ac.jp
(Current address: Interactive Systems Engineering, Department of Computer Science,
University of Hertfordshire, Hatfield, Hertfordshire AL10 9AB, United Kingdom,
E-mail: c.l.nehaniv@herts.ac.uk)

Abstract. A notion of meaning is introduced that is related to infor-
mation theory but requires agents and observers (who may or may not
coincide) for its genesis. We generalize Wittgenstein’s notion of language
games to interaction games between agents or between an agent and
its environment. In this setting, meaningfulness is also characterized by
use. As these realizations concern particular agents, they lead to a shift
in consideration of ‘meaning transfer’ away from an external, universal
(third person) standpoint towards aspects of mapping grounded in em-
bodiment (intra-agent or agent-environment: first person structures) and
in interaction and imitation (inter-agent: second person structures). We
propose that the study of agents, constructive biology, memetics and
metaphor can benefit from considerations of the origin, design, evolu-
tion, and maintenance of channels of meaning for various observers and
agents. To take advantage of correspondences in channels of meaning,
second person methods (that is, those methods concerned with agent
correspondences) in these areas require the study of grounding structural
correspondences between source-channel-target pairs.

1 Meaning for Observers and Agents

Truth and meaning, as logicians will tell you, only make sense in reference to a
particular universe of discourse. Less obviously perhaps, meaning also only makes
sense from the standpoint of an observer, whether that observer is someone
manipulating a formal system to determine referents and applying predicates
according to compositional rules, is an animal hunting in the forest, is a Siberian
swan in a flock of swans over-wintering on a northern Japanese lake, is an
artificial agent maintaining control parameters over an industrial process, or is
the ‘mind of God’. We thus take a seemingly stricter view than that of most
logicians, that meaning only makes sense for agents, situated and embedded in
interaction with their particular Umwelt, the world around them. Actually this
is a view wider in scope in that it now includes anything that could potentially
qualify as an ‘observer’, not only a universal third-person or external impersonal
one. The agent may be as simple as an active process on the CPU of your

computer, a software agent, a robot, an animal, or even a logician pondering the
Platonic realm of forms.
The meaningfulness of the behavior of such a creature may be completely
in the eye of the beholder. For whom is the behavior meaningful? To whom
is it meaningful if several such creatures interact, e.g. if robots interact to
perform a task such as collecting objects (Beckers et al. [2]); or, as in the case
of analogous collective behavior by termites [10] using stigmergy (environmental
signs of the progress of work)? Meaningfulness may be in the designer’s eye or
in the adaptiveness of the activity as tending to increase the probability that
copies of an agent’s genes are represented in future generations (if it has any).
The latter notion of evolutionary, behavioral, survival adaptiveness (in biological
agents the tendency to increase reproductive success) hints at the possible nature
of meaning for evolved or constructed systems. Meaning arises with information
that helps an agent attain its goals.
Note that meaning in this sense starkly contrasts to – but may also be consid-
ered a compatible refinement of – Shannon’s measure of information content [22],
which is minimal for a constant unchanging signal but is maximal for random sig-
nals, both of which might well be devoid of meaning for all agents and observers.
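As a reminder of the quantity being contrasted here (stated only as the standard
textbook definition, not as anything specific to the present argument): for a discrete
signal whose symbols occur with probabilities $p_1, \ldots, p_n$, Shannon's measure is
the entropy

\[ H = -\sum_{i=1}^{n} p_i \log_2 p_i , \]

which is $0$ for a constant signal (one $p_i$ equal to $1$) and maximal, $\log_2 n$, for a
uniformly random one; in neither case need the signal mean anything to any agent or
observer.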
Agent goals may be conscious or unconscious, merely surviving, maintaining the
self or reproducing, or they may be to act according to intentions. If these goals
are observer attributed rather than within the agent then the corresponding
meaning exists only in relation to such observers. The agent itself may be such
an observer, in which case meaning could then arise for it in its interaction with
its Umwelt1 .
Meaning then need not be linguistically nor even symbolically mediated. It
may or may not involve representations, but must arise in the dynamics realiz-
ing the agent’s functioning and interaction in its environment (cf. the notion of
‘structural coupling’ of Maturana and Varela [14]), supporting adaptive or self-
maintaining or reproductive behaviors, or goals, or possibly intentions or plans.
Multiple observers, as in the case of interaction among humans, may result in
multiple arisings of meaning. Any entity that exists at the level of a biological
unit of evolution (e.g., unicellular organism, differentiated multicellular organ-
ism, eusocial insect colony) could potentially be an agent or observer in our
sense, as could a human organization such as a government, tribe or corpora-
tion. Robotic and software agents are not excluded.
In the realm of constructive biology, robotics and agent construction, meaning
can also arise in the interaction channels between the agent and its environment
in which it is ‘embodied’. These channels could be artificially evolved or designed.
Similarly, these issues arise for software agents, which might in some sense be
considered embodied with respect to their particular environments. Particular
cases of such agents have been used as tools reducing workload and cognitive
1 This is the ethologist's concept of the 'world around' an animal, i.e. the local envi-
ronment as experienced by the animal in its particular embodiment including senses
and means of acting on the world. Considering an animal, robot or other agent in
its Umwelt is an example of taking what we call here the ‘first-person perspective’.
load as personal assistants, or for parsing or generating human-like metaphors
in language, or in more cognitive terms, in mappings and blend relations among
conceptual spaces.
Wittgenstein insisted on defining meaning of words and other signs in terms
of their use by agents engaged in language games (including everyday lan-
guage) [28,29]. This situated and embodied nature of agent semiotics highlights
the meaninglessness of signs and sign systems in isolation, without agents and
thus without uses, although such sign systems may have very interesting struc-
tures (possibly formally specifiable) that are difficult to describe, prescribe or
construct for given competences and desired performances.
We note that there is no reason to restrict the language of interaction games
to verbal utterances. Other signs and actions can also be used by an agent in-
teracting with its environment. Thus we speak of interaction games as a gener-
alization of Wittgenstein's language games. The partner in an interaction game
may be another agent, or it may be the environment where the agent is situated.

2 Locus of Meaning

Where is the meaning for an agent? It is in the observer, who as we said may be
the agent itself. So in looking for meaning in any situation one must ask, Where
are the observers?
An agent interacts with the world through its sensors, embodiment and ac-
tuators. An evolved biological agent uses sensory and action channels that have
been varied and selected over the course of evolution. The channels it uses are
meaningful to it for its survival, homeostasis, reproduction, etc. The access to
the particular channels has evolved because they are of use to the agent for such
purposes, and thus meaning arises for the agent as it accesses these channels.
In this access, the agent is in the role of an observer (though not necessarily a
conscious one) and this observer is also an actor.
What is meaning then? It is information considered with respect to channels
of interaction (perception and/or action) whose source and target are determined
with respect to an observer. The source and target may be known, uncertain,
or unknown; they may be agents or aspects of environments; information in the
channel may or may not be accessible to the observer; the observer may be an
agent at one end (or possibly both ends) of the channel, or may be external to
the channel.

2.1 External Observers

The attempts and successes of formalization and rationalism to escape from con-
text, to formulate universal scientific laws that do not depend on the particular
observer and aspects of the messiness of embodiment, useful Platonic entities
such as numbers, and generic impersonal statements about ‘he’/‘she’/ ‘it’/‘they’
have been extremely important in the history of science and engineering. They
have led to great successes in physical sciences, mathematics and engineering,
achieving somewhat less success in the case of animate beings, such as in biology
at the level of the organism, psychology and economics (where agents matter).
Such logical positivistic approaches tend to presuppose a single unique plane of
description, one universal coordinate system or model in which all phenomena
may be described and understood. (Note, however, that sometimes more sophis-
ticated versions allow several viewpoints, which agree where they overlap but
may also explain some areas which are not mutually explainable in a consistent
manner, e.g. in relativistic physics, the theory of manifolds in differential geom-
etry and topology – obtained by ‘gluing’ locally Euclidean pieces of space, and
more general coordinate systems affording formal understanding of systems [15]).
We propose that first- and second-person perspectives can assist in these
agent sciences. The third-person observer perspective is thus an extra-agent view.
Nevertheless, there is an agent present in this viewpoint, namely, the observer
itself.

2.2 An Agent’s Perspective


The notion ‘first person’ refers to the experience of an agent itself, the particular
embodiment of the agent in its environment, and its particular sensory-motor and
internal-state dynamics. It is thus an intra-agent perspective. The agent is con-
sidered in its own Umwelt and may be biological, an engineered physical artifact,
or a software agent cycling through a reactive, deliberative or post-reactive con-
trol loop (active process). Techniques for the first-person perspective include de-
velopmental, subsumption staged build-up, exploiting dynamics of embodiment,
non-monolithic task-specific intelligence (Brooks et al. [5,6]), and, for temporal
grounding, histories and autobiographic reconstruction [7,17]. The book of Pattie
Maes [13] includes much research on situated, embodied, embedded biologically-
inspired systems and relevant issues in AI. Stojanov [24] considers how agents
might use their own embodiment (along with their internal dynamics in the form
of Piagetian schemata) to act in an environment as they learn to deal with it by
mapping it metaphorically to these internal schemata. Common metaphors, as is
by now well-known, are central to human understanding of the everyday world,
and provide schemata for conceptual grounding of more abstract domains to be
understood by starting from aspects of embodiment in the world as the primary
source of primitive or basic conceptual domains [12,20]. Correspondence of navi-
gational maps or linguistic/conceptual schemata with spatiotemporal aspects of
embodiment are illustrated, for example, in the papers of Nehmzow [19] and Shi-
nohara [23], respectively. Notice that language or human-level concepts are not
involved in the first example, and that properties of spatiotemporal grounding
are central to both.

2.3 ‘I’ and ‘Thou’


Inheritance of characteristics resulting from reproduction in biological systems
makes the siblings and progeny of an agent resemble it. The channels of sensation
and action, and the manner of embodiment of these others is thus likely to be
very similar to that of the agent. This similarity can be a substrate for interaction
and provides structure that the agent’s own structure can be related and mapped
to. These other agents are thus ‘second persons’, alter-egos (i.e. other ‘I’s) in the
world whose actions could be analyzed and possibly ‘understood’ as correspond-
ing to one’s own. A tendency to regard such others as ‘egomorphic’, similar to
the self, or to expect that their actions in given situations should be similar to
what one’s own would be could thus be adaptive. This egomorphic principle may
be at the root of the ability of animals to perceive signals of intent in others.
For example, a dog might not have a theory of other minds, but may well growl
when it perceives and acts on signals, such as gaze direction, of another animal
looking at a piece of food it has grasped in its teeth and paws.
A generalization of the egomorphic principle in humans is their anthropo-
morphizing tendency to view other animals and objects around them as having
human-like consciousness, feelings, intentions or goals. This tendency may lead
to appropriate behavior in response to, say, perceived threat and anger in a
snarling carnivore protecting its young, or to less successful behavior in, say,
attributing a vengeful state of mind to storm clouds and trying to appease them
with burnt offerings.
The notion ‘second person’ refers to the experience by an agent of other
agents and of the interaction dynamics with other agents. It is thus an inter-
agent notion. Aspects include theory of other mind and empathic resonance [7];
biographic reconstruction for others [17]; perception of signals of intention; inter-
action; and mapping of the self to the other. In mapping the self to the other, the
latter becomes for this observer a blend of the self with the notions of otherness:
the second person — to whom are attributed states and dynamics (e.g. inten-
tions, drives, feelings, desires, goals) and possibly a biographic history [17]. As
the second person, the other ceases to be an object and becomes an agent. As
just mentioned, it may be that such mapping from ‘I’ to ‘Thou’ also lies at
the core of the anthropomorphizing tendencies so often observed in human in-
teraction with computers and robots. How such interaction dynamics work in
natural agents and could be constructed in artificial ones leads one into the
study of imitation, social dynamics, communication and the understanding of
language games and interaction games. Some of the second person techniques
for interaction illustrated in this book are in Dautenhahn [8] (learning by imita-
tion, temporal synchronization (‘dancing’)), Barnden [1] (theory of mind, beliefs
of others), Brooks et al. (interaction dynamics), Scassellati [21] (scaffolding for
imitation, joint attention), and Kauppinen [11] (imitation and child language
acquisition via figures of speech).

3 Constructive Biology

The first person viewpoint in agent construction is strongly related to construc-
tive biology, i.e. biology motivated by the desire to understand how biological
systems actually are constructed by nature and develop over time, rather than
just to obtain descriptive understanding. This is the engineering scientific view-
point that one’s understanding should enable one to, in principle, build the sys-
tems of interest. For example, Barbara Webb has shown through building that a
much simpler mechanism than expected, not involving functional decomposition
or planning, is sufficient to account for much observed cricket phonotaxis behav-
ior [27]. Valentino Braitenberg’s examples [4] of simple robots to whom human
observers attribute such states as ‘fear’, ‘aggression’, ‘love’, etc., illustrate that
meaning of an interaction for an external observer can be quite different to that
its has for the agent (in these cases, simple taxis). Constructive biology will in-
escapably lead to mappings that respect structural constraints and grounding
of agents, to the use and manipulation of hierarchies and the need for a deeper
understanding of them in relation to natural adaptive systems.
The study of correspondence via the algebraic notion of homomorphism (full,
partial or relational) provides an inroad for the precise study of correspondence
between agents interacting with their environments or with each other. Preserv-
ing structure of meaning channels for an agent coupled to its environment is
required for the usefulness of and determines the quality of metaphors and map-
pings in the design, algebraic engineering, interaction dynamics, and constructive
biology of situated agents.
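One standard way to make such correspondences precise (offered here only as a sketch;
this particular formalization is our own choice, not a claim about any specific system in
this volume) is to model an agent coupled to its world as a transition system with state
set $Q$, input set $X$ and transition function $\delta : Q \times X \to Q$. A homomorphism
from agent $A$ to agent $B$ is then a pair of maps $\varphi : Q_A \to Q_B$ and
$\psi : X_A \to X_B$ such that

\[ \varphi(\delta_A(q, x)) = \delta_B(\varphi(q), \psi(x)) \qquad \text{for all } q \in Q_A,\ x \in X_A , \]

so that acting and then translating agrees with translating and then acting; partial and
relational homomorphisms relax $\varphi$ and $\psi$ to partial maps or relations.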

4 Epilogue: Correspondences

Some insights in this book and at the interdisciplinary International Workshop
on Computation for Metaphors, Analogy and Agents held 6-10 April 1999 in
Aizu, Japan follow:

(0) Construction of meaning in interaction (Brooks et al. [6], Scassellati [21],
Dautenhahn [8]) or conceptual blending (Turner [26]) is as important as any
system acting by itself or as the naked transfer between domains.
(1) A first-person viewpoint (Goguen, pers. comm.) is preferable for design
rather than an impersonal external viewpoint (3rd person, Platonist) on
systems. This refers to the embedded nature of a system in its own envi-
ronment structurally-coupled in a manner that depends on its dynamics.
By emphasizing the phenomenology of the system as ‘experiencing’ its own
interacting with its environment, many questions that are difficult from the
viewpoint of external ‘objective’ analysis evaporate.
(2) A second-person viewpoint that aids in the transfer of meaning/knowledge
from a given system to another is facilitated by having a first-person view-
point (Nehaniv) and a crucial factor can be the recognition of another sys-
tem (the 'you') as similar to the given system (the 'me') in terms of its body,
structure, and action in the world as well as historical grounding (Dauten-
hahn [8,17]). Such historical grounding is necessary for narrative intelligence
(Nehaniv and Dautenhahn [17]). It is also for problems of second-person
correspondence that a first-person viewpoint is superior to a third person
viewpoint.
(3) An empirical (Beynon [3]), phenomenological (Goguen [9]) approach to
model-building, agent construction, and user-interface specification is essen-
tial and complementary to formal, theoretical computer science.
(4) Intelligence, believability, success and ‘aliveness’ of systems depends in large
part on observer criteria (Brooks, Dautenhahn), as does meaning (Nehaniv).
(5) A developmental (incremental/subsumption) approach to building intelli-
gent systems is important (Brooks et al. [6]) in attaining high-level behaviors
by building on basic behaviors.
(6) Traditional methods may complement rather than be fully replaced by newer ap-
proaches.
(7) Respecting structure of domains and agent-environment coupling determines
the usefulness and quality of metaphors and mappings in the algebraic engi-
neering, interaction dynamics, and constructive biology of situated agents.

Channels for selecting what information to pay attention to and what to
ignore evolve in biological life and are designed in artificial agents. Information
theory takes channels as given a priori, before beginning any analysis, while
meaning is related to their function for the agent in its interaction with its world.
Correspondences between these ways of interacting can result in relatedness to
others, ground imitation and provide a substrate for interaction, for metaphors
of similar embodiment and action, for narrative intelligence, for communication
and for language.
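
As a small numerical illustration of the first point (the numbers and the toy
channel below are invented): once a channel is fixed in advance as a
conditional-probability matrix, Shannon's mutual information between input and
output [22] can be computed with no reference to what the signals mean for the
agent.

    # Illustrative only: mutual information I(X;Y) for a channel fixed in
    # advance (conditional probabilities p(y|x)) and an input distribution p(x).
    from math import log2

    p_x = [0.5, 0.5]                       # assumed input distribution
    p_y_given_x = [[0.9, 0.1],             # assumed noisy binary channel
                   [0.2, 0.8]]

    # Output distribution p(y) = sum_x p(x) p(y|x).
    p_y = [sum(p_x[i] * p_y_given_x[i][j] for i in range(2)) for j in range(2)]

    # I(X;Y) = sum_{x,y} p(x) p(y|x) log2( p(y|x) / p(y) ).
    info = sum(p_x[i] * p_y_given_x[i][j] * log2(p_y_given_x[i][j] / p_y[j])
               for i in range(2) for j in range(2) if p_y_given_x[i][j] > 0)

    print(f"I(X;Y) = {info:.3f} bits")     # about 0.40 bits for these numbers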

Acknowledgements

The author is grateful to the participants at the International Workshop on
Computation for Metaphors, Analogy and Agents for many of the insights,
surprises, and inspirations leading to these ideas, which have been acknowledged
above by reference to the particular individuals who generated them, although
some ideas were rather ‘in the air’ and would better be credited to the
participants as a group. The thoughts expressed and any shortcomings are
nevertheless the author’s own.

References

1. John A. Barnden, An Implemented System for Metaphor-Based Reasoning
with Special Application to Reasoning about Agents. In [16], 143–153, (this
volume). 384
2. R. Beckers, O. E. Holland, and J. L. Deneubourg, From Local Actions to
Global Tasks. In Rodney A. Brooks and Pattie Maes, eds., Artificial Life IV,
MIT Press, 181–189, 1994. 381
3. Meurig Beynon, Empirical Modelling and the Foundations of Artificial Intelli-
gence. In [16], 322–364, (this volume). 386
4. Valentino Braitenberg, Vehicles: Experiments in Synthetic Psychology, MIT
Press, 1986. 385
5. Rodney A. Brooks, A Robust Layered Control System for a Mobile Robot,
IEEE J. Robotics and Automation, RA-2, 14–23, April 1986. 383
6. Rodney A. Brooks, Cynthia Breazeal, Matthew Marjanović, Brian Scassellati,
and Matthew M. Williamson, The Cog Project: Building a Humanoid Robot.
In [16], 52–87, (this volume). 383, 385, 386
7. Kerstin Dautenhahn, I could be you — the phenomenological dimension of
social understanding. Cybernetics and Systems 25(8):417–453, 1997. 383, 384
8. Kerstin Dautenhahn, Embodiment and Interaction in Socially Intelligent Life-
Like Agents. In [16], 102–142, (this volume). 384, 385
9. Joseph Goguen, An Introduction to Algebraic Semiotics, with Application to
User Interface Design. In [16], 242–291, (this volume). 386
10. P. P. Grassé, La reconstruction du nid et les coordinations inter-individuelles
chez Bellicositermes natalensis et Cubitermes sp. La théorie de la stigmergie:
Essai d’interprétation des termites constructeurs. Ins. Soc., 6, 41–48, 1959.
381
11. Anneli Kauppinen, Figures of Speech, a Way to Acquire Language. In [16],
196–208, (this volume). 384
12. George Lakoff and Mark Johnson, Metaphors We Live By, University of
Chicago Press, 1980. 383
13. Pattie Maes, ed., Designing Autonomous Agents: Theory and Practice from
Biology to Engineering and Back, MIT Press, 1991. 383
14. Humberto R. Maturana and Francisco J. Varela, The Tree of Knowledge: the
Biological Roots of Human Understanding, revised edition, Shambhala
Publications, Inc., 1992. 381
15. C. L. Nehaniv, Algebraic Models for Understanding: Coordinate Systems and
Cognitive Empowerment. In J. P. Marsh, C. L. Nehaniv, B. Gorayska, eds.,
Proceedings of the Second International Conference on Cognitive Technology:
Humanizing the Information Age, IEEE Computer Society Press, 147–162, 1997.
383
16. C. L. Nehaniv, ed., Computation for Metaphors, Analogy and Agents, (Lecture
Notes in Artificial Intelligence, Vol. 1562), Springer Verlag, (this volume). 386,
387, 388
17. C. Nehaniv and K. Dautenhahn. Embodiment and Memories — Algebras of
Time and History for Autobiographic Agents. In Robert Trappl, ed., Cybernetics
and Systems ’98, Proceedings of the 14th European Meeting on Cybernetics and
Systems Research (Symposium on Embodied Cognition and Artificial Intelligence;
co-organized by Maja Mataric and Eric Prem), Vienna, Austria, 14–17 April
1998. Austrian Society for Cybernetic Studies, volume 2, 651–656, 1998. 383,
384, 385
18. C. L. Nehaniv and J. L. Rhodes, On the Manner in which Biological
Complexity May Grow. In Mathematical and Computational Biology:
Computational Morphogenesis, Hierarchical Complexity, and Digital Evolution,
Lectures on Mathematics in the Life Sciences, Vol. 26, American Mathematical
Society, 93–102, 1999.
19. Ulrich Nehmzow, “Meaning” through Clustering by Self-Organization of Spa-
tial and Temporal Information. In [16], 209–229, (this volume). 383
20. Andrew Ortony, ed., Metaphor and Thought, 2nd edition (1st edition: 1979),
Cambridge University Press, 1993. 383
21. Brian Scassellati, Imitation and Mechanisms of Joint Attention: A Develop-
mental Structure for Building Social Skills on a Humanoid Robot. In [16],
176–195, (this volume). 384, 385
22. Claude E. Shannon and Warren Weaver, The Mathematical Theory of Com-
munication, University of Illinois Press, 1963. 381
23. Kazuko Shinohara, Conceptual Mappings from Spatial Motion to Time: Anal-
ysis of English and Japanese. In [16], 230–241, (this volume). 383
24. Georgi Stojanov, Embodiment as Metaphor: Metaphorizing-In the Environ-
ment. In [16], 88–101, (this volume). 383
25. Stephen Toulmin, From Clocks to Chaos: Humanizing the Mechanistic World-
View. In Hermann Haken, Anders Karlqvist, and Uno Svedin, eds., The Machine
as Metaphor and Tool, Springer Verlag, 139–153, 1993.
26. Mark Turner, Forging Connections. In [16], 11–26, (this volume). 385
27. Barbara Webb, Using Robots to Model Animals: A Cricket Test, Robotics and
Autonomous Systems, 16:117–134, 1995. 385
28. Ludwig Wittgenstein, The Blue and Brown Books, Harper & Brothers, 1958.
382
29. Ludwig Wittgenstein, Philosophical Investigations, (Philosophische Unter-
suchungen), German with English translation by G. E. M. Anscombe, 1964.
Basil Blackwell, Oxford, reprinted 3rd edition, 1968. 382
Author Index

Alty, J. L. 307
Barnden, J. A. 143
Beynon, M. 322
Bishop, M. 365
Breazeal, C. 52
Brooks, R. A. 52
Dautenhahn, K. 102, 365
Fenton-Kerr, T. 154
Goguen, J. 242
Hiraga, M. K. 27
Indurkhya, B. 292
Kauppinen, A. 196
Knott, R. P. 307
Marjanović, M. 52
Nasuto, S. J. 365
Nehaniv, C. L. 1, 380
Nehmzow, U. 209
O'Neill-Brown, P. 165
Scassellati, B. 52, 176
Shinohara, K. 230
Stojanov, G. 88
Turner, M. 11
Veale, T. 37
Williamson, M. W. 52
