Brain-Inspired Predictive Control of Robotic Sensorimotor Systems

See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/346499472

Brain-inspired predictive control of robotic sensorimotor systems

Thesis · July 2017


DOI: 10.13140/RG.2.2.33455.97442/1

1 author:

Léo Pio-Lopez


All content following this page was uploaded by Léo Pio-Lopez on 30 November 2020.



N° d’ordre : D. U : 2824
EDSPIC : 803

UNIVERSITE CLERMONT AUVERGNE


ECOLE DOCTORALE
SCIENCES POUR L’INGÉNIEUR DE CLERMONT-FERRAND

Thesis
presented by

Léo Pio-Lopez
to obtain the degree of

DOCTEUR D'UNIVERSITÉ
Speciality: Electronics and Systems

Title of the thesis

Brain-inspired predictive control of robotic


sensorimotor systems

Defended publicly on 5 July 2017 before the jury:

Mr Yuri Lapusta, President

Mr Julien DIARD, Reviewer and examiner
Ms Yulia SANDAMIRSKAYA, Reviewer and examiner
Mr Mateus JOFFILY, Examiner
Mr Youcef MEZOUAR, Thesis director
Mr Giovanni PEZZULO, Thesis director
Mr Jean-Charles QUINTON, Supervisor
Brain-inspired
predictive control of
robotic sensorimotor
systems
Thesis defended on 5 July 2017
at Institut Pascal,
Université Clermont Auvergne

For the degrees of:

PhD in Electronics and Systems
PhD in Psychology and Cognitive Science

Respectively conferred by:
Université Clermont Auvergne, France
Università degli studi di Roma 'La Sapienza', Italy

Léo Pio-Lopez
Public defense before the jury:
Mr Yuri Lapusta, president of the jury
Mr Julien DIARD, reviewer
Ms Yulia SANDAMIRSKAYA, reviewer
Mr Mateus JOFFILY, examiner
Mr Youcef MEZOUAR, PhD director
Mr Giovanni PEZZULO, PhD director
Mr Jean-Charles QUINTON, supervisor
To Eliane, Etienne, Isabelle and Antoine
Acknowledgements
First and foremost, I wish to express my gratitude to my PhD supervisors, Youcef Mezouar and
Giovanni Pezzulo, whose expertise, understanding, and patience added considerably to my
experience. I am deeply grateful to my supervisor, Jean-Charles Quinton, for the discussions
we had on various scientific and non-scientific subjects, his long and very precise emails on
various works, his passion for science, and his vast knowledge and skills in many areas. I
would like to thank all the other members of my jury for having accepted to be part of it.
I wish to thank my graduate supervisors, who introduced me respectively to psychology,
Bayesian modeling, and the field of research in general. Without their passion, I probably
would not have continued in science.
Of course, I am grateful to all my roommates over these three years in France and Italy, with a
special thanks to Ange, who put up with me along this journey in research. "Pourquoi pas", right?
I can never thank my parents, Isabelle and Antoine, enough for their unconditional support,
freedom, exceptional food, medical advice (private joke, sorry), science-fiction literature, and
museum visits. What I owe you could never be written down. And I hope you won't pull my
leg (again) for that.
A special thanks goes to Agathe, who has been a source of inspiration to me since long before
this PhD journey. I have to thank her for her stimulating philosophical views on science and
technology (just to name a few) and her eternal enthusiasm. I hope you won't blame me for not
writing a (non-)exhaustive list of your contributions here.
I won't forget my sister and brother, Cerise and Loup, for their help, intelligence ("pas gêné"),
the watch story, and their insights into the dark field of graphic design.
I would like to thank "La Loupiote" for its warm and hard-working atmosphere, and of course
"Voyou" for his subtle advice on life in general.
A special thanks goes to my grandparents, Eliane and Etienne, who both contributed greatly, in
different ways, to my curiosity and keen interest in science, and yes, even through the long stories
about the good old days of research on the non-linear dynamics of tyres at Michelin.

Last but not least, I would like to deeply thank Manon for her encouragement, her editing
assistance, and her curiosity and stimulating, always positive mind. She helped me a lot in
finalizing the last steps of this thesis and I will always be grateful for this.
And finally, I wish to thank all the other people, Hugo, David, Jocelin, etc., who contributed to
my scientific development in one way or another.

Clermont-Ferrand, 24 décembre 2016 Léo Pio-Lopez

Abstract
One of the long-term goals in robotics is to introduce into human environments autonomous
robots capable of helping humans and interacting with them safely and efficiently in everyday
life. In an uncontrolled environment, a robot that must perceive and act faces an
incomplete-information problem: its model of the world is by definition incomplete.
We assume that every model of any kind of real phenomenon is incomplete [121]. We
do not have all the information necessary to understand and model any phenomenon
completely. For any kind of model, there is always a lack of information; there are always
hidden variables, not taken into account in the model, that could influence the phenomenon.
Consequently, a model and a phenomenon can never have the same behavior. Uncertainty is
the consequence of this incompleteness [21]. The question of the uncertainty of the model,
and of the assessment of its variables, is therefore fundamental. Several authors have argued
that estimating uncertainty and the states of the world from noisy and incomplete data is
one of the main functions of the nervous system [168, 199].
In this thesis, we naturally opted for a modeling approach that can quantify that uncertainty:
the probabilistic approach. The methods developed in the thesis are largely inspired by
biological (brain) solutions to problems of prediction, inference, and control, all of which have
recently been cast within the same Bayesian scheme, widely used in motor control
and robotics: predictive coding. In this framework, prediction is key: estimates of future
states of the body or the environment are taken into account for a particular task [33], not
only present and past states. This approach has been applied to explain a wide variety of
phenomena: action understanding [105], perception-action loops and perceptual learning
[71, 77], Bayes-optimal sensorimotor integration and predictive control [80], action selection
[63, 76], and goal-directed behavior [70, 74, 79, 145, 146]. We use one of its implementations in
this thesis, the Bayesian filtering scheme. We refer in this thesis to two bodies of literature, one
in robotics/control and the other in neuroscience, both of which use a convergent
lexicon (e.g., Bayesian inference).


Our thesis is composed of three articles:

• L Pio-Lopez, JC Quinton, and Y Mezouar. Dual filtering in operational and joint spaces
for reaching and grasping. Cognitive Processing – International Quarterly of Cognitive
Science, 2015.

• L Pio-Lopez, M Birem, JC Quinton, F Berry, and Y Mezouar. Salient feature based
SLAM for long-term navigation using recursive Bayesian estimation. (in preparation for
submission)
• L Pio-Lopez, A Nizard, K Friston and G Pezzulo. Active inference and robotic control: a
case study. Journal of Royal Society Interface, 2016.

Our contribution corresponds to the development of several methods for robotics that do not
resort to inverse kinematics, which can be too time-consuming for real-world applications.
Our framework (predictive coding and the Bayesian brain hypothesis) and simulations allowed
us to address different questions in cognitive science (the usefulness of the visual and motor
spaces for reaching and grasping, a robotic model of sensorimotor integration, etc.).
The first article is about Bayesian filtering and reaching. We present a new dual visuo-motor
and bio-inspired Bayesian filtering approach for reaching, with some possible extensions to
manipulation. We applied the method to a manipulator with 51 degrees of freedom and tested
it on several scenarios with spatial constraints for reaching (e.g., obstacles). We also present in
this thesis results of the method on a dual-arm plant and on grasping. The general approach
relies on dual Bayesian filtering and can be parallelized. We show in this article that
decomposing the reaching problem into two spaces, the visual and operational spaces, reduces
the complexity of the reaching problem.
The second article deals with a new approach in the field of navigation. We introduce a new
visual Simultaneous Localization and Mapping (SLAM) method. We apply to navigation a
Bayesian filtering scheme similar to the one used previously for manipulation robotics. We
create a map from a flow of features, and the method integrates information over time for
localization of the moving agent via a novel attentional feature detector. We build a visual
memory with the features acquired over time; the memory contains only the features observed
during learning. We compute a frame histogram using Bayesian filtering. This allows us to
switch easily between localization and navigation. The approach is neuro-inspired and close
to the notion of population coding [45], where each feature is shared by many frames.
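The feature-to-frames visual memory described above can be sketched as an inverted index whose votes form the frame histogram. This is an illustrative sketch, not the thesis code: the class and method names, the frame identifiers, and the rarity weighting are all assumptions.

```python
# Sketch of a visual memory that stores features, not frames: each feature
# is associated with the set of frames where it was observed. Matching the
# currently observed features against the memory yields a per-frame score
# (the "frame histogram"). All names and numbers are illustrative.

from collections import defaultdict

class VisualMemory:
    def __init__(self):
        self.frames_of = defaultdict(set)  # feature id -> frames where seen

    def learn(self, frame, feature_ids):
        for f in feature_ids:
            self.frames_of[f].add(frame)

    def frame_histogram(self, observed_ids):
        votes = defaultdict(float)
        for f in observed_ids:
            for frame in self.frames_of.get(f, ()):
                # Rarer features are more discriminative; shared features
                # (population coding) spread their vote over many frames.
                votes[frame] += 1.0 / len(self.frames_of[f])
        return dict(votes)

memory = VisualMemory()
memory.learn("frame_1", ["f1", "f2", "f3"])
memory.learn("frame_2", ["f3", "f4"])

hist = memory.frame_histogram(["f2", "f3"])
# "frame_1" gets the exclusive vote of f2 plus half of f3's shared vote.
print(max(hist, key=hist.get))  # -> frame_1
```

In the actual method, these per-frame scores are not used directly: they enter the recursive Bayesian filter as observation likelihoods, so that the belief over frames integrates evidence over time.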


The third article is about active inference applied to manipulation robotics. We use the free-
energy principle, a particular implementation of predictive coding [71, 77], for reaching. This
approach is gaining prominence in cognitive science, as it has been used to explain many
different phenomena in perception, action, and cognition. To our knowledge, this is the first
implementation of this principle in (simulated) robotics, on a 7-DoF arm of a PR2. Under
different conditions of proprioceptive and visual noise, we give a proof of principle of the
application of active inference to robotics. We discuss key aspects of the framework in relation
to robotics, and the potential opportunities offered by sensory attenuation and failures of gain
control for modeling Parkinson's disease.
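The core loop of active inference can be illustrated in one dimension. This is a didactic sketch under strong simplifying assumptions (unit variances, noiseless sensing, hypothetical gains `alpha` and `beta`), not the PR2 controller from the article: perception descends the free-energy gradient with respect to the internal estimate, and action changes the world so the sensation fulfils the prediction.

```python
# One-dimensional toy of active inference: the agent "expects" to sense the
# goal, and acting makes that expectation come true. Illustrative only.

def active_inference(goal, x0, alpha=0.1, beta=0.1, steps=500):
    mu = x0          # internal estimate of the hidden state
    x = x0           # true hidden state of the world, e.g. hand position
    for _ in range(steps):
        s = x        # sensation generated by the world (noiseless here)
        # Perception: gradient descent on free energy w.r.t. the estimate mu,
        # balancing sensory error (mu - s) against prior error (mu - goal).
        mu -= alpha * ((mu - s) + (mu - goal))
        # Action: a reflex arc that suppresses the sensory prediction error
        # by changing the sensation itself, i.e. by moving the world.
        x -= beta * (x - mu)
    return x

print(round(active_inference(goal=1.0, x0=0.0), 3))  # -> 1.0
```

Even this toy shows the characteristic inversion of classical control: there is no inverse model and no explicit motor command computation; the desired state enters only as a prior, and action simply cancels prediction errors.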

Key words: active inference, manipulation robotics, predictive coding, Bayesian filtering,
navigation

Résumé
Robots in industry can perform very specific tasks with high efficiency in controlled
environments. However, these robots must be pre-programmed for a redundant set of actions,
and they are often kept in restricted areas where interaction with humans is not allowed for
safety reasons. So far, the introduction of autonomous robots into human environments has
been very limited; only autonomous vacuum cleaners or lawn mowers are easily found on the
market. However, there is a strong demand for safe, autonomous robots capable of interacting
with humans in everyday environments. The fields of application are vast and cover assistive
robotics, industrial robotics for tasks still requiring human dexterity, and search and rescue,
to name just a few [28, 65, 84].

To reach this goal, robots must cope with a great deal of uncertainty in their perceptual
modalities (touch, vision, proprioception, audition, taste, and olfaction), as humans do during
their daily activities, and act efficiently on their environments. Biology, and more precisely in
our case cognitive science in general, can play an important role in designing intelligent and
safe robots for human-machine interaction. Indeed, our brain is one of the best available
models for understanding perception, action, and cognition. It can be a fruitful source of
inspiration for the development of autonomous sensorimotor systems, as we can see with the
development of bio-mimetic, neuro-inspired, or humanoid robots, found in growing numbers
in industry and universities [124, 139].

In recent years, predictive coding strategies have been proposed as a possible means by
which the brain could make sense of the quantity of sensory data available to it at time
t [33, 37, 55, 155]. As Huang and Rao stated [91]:


Predictive coding postulates that neural networks learn the statistical regu-
larities inherent in the natural world and reduce redundancy by removing the
predictable components of the (sensory) inputs, transmitting only what is not
predictable (the residual prediction errors).

The idea, in other words, is that the brain, or neurons, in order to interpret sensory data
correctly, must solve the inverse problem. From the perceived outcomes (for example, the
spatial distribution of light intensity acquired via the retina), the brain must infer the causes
(the arrangement of objects that gave rise to the perceived image) [171]. This inverse problem
is ill-posed because several solutions exist. As robots also have a sensorimotor relationship
with the world, they face the same problem. Predictive coding suggests an interesting way to
solve it by introducing constraints in order to reduce the number of solutions to the inverse
problem. Predictive coding goes back to the work of Von Helmholtz in 1867 [46, 198]. The idea
has taken many forms since then, including the dynamical approach, recurrent neural
networks, and a Bayesian form [37, 46, 79]. The latter approach holds that the brain
implements Bayesian inference. It has gained prominence in cognitive science in recent
decades [71, 77, 85, 86, 109, 123]. It rests on the representation and manipulation of
probability distributions by the brain and is often referred to as the Bayesian brain hypothesis
[47, 55, 108, 151, 152]. The Bayesian approach in cognitive science has been used to model a
wide range of tasks, from sensory processing to high-level cognition [36, 58, 113, 178, 182].
In Bayesian terms, the predictive coding problem essentially has the form of Bayesian filtering
[47, 78, 80, 142, 155]. The state of the sensory and motor variables must be estimated from
prior knowledge and a stream of noisy observations. This problem can be solved using
recursive Bayesian estimation, in other words Bayesian filtering.
This thesis was developed within this framework of predictive coding, uncertainty estimation,
and Bayesian modeling (Bayesian filtering in particular). The applications of the methods we
describe in the following chapters are in manipulation robotics and in the field of navigation.
This thesis is organized around a collection of three articles, together with the associated
research carried out during the PhD.

Publications
The work developed in this thesis contributed to the following 3 publications and 1 poster:


International journals
• L Pio-Lopez, JC Quinton, and Y Mezouar. Dual filtering in operational and joint spaces
for reaching and grasping. Cognitive Processing – International Quarterly of Cognitive
Science, 2015.

• L Pio-Lopez, M Birem, JC Quinton, F Berry, and Y Mezouar. Salient feature based SLAM
for long-term navigation using recursive Bayesian estimation. (in preparation for
submission)

• L Pio-Lopez, A Nizard, K Friston and G Pezzulo. Active inference and robotic control: a
case study. Journal of Royal Society Interface, 2016.

Communications
• L Pio-Lopez, JC Quinton, and Y Mezouar. Coupled filtering in motor and visual spaces
for reaching and grasping. 2015.

The thesis is based on the first three articles.


The first article is about Bayesian filtering and reaching. The introduction of robots into
human environments is expanding rapidly. These robots need to navigate and interact with
objects and human beings in a safe way; in other words, they must act in uncontrolled and
uncertain environments far from the structured environment of a factory. There is thus a clear
need for more efficient algorithms, given the growing introduction of humanoid robots into
human environments. Several alternative approaches have been used in the literature to solve
the inverse kinematics problem, such as fuzzy logic [106] [90], artificial neural networks [22]
[88] [132], evolutionary algorithms [102], and probabilistic modeling [41]. The method
presented in this article belongs to the latter category.
In manipulation robotics, reaching a target or an object in the operational space is
fundamental. This problem requires specifying a position and an orientation and determining
the joint parameters allowing the arm to reach this target. It is known as the inverse
kinematics problem: the goal is to find the appropriate joint parameters such that the desired
position of the robot's hand is reached. Several approaches exist; the most classical, as
described in [111], rely on algebraic [57] [81], iterative [115], or geometric [59] [122] methods.


For adult humans, reaching a target with the hand or a finger is an easy task, but it
nevertheless involves controlling many degrees of freedom (DoFs) and complex trajectories in
joint space. Any healthy human can reach and grasp thousands of objects every day without
any problem. But how do humans perform this task? Several authors have shown that humans
may use internal models for trajectory planning [104]. They may plan their movement
trajectories in visual space [64], and after this first step, follow this simulated trajectory by
controlling their arm and hand. Other authors have shown strong similarities between
reaching movements and Bayesian filtering [34]. Following these authors, we developed a
model of this internal visual simulation and of effector control using Bayesian filtering.
Specifically, we applied filtering in two spaces: the visual space, for internal simulation (a
method close to internal simulation in neuroscience [98]), and the motor space, for control of
the manipulator with real interactions with the environment, resulting in a new method of
dual Bayesian filtering for reaching and manipulation.

To properly understand the generation of human movements, as well as to develop efficient
algorithms for humanoid manipulator robots (which share the same constraints), the limits
and drawbacks of classical control approaches must be overcome. Without resorting to inverse
kinematics (which can take too long or lead to erroneous approximations), we want to show
that it is possible to control high-dimensional systems by simulating and predicting the
outcome of local actions (using the forward model only). The complexity of the problem is
decomposed into the visual and motor spaces. Inverse kinematics methods reach their limits
when the robotic system has a high number of degrees of freedom (DoFs), generally more
than 6, and uncertainty and constraints (such as obstacle avoidance) become too important.
In this case, the complexity of computing the inverse kinematics grows rapidly, and so does its
computational cost.

Our approach also includes a bio-inspired perspective. Indeed, for healthy humans, reaching
and grasping objects are easy tasks. Several authors have shown that motor control and action
selection could be Bayesian in nature and have obtained good results using this modeling
approach [113] [114] [185]. They underline the importance of uncertainty in motor control.
Other authors have also shown strong similarities between Kalman filtering and human
reaching movements [34], and a possible neural implementation of this Bayesian filter in the
brain can be found in [48]. To our knowledge, dual Bayesian filtering for inverse kinematics
has never been applied to manipulation robotics, but this method (with a single filter) has
recently been explored in the field of animation [41].
We applied (approximate) Bayesian filtering to reach targets with simulated robots. We used
particle filtering as an approximation of Bayesian inference. Particle filtering is a sequential
Monte Carlo method for approximate Bayesian computation. This method allows us to
estimate the importance weights recursively over time. We can list four advantages of this
dual filtering approach:

• Uncertainty is included in the model.

• A high number of DoFs can be controlled thanks to the low computational cost of
Bayesian filtering.

• The inverse model is not needed to solve the inverse kinematics problem [41]. Only the
forward model is needed.

• The whole approach can be parallelized.
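The sequential Monte Carlo scheme referred to above can be sketched as a generic bootstrap particle filter. This is not the thesis implementation; the one-dimensional random-walk model, the noise levels, and the particle count are illustrative assumptions.

```python
# Bootstrap particle filter for a 1-D random walk observed in noise.
# Model (illustrative): x_t = x_{t-1} + N(0, q^2),  y_t = x_t + N(0, r^2).

import math
import random

random.seed(0)

def particle_filter(observations, n=500, q=0.5, r=0.5):
    particles = [random.gauss(0.0, 1.0) for _ in range(n)]
    estimates = []
    for y in observations:
        # Propagate particles through the (forward) dynamics model only;
        # no inverse model is ever needed.
        particles = [x + random.gauss(0.0, q) for x in particles]
        # Importance weights from the observation likelihood.
        weights = [math.exp(-0.5 * ((y - x) / r) ** 2) for x in particles]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Posterior mean estimate, then multinomial resampling.
        estimates.append(sum(w * x for w, x in zip(weights, particles)))
        particles = random.choices(particles, weights=weights, k=n)
    return estimates

# Track a hidden state drifting toward 5; the estimate follows the data.
obs = [0.5, 1.2, 2.1, 2.9, 3.8, 4.4, 5.1, 4.9, 5.0, 5.2]
est = particle_filter(obs)
print(round(est[-1], 1))
```

Each particle requires only one evaluation of the forward model and one of the likelihood, which is why the approach scales to many DoFs and parallelizes trivially across particles.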

We rely on a bio-inspired, probabilistic method to generate reaching movements in complex
contexts. More precisely, in this article we apply Bayesian filtering in the visual and motor
spaces. The visual filtering defines an initial trajectory in the operational space (avoiding
obstacles), which is then refined by motor filtering, allowing direct control in the joint space
while respecting joint limits. The method was validated in simulation on a set of scenarios
with one or several targets to reach (for example, with a target corresponding to a finger on an
object to grasp), with one or several arms/hands subject to strong spatial constraints (e.g.,
obstacles). Our method succeeds in finding a trajectory where inverse kinematics methods
fail. Relying on the visual space seems promising for reducing the complexity of movement
generation in the motor space. Bayesian filtering makes it possible to estimate hidden
variables by taking observable variables into account over time. This allows a trajectory to be
found while avoiding obstacles, but the trajectory is not optimal: it can happen that a
trajectory defined in the visual space does not respect the joint limits of the robotic system.
But it is a robust method.


Using Bayesian filtering in the visual space is not a global planning method, since Bayesian
filtering is more of a local planning method; but because it is applied in the visual space, a
trajectory is almost always found. This predictive coding approach makes it possible to control
a robot with a high number of DoFs (up to 51) without compromising computation speed for
real-time use.

The second article concerns Bayesian filtering and navigation. To accomplish a specific task,
a robot must know its own position accurately. In the fields of agriculture, autonomous cars,
service and emergency robotics, etc., this step is essential to the success of the task. Robots
navigating in new environments must also compute their localization in real time in order to
interact with an ever-changing environment.

In this article, we introduce a new method for visual simultaneous localization and mapping
(SLAM [169]), centered on a saliency-based detector and descriptor, and without specific
assumptions on the type of motion of the camera or robot. The environment is represented by
a topological map where each node corresponds to a frame observed during learning. We do
not add the frames directly to the memory, but only the features, that is, only the descriptor
vectors of each salient point. A frame is represented by the system as sets of features. We used
a kd-tree [16] to store new salient points efficiently and to find their nearest neighbors when
matching them. In the memory, we have only the features observed during learning, and with
each feature we associate the set of frames in which we observed it. Consequently, this allows
us to recover the original frame. In order to compare two features, or salient points, we
compute a similarity measure based on the Euclidean distance. We finally use Bayesian
filtering to integrate feature observations over time and to compute a frame histogram. The
algorithm can easily switch between localization (no prior) and navigation (trajectory
prediction) while using an adaptive number of features, which makes it possible to exploit the
remaining processing time for exploration. Loop-closure detection, which is the ability to
recognize previously visited places, is finally validated using GPS data with centimetric
accuracy.
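The role the kd-tree plays in this pipeline, finding the stored descriptor nearest to a newly observed one under the Euclidean distance, can be sketched with a minimal pure-Python implementation. The two-dimensional toy descriptors are illustrative assumptions; real descriptors are much higher-dimensional.

```python
# Minimal kd-tree for nearest-neighbor matching of feature descriptors.
# Illustrative sketch only; descriptor values are made up.

def build(points, depth=0):
    """Build a kd-tree over a list of equal-length descriptor vectors."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return (points[mid],
            build(points[:mid], depth + 1),
            build(points[mid + 1:], depth + 1))

def nearest(node, query, depth=0, best=None):
    """Return the stored descriptor closest to `query` (Euclidean distance)."""
    if node is None:
        return best
    point, left, right = node
    dist = sum((a - b) ** 2 for a, b in zip(point, query))
    if best is None or dist < sum((a - b) ** 2 for a, b in zip(best, query)):
        best = point
    axis = depth % len(query)
    near, far = (left, right) if query[axis] < point[axis] else (right, left)
    best = nearest(near, query, depth + 1, best)
    # Descend the far side only if the splitting plane could hide a closer point.
    if (query[axis] - point[axis]) ** 2 < sum((a - b) ** 2 for a, b in zip(best, query)):
        best = nearest(far, query, depth + 1, best)
    return best

descriptors = [(0.1, 0.9), (0.8, 0.2), (0.5, 0.5), (0.9, 0.9)]
tree = build(descriptors)
print(nearest(tree, (0.6, 0.4)))  # -> (0.5, 0.5)
```

The plane-distance pruning test is what makes the search sublinear on average, which matters when the visual memory accumulates many thousands of salient points over a long trajectory.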

We applied to navigation a Bayesian filtering structure similar to the one used previously for
manipulation robotics. The hidden state here is the frame, but the problem is the same: it
consists of integrating information over time for the localization of the moving agent via an
attentional feature detector. The key idea is to store the features directly in a visual memory,
and not the frames; the features can then be associated with a set of frames. The approach is
similar to [45] and [44]. The idea, at the neuroscientific level, is to develop a model close to
the functioning of the hippocampus, where cells are sensitive to a set of saliency points.

The general structure is thus very similar to the one used in manipulation robotics (except
that there is no second filtering layer). We applied the same scheme without assumptions on
the characteristics of the noise in the system. This predictive coding approach also relies on
the concept of population coding [45], where each feature is shared by many frames. The
method can be parallelized using the neural assembly code and an appropriate similarity
measure, which allows a good reduction in the number of features stored in the visual
memory without losing much accuracy. The whole approach is neuro-inspired and could be
parallelized to allow real-time processing.

Finally, robotic manipulation and navigation with sensorimotor systems ultimately refer to
the same type of problem: solving the inverse problem through perception (where am I on this
map?) and action (what is the optimal action given my constraints?). Only the probabilistic
variables change; the general predictive coding scheme is used for both tasks in the same way.

In the third article, we use active inference, which is an extension/generalization of predictive
coding and which proceeds by minimizing free energy (in the Bayesian sense) [71, 77], for
reaching with a manipulator arm. The Bayesian approximation is variational and allows the
introduction of the concept of minimizing free energy, or minimizing (Bayesian) surprise, in
the long run over the agent's perception. This approach is gaining prominence in
neuroscience, as it has been used to explain many different phenomena in perception, action,
and cognition [70, 74, 79, 145, 146].

The free energy principle says that action and perception minimize surprise in order to keep
the organism's biophysical states within their bounds and resist the second law of
thermodynamics, thereby maintaining homeostasis [72].


The (hidden) variables in this approach are continuous, so the whole method is dynamic.
Compared with the work presented previously, where the variables were discrete, this
approach has the advantage of accounting for the dynamic character of cerebral information
processing. In short, active inference can be seen as a standard Bayesian filtering scheme with
classical reflex arcs that allow action to fulfil predictions about the hidden states of the world.
The free energy here is the variational free energy; it is an upper bound on Bayesian surprise.
Agents cannot minimize surprise directly, but they can minimize this upper bound on
surprise, namely the free energy [72]. In other words, the agent acts to reduce the difference
between its expectations and reality; the goal is to act so that its predictions come true. The
agent has a model of the world, its prediction, and it wants the world to fit this prediction
through actions. In Bayesian terms, the brain would maximize the model evidence of the
sensory inputs [46] [108]. This free energy can be seen as a prediction error, and action
reduces prediction errors by changing sensations [67].
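The bound invoked above can be written compactly in standard variational notation, with s the sensations, ψ the hidden states, and q the agent's approximate posterior (a sketch of the textbook identity, not the article's full derivation):

```latex
\begin{aligned}
F &= \mathbb{E}_{q(\psi)}\!\left[\ln q(\psi) - \ln p(s,\psi)\right] \\
  &= -\ln p(s) + D_{\mathrm{KL}}\!\left[\,q(\psi)\,\|\,p(\psi \mid s)\,\right]
  \;\ge\; -\ln p(s).
\end{aligned}
```

Since the Kullback-Leibler divergence is non-negative, minimizing F with respect to q (perception) tightens the bound, while minimizing F through action reduces the surprise $-\ln p(s)$ itself by changing the sensations s.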
In this article, we apply this method to reaching with a simulated PR2 robot. It is, to our
knowledge, the first application of this theory to (manipulation) robotics. It is also a proof of
concept of the usefulness of active inference for robotics, with a view to developing future
free-energy-minimizing robots. Under different conditions of proprioceptive and/or visual
noise, we give a proof of principle of the application of active inference to robotics. We also
discuss key aspects of active inference theory, such as sensory attenuation, in relation to the
modeling of Parkinson's disease.

Keywords: active inference, robotics, predictive coding, Bayesian filtering

Contents
Acknowledgements i

Abstract iii

Résumé vii

List of figures xix

1 Introduction 1
1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.3 Publications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.3.1 International journals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.3.2 Communications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.4 Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

2 Bayesian perception and action in humans and robots 9


2.1 Uncertainty estimation in perception and action . . . . . . . . . . . . . . . . . . 9
2.2 Probability for rational reasoning with incomplete information . . . . . . . . . . 10
2.3 Bayesian modeling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.3.1 Subjective probabilities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.3.2 Basic concepts and notation . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.3.3 Probability calculus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.4 Bayesian inference and integration in cognitive science . . . . . . . . . . . . . . 13
2.5 Predictive coding and the Bayesian brain hypothesis . . . . . . . . . . . . . . . . 16
2.5.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.5.2 Bayesian filtering formalism . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.6 Computational motor control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22


2.6.1 Forward, inverse model or both ? . . . . . . . . . . . . . . . . . . . . . . . . 22


2.6.2 Optimal control as an inference problem . . . . . . . . . . . . . . . . . . . 25
2.7 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27

3 Hierarchical Bayesian filtering for reaching and grasping 29


3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
3.2 Article . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
3.3 Other results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
3.3.1 Reaching several targets on an object . . . . . . . . . . . . . . . . . . . . . 39
3.3.2 Dual arm control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
3.4 Other perspectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
3.4.1 Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
3.4.2 Dexterous manipulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
3.4.3 Robotic experimentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
3.4.4 Comparison with classical methods . . . . . . . . . . . . . . . . . . . . . . 42

4 Recursive Bayesian estimation for navigation 43


4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
4.2 Article . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
4.3 Other perspectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
4.3.1 Addition of a top-down a priori . . . . . . . . . . . . . . . . . . . . . . . . . 55
4.3.2 Distinguishability and selection of models . . . . . . . . . . . . . . . . . . 56
4.3.3 Use of the dynamic model of the robot . . . . . . . . . . . . . . . . . . . . 56

5 Active inference and robotics 59


5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
5.2 Article . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
5.3 Other perspectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
5.3.1 Online active inference for robotics . . . . . . . . . . . . . . . . . . . . . . 75
5.3.2 Extension of the framework for navigation . . . . . . . . . . . . . . . . . . 75

6 General conclusion and perspectives 77


6.1 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
6.2 Perspectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
6.2.1 Comparison and distinguishability of models . . . . . . . . . . . . . . . . 80


6.2.2 Is classical probability the appropriate representation ? . . . . . . . . . . 82

A Particle filtering 85

B Glossary 89

Bibliography 91

List of Figures

1.1 Evolution of the robot ’Asimo’ from Honda towards a more humanoid body shape. 2

1.2 Designer view of a robot ’Baxter’ from Rethink Robotics that could be used in
industry. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

1.3 Robot ’SDA 10’ from Motoman cooking. . . . . . . . . . . . . . . . . . . . . . . . 4

2.1 Probabilistic representation of combining auditory and visual cues for estimating
the position of the source of a sound. . . . . . . . . . . . . . . . . . . . . . . . . . 15

2.2 Predictive coding. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

2.4 Auxiliary Forward Model (AFM) architecture. The forward model receives an
efference copy to predict the consequences of actions. Figure from [148]. . . . . 22

2.5 Integral Forward Model (IFM) architecture. Figure from [148]. . . . . . . . . . . 23

3.1 Left: initial pose of the arm and targets on the object. Right: Result of the
algorithm with the two targets. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39

3.2 10 trajectories with Bayesian filtering applied into the visual space first and then
into the motor space on a dual arm/hand robotic system. . . . . . . . . . . . . . 40

4.1 Example of salient points detected by the algorithm on two frames. . . . . . . . 44

4.2 Representation of the data architecture between the observed features F, the
memorized features f_i, i ∈ {1, . . . , K}, and the link between their associated data. . . . . 45

4.3 Non-discriminative salient points. . . . . . . . . . . . . . . . . . . . . . . . . . . . 55

6.1 Distinguishability with two Gaussian models: 1st case. Left: predictions of
models m1 and m2 with θ1 = (30, 3) and θ2 = (50, 3). Right: distinguishability
between these two models: P(D = 1 | x y m1 m2 θ1 θ2). Figures from [50]. . . . . 81


6.2 Distinguishability between two Gaussian models: 2nd case. Left: predictions of
models m1 and m2 with θ1 = (75, 15) and θ2 = (75, 10). Right: distinguishability
between these two models. Figures from [50]. . . . . 81
6.3 Logarithm of the predictions of the models m1 and m2 with θ1 = (75, 15) and
θ2 = (75, 10). Figure from [50]. . . . . 82

1 Introduction

Prediction is very difficult,
especially if it's about the future.

Niels Bohr

1.1 Motivation

Robots in industry can achieve very specific tasks with high efficiency in controlled environments. However, these robots need to be pre-programmed for a predefined set of actions, and they are parked in restricted areas where interaction with humans is not permitted for safety reasons. Until now, the introduction of autonomous robots into human environments has been very limited: only autonomous vacuum cleaners or lawn mowers can easily be found on the market. However, a huge demand exists for safe and autonomous robots capable of interacting with humans in everyday environments and in industry (see Figure 1.2). The areas of application are broad and span assistive robotics (see Figure 1.3), rehabilitation, and search and rescue, just to name a few [28, 65, 84].

To accomplish this aim, autonomous robots need to deal with a great deal of uncertainty in their sensing modalities (touch, vision, proprioception, hearing, taste and olfaction), just as humans do during their daily activities. Biology, and more specifically in our case neuroscience, can play a major role in the effort to build intelligent and safe robots for human-machine interaction. Indeed, our brain is one of the best models available for understanding perception, action and cognition. It can be a fruitful source of inspiration for the development

1
Chapter 1. Introduction

of autonomous sensorimotor systems, as shown by the diversity of animal-like, bio-inspired and humanoid robots found in industry and academia [124, 139] (see Figure 1.1).

Figure 1.1 – Evolution of the robot ’Asimo’ from Honda towards a more humanoid body shape.

In recent years, predictive coding strategies have been proposed as a possible means by which the brain might make sense of the truly overwhelming amount of sensory data available to it at any given moment [33, 37, 55, 155]. As Huang and Rao state [91]:

Predictive coding postulates that neural networks learn the statistical regu-
larities inherent in the natural world and reduce redundancy by removing the
predictable components of the input, transmitting only what is not predictable
(the residual errors in prediction).

The idea is that, in order to interpret sensory data correctly, the brain (or its neurons) must solve an inverse problem. From the perceived outcomes (for example, the spatial distribution of light intensity acquired via the retina), the brain needs to infer the causes (the arrangement of objects that gave rise to the perceived image) [171]. The inverse problem is generally ill-posed, as several solutions exist. As robots have a sensorimotor relation with the world, they face the same problem. Predictive coding suggests an interesting way to tackle it, by introducing constraints that reduce the number of solutions to the inverse problem. Predictive coding dates back to the work of von Helmholtz in 1867 [46, 198]. The idea has taken many forms since then, including dynamicist approaches [14, 153, 154],


recurrent neural networks [150, 170, 175], artificial intelligence methods [162] and Bayesian formulations [37, 46, 79]. This latter approach, which claims that the brain performs Bayesian inference, has gained prominence in cognitive science [71, 77, 85, 86, 109, 123]. It relies on the representation and manipulation of probability distributions by the brain and is often referred to as the Bayesian brain hypothesis [47, 55, 108, 151, 152]. The Bayesian approach in cognitive science has been used to model a wide range of tasks, from sensory processing to high-level cognition [36, 58, 109, 113, 178, 182, 193]. In Bayesian terms, the problem of predictive coding essentially takes the form of Bayesian filtering [47, 78, 80, 142, 155]: the state of sensory and motor variables has to be estimated online from priors and a stream of noisy and ambiguous observations. This problem can be solved using recursive Bayesian estimation, or Bayesian filtering.

Figure 1.2 – Designer view of a robot ’Baxter’ from Rethink Robotics that could be used in industry.

This thesis has been developed in this framework of predictive coding, uncertainty estimation,
and Bayesian modeling (Bayesian filtering in particular). The applications of the methods we
will describe in the next chapters are manipulation robotics and navigation.


Figure 1.3 – Robot ’SDA 10’ from Motoman cooking.

1.2 Contributions

This thesis resulted in the following contributions, which address the problem of perception and action in uncertain environments while avoiding inverse-problem computations that can be too time-consuming for real-world robotic applications. We briefly introduce the contributions here; they are described in more depth in the next chapters. We achieved:

• A comprehensive review of the Bayesian approach in robotics, the predictive coding approach and computational motor control (chapter 2).

• The development of a novel neuro-inspired method for reaching and grasping on a robot with a high number of degrees of freedom (51 DoFs). The method is based on a dual Bayesian filtering approach and was inspired by results from studies on human motor control and reaching. The method succeeds in scenarios where inverse kinematics fails (chapter 3).

• A neuro-inspired Bayesian filtering scheme for navigation based on a visual saliency algorithm. We used a new method of visual Simultaneous Localization And Mapping (SLAM), grounded on a saliency-based detector and descriptor, without specific assumptions about the type of camera or robot motion (chapter 4).

• The application of the active inference framework to manipulation robotics. It is the first robotic application of this theory. We gave a proof of principle of the implementation of the framework for robotics. This work permits the investigation of motor control under a diversity of constraints (noise on proprioception, vision or both) and allows the results to be compared with the neuroscientific and psychophysical literature (chapter 5).

This PhD thesis is organized around a collection of three papers, together with the associated research work developed during the PhD.

1.3 Publications

The work developed in this thesis contributed to the following 5 publications (3 published, 2
in preparation) and 1 poster:

1.3.1 International journals

• L Pio-Lopez, JC Quinton, and Y Mezouar. Dual filtering in operational and joint spaces
for reaching and grasping. Cognitive Processing – International Quarterly of Cognitive
Science, 2015 [128].

• L Pio-Lopez, M Birem, JC Quinton, F Berry, and Y Mezouar. Salient feature based slam
for long-term navigation using recursive bayesian estimation. (in preparation)

• L Pio-Lopez, A Nizard, K Friston and G Pezzulo. Active inference and robotic control: a case study. Journal of the Royal Society Interface, 2016 [149].

• G Pezzulo, E Cartoni, F Rigoli, L Pio-Lopez, and K Friston. Active inference, epistemic value, and vicarious trial and error. Learning and Memory, 2016 [145].

• L Pio-Lopez, JC Quinton, and M Mermillod. When less is definitely more but only if well
projected. (in preparation)

This thesis is composed of the first three articles.

1.3.2 Communications

• L Pio-Lopez, JC Quinton, and Y Mezouar. Coupled filtering in motor and visual spaces
for reaching and grasping. International Conference on Spatial Cognition, 2015 [127].


1.4 Outline

Chapter 2 presents a comprehensive review of the theoretical framework in which the present studies take place. We describe and explain the choice of Bayesian modeling for this thesis and the key role of uncertainty for robots navigating and interacting in human environments. We present the Bayesian calculus on which all the models in this thesis rely, and the concept of subjective probabilities. We present Bayesian perception in humans and robots, and computational motor control with a special focus on the Bayesian brain hypothesis and predictive coding. We describe the main architectures in computational motor control and challenge the need for an inverse model in human and robotic motor control. Finally, we state the links between optimal control and the general framework of predictive coding, and describe how optimal control can be cast in terms of inference.

Chapter 3 corresponds to the first article presented in this thesis. We develop a new method based on dual Bayesian filtering for reaching and grasping. The method has been applied to several simulated robots with a high number of DoFs (up to 51). We introduce the interest of the method from a computational and biological point of view, present the article, and then present extensions with other results on grasping and the control of a dual-arm robotic system. Indeed, the same method can be applied to a dual-arm robot for reaching. We also applied the method to a scenario with several targets for grasping, which can be set as goals for the robot to reach with its fingers. The whole approach can be parallelized and is particularly suited to real-time applications. The general approach relies on Bayesian filtering and is close to predictive coding. We show that relying on two spaces for Bayesian filtering, the visual and operational spaces, reduces the complexity of the reaching problem in different scenarios with strong spatial constraints (e.g., obstacles). The method succeeds where inverse kinematics methods fail.

Chapter 4 presents the Bayesian filtering approach applied to navigation. We used a new method of visual Simultaneous Localization And Mapping (SLAM), grounded on a saliency-based detector and descriptor, and without specific assumptions about the type of camera or robot motion. A visual memory is built at the feature level, which, combined with the parsimony of the saliency-based detector, makes it possible to generate a compact map from a continuous flow of features. We applied to navigation a Bayesian filtering scheme similar to the one previously used for manipulation robotics. The hidden state here is the position of the car, but the problem is the same: it consists in integrating information over time for the localization of the moving agent via a novel attentional feature detector. The key idea is to store the features themselves in a visual memory, rather than the frames; each feature can be associated with a set of frames. The approach is close to [45] and [44]. The idea, at the neuroscientific level, is to develop a model close to hippocampal functioning, in which place cells are sensitive to a set of features integrated over time, as in Bayesian filtering.

Chapter 5 corresponds to active inference applied to manipulation robotics. Active inference is a well-known general framework in computational and systems neuroscience, but less known outside these fields. Active inference is a corollary of the free-energy principle [72] [67], which unifies several theories of perception and action and can describe several cognitive functions, including perceptual categorization, perceptual learning [71], Bayes-optimal sensorimotor integration, heuristics and dynamical systems [80], action selection [63, 76], and motor trajectories and goal-directed behaviour [70, 74, 79, 145, 146]. The approach is a Bayesian filtering scheme for nonlinear state-space models in continuous time [71, 77, 78]. We applied active inference to reaching on a 7-DoF simulated robot (a PR2). In this article, we discuss a proof of principle of the implementation of this scheme for manipulation robotics. Our results show that control is based on both vision and proprioception, and that if one modality is impaired, it can be compensated, though not entirely, by the other. We show the interest of the method for robotics, but also for psychophysiological characteristics such as sensory attenuation and for motor diseases like Parkinson's disease. It is an opening toward computational pathology [209]. We also discuss the links between optimal control, the dominant paradigm in computational motor neuroscience, and the general framework of active inference.

Chapter 6 gives the general conclusion and develops two perspectives in more depth.

2 Bayesian perception and action in
humans and robots

Uncertainty is not in things but in our head:
uncertainty is a lack of knowledge.

Jakob Bernoulli, Ars Conjectandi (Bernoulli, 1713)

2.1 Uncertainty estimation in perception and action

One of the long-term goals in robotics is to introduce into human environments autonomous robots capable of helping humans and interacting with them safely and efficiently in everyday life. In an uncontrolled environment, a robot, in order to perceive and act on the world, must face a problem of incomplete information: the robot lacks knowledge about the world, for example about the human interacting with it, or about the exact position of the object it wants to grasp on an industrial workbench. Its model of the world is therefore by definition incomplete. We make the assumption that every model of any kind of real phenomenon is incomplete [121]: we do not have all the information necessary to understand and model a real phenomenon completely. For any model, there is always a lack of information; the phenomenon can be influenced by hidden variables not taken into account by the model. It follows logically that a model and a phenomenon never behave exactly alike. The consequence of this incompleteness is uncertainty [21]. The question of the uncertainty of the model, and of the assessment of its variables, is therefore fundamental. Uncertainty is relevant in most situations in which a human or a robot needs to make a decision, and by extension it affects the problems solved by the brain. In addition, humans and robots have only noisy sensors, and any information we acquire from the world


is therefore uncertain at any given time. Different authors have argued that the estimation of uncertainty and of the (hidden) states of the world from noisy and incomplete data is one of the main functions of the nervous system [168, 199].

In this thesis, we naturally opted for a modeling approach that can quantify this uncertainty: the probabilistic approach. This approach holds several points of interest for cognitive science and robotics. Indeed, robots and humans alike, as they need to act, perceive, infer and make decisions in the real world, must face the problem of uncertainty. The probabilistic approach seems particularly appropriate for this kind of problem, as it provides a formal theory for the evaluation of uncertainty.

2.2 Probability for rational reasoning with incomplete information

As we saw, a robot must face the problem of uncertainty [19, 121]. Probabilistic modeling is a mathematical framework that can deal with uncertainty and incompleteness. Our work is based on the Bayesian theory of probabilities. The reader may note that other methods exist for integrating uncertainty into a mathematical framework, such as fuzzy logic [56, 107], Dempster-Shafer theory [136, 165, 208] or quantum probability theory [10, 147, 211], just to name a few.

Bayesian approaches have been widely used in robotics for action, perception and control, allowing robots to handle uncertainty in their environments [20, 61, 180], particularly since the development of Bayesian networks [140] and graphical models [120]. While Bayesian inference has been proved to be an NP-hard problem [39], and one limitation of the approach is that it can be highly demanding in computational power, the field of approximate Bayesian computation has made several advances that allow researchers to use Bayesian models and compute them in a manageable timeframe [43, 160, 173].

Probabilistic robotics has been developed particularly in the field of navigation, notably with the Bayes filter [180], as well as in manipulation robotics [83, 144] and humanoid robotics [166], just to name a few areas.

Several other approaches exist for robotic control and for modeling perception, action and cognition. In the field of reaching, for example (a subfield of manipulation robotics where the aim is to acquire a target and reach it with the end-effector), the most classical approaches, as reviewed in

[111], rely on algebraic [57, 81], iterative [115] and geometric methods [59, 122]. These methods reach their limits when the robotic system has a high number of degrees of freedom (DoFs), generally more than six, together with uncertainty and constraints (such as obstacle avoidance). In this case, the complexity of the inverse kinematics (IK) computation increases rapidly, becoming time-consuming, which is a problem if the aim is to develop real-time robotic applications. Thus, there is a clear need for more efficient algorithms, given the growing introduction of humanoid robots into uncontrolled human environments. Several alternative approaches have been used in the literature to solve the IK problem, such as fuzzy logic [90, 106], artificial neural networks [22, 88, 132] and evolutionary algorithms [102]. Nevertheless, very few of these approaches can deal accurately with uncertainty. Bayesian modeling, relying on probabilities, offers a quantitative and clear expression of uncertainty in the models. The approach has the advantage of relying on simple rules of calculus for inference.

2.3 Bayesian modeling

We present in this section the theoretical and mathematical basics needed to understand the simulations carried out within the framework of Bayesian modeling.

2.3.1 Subjective probabilities

Bayesian modeling pertains to the view based on the subjective interpretation of probability. According to this view, a prior degree of belief in a scientific hypothesis is updated to a posterior degree of belief. Probability in this framework is based on the notion of plausibility. Plausibility is, intuitively, the degree of certainty we assign to the truth of a proposition. The knowledge a subject has is therefore fundamental in assigning this degree of certainty. The pillar of this approach is Cox's theorem, which explains how to go from the notion of plausibility to the mathematical notion of probability. Probabilities in this framework are interpreted as an agent's rational degrees of belief.

Formally, in the subjectivist approach, P(X) of a variable X does not exist. In this framework, when we use probability distributions, we describe the knowledge of a subject π about the variable X, and we should formally write the probability distribution over the variable X as P(X | π).


In the frequentist approach, it is natural to define the probability distribution over X as P(X): the probability exists in the world, independently of the observer π. It is then not possible to have two different probability distributions P(X | π1) and P(X | π2), as it is in the subjectivist approach. When possible, the right-hand part of the probability corresponding to the subject will be left implicit for the sake of simplicity.

2.3.2 Basic concepts and notation

In this section, we briefly present the calculus rules, notations and concepts of Bayesian
modeling.

Variables Probabilistic variables are written in uppercase letters: V1, V2, ..., Vn. The modeler must define the relevant variables of the Bayesian model and, for each of them, its variation domain D and its number k of possible states.

Values Values are written in lowercase. They define the variation domain of the variables and their number of possible states. The variable V1 can take k values v1, v2, ..., vk, and DV1 = {v1, v2, ..., vk}.

Joint distribution Formally, the joint distribution is the probability distribution P(V1 V2 ... Vn) of a set of variables V1, V2, ..., Vn.

2.3.3 Probability calculus

Product rule The product rule expresses the probability distribution of the conjunction of the variables A and B in terms of conditional distributions:

P(A B | C) = P(A | C) P(B | A C)
           = P(B | C) P(A | B C).


Bayes theorem From the product rule we can derive the well-known Bayes theorem:

P(B | A C) = P(B | C) P(A | B C) / P(A | C), if P(A | C) ≠ 0.

The two rules can be used interchangeably.

Sum rule The sum rule states that the probabilities of a proposition and its negation sum to 1:

P(A | C) + P(¬A | C) = 1.

Marginalization rule Finally, from the sum and product rules, we can derive the marginalization rule:

∑_A P(A B | C) = P(B | C).

Conditional independence When we have

P(A | B C) = P(A | B),

we say that A is independent of C given B.
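These rules can be checked mechanically on a small discrete joint distribution. The sketch below uses an arbitrary 2×2 joint table (chosen for illustration, not taken from the thesis) to verify the marginalization rule, the product rule and Bayes theorem numerically:

```python
import numpy as np

# Arbitrary joint distribution P(A B): rows index A, columns index B.
P_AB = np.array([[0.3, 0.1],
                 [0.2, 0.4]])

# Marginalization rule: P(B) = sum_A P(A B), and symmetrically for P(A).
P_B = P_AB.sum(axis=0)
P_A = P_AB.sum(axis=1)

# Product rule: P(A B) = P(B) P(A | B).
P_A_given_B = P_AB / P_B
assert np.allclose(P_A_given_B * P_B, P_AB)

# Bayes theorem: P(B | A) = P(B) P(A | B) / P(A).
P_B_given_A = P_B * P_A_given_B / P_A[:, None]
assert np.allclose(P_B_given_A.sum(axis=1), 1.0)  # each row is a distribution
```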

2.4 Bayesian inference and integration in cognitive science

The general idea of perception and action in Bayesian modeling is the estimation of states of the world S that are hidden from the subject, or non-observable, based on observables O. The observables are typically pieces of information coming from the sensors (audition, vision, tactile information, etc.). With Bayes rule, we can compute the estimate of S given the observed information O:

P(S | O) = P(S) P(O | S) / P(O).    (2.1)

As a very simple example, suppose we want to estimate whether it will rain (S = 'rain') given that the humidity is high (O = 'high humidity'). If the humidity is high most of the time when it rains, P(O | S) = 0.9, the weather is very humid half of the time in the center of France, P(O) = 0.5, and it rains a quarter of the time, P(S) = 0.25, then P(S | O) = (0.9 × 0.25)/0.5 = 0.45. This simple scheme has been used to explain many phenomena in perception, action, and cognition in general, but the Bayesian models used can take several different forms, ranging from hidden Markov models to Kalman filters with sensor fusion, just to name a few.
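As a sanity check, the numbers of the rain example plug directly into Bayes rule (reading the 0.9 as the likelihood P(O | S), which is the value the computation uses):

```python
# Rain example: posterior probability of rain given high humidity.
p_rain = 0.25               # prior P(S)
p_humid = 0.5               # marginal probability of the evidence, P(O)
p_humid_given_rain = 0.9    # likelihood P(O | S)

p_rain_given_humid = p_humid_given_rain * p_rain / p_humid  # Bayes rule
```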

The manner in which the brain integrates several sources of information is often considered evidence that the brain can perform Bayesian computation (see Figure 2.1). For example, Ernst and Banks showed this kind of computation with the visual and tactile modalities in one of the most cited articles in the field [58]. The task of the subjects was to make a visual or haptic estimation of height. They showed that the nervous system seems to combine visual and haptic information very much like a maximum-likelihood integrator; in this study, the combination of the cues is close to the statistical optimum. This result on cue combination has been extended to different modalities (see [6] for a review). For example, it has been shown that subjects can optimally combine visual and auditory cues [2, 12]. This relates to the McGurk effect, where there is a discrepancy between the sound and what the lips of a person are actually saying; in this case, subjects hear a sound between the auditory and the visual syllable [134]. This statistically optimal cue combination has also been found within the visual system, with cues of texture and motion [96], two different textures [119], or texture and binocular disparity [110, 130].
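The maximum-likelihood integration scheme reported by Ernst and Banks can be sketched as inverse-variance weighting of two Gaussian cues. The numbers below are illustrative choices, not values from the original study:

```python
# Maximum-likelihood integration of two Gaussian cues: each cue is weighted
# by its reliability, i.e. its inverse variance.
def combine_cues(mu_v, var_v, mu_h, var_h):
    w_v = (1 / var_v) / (1 / var_v + 1 / var_h)   # weight of the visual cue
    mu = w_v * mu_v + (1 - w_v) * mu_h            # combined estimate
    var = 1 / (1 / var_v + 1 / var_h)             # combined variance
    return mu, var

# hypothetical trial: vision (variance 1.0) more reliable than haptics (4.0)
mu, var = combine_cues(mu_v=10.0, var_v=1.0, mu_h=14.0, var_h=4.0)
```

Note that the combined variance is smaller than that of either single cue, which is exactly the benefit of integration that the psychophysical studies measure.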

A cue can also be combined with prior knowledge. It has been shown experimentally that priors can be learned over time [18] and that subjects combine new sensory information (the likelihood, in Bayesian terms) with the prior in a manner close to the statistical optimum. To name a few examples, this has been shown for tasks of reaching, pointing and timing [113, 135, 177]. The reader may note that using prior knowledge can also lead to biased decision-making, perception and action.

Another main principle of the nervous system is that it acts dynamically, acquiring and processing sensory information over time. In Bayesian modeling, the integration of information over time is often modeled as a Hidden Markov Model (HMM) or, if the variables are continuous and assumed Gaussian, as a Kalman filter (a special case of Bayesian filtering) [101]. This approach respects the Markov property, which means that it assumes the state of the world at time t only depends on the state of the world at time t − 1. Compared


Figure 2.1 – Probabilistic representation of combining auditory and visual cues for estimating the
position of the source of the sound. a In this case, the visual cue is the most informative (green). The
peak of the posterior distribution corresponding to the combination of the cues (red) is shifted towards
the mean of the likelihood function associated to the vision cue. b In this case, the uncertainties on the
sensory modalities are close. The peak of the posterior is closer to the likelihood on audition (blue)
compared to the previous case. Figure from [108].

to other methods used to model the way the brain interacts dynamically with the world, like state-space models [161, 179], the Kalman filter and HMMs assess uncertainty over time. As we saw above, uncertainty evaluation over time seems to be a main feature of our brains. Kalman filters have been successfully used to explain perception and action in many different tasks [34, 95, 112, 116, 118, 143, 172, 192, 194, 203, 205] (see next section).

Very important results explaining cognition, perception and action have also been achieved using Bayesian decision-making, in other words, by studying how uncertainty is combined with the costs or rewards of actions on the world. It has been shown that humans can maximize expected utility [131] during movement under constraints, and can estimate their own motor uncertainties [52]. In neuroscience, the structures involved in reward value and probability
Chapter 2. Bayesian perception and action in humans and robots

are the orbitofrontal cortex, the striatum, the amygdala and dopamine neurons of the midbrain
[42, 62, 137, 138].
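The idea of combining motor uncertainty with utility can be sketched with a hypothetical reaching task (all numbers are illustrative): an agent that knows its own motor noise should prefer a risk-aware aim point rather than the one closest to the penalty region.

```python
import numpy as np

# Hypothetical reaching task: landing at x earns utility 1 - |x| around a
# target at 0, but overshooting past x = 1 (an obstacle) incurs a fixed
# penalty. All numbers are illustrative.
def utility(x):
    return np.where(x > 1.0, -5.0, 1.0 - np.abs(x))

rng = np.random.default_rng(0)
motor_sd = 0.4                       # the agent's own motor uncertainty
aims = [0.0, 0.5]                    # candidate aim points

# Monte Carlo estimate of the expected utility of each aim point.
expected = {a: utility(rng.normal(a, motor_sd, 100_000)).mean() for a in aims}
best = max(expected, key=expected.get)   # the risk-aware choice
```

Aiming at 0.5 looks closer to the obstacle-free optimum pointwise, but the motor noise makes the expected penalty dominate, so the maximization selects the safer aim point at 0.0.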

Given the success of the Bayesian approach in modeling cognition, several neuronal hypotheses
try to explain how the brain encodes uncertainty. One hypothesis relies on neuronal
subpopulations that would encode uncertainty using neurotransmitters such as acetylcholine or
norepinephrine [5]; this idea has received good experimental support [62, 93, 133, 167].
Another hypothesis states that uncertainty is encoded in the connections between neurons, in
the number of synapses and their strength [3]. Several other hypotheses exist to explain how
the brain encodes uncertainty (see [4, 195, 197]).

2.5 Predictive coding and the Bayesian brain hypothesis

2.5.1 Overview

Predictive coding has been proposed in recent years as a possible strategy used by the brain
to perceive and act in the world. As Huang and Rao state [91]:

Predictive coding postulates that neural networks learn the statistical regu-
larities inherent in the natural world and reduce redundancy by removing the
predictable components of the input, transmitting only what is not predictable
(the residual errors in prediction)

This framework has wide explanatory success, covering binocular rivalry [89], motor planning
[29], action understanding [105], perception-action loops and perceptual learning [71, 77],
Bayes-optimal sensorimotor integration and predictive control [80], action selection [63, 76]
and goal-directed behavior [70, 74, 79, 145, 146]. It gives a neurally plausible and
computationally tractable account of how Bayesian inference can be accomplished
in the brain (see Figure 2.2). It also suggests one way in which the brain could apply
constraints to solve the inverse problem of perception [33, 37, 155]. More specifically,
predictive coding usually refers to one or several internal models encoded in different brain
regions [171]. Causes of sensory inputs are encoded by the internal model as parameters of a
generative model, and these inferred causes are used to predict the new sensory input. Minimizing the error
between the actual sensory data and the sensory inputs predicted by the expected causes


determines which of the causes best explains the sensory data (see Figure 2.2).

Within the Bayesian brain hypothesis, the problem of perception and action via predictive
coding generally takes the form of Bayesian filtering [47, 78, 80, 142, 155]. The state of
sensory and motor variables has to be estimated online from priors and a stream of noisy and
ambiguous observations. This problem can be solved using recursive Bayesian estimation, or
Bayesian filtering. This kind of model belongs to the class of probabilistic models of time series,
or HMMs.

The estimation of hand position during and after movements has been explained using these
methods [95, 205], as well as salient aspects of the control of posture [118, 143, 172, 194]. The
Bayesian filtering approach has also been applied at different time scales to make estimates over
longer periods of time, and learning has been seen as such a long-term estimation [116]. In
this framework, learning is conceptualized as a form of Bayesian estimation over time. This
concept has been used to model the way humans adapt to force fields [17], the minimization
of arm-reaching movement errors over time using sensory information [34, 203], visuo-motor
perturbations [192], and the adaptation of saccades by monkeys [112].

Lastly, several hypotheses attempt to explain how the brain could neurally implement
Bayesian filtering, in the form of hidden Markov models or Kalman filtering [48, 82, 204].
In neuroscience, the Bayesian approach is known as the Bayesian brain hypothesis and has
been increasingly used over the last decades [55, 108, 156]. The idea is to see the brain
as a probabilistic machine that aims at predicting the world. The hypothesis comprises two
statements:

• The brain performs Bayesian inference to enable us to make judgements and guide
action in the world.

• The brain represents sensory information in the form of probability distributions.

This hypothesis entails a single principle by which perception, action and cognition are accom-
plished.

The predictive coding approach is gaining prominence in neuroscience. The idea is that
the brain, by minimizing the difference between actual sensory signals and the signals
expected on the basis of continuously updated predictive models, would infer the most likely


Figure 2.2 – Predictive coding. A. Representation of the information flow in hierarchical predictive
coding across three regions, where R1 is the lowest brain region and R3 the highest. Top-down
projections (blue) carry predictions from state units (light blue) in the deep (infragranular) layers
towards the superficial layers of lower regions. Bottom-up projections (red) come from error units
(orange) in the superficial cortical layers and terminate in state units in the deep layers. Precision
weights the prediction errors and thereby the relative influence of top-down and bottom-up signals.
B. Probability densities over the value of the sensory signal. On the left, a high-precision sensory
signal (red) has a strong influence on the posterior (green) and expectation (dotted line) compared
to the prior (blue). Conversely, a low-precision signal has a weak influence on the posterior
(green) and expectation (dotted line) compared to the prior (blue). Figure from [163].

causes of its sensory inputs [163]. Depending on its prediction errors, it will either change its
internal model of the world (perception) or suppress these prediction errors through reflex
arcs (action). Several implementations of this theory have been developed, among
them the free-energy principle [79], which we will use in the third article of this thesis.

Predictive coding is formally very close to Bayesian filtering. In the next section, we
present its mathematical formalism.


2.5.2 Bayesian filtering formalism

Two classes of variables are involved: S_0, ..., S_t (also written S_{0:t}), the (hidden) state
variables considered over a time horizon ranging from 0 to t, and O_0, ..., O_t (or O_{0:t}), the
time series of observation variables over the same horizon. The joint probability of this class
of models decomposes as:

P(S_{0:t}, O_{0:t}) = P(S_0) \, P(O_0 \mid S_0) \prod_{i=1}^{t} P(S_i \mid S_{i-1}) \, P(O_i \mid S_i) \tag{2.2}

To sum up, the model is described by three terms:

• P(S_0), the prior on the state at time t = 0.

• P(S_t | S_{t-1}), the "evolution model" or "transition model", which encodes the knowledge
concerning the transition between two time steps.

• P(O_t | S_t), the "observation model", which corresponds to what we can observe if the
state is S_t.
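Concretely, these three terms fully specify the joint distribution of Equation (2.2); a toy discrete HMM (with arbitrary numbers) makes this explicit:

```python
import numpy as np

# Toy discrete HMM (arbitrary numbers) specified by the three model terms.
prior = np.array([0.6, 0.4])               # P(S_0)
transition = np.array([[0.7, 0.3],         # P(S_t | S_{t-1}), rows = S_{t-1}
                       [0.2, 0.8]])
observation = np.array([[0.9, 0.1],        # P(O_t | S_t), rows = S_t
                        [0.3, 0.7]])

def joint_prob(states, obs):
    """Joint probability of Equation (2.2):
    P(S_0) P(O_0|S_0) * prod_i P(S_i|S_{i-1}) P(O_i|S_i)."""
    p = prior[states[0]] * observation[states[0], obs[0]]
    for i in range(1, len(states)):
        p *= transition[states[i - 1], states[i]] * observation[states[i], obs[i]]
    return p

p = joint_prob([0, 0, 1], [0, 0, 1])   # = 0.6*0.9 * 0.7*0.9 * 0.3*0.7
```

Each factor of the product corresponds to exactly one of the three terms, so the code mirrors the decomposition term by term.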

The model is assumed to be Markovian on the states, and the observations are conditionally
independent. Therefore, we have:

• P(S_t | S_{1:t-1}, O_{1:t-1}) = P(S_t | S_{t-1}); in other words, S_t is independent of anything
happening before t − 1 given S_{t-1}.

• P(S_{t-1} | S_{t:T}, O_{t:T}) = P(S_{t-1} | S_t) with T > t; the past is independent of the
future given the present.

• And for the independence of observations: P(O_t | S_{1:t}, O_{1:t-1}) = P(O_t | S_t); O_t is
conditionally independent of the measurement and state histories given S_t.

The question we want to answer in Bayesian filtering is to determine the marginal posterior
distribution, or filtering distribution, over the state S_t knowing all the observations O_0, ..., O_t; in other words,


we want to know:

P(S_{t+k} \mid O_0, \ldots, O_t) \quad \text{with } k = 0. \tag{2.3}

If k > 0, the scheme is called "prediction"; if k < 0, it is called "smoothing" [51] (see Figure 2.3).
In Bayesian filtering, the aim is to estimate the present state given past observations, while in
"prediction" we want to extrapolate future states given past observations, and in smoothing
we seek to estimate past states given observations obtained before or after the states we
want to estimate. The reader may note that these terms can be ambiguous: the estimate of a
hidden state, whether for filtering, "smoothing" or "prediction", can also be called a prediction,
since by definition we do not know the state.

Figure 2.3 – Depending on the available measurements with respect to the time of the estimated state,
state estimation problems can be divided into prediction, smoothing and filtering. Figure from [159].

This probability distribution can be decomposed in a recursive way [54, 159]. We want to set up
a recursive relationship in which the next estimate is based on the previous estimate and the


latest measurement:

\begin{aligned}
P(S_t \mid O_{1:t}) &= \frac{P(S_t, O_{1:t})}{P(O_{1:t})} && \text{(Bayes' theorem)} \\
&= \frac{P(O_1) P(O_2 \mid O_1) \cdots P(O_{t-1} \mid O_{1:t-2}) \, P(S_t \mid O_{1:t-1}) \, P(O_t \mid S_t, O_{1:t-1})}{\int P(O_{1:t}, S_t) \, dS_t} \\
&= \frac{P(O_1) P(O_2 \mid O_1) \cdots P(O_{t-1} \mid O_{1:t-2}) \, P(S_t \mid O_{1:t-1}) \, P(O_t \mid S_t, O_{1:t-1})}{P(O_1) P(O_2 \mid O_1) \cdots P(O_{t-1} \mid O_{1:t-2}) \int P(S_t \mid O_{1:t-1}) \, P(O_t \mid S_t, O_{1:t-1}) \, dS_t} \\
&= \frac{P(S_t \mid O_{1:t-1}) \, P(O_t \mid S_t)}{\int P(S_t \mid O_{1:t-1}) \, P(O_t \mid S_t) \, dS_t} && \text{(conditional independence of measurements)}
\end{aligned} \tag{2.4}

We now have the prediction term, which can be computed in the following form:

\begin{aligned}
P(S_t \mid O_{1:t-1}) &= \int P(S_t, S_{t-1} \mid O_{1:t-1}) \, dS_{t-1} \\
&= \int P(S_t \mid S_{t-1}, O_{1:t-1}) \, P(S_{t-1} \mid O_{1:t-1}) \, dS_{t-1} \\
&= \int P(S_t \mid S_{t-1}) \, P(S_{t-1} \mid O_{1:t-1}) \, dS_{t-1}
\end{aligned} \tag{2.5}

This term is also known as the Chapman-Kolmogorov equation [159]. This prediction allows
us to compute the update. Finally, we obtain the following recursive Bayesian estimation:

\begin{aligned}
\text{Prediction:} \quad & P(S_t \mid O_{1:t-1}) = \int P(S_t \mid S_{t-1}) \, P(S_{t-1} \mid O_{1:t-1}) \, dS_{t-1} \\
\text{Updating:} \quad & P(S_t \mid O_{1:t}) = \frac{P(S_t \mid O_{1:t-1}) \, P(O_t \mid S_t)}{\int P(S_t \mid O_{1:t-1}) \, P(O_t \mid S_t) \, dS_t}
\end{aligned} \tag{2.6}

These probability density functions are hard to compute, as they involve the calculation of high-
dimensional and complex integrals. Different approximation methods exist, such as sampling [54]
and variational methods [13].
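These recursions can be made concrete on a discretized state space, where the integrals of Equation (2.6) become sums over a grid; the sketch below uses arbitrary Gaussian transition and observation models purely for illustration:

```python
import numpy as np

# Recursive Bayesian estimation (Equation 2.6) on a discretized state
# space: prediction and update become matrix and pointwise operations.
grid = np.linspace(-5.0, 5.0, 201)
dx = grid[1] - grid[0]

def gauss(x, mu, sd):
    return np.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

# Transition model P(S_t | S_{t-1}): Gaussian random walk with sd = 0.5.
trans = gauss(grid[:, None], grid[None, :], 0.5)

def step(belief, obs, obs_sd=1.0):
    # Prediction: P(S_t | O_{1:t-1}) = sum_j P(S_t | s_j) P(s_j | O_{1:t-1}) dx
    pred = trans @ belief * dx
    # Update: multiply by the likelihood P(O_t | S_t) and renormalize.
    post = pred * gauss(grid, obs, obs_sd)
    return post / (post.sum() * dx)

belief = gauss(grid, 0.0, 2.0)            # prior P(S_0)
for obs in [1.0, 1.2, 0.9]:               # stream of noisy observations
    belief = step(belief, obs)
map_estimate = grid[np.argmax(belief)]    # posterior mode, pulled towards ~1
```

On a one-dimensional grid this brute-force scheme is exact up to discretization; sampling and variational methods become necessary precisely when the state dimension makes such grids intractable.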

The problem of action and perception with the predictive coding approach under the Bayesian


brain hypothesis seems to be mainly a problem of Bayesian filtering [47, 78, 142]. All the
studies presented in this thesis have been built around this notion. The reader may note
that several neural implementations of Bayesian or Kalman filtering have been developed
[48, 78, 204].

2.6 Computational motor control

2.6.1 Forward, inverse model or both ?

Forward models, i.e. models that predict the next state of a system given a motor command, are
widely used in cognitive science and computational motor neuroscience [49, 87, 100]. Inverse
models in these domains come from optimal control theory and cybernetics [38, 182]
and estimate the motor command needed in order to achieve a particular position of the
body. Different roles can be assigned to forward models in the case of motor control, and two
approaches can be distinguished: auxiliary forward models (AFM) and integral forward models
(IFM). We use the taxonomy introduced by Pickering et al. [148] for these architectures.

Figure 2.4 – Auxiliary Forward Model (AFM) architecture. The forward model receives an efference
copy to predict the consequences of actions. Figure from [148].

Auxiliary forward model

In the first case, AFM, the forward model is auxiliary and is not part of the core of perception
and action (see Figure 2.4). In this framework, an inverse model is used [53, 104, 126, 206]. It
computes a motor command, which is sent to the forward model as an efference


copy in order to assess the sensory consequences (which are compared online with the actual
outcomes for error correction and learning).

The inverse model sends motor commands to the muscles of a human arm (or to the robot),
and the forward model receives an efference copy of this motor command and predicts the
sensory data resulting from the action. Before the actual movement, the output of the forward
model (also called the corollary discharge) is computed, for example the position of the
end-effector, and an error between the resulting action and the sensory prediction
can then be computed. The forward model is independent of the plant (muscular or robotic
implementation). The inverse model and the forward model convert, respectively, intentions
into motor commands and motor commands into sensory consequences.

Integral forward model

This kind of model can be linked to predictive coding [37, 67, 123, 155]. In this case, the forward
model is also used to predict the sensory consequences of action, but in a broader
sense (see Figure 2.5). Perception itself uses the forward generative model to predict its own
sensations in a top-down way. The prediction error is used to refine the parameters of
the generative model so as to better predict the states of the world.

Figure 2.5 – Integral Forward Model (IFM) architecture. Figure from [148].

Control of action has been added to this framework [80]. Actions are also the result of our
predictions, in the sense that prediction errors are eliminated by movements. For example, if


my goal is to grasp a pen with my hand but the pen is 50 cm away from it, the prediction error
is high and will be transformed into action via spinal reflexes [80]. This view of action is
close to the ideomotor theory [97, 129], in which it is the idea of moving that triggers action. The
inverse model is not used in this framework.

One core variable of the approach is the precision (or inverse variance of the signal) [73] of
selected aspects of the sensory prediction errors. For action, the precision of the prediction
errors on the predicted movement is set high, so that they are trusted, and this high confidence
triggers action to quash them. The proprioceptive prediction errors, in other words the
desired movement, have a high degree of confidence and are eliminated by the movement
or motor actions. When the precision of the proprioceptive prediction errors is set high,
the precision of the sensory prediction errors is set low; this reflects the fact that the movement
has not yet been achieved, and can also be linked to the sensory attenuation observed in
self-produced movement [29].
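The role of precision in triggering action can be caricatured in a scalar simulation; this is a deliberately simplified sketch with arbitrary parameters, not the full free-energy scheme:

```python
# Caricatural scalar sketch of action in active inference: the agent
# "predicts" its hand at the goal, and a reflex-like update moves the
# hand to suppress the precision-weighted proprioceptive prediction error.
goal = 0.5        # predicted (desired) hand position
hand = 0.0        # actual hand position
pi_prop = 8.0     # precision assigned to the proprioceptive error
dt = 0.01

for _ in range(200):
    error = goal - hand              # proprioceptive prediction error
    hand += dt * pi_prop * error     # action descends the weighted error

# With high precision the error is quashed quickly and hand ends near the
# goal; setting pi_prop near zero would leave the hand (almost) still.
```

The same knob, pi_prop, thus switches between acting (high precision) and merely predicting (low precision), which is the property exploited below to explain action understanding.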

Precision weighting also has an interesting property: the same framework can be used to
explain action understanding. If the precision of the proprioceptive prediction errors is set low,
the agent does not engage in action but uses its forward model to predict and understand the
actions of others.

Do we need the inverse model ?

These two schemes differ in their use (or not) of the inverse model, which implies very different
views on the cognitive structure underlying motor control. In the IFM framework, there is
ultimately no distinct mechanism for predicting our own actions versus the actions of others.
By extension, this means that if the brain regions responsible for the forward model are
impaired, both motor control and motor recognition will be impaired. By contrast, in AFM,
even if the forward model is damaged, it remains possible to accomplish skilled movements
using the inverse model, but not to learn new movements or to perform online error correction
of the movement during a particular task.

Another point is that the IFM framework could lack flexibility, as the same model is
used for both action production and action prediction. However, precision weighting influenced
by the context makes it possible to use only certain aspects of the forward (generative) model,
leaving some room for adaptability and simplifications.


In IFM, the efference copy is also suppressed [148]: there are only descending predictions that
lead to action, and only the context changes between producing an action and recognizing the
action of another agent.

In the neuroscience literature, it is not totally clear whether the cerebellum learns and uses the
inverse model for action. Some approaches claim both models are used [94], others support
the view that the cerebellum learns the forward model [210], or emphasize sensorimotor
dynamics and perceptual inference [99, 175, 196, 207]. It is therefore not clear whether the
inverse model is really needed for motor control [11, 117, 158].

Optimal control theory [68, 181, 182] has been widely used in neuroscience and robotics for
motor control. In this theoretical framework, behaviors can be reduced to the optimization of
a value function, relying on forward-inverse models [53, 104, 126, 181, 206]. Nevertheless, the
usefulness of this approach for understanding motor behavior has become controversial [68].
This line of thinking claims that optimal control should perhaps be understood in
terms of inference and prior beliefs (thanks to the complete class theorem), and not in terms
of value functions. With predictive coding, and more particularly action-oriented predictive
coding (or active inference), there is no need for the inverse model. We go deeper into
this approach in the next section.

2.6.2 Optimal control as an inference problem

Optimal control

Optimal control is widely used in the robotics community. Briefly, it corresponds to minimizing
some cost function in order to compute robot commands for a particular motor task. This
scheme also assumes that the optimality equation can be solved [15]. This kind of equation
turns out to be computationally difficult to solve and often requires approximate solutions,
ranging from backward induction to dynamic programming or reinforcement learning [174].
This scheme is formally equivalent to the AFM framework seen above [148]. The key
components of the conventional optimal control approach [53, 126, 181, 182] are an inverse
model (or optimal controller), a forward model and a state estimator. The movement we intend
to achieve is specified by command signals, computed by the inverse model, that minimize
some cost function. The hidden states of the motor plan are estimated using sensory signals,
and the predicted changes are optimized using updates coming from the sensory prediction
errors. The outputs of the forward model are


the predicted changes, based on state estimates and optimal control signals. The controller
sends an efference copy to the forward model in order to refine state estimation, combining
noisy sensory prediction errors with predictions to compute Bayes-optimal state estimates.
This conventional scheme rests upon distinct forward and inverse models, which both need to
be learned. Forward model learning is part of sensorimotor learning and is generally Bayes-
optimal, while learning the inverse model needs some form of dynamic programming or
reinforcement learning and assumes that movements can be specified in terms of a cost
function [68].

Optimal control, cost functions and priors

Recent developments in motor control theory highlight the role of sensorimotor dynamics
and perceptual inference over conventional optimal control based on the inverse and forward
models [175, 176, 196, 207]. With predictive coding, the inverse model is not used: prior beliefs
about the movement (in an extrinsic frame) replace optimal control signals for movements
(in an intrinsic frame). In active inference [80] (or action-oriented predictive coding), there is
no inverse model or cost function, and the resulting trajectories are Bayes-optimal; at the
opposite end, optimal control relies on the inverse model, which provides control information
about optimal trajectories according to the cost function.

Optimal control can be seen as an inference problem. The idea is to replace cost functions with
a random variable conditioned on a desired observation. This establishes an equivalence
between minimizing cost and maximizing the likelihood of desired observations [40, 68, 141, 164].
Recent studies have addressed this inference problem, for example by computing optimal
policies through likelihood maximization [186] or by using variational approaches
for optimal decision-making problems in Markov decision processes [26, 184].
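This equivalence can be stated compactly: defining the likelihood of the desired observation as a Boltzmann distribution over the cost (a standard construction), minimizing cost and maximizing likelihood coincide:

```latex
% Define the likelihood of a desired observation O* as a Boltzmann
% distribution over the cost \ell(a) of an action or policy a:
P(O^{\ast} \mid a) \;\propto\; \exp\{-\ell(a)\}
\qquad\Longrightarrow\qquad
\operatorname*{arg\,max}_{a} \, P(O^{\ast} \mid a)
  \;=\; \operatorname*{arg\,min}_{a} \, \ell(a)
```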

In action-oriented predictive processing, optimal control can be seen as a special case. This
statement is based on the fact that some policies cannot be specified as a cost function but
can be described as a prior. This result comes from variational calculus: a trajectory or a
policy has several components, a curl-free component that changes value and a divergence-
free component that does not change value. Therefore, divergence-free motion can only be
specified by a prior, not by a cost function. A policy or motion that is curl-free can be expressed
as the gradient of a Lyapunov or value function [7], but only prior beliefs


can express divergence-free motion, like walking or atmospheric circulation. This kind of
movement is called solenoidal: the valuation is equal for every part of the trajectory, so it
cannot be cast into a cost function [68, 69]. For an in-depth (mathematical) treatment of this
notion, the reader can refer to [69].

In addition, a correspondence between cost functions and prior beliefs can be established with
the complete class theorem [30, 157], which states that any Bayes-optimal behavior can be
described by at least one pair of prior beliefs and cost function. Prior beliefs about state
transitions replace cost functions: action fulfills prior beliefs, and the agent believes it
minimizes future costs as it moves through state-space.

The inverse problem is resolved in active inference and predictive coding by replacing the
optimal control signals that specify muscle or joint movements with prior beliefs about the
desired trajectories. Nevertheless, we have to note that the computational complexity of a
problem is not reduced when we formulate it as an inference problem [125].

For a more detailed discussion of the links between optimal control and active inference, see
[1, 68, 75].

2.7 Conclusion

In this review of the field, we presented Bayesian modeling and motivated its choice for
this thesis, more particularly for its mathematical treatment of uncertainty, a characteristic
of any human environment in which perception and action take place. We reviewed the
usefulness of Bayesian modeling for perception and action, and presented predictive coding
together with one of its formalisms, close to Bayesian filtering. We saw that the Bayesian
modeling framework is particularly suited to modeling the brain, perception, action and
cognition in general, and that with predictive coding (and one of its implementations, the
free-energy principle and its corollary, active inference) it is possible to integrate optimal
control into the framework.

In this thesis, we developed several methods for manipulation robotics and navigation which
have been conceptualized around this notion of predictive coding, without relying on the
inverse model; this approach runs through the different articles presented in Chapters 3 and 4.
We developed these methods with the aim of avoiding the inverse problem, which makes the
computation of the inverse kinematics in robotics very difficult when the number of DoFs


is high. We also saw that optimal control can be cast in terms of Bayesian inference, and more
particularly active inference [80], an approach we will use in Chapter 5. When pertinent,
we will discuss our results by comparing them with human data.

3 Hierarchical Bayesian filtering for
reaching and grasping

As far as the laws of mathematics refer to reality, they are not certain;
and as far as they are certain, they do not refer to reality.

Albert Einstein

3.1 Introduction

The introduction of robots into human environments is growing. These robots need to navigate
and interact with objects and humans in a safe way; in other words, they have to act
in uncontrolled and uncertain environments, far away from the structured environment of a
manufacturing plant. There is thus a clear need for more efficient algorithms, given the
growing introduction of humanoid robots into uncontrolled human environments. Several
alternative approaches have been used in the literature to solve the IK problem, such
as fuzzy logic [106, 90], artificial neural networks [22, 88, 132], evolutionary algorithms [102]
and probabilistic modeling [41]. The method presented in this chapter belongs to the latter
category.

In manipulation robotics, reaching a target or an object in the operational space is fundamental.
This problem requires specifying a position and an orientation and determining the
joint parameters allowing the arm to reach this target. The problem is known as inverse
kinematics (IK): it consists in finding the joint parameters such that the desired position of the
end-effector is attained. Several approaches exist, and the most classical ones, as reviewed
in [111], rely on algebraic [57, 81], iterative [115] and geometric [59, 122] methods.

For human adults, reaching a target feels like an easy task, but nevertheless involves many


degrees of freedom and complex trajectories in the joint space. Any healthy human can reach
and grasp thousands of objects every day without any problem. But how do humans perform
this task? Several authors have shown that humans could use internal models for trajectory
planning [104]. They could plan their movement trajectory in the perceived visual space
[64], and after this first step they would follow this simulated trajectory by controlling their arm
and hand. Other authors have shown good similarities between reaching movements and
Bayesian filtering [34]. Following these authors, we developed a model of this visual internal
simulation and control of the effector using Bayesian filtering. More precisely, we applied
Bayesian filtering in two spaces: the visual space, for internal simulation, and the motor space,
for the control of the manipulator during actual interactions with the environment, resulting
in a dual Bayesian filtering scheme for reaching and manipulation.

To fully understand human movement generation, as well as to develop efficient algorithms
for humanoid or dexterous manipulation robots (which share the same constraints), overcoming
the limits and drawbacks of classical control approaches is required. Without resorting to
inverse kinematics (which may be excessively time-consuming or lead to rough approximations),
we want to show that it is possible to control high-dimensional systems by simulating and
predicting the outcome of local actions (forward model only), as long as the problem complexity
is broken down into both the visual and motor spaces. IK methods reach their limits when
the robotic system has a high number of degrees of freedom (DoFs), generally more than 6,
together with uncertainty and constraints (such as obstacle avoidance). In this case, the
complexity of the inverse kinematics computation increases, rapidly becoming time-consuming
and computationally expensive.

Our approach also includes a bio-inspired perspective. Indeed, for healthy humans, reaching
and grasping objects are easy tasks. Several authors have shown that motor control and action
selection could be Bayesian, and obtained good results using this modeling approach [113, 114, 185];
they highlight the importance of uncertainty in motor control. Other authors have also
shown good similarities between Kalman filtering and human reaching movements [34],
and a possible neural implementation of this Bayesian filter in the brain can be found
in [48]. To our knowledge, Bayesian filtering for inverse kinematics has never been applied to
manipulation robotics in a context of control of the robotic system, although the method has
recently been explored in the field of animation [41].

We applied (approximate) Bayesian filtering to reaching and grasping on simulated robots. We
used particle filtering as an approximation of Bayesian inference. Particle filtering is a Sequential
Monte Carlo (SMC) method for approximate Bayesian computation (ABC). We used the
sequential importance sampling method for the estimation of the particles, which allows us to
estimate the importance weights recursively in time.

We can list four advantages of this approach:

• Uncertainty is included in the model.

• A high number of DoFs can be controlled, thanks to the low computational cost of
approximate Bayesian filtering.

• The inverse model is not needed to solve the inverse kinematics problem [41];
only the forward model is required.

• The whole approach can be easily parallelized.

We rely on a bio-plausible and probabilistic method for generating reaching movements in
complex settings. Specifically, we apply (approximate) Bayesian filtering in the visual and
motor spaces. Visual filtering defines an initial rough trajectory in the operational
space (avoiding obstacles), which is then refined by motor filtering, allowing direct control in
the joint space while respecting joint limits.
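The spirit of the approach can be illustrated on a toy 2-DoF planar arm (a minimal sketch with arbitrary parameters, not the implementation validated in the article): a particle filter over joint configurations uses only the forward model, with the target playing the role of the observation, so no inverse kinematics is ever computed.

```python
import numpy as np

# Toy 2-DoF planar arm reaching a target via particle filtering with a
# forward model only (illustrative parameters and noise levels).
L1, L2 = 1.0, 1.0                      # link lengths
target = np.array([1.2, 0.8])          # reachable: |target| < L1 + L2
rng = np.random.default_rng(1)

def forward(q):
    """Forward model: joint angles (n, 2) -> end-effector positions (n, 2)."""
    x = L1 * np.cos(q[:, 0]) + L2 * np.cos(q[:, 0] + q[:, 1])
    y = L1 * np.sin(q[:, 0]) + L2 * np.sin(q[:, 0] + q[:, 1])
    return np.stack([x, y], axis=1)

n = 500
particles = np.zeros((n, 2))           # all particles start at q = (0, 0)
for _ in range(60):
    # Transition model: small random joint motions around each particle.
    candidates = particles + rng.normal(0.0, 0.05, size=(n, 2))
    # Observation model: likelihood of "observing" the target at the
    # predicted end-effector pose (Gaussian around the target).
    dist = np.linalg.norm(forward(candidates) - target, axis=1)
    w = np.exp(-0.5 * (dist / 0.1) ** 2)
    w /= w.sum()
    # Sequential importance resampling.
    particles = candidates[rng.choice(n, size=n, p=w)]

error = np.linalg.norm(forward(particles).mean(axis=0) - target)
```

Because successive configurations differ only by small joint motions, the resampled particle cloud also traces out a feasible trajectory in joint space, which is the property the dual filtering scheme exploits.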

The method was validated in simulation on a set of scenarios, with one or several targets
(e.g. fingers on an object for grasping) to be reached with one or several arms/hands. With
strong spatial constraints (e.g. obstacles), it succeeds in finding a trajectory where inverse
kinematics methods fail. Relying on the visual space seems promising for reducing the complexity
of movement generation in the motor space. Bayesian filtering finds hidden variables
given observables over time. This makes it possible to find a trajectory that avoids obstacles,
although this trajectory is not optimal. Of course, a trajectory defined in the visual space
may sometimes not respect the joint limits of the robotic system, but the method is robust.
Bayesian filtering in the visual space is not a global planning method; Bayesian filtering is
rather a local planning method, but since it is applied in the visual space, a trajectory is
almost always found. This predictive coding approach permits controlling a robot with a high
number of DoFs (up to 51) without jeopardizing the real-time character of the algorithm.


3.2 Article

Cogn Process (2015) 16 (Suppl 1):S293–S297
DOI 10.1007/s10339-015-0710-0

SHORT REPORT

Dual filtering in operational and joint spaces for reaching and grasping

Léo Lopez · Jean-Charles Quinton · Youcef Mezouar

Published online: 1 August 2015


© Marta Olivetti Belardinelli and Springer-Verlag Berlin Heidelberg 2015

Abstract To study human movement generation, as well as to develop efficient control algorithms for humanoid or dexterous manipulation robots, overcoming the limits and drawbacks of inverse-kinematics-based methods is needed. Adequate methods must deal with high dimensionality, uncertainty, and must perform in real time (constraints shared by robots and humans). This paper introduces a Bayesian filtering method, hierarchically applied in the operational and joint spaces to break down the complexity of the problem. The method is validated in simulation on a robotic arm in a cluttered environment, with up to 51 degrees of freedom.

Keywords Bayesian filtering · Reaching · Operational space · Joint space · Grasping

Introduction

A clear trend to include robots in our everyday environments is emerging. These robots need to find their way and interact with objects and humans in uncontrolled, uncertain and often cluttered environments, way different from the structured environment of a manufacturing plant. In manipulation robotics, using the arm to reach a target in the operational space or placing fingers on objects is fundamental. This task generally requires to specify a desired end-effector pose (position and orientation) and to determine the joint parameters allowing the robot to reach it, a problem also known as inverse kinematics (IK). Several classical approaches exist to solve this problem, which rely on algebraic (Fu et al. 1987), iterative (Korein and Badler 1982) and geometric methods (Lee 1982).

Nevertheless, for healthy human adults, reaching a target feels like an easy task, yet involves many degrees of freedom (DoFs) and the online generation of complex trajectories in the joint space. To fully understand human movement generation, as well as to develop efficient algorithms for humanoid or dexterous manipulation robots (that share the same constraints), overcoming the limits and drawbacks of classical control approaches is required. Without resorting to inverse kinematics (that may be excessively time-consuming or lead to rough approximations), we want to show that it is possible to control high-dimensional systems by simulating and predicting the outcome of local actions (forward model only), as long as the problem complexity is broken down into smaller subspaces. We here focus on the decomposition of the global sensorimotor problem of reaching one or several targets with one or several effectors in the operational and joint spaces (or visual and motor spaces in cognitive science). We rely on a probabilistic method for generating reaching movements in complex settings. Specifically, we apply (approximate) Bayesian filtering successively in the operational and joint spaces. Operational filtering permits to define an initial rough trajectory in the operational space (avoiding obstacles), which is then refined by joint filtering, allowing direct control in the joint space while enforcing joint limits. Additionally, this method is biologically plausible, as such filtering has been proposed to be implemented in the brain (Deneve et al. 2007).

✉ Léo Lopez — leo.pio.lopez@gmail.com
1 Pascal Institute, Clermont University, Clermont-Ferrand, France
2 Pascal Institute, CNRS (UMR 6602), Aubiere, France
3 Pascal Institute, IFMA, Aubiere, France


The method was validated in simulation on a set of scenarios. With strong spatial constraints (e.g., obstacles), the method succeeds in finding a trajectory where inverse kinematics methods fail. Relying on the operational space thus seems promising to reduce the complexity of movement generation in the joint space.

Model

Several problems in science and engineering involve estimating some (hidden) states given observables or noisy measurements. This problem can be solved using Bayesian filtering methods (Doucet et al. 2001), which we adopt in this paper and apply to robotics. Two classes of variables are thus involved: S_0, …, S_t, which are the state variables considered on a time horizon ranging from 0 to t, and O_0, …, O_t, the time series of observation variables on the same horizon. The decomposition of the joint probability of this class of model is:

P(S_0, …, S_t, O_0, …, O_t) = P(S_0) P(O_0 | S_0) \prod_{i=1}^{t} P(S_i | S_{i-1}) P(O_i | S_i)    (1)

which can be understood by considering the three following right-hand terms:
– P(S_0) is a prior on the state at time t = 0.
– P(S_t | S_{t-1}) is the "evolution model," which corresponds to the knowledge concerning the possible transitions between two time steps.
– P(O_t | S_t) is the "observation model," which corresponds to the information we can observe and thus exploit whether the state is S_t.

The question to answer with Bayesian filtering is how to determine the probability distribution over the states at time t knowing the sequence of observations, i.e., P(S_t | O_0, …, O_t). We can decompose this probability distribution in a recursive way (Doucet et al. 2001), using the predict (Eq. 2) and update (Eq. 3) steps.

P(S_t | O_{0:t-1}) = \int P(S_t | S_{t-1}) P(S_{t-1} | O_{0:t-1}) dS_{t-1}    (2)

P(S_t | O_{0:t}) = P(O_t | S_t) P(S_t | O_{0:t-1}) / \int P(O_t | S_t) P(S_t | O_{0:t-1}) dS_t    (3)

These probability density functions are hard to compute as they imply the calculation of high-dimensional and complex integrals. We need approximation in order to use Bayesian filtering in real time. The approximation we have chosen is particle filtering, which is an instance of Sequential Monte Carlo (SMC) methods (Doucet et al. 2001).

Since this is not sufficient to make this method directly applicable for online movement planning on systems with a large number of DoFs, we apply Bayesian filtering in a two-step procedure, successively in the operational space and joint space. The first step is thus about generating a rough 3D trajectory, similar to what humans can imagine and internally simulate when they want to catch an object. The resulting trajectory is then used as a guiding thread to follow during step 2, where Bayesian filtering is applied in the joint space. This 3D trajectory permits to define a set of sub-targets to reach for the arm before attaining the final target. The following subsections describe the evolution models and observation models used in the operational and joint spaces.

Evolution model in the operational space

In the operational space, the state variables are the position (in 3D world coordinates) and the observation variables correspond to a set of constraints. In the examples provided in the results, the position will correspond to the center of the hand. The evolution prior describes the nature of the movement, and we chose here a basic random walk (see Eq. 4) instead of a more complex model-based prior. Nevertheless, the prior could be learned or improved, for instance relying on human demonstration (Argall et al. 2009). In any case, we want to simulate a trajectory between the actual position of the end effector of the robot and a target in the operational space while coarsely avoiding obstacles.

P(S_t | S_{t-1}) = randomwalk = \mathcal{N}(S_t - S_{t-1}; 0, \Sigma_S)    (4)

A Gaussian noise model controls the actual displacement in operational space. We deliberately used a Gaussian noise model in order to find trajectories in complex environments in an online manner even if we have to sacrifice the optimality of the trajectory. The covariance matrix \Sigma_S does not enforce the limits of the joints of the robot.

Evolution model in the joint space

In the joint configuration space and for the proof-of-concept purpose of the present paper, the state variables again correspond to the joint configuration of the center of the hand. The evolution model has also been chosen in order to find a trajectory avoiding objects in a coarse manner. The equation is therefore the same as in the operational space (see Eq. 4).

A Gaussian noise model also controls the displacements for the simulation of the trajectory. However, we used this


time a fine-tuning method for this evolution model. The covariance matrix \Sigma_S enforces the limits of each joint. The sampling of particles is performed following this model, and if the kinematic properties of the joints are not enforced for any of the S_t due to the random walk, we assigned the variables of interest of the corresponding joint configuration to its lower or upper limit, depending on the limit that has been reached. This approach avoids the case where the algorithm enters an endless rejection of values for the joint configuration, because the generation of particles is confined to a physically unreachable part of the space. Once again, as it corresponds to a low-level body schema representation of the system (Sturm et al. 2009), this model could be learned (e.g., via human demonstration or motor babbling), aiming for more naturalistic movements and to limit the useless rejection of particles.

Observation model in the operational space

The observation variables in this case correspond to several constraints the robot has to enforce. The first constraint is to avoid obstacles to guarantee its own integrity, and the second one is to generate a movement as efficient as possible toward the target. In this paper, efficiency is simply defined as the minimization of the distance required to approach the target. In a probabilistic way, this can be written as:

P(O_t | S_t) \propto \exp(-|X_{endeffector} - X_{target}|), or 0 if obstacles    (5)

This formulation of the observation model is really flexible, as we can add as many constraints as we want. We can also associate weights to the constraints in order to implement a prioritization of the constraints.

Observation model in the joint space

The constraints are the same as for the simulated trajectory in the operational space, but in order to demonstrate how the hand and fingers should be preferably used only when the target/object has been approached, we weight the constraints as follows:

P(O_t | S_t) \propto \exp(-2 |D_1|) \cdot \prod_{i=2}^{n} \exp(-|D_i|), or 0 if obstacles    (6)

where D_1 is the distance from the hand to the target and D_i, with i = 2, …, n, are the distances from the fingertips to the target. The exponential function allows us to define a probability density of the distances from the targets. The weighted combination of these constraints allows the algorithm to focus first on the reaching of the target object and in a second time to grasp it, by setting the weighting parameter.

As many targets as we want can be added with this algorithm, by simply adapting the considered end effectors and associated distances. Although any heuristics could be used, a direct model of the robot allows the correct computation of the distances from the arm and fingers to the target. We have to precise that for the following tasks the targets to reach are the sub-targets defined after the filtering in the operational space. A sub-target is considered reached when the distance between the end effector and the sub-target decreases below some error threshold, as the filtering in the operational space does not enforce the joint limits. The sub-target the closest from the final target is then selected until the arm reaches the final one (Fig. 1).

Fig. 1 Description of the method. The first step is the initialization of all the variables. The second step consists in the filtering in the operational space. This permits to define a first trajectory to follow for control, like a guiding thread. The last step is the filtering in the joint space, which is used for controlling the robotic system

Results

We applied our algorithm on several examples in order to show the advantages of dividing the problem into the operational and joint spaces. All the simulations have been developed with OpenRave (Diankov and Kuffner 2008).


We start by demonstrating in the first example why the coupled Bayesian filtering is more efficient than Bayesian filtering in the joint space alone. We then turn to a second example demonstrating the scalability of the algorithm, by applying it on a 51 DoFs robotic system. In both cases, our method allows the online generation of reaching movements.

Cluttered environment

We demonstrate here that the hierarchical decomposition of the problem, and thus the application of dual Bayesian filtering, is clearly helpful to find solutions in cluttered environments. We defined a hard-to-reach target, located under a large obstacle in a simulated human environment, while the manipulator has to simultaneously deal with several other obstacles and its own joint limits in order to reach it (Fig. 2).

Fig. 2 Left: initial pose of the arm and target. Right: other orientation of the scene. The target for reaching is in orange and is located under the bench

On 50 trials, applying the algorithm directly and only in the joint space leads to 0 % of reaching successes. The manipulator always converges to a local extremum above the main obstacle and cannot escape it, due to a lack of global planning (Fig. 3).

Fig. 3 50 trajectories with Bayesian filtering applied into the joint space only. The arm always hits the bench

Although the same reason explains why the application of dual Bayesian filtering only reaches a success rate of 30 %, the definition of the guiding thread in operational space significantly improves performance, even when it cannot be followed closely (Fig. 4).

Fig. 4 50 trajectories with Bayesian filtering applied into the operational space first and then into the joint space

Redundant arm

On an ultra-redundant system, standard inverse kinematics methods can have problems to compute a solution in real time. Our method can be applied on such a robot with a high number of DoFs without a significant increase in computational cost. Indeed, the computational complexity of our algorithm only depends on the number of particles (Doucet et al. 2001), but would nevertheless require an exponential increase in the number of particles if filtering was applied in the joint space only, because of the curse of dimensionality. Nevertheless, filtering in the operational space (always 3D) again acts as a scaffolding step, able to reduce the complexity of the search in the high-dimensional space. Using the previous setup, except for a custom WAM arm with 51 DoFs, we obtained 64 % of reaching success (Fig. 5).

Conclusion and perspectives

We presented in this paper a biologically plausible Bayesian filtering method for reaching and grasping. We applied it on two proof-of-concept simulated setups to show its advantages as an alternative to inverse kinematics. Our results demonstrate that under some conditions, the problem of reaching can be simplified by breaking it down into two spaces, applying Bayesian filtering successively in the


operational space and then joint space. This hierarchical decomposition generates a reference trajectory in a low-dimensional space, later exploited to constrain the exploration of the high-dimensional space. This method does not require an inverse model, a high number of DoFs can be controlled, and it can be easily extended to deal with a large number of targets and effectors. Additionally, the method performs in real time, and computation time could be easily reduced through parallelization (natural for a particle filter approximation).

Several extensions are possible, including its application on more complex robots with several kinematic chains. Also, basic distance-to-target functions are used for the observation models, but other functions could be used in conjunction with the distance in order, for instance, to minimize energy consumption. An arbitrary random walk is finally used for the evolution model, but these models could be learned, improving the sampling of particles, allowing the generation of more human-like movements and further testing the biological plausibility of such a method.

Fig. 5 Trajectories with the 51 DoFs arm

References

Argall BD, Chernova S, Veloso M, Browning B (2009) A survey of robot learning from demonstration. Robot Auton Syst 57(5):469–483
Deneve S, Duhamel J-R, Pouget A (2007) Optimal sensorimotor integration in recurrent cortical networks: a neural implementation of Kalman filters. J Neurosci 27(21):5744–5756
Diankov R, Kuffner J (2008) OpenRAVE: a planning architecture for autonomous robotics. Robotics Institute, Pittsburgh, PA, Technical report CMU-RI-TR-08-34
Doucet A, De Freitas N, Gordon N (2001) An introduction to sequential Monte Carlo methods. In: Sequential Monte Carlo methods in practice. Springer, New York, pp 3–14
Fu KS, Gonzalez R, Lee CG (1987) Robotics: control, sensing, vision and intelligence. McGraw-Hill, New York
Korein JU, Badler NI (1982) Techniques for generating the goal-directed motion of articulated structures. IEEE Comput Graph Appl 2(9):71–81
Lee CG (1982) Robot arm kinematics, dynamics, and control. Computer 15(12):62–80
Sturm J, Plagemann C, Burgard W (2009) Body schema learning for robotic manipulators from visual self-perception. J Physiol Paris 103(3):220–231

3.3 Other results

3.3.1 Reaching several targets on an object

We can add as many targets as we want as new constraints. This is interesting for grasping
objects, when we have to define several targets on the object we want the robot to grab [92].

In this case, the observation model also computes the distances from the fingertips to the
targets and we removed the computation of the distance from the end-effector to the global
target.

Figure 3.1 – Left: initial pose of the arm and targets on the object. Right: Result of the algorithm with
the two targets.

For this illustration of the utility of our algorithm for manipulation, we did not use any pre-grasp
planner. We defined only two targets, associated with the appropriate fingers. On the right, the
target is shared by two fingers of the hand.

We can observe that the targets are reached and the algorithm can handle several targets. In
association with a grasp planner, our method could be useful for manipulation of objects and
particularly dexterous manipulation where a high number of DoFs have to be controlled.
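The weighted product of distance constraints used here (cf. Eq. 6 of the article) can be sketched as follows. The function name is illustrative, and the choice of which fingertip/target pair receives the doubled weight is an arbitrary prioritization for the example:

```python
import numpy as np

def multi_target_likelihood(fingertips, targets, collision=False):
    """Unnormalized observation weight for several fingertip/target pairs,
    mirroring the weighted product of exponential distance constraints:
    the first pair gets a doubled weight so that it dominates the others."""
    if collision:
        return 0.0  # zero weight on obstacle collision
    dists = [np.linalg.norm(f - t) for f, t in zip(fingertips, targets)]
    w = np.exp(-2.0 * dists[0])      # prioritized constraint
    for d in dists[1:]:
        w *= np.exp(-d)              # remaining fingertip constraints
    return float(w)

# A configuration whose fingertips sit exactly on the targets scores highest.
targets = [np.array([0.10, 0.00, 0.30]), np.array([0.12, 0.02, 0.30])]
on_target = multi_target_likelihood(targets, targets)        # -> 1.0
off_target = multi_target_likelihood([t + 0.05 for t in targets], targets)
```

Particles whose configurations bring all fingertips close to their respective targets therefore receive the highest weights during the update step.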

3.3.2 Dual arm control

This method is not computationally intensive, and we can control many more DoFs. The
illustration has been done on a dual-arm robotic system composed of two Barrett WAM arms and
two Barrett hands, for a total of 22 DoFs. On the whole, we consider the two arms as a single arm


for the computation of the evolution model. The constraints correspond to the previous cases
when the Bayes filter is applied in the joint space except we add constraints for another arm.
The rest of the algorithm is identical to the one-arm case. We can also define several targets to
reach.

Figure 3.2 – 10 trajectories with Bayesian filtering applied into the visual space first and then into the
motor space on a dual arm/hand robotic system.

On 10 trials, we obtained 100% of successes in reaching the target (see Figure 3.2). As in the
case with one arm and one hand, the variance of the trajectories is pretty large, due to our
coarse-grained definition of the evolution model.
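Treating the two arms as a single stacked state vector can be illustrated as below. The joint-limit values are invented for the example (a 7-DoF WAM arm plus a 4-DoF Barrett hand per side gives the 22 DoFs), and clipping to the limits, rather than rejecting samples, follows the joint-space evolution model described earlier:

```python
import numpy as np

# Illustrative joint limits for two arm/hand chains stacked into one state
# (7 DoFs per WAM arm + 4 per Barrett hand -> 22 DoFs; values are made up).
LOW = np.full(22, -2.6)
HIGH = np.full(22, 2.6)

# Shared generator (created once) keeps the sketch deterministic.
_rng = np.random.default_rng(1)

def sample_joint_particle(q, sigma=0.05):
    """Random-walk proposal over the stacked 22-DoF configuration,
    clipping each joint to its limit instead of rejecting the sample."""
    q_new = q + _rng.normal(0.0, sigma, q.shape)
    return np.clip(q_new, LOW, HIGH)

q = np.zeros(22)
q = sample_joint_particle(q)
```

The rest of the filter (weighting and resampling) is unchanged; only the dimensionality of the state and the set of distance constraints grows.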

3.4 Other perspectives

We presented in this report a bio-inspired Bayesian model for reaching and grasping. We
applied it on several examples to show its advantages as a good alternative to IK. Indeed, this
method does not need to know the inverse model, it is parallelizable, and a high number of
DoFs can be controlled. Nevertheless, several improvements could be made.


3.4.1 Learning

Our evolution model is simple: it corresponds to a random walk. An interesting perspective
for this work will be to learn this evolution model P(S_t | S_{t-1}). Indeed, the evolution
model integrates the nature of the movement. Learning this model via human demonstration
could be an interesting line of research for the future [8] [9]. We could obtain more natural
movements, and this approach could avoid reaching the limits of the articulations too rapidly,
as can happen during our simulations. This could also enhance the results when reaching targets
in difficult cases, as it would avoid awkward configurations of the articulations.
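One minimal way to learn such an evolution model from demonstrations is to fit the mean and covariance of the demonstrated per-step displacements, yielding a Gaussian transition prior. This is only a sketch of the idea under toy data, not a full learning-from-demonstration pipeline:

```python
import numpy as np

def fit_evolution_model(demonstrations):
    """Estimate a Gaussian transition model P(S_t | S_{t-1}) from
    demonstrated trajectories: the mean step (drift) and the covariance
    of the per-step displacements. Each trajectory has shape (T, d)."""
    deltas = np.vstack([np.diff(traj, axis=0) for traj in demonstrations])
    mu = deltas.mean(axis=0)             # average displacement (drift)
    cov = np.cov(deltas, rowvar=False)   # learned noise covariance
    return mu, cov

# Toy demonstrations: straight-line reaches along x with small jitter.
rng = np.random.default_rng(2)
demos = [np.cumsum(rng.normal([0.01, 0.0, 0.0], 0.001, (50, 3)), axis=0)
         for _ in range(5)]
mu, cov = fit_evolution_model(demos)
```

Sampling particle displacements from N(mu, cov) instead of a zero-mean random walk would bias the filter toward demonstrated, more human-like steps.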

Our method, however, needs to know the forward model. This model could also be learned and
refined during each action. This kind of life-long learning could have the advantage that, in
case of failure or mechanical breakdown of the robot, the forward model would be adjusted. This
could permit dealing with the wear of the robotic system and would result in a more adaptive
method.

3.4.2 Dexterous manipulation

Our method can control robots with many DoFs. A good application would be to test it on
a humanoid robot in a case of dexterous manipulation. Dexterous manipulation allows
finer movements and manipulation of objects and may be particularly interesting for industry
or service robotics.

A dexterous hand can have more than 20 DoFs (24 with the wrist on the Shadow hand [189]).
Such a large number of DoFs is difficult to control, and it will be interesting to test our method
on such a case. This will necessitate using a grasp planner or a pre-grasp [25] [103] in order to
determine all the targets on the object.

3.4.3 Robotic experimentation

We showed a proof of concept of the predictive coding approach in simulated robotics. In order
to clearly validate this approach, we need to implement it on a real robot to show its usefulness
for concrete applications. This will probably be our next line of research.


3.4.4 Comparison with classical methods

Clearly, in order to show the real interest of this method, a comparison with a state-of-the-art
method in inverse kinematics or global planning, such as a stack-of-tasks IK solver for
reaching, will be needed.

4 Recursive Bayesian estimation for
navigation

There are things known and there are things unknown,


and in between are the doors of perception.

Aldous Huxley

4.1 Introduction

To accomplish a specific task, a robot needs to know its own position with precision. In
the areas of agriculture, autonomous cars, service and emergency robotics, etc., this step is
key to the successful accomplishment of the task. Robots navigating in new environments also
need to compute their position in real time in order to interact with a continuously changing
environment.

In this chapter, we introduce a new method of visual Simultaneous Localization And Map-
ping (SLAM), grounded on a saliency-based detector and descriptor, and without specific
assumptions about the type of camera or robot motion. The environment is represented by
a topological map where each node corresponds to a frame observed during learning. We
do not add the frames directly into the memory but only the features, or in other words only
the descriptive vectors of each salient point (see Figure 4.1). A frame is thus known to the system
as an assembly of features. We store the features in a visual memory: it contains only the
features observed during learning and, for each feature, the set of frames in which it was
observed. This allows us to progressively refine our estimation of
the location (the associated frames in memory) once we have observed a sequence of features. In


order to compare two features, we compute a similarity measure based on Euclidean distance.
We finally use Bayesian filtering to integrate the feature observations over time and compute
a frame histogram. The algorithm can easily switch between localization (no prior) and
navigation (trajectory prediction) while using an adaptive number of features, allowing the
remaining processing time to be exploited for refining or exploration. Loop-closure detection,
which is the ability to recognize previously visited places, is finally validated using GPS data
with centimetric precision.

For each feature detected, similar features are searched in the visual memory. The visual
memory is a kd-tree [16] which contains all features detected during the learning phase. Then,
we update the frame histogram using the similarity between the observed feature and the
matched features. The algorithm iterates until it satisfies a criterion or there is no time left.
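A simplified, voting-style version of this histogram update might look as follows. Brute-force matching stands in for the kd-tree of the actual system, and the Gaussian similarity and the k = 3 neighbors are illustrative choices:

```python
import numpy as np

def update_frame_histogram(hist, feature, memory_feats, feat_to_frames,
                           sigma=0.5):
    """One filtering update of the frame histogram: match the observed
    descriptor against the visual memory (brute force here, a kd-tree in
    the actual system) and add similarity-weighted votes to every frame
    associated with each matched feature, then renormalize the belief."""
    dists = np.linalg.norm(memory_feats - feature, axis=1)
    sims = np.exp(-dists**2 / (2 * sigma**2))   # Euclidean similarity
    for i in np.argsort(dists)[:3]:             # k nearest memorized features
        for frame in feat_to_frames[int(i)]:
            hist[frame] += sims[i]
    return hist / hist.sum()

# Toy memory: 4 descriptors, each associated with one or two frames.
memory = np.array([[0., 0.], [1., 0.], [0., 1.], [5., 5.]])
feat_to_frames = {0: [0], 1: [0, 1], 2: [1], 3: [2]}
hist = np.ones(3)                               # uniform prior over 3 frames
hist = update_frame_histogram(hist, np.array([0.1, 0.0]),
                              memory, feat_to_frames)
```

After observing a feature close to those memorized for frame 0, the belief concentrates on that frame; iterating over a sequence of features sharpens the histogram further.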

Figure 4.1 – Example of salient points detected by the algorithm on two frames.

We applied to navigation a scheme of Bayesian filtering similar to the one used previously for
manipulation robotics. The hidden state here is the frame, but the problem is the same and
consists in integrating information over time to localize the moving agent via a novel attentional
feature detector. The key idea is to store the features directly in a visual memory, rather than
the frames (see Figure 4.2). The features can be associated with a set of frames. The approach is
close to [45] and [44]. At the neuroscientific level, the idea is to develop a model close to the
functioning of the hippocampus, in which place cells are sensitive to a set of features whose
evidence is integrated over time, as in Bayesian filtering.


Figure 4.2 – Representation of the data architecture between the observed features F, the memorized
features f_i, i ∈ {1, …, K}, and the links between their associated data.

The general structure is therefore very similar to the one used in manipulation robotics (except
there is no second layer of filtering). We applied the same scheme with no assumptions on
the characteristics of the noise found in the system. The approach can be parallelized using a
neural assembly code and, with an appropriate similarity measure, leads to a good reduction in
the number of features stored in visual memory without losing much precision.

Finally, manipulation robotics and navigation with a sensorimotor system exhibit the same
type of problem: the resolution of an inverse problem in perception (where am I on this map?)
and in action (what is the optimal action given my constraints?). Only the probabilistic
variables change; the general scheme of predictive coding is used for these two tasks in the
same way.


4.2 Article

Salient feature based localization
using recursive Bayesian estimation
Léo Pio-Lopez∗† , Merwan Birem ∗† , Jean-Charles Quinton∗† , François Berry∗† , Youcef Mezouar∗‡
∗ Institut Pascal (UMR 6602 CNRS / UBP / IFMA)
‡ Institut Français de Mécanique Avancée
† Université Blaise Pascal

Campus des Cézeaux, 24 Avenue des Landais, BP 80026, 63171 AUBIERE Cedex, France
j-charles.quinton@univ-bpclermont.fr

Abstract—Performing long-term navigation requires to build and maintain a precise enough map for the mobile robot to efficiently plan and move in a potentially complex and open environment. Relying solely on vision, we introduce a novel method for performing visual Simultaneous Localization And Mapping (SLAM), grounded on a saliency-based detector and descriptor, and without specific assumptions about the type of camera or robot motion. A visual memory is built at the feature level which, combined with the parsimony of the saliency-based detector, makes it possible to generate a compact map from a continuous flow of features. Frames are therefore represented as features assemblies, features being themselves coarsely quantized to reduce memory requirements and voluntarily introduce overlaps and redundancy. An extension of k-d trees is used to efficiently store features and find nearest neighbors when matching them. Recursive Bayesian estimation is finally applied at the frame level to integrate feature information over time into a frame histogram. The algorithm can easily switch between localization (no prior) and navigation (trajectory prediction) while using an adaptive number of features, allowing the remaining processing time to be exploited for refining or exploration. The sparsity of the map as well as its ability to efficiently represent an environment is demonstrated on a webcam camera input from an outdoor urban dataset. Loop-closure detection, which is the ability to recognize previously visited places, is finally validated using GPS data with centimetric precision.

I. INTRODUCTION

Navigating in large environments either requires to use reactive methods (e.g. obstacle avoidance), rely on existing maps (that might be stored on distant databases), build a map online, or a combination of the three. Reactive methods by themselves cannot deal with complex path finding problems, and existing maps must first be generated and then kept coherent with a continuously changing world. Since most of the useful information may not be available when not taking a first-person perspective, building maps while navigating in the real world seems like an essential ingredient. Simultaneous localization and mapping (SLAM) of robots can be achieved through a variety of methods, and by exploiting a diverse set of sensors. GPS sensors might seem like the way to go for localization, but even with high end models, their reliability drastically diminishes with occlusions between the robot and satellites (e.g. indoor). To demonstrate our approach, we will thus solely rely on vision, which is always available to the robot, even though also affected by environmental conditions (lighting, weather).

Maps can be metrical (places defined in a single reference frame), topological (places defined relatively to each other) or hybrid. Without GPS information, metrical maps usually require 3D reconstruction methods. Whether depth is inferred or directly obtained from range imaging devices, these methods rely on adequate camera models and efficient matching of images [1]. Since localization is done at the coordinates level, these methods need additional mechanisms to be robust to drift errors, and thus correctly match revisited places [2]. Loop-closure detection, which is the recognition of previously explored places, is indeed necessary to exploit any map. It thus also applies to topological maps, on which this paper is built, which basically are graphs in which places are represented as nodes and traversability between places as edges. However, we will not need to commit to any assumptions about the robot trajectory or the type of camera.

A. Loop-closure techniques

For long-term autonomous navigation, useful information must be extracted to both reduce the amount of memory and computation times required for storing and matching camera images. Again, this can be done either at the image level (e.g. histogram, GIST [3]) or with local features. Precise image level matching techniques have tremendously improved in the recent years, and some of them can run in real time on nowadays low end computers [4]. However, their robustness to rotation and translation usually depends on the type of camera used, and they often require omnidirectional images to perform efficiently. Moreover, any global descriptor may be sensitive to large distortions, occlusions and changes (for instance induced by moving objects).

On the other end, local features will not be influenced by changes occurring in other parts of the camera image, and modern well-spread feature descriptors such as SIFT are quite robust to physical motion of the robot, thanks to their scale and rotation invariance [5]. Nevertheless, their excessive locality is a weakness for bad weather conditions (such as fog or rain), and these feature descriptors by themselves are not particularly discriminative. Only their absolute location or relative organization within the image makes it possible to efficiently match images [6]. Hundreds to thousands of features per image are typically extracted then filtered to guarantee a good match. Again for long-term navigation, storing a large number of features might not be cost-effective, and we thus chose in this paper to turn to features based on saliency maps, initially developed to model the human attentional system [7]. The algorithm generates one feature at a time, in decreasing order of saliency, and the first ones (< 20) are generally quite discriminative and highly repeatable. For both the detection and description of features, we actually use a modified version of the original algorithm that already produced significant results in mobile robotics [8]. Compared to [8] however, we use a descriptor where normalization is made locally and makes it more robust to distant changes in the image. The descriptor is composed of up to 13 conspicuity values for colored images, each reflecting the uniqueness of features on multiple scales (using Gaussian filter pyramids) but for a single channel (2+1 intensity contrasts, 4+1 orientations, 4+1 color opponents¹).

More technically, features can then be quantized to reduce both memory requirements and computational cost for nearest neighbors requests. This quantization is usually done through clustering techniques, for instance involving k-means and k-

B. Probabilistic feature integration

In addition to the use of a novel attentional feature detector/descriptor (Section II-A), one of the main scientific novelties of this paper lies in the visual memory being kept at the feature level (Section II-B). Instead of storing keyframes, the memory directly stores features, that can be associated to an arbitrary set of frames. The first direct implication is that a flow of features can be fed to the system, adapting to temporal constraints (e.g. framerate, available processing time). From a neuro-inspired perspective, such an approach is nevertheless partially shared in [14], where hippocampus place cells are sensitive to a constellation of features (combining landmarks and azimuth), that are grabbed and integrated over time. A second implication is that though each feature might not be very robust by itself and will be shared by many frames, frames are reciprocally represented by specific sets of features. This can again be paralleled with neural assembly codes found in [14], but also in data mining and learning contexts, since sparse distributed representations lead to high compression ratio for large databases [15]. Using population coding combined with an adapted similarity measure between features, a crude quantization can then turn to a strength by guaranteeing redundancy. The number of features stored in
d trees to build an optimal vocabulary [9]. Features can also visual memory can then be reasonably reduced without loosing
be grouped and searched efficiently in a hierarchical manner, much precision.
for instance detecting loop-closure roughly at the image level A frame histogram finally accumulates similarities between
first, before refining the matching process at the feature level observed and memorized features (Section II-C). The other
[10]. These techniques usually commit to a bag-of-words novelty of the paper is that localization and navigation are then
(BoW) approach [11], and often make use of an inverted directly performed in a probabilistic framework. The frame
index to speed up the process of going back and forth between histogram indeed approximates a probability distribution over
features and images. Features can also be extended and made frames, and recursive Bayesian estimation is applied for its
more discriminative by adding the pixel coordinates of the update.
feature (somehow violating the original assumptions behind
BoW), and geometrical constraints can be added during the II. M ETHOD
matching process. Although they won’t be used in this paper, We here commit to a Bayesian formulation of the SLAM
the algorithm presented could easily benefit from most of these problem very close to Cummins’ FABMAP [16], [17], but
techniques. with a focus on saliency and individual features. Instead
Localization in such a context is only a mean to precisely of considering each frame separately in the sequence, we
control a robot and navigate in the environment, using optimal may indeed consider processing features on the flow. This
dynamic graph-based planning globally [12] and/or visual ser- first allows limiting the number of features to process for
voing locally [13]. The matching process being usually done at each frame, since some features are quite stable over long
the image level, the designed architectures usually memorize sequences. We can also introduce active vision mechanisms
key frames and their associated features [13]. The selection of to pro-actively extract image regions and tune processing
the single best matching key image is then the ultimate goal filters depending on previously observed features (conditional
of loop-closure detection algorithms. To increase the precision processing). Combining these components, we aim at devel-
and recall statistics, and if still resisting the introduction of oping methods to dynamically focus on regions and select
constraints about the robot sensors and dynamics (e.g. motion features which allow the optimal learning and discrimination
amplitude), the only solution is to increase the number of of places at the individual feature level. The architecture
keyframes, or the number of features per keyframes. This for visual memory based SLAM presented in this paper
inevitably leads to a trade-off in memory and computational can be decomposed into the following steps, each roughly
performance on the long run. corresponding to a component of the system illustrated on
Fig.1. Frames from the camera are processed by the saliency
map algorithm which can then detect features in sequence until
1 The 4 + 1 indicates that 4 subcomponents (e.g. 0◦ , 45◦ , 90◦ and
a new frame arrives. For one detected feature, an attentional
135◦ orientations) are merged into a single conspicuity map (orientation),
which is then itself normalized relatively to the overall saliency from all descriptor is computed and fed into the visual memory. The
channels. visual memory searches for similar features that might have
[Figure 1 here: block diagram linking the camera image, the saliency maps and conspicuity channels, the feature detector (with inhibition of return) and attentional descriptor, the k-d tree visual memory of enriched features, the frame histogram p(F|D11...Dnk) updated by recursive Bayesian estimation, and the localization, topological map and active vision components.]

Figure 1. Architecture of the system. Low-level components extracting features from the camera image are displayed on the left, while more abstract and cognitive components extend to the right. Even though there are dependencies between the components, most of them are intrinsically parallel and can be pipelined. The components detailed in this paper are highlighted in dark (feature-level visual memory and frame-level histogram).

already been presented, enriching the memory on the fly if the feature is novel. The frame histogram is then updated based on the similarity between the observed and matched features. If the histogram satisfies specific criteria, it can trigger a loop-closure event and enrich an associated topological map of the environment (abstraction), or bias the detection of subsequent features (active perception). In this paper, the focus is on the visual memory and the frame histogram (in dark bold in Fig. 1), whereas the other components will only be briefly introduced to specify their relevant characteristics, starting with the salient feature detector/descriptor.

A. Saliency-based features

Each picture Pn provided by the camera is associated to a frame number F^n. In the following sections, we will let F^n = n, as frames should arrive in sequence for autonomous robotic applications. The only component directly dependent on this assumption is the frame histogram, and only when we couple it to a model of the robot motion, for which temporal continuity is required.

Saliency maps from the VOCUS model described in [8] are computed over Pn. Saliency maps are designed to detect, in a purely bottom-up fashion, conspicuous regions of an image. Conspicuity can be defined as the uniqueness of some local characteristics relative to the surroundings, in our case color, intensity or orientation. This neuro-inspired mechanism is massively parallel, performing multi-scale and multi-channel processing of the original image. Although the information of the original image is initially projected into channels by using different filters (Gabor, luminance, color opponents), the whole process can then easily be described as a graph where only 4 basic operations are intertwined:
• Down-scaling of a map by a factor 2 with application of a Gaussian kernel (to model pyramidal receptive fields, i.e. of varying size)
• Subtraction of maps at different scales (to reproduce the behavior of center-surround retinal cells)
• Summation of maps at a common scale (to synthesize and make results coherent)
• Normalization (to adapt to global characteristics of the image)

As in most of the recent implementations of the original algorithm (please refer to [7] for more details), intensity information is processed over two separate channels in order to detect both dark/bright and bright/dark contrasts. The three scales for each channel are merged into 2 intensity maps (Iof, Ifo), 4 color maps (Cgreen, Cblue, Cred, Cyellow) and 4 orientation maps (O0°, O45°, O90°, O135°). The maps of each type are then merged into 3 conspicuity maps (CI, CC, CO), and finally combined into a single saliency map (S). A feature which appears seldom in a scene, on any or several of the channels, is assigned a higher saliency than a frequent one.
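The four operations above can be sketched in a few lines of NumPy. This is only a minimal illustration of the processing graph for a single channel; the box-filter down-scaling, the 3-level pyramid and the [0, 1] normalization are simplifying assumptions, not the actual VOCUS implementation:

```python
import numpy as np

def downscale(m):
    """Average 2x2 blocks: a crude pyramid level (factor-2 down-scaling)."""
    h, w = m.shape[0] // 2 * 2, m.shape[1] // 2 * 2
    m = m[:h, :w]
    return 0.25 * (m[0::2, 0::2] + m[1::2, 0::2] + m[0::2, 1::2] + m[1::2, 1::2])

def upscale_to(m, shape):
    """Nearest-neighbour up-sampling back to the base resolution."""
    ry, rx = shape[0] // m.shape[0], shape[1] // m.shape[1]
    return np.repeat(np.repeat(m, ry, axis=0), rx, axis=1)[:shape[0], :shape[1]]

def center_surround(fine, coarse):
    """Subtraction of maps at different scales (center-surround contrast)."""
    return np.abs(fine - upscale_to(coarse, fine.shape))

def normalize(m):
    """Normalization to [0, 1] (adapt to the global characteristics of the map)."""
    rng = m.max() - m.min()
    return (m - m.min()) / rng if rng > 0 else np.zeros_like(m)

def channel_conspicuity(channel, levels=3):
    """Down-scale, subtract across scales, sum at the base scale, normalize."""
    pyr = [channel]
    for _ in range(levels - 1):
        pyr.append(downscale(pyr[-1]))
    contrasts = sum(center_surround(pyr[0], c) for c in pyr[1:])
    return normalize(contrasts)
```

Running `channel_conspicuity` on one channel yields a map where locally rare contrasts stand out, mirroring the statement that a feature appearing seldom on a channel receives a higher saliency.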
1) Detector: The detection process is iterative, allowing to pipeline operations and to potentially feed back into the detection process from higher cognitive levels of the architecture. The pixel with maximal saliency S(u,v) is selected. After it has been described and sent to further components of the system, the salient region SR corresponding to the L1 ball of 5 pixels around (u,v) is erased from the saliency map, thus implementing an inhibition of return (IOR) mechanism and allowing the attentional system to turn to the next most salient feature, selecting κn features for frame Fn.

2) Descriptor: For the k-th observed feature detected in picture Pn, the feature descriptor Dnk regroups the normalized pixel coordinates (x,y) ∈ [0,1]² computed from (u,v), and 10 components representing how much each individual channel Xi contributed to the associated conspicuity map CXi in the saliency computation. Several features can thus exist for one picture and its associated frame, indexed by k. The descriptor and its components are defined as:

  Dnk = (x, y, v1, ..., v10)    (1)

  vi = Xi(SR) / CXi(SR),  ∀Xi ∈ {Iof, ..., O135°}, i ∈ ⟦1,10⟧    (2)

where Xi(SR) corresponds to the mean value of the map Xi in region SR. For the tracking and navigation applications presented in [8], normalization is made relative to the whole map. Here instead, the local normalization makes this descriptor less sensitive to distant changes in the image, changes that may frequently happen during localization or loop-closure detections (such as a mobile object appearing or disappearing from the field of view). Three more components could be added to the descriptor, using the CI, CC and CO maps normalized with S, but they were not used in this architecture.

B. Visual memory

The features produced by the saliency-based detector and descriptor are then sent to the visual memory, whose purpose is to sparsely represent the explored environment and to efficiently search for previously observed features. For localization to be possible, features must be associated to places, or at least to frames in the limited context of this paper (which does not present the topological map aspects of the work).

1) Enriched features: The previously introduced observed features Dnk are thus extended to be integrated in the memory. We define the enriched feature of the visual memory VMi as follows:

  VMi = (Di, Fi)    (3)

where Fi = {Fij}, j ∈ ⟦1,φi⟧, is the set of φi frames that have been associated to the feature described by Di. If an observed feature is added to the memory, this means that it has not yet been matched with any preexisting enriched feature VMi. A new enriched feature is thus created with Di = Dnk (Dnk becomes Di when integrated into the memory). Having the associated frames {Fij} reduced to a singleton should nevertheless be the exception. Indeed, a single feature might be found in different frames, especially when going back to previously visited places, a fact accentuated by the coarse quantization applied when searching for features.

2) Quantization: We will consider that any extracted feature Dnk that lies in the area defined by [Di − R, Di + R] could potentially be associated to Di, with range R = (0.3, 0.3, 0.5, ..., 0.5). Let us consider that all descriptor components have been normalized in [0,1], which is not always verified for salient information due to the non-global normalization. This quantization, selected to test the robustness of the method, might seem like a very weak matching constraint, since 0.5 maximally represents half the space for components v1 to v10 when not near the space borders. Due to the curse of dimensionality [18], each enriched feature will still cover less than 10^−4 of the whole descriptor space. Put another way, the maximal number of enriched features that can fit in the feature space with this range is above 10^6. In practice, for long-term navigation, this number would nevertheless seem quite low if each place were represented by a single feature. But since individual frames are encoded by assemblies of features (only up to 30 in this paper, as we don't need much precision), we can still ideally represent more than 10^181 different places with this crude representation.

Since many features with quantitatively different coordinates in the descriptor space might get associated to the same enriched feature, one might consider weighting their coordinates in order to converge to their center of mass. This would be especially easy since the number of associated frames provides the required weight information. Yet, because of the additional complexity introduced by such online adaptation, as well as the data structure introduced below, Di is directly mapped onto the descriptor of the first feature from the first frame associated to it (the one that triggered its creation). Later, only the frame number F^n associated to the observed feature will be added to the list of frames Fi of the nearest enriched feature found VMi. This again makes the memory more compact, relying on assemblies rather than individual features to provide precision.

3) k-d tree: Following the above descriptions, efficiently searching and inserting new elements in a set of up to 10^6 enriched features, while comparing vectors in R^12, calls for dedicated data structures. A specific class of binary space partitioning trees named k-d trees has been chosen, as it allows efficient dimension-parallel range search [19]. For each node in the tree, the space covered by the child nodes is split in two along a single dimension. It must be noted that the k-d tree is here used to store features, not to cluster and quantize features into a vocabulary tree, as classically done in this domain [9]. The fixed search range and non-adaptive enriched feature descriptors imposed earlier also contribute to making space partitioning structures practical. Indeed, once features have been added, little or no movement will occur between nodes (only to balance the tree if required).
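The enriched features of Eq. (3) and the box range search of the quantization step can be sketched as follows. For readability, this sketch checks the range [Di − R, Di + R] by brute force over a flat array; the paper uses a min/max k-d tree for the same range query, so only the matching semantics (not the search complexity) are reproduced here, and the class and method names are illustrative:

```python
import numpy as np

# Per-dimension search range from the paper: R = (0.3, 0.3, 0.5, ..., 0.5)
R = np.array([0.3, 0.3] + [0.5] * 10)

class VisualMemory:
    """Feature-level memory of enriched features VM_i = (D_i, F_i)."""

    def __init__(self):
        self.descriptors = np.empty((0, 12))  # the D_i
        self.frames = []                      # F_i: one frame list per D_i

    def observe(self, d, frame):
        """Match an observed descriptor D_nk; enrich the memory if novel.

        Returns the indices of the matching enriched features."""
        if len(self.descriptors):
            inside = np.all(np.abs(self.descriptors - d) <= R, axis=1)
            matches = np.flatnonzero(inside)
        else:
            matches = np.array([], dtype=int)
        if matches.size == 0:
            # Novel feature: D_i is frozen to the first observed descriptor.
            self.descriptors = np.vstack([self.descriptors, d])
            self.frames.append([frame])
            return [len(self.frames) - 1]
        for i in matches:  # known feature(s): only extend the frame lists
            if frame not in self.frames[i]:
                self.frames[i].append(frame)
        return list(matches)
```

A call to `observe` either appends the frame number to the frame list Fi of every enriched feature whose box contains the descriptor, or creates a new enriched feature frozen to it, mirroring the insertion rule described above.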
The worst-case complexity for range search in standard k-d trees is known to be O(d·N^(1−1/d)), where d is the dimension of the space to partition [20]. With d = 12, we reach an approximate complexity of O(d·N), which a simple array would outperform. Several extensions have been developed over the years to avoid running into this limitation. Several of them are implemented here with a small trade-off in memory, including storing bounds at each node for fast intersection checking (also called min/max k-d trees), or grouping features into leaves to limit the depth of the tree (both already implemented in [21] for the same purpose). Using saliency-based features is also not neutral, since the saliency map computations guarantee that descriptors will be well spread in the space. Indeed, the center-surround and normalization steps are designed to maximize conspicuity on a per-channel basis.

C. Frame histogram

For any observed input feature Dnk, the visual memory returns a set {VMi}, i ∈ ⟦1,δnk⟧, of matching enriched features. We can now build and update a frame histogram based on the votes of each enriched feature for its associated frames {Fij}, j ∈ ⟦1,φi⟧.

1) Similarity measure: The votes are computed as the similarity between the observed feature descriptor and the descriptors of matching memorized features, defined as follows:

  s(Di, Dnk) = exp( −‖Di − Dnk‖² / (σ̃² ‖R‖²) )    (4)

where R (the range of the search) is used as a normalization factor relative to the potential spread of search results. The constant σ̃ guarantees that the similarity will be negligible for features outside the search range, by contributing to the standard deviation of the Gaussian profile of the similarity function. Also, thanks to the insertion method in the tree as well as the search constraints, features that are indeed further away from the observed features should not obtain votes for frames where the features were never actually matched. A single vote per frame is counted for a single observed feature, and votes are normalized to reflect the discriminative power of features. The votes associated to each enriched feature therefore form a conditional probability distribution over the frame space, noted P(F|Di). The set {Di} itself can be interpreted as a sparse approximation of the probability distribution of observing feature Dnk in the feature space, based on the similarities (population coding), leading to a Bayesian formulation of the problem.

2) Recursive Bayesian estimation: Let VM^n be the set of all matched features for all extracted features from frame F^n. The whole point of the architecture is to enable localization from a flow of features. For any camera image, we thus want to estimate P(Fn | VM^(1:n)), i.e. the probability of actually looking at a given frame, or at another frame corresponding to a similar place, while integrating information from all features observed until now. This conditional probability on a history of frames ensures that navigation can also be performed, as it introduces a dependency between Fn and Fn−1, ∀n. Since features (and at least frames) are to be processed one at a time, while giving a continuous estimate of the robot's current position, batch estimation techniques do not apply [22]. Similarly, we want to detect loop-closure at the frame level, so there is no strong a priori on the type of probability distributions to use, except that we want them to be multimodal, thus excluding the simpler Kalman filtering techniques. However, we can still derive, using recursive Bayesian estimation (i.e. a Bayes filter):

  P(Fn | VM^(1:n)) ∝ P(VM^n | Fn) P(Fn | VM^(1:n−1))    (5)

using the fact that the Di are all independent. For the same reason, the first term can be derived as:

  P(VM^n | Fn) = ∏_{k ∈ ⟦1,κn⟧} P(VM^n_k | Fn)    (6)

in which, applying Bayes' law, the feature-specific conditional probability is derived as:

  P(VM^n_k | Fn) ∝ P(Fn | VM^n_k) P(VM^n_k)    (7)

where we find the votes described in the previous paragraph, as well as the a priori distribution based on the similarity between memorized features and observed features. The second term in (5) can also be derived using the Chapman-Kolmogorov equation:

  P(Fn | VM^(1:n−1)) = Σ_{Fn−1} P(Fn | Fn−1) P(Fn−1 | VM^(1:n−1))    (8)

in which we find a model of the system evolution, which can be expressed using a stochastic matrix, considering the {F^i} as states in a Markov model. Knowing the speed of the vehicle relative to the camera framerate (real, or reduced to build the memory), the histogram can be shifted to reproduce the expected sequence of frames. Finally, P(Fn−1 | VM^(1:n−1)) is the recursion term, estimated for the frame F^(n−1) that was previously processed. All constants not displayed in the derivations can be evaluated using the k-d tree content, and an inverted index to speed up some of the computations.

For initialization or forced localization, the prior probability over frames can simply be reset to a uniform distribution. Since a frame cannot be directly represented in the histogram before being observed, the probability of F^n must be well differentiated from Fn, which reflects all potential loop-closure detections from already memorized frames {F^i}, i ∈ ⟦1,n−1⟧. For localization, the maximum likelihood can be adopted to take a decision, but it only works well when the Bayes filter integrates the model of evolution between successive frames (by smoothing and cleaning the histogram). We thus chose a weighted sum of frames, with a window ws of 4 frames around the maximum (hand-tuned on our datasets). The resulting decimal value for the frame number estimate makes it possible to go above the resolution of the k-d tree (again exploiting population coding and interpolation).
[Figure 2 here: four typical frame histograms, (a) Discriminant (H = 7.50), (b) Uniformity (H = 9.11), (c) Loop-closure (H = 8.47), (d) Ambiguity (H = 9.35).]

Figure 2. Typical frame histograms (all expected). Good matching in (a) and (c), corresponding to a low entropy (given in parentheses), with multiple loop-closure detections in (c) when using the full histogram (10998 frames). There are no clear peaks in (b) because the vehicle is static, and the vehicle is in a totally unknown place in (d).

III. RESULTS

The method was tested using several datasets supported by the Pascal Institute laboratory, described and freely available online (http://ipds.univ-bpclermont.fr/). The datasets were acquired using an autonomous urban transport platform named VIPALAB. The results presented in this paper correspond only to the relatively small PAVIN-Elo dataset (1284 m, 12 min, 1.75 m/s), using color images from a low-cost Logitech QuickCam Pro 9000 webcam embedded on the roof of the vehicle and looking forward (640x480 pixels, 15 fps, see Fig. 1). The synchronous data from an RTK-GPS system was used as ground truth for absolute centimetric localization.

A. Accuracy

Only 1/12 frames were used in order to make the video input sparser (10998/12 ≈ 917 frames, with saliency maps of 160x120 pixels). Although learning and localization can be performed simultaneously, we focus here on specific scenarios where the k-d tree is authorized to grow for a fixed sequence of images, and then frozen to test its performance on another set of images:
• Recognition: Learning is done on frames 1:12:10998, and localization on the exact same frames. This simply tests the ability of the visual memory to keep frames separated enough with feature assemblies. The two error values visible in Table I correspond to the presence (45 cm) or absence (6 cm) of interpolation when taking a decision.
• Interpolation: Learning is also performed on frames 1:12:10998, but localization is tested on K:12:10998. When K increases in ⟦2,6⟧, the error also increases exponentially. The error displayed in the table corresponds to the worst case, K = 6. Since the trajectory goes many times through the same places but with different trajectories, this scenario assesses that the frames won't be mismatched (at least in navigation mode).
• Generalization: Learning occurs only on frames 1:12:1440 (first lap around the urban platform). Localization is then tested on all segments where location and orientation are roughly shared with the learning frame set (even if on a different lane/side of the road), for instance the beginning of the second lap in 1441:12:1640. Figures in parentheses correspond to an increased smoothing (ws = 8), reflecting the tendency that the decision should be more distributed if the level of noise increases for each measurement (see Fig. 2 for an illustration).

Results about the accuracy of the localization and the precision of loop-closures are reproduced in Table I. The error is given as the distance in meters between the location of the robot for the frame provided and the interpolated location of the robot for the estimated frames. The precision (%) is computed as the percentage of loop-closures detected that are correct. Although the error for pure localization is quite high when going back to earlier areas while taking a slightly different path, the impact of the navigation prior is to be noticed, and drastically reduces the error. The error must again be compared with the maximum spatial resolution of the camera input provided, with an average distance between successive images of 1.4 m (see Fig. 3).

Table I. Error and precision for different scenarios/configurations.

  Scenario      | Recognition       | Interpolation    | Generalization
                | 1:12:10998        | 6:12:10998       | 1400:12:10998
  Measure       | Error (m)   | %   | Error (m) | %    | Error (m)   | %
  Localization  | 0.45 (0.06) | 100 | 3.50      | 77   | 3.19 (3.25) | 33 (40)
  Navigation    | 0.16 (0.00) | 100 | 0.39      | 99   | 1.65 (1.09) | 71 (94)

B. Computational performance

In terms of memory, and in order to represent the 1.2 km trajectory, the k-d tree creates 841 nodes/leaves and has a depth equal to 14. The leaves also contain 2693 enriched features, which are associated to the 27510 features observed during the learning phase on the sequence 1:12:10998 (917 × 30). The structure reaches a data compression ratio of 7 (up to 10 without the k-d tree optimizations), the vast majority of the original feature descriptors (12 real values) being replaced by a single frame number (an integer). The full process, from the range search algorithm to the histogram update, runs in 17 ms/feature with a far-from-optimized Matlab 2014a object-oriented implementation, running on a laptop equipped with a Quad Core i7 Q740 CPU.
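As a complement, the two evaluation measures used in Table I can be made explicit with a short sketch. The linear interpolation between consecutive GPS positions and the function names are assumptions about how the ground-truth comparison is carried out, not the exact evaluation code:

```python
import numpy as np

def localization_error(gps, true_frame, estimated_frame):
    """Distance (m) between the robot location at the query frame and the
    interpolated location for the (decimal) estimated frame number."""
    f0 = int(np.floor(estimated_frame))
    f1 = min(f0 + 1, len(gps) - 1)
    alpha = estimated_frame - f0
    interp = (1 - alpha) * gps[f0] + alpha * gps[f1]
    return float(np.linalg.norm(gps[true_frame] - interp))

def precision(detections):
    """Percentage of detected loop-closures that are correct."""
    correct = sum(1 for is_correct in detections if is_correct)
    return 100.0 * correct / len(detections)
```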
[Figure 3 here: (a) Superposition of the frames corresponding to the maximum GPS error; (b) Fully reconstructed trajectory from the GPS data (GPS error in color).]

Figure 3. Trajectory and error visualizations for the interpolation scenario with navigation active.

IV. CONCLUSION AND PERSPECTIVES

This paper introduces a combination of saliency-based descriptors with a coarse quantization to produce a compact visual memory for large-scale and possibly long-term navigation applications on autonomous mobile robots. The lack of precision and discriminative power of the individual features is then compensated by frames/places being encoded by feature assemblies, and by a Bayesian integration of information over time and features.

Nevertheless, the integration of topological maps to abstract away from the frames was not described in this paper. The adjacency matrix of a graph representation of the map could be directly translated into a more coherent transition matrix for the probabilistic filtering process. This matrix would better account for loop-closures, and activity would be propagated through all possible paths. Frames corresponding to the same place could be bound together, while keeping the flow of information at the feature level. Planning performed at the map level can directly influence the filtering process, while also contributing to the command of the robot, turning a perceptual system into a sensorimotor architecture.

A good convergence estimate is still required to get robust loop-closure detections. For now, only the precision statistics were computed, but the recall is of equal importance. We explored different criteria, including an activity threshold combined with a peak detection mechanism (e.g. max(F) > 2F̄), as well as an entropy threshold (as illustrated in Fig. 2). We did find a significant correlation between these histogram statistics and the GPS error committed. Yet, the decision-making part could be taken care of by dynamic neural fields (DNF), or other classes of non-linear dynamical systems exhibiting clear bifurcations. The histogram activity could indeed be directly mapped onto a population of laterally interacting neuronal units, and even integrate the predictive aspect of the architecture [23], partially bridging the gap with [14]. Other architectures coupling anticipation with graph-based representations could be used and add value to the system, by also bringing an active vision component [24].

Active vision through covert attention can easily be implemented with saliency maps, as described in [8]. Weights between the different channels and conspicuity maps can be biased, but processing can also be narrowed down to a specific area of the field of view. Using statistics on the frame histogram and on the associated features, it is possible to select either the most discriminative features to confirm or infirm a decision, or simply to actively explore the field of view and progressively enrich the memory with non-redundant information.

Finally, and as already underlined in the paper, most of the components within this architecture are also intrinsically and massively parallel: saliency maps are based on convolutions, the k-d tree can be physically split in memory with requests answered in parallel on different nodes, and histogram manipulation gets translated into SIMD operations. An open perspective is therefore the parallel hardware implementation and extension of this architecture, benefiting from recent developments in dynamic reconfiguration capabilities.

ACKNOWLEDGMENT

The research leading to these results has received funding from the French program "investissement d'avenir" managed by the National Research Agency (ANR), from the European Union (Auvergne European Regional Development Funds) and from the "Région Auvergne" in the framework of the IMobS3 LabEx (ANR-10-LABX-16-01).

REFERENCES

[1] E. Royer, M. Lhuillier, M. Dhome, and J.-M. Lavest, "Monocular vision for mobile robot localization and autonomous navigation," International Journal of Computer Vision, vol. 74, no. 3, pp. 237–260, 2007.
[2] G. Sibley, C. Mei, I. Reid, and P. Newman, "Vast-scale outdoor navigation using adaptive relative bundle adjustment," The International Journal of Robotics Research, vol. 29, no. 8, pp. 958–980, 2010.
[3] A. Oliva and A. Torralba, "Building the gist of a scene: The role of global image features in recognition," Progress in Brain Research, vol. 155, pp. 23–36, 2006.
[4] O. Pele and M. Werman, “Fast and robust earth mover’s distances,” in
Computer vision, 2009 IEEE 12th international conference on. IEEE,
2009, pp. 460–467.
[5] H. Zhang, B. Li, and D. Yang, “Keyframe detection for appearance-
based visual slam,” in Intelligent Robots and Systems (IROS), 2010
IEEE/RSJ International Conference on. IEEE, 2010, pp. 2071–2076.
[6] K. L. Ho and P. Newman, “Detecting loop closure with scene se-
quences,” International Journal of Computer Vision, vol. 74, no. 3, pp.
261–286, 2007.
[7] L. Itti, C. Koch, E. Niebur et al., “A model of saliency-based visual
attention for rapid scene analysis,” IEEE Transactions on pattern analysis
and machine intelligence, vol. 20, no. 11, pp. 1254–1259, 1998.
[8] S. Frintrop, VOCUS: A visual attention system for object detection and
goal-directed search. Springer, 2006, vol. 2.
[9] J. Philbin, O. Chum, M. Isard, J. Sivic, and A. Zisserman, “Object
retrieval with large vocabularies and fast spatial matching,” in Computer
Vision and Pattern Recognition, 2007. CVPR’07. IEEE Conference on.
IEEE, 2007, pp. 1–8.
[10] H. Korrapati, J. Courbon, Y. Mezouar, and P. Martinet, “Image sequence
partitioning for outdoor mapping,” in Robotics and Automation (ICRA),
2012 IEEE International Conference on. IEEE, 2012, pp. 1650–1655.
[11] A. Angeli, D. Filliat, S. Doncieux, and J.-A. Meyer, “Fast and incre-
mental method for loop-closure detection using bags of visual words,”
Robotics, IEEE Transactions on, vol. 24, no. 5, pp. 1027–1037, 2008.
[12] J.-C. Quinton and J.-C. Buisson, “Multilevel anticipative interactions for
goal oriented behaviors,” Proceedings of EpiRob, pp. 103–110, 2008.
[13] J. Courbon, Y. Mezouar, and P. Martinet, “Autonomous navigation
of vehicles from a visual memory using a generic camera model,”
Intelligent Transportation Systems, IEEE Transactions on, vol. 10, no. 3,
pp. 392–402, 2009.
[14] N. Cuperlier, M. Quoy, and P. Gaussier, “Neurobiologically inspired
mobile robot navigation and planning,” Frontiers in neurorobotics, vol. 1,
p. 3, 2007.
[15] Numenta, Hierachical Temporal Memory including HTM Cortical
Learning Algorithms (white paper), 2011.
[16] M. Cummins, “Probabilistic localization and mapping in appearance
space,” Ph.D. dissertation, University of Oxford, 2009.
[17] M. J. Cummins and P. M. Newman, “Fab-map: Appearance-based place
recognition and mapping using a learned visual vocabulary model,” in
Proceedings of the 27th International Conference on Machine Learning
(ICML-10), pp. 3–10.
[18] R. Bellman, Adaptive control processes: a guided tour. Princeton
university press Princeton, 1961, vol. 4.
[19] M. de Berg, M. van Kreveld, M. Overmars, and O. Schwarzkopf,
Computational Geometry: Algorithms and Applications. Springer-
Verlag, 2000 (2nd edition), pp. 95–110.
[20] D.-T. Lee and C. Wong, “Worst-case analysis for region and partial
region searches in multidimensional binary search trees and balanced
quad trees,” Acta Informatica, vol. 9, no. 1, pp. 23–29, 1977.
[21] J.-C. Quinton and T. Inamura, “Human-robot interaction based learning
for task-independent dynamics prediction,” Proceedings of EpiRob, pp.
133–140, 2007.
[22] S. Särkkä, Bayesian filtering and smoothing. Cambridge University
Press, 2013, vol. 3, pp. 30–50.
[23] J.-C. Quinton and B. Girau, “Predictive neural fields for improved
tracking and attentional properties,” in Proceedings of IEEE International
Joint Conference on Neural Networks (IJCNN 2011) (San José, USA),
2011, pp. 1629–1636.
[24] J.-C. Quinton, N. Catenacci, L. Barca, and G. Pezzulo, “The cat is on the
mat. or is it a dog? dynamic competition in perceptual decision making,”
IEEE Transactions on Systems, Man and Cybernetics: Systems, 2013.
4.3 Other perspectives

4.3.1 Addition of a top-down a priori

Three types of top-down a priori can be included in the computation of visual saliency:

• Weighting of characteristics. During the fusion of the feature maps, we could weight particular maps, favoring for example color over orientation, either manually or by defining the weights via learning [66].

• Spatial modulation. This approach consists in favoring certain parts of the image. For example, we could specify that points at the center of the image are more important than points at the extremities or borders, motivated by the known center bias of pictures taken by humans [24].

• Scale modulation. Most saliency algorithms are multi-scale. It is possible to weight the different scales in order to favor particular scales over others.

In our case, spatial modulation is particularly appropriate: we aim to find the most salient points of the images for localization, independently of the characteristics of those points. We can also apply spatial modulation directly to the final saliency map. In ongoing work, we found that white points at the center of the image, such as the white lines of the road, are always detected as salient (see Figure 4.3). When the histogram is computed, such a point increases the probability of all the images and is therefore not useful for localization.

Figure 4.3 – Non-discriminative salient points.

Chapter 4. Recursive Bayesian estimation for navigation

The idea is therefore to find points in zones that contain discriminative points. These zones are computed as a function of the points already found: when a known point is detected, we retrieve all the associated frames and identify the zones to inspect in order to gain more discriminative power for localization.

The difference with the article is that the computation of the points is iterative, updating at each iteration a mask that serves for the spatial modulation. Each newly detected point thus depends on all the previously detected points. In preliminary work, this approach increased the precision of the localization.
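To make the idea concrete, here is a minimal sketch of the iterative selection (function and parameter names are hypothetical, not the implementation used in our experiments): each detected point updates a spatial mask that suppresses its neighborhood, so every new point depends on all the points detected before it.

```python
import numpy as np

def select_salient_points(saliency, n_points=3, inhibition_radius=2, mask=None):
    """Iteratively pick the most salient point of a saliency map,
    updating a spatial mask after each pick so that later picks
    depend on all the points detected before them."""
    sal = saliency.astype(float)
    if mask is None:
        mask = np.ones_like(sal)  # uniform a priori at the first iteration
    points = []
    for _ in range(n_points):
        r, c = np.unravel_index(np.argmax(sal * mask), sal.shape)
        points.append((int(r), int(c)))
        # suppress a neighborhood around the detected point: the next
        # iteration is forced to look at a different zone of the image
        mask[max(r - inhibition_radius, 0):r + inhibition_radius + 1,
             max(c - inhibition_radius, 0):c + inhibition_radius + 1] = 0.0
    return points, mask
```

Replacing the hard suppression by a smooth penalty, or initializing the mask from the zones associated with already recognized points, gives the discriminative variant discussed above.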

4.3.2 Distinguishability and selection of models

In addition to the mask, another promising approach consists in applying Bayesian distinguishability of models. If we have several saliency models, this approach, developed in the field of optimal experimental design [50], makes it possible to find the most discriminative point in order to choose the appropriate saliency model.

In the case of changing environments, the robot could switch to the optimal model for navigation when, for example, the weather changes. This would introduce into the system a notion of contextuality and allow the navigating agent (such as an autonomous car) to adapt to changing environments. It could be particularly suited to navigation in degraded environments or under rapidly changing weather (snow, fog, then bright sun, etc.), where an appropriate saliency algorithm is needed quickly in order to avoid accidents.
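As an illustration of the underlying model comparison (a sketch under simplifying assumptions: independent observations, a flat prior over models, and hypothetical likelihood functions), the log-evidence of each candidate saliency model can be accumulated over observed features and the most plausible model retained:

```python
import math

def select_saliency_model(observations, models):
    """Accumulate the log-evidence of each candidate model over a
    stream of observations and return the most plausible model.
    `models` maps a model name to a likelihood function p(obs | model)."""
    log_evidence = {name: 0.0 for name in models}
    for obs in observations:
        for name, likelihood in models.items():
            log_evidence[name] += math.log(likelihood(obs))
    best = max(log_evidence, key=log_evidence.get)
    return best, log_evidence
```

The distinguishability criterion discussed above goes one step further: instead of passively accumulating evidence, it selects the point whose observation is expected to discriminate best between the candidate models.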

4.3.3 Use of the dynamic model of the robot

The robot moves continuously. Its dynamical model is known and we have a topological map; we can therefore predict the robot's next localization using both its estimated position and its movement. If, using the frame histogram at time $t$, the algorithm finds that the robot is at frame $F_t$, then with the topological map and the knowledge of the robot's movement we can predict which frame will be the next observed image.

If the prediction is correct at the next frame, the algorithm converges faster. If the prediction is false, the next observed features will favor another frame than the predicted one. This model therefore reduces the number of localization errors without propagating an error between the frames. Preliminary work showed that this approach increases precision.
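A minimal sketch of this motion prior (hypothetical names; the topological map is reduced to an ordered list of frames) biases the frame histogram towards the predicted frame without ever forcing it, so a wrong prediction is simply outvoted by the next observations:

```python
def predict_next_frame(current_frame, step, topological_map):
    """Predict the next observed frame from the current estimate and the
    robot's displacement (in frames) along the topological map."""
    i = topological_map.index(current_frame)
    return topological_map[min(i + step, len(topological_map) - 1)]

def apply_motion_prior(histogram, predicted_frame, boost=1.5):
    """Re-weight the frame histogram in favor of the predicted frame,
    then renormalize. Observations keep accumulating afterwards, so a
    wrong prediction does not propagate between frames."""
    h = dict(histogram)
    h[predicted_frame] = h.get(predicted_frame, 0.0) * boost
    total = sum(h.values())
    return {frame: score / total for frame, score in h.items()}
```

With a correct prediction, the boosted frame needs fewer feature votes to win, which is the faster convergence mentioned above.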

5 Active inference and robotics

I succumbed to the lure of the oracle, he thought.


And he sensed that succumbing to this lure
might be to fix himself upon a single-track life.
Could it be, he wondered, that the oracle didn’t tell the future?
Could it be that the oracle made the future?

Frank Herbert, Dune Messiah

5.1 Introduction

In this chapter, we extend the previous approach from Bayesian filtering to generalized filtering, using the framework of active inference [71, 77]. The Bayesian approximation is variational and permits the introduction of the concept of free-energy minimization, that is, the long-term minimization of (Bayesian) surprise about the perceptions of the agent. This approach is gaining prominence in neuroscience, as it has been used to explain many different phenomena in perception, action and cognition.

The free-energy principle states that action and perception both minimize surprise, so that agents maintain their biophysical states within limits and resist the second law of thermodynamics, therefore maintaining their homeostasis [72].

Three assumptions are at the basis of active inference:

• The free-energy of sensory inputs described by a generative model is minimized by the brain.

• This generative model is non-linear, dynamic and hierarchical.

• Neuronal firing rates encode the expected state of the world, under this model.

The (hidden) variables in this approach are continuous and the whole method is therefore dynamical. Compared to the previously presented works, where the variables were discrete, this approach has the advantage of taking into account the dynamical character of brain information processing. In brief, active inference can be regarded as a standard Bayesian filtering scheme extended with classical reflex arcs that enable action to fulfill predictions about hidden states of the world. The free-energy is a variational free-energy: an upper bound on the (negative-log) Bayesian model evidence. Agents cannot minimize surprise directly, but they can minimize this upper bound on surprise, namely the free-energy [72]. It is the process by which beliefs about (hidden or fictive) states of the world maximize the model evidence of observations. In other words, the agent acts in order to reduce the difference between its expectations and reality: it acts so that its predictions become reality. The agent has a model of the world, its prediction, and it wants the world to fit its prediction through action. In Bayesian terms, the brain would maximize the model evidence of sensory inputs [46, 108]. The second assumption results from the observation that the world is both non-linear and dynamic. As for the hierarchical aspect, this structure is embedded in the brain [60, 212] and has been studied in depth, particularly for the visual system. The third hypothesis is the Laplace assumption. It permits the association of a single quantity, the conditional mean, with any unknown state [71].
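The interplay of these assumptions can be illustrated with a deliberately minimal one-dimensional sketch (not the scheme used in the article below, which works in generalized coordinates): a single belief mu, a prior encoding the desired state, and an action that changes the world. Perception and action both descend the same free energy, here a sum of precision-weighted squared prediction errors under the Laplace assumption.

```python
def active_inference_reach(x_init, target, n_steps=200, lr=0.1,
                           pi_s=1.0, pi_p=1.0):
    """Minimal 1-D active inference loop. Free energy is
        F = 0.5*pi_s*(s - mu)**2 + 0.5*pi_p*(mu - target)**2,
    where the prior mean encodes the desired state. Perception updates
    the belief mu; action updates the world state x; both follow the
    negative gradient of F."""
    x = float(x_init)   # real state of the world (e.g. an arm angle)
    mu = float(x_init)  # the agent's belief about that state
    for _ in range(n_steps):
        s = x  # noiseless sensation of the state
        # perception: descend F with respect to mu
        mu -= lr * (-pi_s * (s - mu) + pi_p * (mu - target))
        # action: descend F with respect to a, with ds/da = 1; this is
        # the 'reflex arc' that suppresses the sensory prediction error
        x += -lr * pi_s * (s - mu)
    return x, mu
```

Because the prior pulls the belief towards the target and action pulls sensations towards the belief, the state x converges to the target without any inverse model.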

In the following article, we apply this method to reaching with a simulated PR2 robot. It is, to our knowledge, the first application of this theory to (manipulation) robotics. It is a proof of concept of the usefulness of active inference for robotics, with the aim of developing future robots that minimize free-energy; given the high number of DoFs involved, the results have been compared to human data. Our results highlight the multimodal character of control, based on both vision and proprioception: if one modality is impaired, it can be compensated (though not entirely) by the other.
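This compensation admits a simple reading in terms of precision weighting. As a sketch (hypothetical function; Gaussian noise assumed on each channel), an optimal fusion weights each modality by its precision (inverse variance), so that when one channel becomes noisy its influence on the estimate shrinks and the other modality dominates:

```python
def fuse_modalities(visual, proprioceptive, var_visual, var_proprio):
    """Precision-weighted fusion of two noisy estimates of the same
    hidden state (e.g. a joint angle). Precision = 1 / variance."""
    pi_v = 1.0 / var_visual
    pi_p = 1.0 / var_proprio
    return (pi_v * visual + pi_p * proprioceptive) / (pi_v + pi_p)
```

With equal variances the fused estimate is the average of the two readings; degrading vision a hundredfold makes the estimate follow proprioception almost exclusively, which is the (partial) compensation observed in the simulations.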


5.2 Article

61
Downloaded from http://rsif.royalsocietypublishing.org/ on September 28, 2016

Active inference and robot control: a case study

Léo Pio-Lopez (1,2), Ange Nizard (1), Karl Friston (3) and Giovanni Pezzulo (2)

(1) Pascal Institute, Clermont University, Clermont-Ferrand, France
(2) Institute of Cognitive Sciences and Technologies, National Research Council, Rome, Italy
(3) The Wellcome Trust Centre for Neuroimaging, Institute of Neurology, University College London, London, UK

LP-L, 0000-0001-8081-1070

Cite this article: Pio-Lopez L, Nizard A, Friston K, Pezzulo G. 2016 Active inference and robot control: a case study. J. R. Soc. Interface 13: 20160616. http://dx.doi.org/10.1098/rsif.2016.0616

Received: 3 August 2016. Accepted: 1 September 2016.
Subject Category: Life Sciences–Engineering interface. Subject Areas: systems biology, biomathematics, computational biology.
Keywords: active inference, free energy, robot control.
Author for correspondence: Giovanni Pezzulo, e-mail: giovanni.pezzulo@istc.cnr.it

© 2016 The Authors. Published by the Royal Society under the terms of the Creative Commons Attribution License http://creativecommons.org/licenses/by/4.0/, which permits unrestricted use, provided the original author and source are credited.

Abstract. Active inference is a general framework for perception and action that is gaining prominence in computational and systems neuroscience but is less known outside these fields. Here, we discuss a proof-of-principle implementation of the active inference scheme for the control of the 7-DoF arm of a (simulated) PR2 robot. By manipulating visual and proprioceptive noise levels, we show under which conditions robot control under the active inference scheme is accurate. Besides accurate control, our analysis of the internal system dynamics (e.g. the dynamics of the hidden states that are inferred during the inference) sheds light on key aspects of the framework such as the quintessentially multimodal nature of control and the differential roles of proprioception and vision. In the discussion, we consider the potential importance of being able to implement active inference in robots. In particular, we briefly review the opportunities for modelling psychophysiological phenomena such as sensory attenuation and related failures of gain control, of the sort seen in Parkinson's disease. We also consider the fundamental difference between active inference and optimal control formulations, showing that in the former the heavy lifting shifts from solving a dynamical inverse problem to creating deep forward or generative models with dynamics, whose attracting sets prescribe desired behaviours.

1. Introduction

Active inference has recently acquired significant prominence in computational and systems neuroscience as a general theory of brain and behaviour [1,2]. This framework uses one single principle—surprise (or free energy) minimization—to explain perception and action. It has been applied to a variety of domains, which includes perception–action loops and perceptual learning [3,4]; Bayes optimal sensorimotor integration and predictive control [5]; action selection [6,7] and goal-directed behaviour [8–12].

Active inference starts from the fundaments of self-organization, which suggests that any adaptive agent needs to maintain its biophysical states within limits, therefore maintaining a generalized homeostasis that enables it to resist the second law of thermodynamics [2]. To this aim, an agent's actions and perceptions both need to minimize surprise, that is, a measure of the discrepancy between the agent's current and predicted (or desired) states. Crucially, agents cannot minimize surprise directly but they can minimize an upper bound of surprise, namely the free energy of their beliefs about the causes of sensory input [1,2].

This idea is cast in terms of Bayesian inference: the agent is endowed with priors that describe its desired states and a (hierarchical, generative) model of the world. It uses the model to generate continuous predictions that it tries to fulfil via action; that is to say, the agent actively samples the world to minimize prediction errors so that surprise (or its upper bound, free energy) is suppressed. More formally, this is a process in which beliefs about (hidden or
latent) states of the world maximize Bayesian model evidence of observations, while observations are sampled selectively to conform to the model [13,14]. The agent has essentially two ways to reduce surprise: change its beliefs or hypotheses (perception), or change the world (action). For example, if it believes that its arm is raised, but observes it is not, then it can either change its mind or raise the arm—either way, its prediction comes true (and free energy is minimized). As we see, in active inference, this result can be obtained by endowing a Bayesian filtering scheme with reflex arcs that enable action, such as raising a robotic arm or using it to touch a target. In this example, the agent generates a (proprioceptive) prediction corresponding to the sensation of raising the arm, and reflex arcs fulfil this prediction, effectively raising the hand (and minimizing surprise).

Active inference has clear relevance for robotic motor control. As in optimal motor control [15,16], it relies on optimality principles and (Bayesian) state estimation; however, it has some unique features such as the fact that it dispenses with inverse models (see the Discussion). Similar to planning-as-inference and KL control [17–22], it uses Bayesian inference, but it is based on the minimization of a free-energy functional that generalizes conventional cost or utility functions. Although the computations underlying Bayesian inference or free-energy minimization are generally hard, they become tractable as active inference uses variational inference, usually under the Laplace assumption, which enables one to summarize beliefs about hidden states with a single quantity (the conditional mean). The resulting (neural) code corresponds to the Laplace code, which is simple and efficient [3].

Despite its success in computational and systems neuroscience, active inference is less known in related domains such as motor control and robotics. For example, it remains unproven that the framework can be adopted in challenging robotic set-ups. In this article, we ask if active inference can be effectively used to control the 7-DoF arm of a PR2 robot (simulated using the Robot Operating System (ROS)). We present a series of robot reaching simulations under various conditions (with or without noise on vision and/or proprioception), in order to test the feasibility of this computational scheme in robotics. Furthermore, by analysing the internal system dynamics (e.g. the dynamics of the hidden states that are inferred during the inference), our study sheds light on key aspects of the framework such as the quintessentially multimodal nature of control and the relative roles of proprioception and vision. Finally, besides providing a proof of principle for the usage of active inference in robotics, our simulations help to illustrate the differences between this scheme and alternative approaches in computational neuroscience and robotics, such as optimal control, and the significance of these differences from both a technological and biological perspective.

2. Methods

In this section, we first define, mathematically, the active inference framework (for the continuous case). We then describe its application to robotic control and reaching.

2.1. Active inference formalism

The free-energy term that is optimized (minimized) during action control rests on the tuple $(\Omega, \Psi, S, A, R, q, p)$ [23]. A real-valued random variable is denoted by $X : \Omega \times \dots \to \mathbb{R}$ and $x \in X$ for a particular value. The tilde notation $\tilde{x} = (x, x', x'', \dots)$ corresponds to variables in generalized coordinates of motion [24]. Each prime is a temporal derivative. $p(X)$ denotes a probability density.

— $\Omega$ is the sample space from which random fluctuations $\omega \in \Omega$ are drawn.

— Hidden states $\Psi : \Psi \times A \times \Omega \to \mathbb{R}$. They depend on actions and are part of the dynamics of the world that causes sensory states.

— Sensory states $S : \Psi \times A \times \Omega \to \mathbb{R}$. They are the agent's sensations and constitute a probabilistic mapping from action and hidden states.

— Action $A : S \times R \to \mathbb{R}$. Actions depend on the agent's sensory and internal states.

— Internal states $R : R \times S \times \Omega \to \mathbb{R}$. They depend on sensory states and cause actions. They constitute the dynamics of the states of the agent.

— A recognition density $q(\tilde{\Psi} \mid \tilde{\mu})$, which corresponds to the agent's beliefs about the causes $\Psi$ (with the brain state $\mu$ describing those beliefs).

— A generative density $p(\tilde{\Psi}, \tilde{s} \mid m)$, corresponding to the probability density of the sensory states $s$ and world states $\Psi$ under the predictive model $m$ of the agent.

According to Ashby [25], in order to restrict themselves to a limited number of states, agents must minimize the dispersion of their sensory and hidden states. The Shannon entropy corresponds to the dispersion of the external states (here $S \times \Psi$). Under an ergodic assumption, this entropy equals the long-term average of the Gibbs energy

$$H(S, \Psi) = \langle G(\tilde{\Psi}, \tilde{s} \mid m) \rangle_t, \qquad G = -\ln p(\tilde{\Psi}, \tilde{s} \mid m). \tag{2.1}$$

One can see that the Gibbs energy is defined in terms of the generative model. $\langle \cdot \rangle$ is the expectation or mean under a density when indicated. However, agents cannot minimize this energy directly, because hidden states are unknown by definition. However, mathematically

$$H(S, \Psi) = H(S) + H(\Psi \mid S) = \langle -\ln p(\tilde{s}(t) \mid m) + H(\Psi \mid S = \tilde{s}(t)) \rangle_t. \tag{2.2}$$

With this latter equation, we observe that sensory surprise $-\ln p(\tilde{s}(t) \mid m)$ minimizes the entropy of the external states and can be minimized through action, if action minimizes conditional entropy. In this sense

$$a(t)^{*} = \arg\min_a \big( -\ln p(\tilde{s}(t) \mid m) \big), \qquad \tilde{\mu}(t)^{*} = \arg\min_{\tilde{\mu}} H(\Psi \mid S = \tilde{s}(t)). \tag{2.3}$$

Unfortunately, we cannot minimize sensory surprise directly (see equation (2.3)), as this entails a marginalization over hidden states which is intractable:

$$-\ln p(\tilde{s} \mid m) = -\ln \int p(\Psi, \tilde{s} \mid m) \, d\Psi. \tag{2.4}$$

Happily, there is a solution to this problem that comes from theoretical physics [26] and machine learning [27], called variational free energy, which furnishes an upper bound on surprise. This is a functional of the conditional density, which is minimized by action and internal states to produce
action and perception:

$$\begin{aligned} a(t)^{*} &= \arg\min_a F(\tilde{s}(t), \tilde{\mu}(t)) \\ \tilde{\mu}(t)^{*} &= \arg\min_{\tilde{\mu}} F(\tilde{s}(t), \tilde{\mu}) \\ F(\tilde{s}, \tilde{\mu}) &= \langle G(\tilde{\Psi}, \tilde{s} \mid m) \rangle_q + H[q(\tilde{\Psi} \mid \tilde{\mu})] \\ &= D[q(\tilde{\Psi} \mid \tilde{\mu}) \,\|\, p(\tilde{\Psi} \mid \tilde{s}, m)] - \ln p(\tilde{s}(a) \mid m) \\ &\geq -\ln p(\tilde{s}(a) \mid m). \end{aligned} \tag{2.5}$$

The term $D[\cdot \| \cdot]$ is the Kullback–Leibler divergence (or cross-entropy) between two densities. The minimizations on $a$ and $\tilde{\mu}$ correspond to action and perception, respectively, where the internal states $\tilde{\mu}$ parametrize the conditional density $q$. We need perception in order to use free energy to finesse the (intractable) evaluation of surprise. The Kullback–Leibler term is non-negative, and the free energy is therefore always greater than surprise, as we can see in the last inequality. When free energy is minimized, it approximates surprise and, as a result, the conditional density $q$ approximates the posterior density over external states:

$$D[q(\tilde{\Psi} \mid \tilde{\mu}) \,\|\, p(\tilde{\Psi} \mid \tilde{s}, m)] \approx 0 \;\Rightarrow\; \begin{cases} q(\tilde{\Psi} \mid \tilde{\mu}) \approx p(\tilde{\Psi} \mid \tilde{s}, m) \\ H[q(\tilde{\Psi} \mid \tilde{\mu})] \approx H(\Psi \mid S = \tilde{s}). \end{cases} \tag{2.6}$$

This completes the description of approximate Bayesian inference (active inference) within the variational framework. This free-energy formulation resolves several issues in perception and action control problems, but in the following we focus on action control. According to equation (2.5), free energy can be minimized using actions via their effect on hidden states and sensations. In this case, action changes the sensations to match the agent's expectations.

The only outstanding issue is the nature of the generative model used to explain and sample sensations. In continuous-time formulations, the generative model is usually expressed in terms of coupled (stochastic) differential equations. These equations describe the dynamics of the (hidden) states of the world and the ensuing behaviour of an agent [5]. This leads us to a discussion of the agent's generative model.

2.2. The generative model

Active inference generally assumes that the generative model supporting perception and action is nonlinear, dynamic and deep (i.e. hierarchical), of the sort that might be entailed by cortical and subcortical hierarchies in the brain [28].

$$\begin{aligned} \mathbf{s} &= \mathbf{g}(\mathbf{x}, \mathbf{v}, a) + \boldsymbol{\omega}_s \\ \dot{\mathbf{x}} &= \mathbf{f}(\mathbf{x}, \mathbf{v}, a) + \boldsymbol{\omega}_x \\ \dot{a} &= -\partial_a F(\tilde{s}, \tilde{\mu}) \\ \dot{\tilde{\mu}} &= D\tilde{\mu} - \partial_{\tilde{\mu}} F(\tilde{s}, \tilde{\mu}). \end{aligned} \tag{2.7}$$

In bold, we have real-world states and, in italic, internal states of the agent. $\mathbf{s}$ is the sensory input, $\mathbf{x}$ corresponds to hidden states, $\mathbf{v}$ to hidden causes of the world and $a$ to action. Intuitively, hidden states and causes are used by the brain as abstract quantities in order to predict sensations. Dynamics over time is linked by hidden states, whereas the hierarchical levels are linked by hidden causes. The $\sim$ notation means that we are using generalized coordinates of motion, i.e. a vector of positions, velocities, accelerations, etc. [5]. $\tilde{s}$, $\tilde{\mu}$ and $a$ correspond to sensory input, conditional expectations and action, respectively.

One can observe a coupling between these differential equations: sensory states depend upon action $a(t)$ via the causes $(\mathbf{x}, \mathbf{v})$ and the functions $(\mathbf{f}, \mathbf{g})$, while action depends upon sensory states via the internal states $\tilde{\mu}(t)$. These differential equations are stochastic owing to the random fluctuations $(\boldsymbol{\omega}_x, \boldsymbol{\omega}_v)$.

A generalized gradient descent on variational free energy is defined in the second pair of equations. This method is termed generalized filtering and rests on conditional expectations to produce a prediction (first) term and an update (second) term based upon free-energy gradients that, as we see below, can be expressed in terms of prediction errors (this corresponds to the basic form of a Kalman filter). $D$ is a differential matrix operator that operates on generalized motion, and $D\tilde{\mu}$ describes the generalized motion of conditional expectations. Generalized motion comprises vectors of velocity, acceleration, jerk, etc.

The generative model has the following hierarchical form:

$$\begin{aligned} s &= g^{(1)}(x^{(1)}, \nu^{(1)}) + \omega_\nu^{(1)} \\ \dot{x}^{(1)} &= f^{(1)}(x^{(1)}, \nu^{(1)}) + \omega_x^{(1)} \\ &\;\;\vdots \\ \nu^{(i-1)} &= g^{(i)}(x^{(i)}, \nu^{(i)}) + \omega_\nu^{(i)} \\ \dot{x}^{(i)} &= f^{(i)}(x^{(i)}, \nu^{(i)}) + \omega_x^{(i)} \\ &\;\;\vdots \end{aligned} \tag{2.8}$$

The level of the hierarchy in the generative model corresponds to $i$. The functions $f^{(i)}$ and $g^{(i)}$ and their Gaussian random fluctuations $\omega_x$ and $\omega_\nu$ on the motion of hidden states and causes define a probability density over sensations, causes of the world and hidden states, which constitutes the free energy of posterior or conditional (Bayesian) beliefs about the causes of sensations. Note that the generative model becomes probabilistic because of the random fluctuations (sensory or sensor noise corresponds to fluctuations at the first level of the hierarchy, while fluctuations at higher levels induce uncertainty about hidden states). The inverse of the covariance matrices of these random fluctuations is called precision (i.e. inverse covariance) and is denoted by $(\Pi_x^{(i)}, \Pi_\nu^{(i)})$.

2.3. Prediction errors and predictive coding

We can now define prediction errors on the hidden causes and states. These auxiliary variables represent the difference between conditional expectations and their predicted values based on the level above. Using $A \cdot B := A^{\mathrm{T}} B$:

$$\begin{aligned} \dot{\tilde{\mu}}_x^{(i)} &= D\tilde{\mu}_x^{(i)} + \frac{\partial \tilde{g}^{(i)}}{\partial \tilde{\mu}_x^{(i)}} \cdot \Pi_\nu^{(i)} \tilde{\varepsilon}_\nu^{(i)} + \frac{\partial \tilde{f}^{(i)}}{\partial \tilde{\mu}_x^{(i)}} \cdot \Pi_x^{(i)} \tilde{\varepsilon}_x^{(i)} - D\Pi_x^{(i)} \tilde{\varepsilon}_x^{(i)} \\ \dot{\tilde{\mu}}_\nu^{(i)} &= D\tilde{\mu}_\nu^{(i)} + \frac{\partial \tilde{g}^{(i)}}{\partial \tilde{\mu}_\nu^{(i)}} \cdot \Pi_\nu^{(i)} \tilde{\varepsilon}_\nu^{(i)} + \frac{\partial \tilde{f}^{(i)}}{\partial \tilde{\mu}_\nu^{(i)}} \cdot \Pi_x^{(i)} \tilde{\varepsilon}_x^{(i)} - \Pi_\nu^{(i+1)} \tilde{\varepsilon}_\nu^{(i+1)} \\ \tilde{\varepsilon}_x^{(i)} &= D\tilde{\mu}_x^{(i)} - \tilde{f}^{(i)}(\tilde{\mu}_x^{(i)}, \tilde{\mu}_\nu^{(i)}) \\ \tilde{\varepsilon}_\nu^{(i)} &= \tilde{\mu}_\nu^{(i-1)} - \tilde{g}^{(i)}(\tilde{\mu}_x^{(i)}, \tilde{\mu}_\nu^{(i)}). \end{aligned} \tag{2.9}$$

$\tilde{\varepsilon}_\nu^{(i)}$ and $\tilde{\varepsilon}_x^{(i)}$ correspond to prediction errors on hidden causes and hidden states, respectively. The precisions $\Pi_\nu^{(i)}$ and $\Pi_x^{(i)}$ weight the prediction errors, so that more precise prediction errors have a greater influence during generalized filtering.

The derivation of equation (2.8) enables us to express the gradients of equation (2.7) in terms of prediction errors. Effectively, precise prediction errors update the prediction to provide a Bayes-optimal estimate of hidden states as a continuous function of time—where free energy corresponds to the sum of the squared prediction errors (weighted by precision) at each level of the hierarchy. Heuristically, this corresponds to an instantaneous gradient ascent in which prediction errors are
assimilated to provide for online inference. For a more detailed explanation of the mathematics underlying this scheme, see [24].

2.4. Action

A motor trajectory (e.g. the trajectory of raising the arm) is produced via classical reflex arcs that suppress proprioceptive prediction errors:

$$\dot{a} = -\partial_a F = (\partial_a \tilde{\varepsilon}_\nu^{(1)}) \cdot \Pi_\nu^{(1)} \tilde{\varepsilon}_\nu^{(1)}. \tag{2.10}$$

Intuitively, conditional expectations in the generative model drive (top-down) proprioceptive predictions (e.g. the proprioceptive sensation of raising one's own arm), and these predictions are fulfilled by reflex arcs. This is because the only way for an agent to minimize its free energy through action (and suppress proprioceptive prediction errors) is to change proprioceptive signals, i.e. raise the arm and realize the predicted proprioceptive sensations. According to this scheme, reflex arcs thus produce a motor trajectory (raising the arm) to comply with set points or trajectories prescribed by descending proprioceptive predictions (cf. motor commands). At the neurobiological level, this process is thought to occur at the level of the cranial nerve nuclei and the spinal cord.

2.5. Application to robotic arm control and reaching

Having described the general active inference formalism, we now illustrate how it can be used to elicit reaching movements with a robot: the 7-DoF arm of a PR2 robot simulated using ROS [29] (figure 1). Essentially, in our simulations, the robot has to reach a target by moving (i.e. raising) its arm. We see that the key aspect of this behaviour rests on a multimodal integration of visual and proprioceptive signals [30,31], which play differential—yet interconnected—roles.

Figure 1. PR2 robot simulated using ROS [29].

In this robotic setting, the hidden states are the angles of the joints $(x_1, x_2, \dots, x_7)$. The visual input is the position of the end effector, here the arm of the PR2 robot. This location $(\nu_1, \nu_2, \nu_3)$ can be seen as autonomous causal states. We assume that the robot knows the true mapping between the position of its hand $Pos$ and the angles of its joints. In other words, we assume that the robot knows its forward model and can extract the true position of its end effector in three-dimensional coordinates in the visual space:

$$g(x, \nu) = \mathbf{g}(x, \nu) = \begin{bmatrix} x \\ \nu \\ Pos \end{bmatrix}. \tag{2.11}$$

If we assume Newtonian dynamics with viscosity $\kappa$ and elasticity $k$, we obtain the subsequent equations of motion that describe the true (physical) evolution of the hidden states:

$$f(\tilde{x}, v) = \begin{bmatrix} \dot{x}_1 \\ \vdots \\ \dot{x}_7 \\ \dot{x}'_1 \\ \vdots \\ \dot{x}'_7 \end{bmatrix}, \qquad \dot{x}_i = x'_i, \qquad \dot{x}'_i = \frac{a_i - k_j x_i - \kappa_j x'_i}{m_j}, \tag{2.12}$$

with $j = 1$ (parameters $k_1$, $\kappa_1$, $m_1$) for the first three joints and $j = 2$ (parameters $k_2$, $\kappa_2$, $m_2$) for the remaining joints.

The behaviour of the robot arm during its reaching task is specified in terms of the robot's prior beliefs, which constitute its generative model. Here, these beliefs are based upon a basic but efficient feedback control. In other words, by specifying a particular generative model, we create a robot that thinks it will behave in a particular way: in this instance, it thinks it behaves as an efficient feedback controller, as follows. Within the joint configuration space (thanks to geometrical considerations), the prior control law provides a per-joint angular increment to be applied according to the position of the end effector, allowing its convergence towards the target position. In order to avoid the singular configurations of the PR2 arm, two actions $a$ and $b$ are superposed. The first one is a per-joint action: each joint tries to align the portion of arm it supports with the target position. The second action is distributed over the shoulder and the elbow, providing the flexion–extension primitive in order to reach or escape the singular configurations of the first action (e.g. stretched arm).

Let $T = (t_1, t_2, t_3)$ be the target position in the Euclidean space $W = \mathbb{R}^3$, $J_i = (j_{i1}, j_{i2}, j_{i3})$ the position of the joint $i$ in $W$, $w = T - J$ the vector describing the shortest path in $W$ to reach the target, $f_i = (T_i - J)/\|T_i - J\|$ the unit vector linking each joint to the arm's distal extremity, and $Pos_i = (Pos_{i1}, Pos_{i2}, Pos_{i3})$ the unit vector collinear to the rotation axis of the joint $i$. Let '$\cdot$' be the dot product in $W$ and '$\times$' the cross product. The feedback error to be regulated to zero by the first action of the control law for the joint $i$ is

$$e_i = (w \times f_i) \cdot Pos_i. \tag{2.13}$$

Classically, the first action is designed as a PI controller that ensures

$$\dot{e}_i = a_i = -p_p e_i + p_i \int_{t=0}^{t=t_0} e_i(t) \, dt, \tag{2.14}$$

where $t_0$ is the current time and $\{p_p, p_i\}$ are two positive settings used to adjust the convergence rate. To preclude wind-up phenomena, the absolute value of the integral term is bounded by $a_{\max} > 0$.

To operate as expected, the second action needs to predict the influence of the 'stretched arm' singularity. This is
Downloaded from http://rsif.royalsocietypublishing.org/ on September 28, 2016

This is achieved with two parameters $g_m$ and $g_c$. They are defined as the dot products

$$ g_m = \left|\, \frac{w}{\|w\|} \cdot \frac{f_2}{\|f_2\|} \,\right|, \qquad g_c = w \cdot f_2, \tag{2.15} $$

where the absolute value of $g_c$ is bounded by $g_{\max} > 0$. Then, the second action is defined as

$$ b_1 = b_3 = b_5 = b_6 = b_7 = 0, \quad b_2 = g_m\, g_c\, p_{p2} \;(\text{shoulder}), \quad b_4 = -g_m\, g_c\, p_{p4} \;(\text{elbow}), \tag{2.16} $$

where $p_{p2}$ and $p_{p4}$ are additional positive settings used to balance the contribution of the two joints (roughly: $p_{p2} = p_{p4}/2$). Finally, the controller provides the empirical prior

$$ F_i = a_i + b_i. \tag{2.17} $$

In practice, to obtain reasonable behaviour, the controller settings were chosen as: $p_p = 0.3$, $p_i = 0.01$, $a_{\max} = 0.001$, $p_{p2} = 2.25$, $p_{p4} = 5$, $g_{\max} = 0.1$.

Finally, we obtain the following generative model:

$$ f(\tilde{x},\nu) = \begin{bmatrix} \dot{x}_1 \\ \vdots \\ \dot{x}_7 \\ \dot{x}'_1 \\ \dot{x}'_2 \\ \dot{x}'_3 \\ \dot{x}'_4 \\ \dot{x}'_5 \\ \vdots \\ \dot{x}'_7 \end{bmatrix} = \begin{bmatrix} x'_1 \\ \vdots \\ x'_7 \\ (F_1 - k_1 x_1 - \kappa_1 x'_1)/m_1 \\ (F_2 - k_1 x_2 - \kappa_1 x'_2)/m_1 \\ (F_3 - k_1 x_3 - \kappa_1 x'_3)/m_1 \\ (F_4 - k_2 x_4 - \kappa_2 x'_4)/m_2 \\ (F_5 - k_2 x_5 - \kappa_2 x'_5)/m_2 \\ \vdots \\ (F_7 - k_2 x_7 - \kappa_2 x'_7)/m_2 \end{bmatrix} \tag{2.18} $$

Importantly, we see that the generative model has a very different form from the true equations of motion. In other words, the generative model has prior beliefs that render motor behaviour purposeful. It is this enriched generative model that produces goal-directed behaviour, which fulfils the robot's predictions and minimizes surprise. In this instance, the agent believes it is going to move its arm towards the target until it touches it. The distance between the end effector and the target is used as an error that drives the motion, as if the end effector is pulled to the target. The ensuing movement therefore resolves Bernstein's problem, rather than attempting the converse problem of pushing the end effector towards the target (which is an ill-posed problem). This formulation of motor control is related to the equilibrium point hypothesis [32] and the passive motion paradigm [33,34] and, crucially, dispenses with inverse models. Note that no solution of an optimal control problem is required here. This is because the causes of desired behaviour are specified explicitly by the generative or forward model (the arm is pulled to a target), and do not have to be inferred from desired consequences; see the Discussion for a comparison of active inference and optimal control schemes.

3. Results

We tested the model in four scenarios. In all the scenarios, the robot arm started from a fixed starting position and had to reach a desired position in three dimensions with its 7-DoF arm. We simulated various starting and desired positions, but in this illustration we focus on the sample problem illustrated in figure 2, where the start position is at the bottom-centre and the desired position is the green dot.

The four panels of figure 2 exemplify the robot reaching under the four scenarios that we considered. In the first scenario (figure 2a), there was no noise on proprioception and vision. In the second, third and fourth scenarios, proprioception (figure 2b), vision (figure 2c) or both (figure 2d) were noisy, respectively. We used noise with a log precision of 4. As illustrated by the figures, in the absence of noise (first scenario), the reaching trajectory is flawless and free of static error (figure 2a). Trajectories become less accurate when either proprioception (second scenario) or vision (third scenario) is noisy, but the arm still reaches the desired target (figure 2b,c). However, when both proprioception and vision are noisy, the arm becomes largely unable to reach the target (figure 2d).

A more direct comparison between the four scenarios is possible if one considers the average of 20 simulations from a common starting point (figure 3). Here, the four colours correspond to the four scenarios: the first scenario (no noise) is blue; the second (noisy proprioception) is black; the third (noisy vision) is red; and the fourth (noisy proprioception and vision) is yellow. To compare the trajectories under the four scenarios quantitatively, we computed the sum of Euclidean distances between the position of the end effector at each iteration of the algorithm under the best trajectory (corresponding to scenario 1) and the other trajectories. We obtained a difference between the normal and noisy proprioception scenarios of 0.2796; between the normal and noisy vision scenarios of 0.143; and between the normal and noisy proprioception and vision scenarios of 1.2169.

These differences can be better appreciated if one considers the internal dynamics of the system's hidden states (i.e. the angles of the arm) during the different conditions, as shown in figures 4–7 for the simulations without noise, with noisy proprioception, with noisy vision, and with noisy proprioception and vision, respectively. The hidden states are inferred while the agent optimizes its expectations as described above (see equation (2.12)). In turn, action a(t) is selected based on the hidden states (technically, action is part of the generative process but not the generative model).

The four panels of figures 4–7 show the conditional predictions and prediction errors during the task. In each figure, the top right panel shows the hidden states, and the grey areas correspond to 90% Bayesian confidence intervals. The figures show that adding noise to proprioception (figure 5) makes the confidence interval much larger compared with the standard case with no noise (figure 4). Confidence intervals further increase when both proprioception and vision are noisy (figure 7). The top left panel shows the conditional predictions of sensory signals (coloured lines) and sensory prediction errors (red). These are errors on the proprioceptive and visual input, and are small in relation to the predictions. The bottom left panel shows the true expectation (dotted line) and the conditional expectation (solid line) about hidden causes. The bottom right panel shows actions (coloured lines) and true causes (dotted lines). In the noisy proprioception scenario (figure 5), one of the hidden states (top right panel) and one action (bottom right panel) rises with time.
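The quantitative comparison above sums, over iterations, the Euclidean distance between the end-effector position under the reference (no-noise) trajectory and under each noisy trajectory. A minimal sketch of such a metric (`trajectory_divergence` is a hypothetical helper, not code from the thesis):

```python
import numpy as np

def trajectory_divergence(reference, other):
    """Sum of Euclidean distances between the end-effector positions of two
    trajectories at matching iterations (both given as N x 3 arrays)."""
    reference = np.asarray(reference, dtype=float)
    other = np.asarray(other, dtype=float)
    # one distance per iteration, summed over the whole trajectory
    return float(np.linalg.norm(reference - other, axis=1).sum())
```

Identical trajectories score zero, and the score grows with any per-iteration deviation, which is why the noisiest scenario (proprioception and vision degraded) yields the largest value.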
(Figure 2: four 3-D reaching plots, panels (a)–(d); axes x (m), y (m), z (m).)
Figure 2. Reaching trajectories in three dimensions from a start to a goal location under four scenarios. (a) Scenario 1: reaching in three dimensions with 7 DoF.
(b) Scenario 2: reaching in three dimensions with noisy proprioception. (c) Scenario 3: reaching in three dimensions with noisy vision. (d) Scenario 4: reaching in
three dimensions with noisy proprioception and vision. The blue trajectory is the mean of 20 trajectories shown in grey.
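In scenarios 2–4, the sensory noise is specified by its log precision (4 in the simulations above). Assuming the usual convention that precision is the inverse variance, so that a log precision $p$ gives a standard deviation of $e^{-p/2}$, such noise could be generated as follows (an illustrative sketch, not the actual implementation):

```python
import numpy as np

def noise_with_log_precision(log_precision, size, seed=None):
    """Gaussian noise whose precision (inverse variance) is exp(log_precision)."""
    rng = np.random.default_rng(seed)
    sigma = np.exp(-log_precision / 2.0)  # std = exp(-p/2) when precision = exp(p)
    return rng.normal(0.0, sigma, size)
```

For a log precision of 4 this gives a standard deviation of about 0.135, i.e. a fairly aggressive perturbation of joint angles and visual positions on the scale of the workspace.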

This corresponds to an internal degree of freedom that does not have any effect on the trajectory. The figures show some slight oscillations after 20 iterations, which are due to the fact that the arm is moving in the proximity of the target.

Figure 3. Reaching trajectories from a common starting point. In blue, no noise (scenario 1). In black, noise on proprioception (scenario 2). In red, noise on vision (scenario 3). In yellow, noise on vision and proprioception (scenario 4). The trajectories are the mean of 20 simulations from the same start and goal (green) locations.

4. Discussion

Our case study shows that the active inference scheme can control the seven-DoF arm of a simulated PR2 robot, focusing here on the task of reaching a desired goal location from a (predefined) start location.

Our results illustrate that action control is accurate with intact proprioception and vision, and only partly impaired if noise is added to either of these modalities. The comparison of the trajectories of figure 2b,c shows that adding noise to proprioception is more problematic. The analysis of the dynamics of the internal system variables (figures 4–7) helps us understand the above results, highlighting the differential roles of proprioception and vision in this scheme. In the noisy proprioception scenario (figure 5), hidden states are significantly more uncertain compared with the reference case with no noise (figure 4). Yet, despite the uncertainty about joint angles, the robot can still rely on (intact) vision to infer where the arm is in space, and thus it is ultimately able to reach the target, although it follows largely suboptimal trajectories (in relation to its prior beliefs, or preferences). Multimodal integration or compensation is impossible if both vision and proprioception are sufficiently degraded (figure 7). In the noisy vision scenario (figure 6), noise has some effect on the inferred causes but only affects hidden states (and ultimately action selection) to a minor extent.
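The compensation between modalities described above is, in essence, precision-weighted multimodal integration: each Gaussian cue is weighted by its precision, so a noisier (less precise) modality contributes less to the posterior estimate. A minimal sketch of this principle (illustrative only; the actual scheme integrates evidence through generalized filtering):

```python
def fuse_gaussian_cues(mu_a, prec_a, mu_b, prec_b):
    """Precision-weighted fusion of two Gaussian estimates of the same quantity.
    Returns the posterior mean and posterior precision."""
    prec_post = prec_a + prec_b
    mu_post = (prec_a * mu_a + prec_b * mu_b) / prec_post
    return mu_post, prec_post
```

If vision is four times more precise than proprioception, the fused estimate sits four times closer to the visual cue, which is why intact vision can rescue reaching when proprioception is degraded but not when both are noisy.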
(Figure 4 panels: (a) prediction and error; (b) hidden states; (c) hidden causes; (d) perturbation and action; horizontal axes: iterations.)

Figure 4. Dynamics of the model internal variables in the normal case. The conditional predictions and expectations are shown as functions of the iterations. (a) The panel shows the conditional predictions (coloured lines) and the corresponding prediction errors (red lines) based upon the expected states on the upper right. (b) The coloured lines represent the expected hidden states causing sensory predictions. These can be thought of as displacements in the articulatory state space. In this panel and throughout, the grey areas denote 90% Bayesian confidence intervals. (c) The dotted lines represent the true expectation and the solid lines show the conditional expectation of the hidden cause. (d) Action (solid line) and true causes (dotted line).
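The grey 90% confidence bands in figures 4–7 can be read as Gaussian credible intervals around the conditional expectations: mean ± z·σ, with z ≈ 1.645 covering 90% of the probability mass. A generic illustration, assuming a Gaussian posterior with known variance (not code from the thesis):

```python
import math

def gaussian_ci90(mean, variance):
    """90% credible interval of a Gaussian posterior: mean +/- 1.6449 * std."""
    half_width = 1.6448536269514722 * math.sqrt(variance)
    return mean - half_width, mean + half_width
```

Because the half-width scales with the posterior standard deviation, degrading proprioception (and hence the precision of the inferred hidden states) directly widens these bands, as seen in figures 5 and 7.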

This pattern of results shows that control is quintessentially multimodal and based on both vision and proprioception, and adding noise to either modality can be (partially) compensated for by appealing to the other, more precise dimension. However, proprioception and vision play differential roles in this scheme. Proprioception has a direct effect on hidden states and action selection; this is because action dynamics depend on reflex arcs that suppress proprioceptive (not visual) prediction errors (see §2.4). If the robot has poor proprioceptive information, then it can use multimodal integration and the visual modality to compensate and restore efficient control. However, if both modalities are degraded with noise, then multimodal integration becomes imprecise, and the robot cannot reduce error accurately—at least in the simplified control scheme assumed here, which (on purpose) does not include any additional corrective mechanism. Adding noise to vision is less problematic, given that in the (reaching) task considered here, it plays a more ancillary role. Indeed, our reaching task does not pose strong demands on the estimation of hidden causes for accurate control; the situation may be different if one needs, for example, to estimate the pose of a to-be-grasped object.

The above-mentioned results are consistent with a large body of studies showing the importance of proprioception for control tasks. Patients with impaired proprioception can still execute motor tasks such as reaching, grasping and locomotion, but their performance is suboptimal [35–38]. In principle, the scheme proposed here may be used to explain human data under impaired proprioception [4] or other deficits in motor control—or even to help design rehabilitation therapies. In this perspective, an interesting issue that we have not addressed pertains to the attenuation of proprioceptive prediction errors during movement. Heuristically, this sensory attenuation is necessary to allow the prior beliefs of the generative model to supervene over the sensory evidence that movement has not yet occurred (or, in other words, to prevent internal states from encoding the fact that there is no movement). This speaks to a dynamic gain control that mediates the attenuation of the precision of prediction errors during movement. In the example shown above, we simply reduced the precision of ascending proprioceptive prediction
(Figure 5 panels: (a) prediction and error; (b) hidden states; (c) hidden causes; (d) perturbation and action; horizontal axes: iterations.)

Figure 5. (a–d) Dynamics of the model internal variables in the noisy proprioception case. The layout of the figure is the same as figure 4. Please see the previous figure legends for details.
errors to enable movement. Had we increased their precision, the ensuing failure of sensory attenuation would have subverted movement; perhaps in a similar way to the poverty of movements seen in Parkinson's disease—a disease that degrades motor performance profoundly [39]. This aspect of precision or gain control suggests that being able to implement active inference in robots will allow us to perform simulated psychophysical experiments to illustrate sensory attenuation and its impairments. Furthermore, it suggests that a robotic model of Parkinson's disease is within reach, providing an interesting opportunity for simulating pathophysiology.

Clearly, some of our choices when specifying the generative model are heuristic—or appeal to established notions. For example, adding a derivative term to equation (2.14) could change the dynamics in an interesting way. In general, the framework shown above accommodates questions about alternative models and dynamics through Bayesian model comparison. In principle, we have an objective function (variational free energy) that scores the quality of any generative model entertained by a robot—in relation to its embodied exchange with the environment. This means we could change the generative model and assess the quality of the ensuing behaviour using variational free energy—and select the best generative model in exactly the same way that people characterize experimental data by comparing the evidence for different models in Bayesian model comparison. We hope to explore this in future work.

We next hope to port the scheme to a real robot. This will be particularly interesting, because there are several facets of active inference that are more easily demonstrated in a real-world artefact. These aspects include robustness to exogenous perturbations. For example, the movement trajectory should gracefully recover from any exogenous forces applied to the arm during movement. Furthermore, theoretically, the active inference scheme is also robust to differences between the true motor plant and the various kinematic constants in the generative model. This robustness follows from the fact that the movement is driven by (fictive) forces whose fixed points do not change with exogenous perturbations—or with many parameters of the generative model (or process). Another interesting advantage of real-world implementations will be the opportunity to examine robustness to sensorimotor delays. Although not necessarily a problem from a purely robotics perspective, biological robots suffer non-trivial delays in the signalling of ascending sensory signals and descending motor predictions. In principle, these delays can be absorbed into the generative model—as has been illustrated in the context of oculomotor control [40]. At present, these
(Figure 6 panels: (a) prediction and error; (b) hidden states; (c) hidden causes; (d) perturbation and action; horizontal axes: iterations.)

Figure 6. (a–d) Dynamics of the model internal variables in the noisy vision case. The layout of the figure is the same as figure 4. Please see the previous figure legends for details.
proposals for how the brain copes with sensorimotor delays in oculomotor tracking remain hypothetical. It would be extremely useful to see if they could be tested in a robotics setting.

As noted above, active inference shares many similarities with the passive movement paradigm (PMP, [33,34]). Although, strictly speaking, active inference is a corollary of the free-energy principle, it inherits the philosophy of the PMP in the following sense. Active inference is equipped with a generative model that maps from causes to consequences. In the setting of motor control, the causes are forces that have some desired fixed point or orbit. It is then a simple matter to predict the sensory consequences of those forces—as sensed by proprioception or robotic sensors. These sensory predictions can then be realized through open-loop control (e.g. peripheral servos or reflex arcs), thereby realizing the desired fixed point (cf. the equilibrium point hypothesis [32]). However, unlike the equilibrium point hypothesis, active inference is not purely open loop. This is because its motor predictions are informed by deep generative models that are sensitive to input from all modalities (including proprioception). The fact that action realizes the (sensory) consequences of (prior) causes explains why there is no need for an inverse model.

Optimal motor control formulations [15,16] are fundamentally different. Briefly, optimal control operates by minimizing some cost function in order to compute motor commands for a robot performing a particular motor task. Optimal control theory requires a mechanism for state estimation as well as two internal models: an inverse and a forward model. This scheme also assumes that the appropriate optimality equation can be solved [41]. In contrast, active inference uses prior beliefs about the movement (in an extrinsic frame of reference) instead of optimal control signals for movements (in an intrinsic frame of reference). In active inference, there is no inverse model or cost function, and the resulting trajectories are Bayes optimal. This contrasts with optimal control, which calls on the inverse model to finesse problems incurred by sensorimotor noise and delays. Inverse models are not required in active inference, because the robot's generative (or forward) model is inverted during the inference. Active inference also dispenses with cost functions, as these are replaced by the robot's (prior) beliefs (of note, there is a general duality between control and inference [15,16]). In brief, replacing the cost function with prior beliefs means that minimizing cost corresponds to maximizing the marginal likelihood of a generative model [42–44]. A formal correspondence between cost functions and prior beliefs can be established with the complete class theorem [45,46], according to which there is at least one prior belief and cost function that can produce a Bayes-optimal motor behaviour. In sum, optimal control formulations start with a desired endpoint (consequence) and
(Figure 7 panels: (a) prediction and error; (b) hidden states; (c) hidden causes; (d) perturbation and action; horizontal axes: iterations.)

Figure 7. (a–d) Dynamics of the model internal variables in the 'all noisy' case. The layout of the figure is the same as figure 4. Please see the previous figure legends for details.
try to reverse-engineer the forces (causes) that produce the desired consequences. It is this construction that poses a difficult inverse problem, with solutions that are not generally robust—and are often problematic in robot control. Active inference finesses this problem by starting with the causes of movement, as opposed to the consequences.

Accordingly, one can see the solutions offered under optimal control as special cases of the solutions available under an (active) inferential scheme. This is because some policies cannot be specified using cost functions but can be described using priors; specifically, this is the case for solenoidal movements, whose cost is equal for every part of the trajectory [47]. This comes from variational calculus, which says that a trajectory or a policy has several components: a curl-free component that changes value and a divergence-free component that does not change value. The divergence-free motion can only be specified by a prior and not by a cost function. Discussing the relative benefits of control schemes with or without cost functions and inverse models is beyond the scope of this article. Here, it suffices to say that inverse models are generally hard to learn for robots, and cost functions sometimes need to be defined in an ad hoc manner for robot control tasks. By eluding these constraints, active inference may offer a promising alternative to optimal control schemes. For a more detailed discussion of the links between optimal control and active inference, see [47].

Although active inference resolves many problems that attend optimal control schemes, there is no free lunch. In active inference, all the heavy lifting is done by the generative model—and in particular, by the priors that define desired set-points or orbits. The basic idea is to induce these attractors by specifying appropriate equations of motion within the generative model of the robot. This means that the art of generating realistic and purposeful behaviour reduces to creating equations of motion that have the desired attractors. These can be simple fixed-point attractors, as in the example above. They could also be much more complicated, producing quasi-periodic motion (as in walking) or fluent sequences of movements specified by heteroclinic cycles. All the more interesting examples in the theoretical literature to date rest upon some form of itinerant dynamics inherent in the generative model that sometimes has deep structure. A nice example of this is the handwriting example in [48], which used Lotka–Volterra equations to specify a sequence of saddle points—producing a series of movements. Simpler examples could use both attracting and repelling fixed points that correspond to contact points and collision points, respectively, to address more practical issues in robotics. Irrespective of the particular repertoire of attractors implicit in the generative model, the hierarchical aspect of the generative models that underlie active inference enables the composition of movements, sequences
of movements and sequences of sequences [49–51]. In other words, provided one can write down (or learn) a deep generative model with itinerant dynamics, there is a possibility of simulating realistic movements that inherit deep temporal structure and context sensitivity.

In conclusion, in this article we have presented a proof-of-concept implementation of robot control using active inference, a biologically motivated scheme that is gaining prominence in computational and systems neuroscience. The results discussed here demonstrate the feasibility of the scheme; having said this, further work is necessary to fully demonstrate how this scheme works in more challenging domains, and whether it has advantages (from both technological and biological viewpoints) over alternative control schemes. Future work will address an implementation of the above scheme on a real robot with the same degrees of freedom as the PR2. Other predictive models could also be developed; the generative model illustrated above is very simple and does not take advantage of the internal degrees of freedom. A key generalization will be integrating planning mechanisms that may allow, for example, the robot to proactively avoid obstacles or collisions during movement—or, more generally, to consider future (predicted) and not only currently sensed contingencies [17,52–56]. Planning mechanisms have been described under the active inference scheme and can solve challenging problems such as the mountain–car problem [5], and can thus be seamlessly integrated into the model presented here—speaking to the scalability of the active inference scheme. Finally, one reason for using a biologically realistic model such as active inference is that it may be possible to directly map the internal dynamics generated by the robot simulator (e.g. of hidden states) to brain signals (e.g. EEG signals reflecting predictions and prediction errors) generated during equivalent action planning or performance.

Data accessibility. All data underlying the findings described in the manuscript can be downloaded from https://github.com/LPioL/activeinference_ROS.

Authors' contribution. L.P.L., A.N., K.F. and G.P. conceived the study and wrote the manuscript. L.P.L. and A.N. performed the simulations. All authors gave final approval for publication.

Competing interests. We declare we have no competing interests.

Funding. This work was supported by the French–Italian University (C2-21). K.F. is supported by the Wellcome Trust (ref. no. 088130/Z/09/Z).

References
1. Friston K. 2010 The free-energy principle: a unified brain theory? Nat. Rev. Neurosci. 11, 127–138. (doi:10.1038/nrn2787)
2. Friston K, Kilner J, Harrison L. 2006 A free energy principle for the brain. J. Physiol. Paris 100, 70–87. (doi:10.1016/j.jphysparis.2006.10.001)
3. Friston K, Kiebel S. 2009 Predictive coding under the free-energy principle. Phil. Trans. R. Soc. B 364, 1211–1221. (doi:10.1098/rstb.2008.0300)
4. Friston K et al. 2012 Dopamine, affordance and active inference. PLoS Comput. Biol. 8, e1002327. (doi:10.1371/journal.pcbi.1002327)
5. Friston KJ, Daunizeau J, Kilner J, Kiebel SJ. 2010 Action and behavior: a free-energy formulation. Biol. Cybern. 102, 227–260. (doi:10.1007/s00422-010-0364-z)
6. Friston K, Schwartenbeck P, FitzGerald T, Moutoussis M, Behrens T, Dolan RJ. 2013 The anatomy of choice: active inference and agency. Front. Hum. Neurosci. 7, 598. (doi:10.3389/fnhum.2013.00598)
7. FitzGerald T, Schwartenbeck P, Moutoussis M, Dolan RJ, Friston K. 2015 Active inference, evidence accumulation, and the urn task. Neural Comput. 27, 306–328. (doi:10.1162/NECO_a_00699)
8. Friston KJ, Daunizeau J, Kiebel SJ. 2009 Reinforcement learning or active inference? PLoS ONE 4, e6421. (doi:10.1371/journal.pone.0006421)
9. Friston K, Rigoli F, Ognibene D, Mathys C, FitzGerald T, Pezzulo G. 2015 Active inference and epistemic value. Cogn. Neurosci. 6, 187–214. (doi:10.1080/17588928.2015.1020053)
10. Friston K, FitzGerald T, Rigoli F, Schwartenbeck P, O'Doherty J, Pezzulo G. 2016 Active inference and learning. Neurosci. Biobehav. Rev. 68, 862–879. (doi:10.1016/j.neubiorev.2016.06.022)
11. Pezzulo G, Cartoni E, Rigoli F, Pio-Lopez L, Friston K. 2016 Active inference, epistemic value, and vicarious trial and error. Learn. Mem. 23, 322–338. (doi:10.1101/lm.041780.116)
12. Pezzulo G, Rigoli F, Friston K. 2015 Active inference, homeostatic regulation and adaptive behavioural control. Prog. Neurobiol. 134, 17–35. (doi:10.1016/j.pneurobio.2015.09.001)
13. Dayan P, Hinton GE, Neal RM, Zemel RS. 1995 The Helmholtz machine. Neural Comput. 7, 889–904. (doi:10.1162/neco.1995.7.5.889)
14. Knill DC, Pouget A. 2004 The Bayesian brain: the role of uncertainty in neural coding and computation. Trends Neurosci. 27, 712–719. (doi:10.1016/j.tins.2004.10.007)
15. Todorov E, Jordan MI. 2002 Optimal feedback control as a theory of motor coordination. Nat. Neurosci. 5, 1226–1235. (doi:10.1038/nn963)
16. Todorov E. 2008 General duality between optimal control and estimation. In 47th IEEE Conf. on Decision and Control (CDC 2008), IEEE, pp. 4286–4292.
17. Botvinick M, Toussaint M. 2012 Planning as inference. Trends Cogn. Sci. 16, 485–488. (doi:10.1016/j.tics.2012.08.006)
18. Donnarumma F, Maisto D, Pezzulo G. 2016 Problem solving as probabilistic inference with subgoaling: explaining human successes and pitfalls in the tower of Hanoi. PLoS Comput. Biol. 12, e1004864. (doi:10.1371/journal.pcbi.1004864)
19. Pezzulo G, Rigoli F. 2011 The value of foresight: how prospection affects decision-making. Front. Neurosci. 5, 79. (doi:10.3389/fnins.2011.00079)
20. Pezzulo G, Rigoli F, Chersi F. 2013 The mixed instrumental controller: using value of information to combine habitual choice and mental simulation. Front. Psychol. 4, 92. (doi:10.3389/fpsyg.2013.00092)
21. Todorov E. 2006 Linearly-solvable Markov decision problems. Adv. Neural Inf. Process. Syst. 19, 1369–1376.
22. Maisto D, Donnarumma F, Pezzulo G. 2015 Divide et impera: subgoaling reduces the complexity of probabilistic inference and problem solving. J. R. Soc. Interface 12, 20141335. (doi:10.1098/rsif.2014.1335)
23. Friston K, Adams R, Perrinet L, Breakspear M. 2012 Perceptions as hypotheses: saccades as experiments. Front. Psychol. 3, 151.
24. Friston K, Stephan K, Li B, Daunizeau J. 2010 Generalised filtering. Math. Probl. Eng. 2010, 621670. (doi:10.1155/2010/621670)
25. Ashby WR. 1947 Principles of the self-organizing dynamic system. J. Gen. Psychol. 37, 125–128. (doi:10.1080/00221309.1947.9918144)
26. Feynman RP. 1998 Statistical mechanics: a set of lectures (Advanced Book Classics). Boulder, CO: Westview Press.
27. Hinton GE, Van Camp D. 1993 Keeping the neural networks simple by minimizing the description length of the weights. In Proc. Sixth Annual Conf. on Computational Learning Theory, ACM, pp. 5–13.
28. Zeki S, Shipp S. 1988 The functional logic of cortical connections. Nature 335, 311–317. (doi:10.1038/335311a0)
29. Quigley M et al. 2009 ROS: an open-source robot operating system. In Proc. Open-Source Software Workshop, Int. Conf. on Robotics and Automation, Kobe, Japan, May, vol. 3, p. 5.
30. Körding KP, Wolpert DM. 2004 Bayesian integration in sensorimotor learning. Nature 427, 244–247. (doi:10.1038/nature02169)
31. Diedrichsen J, Verstynen T, Hon A, Zhang Y, Ivry RB. 2007 Illusions of force perception: the role of sensori-motor predictions, visual information, and motor errors. J. Neurophysiol. 97, 3305–3313. (doi:10.1152/jn.01076.2006)
32. Feldman AG, Levin MF. 2009 The equilibrium-point hypothesis—past, present and future. In Progress in Motor Control, pp. 699–726. Berlin, Germany: Springer.
33. Mussa-Ivaldi F. 1988 Do neurons in the motor cortex encode movement direction? An alternative hypothesis. Neurosci. Lett. 91, 106–111. (doi:10.1016/0304-3940(88)90257-1)
34. Mohan V, Morasso P. 2011 Passive motion paradigm: an alternative to optimal control. Front. Neurorob. 5, 1–28. (doi:10.3389/fnbot.2011.00004)
35. Butler AJ, Fink GR, Dohle C, Wunderlich G, Tellmann L, Seitz RJ, Zilles K, Freund H-J. 2004 Neural
39. Konczak J, Corcos DM, Horak F, Poizner H, Shapiro M, Tuite P, Volkmann J, Maschke M. 2009 Proprioception and motor control in Parkinson's disease. J. Motor Behav. 41, 543–552. (doi:10.3200/35-09-002)
40. Perrinet LU, Adams RA, Friston KJ. 2014 Active inference, eye movements and oculomotor delays. Biol. Cybern. 108, 777–801. (doi:10.1007/s00422-014-0620-8)
41. Bellman R. 1952 On the theory of dynamic programming. Proc. Natl Acad. Sci. USA 38, 716–719. (doi:10.1073/pnas.38.8.716)
42. Cooper GF. 2013 A method for using belief networks as influence diagrams. (https://arxiv.org/abs/1304.2346)
43. Shachter RD. 1988 Probabilistic inference and influence diagrams. Oper. Res. 36, 589–604. (doi:10.1287/opre.36.4.589)
… Comput. Biol. 5, e1000464. (doi:10.1371/journal.pcbi.1000464)
50. Pezzulo G. 2012 An active inference view of cognitive control. Front. Psychol. 3, 478. (doi:10.3389/fpsyg.2012.00478)
51. Pezzulo G, Donnarumma F, Iodice P, Prevete R, Dindo H. 2015 The role of synergies within generative models of action execution and recognition: a computational perspective. Comment on "Grasping synergies: a motor-control approach to the mirror neuron mechanism" by A. D'Ausilio et al. Phys. Life Rev. 12, 114–117. (doi:10.1016/j.plrev.2015.01.021)
52. Lepora NF, Pezzulo G. 2015 Embodied choice: how action influences perceptual decision making. PLoS Comput. Biol. 11, e1004110. (doi:10.1371/journal.pcbi.1004110)
53. Pezzulo G, van der Meer MA, Lansink CS, Pennartz
mechanisms underlying reaching for remembered 44. Pearl J. 2014 Probabilistic reasoning in intelligent CMA. 2014 Internally generated sequences in
targets cued kinesthetically or visually in left or systems: networks of plausible inference. learning and executing goal-directed behavior.
right hemispace. Hum. Brain Mapp. 21, 165 –177. San Francisco, CA: Morgan Kaufmann. Trends Cogn. Sci. 18, 647–657. (doi:10.1016/j.tics.
(doi:10.1002/hbm.20001) 45. Brown LD. 1981 A complete class theorem for 2014.06.011)
36. Diener H, Dichgans J, Guschlbauer B, Mau H. 1984 statistical problems with finite sample spaces. Ann. 54. Pezzulo G, Cisek P. 2016 Navigating the affordance
The significance of proprioception on postural Stat. 9, 1289 –1300. (doi:10.1214/aos/1176345645) landscape: feedback control as a process model
stabilization as assessed by ischemia. Brain 46. Robert CP. 1992 L’analyse statistique bayésienne. of behavior and cognition. Trends Cogn. Sci. 20,
Res. 296, 103–109. (doi:10.1016/0006- Paris, France: Economica. 414–424. (doi:10.1016/j.tics.2016.03.013)
8993(84)90515-8) 47. Friston K. 2011 What is optimal about motor 55. Stoianov I, Genovesio A, Pezzulo G. 2016 Prefrontal
37. Dietz V. 2002 Proprioception and locomotor control? Neuron 72, 488 –498. (doi:10.1016/j. goal-codes emerge as latent states in probabilistic
disorders. Nat. Rev. Neurosci. 3, 781–790. (doi:10. neuron.2011.10.018) value learning. J. Cogn. Neurosci. 28, 140 –157.
1038/nrn939) 48. Friston K, Mattout J, Kilner J. 2011 Action (doi:10.1162/jocn_a_00886)
38. Sainburg RL, Ghilardi MF, Poizner H, Ghez C. 1995 understanding and active inference. Biol. Cybern. 56. Verschure P, Pennartz C, Pezzulo G. 2014 The why,
Control of limb dynamics in normal subjects and 104, 137 –160. (doi:10.1007/s00422-011-0424-z) what, where, when and how of goal directed choice:
patients without proprioception. J. Neurophysiol. 73, 49. Kiebel SJ, Von Kriegstein K, Daunizeau J, Friston KJ. neuronal and computational principles. Phil. Trans.
820–835. 2009 Recognizing sequences of sequences. PLoS R. Soc. B 369, 20130483. (doi:10.1098/rstb.2013.0483)
5.3. Other perspectives

5.3 Other perspectives

5.3.1 Online active inference for robotics

A final concern about this work is the speed of the scheme, and therefore the technical feasibility of the approach. In our framework, the simulation was rather slow: around 30 minutes for a single run of 50 iterations. The code was written in Matlab® and the whole approach was tested in simulated robotics, which is not always as fast as a real robot. To our knowledge, no attempt has been made to speed up these computations.

Several additions to this approach could enhance the speed of the computations. The code could be rewritten in another language such as C/C++, or the whole approach could be parallelized and implemented on GPUs, since it is based on (dynamic) expectation-maximization [202]. This will be a necessary requirement for free-energy-minimizing robots acting online with humans.

Online active inference would be interesting given the need for online learning and online algorithms in robotic control, and the large number of phenomena in perception, action and cognition already implemented within this general framework, such as reaching, planning and recognition, to name a few.

5.3.2 Extension of the framework for navigation

The approach presented above is very similar to the problem of navigation, where an agent must reach a pre-defined target while respecting various constraints. Only the generative model would change; the general framework would remain the same. The prediction would define a target to reach in the incoming image frames, and the scheme would fulfil that prediction by moving the autonomous agent. A new target would then be defined by an artificial vision algorithm and, theoretically, this loop would yield a navigating agent minimizing free-energy.

Planning mechanisms have been described under the active inference scheme and can solve challenging problems such as the mountain-car problem [80]. They could be integrated for manipulation but also for navigation, leading to obstacle avoidance via the minimization of free-energy. To our knowledge, this approach has never been applied to navigation robotics.
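The core of such a scheme can be caricatured in a few lines: the agent holds a visual prediction (the target position extracted from the frame) and acts to suppress the prediction error. The sketch below is purely illustrative, assuming a 2-D point agent and a fixed gain; it is not the active inference scheme itself, only the prediction-fulfilling loop it rests on.

```python
def step_toward_prediction(position, target, gain=0.2):
    """One action step: move so as to reduce the visual prediction error."""
    error = [t - p for t, p in zip(target, position)]
    return [p + gain * e for p, e in zip(position, error)]

# the predicted target would normally be supplied, frame by frame, by an
# artificial vision algorithm; here it is simply fixed (an assumption)
pos = [0.0, 0.0]
for _ in range(50):
    pos = step_toward_prediction(pos, [3.0, 4.0])
```

In a full implementation, redefining the target each time it is reached would chain such episodes into continuous navigation.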

6 General conclusion and perspectives

Whoever undertakes to set himself up as a judge of Truth and Knowledge
is shipwrecked by the laughter of the gods.

Albert Einstein

6.1 Summary

In this thesis we presented three articles developed around the notions of predictive coding, the Bayesian brain hypothesis and Bayesian filtering. The objective was to develop new methods for the control of robotic systems capable of managing uncertainty without using an inverse model, which can be a limiting factor for real-time robotic applications. We also used the (simulated) robot as a platform to explore sensorimotor integration.

The first article was about dual Bayesian filtering for reaching and grasping. We developed a new approach capable of controlling a high number of degrees of freedom. This neuro-inspired approach shows interesting results for the control of manipulation robots. Without resorting to inverse kinematics (which may be excessively time-consuming or lead to rough approximations), we showed that it is possible to control high-dimensional systems by simulating and predicting the outcome of local actions (forward model only), as long as the problem complexity is broken down into both the visual and motor spaces: we simulate the trajectory first, then follow it with motor actions. We relied on a bio-plausible, probabilistic method for generating reaching movements in complex settings. Specifically, we applied (approximate) Bayesian filtering in the visual and motor spaces. Visual filtering defines an initial rough trajectory in the operational space (avoiding obstacles), which is then refined by motor filtering, allowing direct control in the joint space while respecting joint limits. The method was validated in simulation on a set of scenarios, with one or several targets (e.g. fingers on an object for grasping) to be reached with one or several arms/hands. With strong spatial constraints (e.g. obstacles), it succeeds in finding a trajectory where inverse kinematics methods fail. For psychology, relying on the visual space seems promising to reduce the complexity of movement generation in the motor space; for robotics, we presented an online method for controlling a high number of DoFs.
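The decomposition into visual and motor spaces can be illustrated with a toy version of the idea (not the dual filtering scheme itself): a rough straight-line trajectory is first laid out in the operational space, then followed using only a forward kinematic model, by locally sampling joint perturbations and keeping the one whose predicted effector position best matches the current waypoint. The 2-link planar arm, step sizes and target below are illustrative assumptions.

```python
import math

def forward(joints, lengths=(1.0, 1.0)):
    """Forward model of a planar 2-link arm: joint angles -> effector (x, y)."""
    a1, a2 = joints
    l1, l2 = lengths
    return (l1 * math.cos(a1) + l2 * math.cos(a1 + a2),
            l1 * math.sin(a1) + l2 * math.sin(a1 + a2))

def follow(waypoints, joints=(0.1, 0.1), delta=0.05, tol=0.05):
    """Track a visual-space trajectory using only the forward model: at each
    step, keep the small joint perturbation whose predicted effector
    position is closest to the current waypoint (no inverse kinematics)."""
    for w in waypoints:
        for _ in range(200):
            joints = min(((joints[0] + d1, joints[1] + d2)
                          for d1 in (-delta, 0.0, delta)
                          for d2 in (-delta, 0.0, delta)),
                         key=lambda j: math.dist(forward(j), w))
            if math.dist(forward(joints), w) < tol:
                break
    return joints

# rough straight-line trajectory laid out in the visual (operational) space
start, target = forward((0.1, 0.1)), (0.5, 1.2)
waypoints = [(start[0] + (target[0] - start[0]) * k / 10,
              start[1] + (target[1] - start[1]) * k / 10) for k in range(1, 11)]
final_joints = follow(waypoints)
```

The actual method replaces the greedy local search with particle filtering in both spaces, which is what allows obstacles and joint limits to be handled.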

The second article extended the approach to navigation. In combination with a new saliency algorithm, we presented a new neuro-inspired approach producing a visual memory for navigation and autonomous mobile robots. Bayesian filtering over time and features, and the encoding of frames by feature assemblies, compensate for the lack of precision of the individual features. The sparsity of the map, as well as its ability to efficiently represent an environment, is demonstrated on webcam input from an outdoor urban dataset. Loop-closure detection, the ability to recognize previously visited places, is finally validated using GPS data. At the neuroscientific level, the idea is to develop a model close to the functioning of the hippocampus, in which place cells are sensitive to a set of features integrated over time by Bayesian filtering. The method also relies on the concept of population coding, where each feature is shared by many frames. This can be parallelized using a neural assembly code and an appropriate similarity measure, leading to a reduction in the number of features stored in the visual memory without losing much precision. For robotics, the interest of the approach is also that it could be parallelized, an advantage for autonomous robots evolving in human environments.
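The underlying idea (frames encoded as feature assemblies, compared with a similarity measure) can be caricatured with sets of feature identifiers and a Jaccard similarity. The actual system uses Bayesian filtering over richer features; the function names and the 0.5 threshold below are illustrative assumptions.

```python
def jaccard(a, b):
    """Similarity between two frames encoded as sets of feature identifiers."""
    return len(a & b) / len(a | b)

def detect_loop_closure(memory, frame, threshold=0.5):
    """Return the index of the most similar stored frame, or None if no
    stored frame passes the similarity threshold (no loop closure)."""
    if not memory:
        return None
    score, index = max((jaccard(f, frame), i) for i, f in enumerate(memory))
    return index if score >= threshold else None

memory = [{1, 2, 3}, {4, 5, 6}]   # frames already stored in the visual memory
revisit = detect_loop_closure(memory, {1, 2, 3, 7})   # -> 0 (place recognized)
novel = detect_loop_closure(memory, {8, 9})           # -> None (new place)
```

Because each frame's score is independent of the others, this comparison is trivially parallelizable, which is the property exploited by the neural assembly code.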

To finish, we presented the first application of the active inference framework to robotics. We showed that the scheme can control the 7-DoF arm of a PR2 robot; to our knowledge, this is the first application of this theory to robotics. Our results emphasize that control is multimodal, based on both vision and proprioception, and that if one modality is impaired it can be compensated (though not entirely) by the other. Proprioception plays a particular role in control because action dynamics depend on reflex arcs that suppress proprioceptive (not visual) errors; proprioception therefore has a direct effect on the hidden states and on the movement of the arm. When both modalities are impaired, multimodal integration is not accurate enough for reaching in our case study. This result is in line with a large body of studies on proprioception in control tasks. Our work could be extended to study the sensory attenuation observed in Parkinson's disease, opening onto the field of computational pathology. The scheme could also be applied to a real robot; such an application would highlight several aspects of active inference, such as robustness to exogenous perturbations or to sensorimotor delays. With this article we provided a proof of principle of robot control using active inference, but other predictive models could be developed. Our generative model could include planning mechanisms to avoid obstacles or collisions, as has already been implemented in previous studies [80]. For cognitive science in general, active inference has the advantage that the internal dynamics generated by the robot can possibly be mapped onto brain signals, and it could be an interesting basis for future studies on embodied cognition.
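The compensation between modalities can be illustrated with standard precision-weighted cue fusion, a deliberate simplification of the multimodal integration at work in the scheme (the generative model itself is richer). The one-dimensional estimates and precision values below are illustrative assumptions.

```python
def fuse(mu_vis, prec_vis, mu_prop, prec_prop):
    """Precision-weighted fusion of a visual and a proprioceptive estimate
    of the same hidden state (e.g. hand position along one axis)."""
    prec = prec_vis + prec_prop   # precisions (inverse variances) add
    mu = (prec_vis * mu_vis + prec_prop * mu_prop) / prec
    return mu, prec

# intact vision dominates; "impairing" it (lowering its precision) shifts
# the estimate toward proprioception and lowers the overall precision
intact = fuse(1.0, 4.0, 3.0, 1.0)      # -> (1.4, 5.0)
impaired = fuse(1.0, 0.01, 3.0, 1.0)   # estimate close to 3.0
```

When both precisions are low, the fused precision is low too, which mirrors the failure to reach when both modalities are impaired.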

In the end, we developed several methods for robotics that do not resort to inverse kinematics, which can be too time-consuming for real-world applications. Our framework (predictive coding and the Bayesian brain hypothesis) allowed us to address different questions in cognitive science (the usefulness of the visual and motor spaces for reaching and grasping, a robotic model of sensorimotor integration, etc.). Nevertheless, our approach can be extended and challenged in several ways. In the last section, we present two perspectives on this work.

6.2 Perspectives

First, we chose Bayesian modeling for its interesting representation of knowledge and uncertainty. In the introduction, we discussed the other approaches in the field, connectionist or optimization-based. How can we distinguish between them? Can we demonstrate, in an appropriate mathematical framework, that these methods explain different aspects of the motor system, or that they are indistinguishable? This second question leads us to introduce the Bayesian distinguishability of models. Second, is classical probability theory the appropriate representation of probabilities? Indeed, there exist other theories representing probabilities with other axioms, and it is not yet clear in the literature whether these theories of probability are equivalent, nor which one could be used by the brain. These two questions outline the perspectives of our general conclusion.


6.2.1 Comparison and distinguishability of models

Several approaches exist in science to describe the same phenomenon: Bayesian modeling, connectionist models, optimization models, etc. Even within Bayesian modeling, one can follow several routes to describe any function. To make a real contribution to science, model comparison and distinguishability of models seem more and more necessary. In Bayesian modeling, several methods exist for the comparison and distinguishability of models. If our models are constrained enough to make experimental predictions, these approaches allow the researcher to select the experimental points optimally for model distinguishability and to test them. For some models, the experimental space chosen by the researcher may lead to indistinguishability of the models [50]. The Bayesian distinguishability equation is the following:

\[
\begin{aligned}
P(D = 1 \mid x\,y\,m_1\,m_2\,\theta_1\,\theta_2)
&= \sqrt{\bigl(P(M = m_1\,\theta_1 \mid x\,y) - P(M = m_2\,\theta_2 \mid x\,y)\bigr)^2} \\
&= \sqrt{\left(\frac{P(y \mid x\,m_1\,\theta_1) - P(y \mid x\,m_2\,\theta_2)}{P(y \mid x\,m_1\,\theta_1) + P(y \mid x\,m_2\,\theta_2)}\right)^2}
\end{aligned}
\tag{6.1}
\]

We can apply it to a very simple example to give an intuition of the approach. The first example concerns the distinguishability of two Gaussian models with very different predictions. Graphically, we observe with θ1 = (30, 3) and θ2 = (50, 3) that the two models are highly distinguishable except where they assign the same probability to y (see Figure 6.1). At the intersection of the graphical representations of the two models, the distinguishability drops sharply.
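Equation (6.1) is straightforward to evaluate numerically for this first example. The sketch below (an illustrative implementation, using the second form of the equation) recovers the behavior of Figure 6.1: zero distinguishability at the intersection y = 40, near-perfect distinguishability away from it.

```python
import math

def gauss_pdf(y, mu, sigma):
    """Gaussian prediction P(y | x m theta) with theta = (mu, sigma)."""
    return (math.exp(-(y - mu) ** 2 / (2.0 * sigma ** 2))
            / (sigma * math.sqrt(2.0 * math.pi)))

def distinguishability(y, theta1, theta2):
    """Second form of equation (6.1) for two Gaussian prediction models:
    D = |P(y|m1,th1) - P(y|m2,th2)| / (P(y|m1,th1) + P(y|m2,th2))."""
    p1, p2 = gauss_pdf(y, *theta1), gauss_pdf(y, *theta2)
    return abs(p1 - p2) / (p1 + p2)

# first case of Figure 6.1: theta1 = (30, 3), theta2 = (50, 3)
at_intersection = distinguishability(40.0, (30, 3), (50, 3))  # -> 0.0
far_from_it = distinguishability(30.0, (30, 3), (50, 3))      # -> ~1.0
```

Sweeping y over a grid reproduces the right-hand panel of Figure 6.1.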


Figure 6.1 – Distinguishability with two gaussian models: 1st case. At left, predictions of models
m 1 and m 2 with θ1 = (30, 3) and θ2 = (50, 3). At right, distinguishability between these two models:
P (D = 1 | x y m 1 m 2 θ1 θ2 ). Figures from [50].

A second example is when the two models are close in terms of predictions (see Figure 6.2). We obtain this case with parameters θ1 = (75, 15) and θ2 = (75, 10). Again, the distinguishability is very low where the predictions of the two models are very close.

Figure 6.2 – Distinguishability between two gaussian models: 2nd case. At left, predictions of models
m 1 and m 2 with θ1 = (75, 15) and θ2 = (75, 10). At right, distinguishability between these two models.
Figures from [50].

We observe, somewhat counter-intuitively, that far from the mean the distinguishability is very high. This is due to the relative distance between the model predictions, which is very large far from the mean (see Figure 6.3).


Figure 6.3 – Logarithm of the predictions of the models m 1 and m 2 with θ1 = (75, 15) and θ2 = (75, 10).
Figure from [50].

With this distinguishability approach, we can define an experimental space in which the different approaches can genuinely be compared. It is also possible to use this approach, which is very close to active learning, directly in robotics, in order to choose the optimal action for distinguishing between models, or to choose the best movement model for a specific task.

6.2.2 Is classical probability the appropriate representation?

In this thesis, we applied the Bayesian formalism to cognitive robotics. We must mention that other probabilistic frameworks exist to describe action, perception, and cognition, and it is possible that the brain uses richer representations than classical probability based on the Kolmogorov axioms (CP). One approach currently active in the field is the quantum theory of probability (QP) [32]. In short, compared with the classical Bayesian approach, events A and B do not commute: A ∩ B is not necessarily the same as B ∩ A. The sequence of events is taken into account in the quantum view of probability. For example, it is well known among polling companies that the trustworthiness of a politician A followed by a politician B can be judged differently depending on whether respondents are asked to judge politician A or B first [183]. The sequence of events can affect and change the probability distributions. The classical Bayesian approach has succeeded in explaining and modeling a wide range of cognitive phenomena, but there exist several violations of CP in models of human judgement, such as order/context effects, violations of the law of total probability, or failures of compositionality [190, 191]. QP has been applied to a wide range of phenomena in perception, action, and decision-making, including bistable perception [10], overdistribution in episodic memory [27], entanglement in associative memory [31], violations of rational decision making [147, 211], probability judgment errors [35], order effects on inference [187], causal reasoning [188], and vagueness [23].
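The non-commutativity at stake is easy to exhibit with the simplest quantum model: a two-dimensional state vector and two incompatible projectors, with sequential probabilities computed by projecting in order and taking the squared norm. The state and question encodings below are illustrative assumptions, not a model of the polling data of [183].

```python
import math

def proj(v):
    """Projector |v><v| onto a normalized 2-d state vector."""
    return [[v[0] * v[0], v[0] * v[1]],
            [v[1] * v[0], v[1] * v[1]]]

def apply(m, v):
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]

def seq_prob(first, second, psi):
    """P(first, then second): project sequentially, take the squared norm."""
    out = apply(second, apply(first, psi))
    return out[0] ** 2 + out[1] ** 2

ket0 = [1.0, 0.0]                                # answer basis for question B
ketp = [1.0 / math.sqrt(2), 1.0 / math.sqrt(2)]  # incompatible question A
A, B = proj(ketp), proj(ket0)
psi = ket0                                       # initial belief state

p_ab = seq_prob(A, B, psi)   # A then B -> 0.25
p_ba = seq_prob(B, A, psi)   # B then A -> 0.5
```

Since the projectors do not commute, the two question orders yield different probabilities, which classical conjunction cannot produce.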

We emphasize QP as one possibility that the brain uses richer probabilistic representations to deal with uncertainty in the world, but other extensions of CP exist. The theory of belief functions (Dempster–Shafer theory) [165, 208] considers a belief function as a set function more general than a probability measure, but whose values can still be interpreted as degrees of belief. One could also consider the theory of imprecise probabilities [201]. It is not yet clear whether CP, QP and the other extensions are formally equivalent, even though some attempts have been made, for example, to reduce evidence theory to QP [200].

A Particle filtering

Particle filtering is a Sequential Monte Carlo (SMC) method for approximate Bayesian computation (ABC). The idea behind this technique is to recursively approximate the density P(S_t | O_{0:t}). The particle set corresponds to a finite weighted sum of Diracs centered on locations in the state space. A weight w_t^{(i)} is assigned to each particle S_t^{(i)}, with i = 1, ..., N and N the number of particles. With this formulation, we obtain:

\[
P(S_t \mid O_{0:t}) \approx \sum_{i=1}^{N} w_t^{(i)}\, \delta_{S_t^{(i)}}(S_t),
\tag{A.1}
\]

where \(\delta_{S_t^{(i)}}\) denotes the Dirac delta mass located at \(S_t^{(i)}\).

We used the sequential importance sampling method to estimate the particles. This method allows us to estimate the importance weights recursively in time. The particle filtering algorithm has three steps: sampling of the particles, computation of the weights, and resampling when needed.

• The sampling step generates the new particles from an approximation of the true distribution P(S_t | O_{0:t-1}). This approximation is called the importance function and is denoted π(S_t | S_{0:t-1}, O_{0:t}).

• The computation of the new importance weights w_t^{(i)} recursively updates the importance function according to the new observation at time t. The weights are computed as:

\[
w_t^{(i)} = w_{t-1}^{(i)}\,
\frac{P(O_t \mid S_t^{(i)})\, P(S_t^{(i)} \mid S_{t-1}^{(i)})}{\pi(S_t^{(i)} \mid S_{0:t-1}^{(i)}, O_{0:t})},
\quad \text{with} \quad \sum_{i=1}^{N} w_t^{(i)} = 1.
\tag{A.2}
\]

• The last step is resampling. The aim of this step is to improve the approximation of the importance function by removing particles with weak weights and generating new particles from those with strong weights.

Several implementations of this algorithm exist; indeed, there are several versions of importance sampling [54]. We used the bootstrap filter in this work [54].

Following [41], we also applied the following approximations:

• The importance function is set to the evolution law:

\[
\pi(S_t \mid S_{0:t-1}^{(i)}, O_{0:t}) = P(S_t \mid S_{t-1}^{(i)}).
\tag{A.3}
\]

• As a consequence of this last equation, the weights are computed as:

\[
w_t^{(i)} \propto w_{t-1}^{(i)}\, P(O_t \mid S_t^{(i)}).
\tag{A.4}
\]

The general algorithm of the bootstrap filter is the following:

Algorithm 1 Particle filtering algorithm
INITIALIZATION
1: t = 0
2: for i = 1, ..., N do
3:     Sample S_0^{(i)} ∼ P(S_0)
4: t = 1
IMPORTANCE SAMPLING
5:     Sample Ŝ_t^{(i)} ∼ P(S_t | S_{t-1}^{(i)}) (= π(S_t | S_{0:t-1}, O_{0:t}))
6:     Ŝ_{0:t}^{(i)} = (S_{0:t-1}^{(i)}, Ŝ_t^{(i)})
7:     Evaluate the importance weights:
8:     ŵ_t^{(i)} = P(O_t | Ŝ_t^{(i)})
9: end for
10: Normalize the importance weights.
SELECTION STEP
11: Resample the N particles according to the importance weights.
12: t = t + 1
13: Go to the importance sampling step.

This method has several interesting properties: without citing all of them, its implementation is quite fast and it can be parallelized [54].
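A minimal bootstrap filter following equations (A.3)–(A.4) can be written in a few lines. The sketch below assumes an illustrative one-dimensional random-walk model, S_t = S_{t-1} + N(0, q), observed as O_t = S_t + N(0, r); the model and parameter values are assumptions for the example, not those used in the thesis.

```python
import math
import random

def bootstrap_filter(observations, n_particles=1000, q=1.0, r=1.0):
    """Bootstrap particle filter for the toy model
    S_t = S_{t-1} + N(0, q),  O_t = S_t + N(0, r).
    Returns the posterior mean estimate of S_t at each time step."""
    particles = [random.gauss(0.0, 1.0) for _ in range(n_particles)]
    estimates = []
    for obs in observations:
        # importance sampling: propagate through the evolution law (eq. A.3)
        particles = [p + random.gauss(0.0, math.sqrt(q)) for p in particles]
        # weights proportional to the likelihood (eq. A.4), then normalized
        weights = [math.exp(-(obs - p) ** 2 / (2.0 * r)) for p in particles]
        total = sum(weights)
        weights = [w / total for w in weights]
        estimates.append(sum(w * p for w, p in zip(weights, particles)))
        # selection step: multinomial resampling by the importance weights
        particles = random.choices(particles, weights=weights, k=n_particles)
    return estimates
```

The three steps of Algorithm 1 appear directly in the loop body; the per-particle propagation and weighting are exactly the operations that can be parallelized.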

B Glossary

Active inference: the combined mechanism by which perceptual and motor systems conspire
to reduce prediction errors using the twin strategies of altering predictions to fit the world and
altering the world (including the body) to fit the predictions.

Corollary discharge: often (incorrectly) used synonymously with ‘efference copy’, this names
the output of the forward model (the predictor mechanism), which may be used to influence
further processing.

Efference copy: a copy of the current motor command that may be given as input to a forward
model.

Forward model: a mechanism that predicts the future state of a system. In standard control
theory, this would be an internal loop whose input is a copy (an efference copy) of a control
signal such as a motor command and whose output is a prediction about the next sensory
state. In AFMs, an ‘inverse model’ computes the motor command, which is then copied to the
forward model, which issues predictions. In IFMs, proprioceptive predictions issued by the
forward model act directly as motor commands and there is no role for an efference copy.

Generative model: a description that allows a system to self-generate data that are similar to
the observed data. Usually, that means a model that captures the statistical structure of some
set of observed inputs by tracking (in effect, schematically recapitulating) the causal matrix
responsible for that structure. The dynamics of the units encoding such a model are used to
predict inputs to the system. A generative model thus generates consequences from causes in
the same way that a forward model maps from causes to consequences. Forward models are thus examples of generative models.

Inverse–forward scheme: a scheme that posits two distinct models – an inverse model (or
optimal control model) that converts intentions into motor commands and a forward model
that converts motor commands into sensory consequences (which are compared with actual
outcomes for online error correction and learning).

Inverse model: a mechanism that takes the intended position of the body as input and
estimates the motor commands that would transform the current position into the desired
one.

Joint action: an action that involves appropriately timed and coordinated contributions from
two (or more) agents.

Predictive coding: a data-compression strategy in which only the discrepancies between predicted and expected values (residual errors, or ‘prediction errors’) are used to drive further processing.

Proprioception: the ‘inner’ sense that informs us about the relative locations of our bodily parts and the forces and efforts that are being applied.

Simulation: interpreting another’s actions by reproducing the processes that one would use to perform that action.

Glossary from [148].

Bibliography

[1] Rick A Adams, Stewart Shipp, and Karl J Friston. Predictions not commands: active
inference in the motor system. Brain Structure and Function, 218(3):611–643, 2013.

[2] David Alais and David Burr. The ventriloquist effect results from near-optimal bimodal
integration. Current biology, 14(3):257–262, 2004.

[3] Si Wu and Shun-ichi Amari. Neural implementation of bayesian inference in population codes. In Advances in Neural Information Processing Systems 14: Proceedings of the 2001 Neural Information Processing Systems (NIPS) Conference, volume 1, page 317. The MIT Press, 2002.

[4] Charles H Anderson. Basic elements of biological computational systems. International Journal of Modern Physics C, 5(02):313–315, 1994.

[5] Angela J Yu and Peter Dayan. Uncertainty, neuromodulation, and attention. Neuron, 46(4):681–692, 2005.

[6] Dora E Angelaki, Yong Gu, and Gregory C DeAngelis. Multisensory integration: psy-
chophysics, neurophysiology, and computation. Current opinion in neurobiology,
19(4):452–458, 2009.

[7] P Ao. Potential in stochastic differential equations: novel construction. Journal of Physics A: Mathematical and General, 37(3):L25, 2004.

[8] Brenna D Argall, Sonia Chernova, Manuela Veloso, and Brett Browning. A survey of
robot learning from demonstration. Robotics and autonomous systems, 57(5):469–483,
2009.

[9] Christopher G Atkeson and Stefan Schaal. Robot learning from demonstration. In ICML,
volume 97, pages 12–20, 1997.


[10] Harald Atmanspacher and Thomas Filk. A proposed test of temporal nonlocality in
bistable perception. Journal of Mathematical Psychology, 54(3):314–321, 2010.

[11] Amy J Bastian. Learning to predict the future: the cerebellum adapts feedforward
movement control. Current opinion in neurobiology, 16(6):645–649, 2006.

[12] Peter W Battaglia, Robert A Jacobs, and Richard N Aslin. Bayesian integration of visual
and auditory signals for spatial localization. JOSA A, 20(7):1391–1397, 2003.

[13] Matthew James Beal. Variational algorithms for approximate Bayesian inference. Uni-
versity of London United Kingdom, 2003.

[14] Christian Bell, Tobias Storck, and Yulia Sandamirskaya. Learning to look: A dynamic
neural fields architecture for gaze shift generation. In International Conference on
Artificial Neural Networks, pages 699–706. Springer, 2014.

[15] Richard Bellman. On the theory of dynamic programming. Proceedings of the National
Academy of Sciences, 38(8):716–719, 1952.

[16] Jon Louis Bentley. Multidimensional binary search trees used for associative searching.
Communications of the ACM, 18(9):509–517, 1975.

[17] Max Berniker and Konrad Kording. Estimating the sources of motor errors for adaptation
and generalization. Nature neuroscience, 11(12):1454–1461, 2008.

[18] Max Berniker, Martin Voss, and Konrad Kording. Learning priors for bayesian computa-
tions in the nervous system. PloS one, 5(9):e12686, 2010.

[19] Pierre Bessière. Probability as an alternative to logic for rational sensory–motor rea-
soning and decision. In Probabilistic reasoning and decision making in sensory-motor
systems, pages 3–18. Springer, 2008.

[20] Pierre Bessière, Christian Laugier, and Roland Siegwart. Probabilistic reasoning and
decision making in sensory-motor systems, volume 46. Springer Science & Business
Media, 2008.

[21] Pierre Bessiere, Emmanuel Mazer, Juan Manuel Ahuactzin, and Kamel Mekhnacha.
Bayesian programming. CRC Press, 2013.


[22] Z Bingul, HM Ertunc, and C Oysu. Applying neural network to inverse kinematic
problem for 6r robot manipulator with offset wrist. In Adaptive and Natural Computing
Algorithms, pages 112–115. Springer, 2005.

[23] Reinhard Blutner, Emmanuel M Pothos, and Peter Bruza. A quantum probability per-
spective on borderline vagueness. Topics in Cognitive Science, 5(4):711–736, 2013.

[24] Ali Borji, Dicky N Sihite, and Laurent Itti. Quantifying the relative influence of pho-
tographer bias and viewing strategy on scene viewing. Journal of Vision, 11(11):166,
2011.

[25] Christoph Borst, Max Fischer, and Gerd Hirzinger. A fast and robust grasp planner for
arbitrary 3d objects. In Robotics and Automation, volume 3, pages 1890–1896. IEEE,
1999.

[26] Matthew Botvinick and James An. Goal-directed decision making in prefrontal cortex: a
computational framework. In Advances in neural information processing systems, pages
169–176, 2009.

[27] Charles J Brainerd, Zheng Wang, and Valerie F Reyna. Superposition of episodic memo-
ries: Overdistribution and quantum models. Topics in cognitive science, 5(4):773–799,
2013.

[28] Rodney A Brooks, Cynthia Breazeal, Matthew Marjanović, Brian Scassellati, and
Matthew M Williamson. The cog project: Building a humanoid robot. In Computation
for metaphors, analogy, and agents, pages 52–87. Springer, 1999.

[29] Harriet Brown, Karl J Friston, and Sven Bestmann. Active inference, attention, and
motor preparation. Frontiers in psychology, 2:218, 2011.

[30] Lawrence D Brown. A complete class theorem for statistical problems with finite sample
spaces. The Annals of Statistics, pages 1289–1300, 1981.

[31] Peter Bruza, Kirsty Kitto, Douglas Nelson, and Cathy McEvoy. Is there something
quantum-like about the human mental lexicon? Journal of Mathematical Psychology,
53(5):362–377, 2009.

[32] Peter D Bruza, Zheng Wang, and Jerome R Busemeyer. Quantum cognition: a new
theoretical approach to psychology. Trends in cognitive sciences, 19(7):383–393, 2015.


[33] Andreja Bubic, D Yves Von Cramon, and Ricarda I Schubotz. Prediction, cognition and
the brain. Frontiers in human neuroscience, 4:25, 2010.

[34] Johannes Burge, Marc O Ernst, and Martin S Banks. The statistical determinants of
adaptation rate in human reaching. Journal of Vision, 8(4):20–20, 2008.

[35] Jerome R Busemeyer, Emmanuel M Pothos, Riccardo Franco, and Jennifer S Trueblood.
A quantum theoretical explanation for probability judgment errors. Psychological review,
118(2):193, 2011.

[36] Nick Chater, Joshua B Tenenbaum, and Alan Yuille. Probabilistic models of cognition:
Conceptual foundations. Trends in cognitive sciences, 10(7):287–291, 2006.

[37] Andy Clark. Whatever next? predictive brains, situated agents, and the future of cognitive
science. Behavioral and Brain Sciences, 36(03):181–204, 2013.

[38] Roger C Conant and W Ross Ashby. Every good regulator of a system must be a model of
that system. International journal of systems science, 1(2):89–97, 1970.

[39] Gregory F Cooper. The computational complexity of probabilistic inference using bayesian belief networks. Artificial intelligence, 42(2-3):393–405, 1990.

[40] Gregory F Cooper. A method for using belief networks as influence diagrams. arXiv
preprint arXiv:1304.2346, 2013.

[41] Nicolas Courty and Elise Arnaud. Inverse kinematics using sequential monte carlo
methods. In Articulated Motion and Deformable Objects, pages 1–10. Springer, 2008.

[42] Howard C Cromwell and Wolfram Schultz. Effects of expectations for different reward
magnitudes on neuronal activity in primate striatum. Journal of Neurophysiology,
89(5):2823–2838, 2003.

[43] Katalin Csilléry, Michael GB Blum, Oscar E Gaggiotti, and Olivier François. Approximate
Bayesian computation (ABC) in practice. Trends in ecology & evolution, 25(7):410–418,
2010.

[44] Mark Cummins and Paul Newman. FAB-MAP: Probabilistic localization and mapping in
the space of appearance. The International Journal of Robotics Research, 27(6):647–665,
2008.

[45] Nicolas Cuperlier, Mathias Quoy, and Philippe Gaussier. Neurobiologically inspired
mobile robot navigation and planning. Frontiers in neurorobotics, 1:3, 2007.

[46] Peter Dayan, Geoffrey E Hinton, Radford M Neal, and Richard S Zemel. The Helmholtz
machine. Neural computation, 7(5):889–904, 1995.

[47] Sophie Deneve. Bayesian spiking neurons 1: inference. Neural computation, 20(1):91–
117, 2008.

[48] Sophie Deneve, Jean-René Duhamel, and Alexandre Pouget. Optimal sensorimotor
integration in recurrent cortical networks: a neural implementation of Kalman filters.
The Journal of neuroscience, 27(21):5744–5756, 2007.

[49] Michel Desmurget and Scott Grafton. Forward modeling allows feedback control for
fast reaching movements. Trends in cognitive sciences, 4(11):423–431, 2000.

[50] Julien Diard. Bayesian model comparison and distinguishability. In International
Conference on Cognitive Modeling (ICCM 09), pages 204–209, 2009.

[51] Julien Diard, Pierre Bessiere, and Emmanuel Mazer. A survey of probabilistic models
using the Bayesian programming methodology as a unifying framework. 2003.

[52] Jörn Diedrichsen. Optimal task-dependent changes of bimanual feedback control and
adaptation. Current Biology, 17(19):1675–1679, 2007.

[53] Jörn Diedrichsen, Reza Shadmehr, and Richard B Ivry. The coordination of movement:
optimal feedback control and beyond. Trends in cognitive sciences, 14(1):31–39, 2010.

[54] Arnaud Doucet, Nando De Freitas, and Neil Gordon. An introduction to sequential
monte carlo methods. In Sequential Monte Carlo methods in practice, pages 3–14.
Springer, 2001.

[55] Kenji Doya, Shin Ishii, Alexandre Pouget, and Rajesh PN Rao. Bayesian brain: Proba-
bilistic approaches to neural coding. MIT Press, 2007.

[56] Dimiter Driankov and Alessandro Saffiotti. Fuzzy logic techniques for autonomous
vehicle navigation, volume 61. Physica, 2013.

[57] Joseph Duffy. Analysis of mechanisms and robot manipulators. Edward Arnold,
London, 1980.

[58] Marc O Ernst and Martin S Banks. Humans integrate visual and haptic information in a
statistically optimal fashion. Nature, 415(6870):429–433, 2002.

[59] Roy Featherstone. Position and velocity transformations between robot end-effector
coordinates and joint angles. The International Journal of Robotics Research, 2(2):35–45,
1983.

[60] Daniel J Felleman and David C Van Essen. Distributed hierarchical processing in the
primate cerebral cortex. Cerebral cortex, 1(1):1–47, 1991.

[61] João Filipe Ferreira and Jorge Dias. Probabilistic approaches to robotic perception.
Springer, 2014.

[62] Christopher D Fiorillo, Philippe N Tobler, and Wolfram Schultz. Discrete coding of
reward probability and uncertainty by dopamine neurons. Science, 299(5614):1898–
1902, 2003.

[63] Thomas FitzGerald, Philipp Schwartenbeck, Michael Moutoussis, Raymond J. Dolan,
and Karl Friston. Active inference, evidence accumulation, and the urn task. Neural
Computation, 27(2):306–328, Feb 2015.

[64] J Randall Flanagan and Ashwini K Rao. Trajectory adaptation to a nonlinear visuomotor
transformation: evidence of motion planning in visually perceived space. Journal of
neurophysiology, 74(5), 1995.

[65] Terrence Fong, Illah Nourbakhsh, and Kerstin Dautenhahn. A survey of socially interac-
tive robots. Robotics and autonomous systems, 42(3):143–166, 2003.

[66] Simone Frintrop. VOCUS: A visual attention system for object detection and goal-directed
search, volume 3899. Springer, 2006.

[67] Karl Friston. The free-energy principle: a unified brain theory? Nature Reviews Neuro-
science, 11(2):127–138, 2010.

[68] Karl Friston. What is optimal about motor control? Neuron, 72(3):488–498, 2011.

[69] Karl Friston. Policies and priors. In Computational Neuroscience of Drug Addiction,
pages 237–283. Springer, 2012.

[70] Karl Friston, Thomas FitzGerald, Francesco Rigoli, Philipp Schwartenbeck, John
O’Doherty, and Giovanni Pezzulo. Active inference and learning. Neuroscience & Biobe-
havioral Reviews, 2016.

[71] Karl Friston and Stefan Kiebel. Predictive coding under the free-energy principle. Philo-
sophical Transactions of the Royal Society B: Biological Sciences, 364(1521):1211–1221,
2009.

[72] Karl Friston, James Kilner, and Lee Harrison. A free energy principle for the brain.
Journal of Physiology-Paris, 100(1):70–87, 2006.

[73] Karl Friston, Jérémie Mattout, and James Kilner. Action understanding and active
inference. Biological cybernetics, 104(1-2):137–160, 2011.

[74] Karl Friston, Francesco Rigoli, Dimitri Ognibene, Christoph Mathys, Thomas FitzGerald,
and Giovanni Pezzulo. Active inference and epistemic value. Cognitive Neuroscience,
6:187–214, Feb 2015.

[75] Karl Friston, Spyridon Samothrakis, and Read Montague. Active inference and agency:
optimal control without cost functions. Biological cybernetics, 106(8-9):523–541, 2012.

[76] Karl Friston, Philipp Schwartenbeck, Thomas FitzGerald, Michael Moutoussis, Timothy
Behrens, and Raymond J Dolan. The anatomy of choice: active inference and agency.
Frontiers in human neuroscience, 7, 2013.

[77] Karl Friston, Tamara Shiner, Thomas FitzGerald, Joseph M. Galea, Rick Adams, Harriet
Brown, Raymond J. Dolan, Rosalyn Moran, Klaas Enno Stephan, and Sven Bestmann.
Dopamine, affordance and active inference. PLoS Computational Biology, 8(1):e1002327,
Jan 2012.

[78] Karl Friston, Klaas Stephan, Baojuan Li, and Jean Daunizeau. Generalised filtering.
Mathematical Problems in Engineering, 2010, 2010.

[79] Karl J Friston, Jean Daunizeau, and Stefan J Kiebel. Reinforcement learning or active
inference? PLoS One, 4(7):e6421, 2009.

[80] Karl J Friston, Jean Daunizeau, James Kilner, and Stefan J Kiebel. Action and behavior: a
free-energy formulation. Biological cybernetics, 102(3):227–260, 2010.

[81] King Sun Fu, Ralph Gonzalez, and CS George Lee. Robotics: Control Sensing Vision and
Intelligence. Tata McGraw-Hill Education, 1987.

[82] Joshua I Gold and Michael N Shadlen. Banburismus and the brain: decoding the
relationship between sensory stimuli, decisions, and reward. Neuron, 36(2):299–308,
2002.

[83] Kenneth Y Goldberg and Matthew T Mason. Bayesian grasping. In International Confer-
ence on Robotics and Automation (ICRA), pages 1264–1269. IEEE, 1990.

[84] Michael A Goodrich and Alan C Schultz. Human-robot interaction: a survey. Founda-
tions and trends in human-computer interaction, 1(3):203–275, 2007.

[85] Thomas L Griffiths, Charles Kemp, and Joshua B Tenenbaum. Bayesian models of
cognition. In The Cambridge Handbook of Computational Psychology. Cambridge
University Press, 2008.

[86] Thomas L Griffiths and Joshua B Tenenbaum. Optimal predictions in everyday cognition.
Psychological science, 17(9):767–773, 2006.

[87] Rick Grush. The emulation theory of representation: Motor control, imagery, and
perception. Behavioral and brain sciences, 27(03):377–396, 2004.

[88] Ali T Hasan, Abdel Magid S Hamouda, N Ismail, and HMAA Al-Assadi. An adaptive-
learning algorithm to solve the inverse kinematics problem of a 6 dof serial robot
manipulator. Advances in Engineering Software, 37(7):432–438, 2006.

[89] Jakob Hohwy, Andreas Roepstorff, and Karl Friston. Predictive coding explains binocular
rivalry: An epistemological review. Cognition, 108(3):687–701, 2008.

[90] David W Howard and Ali Zilouchian. Application of fuzzy logic for the solution of inverse
kinematics and hierarchical controls of robotic manipulators. Journal of Intelligent and
Robotic Systems, 23(2-4):217–247, 1998.

[91] Yanping Huang and Rajesh PN Rao. Predictive coding. Wiley Interdisciplinary Reviews:
Cognitive Science, 2(5):580–593, 2011.

[92] Kai Huebner, Kai Welke, Markus Przybylski, Nikolaus Vahrenkamp, Tamim Asfour, Dan-
ica Kragic, and Rüdiger Dillmann. Grasping known objects with humanoid robots: A
box-based approach. In International Conference on Advanced Robotics (ICAR), pages 1–6. IEEE, 2009.

[93] Scott A Huettel, Allen W Song, and Gregory McCarthy. Decisions under uncertainty:
probabilistic context influences activation of prefrontal and parietal cortices. The
Journal of neuroscience, 25(13):3304–3311, 2005.

[94] Masao Ito. Control of mental activities by internal models in the cerebellum. Nature
Reviews Neuroscience, 9(4):304–313, 2008.

[95] Jun Izawa and Reza Shadmehr. On-line processing of uncertain information in visuo-
motor control. The Journal of Neuroscience, 28(44):11360–11368, 2008.

[96] Robert A Jacobs. Optimal integration of texture and motion cues to depth. Vision
research, 39(21):3621–3629, 1999.

[97] William James. The principles of psychology. Read Books Ltd, 2013.

[98] Marc Jeannerod. Neural simulation of action: a unifying mechanism for motor cognition.
Neuroimage, 14(1):S103–S109, 2001.

[99] Viktor K Jirsa and JA Scott Kelso. The excitator as a minimal model for the coordination
dynamics of discrete and rhythmic movement generation. Journal of motor behavior,
37(1):35–51, 2005.

[100] Michael I Jordan and David E Rumelhart. Forward models: Supervised learning with a
distal teacher. Cognitive science, 16(3):307–354, 1992.

[101] Rudolph Emil Kalman. A new approach to linear filtering and prediction problems.
Journal of Basic Engineering, 82(1):35–45, 1960.

[102] P Kalra, PB Mahapatra, and DK Aggarwal. An evolutionary approach for solving the
multimodal inverse kinematics problem of industrial robots. Mechanism and machine
theory, 41(10):1213–1229, 2006.

[103] Daniel Kappler, Lillian Y Chang, Nancy S Pollard, Tamim Asfour, and Rüdiger Dill-
mann. Templates for pre-grasp sliding interactions. Robotics and Autonomous Systems,
60(3):411–423, 2012.

[104] Mitsuo Kawato. Internal models for motor control and trajectory planning. Current
opinion in neurobiology, 9(6):718–727, 1999.

[105] James M Kilner, Karl J Friston, and Chris D Frith. Predictive coding: an account of the
mirror neuron system. Cognitive processing, 8(3):159–166, 2007.

[106] S-W Kim, Ju Jang Lee, and Masanori Sugisaka. Inverse kinematics solution based on
fuzzy logic for redundant manipulators. In Intelligent Robots and Systems (IROS),
volume 2, pages 904–910. IEEE, 1993.

[107] George Klir and Bo Yuan. Fuzzy sets and fuzzy logic, volume 4. Prentice Hall, New
Jersey, 1995.

[108] David C Knill and Alexandre Pouget. The Bayesian brain: the role of uncertainty in
neural coding and computation. Trends in Neurosciences, 27(12):712–719, 2004.

[109] David C Knill and Whitman Richards. Perception as Bayesian inference. Cambridge
University Press, 1996.

[110] David C Knill and Jeffrey A Saunders. Do humans optimally integrate stereo and texture
information for judgments of surface slant? Vision research, 43(24):2539–2558, 2003.

[111] Raşit Köker, Cemil Öz, Tarık Çakar, and Hüseyin Ekiz. A study of neural network based
inverse kinematics solution for a three-joint robot. Robotics and Autonomous Systems,
49(3):227–234, 2004.

[112] Konrad P Kording, Joshua B Tenenbaum, and Reza Shadmehr. The dynamics of memory
as a consequence of optimal adaptation to a changing body. Nature neuroscience,
10(6):779–786, 2007.

[113] Konrad P Körding and Daniel M Wolpert. Bayesian integration in sensorimotor learning.
Nature, 427(6971):244–247, 2004.

[114] Konrad P Körding and Daniel M Wolpert. Bayesian decision theory in sensorimotor
control. Trends in cognitive sciences, 10(7):319–326, 2006.

[115] James U Korein and Norman I Badler. Techniques for generating the goal-directed
motion of articulated structures. IEEE Computer Graphics and Applications, 2(9):71–81,
1982.

[116] Alexander T Korenberg and Zoubin Ghahramani. A Bayesian view of motor adaptation.
Current Psychology of Cognition, 21(4/5):537–564, 2002.

[117] Leonard F Koziol, Deborah Budding, Nancy Andreasen, Stefano D’Arrigo, Sara Bul-
gheroni, Hiroshi Imamizu, Masao Ito, Mario Manto, Cherie Marvel, Krystal Parker, et al.
Consensus paper: the cerebellum’s role in movement and cognition. The Cerebellum,
13(1):151–177, 2014.

[118] Arthur D Kuo. An optimal state estimation model of sensory integration in human
postural balance. Journal of Neural Engineering, 2(3):S235, 2005.

[119] Michael S Landy and Haruyuki Kojima. Ideal cue combination for localizing texture-
defined edges. JOSA A, 18(9):2307–2320, 2001.

[120] Steffen L Lauritzen and David J Spiegelhalter. Local computations with probabilities
on graphical structures and their application to expert systems. Journal of the Royal
Statistical Society, series B (Methodological), pages 157–224, 1988.

[121] Olivier Lebeltel, Pierre Bessière, Julien Diard, and Emmanuel Mazer. Bayesian robot
programming. Autonomous Robots, 16(1):49–79, 2004.

[122] CS George Lee. Robot arm kinematics, dynamics, and control. Computer, 15(12):62–80,
1982.

[123] Tai Sing Lee and David Mumford. Hierarchical Bayesian inference in the visual cortex.
JOSA A, 20(7):1434–1448, 2003.

[124] Nathan F Lepora, Paul Verschure, and Tony J Prescott. The state of the art in biomimetics.
Bioinspiration & biomimetics, 8(1):013001, 2013.

[125] Michael L Littman, Stephen M Majercik, and Toniann Pitassi. Stochastic boolean
satisfiability. Journal of Automated Reasoning, 27(3):251–296, 2001.

[126] Dan Liu and Emanuel Todorov. Evidence for the flexible sensorimotor strategies pre-
dicted by optimal feedback control. The Journal of Neuroscience, 27(35):9354–9368,
2007.

[127] Leo Lopez, Jean-Charles Quinton, and Youcef Mezouar. Coupled filtering in visual and
motor spaces for reaching and grasping. In Cognitive Processing, volume 16, pages
S93–S94. Springer, 2015.

[128] Léo Lopez, Jean-Charles Quinton, and Youcef Mezouar. Dual filtering in operational
and joint spaces for reaching and grasping. Cognitive processing, 16(1):293–297, 2015.

[129] Hermann Lotze. Medicinische Psychologie. 1852.

[130] Stefan Louw, Jeroen BJ Smeets, and Eli Brenner. Judging surface slant for placing objects:
a role for motion parallax. Experimental brain research, 183(2):149–158, 2007.

[131] Laurence T Maloney, Julia Trommershäuser, and Michael S Landy. Questions without
words: A comparison between decision making under risk and movement planning
under risk. Integrated models of cognitive systems, pages 297–313, 2007.

[132] H José Antonio Martín, Javier de Lope, and Matilde Santos. A method to learn the
inverse kinematics of multi-link robots by evolving neuro-controllers. Neurocomputing,
72(13):2806–2814, 2009.

[133] Allison N McCoy and Michael L Platt. Risk-sensitive neurons in macaque posterior
cingulate cortex. Nature neuroscience, 8(9):1220–1227, 2005.

[134] Harry McGurk and John MacDonald. Hearing lips and seeing voices. Nature, 264:746–
748, 1976.

[135] Makoto Miyazaki, Daichi Nozaki, and Yasoichi Nakajima. Testing bayesian models of
human coincidence timing. Journal of neurophysiology, 94(1):395–399, 2005.

[136] Robin R Murphy. Dempster-Shafer theory for sensor fusion in autonomous mobile
robots. IEEE Transactions on Robotics and Automation, 14(2):197–206, 1998.

[137] Camillo Padoa-Schioppa and John A Assad. Neurons in the orbitofrontal cortex encode
economic value. Nature, 441(7090):223–226, 2006.

[138] Joseph J Paton, Marina A Belova, Sara E Morrison, and C Daniel Salzman. The primate
amygdala represents the positive and negative value of visual stimuli during learning.
Nature, 439(7078):865–870, 2006.

[139] Linda Dailey Paulson. Biomimetic robots. Computer, 37(9):48–53, 2004.

[140] Judea Pearl. Probabilistic reasoning in intelligent systems: networks of plausible infer-
ence. Morgan Kaufmann, 1988.

[141] Judea Pearl. Probabilistic reasoning in intelligent systems: networks of plausible inference.
Morgan Kaufmann, 2014.

[142] Laurent U Perrinet, Rick A Adams, and Karl J Friston. Active inference, eye movements
and oculomotor delays. Biological cybernetics, 108(6):777–801, 2014.

[143] Robert J Peterka and Patrick J Loughlin. Dynamic regulation of sensorimotor integration
in human postural control. Journal of neurophysiology, 91(1):410–423, 2004.

[144] Anna Petrovskaya, Oussama Khatib, Sebastian Thrun, and Andrew Y Ng. Bayesian esti-
mation for autonomous object manipulation based on tactile sensors. In International
Conference on Robotics and Automation (ICRA), pages 707–714. IEEE, 2006.

[145] Giovanni Pezzulo, Emilio Cartoni, Francesco Rigoli, Leo Pio-Lopez, and Karl Friston.
Active inference, epistemic value, and vicarious trial and error. Learn Mem, 23(7):322–
338, Jul 2016.

[146] Giovanni Pezzulo, Francesco Rigoli, and Karl Friston. Active inference, homeostatic
regulation and adaptive behavioural control. Prog Neurobiol, 134:17–35, Nov 2015.

[147] Emmanuel M Pothos and Jerome R Busemeyer. A quantum probability explanation for
violations of ‘rational’ decision theory. Proceedings of the Royal Society B, 276:2171–2178, 2009.

[148] Martin J Pickering and Andy Clark. Getting ahead: forward models and their place in
cognitive architecture. Trends in cognitive sciences, 18(9):451–456, 2014.

[149] Léo Pio-Lopez, Ange Nizard, Karl Friston, and Giovanni Pezzulo. Active inference and
robot control: a case study. Journal of The Royal Society Interface, 13(122):20160616,
2016.

[150] Alexandre Pitti, Philippe Gaussier, and Mathias Quoy. Iterative free-energy optimization
for recurrent neural networks (inferno). PloS one, 12(3):e0173684, 2017.

[151] Alexandre Pouget, Jeffrey M Beck, Wei Ji Ma, and Peter E Latham. Probabilistic brains:
knowns and unknowns. Nature neuroscience, 16(9):1170–1178, 2013.

[152] Alexandre Pouget, Peter Dayan, and Richard Zemel. Information processing with popu-
lation codes. Nature Reviews Neuroscience, 1(2):125–132, 2000.

[153] Jean-Charles Quinton and Bernard Girau. Predictive neural fields for improved tracking
and attentional properties. In Neural Networks (IJCNN), The 2011 International Joint
Conference on, pages 1629–1636. IEEE, 2011.

[154] Jean-Charles Quinton, Bernard Girau, et al. Spatiotemporal pattern discrimination
using predictive dynamic neural fields. BMC Neuroscience, 13(Suppl 1):O16, 2012.

[155] Rajesh PN Rao and Dana H Ballard. Predictive coding in the visual cortex: a func-
tional interpretation of some extra-classical receptive-field effects. Nature neuroscience,
2(1):79–87, 1999.

[156] Rajesh PN Rao, Bruno A Olshausen, and Michael S Lewicki. Probabilistic models of the
brain: Perception and neural function. MIT press, 2002.

[157] Christian P Robert. L’analyse statistique bayésienne. Insee, 1992.

[158] Manuel Jan Roth, Matthis Synofzik, and Axel Lindner. The cerebellum optimizes per-
ceptual predictions about external sensory events. Current Biology, 23(10):930–935,
2013.

[159] Simo Särkkä. Bayesian filtering and smoothing, volume 3. Cambridge University Press,
2013.

[160] Lawrence K Saul, Tommi Jaakkola, and Michael I Jordan. Mean field theory for sigmoid
belief networks. Journal of artificial intelligence research, 4(1):61–76, 1996.

[161] Robert A Scheidt, Jonathan B Dingwell, and Ferdinando A Mussa-Ivaldi. Learning to
move amid uncertainty. Journal of neurophysiology, 86(2):971–985, 2001.

[162] Jürgen Schmidhuber. Driven by compression progress: A simple principle explains es-
sential aspects of subjective beauty, novelty, surprise, interestingness, attention, curios-
ity, creativity, art, science, music, jokes. In Anticipatory Behavior in Adaptive Learning
Systems, pages 48–76. Springer, 2009.

[163] Anil K Seth. The cybernetic Bayesian brain. In Open MIND. Frankfurt am Main: MIND
Group, 2014.

[164] Ross D Shachter. Probabilistic inference and influence diagrams. Operations Research,
36(4):589–604, 1988.

[165] Glenn Shafer. A mathematical theory of evidence, volume 1. Princeton University
Press, 1976.

[166] Aaron P Shon, Joshua J Storz, and Rajesh PN Rao. Towards a real-time Bayesian imitation
system for a humanoid robot. In International Conference on Robotics and Automation
(ICRA), pages 2847–2852. IEEE, 2007.

[167] Tania Singer, Hugo D Critchley, and Kerstin Preuschoff. A common role of insula in
feelings, empathy and uncertainty. Trends in cognitive sciences, 13(8):334–340, 2009.

[168] A Mark Smith. Alhacen’s Theory of Visual Perception: A Critical Edition, with English
Translation and Commentary, of the First Three Books of Alhacen’s De Aspectibus, the
Medieval Latin Version of Ibn Al-Haytham’s Kitab Al-Manazir, volume 1. American
Philosophical Society, 2001.

[169] Randall C Smith and Peter Cheeseman. On the representation and estimation of spatial
uncertainty. The international journal of Robotics Research, 5(4):56–68, 1986.

[170] Jascha Sohl-Dickstein and Diederik P Kingma. Note on equivalence between recurrent
neural network time series models and variational Bayesian models. arXiv preprint
arXiv:1504.08025, 2015.

[171] MW Spratling. A review of predictive coding algorithms. Brain and cognition, 2016.

[172] Ian H Stevenson, Hugo L Fernandes, Iris Vilares, Kunlin Wei, and Konrad P Körding.
Bayesian integration and non-linear feedback control in a full-body motor task. PLoS
Comput Biol, 5(12):e1000629, 2009.

[173] Mikael Sunnåker, Alberto Giovanni Busetto, Elina Numminen, Jukka Corander, Matthieu
Foll, and Christophe Dessimoz. Approximate Bayesian computation. PLoS Comput Biol,
9(1):e1002803, 2013.

[174] Richard S Sutton and Andrew G Barto. Toward a modern theory of adaptive networks:
expectation and prediction. Psychological review, 88(2):135, 1981.

[175] Jun Tani. Learning to generate articulated behavior through the bottom-up and the
top-down interaction processes. Neural Networks, 16(1):11–23, 2003.

[176] Jun Tani, Masato Ito, and Yuuya Sugita. Self-organization of distributedly represented
multiple behavior schemata in a mirror system: reviews of robot experiments using
rnnpb. Neural Networks, 17(8):1273–1289, 2004.

[177] Hadley Tassinari, Todd E Hudson, and Michael S Landy. Combining priors and noisy
visual cues in a rapid pointing task. The Journal of neuroscience, 26(40):10154–10163,
2006.

[178] Joshua B Tenenbaum, Thomas L Griffiths, and Charles Kemp. Theory-based Bayesian
models of inductive learning and reasoning. Trends in cognitive sciences, 10(7):309–318,
2006.

[179] Kurt A Thoroughman and Reza Shadmehr. Learning of action through adaptive combi-
nation of motor primitives. Nature, 407(6805):742–747, 2000.

[180] Sebastian Thrun, Wolfram Burgard, and Dieter Fox. Probabilistic robotics. MIT press,
2005.

[181] Emanuel Todorov. Optimality principles in sensorimotor control. Nature neuroscience,
7(9):907–915, 2004.

[182] Emanuel Todorov and Michael I Jordan. Optimal feedback control as a theory of motor
coordination. Nature neuroscience, 5(11):1226–1235, 2002.

[183] Roger Tourangeau, Lance J Rips, and Kenneth Rasinski. The psychology of survey response.
Cambridge University Press, 2000.

[184] Marc Toussaint, Laurent Charlin, and Pascal Poupart. Hierarchical pomdp controller
optimization by likelihood maximization. In UAI, volume 24, pages 562–570, 2008.

[185] Marc Toussaint and Christian Goerick. A Bayesian view on motor control and planning.
In From Motor Learning to Interaction Learning in Robots, pages 227–252. Springer,
2010.

[186] Marc Toussaint and Amos Storkey. Probabilistic inference for solving discrete and
continuous state Markov decision processes. In Proceedings of the 23rd international
conference on Machine learning, pages 945–952. ACM, 2006.

[187] Jennifer S Trueblood and Jerome R Busemeyer. A quantum probability account of order
effects in inference. Cognitive science, 35(8):1518–1552, 2011.

[188] Jennifer S Trueblood and Jerome R Busemeyer. A quantum probability model of causal
reasoning. 2012.

[189] Paul Tuffield and Hugo Elias. The shadow robot mimics human actions. Industrial
Robot: An International Journal, 30(1):56–60, 2003.

[190] Amos Tversky and Daniel Kahneman. Availability: A heuristic for judging frequency and
probability. Cognitive psychology, 5(2):207–232, 1973.

[191] Amos Tversky and Daniel Kahneman. Judgment under uncertainty: Heuristics and
biases. Springer, 1975.

[192] Robert J van Beers. Motor learning is optimally tuned to the properties of motor noise.
Neuron, 63(3):406–417, 2009.

[193] Robert J van Beers, Anne C Sittig, and Jan J Denier van Der Gon. Integration of proprio-
ceptive and visual position-information: An experimentally supported model. Journal
of neurophysiology, 81(3):1355–1364, 1999.

[194] Herman van der Kooij, Ron Jacobs, Bart Koopman, and Frans van der Helm. An adaptive
model of sensory integration in a dynamic environment applied to human stance
control. Biological cybernetics, 84(2):103–115, 2001.

[195] David C Van Essen, Charles H Anderson, and Daniel J Felleman. Information processing
in the primate visual system: an integrated systems perspective. Science, 255(5043):419,
1992.

[196] Paul FMJ Verschure, Thomas Voegtlin, and Rodney J Douglas. Environmentally mediated
synergy between perception and behaviour in mobile robots. Nature, 425(6958):620–624,
2003.

[197] Iris Vilares and Konrad Kording. Bayesian models: the structure of the world, uncertainty,
behavior, and the brain. Annals of the New York Academy of Sciences, 1224(1):22–39,
2011.

[198] Hermann Von Helmholtz. Handbuch der physiologischen Optik, volume 9. Voss, 1867.

[199] Hermann von Helmholtz and James Powell Cocke Southall. Treatise on physiological
optics, volume 3. Courier Corporation, 2005.

[200] Apostolos Vourdas. Quantum probabilities as Dempster-Shafer probabilities in the
lattice of subspaces. arXiv preprint arXiv:1410.2044, 2014.

[201] Peter Walley. Towards a unified theory of imprecise probability. International Journal of
Approximate Reasoning, 24(2):125–148, 2000.

[202] Wei-Jen Wang, I-Fan Hsieh, and Chun-Chuan Chen. Accelerating computation of DCM
for ERP with GPU-based parallel strategy. In 9th International Conference on Autonomic
& Trusted Computing (UIC/ATC), pages 679–684. IEEE, 2012.

[203] Kunlin Wei and Konrad Körding. Uncertainty of feedback and state estimation deter-
mines the speed of motor adaptation. Frontiers in computational neuroscience, 4:11,
2010.

[204] Robert Wilson and Leif Finkel. A neural implementation of the kalman filter. In Advances
in neural information processing systems, pages 2062–2070, 2009.

[205] Daniel M Wolpert, Zoubin Ghahramani, and Michael I Jordan. An internal model for
sensorimotor integration. Science, 269(5232):1880, 1995.

[206] Daniel M Wolpert and Mitsuo Kawato. Multiple paired forward and inverse models for
motor control. Neural networks, 11(7):1317–1329, 1998.

[207] Florentin Wörgötter and Bernd Porr. Temporal sequence learning, prediction, and
control: a review of different models and their relation to biological mechanisms. Neural
Computation, 17(2):245–319, 2005.

[208] Ronald R Yager and Liping Liu. Classic works of the Dempster-Shafer theory of belief
functions, volume 219. Springer, 2008.

[209] Yuichi Yamashita and Jun Tani. Spontaneous prediction error generation in schizophre-
nia. PLoS One, 7(5):e37843, 2012.

[210] Fatemeh Yavari, Shirin Mahdavi, Farzad Towhidkhah, Mohammad-Ali Ahmadi-Pajouh,
Hamed Ekhtiari, and Mohammad Darainy. Cerebellum as a forward but not inverse
model in visuomotor adaptation task: a tDCS-based and modeling study. Experimental
brain research, 234(4):997–1012, 2016.

[211] Vyacheslav I Yukalov and Didier Sornette. Decision theory with prospect interference
and entanglement. Theory and Decision, 70(3):283–328, 2011.

[212] Semir Zeki and Stewart Shipp. The functional logic of cortical connections. Nature,
1988.
