
Defence Research and Development Canada
Recherche et développement pour la défense Canada

Perception Module for Autonomous Mobile Robotics
Perception Module Mk.I

I. MacKay
Denman Software Corp.

Contract Number: W7702-05R100/001/EDM


Contract Scientific Authority: D. Erickson (403-544-4048)

The scientific or technical validity of this Contract Report is entirely the responsibility of
the Contractor and the contents do not necessarily have the approval or endorsement of
the Department of National Defence of Canada.

Defence R&D Canada
Contract Report
DRDC Suffield CR 2012-121
December 2007
© Her Majesty the Queen as represented by the Minister of National Defence
© Sa majesté la reine représentée par le ministre de la Défense nationale
Abstract
The Perception Module (PM) is a scientific instrument to explore and demonstrate
binocular machine vision. The Perception Module's purpose is to capture binocular
imagery data from an articulated camera pair and transform that information into
distance estimates in a retino-centric reference frame. The Perception Module was
designed to be cost-effective, light, and robust; suitable for inclusion on military
robotics and scientific experiments. The module's potential application is wide and it
could be mounted to many platform types. The Perception Module is a component of
the nScorpion Explosive Ordnance Disposal (EOD) / Improvised Explosive Device
(IED) robot system. nScorpion (northern scorpion, Paruroctonus boreus) is an
advanced demonstration robot intended to advance the EOD/IED robotics state of the art.
The project goal is to demonstrate autonomy for current robotics and augment soldiers'
and EOD/IED technicians' capability. By implementing novel binocular vision
mechanisms and algorithms, the Perception Module will demonstrate novel techniques
to improve localization and the understanding of vision. This memorandum presents some of the
theory, justification, design, application, and improvement recommendations.

Résumé
Le Module de perception (MP) est un instrument scientifique visant à explorer la
vision artificielle binoculaire et en faire la démonstration; il capte une imagerie
binoculaire à partir de deux caméras articulées puis transforme ces données en
estimations de distance dans un cadre de référence rétinocentrique. Il a été conçu pour
être abordable, léger et robuste, approprié aux applications robotiques militaires et
expériences scientifiques. Ses applications potentielles sont vastes, et on peut le fixer à
bien des types de plates-formes. C’est l’un des composants du système robotique de
neutralisation des explosifs et munitions (NEM) et de lutte contre les dispositifs
explosifs de circonstance (C-IED) nScorpion (northern scorpion - paruroctonus
boreus), un robot de démonstration avancé destiné à faire des percées en NEM et C-
IED. Le projet nScorpion vise à démontrer l’autonomie des robots actuels et amplifier
les capacités des soldats et des techniciens en NEM et C-IED. En mettant en œuvre
des mécanismes et algorithmes novateurs de vision artificielle binoculaire, le Module
de perception fera la démonstration de nouvelles techniques visant à améliorer la
localisation et mieux comprendre la vision. Le présent document en décrit en partie les
théories, les justifications, la conception et les applications, et recommande certaines
améliorations.



Executive Summary

The Perception Module attempts to apply vision neuroscience, particularly foveal
binocular vision, to autonomous mobile robotics. Binocular vision is the general term
for vision using two eyes. Current machine vision research employs stereo vision
camera systems. Unlike stereo vision, which primarily uses stereopsis (disparity mapping),
the Perception Module will primarily use vergence eye movements in a
binocular vision system. The general form of binocular vision does not depend on
co-planar image planes to perceive. A binocular view is arrived at when the eyes are
rotated so that the subject of attention lies on the central axis of both retinas (on the
fovea in the case of animals with foveal vision). This binocular fixation is also known
generally as active vision or visual servoing. Humans possess foveal binocular vision
that delivers higher resolution sensor input when we aim our eyes at the same target.
The Perception Module will allow research into novel perception techniques and
improve understanding when the implemented algorithms are tested in real situations.

In Friedman[1], a simple experiment using monocular and binocular distance
estimation proves that we use extra information from binocular vision to judge
distances faster. Unlike stereo vision systems that make use of stereopsis primarily, the
general form of binocular vision does not depend on co-planar image planes that
facilitate disparity computation. It is believed that human eye movements[2] alternate
between visual fixations and saccades to move the foveas around to perceive objects of
interest. This is intuitive and can be demonstrated by anyone: hold up one hand at eye
level in your peripheral vision and attempt to perceive the hand detail; the view is
blurry. Compare the peripheral hand detail to the detail when your hand is in front of
your eyes at the visual field centre. Vergence is one physiological mechanism capable of
estimating a solution to the correspondence problem between binocular vision images,
because the extreme precision with which vergence eye movements can be controlled
[3] makes it possible to infer slight distance changes.

This report outlines the modified design of the Perception Module, describes questions
it should answer, and recommends system improvements. The inclusion of aural and
visual sensors makes queued perception experiments possible and could lead to
improved operational performance. The successful implementation onto an EOD robot
will make a number of important perception and localization experiments possible.
While this autonomy will not exceed human performance, it can remove some of the
positive control burden, a factor in current EOD operations [4], and demonstrate a
glimpse of a future when force-multiplied humans team with autonomous devices.



This memorandum also discusses in greater detail the role and importance of this type
of component in future autonomous systems. The Perception Module was designed to
be cost-effective, light, and robust; suitable for inclusion on military robotics and
scientific systems. The potential application of this module is wide; it could be
mounted to many different platform types. The target platform size range is 50 kg
and up. For larger vehicles it is conceivable to integrate several Perception Modules
to improve localization and situational awareness.
There are a number of improvements that should be undertaken before the system
integration is complete:

1. Separate the head from the neck, machine a new neck plate, and re-cable the
interface from the neck to the head;

2. Modify the head plate to include a spindle on the neck servo pulley to support the
rotation of the neck;

3. Replace the Flea cameras with Flea2 cameras to improve camera stability and
update frequency;

4. Add limit switches to the boundaries of eye motion to prevent damage and improve
calibration.



Sommaire

Le Module de perception cherche à appliquer à la vision artificielle des robots


autonomes les connaissances acquises en neurologie de la vision, notamment la vision
fovéale binoculaire. Ce terme, vision binoculaire, désigne la vision à l’aide de deux
yeux. Dans les systèmes actuels de vision artificielle, on utilise deux caméras pour
donner une vision stéréoscopique. Contrairement à ces systèmes, où la stéréoscopie
sert surtout à cerner les différences entre la vision des deux caméras, le Module de
perception utilisera surtout les mouvements oculaires vergents dans un système de
vision binoculaire. En règle générale, la vision binoculaire n’a pas besoin pour
percevoir les objets de deux images coplanaires (plan identique). On obtient plutôt une
vision binoculaire si on tourne les yeux de façon à centrer l’objet d’intérêt sur l’axe des
deux rétines; pour les animaux dotés de vision fovéale, il s’agit de la fossette centrale,
ou fovéa. On appelle aussi cette fixation binoculaire « vision active ». Nous en sommes
dotés; par conséquent, la résolution de notre vision est optimale si nous pointons nos
deux yeux sur la même cible. Le Module de perception rendra possibles des recherches
sur des techniques de perception novatrices et permettra de mieux comprendre les
enjeux à l’application d’algorithmes à des situations réelles.

Friedman[1] a démontré par une expérience simple (vision monoculaire et estimation


binoculaire de la distance) que nous utilisons l’information additionnelle fournie par la
vision binoculaire pour jauger plus vite les distances. Contrairement aux systèmes de
vision stéréoscopique, les systèmes de vision binoculaire en général ne dépendent pas
d’images coplanaires qui facilitent la détection des différences. Certains pensent que
les mouvements oculaires chez l’être humain[2] alternent entre une vision fixe et des
mouvements saccadés pour déplacer la fovéa autour de l’objet d’intérêt. C’est une
observation très intuitive que tous peuvent vérifier. Placez une main à la hauteur des
yeux, mais dans votre vision périphérique et essayez de distinguer les détails : l’image
est floue. Comparez maintenant cela aux détails si vous placez votre main au centre de
votre champ de vision. Le phénomène physiologique qu’est la vergence peut
approximer une solution au problème de convergence entre deux images binoculaire,
car la grande précision avec laquelle le cerveau peut contrôler les mouvements
oculaires vergents [3] permet de dégager de légers changements de distance.

Le document décrit les modifications à la conception du Module, décrit les questions


auxquelles il devrait pouvoir répondre, et recommande d’autres améliorations à ce
système. Y ajouter des capteurs sonores et visuels permettrait de mener des
expériences en perception séquentielle et pourrait aussi améliorer les performances du
Module, et le mettre en œuvre sur robot NEM permettrait de mener beaucoup
d’expériences importantes en perception et en localisation. Sans dépasser les capacités
humaines, un robot autonome permettra d’alléger le contrôle intégral des robots (un
facteur dans les opérations actuelles de NEM) [4], et donnera un aperçu d’un avenir où
des humains aux capacités augmentées font équipe avec des appareils autonomes.

Le document décrit aussi en détail le rôle et l’importance de ce type d’élément dans les
systèmes autonomes de l’avenir. Le Module de perception a été conçu pour être
abordable, léger et robuste, approprié aux applications robotiques militaires et
expériences scientifiques. Ses applications potentielles sont vastes, et on peut le fixer
à bien des types de plates-formes; on vise en fait celles de 50 kg et plus. Sur des
véhicules plus importants, on pourrait utiliser et intégrer plus d’un Module afin
d’améliorer la localisation et la connaissance de la situation.

Il faudrait apporter plusieurs améliorations avant de pouvoir bien intégrer ce


système :

1. séparer la tête du cou, fabriquer une nouvelle plaque pour le cou et


rebrancher l’interface entre la tête et le cou;
2. modifier la plaque de la tête pour ajouter un axe à la poulie de
servocommande du cou, pour faciliter la rotation du cou;
3. remplacer les caméras Flea par les modèles Flea 2, ce qui améliorerait la
stabilité des caméras et la fréquence de rafraîchissement des images;
4. ajouter des contacts de fin de course à la limite des mouvements oculaires, ce
qui préviendrait les dommages et améliorerait l’étalonnage.



Table of contents

Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . i

Résumé . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . i

Executive Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ii

Sommaire . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iv

Table of contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vi

List of figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viii

List of tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix

1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

2. Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

3. Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

3.1 Hardware Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

3.1.1 Neck Computer . . . . . . . . . . . . . . . . . . . . . . . . . . 12

3.1.2 Neck Housing . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

3.1.3 Head Housing . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

3.1.4 Head Rotate Mechanism. . . . . . . . . . . . . . . . . . . . . . 15

3.1.5 Camera Pan/Tilt Mechanism. . . . . . . . . . . . . . . . . . . . 16

3.1.6 Cameras and lenses. . . . . . . . . . . . . . . . . . . . . . . . 16

3.1.7 Audio Input. . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

3.1.8 Head controller. . . . . . . . . . . . . . . . . . . . . . . . . . . 18

3.1.9 Power system. . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

3.2 Software Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

3.2.1 Software Development Toolchain . . . . . . . . . . . . . . . . 19

3.2.2 Low-level Hardware drivers . . . . . . . . . . . . . . . . . . . 19

3.2.3 Servo Motor Control . . . . . . . . . . . . . . . . . . . . . . . 19



3.2.4 Optical Encoders . . . . . . . . . . . . . . . . . . . . . . . . . 20

3.2.5 Eyemotor Objects . . . . . . . . . . . . . . . . . . . . . . . . . 20

3.2.6 Communications handling of parameters, control, and reporting. 21

3.2.7 RTEMS Eyemotor tasks . . . . . . . . . . . . . . . . . . . . . 21

3.2.8 RTEMS Communication task . . . . . . . . . . . . . . . . . . . 21

3.2.9 RTEMS Background task . . . . . . . . . . . . . . . . . . . . . 22

3.2.10 Demonstration Embedded application . . . . . . . . . . . . . . 22

4. Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22

5. Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30

References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

Annexes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

A Technical Specifications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

A.1 Perception Module . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

A.2 Main Computer (Neck) . . . . . . . . . . . . . . . . . . . . . . . . . . 33

A.3 Controller (Head) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

A.4 Cameras . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

A.5 Lenses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34

B Bill of Materials (BOM) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

C Assembly Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39

C.1 Eye camera mounts, Encoder, and Servo Motor . . . . . . . . . . . . . 39

C.2 Neck assembly . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39

C.3 Head assembly . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40

C.4 Head to Neck Assembly . . . . . . . . . . . . . . . . . . . . . . . . . 41



D Demonstration Application . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42

D.1 Demonstration Operation Mode. . . . . . . . . . . . . . . . . . . . . . 42

D.1.1 Eye/Head operations . . . . . . . . . . . . . . . . . . . . . . . 42

D.1.2 Calibration . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43

E Message Format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44

F List of abbreviations/acronyms/initialisms . . . . . . . . . . . . . . . . . . . . . 45

G Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49

H Definitions (from Merriam-Webster) . . . . . . . . . . . . . . . . . . . . . . . . 51

List of figures

Figure 1. Barr-Stroud naval range finder . . . . . . . . . . . . . . . . . . . . . . . . . . 2

Figure 2. The four main image intensity factors affecting sensed luminance . . . . . . . 4

Figure 3. Simplified pinhole camera visual fixation estimating distance from vergence . . 7

Figure 4. Distance estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

Figure 5. Original KTH Binocular Vision System taken from Mertschat [7] . . . . . . . 9

Figure 6. Perception Module Mark. I . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

Figure 7. Head assembly of Perception Module . . . . . . . . . . . . . . . . . . . . . . 11

Figure 8. Neck Computer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

Figure 9. Neck assembly of Perception Module . . . . . . . . . . . . . . . . . . . . . . 14

Figure 10. Left eye (left) and right eye (right) images from Perception Module . . . . . . 17

Figure 11. Audio Oscilloscope Demonstration Program . . . . . . . . . . . . . . . . . . 23

Figure 12. Demonstration GUI Parameters Tab . . . . . . . . . . . . . . . . . . . . . . . 24

Figure 13. Demonstration GUI Configuration Tab . . . . . . . . . . . . . . . . . . . . . 25

Figure 14. Demonstration GUI Controls Tab . . . . . . . . . . . . . . . . . . . . . . . . 26



List of tables

Table B.1. Head Part BOM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

Table B.2. Head Parts BOM (contd.) . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

Table B.3. Head Parts BOM (contd.) . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36

Table B.4. Neck Parts BOM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37

Table B.5. Fastener List . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38

Table E.2. Message Format Table . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44





1. Introduction

The Perception Module is a component in the overall design of the nScorpion EOD
robot system. nScorpion is an advanced demonstration robot intended to advance the
state of the art in EOD robotics. The goal of this research project is to demonstrate a
higher degree of autonomy for current robotics and augment the capability of soldiers
or EOD/IED technicians. By improving overall autonomy, nScorpion intends to show
that humans can supervise individual or teams of robots and in some cases remain one
step back from dangerous situations. While this autonomy will not exceed human
performance, it can remove some of the positive control burden, a factor in current
EOD operations [5] and demonstrate a glimpse of a future when force-multiplied
humans team with autonomous devices.

In order to advance the ability of the current EOD robotics, a quantum leap in sensing
and computation is required as well as a paradigm shift away from the pure positive
control teleoperation philosophy. Delegating autonomy/authority down to the machine
for certain tasks makes it practical, in terms of bandwidth and human situational
awareness, to control an arbitrary number of autonomous robots by one operator.
Autonomous operation implies that the machine itself would need a sensor system
capable of distinguishing objects as small as EOD targets, like the 68mm diameter
PMA-3 AP mine, in a simple- to moderate- complexity environment. The largest
bandwidth sensor to date is an electro-optic camera; time-of-flight sensor arrays alone
are not sufficient for volumetric mapping in these environments. Therefore, a visual
system capable of processing many frames per second is required at a minimum. It
must be able to give precise pose estimates for objects of interest below 10 cm in size
and rapidly adjust this estimate in a volumetric sense so that the nScorpion may move
to intercept or manipulate. Since it must be autonomous, the computing power to
conduct data fusion levels 0 (pre-processing) through 3 (threat assessment) on the
sensor data, as defined by the JDL definitions [6], must be onboard.

Modularity, flexibility, robustness, and cost-effectiveness were the most important
design factors applied to this system. The overall design was partitioned into several
pieces, and each was required to be a self-contained module, or a system of modules, that
could be interchanged or pulled outright from the design and used elsewhere as needed.
This mandate will allow design reuse, and equivalent experimentation and operation by
simply moving the modules to other platforms or situations. This also made it possible
to demonstrate a number of side issues: first, that multiple contractors possess the skills
and experience to build such modules independently; and second, that hardware has
become a commodity and can be left to the private sector where practical.

The current state of the art in machine vision generally employs co-planar stereo vision
sensors implementing, among other techniques, disparity mapping. Disparity mapping
computes volumetric data from the difference between the images (left/right) of objects
that can be seen in both. Beyond the difficulties in calibrating intrinsic and
extrinsic parameters, a disparity mapping / stereo vision system is computationally complex.

Figure 1: Barr-Stroud naval range finder

Instead, the Perception Module proposes to employ selective perception,
where the eyes focus on an interrogated object. This technique is not new; Figure 1
demonstrates an historical equivalent, a Barr-Stroud rangefinder used for naval gun
targeting. This rangefinder uses the overlap of two images of the target; a human
operator looking through the eyepiece fuses those images to produce a distance
estimate. In this case, the human visual system is slaved to an instrument and together
this system perceives distance to the target using triangulation.

The Perception Module uses a binocular vision system that can estimate distance using
eye vergence techniques. The eye vergence method estimates distance using the
difference in angles (vergence) measured from the eyes as they focus on the same target
object. Essentially, rather than solving a large matrix inversion problem, this method
relies on the opto-mechanical system to move the eyes into a solution. With the eye
vergence approach, a solution can be reached locally, without relying on the precise
parameters of the cameras, though it is not necessarily a global one. To compensate for errors in the
estimate, a solution can be obtained by adjusting image overlap and focus. The main
disadvantages of this method are that it requires multiple high-speed visual fixation
movements to generate estimates, it does not give an estimate for objects outside the
selective area of interrogation, and accuracy will probably be lower than disparity
mapping. Each of these will need to be overcome to meet real-time requirements. This
selective perception activity may be beyond the mechanical system specifications for
larger search areas and at higher robot speeds, but it is expected to be sufficiently fast
for small areas (roughly 30 m² search area for the demonstration experiment) and robot
speeds below 10 kph.

The Perception Module requires a system of sensors and computing that could meet the
onerous real-time machine vision, world modelling, localization, and control requirements set out in
this project. Given that no binocular vision design was available at DRDC Suffield, a baseline
was selected to reduce effort. The Perception Module design is based on the “KTH
head”, a binocular vision system developed at the Royal Institute of Technology (KTH) in Stockholm
by Oliver Mertschat et al. [7]. The KTH head design demonstrated modular, cost-effective, and
flexible design qualities that were well suited to the nScorpion EOD system. Figure 5
details the original KTH binocular vision system. The purpose of Mertschat’s binocular
vision system was to provide a vision based sensor system for the control/operation of
small inexpensive mobile robots made of Commercial Off The Shelf (COTS)
components. The advantage of adopting this design as a baseline was that it made it
possible to also conduct disparity mapping if and when required to augment the
selective perception of the vergence technique. Unlike the KTH head, the Perception
Module is destined to be mounted above the robot platform and so it has improved
look-up and look-down capability. This allows it to look at the platform and look at
small objects near the front of the platform for manipulation. A number of additional
electronics and mechanical improvements undertaken at the same time are detailed in
Section 3.

This memorandum outlines the Perception Module and discusses in greater detail the
role and importance of this type of component in future autonomous systems. Section
2. presents a brief review of the theory. Section 3. describes the design of the
Perception Module. Section 4. comments on the evaluation of the design in its current
form. Section 5. summarizes the capabilities and outlines some improvements to the
Perception Module.

2. Theory

This memorandum has space only to briefly, and unjustly, outline vision theory.
According to Aristotle, seeing is knowing what is where. According to Marr[3], vision
is an information processing task conducted by the visual system operating on three
levels. There are many great references; general references are Pinker[8], Hubel[2],
Marr[3], Grimson[9], and Friedman[1]. Some references on the neurocomputational
theory of mind include Pinker[8] and Churchland[10]. Mertschat [7] presented a
general review of human vision. References [7], [2], [3] describe eye physiology.
Stereopsis and stereo vision have been studied since Wheatstone[11] and others
[12][13], modelled in detail [14], and applied [15-17], even in aural disparity [18].
More specific visual servoing / active vision / foveated vision references are [19-25].

Figure 2: The four main image intensity factors affecting sensed luminance

The general idea is that vision is the result of the eye’s and brain’s neural activity
processes arrived at by multiple parallel computing systems (or Parallel Distributed
Processing PDP [10]) based on an understanding of real objects’ luminance properties.
Figure 2 describes the four main factors that affect the luminance sensed on a retina or
a camera image plane from objects in the field of view; they are illumination, surface
luminance, object geometry, and viewpoint. Refer to Annex H for word definitions.

In Friedman[1], a simple experiment using monocular and binocular distance
estimation proves that we use extra information from binocular vision to judge
distances faster. Binocular vision is the general term for vision using two eyes. Unlike
stereo vision systems that use stereopsis primarily, the general form of binocular
vision does not depend on co-planar image planes that facilitate disparity computation.
Subconsciously, we know where objects are in our local perceptual space; roughly if
we focus elsewhere and use our lower resolution peripheral vision, or accurately if we
use our higher resolution foveal vision. Marr [3] hypothesizes (Marr [3] p. 148-149)
that distance estimates to subject objects lie implicitly in our visual system’s neuronal
activations, and Hubel [2] classified 7 groups of binocular cells from several animal
cortices that respond to activations from the ipsilateral (same-side) eye, the
contralateral (opposite-side) eye, or both.

It is believed that human eye movements [2] alternate between visual fixations and
saccades to move the foveas (the higher resolution colour cone cell area of the retinas [26];
see Annex H) around to perceive objects of interest. This is intuitive and can be
demonstrated by anyone: hold up one hand at eye level in your peripheral vision and
attempt to perceive the hand detail; the view is blurry. Compare the peripheral hand detail
to the detail when your hand is in front of your eyes at the visual field centre. Hubel
[2] described the process succinctly:

“First, you might expect that in exploring our visual surroundings we let
our eyes freely rove around in smooth, continuous movement. What our two
eyes in fact do is fixate on an object: we adjust the positions of our eyes so that
the images of the object fall on the two foveas; then we hold that position for a
brief period, say, half a second; then our eyes suddenly jump to a new position
by fixating on a new target whose presence in our visual field has asserted
itself, either by moving slightly, by contrasting with the background, or by
presenting an interesting shape.” - Hubel [2] p. 79.

Saccades and micro-saccades are the simultaneous movements of the eyes (or station-keeping
with respect to the object in view and head/body movement) that coalesce the retinal/foveal
imagery into a single binocular view. Micro-saccades are minuscule movements (approximately 1
to 2 arc minutes, imperceptible to the eye [2]) believed to refresh the cone
and rod cell image; these cells respond to changes in luminance and need movement or
the image is lost. Fixation is the maintenance of gaze in a constant direction; in the case
of vergence it is the visual fixation on a subject object to produce a binocular view. To
look at a closer object, the eyes rotate 'towards each other' (convergence), while for
an object farther away they rotate 'away from each other' (divergence). Vergence is one
physiological mechanism capable of estimating a solution to the correspondence
problem between binocular vision images, because the extreme precision with which
vergence eye movements can be controlled [3] makes it possible to infer slight distance
changes. The difficulty lies in determining the true correspondence between images.
Fukushima et al. [27] proposed that primates with frontal eyes use vergence
movements (eyes rotating in opposite directions) to track small objects moving towards or
away from them and the smooth pursuit system (eyes rotating in the same direction) to track
small objects in frontal pursuit. Without duplicating vision entirely, the fact that
distance estimation from visual processing is unconscious and yet accurate suggests
that autonomous machines could attain real-time localization using such techniques.

After visual information is captured and pre-processed, a number of decoding
processes are used to fuse this raw data for vision. Marr [3] describes some of the
known decoding processes as:

“(1) stereopsis, (2) directional selectivity, (3) structure from apparent
motion, (4) depth from optical flow, (5) surface orientation from surface
contours, (6) surface orientation from surface texture, (7) shape from shading,
(8) photometric stereo, and (9) lightness and colour as an approximation to
reflectance” - Marr [3] p. 103.

The Perception Module attempts to apply vision neuroscience, particularly foveal
binocular vision. A binocular view is arrived at when the eyes are rotated around their
axes so that the subject object of attention (this is also referred to as selective perception) lies
on the central axis of both retinas (the fovea in the case of animals with foveal vision).

Future papers will describe the mathematics in proper detail. For the purpose of
presenting the vergence technique, an oversimplified 2D scenario is presented.
Consider Figure 3, which describes a simplified pinhole-camera-based binocular vision
system. The cameras are assumed to be coplanar with the target object, which is in the
frontal field of view of both cameras. Let us suppose that E_L represents the left eye and
E_R represents the right eye as pinhole cameras with a flat 2D image plane parallel to the
3D Subject object in view. O_L and O_R represent the left and right eye origins (which
would be the respective foveal origins in the human eye [2]). Angles A and B represent
the rotated angle from the perpendicular to the image plane for the left and right eyes
respectively. Distance b represents the variable baseline between the optical axes and
distance d represents an estimated distance to a point on the 3D subject of the fixation.
It could be argued that baseline b is fixed, for further approximation. Using saccades
and micro-saccades, the angles A and B are adjusted until there exists agreement in the
visual system that the Subject lies in the optical centre of both eyes. The line c
bisecting the Subject is co-planar with the lines c' bisecting the image planes of the left and
right eyes. The image appearing on the image planes will be reversed and upside down.
The line c intersects the Subject at the estimated distance d from the binocular baseline b.
In reality, there would be a slight skew and misalignment of the left eye vs. right eye
image planes in biological eyes as well as in the Charge Coupled Device (CCD)
electro-optical cameras used in the Perception Module, owing to the intrinsic and
extrinsic parameters of the camera/eye. But in this ideal model it is assumed that the
eyes' image planes are perfectly co-planar and therefore c, c'_L, and c'_R are co-planar.
l'_L (the image vertical centre line of the left eye) and l'_R (the image vertical centre line of the right eye) are
adjusted to correspond with l (the Subject object centre line). The Perception Module
collectively tilts the eyes so that, in general terms, this will be the case post-calibration.

Both eyes are able to rotate around a vertical optical axis (out of the paper) co-located
with O_L and O_R respectively. It is assumed that the optical axis and the rotation axis
are identical, unlike the human eye, where the optical origin and the foveal origin are
not. The fixation movement attempts to align the subject vertical centre line l with
l'_L and l'_R to infer distance d.

Figure 3: Simplified pinhole camera visual fixation estimating distance from vergence

Figure 4 presents a simplified triangular distance
estimate. Distance d must then rely on the accuracy of the correspondence of the left and
right horizontal image plane axes to the subject axis. If they do not correspond, the
distance estimate will not hold. The Subject will appear transformed (perhaps scaled) in
the images on the left and right eyes' image planes. Due to parallax, the image pixels in
the left and right corresponding image planes will not be identical.

Given the above assumptions, the known angles A and B, and the known/measured
baseline b, we can determine d. The trivial case is presented, where the subject object is
within the baseline plane b and directly in front of the eyes. Distance d can then be inferred
from b, A, and B:

d = b (tan A tan B) / (tan A + tan B)
The point x along baseline b at which the line of length d intersects the baseline can also be
determined. In general, it will not bisect baseline b unless angles A and B are identical. The
following equation determines where the intersection point is:

x = b (tan A) / (tan A + tan B)

This basic triangulation technique can be used to estimate distances to subjects as the
robot localizes itself.
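To make the arithmetic concrete, the sketch below simply evaluates the two expressions above for a nominal baseline and angle pair. It is a worked example only; the sample values (a 0.2 m baseline and vergence angles of 80 and 82 degrees) are assumptions for illustration, not Perception Module measurements.

#include <math.h>
#include <stdio.h>

/* Estimate distance d and the baseline intersection point x from the
 * vergence angles A and B (radians) and the baseline b (metres),
 * using the triangulation equations given in Section 2. */
static void vergence_estimate(double b, double A, double B,
                              double *d, double *x)
{
    double tA = tan(A);
    double tB = tan(B);

    *d = b * (tA * tB) / (tA + tB);  /* estimated distance            */
    *x = b * tA / (tA + tB);         /* intersection along baseline b */
}

int main(void)
{
    const double deg = 3.14159265358979 / 180.0;  /* degrees to radians */
    double d, x;

    /* Assumed sample values: 0.2 m baseline, 80 and 82 degree angles. */
    vergence_estimate(0.20, 80.0 * deg, 82.0 * deg, &d, &x);
    printf("d = %.3f m, x = %.3f m\n", d, x);
    return 0;
}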



Figure 4: Distance estimation

3. Design
This section outlines the design of the Perception Module. DRDC did not have a
current binocular vision system in use or developed in-house; therefore, an external
baseline design was chosen. Refer to Figure 5 for the KTH design baseline. Refer to
Figure 6 for a front view of the assembled Perception Module. The Perception Module
consists of two sub-components, the head and the neck. The head houses 2 cameras
mounted in an actuated pan/tilt assembly on a rotating neck. This actuated pan/tilt
assembly gives the head 4 degrees of freedom (DOF) to adjust the eyes in relation to
the Perception Module origin. In addition, three microphones are located at 120 degree
intervals around the plastic case for aural sensing. The cameras and the microphones
are interfaced to a computer in the neck housing. The head’s components are controlled
by a micro-controller located in the head. The microphones and eyes are cabled to the
neck computer for processing. The head controller and neck computer communicate to
one another via RS-232 and CANBus interfaces. The overall assembly is enclosed in
an outdoor splash- and dust-proof housing that meets the IP54 rating.

The following sections describe the hardware and software respectively. Further
hardware details are available in the attached annexes. Annex A describes the technical
specifications of the Perception Module Mk.I. Annex B contains the bill of materials
(BOM) for the design. Annex C describes the assembly instructions. Section 3.1
describes the hardware design and Section 3.2 describes the software design.




Figure 5: Original KTH Binocular Vision System taken from Mertschat [7]
(a) Axonometric 3D model (top) from the front cover / Figure 8 of [7] and (b) mechanical drawing
(bottom) from Figure 10 of [7]



Figure 6: Perception Module Mark. I
(Front view on test stand.) The head portion is located under the transparent plastic cover in
the top half of the image. The neck portion is the white metal computer container in the
bottom half of the image. The two camera lenses inside the head portion (image centre) are
the front end of the Flea cameras.



Figure 7: Head assembly of Perception Module
(Top down view with plastic cover removed.) The eyes can be seen at the lower edge of
the image, the controller in the top right, neck sleeve centre, and servos/encoders are at
various positions on the aluminium plate.



3.1 Hardware Design
3.1.1 Neck Computer

The Neck PC (shown in Figure 8) is a PC/104 Advanced Digital Logic
ADL855-based P4M running at 1.4 GHz. It is powered by a PC/104 DC-DC
converter and has the dual Firewire (IEEE 1394) interface card and CANbus
interface card attached. The neck computer can operate on a 36 V DC input
supplied by the EOD robot.

This sealed housing has no fans and uses conduction cooling to transfer heat to
the aluminum housing. The heavy aluminum conduction plates transfer
processor heat to the housing exterior. This conduction keeps the normal CPU
temperature down at ambient temperatures above 20 degrees C. The CPU core temperature must
stay below 100 degrees C. Extreme thermal conditions can occur when the
CPU is operating near 100% load inside the sealed container. This poses a serious
operating restriction on the proper function of the computer and became the
critical risk for the Perception Module. The conduction heat sink has proven
to dissipate heat fast enough that the computer does not shut down when
operating at normal ambient temperatures. Elevated-temperature operation
testing has not been done at this point. The CPU speed or the update
frequency could be tuned to alleviate this condition by reducing processor
cycles. The housing has been powder coated white to minimize self-heating in
a sunny environment.

3.1.2 Neck Housing

The neck electronics housing is a Commercial Off The Shelf (COTS)
VersaTainer PC/104 computer container. The housing itself is strong enough
to carry the head mass and therefore acts as the neck without further
reinforcement. The neck connects to the head using a four-point contact
bearing located at the top of the housing. This housing is sealed with no air
intake or exhaust. The completed VersaTainer can handle 4 cards in the stack.
As can be seen in Figure 8, the computer stack and harnesses do not leave
much open space. Figure 9 shows a top-down view of the housing without
cables attached. The hard drive fits down one vent space; the lithium battery
and the cables fit into the remaining spaces.

This neck housing is sealed to meet IP54 requirements. The IP54 rating
defines protection against dust, stray thin metallic wires, and splashing of
water against the enclosure.

The 36V DC Power and CANBus interface cables exit the neck housing
bottom plate. An alternate bottom plate used during development allows
access to all onboard device ports: keyboard, mouse, video, USB, and IDE.
This development bottom plate is not IP54 rated. The neck bearing hole holds the
two Firewire camera cables, the head controller CANbus/RS-232 interface
cable, three microphone cables, and the power cable.

Figure 8: Neck Computer
(Removed from neck housing.) Note the aluminium conductive plate on the far side of the
PC/104 CPU stack.

3.1.3 Head Housing

The head is based on a 1/8-inch aluminum plate. The electronics, servos and
camera assembly are mounted on the plate similar to [7]. A spherical acrylic
dome was considered for covering the camera field of view but has been
rejected for several reasons:

1. The dome surface would interfere with the camera lenses when tilted at an
upward viewing angle; and

2. It would require approximately 6 inches above the base plate to cover the
perimeter yet only 4 to 5 inches are available.

The cylindrical acrylic cover is a 6-inch high cylinder with a solid end plate
on top. There is a small joint between the top and sides which is visible as a
small elliptical discontinuity in the camera images. This aberration could be
used as a calibration guide for imagery.

Figure 9: Neck assembly of Perception Module
(Top-down view from the head.) The container for the PC/104 stack; the neck computer is
visible in the centre of the image.

The cylindrical section extends approximately 1 inch below the main plate, providing a skirt to protect the
neck joint area and to allow the window to extend for the cameras to look
down. The cylinder is mounted at six points below the head plate and flush
with the gasket. The cover can be removed to provide easy access to the
cabling and head mounting bolts while the head is attached to the neck. The
disadvantage of the acrylic cover is that glare from behind the cameras reflects
back onto the front cover and distorts the image. A cloth cover was added to
the rear portion to shield glare from rear light sources.

The gasket below the main plate provides a seal to the IP54 rating specified
and prevents damage to the cameras, servos, and cables. Thermal dissipation is
not an expected problem with the electro-mechanical components in the head.
By using low-power devices and high-efficiency switching power supplies
there should be minimal heat dissipation required. The placement of
components on the head was planned and calculated to provide a statically
balanced head. The offset is less than 100 g·cm in the horizontal plane.
Since the bearing race diameter is 4 cm, this should prevent any undue
wear or binding on the neck bearing.

3.1.4 Head Rotate Mechanism.

The Head rotate mechanism follows the KTH reference design with a few
modifications. An idler pulley was added to the encoder pulley for position
feedback. The pulley widths and sizes were increased to help use the motor
torque without belt slip over the pulley’s teeth. The neck sail winch servo
(HITEC HS-785HB) rotates the head on the neck bearing. This is a functional
replacement for the obsolete servo mentioned in [7].

The neck rotation gear ratio is 30/48 instead of 36/48 in the KTH design[7].
This restricts total head rotation to 788 degrees (30 / 48 * 3.5 * 360). The
original KTH neck rotation of 945 degrees is more than required for 360+
degrees as specified. This gear reduction increased torque and will rotate the
head faster under load. The drive belt is now a 9mm wide belt instead of the
6mm belt used at KTH. The mounting of the pulley on the servo shaft prevents
using high tensions on the belt as the pulley and shaft effectively bend.

The USDigital H1 encoder attached to the idler provides 2000 counts per
revolution, or 4000 counts per head revolution. This is beyond the requirements,
since the servo control resolution is not likely to be better than 1/1000. The encoder has
an index pulse for identifying the head origin (forward) position. Since two
revolutions of the encoder occur for one revolution of the head, a start-up
rotation sequence is required to correctly identify the proper index
pulse. An absolute index pulse is important for the vergence method.
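For reference, the rotation range and encoder resolution quoted above follow directly from the stated figures; the short sketch below reproduces that arithmetic and introduces no new design data.

#include <stdio.h>

/* Worked example of the head-rotation arithmetic described above.
 * The 30/48 pulley ratio, 3.5 turns of servo travel, 2000 encoder counts
 * per revolution, and two encoder revolutions per head revolution are
 * taken from the text; everything else is derived. */
int main(void)
{
    const double pulley_ratio   = 30.0 / 48.0;
    const double servo_turns    = 3.5;
    const double head_degrees   = pulley_ratio * servo_turns * 360.0; /* 787.5 */

    const double enc_counts_rev = 2000.0;
    const double enc_revs_head  = 2.0;
    const double counts_per_deg = enc_counts_rev * enc_revs_head / 360.0;

    printf("head rotation range: %.1f degrees\n", head_degrees);
    printf("encoder resolution : %.1f counts per degree of head rotation\n",
           counts_per_deg);
    return 0;
}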

The bearing used is a four-point contact bearing (Kaydon part number
JHA15XL0). Reference [7] describes in detail the requirements for the bearing. It is
important that the load rotates evenly and smoothly to reduce noise in the
image processing.

3.1.5 Camera Pan/Tilt Mechanism.

The camera pan/tilt mechanism follows the KTH reference design [7] adapted
for Point Grey Research (PGR) Flea cameras and encoders for position
feedback. Each eye camera (Flea) can independently pan approximately ± 38
degrees and collectively tilt ± 40 degrees. These motions allow the eyes
to verge onto objects. The first points of contact of the moving mechanism with
the acrylic cover are the front lower corners of the tilt bar. These corners have
been rounded off to prevent rubbing.

The base has only 2 small machine screws for fastening to mounting plates.
The location of these screws precludes having the pivot point right at the
image sensor plane or focal point. The camera cable connectors protrude
significantly out the camera back end (see Figure 7) and restrict eye
placement and movement. The connectors required modification to allow for
full movement and non-binding operation.

The camera servos (HITEC HS-925MG) control the pan and tilt functions.
These servos are high-powered and fast, and are functional
replacements for the servos mentioned in [7]. Performance tests showed these
servos to be almost violent in their motions.

The encoders attached to the camera axles (USDigital E2) and to the tilt pivot
servo (USDigital H1) provide 2000 counts per revolution. This delivers
approximately 5.6 counts per degree. They have an index pulse for identifying
the eyes-centred and level position as an absolute encoder position.

Dual ball bearings are used on all pivot axles to support and align the shafts
and provide smooth motions. Like the head rotate motion, the pan and tilt
must be smooth to reduce noise in image processing.

3.1.6 Cameras and lenses.

The modified DRDC Perception Module design required electro-mechanical
zoom lenses affixed to the eyes. After investigating the various add-on zoom
lenses available for C/CS mounts, we found that the smallest were approximately
80 mm long with a mass of 300 g. Attaching zoom lenses to the front of the
cameras and pivoting them around the image plane would create several
mechanical problems in balance and control. The increase in inertia would
require a change to the gear motor and mounting assembly. The length of the
lenses would push the pivot back towards the centre of the head instead of
towards the edge as desired. At the desired tilt angles it would be
extremely difficult to package these within the overall size constraints
specified in the request for proposal (RFP).



Figure 10: Left eye (left) and right eye (right) images from Perception Module
Images are uncalibrated and shown at 20% image size. Images are 1024x768 (72 dpi) from a
Sony 1/3-inch CCD using 12-bit A/D conversion (4096 colour). Note the different objects that
overlap in left/right images (i.e. the cabinet) and the object appearing only in one view (i.e.
the chair). Note the changing luminance (particularly the cabinet and wall) when comparing left and
right images.

We are using a high-resolution color version of PGR's Flea camera with
digital zoom capability. This would give a zoom factor of approximately 3.5.
This stays close to the proposed reference design without repackaging the
system to meet the size and environmental constraints.

The Flea has the primary image dimensions of 1024 W x 768 H pixels at
12-bit resolution and several operating modes where the user (computer) can
select binning and custom image sizes to specify regions of interest. This will
allow a zoom in/out capability completely in software. After trying several
proposed lenses, the 8 mm F1.6 CS lenses were found to have good depth of field and minimal
distortion around the edges. These lenses have manual focus and are
approximately 30 mm long. The size and angle of the Firewire connectors on
the Flea cameras required significant labour in adjusting the cabling and
routing so the cameras have full motion without drag as well as clearance
inside the neck.

3.1.7 Audio Input.

Three microphones are placed equidistant around the head cover, halfway
between the base plate and the top cover, and are connected to the neck
computer for processing. The microphones chosen are directional noise
canceling microphones. They are mounted in rubber grommets to seal and
provide dampening for mechanical noise from the robot. Cabling will go
directly in to the VersaTainer and the sound input of the CPU board. The
sound controller on the CPU board converts 2 audio inputs without mixing
them so an additional USB Sound card is added for the rear microphone
channel. Minimal work was done to evaluate sound levels from the different
microphones so far.

3.1.8 Head controller.

The head controller is a Motorola PowerPC-based MPC555 microcontroller
board capable of the various pulse-width modulation (PWM), timer
input/output compare, digital, and analog inputs from the electromechanical
devices inside the head. The SS555 controller is mounted in the head behind
the left camera (where KTH Camera circuit board was located). Power, serial,
and CANBus cables are routed to the neck via a plastic neck sleeve. The
BDM cable can be easily attached with the acrylic cover removed during
development. The SS555 has all software drivers to support and operate all of
the motors and the sensors involved. It runs the Real-Time Executive for
Multiprocessor Systems (RTEMS) operating system, as described in Section 3.2.

3.1.9 Power system.

Power Input will come in the bottom of the VersaTainer on an 18-gauge cable.
The Main Power switch will be mounted on the rear of the neck housing. A
PC/104 form-factor DC-DC switching power supply in the neck module converts
the input 36 V DC to the main computer's 5 V DC. This power supply is a
high-voltage, high-efficiency card capable of meeting the transient demands of
the Pentium M's speed switching.

A separate power supply and distribution board energizes the head circuits at
the various voltages of 5V DC (encoders and control logic) and 6V DC
(servos). This design simplifies cabling going through the neck joint, with one
power cable entering and power converted in situ. The head power circuits consist of 5
independent high-efficiency switching power supplies. The power/distribution
boards are stacked on top of the head controller. This design isolates servo
motor loads from control logic supplies. The modularity distributes the load,
and allows individual power boards to be exchanged if needed.

The cameras are powered via the Firewire 12V DC from the neck computer.
Despite the camera maker's claim that they are “hot-pluggable”, it turns out
that the computer interface cards do not tolerate the cameras being powered
down independently. After losing 3 interface cards to this anomaly, the 12 V DC
supply was removed from the head power board. This increases heat inside
the neck but increases the reliability of the system as a whole.

3.2 Software Design

This section describes the software design of the Perception Module Mk.I. There are
three layers of software for the Perception Module. The low-level device drivers
interface the servo motors, encoders, and control loop functions and are embedded on the
head controller. The RTEMS operating system operates above the low-level drivers on the
head controller; the controls for the head reside on the neck computer. A test/demonstration
sample application, Eyetest, on the neck computer commands the head controller over a serial link.
Test PC applications demonstrate and exercise the individual devices in the Perception Module
head. These applications are FlyCap from PGR, which displays the cameras' output, and
sound-card-based oscilloscope programs, which display the 3 audio input channels.

3.2.1 Software Development Toolchain

The SS555 developer kit from Intec Automation was used to develop the
device interface and control applications in the head controller. This software
was later ported to run on RTEMS as a full multi-tasking application.
Microsoft Visual tools were used to develop the Eyetest application on the
neck computer.

3.2.2 Low-level Hardware drivers

The MPC555 Time Processor Unit (TPU) has functions for quadrature
encoders complete with capture on an index pulse for alignment. The PWM
function of the MPC555 is used to provide a programmed pulse with a
specified repetition rate. No special TPU functions were required.

3.2.3 Servo Motor Control

As described in section 3.1, four servos control eye pan (2), eye tilt (1), and
neck rotation (1) using PWM output signals from the SS555. The low-level
driver can provide 1024 position resolution for full scale of rotation of the
servo. This will translate into 1.0 to 2.0 ms pulse width for the servo. The
low-level driver API specifies the desired angle as a signed integer value. An
API for setting configuration, limits, and calibration parameters is
provided. These functions were implemented using the Intec Automation
Runtime library functions for MIOS and PWM. All of the parameters are
changeable on the fly.
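As a rough illustration of this mapping, the sketch below converts a signed position command into a servo pulse width over the 1.0 to 2.0 ms range in 1024 steps. It is not the Intec Automation driver; the function name, the assumed command range of -512 to +511, and the clamping behaviour are illustrative choices only.

#include <stdint.h>

/* Map a signed servo command (assumed range -512..+511, i.e. 1024 steps)
 * onto a pulse width in microseconds between 1000 us and 2000 us,
 * as described for the low-level servo driver. */
static uint16_t servo_pulse_us(int16_t command)
{
    if (command < -512) command = -512;   /* clamp to the assumed full scale */
    if (command >  511) command =  511;

    /* -512 maps to 1000 us, 0 to 1500 us, +511 to roughly 2000 us. */
    return (uint16_t)(1500 + ((int32_t)command * 1000) / 1024);
}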



3.2.4 Optical Encoders

The 4 optical quadrature encoders are connected to the TPU channels of the
SS555, which measure the position of the pan/tilt/neck mountings. The current
position values are available through the API as integers related to angle.
The low-level API includes calibration of the individual parameters of all
encoders. All of the parameters are dynamically changeable.

The encoder inputs will be able to capture the present count when the index
goes by. This allows us to store an offset from this index count for our
calibrated center position without tricky physical alignment procedures.

Power-up procedures must include initialization of the encoders so that the
absolute pose is known before the Perception Module operates. This
procedure centres the servo and then oscillates back and forth until the index
is detected and count latched. Then the control loop can set the servo to centre
and the operation can proceed. This procedure works for the neck rotation
encoder as well even though there will be multiple index pulses for full servo
travel. This neck rotation requires a larger swing for calibration. These
functions were implemented using TPU libraries from CodeWarrior, ported to the GNU
Compiler Collection (GCC) in accordance with the RFP.
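The start-up index search described above can be pictured as a small routine that sweeps the servo until the index pulse latches. The sketch below is an outline only; the driver calls (servo_set, encoder_index_seen, encoder_latch_count) are hypothetical placeholders, not the Intec Automation or TPU library API, and no settling delays or timeouts are shown.

#include <stdbool.h>
#include <stdint.h>

/* Hypothetical low-level driver hooks; names are placeholders only. */
extern void    servo_set(int16_t position);
extern bool    encoder_index_seen(void);
extern int32_t encoder_latch_count(void);

/* Oscillate the servo around centre until the encoder index pulse is
 * detected, then return the latched count as the calibration offset. */
static int32_t find_index_offset(void)
{
    int16_t swing = 20;                 /* small initial swing, arbitrary units */

    servo_set(0);                       /* command the approximate centre       */
    while (!encoder_index_seen()) {
        servo_set(swing);               /* sweep one way...                     */
        servo_set(-swing);              /* ...then the other                    */
        swing += 20;                    /* widen the search each pass           */
    }
    return encoder_latch_count();       /* offset of the index from power-up    */
}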

3.2.5 Eyemotor Objects

The servomotor and encoder pair have been encapsulated into an eyemotor
object. Four eyemotor instances are implemented as tasks in the RTEMS
application. At power-up, the servomotor is commanded to its center position.
During hardware assembly this is configured to be approximately the motor
centre. Once the motor has settled at its centre, the API commands
movements back and forth to find the encoder's index pulse. This
initialization produces a high-precision absolute reference. This absolute reference may
not be aligned with the ground-truth eyemotor centre, so a software offset can be
applied to the encoder value to align the encoder centre with the eyemotor
centre. This soft calibration process allows the hardware assembly to be
simple and non-critical for alignment purposes. The software calibration after
the head is assembled gives us the desired precision and accuracy for the
agreement of the binocular imagery.
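
The offset arithmetic itself is simple; a sketch follows (names are illustrative, cf. the
EM QPos, EM QScale, and EM QOffset parameters in Annex E).

/* Convert a raw encoder count into a calibrated angle: re-zero at the latched
 * index, scale counts to radians, then apply the soft-calibration offset. */
static inline float eyemotor_angle_rad(int32_t raw_count, int32_t index_count,
                                       float q_scale, float q_offset)
{
    return (float)(raw_count - index_count) * q_scale + q_offset;
}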

The Eyemotor combination incorporates the hardware interface for the RC
servo and the quadrature encoder with a PID control loop. The eyemotor is
given commands to move the motor to a given absolute position expressed in
arbitrary units, and that position is achieved and maintained using a PID
control loop. Once the desired position is reached (within a specified
tolerance) it is reported to the host.
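
The control loop is a conventional PID; a minimal sketch using the EM kP/kI/kD and
EM Period parameters from Annex E is shown below (structure and names are
illustrative, not the application source).

typedef struct {
    float kp, ki, kd;      /* EM kP, EM kI, EM kD                  */
    float period_s;        /* EM Period, converted to seconds      */
    float integral;
    float prev_error;
} eyemotor_pid_t;

/* One control step: returns the correction to apply to the servo command. */
float eyemotor_pid_step(eyemotor_pid_t *pid, float target, float measured)
{
    float error      = target - measured;
    pid->integral   += error * pid->period_s;
    float derivative = (error - pid->prev_error) / pid->period_s;
    pid->prev_error  = error;
    return pid->kp * error + pid->ki * pid->integral + pid->kd * derivative;
}

Once the magnitude of the error falls below EM Pos Resolution, the position-reached
reply described in Annex D is sent to the host.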

The eyemotor libraries include functions for controlling the power supplies
and reading the analog inputs to monitor voltages. These functions were
implemented using the Intec Automation Runtime library functions for
multi-purpose input/output (MPIO) and queued analog-to-digital conversion
(QADC). The QADC library needs the pull-up resistors in the MPC555
disabled (PDMCR) for proper operation. All of the parameters are changeable
on the fly.

3.2.6 Communications handling of parameters, control, and reporting.

A communications handler is provided to take individual messages and update
or report on the values involved. This interface was designed to use individual
small messages that can easily fit in a CANBus message packet. An ASCII
interface has been provided, and a CANBus interface can be easily added.
The bulk of the work is done in functions separate from the I/O format. Refer
to Annex E for a table of all message format entries.

The data scheme was organized around a row/column layout because there are
several instances (columns) of many of the data variables (rows). Each data
item may be set or queried independently. This table is easily visualized when
looking at the parameters display of the Windows Eyetest application. The
handling of each row of the table is defined in a table of length, data type,
etc. This allows easy extension of the protocol to include more information as
desired. The ASCII-based interface includes a checksum element to ensure
integrity in case of data loss. Each message is formatted as follows:

!ss,rr,cc,vvvv <CR><LF>

where

ss is the checksum, rr is the row number (+128 for a write of the value), cc is the column
number (0..7), and vvvv is the data value, which may be integer or real. <CR><LF>
are the message terminators.

The current data rows are detailed in Annex E.
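
A sketch of a parser for these messages follows. The checksum algorithm is not
specified in this format description, so checksum_ok() below is a placeholder
assumption (a byte sum of the payload), not the real scheme; the decimal encoding of
ss is likewise an assumption.

#include <stdio.h>
#include <string.h>
#include <stdbool.h>

typedef struct {
    int    row;       /* rr, with 128 removed when the message is a write */
    int    column;    /* cc, 0..7                                         */
    double value;     /* vvvv, integer or real                            */
    bool   is_write;
} eye_msg_t;

static bool checksum_ok(const char *payload, unsigned given)
{
    unsigned sum = 0;                    /* placeholder checksum: byte sum */
    while (*payload && *payload != '\r') sum += (unsigned char)*payload++;
    return (sum & 0xFF) == (given & 0xFF);
}

bool parse_eye_msg(const char *line, eye_msg_t *out)
{
    unsigned ss; int rr, cc; double vvvv;
    if (line[0] != '!') return false;
    if (sscanf(line + 1, "%u,%d,%d,%lf", &ss, &rr, &cc, &vvvv) != 4) return false;
    const char *payload = strchr(line, ',');        /* text after the checksum field */
    if (payload == NULL || !checksum_ok(payload + 1, ss)) return false;
    out->is_write = rr >= 128;
    out->row      = out->is_write ? rr - 128 : rr;
    out->column   = cc;
    out->value    = vvvv;
    return out->column >= 0 && out->column <= 7;
}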

3.2.7 RTEMS Eyemotor tasks

The Eyemotor tasks run on a periodic basis and execute the calibration or control
loop for an eyemotor. Until a calibration phase has successfully completed, the
PID control loop cannot run. Each Eyemotor has a separate instance of the
task running at its own specified periodic rate.
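
A sketch of such a task using the RTEMS classic API rate-monotonic service is shown
below; the period value and the eyemotor_run_once() call are illustrative placeholders,
not the actual application source.

#include <rtems.h>

extern void eyemotor_run_once(int motor);  /* hypothetical: one calibration/PID step */

rtems_task eyemotor_task(rtems_task_argument arg)
{
    rtems_id period_id;
    rtems_rate_monotonic_create(rtems_build_name('E', 'M', 'P', (char)('0' + arg)),
                                &period_id);

    while (1) {
        /* Block until the next period boundary (EM Period, e.g. 20 ms). */
        if (rtems_rate_monotonic_period(period_id,
                RTEMS_MILLISECONDS_TO_TICKS(20)) == RTEMS_TIMEOUT) {
            /* A missed deadline could be counted or reported here. */
        }
        eyemotor_run_once((int)arg);
    }
}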

3.2.8 RTEMS Communication task

The communication task monitors the COM port for input and passes
messages as they come in to the EyeConfig functions mentioned above. The
final version will use the CANBus interface to communicate with the neck
computer.

3.2.9 RTEMS Background task

The background periodic task broadcasts a status report once a minute to the
demonstration application.

3.2.10 Demonstration Embedded application

This application will initialize the drivers and calibration of the servos and
encoders. It will then poll the various inputs and send the information out a
telemetry port over a serial link. Parameter and position settings over the
telemetry link will demonstrate the various configuration API functions. This
application runs under RTEMS and creates tasks for handling the 4
eyemotors, the serial comm link, and background supervisory duties.

Figure 11 shows the audio oscilloscope demonstration programs with the
Eyetest application in the background. Figure 12 shows the parameter tab of
the Eyetest application. Figure 13 shows the configuration tab of the Eyetest
program. Figure 14 shows the window controls that can be used to send
commands to the head controller. Refer to Annex D for further detail of the
demonstration applications.

4. Discussion
The purpose of the Perception Module is to capture imagery data from a binocular
vision pair and transform that information into distance estimates in the reference
frame of the nScorpion robot. The kinematics of the Perception Module are
straightforward. Each servo motor and rotation mechanism has a large gear ratio that
essentially decouples the joint movements from each other. The noise and vibration
characteristics, on the other hand, will be non-trivial and their investigation will
take considerable effort. It will be important to understand these dynamics if one
intends to apply background correction techniques. These factors will be affected by
such things as speed, track material, and the head mounting beams and brackets.

Figure 11: Audio Oscilloscope Demonstration Program

Figure 12: Demonstration GUI Parameters Tab

Figure 13: Demonstration GUI Conf guration Tab

Figure 14: Demonstration GUI Controls Tab

The eyemotor joint can rotate over a limited angle about its respective servo. Close
spacing between the cameras and servos restricts panning outwards, and the other eye
restricts movements inward. Slight misalignments of the camera image plane with
respect to the camera centre and camera mounting point, together with the intrinsic and
extrinsic parameters of the CCD image planes, all contribute to the overall success and
accuracy of object localization. The available software configuration parameters should
decrease image processing error. A further improvement could be inserting adjustable
shims under the machined camera mounts to adjust the gaze and correct for
camera-specific parameters. This would require a special mounting plate and the
calibration steps detailed below. The Flea cameras could also be replaced with Flea2
versions that have improved update rates and better mounting points.

The eye tilt joint can rotate over a bounded angle constrained by the head cover above
and the head plate below. The firewire cable and connectors, as described in section
3.1, impose a dead load on the rear of the cameras, with the strain relief acting as the
moment arm. Selecting a smaller-gauge firewire cable, modified to allow for eye
movement, minimizes this dead load.

Neck pan, as detailed in section 3.1, is limited by the cables that connect the head
components to the neck computer below. This constraint is not viewed as a concern for
general operation because the eye movement DOF, in conjunction with the vehicle DOF,
allows for trajectory planning in a redundant trajectory space. The neck position
resolution is more than sufficient for the current operation. The sail winch servo has
enough torque to rotate the head portion, but the plastic spline that connects the servo
gear to the pulley is insufficient for the load, causing it to bend. It should be improved
with a spindle mounted to the head plate.

Generally, the mechanisms can operate with soft limits to prevent damage. Limit
switches could be applied to the boundaries of eye movement to prevent damage and to
reconfirm calibration during initialization. This would improve damage avoidance and
calibration.
One improvement to the design would be the use of a slip ring system to decrease
cabling between the neck and the head. Slip rings can provide several amperes of
power and up to 8 signals concurrently. There is potential to allow 360 degree
continuous rotation using slip rings. However, current slip ring designs add noise to the
signals, so the only practical cable to replace would be the power cable. The
communications via CANbus and the Firewire signals would be degraded by a slip
ring; Firewire cabling in particular would be attenuated and would require a
dedicated 12 V DC source. Power converters onboard the head would reduce the ripple
from the injected noise if a slip ring were used for power only. Slip ring signal filtering
would increase power consumption by the head, so that is a factor to weigh against
the continuous spin capability offered by the slip ring.

Cabling could also be reduced by adding a computer in the head. This would remove the
two firewire cables through the neck joint, which would allow for more than one full
rotation of neck pan. This improvement would increase the mass of the head, and that
would impact both part placement and head dynamics.

Another improvement for calibration purposes would be to install a laser rangefinder
between the eyes, co-planar with the eye image planes, in order to provide a ground truth
of estimated distance. This would provide a data collection source to confirm the
capability of any algorithm used for localization. An alternative is to place a simple
LED laser pointer co-planar so that the eyes can test-calibrate on an object along the
central axis. This would not capture data but it could allow manual calibration.

It is anticipated that the current CPU processing speed will not be sufficient for
real-time operation in complex environments. An increased-size versatainer would
allow another CPU. This would increase overall processing, but at a risk of heat
overload in a sealed container. A more attractive option is to develop application-specific
hardware. Another option is to add a dedicated sound card to the PC/104 stack for the
sound processing, which would offload some of the processing from the main CPU.
This would improve performance by delegating sound processing without significant
heat problems.

A recommended improvement that will decrease the mass held at the top of the neck
mast is to separate the head from the neck and recable the neck harness. The neck
computer stack could then mount onto the EOD robot. This would improve the head
dynamics by decreasing the moment arm, thereby improving robot stability.

System integration onto the dedicated EOD robot, which will include the integration of
4 microcontrollers and computer(s), will take significant effort. The four
microcontrollers control unique devices attached to them and connected by the
CANbus and serial (RS-232) interfaces. A common communication protocol must be
established that can relay messages of a restricted CANbus nature. This protocol must
include generic housekeeping and communication messages as well as a stable finite
state machine for transitions between the various run-levels or operation states. This
work is a critical risk to overall success.

The Perception Module operating system is RTEMS. RTEMS is an open source
real-time operating system (RTOS) with a proven track record of real-time performance
on many platforms and deterministic worst-case execution times (WCET). Knowing a
task's worst-case execution time is crucial in systems with hard real-time constraints,
in which missing a deadline can have catastrophic consequences [28]. This Operating
System (OS) was chosen due to the critical real-time constraints imposed by EOD
robots that must be able to respond even when faced with multiple high-priority
interrupts.

It is planned that an RTEMS partition will be installed for operation of the nScorpion
robot system rather than encompassing the entire hard drive. A Linux partition will be
installed for development tools and for building the RTEMS executable. This will
allow for data capture and statistical analysis by writing data from the RTEMS
supercore onto the Linux ext3 partition during use. In case of a software crash,
partitioning should protect the other partitions.
Calibration will be required for the servos, encoders, and cameras. A couple of 2D
calibration instruments will be required. One 2D instrument will measure the distances
along the camera optical origins. This 2D calibration will also survey distances to
confirm co-planar images. This instrument will be placed across the camera origins and
imagery from the cameras will measure straight-forward distance. A second 2D
calibration instrument will be used to zero the cameras for straight-ahead vision. It will
consist of parallel and intersecting lines at fixed distances from the cameras and will be
used to adjust the camera positions so that the image planes are compensated. A further
3D display will be used to calibrate the imagery from the left and right cameras and to
estimate chromatic aberrations, image skew, image misalignment, and extrinsic and
intrinsic parameters. The 3D display will also identify image plane skew. In order to
display the images off the robot, the Firewire signals will be converted into BNC analog
signals via a conversion board and interfaced to the 3D display. The control system in
place will then be used to pan and tilt the cameras during examination.
The 3D display used for calibration can also turn the autonomous system into a simple
teleoperation system. This would entail additional labour but it could provide an
additional capability demonstration. The 3D display allows a human observer/operator
to view the stereo imagery when the images are transported off the robot and onto an
operator control unit (OCU). This would require additional short-range wireless
ethernet equipment and wireless video transmission/reception integrated onto the
existing system. An experiment arises out of this capability: can 3D vision transmitted
to the operator improve or extend the capability to perform EOD operations? Does this
capability allow the in-use teleoperation robotics to carry out further human tasks?
Beyond calibration, system performance will be estimated to determine how closely it
resembles human vision. For example, during human vision saccades the eye angular
velocity can approach 1000 degrees per second [2]. While the Perception Module has 4
DOF in comparison to the 11 DOF of the human visual system, it is anticipated the
Perception Module should perform some operations on par. Whether or not it is
determined to approximate human vision performance, the comparison will reveal the
improvements needed for future versions of the Perception Module.
To confirm the validity of the vergence method, experiments with the disparity and
vergence algorithms will determine how effective the localization operation is with both
algorithms coordinating or operating exclusively. Other important questions will be
answered through experimentation. Can the eye movements alternate between fixation
and stereopsis fast enough to support real-time motion? Does the digital zoom improve
vision effectiveness, and can it compensate for the resolution differential between the
fovea and the CCD camera? How much disparity mapping is necessary to complement
the selective perception? Can queued perception (the use of one sensor to queue the
attention of another), using aural sensors to aim visual sensors quickly, improve
operational performance?

5. Conclusions
The purpose of the Perception Module is to capture imagery data from a binocular
vision pair and transform that information into distance estimates in the reference
frame of the nScorpion robot. There are a number of improvements that should be
undertaken before system integration is complete:

1. Separate the head from the neck, machine a new neck plate, and recable the
interface from the neck to the head;

2. Modify the head plate to include a spindle on the neck servo pulley to support the
rotation of the neck;

3. Replace the Flea cameras with Flea2 cameras to improve camera stability and
update frequency;

4. Add limit switches to the boundaries of eye motion to prevent damage and improve
calibration.

The potential application of this module is wide; it could be mounted to many different
platform types. The target platform size range is from 50 kg and up. For larger vehicles
it is conceivable to integrate several Perception Modules to improve localization and
situational awareness. The inclusion of aural and visual sensors makes queued
perception experiments possible and could lead to improved operational performance.
The successful implementation onto an EOD robot will make a number of important
perception and localization experiments possible.

While this autonomy will not exceed human performance, it can remove some of the
positive control burden, a factor in current EOD operations [4], and demonstrate a
glimpse of a future when force-multiplied humans team with autonomous devices.

References

1. Friedman, Francis L. (1960). Physics, Copp Clark Publishing.

2. Hubel, David H. (1987). Eye, Brain, and Vision, 2nd. ed. Scientif c American
Library. W. H. Freeman and Company New York.

3. Marr, David (1982). Vision, 1st ed. W.H. Freeman and Company.

4. Nguyen H. and Bott J. (2000). Robotics for law enforcement. In SPIE, (Ed.), SPIE
International Symposium on Law Enforcement Technologies, SPAWAR US Navy.
SPIE Press.

5. Nguyen, H.G. and Bott, J.P. (2000). Robotics for Law Enforcement: Beyond
Explosive Ordnance Disposal. (Technical Report 1839). US Navy SPAWAR.
SPAWAR Systems Center San Diego.

6. U.S. Department of Defense Data Fusion Subpanel of the Joint Directors of
Laboratories, Technical Panel for C3 (1991). Data fusion lexicon.

7. Mertschat, Oliver (1998). Mechanical design of a binocular vision system for a
modular robot. (Technical Report CVAP224). Stockholm University. Department
of Numerical Analysis and Computing Science, Stockholm, Sweden.

8. Pinker, Steven (1993). How The Mind Works, W. W. Norton & Company.

9. Grimson, W. E. L. (1981). From Images to Surfaces: A Computational Study of the
Human Early Visual System, The MIT Press.

10. Churchland, Paul M. (1989). A Neurocomputational Perspective: The Nature of
Mind and the Structure of Science, 3rd ed. Cambridge, Massachusetts: The MIT
Press.

11. Wheatstone, Charles (1838). Contributions to the Physiology of Vision. Part the
First. On some remarkable, and hitherto unobserved, Phenomena of Binocular
Vision. Philosophical Transactions, 128, 371–394.

12. Qian, N. and Zhu, Y. (1997). Physiological Computation of Binocular Disparity.

13. Jokinen, O. and Haggrén, H. (1995). Relative orientation of two disparity maps in
stereo vision, pp. 157–162. Zurich: ISPRS Intercommission Workshop From Pixels
to Sequences - Sensors, Algorithms and Systems.

14. Qian, Ning (1994). Computing Stereo Disparity and Motion with Known Binocular
Cell Properties. Neural Computation, 6(3), 390–404.

15. Murray, Don and Little, James J. (2000). Using Real-Time Stereo Vision for
Mobile Robot Navigation. Autonomous Robots, 8(2), 161–171.

16. Thrun, Sebastian, Buecken, Arno, Burgard, Wolfram, Fox, Dieter, Froehlinghaus,
Thorsten, Hennig, Daniel, Hofmann, Thomas, Krell, Michael, and Schmidt, Timo
(1996). Map Learning and High-Speed Navigation in RHINO. (Technical Report
IAI-TR-96-3).

17. Xiong, Y. and Shafer, S.A. (1994). Variable Window Gabor Filters and Their Use
in Focus and Correspondence. In CVPR94, pp. 668–671.

18. Nakadai, K., Okuno, H., and Kitano, H. (2002). Realtime sound source localization
and separation for robot audition.

19. Ude, A., Gaskett, C., and Cheng, G. (2006). Foveated vision systems with two
cameras per eye, Orlando.

20. Zaharescu, A., Rothenstein, A., and Tsotsos, J. (2004). Towards a Biologically
Plausible Active Visual Search Model.

21. Berthouze, L., Rougeaux, S., Chavand, F., and Kuniyoshi, Y. (1996). Calibration of
a foveated wide angle lens on an active vision head. In IEEE/PAMI Computer
Vision and Pattern Recognition, San Francisco, USA.

22. Hutchinson, S. A., Hager, G. D., and Corke, P. I. (1996). A tutorial on visual servo
control. IEEE Trans. Robotics and Automation, 12(5), 651–670.

23. Hager, G., Chang, W., and Morse, A. (1995). Robot hand-eye coordination based
on stereo vision. IEEE Control Systems Magazine, 15(1), 30–39.

24. Prokopowicz, P.N. and Cooper, P.R. (1993). The Dynamic Retina: Contrast and
Motion Detection for Active Vision. In CVPR93, pp. 728–729.

25. Horii, Akihiro (1992). The Focusing Mechanism in the KTH Head Eye System.
(Technical Report ISRN KTH/NA/P–92/15–SE).

26. Iwasaki, Masayuki and Inomata, Hajime (1986). Relation Between Superficial
Capillaries and Foveal Structures in the Human Retina. Investigative
Ophthalmology & Visual Science (IOVS.org), 27, 1698–1705. (with nomenclature of
fovea terms).

27. Fukushima, K., Yamanobe, T., Shinmei, Y., Fukushima, J., Kurkin, S., and
Peterson, B. W. (2002). Coding of smooth eye movements in three-dimensional
space by frontal cortex. Nature, 419, 157–162.

28. Colin, A. and Puaut, I. (1999). Worst-Case Execution Time Analysis of the
RTEMS Real-Time Operating System.

Annex A
Technical Specifications
A.1 Perception Module
1. Physical Height (without mounting bracket) 27.5 cm

2. Diameter Head 25.4 cm

3. Diameter Neck 18 cm

4. Weight 6.2 kg

5. Minimum Operating Voltage 15 V DC

6. Maximum Voltage 36 V DC

7. Power normal 20 watts

8. Power maximum 30 watts

A.2 Main Computer (Neck)


1. Intel Pentium M 1.4 GHz

2. RAM 1 GB

3. Hard Disk 40 GB

4. Interfaces: IDE, USB, PS2, XVGA, Com, CAN

A.3 Controller (Head)


1. Controller MPC555 40 MHz

2. Program Flash 448 KB, RAM 256 KB

3. Interfaces: Com, CAN, PWM, Quadrature, Analog In

A.4 Cameras
1. PGR Hi-Col Flea 1 digital cameras

2. Resolution: 1024x768 @ 30 FPS (15 FPS in demonstration program FlyCap)

3. Interface: Firewire IEEE 1394a @ 400 Mb/s

A.5 Lenses
1. Length 30 mm

2. Type: 8 mm F1.6

3. Mount: CS

4. Focus: Manual

Annex B
Bill of Materials (BOM)
Part# Qty Description Dimension Notes/Material
Ch 01 1 Acrylic cover Dome 150x254mm J-RRC10

Ch 02 1 HD foam weather-strip 8x12mm ProForm #pf913

Ch 03 1 lower base plate 1/8" x 9.75" 6061 Al

Ch 04 2 Tilt Bearing Holder 42x62x5mm 6061 Al

Ch 05 1 Neck Bearing Holder 6061 Al

Ch 06 1 Right Tilt mount 6061 Al

Ch 07 4 Tilt Drive mount HD RA Servo mount ServoCity #SIGMFG402

Ch 08 4 Neck Drive mount #4-40 x 1" Stand offs

Ch 09 1 Left Tilt mount 6061 Al


Table B.1: Head Part BOM

Part# Qty Description Dimension Notes/Material


Ch 10 1 Pan tilt bridge 1"x6"x1/4" 6061 Al

Ch 11 1 Upper neck drive ring 6061 Al

Ch 12 1 lower neck drive ring 6061 Al

Ch 13 2 Neck encoder mounts #4-40x 1" Stand offs

Ch 14 2 Camera mount plates 30x40x6mm 6061 Al

Ch 15 2 Tilt mount cleats 1/4x1/4x2" 6061 Al

Ch 16 2 Pan Drive Mounts Servo Tape ServoCity

Ch 17 2 Tilt, neck encoders Us Digital H1-500-I

Ch 18 1 Cable Funnel 1.25" x 3" Nylon

Ch 19 3 4 pin connectors for audio, CAN Molex 50-57-9404-p/ 70107-0003

Ch 20 2 Tilt Axis 1/2" x 1/4" 6061 Al

Ch 21 2 Power Connector molex 39-01-2020/39-01-2021

Ch 22 2 Pan Axis 1" x 1/4" 6061 Al

Ch 23 2 Camera encoders Us Digital E2-500-250-I

Ch 24 1 Timing belt wheel (neck) 10mm 48 tooth SDP/SI A6A53M048NF0608

Ch 25 1 Timing belt wheel (servo) 10mm 30 tooth SDP/SI A6A53030DF0908

Ch 26 1 Timing belt wheel (encoder) 10mm 25 tooth SDP/SI A6L53-025DF0908

Ch 27 2 Tilt Pushrod joints #4-40 ball swivel ServoCity #10756

Ch 28 1 alum hub 1/4" Servocity #3463H


Table B.2: Head Parts BOM (contd.)

Part# Qty Description Dimension Notes/Material
Ch 30 1 Round servo horn Karbonite HiTec

Ch 31 3 Pan/tilt servos HS-925mg HiTec

Ch 32 1 Neck drive servo HS-785HB HiTec

Ch 33 4 Pan pushrod joints #2-56 ball swivel ServoCity #10754

Ch 34 1 Pushrod (Tilt) #4-40x ServoCity #98847A005

Ch 35 2 Pushrod (Pan) #2-56x ServoCity #98837A003

Ch 36 1 Timing belt 9mm 10 tooth SDP/SI #A6R53M100090

Ch 37 2 PGR Flea Cameras Flea-Hi-Col

Ch 38 1 Neck Bearing Kaydon JHA15XL0

Ch 39 4 Tilt/Pan Bearings flanged 1/4" x 1/2" x 5/32" SDP/SI A7Y55-G5025M

Ch 40 4 Tilt/Pan Bearings plain 1/4" x 1/2" x 3/16" SDP/SI A7Y55-P5025M

Ch 41 4 Encoder Cables 5 conductor #28 Molex 50-57-9005

Ch 42 3 Directional Noise canceling microphones. Electret Knowles Ac. MD9755USZ-1

Ch 43 1 Top Microphone cable harness rear

Ch 44 1 Top Microphone cable harness Front

Table B.3: Head Parts BOM (contd.)

Part # Qty Description Dimension Notes/Material
Cn 01 1 Upper Versatainer plate Modified VT-EC23

Cn 02 1 VersaTainer body 5" high VT-05

Cn 03 1 Heat plate 1/4" 6061 Al

Cn 04 2 Heat f llets 1x1x3" 1/4" 6061 Al angle

Cn 05 2 HD mounting plates .1" ABS plastic

Cn 06 1 Lower Versatainer plate VT-EC02

Cn 07 1 VersaTainer Stand part A 1/8" Alum.

Cn 08 1 VersaTainer Stand part B 1/8" Alum.

Cn 09 1 ADL Heat-Pipe assembly

Cn 10 1 Pentium M 1.4ghz processor and cable kit ADL 855 PM 1.4

Cn 11 1 PC-104 Dual Firewire interface DL MSMW104+

Cn 12 1 PC-104 CAN Interface card DL MSMCAN

Cn 13 1 PC-104 Power supply 50V 50 watt TRI-M HE104-HV-16

Cn 14 1 Tadiran lithium battery 3.6v clock battery

Cn 15 1 USB Sound card housing and connectors removed to reduce size Switchcraft EN3P3M

Cn 16 1 Power Connector 3 Conductor W/P Switchcraft EN3C3F

Cn 17 1 Power Plug 3 Conductor W/P Switchcraft EN3P6M

Cn 18 1 Data Connector 6 Conductor W/P Switchcraft EN3C6F

Cn 19 1 Data Plug 6 Conductor W/P Hammond 1590A

Cn 20 1 Connector box Right angle attachment box. Polyethylene

Cn 21 2 Anti-chafe sheets 4”x6”, 4”x4” With power feed circuit in connector.

Cn 22 1 Lower Microphone cable harness Front


Table B.4: Neck Parts BOM

Usage Part#1 Part#2 Qty Size Length Units Style
neck servo wheel mounting CH25 CH29 4 #2-56 0.63 in pan head

pan Swivel mounting bolts ch 33 ch 14 4 #2-56 0.63 in pan head

Tilt Swivel mounting bolts ch 06 ch 27 4 #4-40 0.50 in pan head

neck servo mounting ch 08 8 #4-40 0.50 in pan head

neck encoder mounting ch 13 4 #4-40 0.50 in pan head

HD RA Servo mount Tilt ch 03 ch 07 4 #4-40 0.50 in pan head

SS555 mounts ch 05 SS555 4 #4-40 1.00 in pan head

tilt encoder Hub to servo ch 27 ch 30 2 #5-40 0.38 in flat head

Heat Sink fillets to plate cn-03 cn 04 6 #6-32 0.50 in pan head

Heat Sink fillets to housing cn 04 4 #6-32 0.38 in pan head

Neck top plate to lower drive ring CH 12 Ch 02 4 M4 12.00 mm pan head

Top cog to lower drive ring CH 12 ch 24, ch 11 4 M4 25.00 mm pan head

bearing ring ch 03 ch 05 4 M4 12.00 mm pan head

Tilt bearing holder to cleats ch15 ch 04 4 M4 12.00 mm flat head

Heat Sink plate to Heat spreader cn-03 6 M3 12.00 mm pan head

Tilt bearing holder cleats to base plate ch 03 ch 15 4 M3 12.00 mm pan head

Tilt Mounts to Pan Tilt Bridge ch 10 ch 06, ch 09 4 M3 12.00 mm flat head

E2 encoder mounting ch 23 ch 10 4 M2 8.00 mm pan head

Camera mounts ch 14 ch 37 4 M2 10.00 mm flat head


Table B.5: Fastener List

Annex C
Assembly Instructions
C.1 Eye camera mounts, Encoder, and Servo Motor
1. Fasten the camera mount to the bottom of the camera with the two 2 mm flat head
machine screws.

2. Insert the camera shaft through the lower bearing, the tilt bar, and the flanged upper
bearing with extended inner race, and thread it into the camera mount with Loctite to
hold it. Use a slotted screwdriver to tighten.

3. Fasten the encoder base to the bottom of the tilt bar over the lower bearing using 2
* #2-56x1/4” screws.

4. Slide the encoder disk onto the shaft with the index stripe pointing outboard
centered on the connector while the camera is facing forward. Tighten the setscrew.

5. Install the encoder housing with the 2 * 4-40 screws.

6. Use 'Servo Tape' (double-sided self-stick tape) to fasten the servomotor to the tilt
bar with the round part of the servo horn in the notch provided.

7. Use the hole in the servo horn at 19mm from the pivot shaft for the ball swivel
linkage.

8. Trim off excess servo horn.

9. Set servos to 0 position (EM_SPosition)

10. Install servo horns pointing aft, and one spline inboard from center.

11. Install the Ball linkage shaft between the servo horn and the camera mount.

C.2 Neck assembly


1. Assemble the PC-104 stack using #4-40 hardware and standoffs between the cards.
From top Down :

(a) Aluminum Heat plate


(b) CPU board with heat pipe cooling unit.
(c) Stub Ribbon cables for PS/2, video, Com1, Com2, and USB/Audio/Ethernet as
well as the IDE cable and the clock battery cable must be installed before
attaching to the aluminum heat plate with 6 * M3 screws and 4 * #4-40 screws.
Heat sink compound is used between the Heat Pipe plate and the Aluminum
plate.
(d) Firewire interface card

(e) CAN Interface Card
(f) Power Supply card.

2. Install the Firewire, power, and CAN cables and slide the complete electronics stack
into the versatainer. Using heat sink compound between the heat sink plate and the
fillets, bolt the plate to the fillets using 6 * #6-32 screws.

3. Cables exiting the top for: power, CAN, Com1, firewire, audio.

4. Install a nylon chafing sheet to cover the power supply.

5. Install the desired lower panel and mount the versatainer on its bench stand.

6. Bolt the lower bearing ring to the versatainer top 4 * M4x12

7. Install a nylon chafing sheet to cover the heat plate.

8. Install woven nylon chafing tubes over the cables. Use one for power and firewire,
one for audio, and one for CAN and COM cables.

9. Install the versatainer top plate, feeding the power, CAN, and firewire cables through
the hole, using 8 * self-tapping screws and sealant.

C.3 Head assembly


1. Clamp bearing to head base plate. 4 * M4x12

2. Install neck encoder. Index mark should be under connector when head is facing
forward. Use 1 * #4-40 x.5 from the bottom, 1 * #4-40 x1.25 from the top. The
long bolt goes closest to center of plate.

3. Insert long bolt until it bottoms on the bearing holder.

4. Tighten down the hex standoff.

5. Tighten down the nut on top of the encoder plate after everything else is lined up.

6. Fasten the neck servo, with the belt on it, to the base plate with 4 * #4-40x.5. Ensure
the belt is on before installation.

7. Install the tilt servo and encoder assembly. 6 * #4-40x.5 Note bolt closest to center
of plate needs a nut under the head to shorten it to clear the bearing holder.

8. Install the SS555+interface using 3/16 nylon spacers in 4 places, using 3 * #4-40 x 1
and 1 * #4-40 x .5.

9. Install the tilt bar assembly with cameras using 4 * M4 x 12.

10. Install tilt linkage and adjust.

C.4 Head to Neck Assembly
1. Set plate and bearing on lower ring

2. Line up top pulley and upper ring and place under belt

3. Fasten top pulley to lower ring 4 * M4x25

4. Feed cable through the funnel.

5. Insert the funnel into the top pulley until the black line is flush with the top of the top pulley.

6. Connect cables.

7. Apply power to the head. This will set all the servo motors to center position.

8. Ensure the arrow on the hub of the neck encoder is centered under the connector.

9. Line up the head facing forward on the neck.

10. Slide the belt over the large pulley and adjust the encoder for tension.

11. Note too much belt tension will bend the small pulley on the servo motor shaft.

12. Now you are ready to calibrate the alignment of all 4 motors.

Annex D
Demonstration Application
D.1 Demonstration Operation Mode.
We have set up a demonstration of the various components that runs on a Win XP
platform on the computer in the neck. These apps are set up to start automatically when
Windows boots up. The Winbond CPU monitor is a display panel that shows CPU voltages
and temperatures. The clock shows a display of the current CPU speed. The PGR
demonstration app FlyCap is used to show the real-time display of the cameras. Two
copies of FlyCap are started, one for each camera. Each copy of FlyCap brings up a
camera selection dialog. Select a different camera Id from each copy and then press the
green start arrow on the toolbar to start video capture mode. We took an open source
audio oscilloscope program and modified it to handle multiple sound cards and to save
the selections for easy start-up. Two copies of the Oscilloscope4 program are started
up. Each one uses a different ini file to save current parameters. One copy is connected
to the onboard sound card using the Line In inputs for the front microphones. The
other copy is connected to the USB sound card using the microphone input for the rear
microphone. Press the Run button at the top right of the oscilloscope screen to start
each display.

D.1.1 Eye/Head operations

The application Eyetest was written in Delphi to configure, test, and operate
all of the functions in the head's SS555. This application uses small messages
to communicate parameters and commands to the head's SS555. This
application (and its partner in the SS555) presently uses a serial COM port but
is designed to be easily ported to a CAN-based communication link.

Each message is designed to easily fit in a CAN message buffer and to be
independent of other messages or sequencing. Most messages have a reply
that is sent immediately. One exception is a position request (EM
Position), which will not reply until the position is within the resolution (EM Pos
Resolution) specified for that motor. Some commands (Poll, Dump, etc.)
trigger a series of replies from the head. The Poll sequence of status and
dynamic variables is automatically sent from the head once per minute.

Individual parameters can be changed on the parameter page. If you change a
value and move to the next cell (up, down, left, right, etc.) the new value will
be sent to the head. To start operations of the head you first need to initialize
the Eyemotors. On the Operations page click on the Initialize button. This
will send out a stop command for each eyemotor and then follow with a start
calibration command for each eyemotor. After a brief period of wiggling to
find the encoder's index, each eyemotor should go to run mode (6). If the
eyemotor cannot find the index in a short period of time it will go into failure
(7) and stop.

D.1.2 Calibration

Once the eyemotors are initialized the user can enter the trim or offset
parameters that will line up the head and cameras to a known reference. Since
the SS555 does not have any EEPROM for data storage, the trim variables
need to be sent to the SS555 after each power up. To determine an offset value
for a particular eyemotor, use the Parameters page. First set the offset
value (EM QOffset) for the desired motor to 0. Manually enter a position into
the appropriate column of the EM Position row to get the motor to point in the
desired direction. Once each offset is determined it should be noted down and
then that position is entered as the offset.
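
Using the message format of section 3.2.6 and the row numbers in Annex E (EM QOffset
is row 16, EM Position is row 1, with 128 added for a write), the exchange for eyemotor 0
would look roughly like the following. The ss checksum bytes are left as placeholders
because the checksum rule is not reproduced here, and the value field of 0 on the final
query is an assumption.

!ss,144,0,0.0      zero EM QOffset for eyemotor 0 (row 16 + 128)
!ss,129,0,0.35     command EM Position = 0.35 rad for eyemotor 0 (row 1 + 128)
!ss,129,0,0.32     ... adjust until the motor points at the reference ...
!ss,144,0,0.32     write the noted position back as EM QOffset
!ss,16,0,0         query EM QOffset to confirm the stored value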

Head Pan Alignment can be done by aligning the front and rear cover
mounting screws with the fore/aft center-line of the robot.

A target with one horizontal line and 2 vertical lines (spaced at the camera
baseline distance of 56 mm) can be placed in front of the head. Use the
motor position (as described above) to line up the appropriate cross with the
center pixels of the camera image.

Annex E
Message Format

Item Name ID Description Values Instances Read/Write


EM Op Mode /State 0 Eyemotor Operation mode or state 0 = disabled 4 R/W
1 = Start calibration
2..5 = calibrating
6 = normal operation

7 = Failed to calibrate

EM Position 1 EyeMotor position in radians 0=center 4 R/W


>0 =left or up

<0 = right or down

EM Pos Min 2 Minimum allowed position As above 4 R/W

EM Pos Max 3 Maximum allowed position As above 4 R/W

EM Period 4 PID Control loop period in ms. 4 R/W

EM kP 5 PID Proportional factor 4 R/W

EM kI 6 PID Integral factor 4 R/W

EM kD 7 PID Derivative factor 4 R/W

EM Pos Resolution 8 Report position success when error <= this value. 4 R/W

EM SPeriod 9 Servo Pulse period in ms 10 ... 100 4 R/W

EM STMin 10 Servo Position Minimum in us. -800 ... 800 4 R/W

EM STMax 11 Servo Position Maximum in us. -800 ... 800 4 R/W

EM STCenter 12 Servo Position trim offset in us. -800 ... 800 4 R/W

EM SPosition 13 Servo Position in us. (added to 1500 us + STCenter) -800 ... 800 4 R/W

EM QPos 14 Current raw encoder reading -4000 ... 4000 4 R/W

EM QScale 15 Scale to turn counts into radians 4 R/W

EM QOffset 16 Offset in radians from Index to Center 4 R/W

Pwr Enable 17 Digital output to control power supplies and others 0 ... 1 8 R/W

AD Input 18 Columns: 0=Encoders, 1=Servo1, 2=Servo2, 3=Firewire 0 ... 5000 8 R

AD Scale 19 Analog voltage inputs *100 ie 500 = 5.00v 0.5 ... 10 8 R/W

Set All Power controls 20 Eyemotor Operation mode or state 0 ... 1 1 W

Set All Modes 21 EyeMotor position in radians 0 ... 1 1 W

Dump 22 Minimum allowed position 0 = dynamic values only 1 W

1=all parameters

Version 23 Version info. 3 R


Columns:
0= RTEMS=1/Bare=0
1 = Version number ie 103

2 = Ram Base

Table E.2: Message Format Table

Annex F
List of abbreviations/acronyms/initialisms
1D One Dimensional

2D Two Dimensional

3D Three Dimensional

4D Four Dimensional

A Ampere

AC Alternating Current

ADC Analog to Digital Conversion

ABI Application Binary Interface

API Application Programmer Interface

AO Area of Operations

AOR Area of Responsibility

ANSI American National Standards Institute

ASCII American Standard Code for Information Interchange

BIT Built-In Test

BOM Bill of Materials

C4ISR Command, Control, Communications, Computers, Intelligence, Surveillance, and Reconnaissance

C Celsius

CAN Controller Area Network

C/A Coarse Acquisition GPS

CCD Charge-Coupled Device

COM Communication

COTS Commercial Off The Shelf

CPU Central Processing Unit

CR Carriage Return

DC Direct Current

DM Domain Model

DMU Dynamic Measurement Unit

DOF Degrees of Freedom

DGPS Differential GPS

ECEF Earth-Centred, Earth-Fixed

F Fahrenheit

FOG Fibre Optic Gyroscope

FOV Field Of View

GCC GNU Compiler Collection

GNU Gnu’s Not Unix

GPS Global Positioning System

HAAW Heavy Anti-Armour Weapon

HD High Density

IEC International Electrotechnical Commission

IEEE Institute of Electrical and Electronics Engineers

IMU Inertial Measurement Unit

IP Intellectual Property

IP54 Ingress Protection or International Protection rating 54

ISO International Standards Organization

JAUS Joint Architecture for Unmanned Systems

JTA Joint Technical Architecture

LAAW Light Anti-Armour Weapon

LAN Local Area Network

LF Line Feed

MGRS Military Grid Reference System

MIOS Multi Input Output System

MMU Memory Management Unit

MPIO Multi-Purpose Input Output

MSL Mean Sea Level

NBC Nuclear, Biological, Chemical

NEMA National Electrical Manufacturers Association

NIST National Institute of Standards and Technology

NSU Navigational Sensor Unit

NTP Network Time Protocol

OCS Operator Control Station

OCU Operator Control Unit

OEM Original Equipment Manufacturer

OPI Office of Primary Interest

OS Operating System

PC Personal Computer

PGR Point Grey Research

PID Proportional Integral Differential

POST Power-On Self Test

PCM Pulse Code Modulation

PWM Pulse Width Modulation

QADC Queued Analog Digital Conversion

RA Reference Architecture

RC Radio Controlled

RFP Request For Proposal

RGA Rate Gyro Accelerometer

RMS Root Mean Square

RPG Rocket Propelled Grenade

RPY Roll, Pitch, Yaw

RTEMS Real-Time Executive for Multiprocessor Systems (originally Real-Time Executive for Missile Systems)

RTK Real-Time Kinematic

RTOS Real Time Operating System

SAE Society of Automotive Engineers

SI System International

SMA Senior Military Advisor

SS Steroid Stamp

TPU Time Processor Unit

TNA Thermal Neutron Activation

UAV Unmanned Aerial Vehicle

UGV Unmanned Ground Vehicle

US United States

USA United States of America

USV Unmanned Space Vehicle

UTC Universal Time Coordinated

UTM Universal Transverse Mercator

UUV Unmanned Underwater Vehicle

UxV Unmanned (Aerial, Ground, Underwater, Space) Vehicle

V Volt

WG Working Group

WGS World Geodetic System

Annex G
Notation
α latitude angle

β longitude angle

σ² Variance of a variable population

σ Standard deviation of a variable population

θ rotation about the modified y-axis in radians for Euler RPY

φ rotation about the modified x-axis in radians for Euler RPY

ψ rotation about the initial z-axis in radians for Euler RPY

L Local Coordinate Frame of Reference (Robot egocentric)

p Local Coordinate Frame pose (Robot egocentric)

prpy JAUS-compliant pose

pW World Coordinate Frame pose

pW−UTM World Coordinate Frame pose with UTM

q quaternion vector

q̄ quaternion conjugate

qs quaternion scalar component

qx quaternion imaginary projection along the i axis

qy quaternion imaginary projection along the j axis

qz quaternion imaginary projection along the k axis

s² Variance of a variable sample

s Standard deviation of a variable sample

x x displacement in local pose

xW x displacement in global pose

y y displacement in local pose

yW y displacement in global pose

z z displacement in local pose
zW z displacement in global pose

b baseline length

c subject bisection axis

d target distance

x bisected distance

A Left-eye intercept angle

B Right-eye intercept angle

C Complementary angle

D Displacement vector

G Units of gravity (9.81 m/s²)

R Rotation Matrix

T Transformation Matrix

W World Coordinate Frame of Reference
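
For reference, these triangulation symbols are consistent with a standard law-of-sines
construction over the camera baseline; one common form (given here as a reminder and
an assumption, not a quotation of the derivation earlier in this report) is:

C = \pi - A - B, \qquad d = \frac{b \, \sin A \, \sin B}{\sin(A + B)}

where d is taken as the perpendicular distance from the baseline b to the target, given
the left- and right-eye intercept angles A and B.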

Annex H
Definitions (from Merriam-Webster¹)
fovea a small rodless area of the retina that affords acute vision.

intensity 1 : the quality or state of being intense; especially : extreme degree of
strength, force, energy, or feeling 2 : the magnitude of a quantity (as force or
energy) per unit (as of area, charge, mass, or time).

luminance 1 : the quality or state of being luminous 2 : the luminous intensity of a
surface in a given direction per unit of projected area.

reflectance the fraction of the total radiant flux incident upon a surface that is reflected
and that varies according to the wavelength distribution of the incident radiation –
called also reflectivity.

stochastic 1 : RANDOM; specifically : involving a random variable 2 : involving
chance or probability : PROBABILISTIC

¹ By permission. From Merriam-Webster's Collegiate® Dictionary, Eleventh Edition ©2006 by Merriam-
Webster, Incorporated (www.Merriam-Webster.com).

DOCUMENT CONTROL DATA
(Security classification of title, body of abstract and indexing annotation must be entered when document is classified)

1. ORIGINATOR (the name and address of the organization preparing the document. 2. SECURITY CLASSIFICATION
Organizations for whom the document was prepared, e.g. Centre sponsoring a
contractor’s report, or tasking agency, are entered in section 8.) UNCLASSIFIED
Defence Research and Development - Suffield (NON-CONTROLLED GOODS)
DMC A
PO Box 4000, Medicine Hat, AB, Canada T1A 8K6
REVIEW: GCEC December 2013

3. TITLE (the complete document title as indicated on the title page. Its classification should be indicated by the appropriate
abbreviation (S,C,R or U) in parentheses after the title).

Perception Module for Autonomous Mobile Robotics

4. AUTHORS
(Last name, first name, middle initial. If military, show rank, e.g. Doe, Maj. John E.)

MacKay, I.

5. DATE OF PUBLICATION (month and year of publication of document) 6a. NO. OF PAGES (total 6b. NO. OF REFS (total cited in
containing information. Include document)
Annexes, Appendices, etc).

December 2007 68 28

7. DESCRIPTIVE NOTES (the category of the document, e.g. technical report, technical note or memorandum. If appropriate, enter the type of report,
e.g. interim, progress, summary, annual or final. Give the inclusive dates when a specific reporting period is covered).

Contract Report

8. SPONSORING ACTIVITY (the name of the department project office or laboratory sponsoring the research and development. Include address).

Defence Research and Development - Suff eld


PO Box 4000, Medicine Hat, AB, Canada T1A 8K6

9a. PROJECT OR GRANT NO. (if appropriate, the applicable research and 9b. CONTRACT NO. (if appropriate, the applicable number under which
development project or grant number under which the document was the document was written).
written. Specify whether project or grant).

12RJ05 W7702-05R100/001/EDM

10a. ORIGINATOR'S DOCUMENT NUMBER (the official document number 10b. OTHER DOCUMENT NOs. (Any other numbers which may be
by which the document is identified by the originating activity. This assigned this document either by the originator or by the sponsor.)
number must be unique.)

DRDC Suffield CR 2012-121

11. DOCUMENT AVAILABILITY (any limitations on further dissemination of the document, other than those imposed by security classification)
( X ) Unlimited distribution
( ) Defence departments and defence contractors; further distribution only as approved
( ) Defence departments and Canadian defence contractors; further distribution only as approved
( ) Government departments and agencies; further distribution only as approved
( ) Defence departments; further distribution only as approved
( ) Other (please specify):

12. DOCUMENT ANNOUNCEMENT (any limitation to the bibliographic announcement of this document. This will normally correspond to the Document
Availability (11). However, where further distribution beyond the audience specified in (11) is possible, a wider announcement audience may be
selected).

Unlimited
13. ABSTRACT

The purpose of the Perception Module is to capture imagery data from a binocular vision pair and
transform that information into distance estimates in the reference frame of the nScorpion robot.
The Perception Module is a component in the overall design of the nScorpion Explosive
Ordnance Disposal (EOD) / Improvised Explosive Device (IED) robot system. nScorpion is an
advanced demonstration robot intended to advance the state of the art in EOD/IED robotics. The
goal of this research project is to demonstrate a higher degree of autonomy for current robotics
and augment the capability of soldiers or EOD/IED technicians. By improving overall autonomy,
nScorpion intends to show that humans can supervise individual robots or teams of robots and in
some cases remain one step back from dangerous situations.
Le Module de perception (MP) est un instrument scientifique visant à explorer la vision
artificielle binoculaire et en faire la démonstration; il capte une imagerie binoculaire à partir de
deux caméras articulées puis transforme ces données en estimations de distance dans un
cadre de référence rétinocentrique. Il a été conçu pour être abordable, léger et robuste,
approprié aux applications robotiques militaires et expériences scientifiques. Ses applications
potentielles sont vastes, et on peut le fixer à bien des types de plates-formes. C’est l’un des
composants du système robotique de neutralisation des explosifs et munitions (NEM) et de lutte
contre les dispositifs explosifs de circonstance (C-IED) nScorpion (northern scorpion -
paruroctonus boreus), un robot de démonstration avancé destiné à faire des percées en NEM
et C-IED. Le projet nScorpion vise à démontrer l’autonomie des robots actuels et amplifier les
capacités des soldats et des techniciens en NEM et C-IED. En mettant en œuvre des
mécanismes et algorithmes novateurs de vision artificielle binoculaire, le Module de perception
fera la démonstration de nouvelles techniques visant à améliorer la localisation et mieux
comprendre la vision. Le présent document en décrit en partie les théories, les justifications, la
conception et les applications, et recommande certaines améliorations.

14. KEYWORDS, DESCRIPTORS or IDENTIFIERS (technically meaningful terms or short phrases that characterize a document and could be helpful in
cataloguing the document. They should be selected so that no security classification is required. Identifiers, such as equipment model designation, trade
name, military project code name, geographic location may also be included. If possible keywords should be selected from a published thesaurus. e.g.
Thesaurus of Engineering and Scientific Terms (TEST) and that thesaurus-identified. If it is not possible to select indexing terms which are Unclassified, the
classification of each should be indicated as with the title).

robots, autonomous robotics, EOD, IED, selective perception, binocular vision, vergence, stereopsis,
disparity, triangulation, COHORT, ALS, AIS program
