Perception Module for Autonomous Mobile Robotics
I. MacKay
Denman Software Corp.
The scientific or technical validity of this Contract Report is entirely the responsibility of
the Contractor and the contents do not necessarily have the approval or endorsement of
the Department of National Defence of Canada.
Defence R&D Canada
Contract Report
DRDC Suffield CR 2012-121
December 2007
Perception Module for Autonomous Mobile Robotics
Perception Module Mk.I
I. MacKay
Denman Software Corp.
Résumé
The Perception Module (PM) is a scientific instrument intended to explore and
demonstrate binocular machine vision; it captures binocular imagery from two
articulated cameras and transforms those data into distance estimates in a
retinocentric reference frame. It was designed to be affordable, lightweight, and
rugged, suitable for military robotics applications and scientific experiments. Its
potential applications are broad, and it can be mounted on many types of platforms.
It is one component of the nScorpion (northern scorpion, Paruroctonus boreus)
Explosive Ordnance Disposal (EOD) and Counter-Improvised Explosive Device
(C-IED) robotic system, an advanced demonstration robot intended to make advances
in EOD and C-IED. The nScorpion project aims to demonstrate autonomy for current
robots and to amplify the capabilities of soldiers and EOD/C-IED technicians. By
implementing novel binocular machine vision mechanisms and algorithms, the
Perception Module will demonstrate new techniques intended to improve localization
and to better understand vision. This document describes, in part, its theory,
rationale, design, and applications, and recommends certain improvements.
This report outlines the modified design of the Perception Module, describes questions
it should answer, and recommends system improvements. The inclusion of aural and
visual sensors makes cued perception experiments possible and could lead to
improved operational performance. The successful implementation onto an EOD robot
will make a number of important perception and localization experiments possible.
While this autonomy will not exceed human performance, it can remove some of the
positive control burden, a factor in current EOD operations [4], and demonstrate a
glimpse of a future when force-multiplied humans team with autonomous devices.
There are a number of improvements that should be undertaken before system
integration is complete:
1. Separate the head from the neck, machine a new neck plate, and re-cable the
interface from the neck to the head;
2. Modify the head plate to include a spindle on the neck servo pulley to support the
rotation of the neck;
3. Replace the Flea cameras with Flea2 cameras to improve camera stability and
update frequency;
4. Add limit switches to the boundaries of eye motion to prevent damage and improve
calibration.
The document also describes in greater detail the role and importance of this type of
component in future autonomous systems. The Perception Module was designed to be
affordable, lightweight, and rugged, suitable for military robotics applications and
scientific experiments.
Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . i
Resume . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . i
Executive Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ii
Sommaire . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iv
Table of contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vi
List of tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
2. Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
3. Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
4. Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
5. Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
Annexes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
A.4 Cameras . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
A.5 Lenses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
C Assembly Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
D.1.2 Calibration . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
E Message Format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
F List of abbreviations/acronyms/initialisms . . . . . . . . . . . . . . . . . . . . . 45
G Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
List of figures
Figure 2. The four main image intensity factors affecting sensed luminance . . . . . . . 4
Figure 3. Simplified pinhole camera visual fixation estimating distance from vergence . . 7
Figure 5. Original KTH Binocular Vision System taken from Mertschat [7] . . . . . . . 9
Figure 10. Left eye (left) and right eye (right) images from Perception Module . . . . . . 17
The Perception Module is a component in the overall design of the nScorpion EOD
robot system. nScorpion is an advanced demonstration robot intended to advance the
state of the art in EOD robotics. The goal of this research project is to demonstrate a
higher degree of autonomy for current robotics and augment the capability of soldiers
or EOD/IED technicians. By improving overall autonomy, nScorpion intends to show
that humans can supervise individual or teams of robots and in some cases remain one
step back from dangerous situations. While this autonomy will not exceed human
performance, it can remove some of the positive control burden, a factor in current
EOD operations [5], and demonstrate a glimpse of a future when force-multiplied
humans team with autonomous devices.
In order to advance the ability of the current EOD robotics, a quantum leap in sensing
and computation is required as well as a paradigm shift away from the pure positive
control teleoperation philosophy. Delegating autonomy/authority down to the machine
for certain tasks makes it practical, in terms of bandwidth and human situational
awareness, to control an arbitrary number of autonomous robots by one operator.
Autonomous operation implies that the machine itself would need a sensor system
capable of distinguishing objects as small as EOD targets, like the 68mm diameter
PMA-3 AP mine, in a simple- to moderate-complexity environment. The largest
bandwidth sensor to date is an electro-optic camera; time-of-flight sensor arrays alone
are not sufficient for volumetric mapping in these environments. Therefore, a visual
system capable of processing many frames per second is required at a minimum. It
must be able to give precise pose estimates for objects of interest below 10 cm in size
and rapidly adjust these estimates in a volumetric sense so that the nScorpion may move
to intercept or manipulate. Since it must be autonomous, the computing power to
conduct data fusion levels 0 (pre-processing) through 3 (threat assessment) on the
sensor data, as defined by the JDL definitions [6], must be onboard.
The current state of the art machine vision, in general, employs co-planar stereo vision
sensors implementing, among other techniques, disparity mapping. Disparity mapping
computes volumetric data from the difference between images (left/right) of objects
that can be seen in both. Beyond the difficulties in calibrating intrinsic parameters and
extrinsic parameters, a disparity mapping / stereo vision system is computationally
expensive.
The Perception Module uses a binocular vision system that can estimate distance using
eye vergence techniques. The eye vergence method estimates distance using the
difference in angles (vergence) measured from the eyes as they focus on the same target
object. Essentially, rather than solving a large matrix inversion problem, this method
relies on the opto-mechanical system to move the eyes into a solution. With the eye
vergence approach a maximal solution can be local but not necessarily global without
The Perception Module requires a system of sensors and computing that could meet the
onerous real-time machine vision, world modelling, localization, and control requirements
set out in this project. Given no available binocular vision design at DRDC Suffield, a baseline
was selected to reduce effort. The Perception Module design is based on the “KTH
head”, a binocular vision system developed at the Royal Institute of Technology (KTH) in
Stockholm by Oliver Mertschat et al. [7]. The KTH head design demonstrated modular, cost-effective, and
flexible design qualities that were suited to fit into the nScorpion EOD system. Figure 5
details the original KTH binocular vision system. The purpose of Mertschat’s binocular
vision system was to provide a vision based sensor system for the control/operation of
small inexpensive mobile robots made of Commercial Off The Shelf (COTS)
components. The advantage of adopting this design as a baseline was that it made it
possible to also conduct disparity mapping if and when required to augment the
selective perception of the vergence technique. Unlike the KTH head, the Perception
Module is destined to be mounted above the robot platform and so it has improved
look-up and look-down capability. This allows it to look at the platform and look at
small objects near the front of the platform for manipulation. A number of additional
electronics and mechanical improvements undertaken at the same time are detailed in
Section 3.
This memorandum outlines the Perception Module and discusses in greater detail the
role and importance of this type of component in future autonomous systems. Section
2. presents a brief review of the theory. Section 3. describes the design of the
Perception Module. Section 4. comments on the evaluation of the design in its current
form. Section 5. summarizes the capabilities and outlines some improvements to the
Perception Module.
2. Theory
This memorandum has space only for a brief, and necessarily unjust, outline of vision
theory. According to Aristotle, seeing is knowing what is where. According to Marr [3], vision
is an information processing task conducted by the visual system operating on three
levels. There are many great references; general references are Pinker [8], Hubel [2],
Marr [3], Grimson [9], and Friedman [1]. Further references address the neurocomputational
aspects of vision.
The general idea is that vision is the result of the eye's and brain's neural
processing, carried out by multiple parallel computing systems (Parallel Distributed
Processing, PDP [10]), based on an understanding of real objects' luminance properties.
Figure 2 describes the four main factors that affect the luminance sensed on a retina or
a camera image plane from objects in the field of view; they are illumination, surface
luminance, object geometry, and viewpoint. Refer to Annex H for word definitions.
It is believed that human eye movements [2] alternate between visual fixations and
saccades to move the foveas (the higher resolution colour cone cell area of the retinas [26]
- see Annex H) around to perceive objects of interest. This is intuitive and can be
demonstrated by anyone: hold up one hand at eye level in your peripheral vision and
attempt to perceive the hand detail; the view is blurry. Compare peripheral hand detail
to the hand detail when your hand is in front of your eyes at visual field centre. Hubel
[2] described the process succinctly:
“First, you might expect that in exploring our visual surroundings we let
our eyes freely rove around in smooth, continuous movement. What our two
eyes in fact do is fixate on an object: we adjust the positions of our eyes so that
the images of the object fall on the two foveas; then we hold that position for a
brief period, say, half a second; then our eyes suddenly jump to a new position
by fixating on a new target whose presence in our visual field has asserted
itself, either by moving slightly, by contrasting with the background, or by
presenting an interesting shape.” (Hubel [2], p. 79)
Saccades and micro-saccades are the simultaneous movements of the eyes (or holds,
stationary with respect to the object in view despite head/body movement) that coalesce
the retinal/foveal imagery into a single binocular view. Micro-saccades are minuscule
(approximately 1 to 2 arc minutes, imperceptible to the eye [2]) movements believed to
refresh the image on the cone and rod cells, which respond to changes in luminance; these
cells need movement or the image is lost. Fixation is the maintenance of gaze in a
constant direction; in the case of vergence it is the visual fixation on a subject object to
produce a binocular view.
look at an object closer by, the eyes rotate 'towards each other' (convergence), while for
an object farther away they rotate 'away from each other' (divergence). Vergence is one
physiological mechanism capable of estimating a solution to the correspondence
problem between binocular vision images because the extreme precision with which
vergence eye movements can be controlled [3] makes it possible to infer slight distance
changes. The difficulty lies in determining the true correspondence between images.
Fukushima et al. [27] proposed that primates with frontal eyes use vergence
movements (eyes rotating in opposite directions) to track small objects moving towards or
away from them and the smooth pursuit system (eyes rotating in the same direction) to track
small objects in frontal pursuit. Without duplicating vision entirely, the fact that
distance estimation from visual processing is unconscious and yet accurate suggests
that autonomous machines could attain real-time localization using such techniques.
Future papers will describe the mathematics in proper detail. For the purpose of
presenting the vergence technique, an oversimplified 2D scenario is presented.
Consider Figure 3, which describes a simplified pinhole-camera-based binocular vision
system. The cameras are assumed to be coplanar with the target object, which lies in the
frontal field of view of both cameras. Let us suppose that EL represents the left eye, and
ER represents the right eye as pinhole cameras with a flat 2D image plane parallel to the
3D Subject object in view. OL and OR represent the left and right eye origins (which
would be the respective foveal origins in the human eye [2]). Angles A and B represent
the rotated angle from perpendicular to the image plane for the left and right eyes
respectively. Distance b represents the variable baseline between the optical axes and
distance d represents an estimated distance to a point on the 3D subject of the fixation.
It could be argued that baseline b is fixed for further approximation. Using saccades
and micro-saccades, the angles A and B are adjusted until there exists agreement in the
visual system that the Subject lies in the optical centre of both eyes. The line c
bisecting Subject is co-planar with the lines c′ bisecting the image planes of the left and
right eye. The image appearing on the image planes will be reversed and upside down.
The Line c intersects Subject at the estimated distance d from the binocular baseline b.
In reality, there would be a slight skew and misalignment of the left eye vs. right eye
image planes in biological eyes as well as the Charge Coupled Device (CCD)
electro-optical cameras used in the Perception Module owing to the intrinsic and
extrinsic parameters of the camera/eye. But in this ideal model it is assumed that the
eyes’ image planes are perfectly co-planar and therefore c, c′L, and c′R are co-planar.
l′L (the image vertical centre line of the left eye) and l′R (the image vertical centre line of
the right eye) are adjusted to correspond with l (the Subject object centre line). The
Perception Module collectively tilts the eyes so that in general terms this will be the case
post-calibration. Both eyes are able to rotate around a vertical optical axis out of the paper
co-located with OL and OR respectively. It is assumed that the optical axis and the rotation
axes are identical, unlike the human eye where the optical origin and the foveal origin are
not. The fixation movement attempts to align the subject vertical centre line l with the
image vertical centre lines l′L and l′R.
Given the above assumptions, the known angles A and B, and the known/measured
baseline b, we can determine d. The trivial case is presented: the subject object is
within the baseline plane b and directly in front of the cameras. Distance d can then be
inferred from b, A, and B:

d = b (tan A tan B) / (tan A + tan B)
The bisecting point x along baseline b intersecting d can be determined. In general, it
will not bisect along baseline b unless angles A and B are identical. The following
equation can determine where the bisection point is.
x = b (tan A) / (tan A + tan B)
This basic triangulation technique can be used to estimate distance to subjects as the
robot localizes itself.
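A minimal numerical sketch of the triangulation above follows. It assumes that angles A and B are measured between each eye's line of sight and the baseline (the convention under which the printed expression for d holds), and that x is the along-baseline offset of the fixation point from one eye's origin; which origin depends on Figure 3, which is not reproduced here. The 56 mm value is the camera baseline given in the assembly instructions.

    import math

    def vergence_estimate(baseline, angle_a, angle_b):
        """Distance d and along-baseline intersection x from the two vergence angles.

        Implements the two equations above.  Assumption: angle_a and angle_b are
        the angles (in radians) between each line of sight and the baseline.
        """
        ta, tb = math.tan(angle_a), math.tan(angle_b)
        d = baseline * (ta * tb) / (ta + tb)
        x = baseline * ta / (ta + tb)
        return d, x

    # Symmetric sanity check with the 56 mm camera baseline: A = B = 45 degrees
    # should place the fixation point at x = b/2 and d = b/2.
    d, x = vergence_estimate(0.056, math.radians(45.0), math.radians(45.0))
    print(f"d = {d * 1000:.1f} mm, x = {x * 1000:.1f} mm")   # 28.0 mm, 28.0 mm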
3. Design
This section outlines the design of the Perception Module. DRDC did not have a
current binocular vision system in-use or developed in-house, therefore an external
baseline design was chosen. Refer to Figure 5 for the KTH design baseline. Refer to
Figure 6 for a front view of the assembled Perception Module. The Perception Module
consists of two sub-components, the head and the neck. The head houses 2 cameras
mounted in an actuated pan tilt assembly on a rotating neck. This actuated pan tilt
assembly gives the head 4 degrees of freedom (DOF) to adjust the eyes in relation to
the Perception Module origin. In addition, three microphones are located at 120 degree
intervals around the plastic case for aural sensing. The cameras and the microphones
are interfaced to a computer in the neck housing. The head’s components are controlled
by a micro-controller located in the head. The microphones and eyes are cabled to the
neck computer for processing. The head controller and neck computer communicate to
one another via RS-232 and CANBus interfaces. The overall assembly is enclosed and
housed in an outdoor splash- and dust-proof housing that meets the IP54 rating.
The following sections describe the hardware and software respectively. Further
hardware details are available in the attached annexes. Annex A describes the technical
specifications of the Perception Module Mk.I. Annex B contains the bill of materials
(BOM) for the design. Annex C describes the assembly instructions. The following
section 3.1 describes the hardware design and 3.2 describes the software design.
Figure 5: Original KTH Binocular Vision System, taken from Mertschat [7]:
(a) axonometric 3D model (top), from the front cover / Figure 8 of [7], and (b) mechanical drawing
(bottom), Figure 10 of [7]
This sealed housing has no fans and uses conduction cooling to transfer heat to
the aluminum housing. The heavy aluminum conduction plates transfer
processor heat to the housing exterior. This conduction limits the rise in CPU
temperature at ambient temperatures above 20 degrees C. The CPU core temperature must
stay below 100 degrees C. Extreme thermal conditions can occur when the
CPU is operating near 100% inside the sealed container. This poses a serious
operating restriction to the proper function of the computer. This became the
critical risk for the Perception Module. The conduction heat sink has proven
to dissipate heat fast enough that the computer does not shut down when
operating at normal ambient temperatures. Elevated temperature operation
testing has not been done at this point. The CPU speed or the update frequency
can be tuned to alleviate this condition by reducing processor
cycles. The housing has been powder coated white to minimize self-heating in
a sunny environment.
This neck housing is sealed to meet IP54 requirements. The IP54 rating
defines protection against dust, stray thin metallic wires, and sprinkling of
water against the enclosure.
The 36V DC Power and CANBus interface cables exit the neck housing
bottom plate. An alternate bottom plate used during development allows
access to all onboard device ports: keyboard, mouse, video, USB, and IDE.
This development bottom is not IP54 rated. The neck bearing hole holds the
The head is based on a 1/8-inch aluminum plate. The electronics, servos and
camera assembly are mounted on the plate similar to [7]. A spherical acrylic
dome was considered for covering the camera field of view but has been
rejected for several reasons:
1. The dome surface would interfere with the camera lenses when tilted at an
upward viewing angle; and
2. It would require approximately 6 inches above the base plate to cover the
perimeter yet only 4 to 5 inches are available.
The cylindrical acrylic cover is a 6-inch high cylinder with a solid end plate
on top. There is a small joint between the top and sides which is visible as a
small elliptical discontinuity in the camera images. This aberration could be
used as a calibration guide for imagery. The cylindrical section extends
The gasket below the main plate provides a seal to the IP54 rating specified
and prevents damage to the cameras, servos and cables. Thermal dissipation is
not an expected problem with the electro-mechanical components in the head.
By using low power devices and high efficiency switching power supplies
there should be minimal heat dissipation required. The placement of
components on the head was planned and calculated to provide a statically
balanced head. The offset is less than 100 g·cm in the horizontal plane.
Since we have a bearing race diameter of 4 cm, this should prevent any undue
wear or binding on the neck bearing.
The head rotate mechanism follows the KTH reference design with a few
modifications. An idler pulley carrying an encoder was added for position
feedback. The pulley widths and sizes were increased to transmit the motor
torque without belt slip over the pulley teeth. The neck sail winch servo
(HITEC HS-785HB) rotates the head on the neck bearing. This is a functional
replacement for the obsolete servo mentioned in [7].
The neck rotation gear ratio is 30/48 instead of 36/48 in the KTH design [7].
This restricts total head rotation to 788 degrees (30/48 × 3.5 × 360). The
original KTH neck rotation of 945 degrees is more than required for the 360+
degrees specified. This gear reduction increased torque and will rotate the
head faster under load. The drive belt is now a 9mm wide belt instead of the
6mm belt used at KTH. The mounting of the pulley on the servo shaft prevents
using high tensions on the belt as the pulley and shaft effectively bend.
The USDigital H1 encoder attached to the idler will provide 2000 counts per
revolution, or 4000 counts per head revolution. This is beyond the requirements,
since the servo control resolution is not likely to be 1/1000. The encoder has
an index pulse for identifying the head origin (forward) position. Since two
revolutions of the encoder occur for one revolution of the head, a start-up
rotation sequence is required to correctly identify the proper index
pulse. An absolute index pulse is important for the vergence method.
The camera pan/tilt mechanism follows the KTH reference design [7] adapted
for Point Grey Research (PGR) Flea cameras and encoders for position
feedback. Each eye camera (Flea) can independently pan approximately ± 38
degrees and collectively tilt ± 40 degrees. These motions will allow the eyes
to verge onto objects. The first points of contact of the moving mechanism with
the acrylic cover are the front lower corners of the tilt bar. These corners have
been rounded off to prevent rubbing.
The base has only 2 small machine screws for fastening to mounting plates.
The location of these screws precludes having the pivot point right at the
image sensor plane or focal point. The camera cable connectors protrude
significantly out of the camera back end (see Figure 7) and restrict eye
placement and movement. The connectors required modification to allow for
full movement and non-binding operation.
The camera servos (HITEC HS-925MG) control the pan and tilt functions.
These are high-powered, fast servos and are functional
replacements for the servos mentioned in [7]. Performance tests showed these
servos to be almost violent in their motions.
The encoders attached to the camera axles (USDigital E2) and to the tilt pivot
servo (USDigital H1) provide 2000 counts per revolution. This delivers
approximately 5 counts per degree. They have an index pulse for identifying
the eyes centered and level position as an absolute encoder position.
Dual Ball bearings are used on all pivot axles to support and align the shafts
and provide smooth motions. Like the head rotate motion, the pan and tilt
must be smooth to reduce noise in image processing.
The Flea has the primary image dimensions of 1024 W x 768 H pixels at
12-bit resolution and several operating modes where the user (computer) can
select binning and custom image sizes to specify regions of interest. This will
allow a zoom in/out capability completely in software. After trying several
proposed lenses, the 8mm F1.6 CS lenses were found to have sufficient depth of field and minimal
distortion around the edges. These lenses have a manual focus and are
approximately 30mm long. The size and angle of the Firewire connectors on
the Flea cameras required significant labour in adjusting the cabling and
routing so the cameras have full motion without drag as well as clearance
inside the neck.
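As an illustration of the software zoom idea, the sketch below crops a region of interest out of a full 1024 x 768 frame and optionally bins pixels. It operates on an already-captured NumPy array rather than using the camera's on-board binning and ROI modes, so the function and its parameters are illustrative only.

    import numpy as np

    def software_zoom(frame, centre, roi_size, bin_factor=1):
        """Crop a region of interest around `centre` and optionally bin pixels."""
        cy, cx = centre
        h, w = roi_size
        y0 = max(0, cy - h // 2)
        x0 = max(0, cx - w // 2)
        roi = frame[y0:y0 + h, x0:x0 + w]
        if bin_factor > 1:
            # Trim so the ROI divides evenly, then average bin_factor x bin_factor blocks.
            hb = (roi.shape[0] // bin_factor) * bin_factor
            wb = (roi.shape[1] // bin_factor) * bin_factor
            roi = roi[:hb, :wb].reshape(hb // bin_factor, bin_factor,
                                        wb // bin_factor, bin_factor).mean(axis=(1, 3))
        return roi

    # Example: zoom in on the image centre of a 1024 x 768 frame; "zoom out" uses binning.
    frame = np.zeros((768, 1024), dtype=np.uint16)   # placeholder 12-bit image
    zoomed_in = software_zoom(frame, centre=(384, 512), roi_size=(192, 256))
    zoomed_out = software_zoom(frame, centre=(384, 512), roi_size=(768, 1024), bin_factor=2)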
Three microphones are placed equidistant around the head cover, halfway
between the base plate and the top cover, and are connected to the neck computer.
Power Input will come in the bottom of the VersaTainer on an 18-gauge cable.
The Main Power switch will be mounted on the rear of neck housing. A
PC-104 form factor DC- Switching power supply in the neck module converts
the input 36V DC to the main computer 5V DC. This power supply is a
high-voltage, high-efficiency card capable of meeting the demands of the
Pentium M processor's speed switching.
A separate power supply and distribution board energizes the head circuits at
the various voltages of 5V DC (encoders and control logic) and 6V DC
(servos). This design simplifies cabling going through the neck joint, with one
power cable and power conversion in situ. Head power circuits consist of 5
independent high-efficiency switching power supplies. The power/distribution
boards are stacked on top of the head controller. This design isolates servo
motor loads from control logic supplies. The modularity distributes the load,
and allows individual power boards to be exchanged if needed.
The cameras are powered via the Firewire 12V DC from the neck computer.
Despite the camera maker's claim that they are “Hot-Pluggable”, it turns out
that the computer interface cards do not tolerate the cameras being powered
down independently. After losing 3 interface cards to this anomaly, 12V DC
This section describes the software design of the Perception Module Mk.I. There are
three layers of software for the Perception Module. The low-level device drivers
interface the servo motors, encoders, and control-loop functions and are embedded on the
head controller. The RTEMS operating system and its application tasks run above the
low-level drivers on the head controller; higher-level controls for the head reside on the
neck computer. A test/demonstration sample application, Eyetest, on the neck computer
controls the head controller over a serial link.
Test PC applications demonstrate and exercise the Perception Module's individual head
devices. These applications are FlyCap from PGR, which displays the cameras' output, and
sound-card-based oscilloscope programs, which display the 3 audio input channels.
The SS555 developer kit from Intec Automation was used to develop the
device interface and control applications in the head controller. This software
was later ported to run on RTEMS as a full multi-tasking application.
Microsoft Visual tools were used to develop the Eyetest application on the
neck computer.
The MPC555 Time Processor Unit (TPU) has functions for quadrature
encoders complete with capture on an index pulse for alignment. The PWM
function of the MPC555 is used to provide a programmed pulse with a
specified repetition rate. No special TPU functions were required.
As described in section 3.1, four servos control eye pan (2), eye tilt (1), and
neck rotation (1) using PWM output signals from the SS555. The low-level
driver can provide 1024 position resolution for full scale of rotation of the
servo. This will translate into 1.0 to 2.0 ms pulse width for the servo. The
low-level driver API specifies the desired angle as a signed integer value. An
API for setting configuration, limits, and calibration parameters will be
provided. These functions were implemented using the Intec Automation
Runtime library functions for MIOS and PWM. All of the parameters are
changeable on the fly.
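The sketch below illustrates the mapping described above from the driver's signed position command to the 1.0 to 2.0 ms servo pulse width. The assumed integer range of -512 to +511 with 0 at centre is an illustrative choice, not taken from the driver source.

    def position_to_pulse_us(position, resolution=1024, min_us=1000, max_us=2000):
        """Map a signed servo position command to a PWM pulse width in microseconds.

        The driver exposes 1024 discrete positions across the 1.0 - 2.0 ms servo
        pulse range; the exact integer range is assumed here for illustration.
        """
        half = resolution // 2
        position = max(-half, min(half - 1, position))   # clamp to the valid range
        span_us = max_us - min_us                        # 1000 us across 1024 steps
        return min_us + (position + half) * span_us / (resolution - 1)

    print(position_to_pulse_us(0))      # ~1500 us, servo centre
    print(position_to_pulse_us(-512))   # 1000 us, one end of travel
    print(position_to_pulse_us(511))    # 2000 us, other end of travel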
The 4 optical quadrature encoders are connected to the TPU channels of the
SS555, which measure the position of the pan/tilt/neck mountings. The current
values of position are available through the API as integers related to angle.
Calibration of the individual parameters of all encoders is provided through the
low-level API. All of the parameters are dynamically changeable.
The encoder inputs will be able to capture the present count when the index
goes by. This allows us to store an offset from this index count for our
calibrated center position without tricky physical alignment procedures.
The servomotor and encoder pair have been encapsulated into an eyemotor
object. Four eyemotor instances are implemented as tasks in the RTEMS
application. At power-up, the servomotor is commanded to its center position.
During hardware assembly this is configured to be approximately the motor
centre. Once the motor has settled at its centre, the API commands
movements back and forth to find the encoder's index pulse. This
initialization produces a high precision absolute reference. This absolute reference
may not be aligned with the ground-truth eyemotor centre, so a software offset can be
applied to the encoder value to align the encoder centre with the eyemotor
centre. This soft calibration process allows the hardware assembly to be
simple and non-critical for alignment purposes. The software calibration after
the head is assembled gives us the desired precision and accuracy for the
agreement of the binocular imagery.
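The soft-calibration sequence can be sketched as follows. The two hardware callables and the class name are hypothetical stand-ins for the SS555 driver functions; only the overall flow (command the servo to centre, rock back and forth until the index pulse is captured, then apply a stored offset) is taken from the text.

    class EyeMotor:
        """Sketch of the eyemotor soft-calibration idea; hardware access is injected."""

        def __init__(self, command_servo, read_encoder_index_capture, centre_offset_counts=0):
            self.command_servo = command_servo
            self.read_index = read_encoder_index_capture
            self.offset = centre_offset_counts       # soft trim determined after assembly
            self.index_count = None

        def calibrate(self, sweep=50):
            """Find the encoder index pulse by sweeping about the servo centre."""
            self.command_servo(0)                    # move to approximate mechanical centre
            for target in (+sweep, -sweep, 0):       # rock back and forth across the index
                self.command_servo(target)
                captured = self.read_index()         # count latched when the index passed
                if captured is not None:
                    self.index_count = captured
                    return True
            return False                             # corresponds to status 7, failed to calibrate

        def position_counts(self, raw_counts):
            """Convert a raw encoder count to counts relative to the calibrated centre."""
            return raw_counts - self.index_count - self.offset

    # Example with stub hardware functions (purely illustrative):
    motor = EyeMotor(command_servo=lambda pos: None,
                     read_encoder_index_capture=lambda: 1234)
    motor.calibrate()
    print(motor.position_counts(1300))    # counts relative to calibrated centre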
The eyemotor libraries include functions for controlling the power supplies.
Messages to the head controller take the form
!ss,rr,cc,vvvv <CR><LF>
where the fields are defined in Annex E (Message Format).
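A minimal sketch of the message framing is shown below; the meanings of the four fields are left to Annex E and are not assumed here, and the field names come directly from the format string.

    def frame_message(ss, rr, cc, vvvv):
        """Frame a command in the '!ss,rr,cc,vvvv<CR><LF>' format shown above."""
        return f"!{ss},{rr},{cc},{vvvv}\r\n".encode("ascii")

    def parse_message(raw):
        """Split a received frame back into its four comma-separated fields."""
        text = raw.decode("ascii").strip()
        if not text.startswith("!"):
            raise ValueError("missing '!' start-of-message marker")
        ss, rr, cc, vvvv = text[1:].split(",")
        return ss, rr, cc, vvvv

    print(parse_message(frame_message("01", "02", "03", "0100")))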
The Eyemotor tasks run on a periodic basis and run the calibration or control
loop for an eyemotor. Until a calibration phase has successfully completed we
cannot run the PID control loop. Each Eyemotor has a separate instance of the
task running at its own specified periodic rate.
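Each Eyemotor task therefore runs something like the following on every period: a textbook PID step, gated until calibration has completed. The gains and period are placeholders, since the report does not give the tuning used on the head controller.

    def pid_step(setpoint, measured, state, kp=1.0, ki=0.0, kd=0.0, dt=0.02):
        """One iteration of a generic PID loop (illustrative gains and period)."""
        error = setpoint - measured
        state["integral"] += error * dt
        derivative = (error - state["previous_error"]) / dt
        state["previous_error"] = error
        return kp * error + ki * state["integral"] + kd * derivative

    # Skeleton of one task period: skip control until the calibration phase has completed.
    state = {"integral": 0.0, "previous_error": 0.0}
    calibrated = True                      # set by the calibration phase in the real task
    if calibrated:
        command = pid_step(setpoint=100, measured=92, state=state)
        print(command)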
The communication task monitors the COM port for input and passes
messages as they come in to the EyeConfig functions mentioned above.
The background periodic task broadcasts a status report once a minute to the
demonstration application.
This application will initialize the drivers and calibration of the servos and
encoders. It will then poll the various inputs and send the information out a
telemetry port on a serial port. Parameter and position settings over the
telemetry link will demonstrate the various configuration API functions. This
application runs under RTEMS and creates tasks for handling the 4
eyemotors, the serial comm link, and background supervisory duties.
4. Discussion
The purpose of the Perception Module is to capture imagery data from a binocular
vision pair and transform that information into distance estimates in the reference
frame of the nScorpion robot. The kinematics of the Perception Module are
straightforward. Each servo motor and rotation mechanism has a large gear ratio that
essentially decouples the joint movements from each other. The noise and vibration
characteristics, on the other hand, will be non-trivial and their investigation should
take considerable effort. It will be important to understand these dynamics if one
intends to apply background correction techniques. These factors will be impacted by
such things as speed, track material and the head mounting beams and brackets.
The eye tilt joint can rotate over a bounded angle constrained by the head cover above
and the head plate below. The Firewire cable and connectors, as described in section
3.1, impose a dead load on the rear of the cameras, with the strain relief as the moment
arm. With the selection of a smaller gauge Firewire cable that is modified to allow for
eye movement, the dead load is minimized.
Neck pan, as detailed in section 3.1, is limited by the cables that connect the head
components to the neck computer below. This constraint is not viewed as a concern for
general operation because the eye movement DOF in conjunction with vehicle DOF
allows for trajectory planning in a redundant trajectory space. The neck position
resolution is more than sufficient for the current operation. The sail winch servo has
enough torque to rotate the head portion, but the plastic spline that connects the servo
gear to the pulley is insufficient for the load, causing it to bend. It should be improved
with a spindle mounted to the head plate.
Generally, the mechanisms can operate with soft limits to prevent damage. Limit
switches could be applied to the boundaries of eye movement to prevent damage and to
reconfirm calibration during initialization. This would improve damage avoidance and
calibration.
One improvement to the design would be the use of a slip ring system to decrease
cabling between the neck and the head. Slip rings can provide several amperes of
power and up to 8 signals concurrently. There is potential to allow 360 degree
continuous rotation using slip rings. However, current slip ring designs add noise to the
signals so the only practical cable to replace would be the power cable. The
communications via CANbus and the Firewire signals would be affected using a slip
ring. Firewire cabling would be attenuated using a slip ring and would require a
dedicated 12V DC source. Power converters onboard the head would reduce the ripple
from the injected noise if a slip ring were used for power only. Slip ring signal filtering
would increase power consumption by the head, so that is a factor to consider versus
the continuous spin capability offered by the slip ring.
Another improvement for calibration purposes would be to install a laser rangefinder
between the eyes, co-planar with the eye image planes, in order to provide a ground truth
for the estimated distance. This would provide a data collection source to confirm the
capability of any algorithm used for localization. An alternative is to place a simple
LED laser pointer co-planar so that the eyes can test-calibrate on an object along the
central axis. This would not capture data but it could allow manual calibration.
It is anticipated that the current CPU processing speed will not be sufficient for
real-time operation in complex environments. A larger VersaTainer would
allow another CPU. This would increase overall processing, but at a risk of heat
overload in a sealed container. A more attractive option is to develop
application-specific hardware. Another option is to add a dedicated sound card to the
PC/104 stack for the sound processing, which would offload some of the processing
from the main CPU. This would improve performance by delegating sound processing
without significant heat problems.
A recommended improvement that will decrease the mass held at the top of the neck
mast is to separate the head from neck and recable the neck harness. The neck
computer stack could then mount onto the EOD robot. This would improve the head
dynamics by decreasing the moment arm, thereby improving robot stability.
System integration onto the dedicated EOD robot, which will include the integration of
4 microcontrollers and computer(s), will take significant effort. The four
microcontrollers control unique devices attached to them and connected by the
CANbus and serial (RS-232) interfaces. A common communication protocol must be
established that can relay messages of a restricted CANbus nature. This protocol must
include generic housekeeping and communication messages as well as a stable finite
state machine for transitions to various run-levels or operation states. This work is a
critical risk to overall success.
5. Conclusions
The purpose of the Perception Module is to capture imagery data from a binocular
vision pair and transform that information into distance estimates in the reference
frame of the nScorpion robot. There are a number of improvements that should be
undertaken before system integration is complete:
1. Separate the head from the neck, machine a new neck plate, and recable the
interface from the neck to the head;
2. Modify the head plate to include a spindle on the neck servo pulley to support the
rotation of the neck;
3. Replace the Flea cameras with Flea2 cameras to improve camera stability and
update frequency;
4. Add limit switches to the boundaries of eye motion to prevent damage and improve
calibration.
The potential application of this module is wide; it could be mounted on many different
platform types. The target platform size range is from 50 kg and up. For larger vehicles
it is conceivable to integrate several Perception Modules to improve localization and
situational awareness. The inclusion of aural and visual sensors makes cued
perception experiments possible and could lead to improved operational performance.
The successful implementation onto an EOD robot will make a number of important
perception and localization experiments possible.
While this autonomy will not exceed human performance, it can remove some of the
positive control burden, a factor in current EOD operations [4], and demonstrate a
glimpse of a future when force-multiplied humans team with autonomous devices.
2. Hubel, David H. (1987). Eye, Brain, and Vision, 2nd. ed. Scientif c American
Library. W. H. Freeman and Company New York.
3. Marr, David (1982). Vision, 1st ed. W.H. Freeman and Company.
4. Nguyen H. and Bott J. (2000). Robotics for law enforcement. In SPIE, (Ed.), SPIE
International Symposium on Law Enforcement Technologies, SPAWAR US Navy.
SPIE Press.
5. Nguyen, H.G. and Bott, J.P. (2000). Robotics for Law Enforcement: Beyond
Explosive Ordnance Disposal. (Technical Report 1839). US Navy SPAWAR.
SPAWAR Systems Center San Diego.
8. Pinker, Steven (1997). How the Mind Works, W. W. Norton & Company.
13. Jokinen, O. and Haggrén, H. (1995). Relative orientation of two disparity maps in
stereo vision, pp. 157–162. Zurich: ISPRS Intercommission Workshop From Pixels
to Sequences - Sensors, Algorithms and Systems.
14. Qian, Ning (1994). Computing Stereo Disparity and Motion with Known Binocular
Cell Properties. Neural Computation, 6(3), 390–404.
15. Murray, Don and Little, James J. (2000). Using Real-Time Stereo Vision for
Mobile Robot Navigation. Autonomous Robots, 8(2), 161–171.
17. Xiong, Y. and Shafer, S.A. (1994). Variable Window Gabor Filters and Their Use
in Focus and Correspondence. In CVPR94, pp. 668–671.
18. Nakadai, K., Okuno, H., and Kitano, H. (2002). Realtime sound source localization
and separation for robot audition.
19. Ude, A., Gaskett, C., and Cheng, G. (2006). Foveated vision systems
with two cameras per eye, Orlando.
20. Zaharescu, A., Rothenstein, A., and Tsotsos, J. (2004). Towards a Biologically
Plausible Active Visual Search Model.
21. Berthouze, L., Rougeaux, S., Chavand, F., and Kuniyoshi, Y. (1996). Calibration of
a foveated wide angle lens on an active vision head. In IEEE/PAMI Computer
Vision and Pattern Recognition, San Francisco, USA.
22. Hutchinson, S. A., Hager, G. D., and Corke, P. I. (1996). A tutorial on visual servo
control. IEEE Trans. Robotics and Automation, 12(5), 651–670.
23. Hager, G., Chang, W., and Morse, A. (1995). Robot hand-eye coordination based
on stereo vision. IEEE Control Systems Magazine, 15(1), 30–39.
24. Prokopowicz, P.N. and Cooper, P.R. (1993). The Dynamic Retina: Contrast and
Motion Detection for Active Vision. In CVPR93, pp. 728–729.
25. Horii, Akihiro (1992). The Focusing Mechanism in the KTH Head Eye System.
(Technical Report ISRN KTH/NA/P–92/15–SE).
26. Iwasaki, Masayuki and Inomata, Hajime (1986). Relation Between Superficial
Capillaries and Foveal Structures in the Human Retina. Investigative
Ophthalmology & Visual Science IOVS.org, 27, 1698–1705. (with nomenclature of
fovea terms).
27. Fukushima, K., Yamanobe, T., Shinmei, Y., Fukushima, J., Kurkin, S., and
Peterson, B. W. (2002). Coding of smooth eye movements in three-dimensional
space by frontal cortex. Nature, 419, 157–162.
28. Colin, A. and Puaut, I. (1999). Worst-Case Execution Time Analysis of the
RTEMS Real-Time Operating System.
3. Neck Diameter 18 cm
4. Weight 6.2 kg
6. Maximum Voltage 36 V DC
2. RAM 1 GB
3. Hard Disk 40 GB
A.4 Cameras
1. PGR Hi-Col Flea 1 digital cameras
3. Mount: CS lens
4. Focus: Manual
Cn 15 1 USB Sound card housing and connectors removed to reduce size Switchcraft EN3P3M
2. Insert the camera shaft through the lower bearing, the tilt bar, the flanged upper
bearing with extended inner race, and thread into the camera mount with loctite to
hold it. Use a slotted screwdriver to tighten.
3. Fasten the encoder base to the bottom of the tilt bar over the lower bearing using 2
* #2-56x1/4” screws.
4. Slide the encoder disk onto the shaft with the index stripe pointing outboard
centered on the connector while the camera is facing forward. Tighten the setscrew.
6. Use 'Servo Tape' (double-sided self-stick tape) to fasten the servomotor to the tilt
bar with the round part of the servo horn in the notch provided.
7. Use the hole in the servo horn at 19mm from the pivot shaft for the ball swivel
linkage.
10. Install servo horns pointing aft, and one spline inboard from center.
11. Install the Ball linkage shaft between the servo horn and the camera mount.
2. Install the Firewire, power and CAN cables and slide the stack of all electronics into the
VersaTainer. Using heat sink compound between the heat sink plate and the fillets,
bolt the plate to the fillets using 6 * #6-32 screws.
5. Install the desired lower panel and mount the VersaTainer on its bench stand.
8. Install woven nylon chafing tubes over the cables. Use one for power and Firewire,
one for audio, and one for CAN and COM cables.
9. Install the VersaTainer top plate, feeding the power, CAN, and Firewire cables through the hole,
using 8 * self-tapping screws and sealant.
2. Install neck encoder. Index mark should be under connector when head is facing
forward. Use 1 * #4-40 x.5 from the bottom, 1 * #4-40 x1.25 from the top. The
long bolt goes closest to center of plate.
5. Tighten down the nut on top of the encoder plate after everything else is lined up.
6. Fasten the neck servo, with the belt on it, to the base plate using 4 * #4-40x.5. Ensure the belt is
on before installation.
7. Install the tilt servo and encoder assembly using 6 * #4-40x.5. Note: the bolt closest to the center
of the plate needs a nut under its head to shorten it so that it clears the bearing holder.
2. Line up top pulley and upper ring and place under belt
5. Insert the funnel into the top pulley until the black line is flush with the top of the top pulley.
6. Connect cables.
7. Apply power to the head. This will set all the servo motors to center position.
8. Ensure the arrow on the hub of the neck encoder is centered under the connector.
10. Slide the belt over the large pulley and adjust the encoder for tension.
11. Note that too much belt tension will bend the small pulley on the servo motor shaft.
12. Now you are ready to calibrate the alignment of all 4 motors.
The application Eyetest was written in Delphi to configure, test, and operate
all of the functions in the head's SS555. This application uses small messages
to communicate parameters and commands to the head's SS555. This
application (and its partner in the SS555) presently uses a serial COM port but
is designed to be easily ported to a CAN-based communication link.
Once the eyemotors are initialized, the user can enter the trim or offset
parameters that will line up the head and cameras to a known reference. Since
the SS555 does not have any EEPROM for data storage, the trim variables
need to be sent to the SS555 after each power-up. To determine an offset value
for a particular eyemotor, use the Parameters page. First set the offset
value (EM QOffset) for the desired motor to 0. Manually enter a position into
the appropriate column of the EM Position row to get the motor to point in the
desired direction. Once each offset is determined it should be noted down and
then the position is entered as the offset.
Head Pan Alignment can be done by aligning the front and rear cover
mounting screws with the fore/aft center-line of the robot.
A target with one horizontal line and 2 vertical lines (spaced at the camera
baseline distance of 56 mm) can be placed in front of the head. Use the
motor position (as described above) to line up the appropriate cross with the
center pixels of the camera image.
7 = Failed to calibrate
EM Pos Resolution 8 Report position success when error <= this value. 4 R/W
EM STCenter 12 Servo Position trim offset in us. -800 ... 800 4 R/W
EM SPosition 13 Servo Position in us (added to 1500 us + STCenter). -800 ... 800 4 R/W
Pwr Enable 17 Digital output to control power supplies and others (2=Servo2, 3=Firewire). 0 ... 1 8 R/W
AD Scale 19 Analog voltage inputs *100, i.e. 500 = 5.00 V. 0.5 ... 10 8 R/W
1 = all parameters
2 = Ram Base
2D Two Dimensional
3D Three Dimensional
4D Four Dimensional
A Ampere
AC Alternating Current
AO Area of Operations
C Celsius
COM Communication
CR Carriage Return
DC Direct Current
F Fahrenheit
HD High Density
IP Ingress Protection
LF Line Feed
OS Operating System
PC Personal Computer
RA Reference Architecture
RC Radio Controlled
SI System International
SS Steroid Stamp
US United States
V Volt
WG Working Group
β longitude angle
pr py JAUS-compliant
q quaternion vector
q quaternion conjugate
zW z displacement in global pose
b baseline length
d target distance
x bisected distance
C Complementary angle
D Displacement vector
R Rotation Matrix
T Transformation Matrix
reflectance the fraction of the total radiant flux incident upon a surface that is reflected
and that varies according to the wavelength distribution of the incident radiation;
also called reflectivity.
1. ORIGINATOR (the name and address of the organization preparing the document. Organizations for whom the document was prepared, e.g. Centre sponsoring a contractor’s report, or tasking agency, are entered in section 8.)
Defence Research and Development - Suffield, PO Box 4000, Medicine Hat, AB, Canada T1A 8K6
2. SECURITY CLASSIFICATION: UNCLASSIFIED (NON-CONTROLLED GOODS) DMC A; REVIEW: GCEC December 2013
3. TITLE (the complete document title as indicated on the title page. Its classification should be indicated by the appropriate
abbreviation (S, C, R or U) in parentheses after the title).
4. AUTHORS (Last name, first name, middle initial. If military, show rank, e.g. Doe, Maj. John E.)
MacKay, I.
5. DATE OF PUBLICATION (month and year of publication of document): December 2007
6a. NO. OF PAGES (total containing information. Include Annexes, Appendices, etc.): 68
6b. NO. OF REFS (total cited in document): 28
7. DESCRIPTIVE NOTES (the category of the document, e.g. technical report, technical note or memorandum. If appropriate, enter the type of report,
e.g. interim, progress, summary, annual or final. Give the inclusive dates when a specific reporting period is covered).
Contract Report
8. SPONSORING ACTIVITY (the name of the department project office or laboratory sponsoring the research and development. Include address).
9a. PROJECT OR GRANT NO. (if appropriate, the applicable research and development project or grant number under which the document was written. Specify whether project or grant): 12RJ05
9b. CONTRACT NO. (if appropriate, the applicable number under which the document was written): W7702-05R100/001/EDM
10a. ORIGINATOR’S DOCUMENT NUMBER (the official document number by which the document is identified by the originating activity. This number must be unique.)
10b. OTHER DOCUMENT NOs. (Any other numbers which may be assigned this document either by the originator or by the sponsor.)
11. DOCUMENT AVAILABILITY (any limitations on further dissemination of the document, other than those imposed by security classification)
( X ) Unlimited distribution
( ) Defence departments and defence contractors; further distribution only as approved
( ) Defence departments and Canadian defence contractors; further distribution only as approved
( ) Government departments and agencies; further distribution only as approved
( ) Defence departments; further distribution only as approved
( ) Other (please specify):
12. DOCUMENT ANNOUNCEMENT (any limitation to the bibliographic announcement of this document. This will normally correspond to the Document
Availability (11). However, where further distribution beyond the audience specified in (11) is possible, a wider announcement audience may be
selected).
Unlimited
13. ABSTRACT
The purpose of the Perception Module is to capture imagery data from a binocular vision pair and
transform that information into distance estimates in the reference frame of the nScorpion robot.
The Perception Module is a component in the overall design of the nScorpion Explosive
Ordnance Disposal (EOD) / Improvised Explosive Device (IED) robot system. nScorpion is an
advanced demonstration robot intended to advance the state of the art in EOD/IED robotics. The
goal of this research project is to demonstrate a higher degree of autonomy for current robotics
and augment the capability of soldiers or EOD/IED technicians. By improving overall autonomy,
nScorpion intends to show that humans can supervise individual or teams of robots and in some
cases remain one step back from dangerous situations.
14. KEYWORDS, DESCRIPTORS or IDENTIFIERS (technically meaningful terms or short phrases that characterize a document and could be helpful in
cataloguing the document. They should be selected so that no security classification is required. Identifiers, such as equipment model designation, trade
name, military project code name, geographic location may also be included. If possible keywords should be selected from a published thesaurus, e.g.
Thesaurus of Engineering and Scientific Terms (TEST), and that thesaurus identified. If it is not possible to select indexing terms which are Unclassified, the
classification of each should be indicated as with the title).
robots, autonomous robotics, EOD, IED, selective perception, binocular vision, vergence, stereopsis, disparity, triangulation, COHORT, ALS, AIS program