
Real-Time Vision-Based Mobile Robot Navigation in

Outdoor Environments
Bernard Ghanem, Ali Fawaz, and Ghassan Karame
Faculty of Engineering and Architecture, American University of Beirut
3 Dag Hammarskjold Plaza, 8th Floor, New York, NY 10017-2303, USA
E-mail: {bsg00}@aub.edu.lb

Abstract - This paper introduces the design of an outdoor navigation system, based on range sensors and camera information, intended to perform incremental localization in a previously unknown environment while navigating successfully from source to destination and avoiding impeding obstacles.
Prior studies in the field of Simultaneous Localization and Mapping (SLAM) have shown computer vision (CV) to be an influential detection and recognition tool for indoor settings [2]. However, the application of this technique in dynamic outdoor environments, especially where GPS is no longer efficient, has yet to be conducted and tested. Consequently, our mobile robot system will be capable of localizing itself with respect to predefined identifiable landmarks, which are visually "sensed" and detected. These static landmarks will be used by the robot to provide a better estimate of its location in an attempt to plan a path from its unknown initial position to a chosen destination within its environment, while bypassing dynamic obstructions in its path.

Index Terms - Mobile Robot, SLAM, Computer Vision, Landmark Detection, Localization.

I. INTRODUCTION
The issue of SLAM has been extensively researched and resolved with high precision via the use of traditional range sensors [3]. However, due to its relative complexity and highly demanding computational requirements, SLAM using CV as its main sensory device, imitating human capabilities and perception, has not been adequately addressed, especially in outdoor settings; yet its promise is expected to grow in the coming years due to the wide variety of relevant applications it enables. The advantages of this technique allow the robot to acquire more useful information from the features it senses and to augment its initially incomplete "insight" into the surrounding environment. In fact, cameras have numerous other desirable properties that can be exploited to enhance robot sensing and navigation. For instance, they are low-cost, general-purpose sensors that behave passively, which prevents vision-based navigation systems from suffering interference from external and/or internal sources.
Our primary objective is to develop robust, user-friendly software to be run on a Pioneer 3-AT outdoor robot, which performs navigation on the basis of accurate localization results supplied by CV principles. In this regard, we plan to implement an incremental localization technique within the context of a real-time "kidnapped robot problem" [1], whereby the robot, equipped with a single camera, is inserted into a previously unknown outdoor environment at an initially unknown location. A number of identifiable landmarks will be placed in the robot's surroundings, some of which are not within initial camera range, and their respective global coordinates will be made known to the robot a priori. Initially, the robot will be required to wander about its immediate location for preliminary localization based on landmark detection, while avoiding any dynamic obstacles it may encounter. The relative distances between the robot and these landmarks must be accurately estimated in order to minimize the error incurred in approximating its global position. A deterministic triangulation method will be used to reduce the uncertainty accumulation inherent in automated navigation [1]. This research proposes a technique for incorporating CV methods as an accessory to GPS when the latter renders inadequate performance among tall buildings, large structures, and overhead trees. In fact, even though GPS is considered the preferred sensor for outdoor operation, its efficiency is limited by obstructions of the clear view of the positioning satellites. Our implemented system makes use of the abundant linear features describing neighbouring buildings in urban settings, where GPS fails to provide a substantial contribution to the localization task.

II. EXPERIMENTAL SETUP

This section describes the experimental setup established to demonstrate the performance and functionality of our research and its various implementations.

A. Robot Navigation Specifications

According to the established experimental setup, the robot will be initialized at an unknown position in the environment, which it takes as the origin of the local coordinate frame used for navigation. First, it will start to systematically "wander" about this initial position in order to detect the first identifiable landmark: the robot captures images, which are then analysed using the designed vision-based tool discussed later. If it detects a landmark, the robot must recognize it (i.e. through its distinctive colour and shape), move closer to it, and then repeat the previous process (i.e. detection and recognition) in order to estimate that landmark's position with respect to the robot's local coordinate frame.
Afterwards, it will continue to search for another landmark and locate it as well, in order to transform the robot's local position into coordinates suitable for the global coordinate frame as defined by the experimenter. However, in case no landmark has been found within the camera's optical range, the robot will revolve about itself in relative increments of 90° (knowing that the robot's optical range is about 100°). This rotation is terminated in either of two situations:
• The robot detects and recognizes a landmark and estimates its local pose.
• A complete turn has been performed, in which case the robot proceeds with the same rotational activity but at a radius of 1 meter from the previous position. This radius is incremented every time a complete turn finishes without finding a recognizable landmark.
This scenario covers all the possible situations that may ensue when the robot is placed in a previously unknown environment. In this case, no action is conducted without a reasonable motive, thus emulating the human capacity for wandering through a previously unseen setting.
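
To make this search routine concrete, the following minimal C++ sketch captures the rotate-and-expand logic described above. The helper functions are hypothetical placeholders for the robot's actual motion and vision interfaces (supplied in our system by ARIA and the vision tool), so this is an illustration of the strategy rather than the implemented code.

bool detectLandmark();           // assumed: run the vision pipeline on one frame
void rotateBy(double degrees);   // assumed: in-place rotation via the motion server
void moveOutward(double meters); // assumed: drive to a point on a wider circle

// Rotate in 90-degree increments (the ~100-degree camera range makes four
// increments cover a full turn); widen the search radius by 1 m after every
// complete turn that finds no recognizable landmark.
void searchForLandmark() {
    double radius = 0.0;
    while (true) {
        for (int step = 0; step < 4; ++step) {
            if (detectLandmark())
                return;              // landmark found: proceed to targeting
            rotateBy(90.0);
        }
        radius += 1.0;               // complete turn failed: expand the circle
        moveOutward(radius);
    }
}
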
In this overall layout, the robot will be required to perform landmark recognition/detection and self-localization. These processes are described in the following sections.
B. Landmark Recognition and Detection

In the context of this real-time "kidnapped robot problem", the robot is challenged by various environmental and technical issues that have to be resolved before a suitable localization method can be applied.
The robot is equipped with a Canon VC4 colour camera with a native resolution of 768x480 pixels and an optical range of approximately 100°. On the basis of the available camera features, the CV application must be able to distinguish a suitable landmark, if one is available, and locate it within the processed image in real time. The main limitation is caused by variations in illumination, which require the robot to be insensitive to image fluctuations induced by temporal and environmental conditions.
At start-up, the robot is designed to use its frame grabber to periodically capture a medium-resolution image of its surroundings, stored as "raw" data. This is based on the assumption that the robot has prior knowledge of the specific colour and shape of the available landmarks, which in the initial stage are limited to three cardboard cylinders, red, green, and blue in colour. This choice of landmark type is a compromise between detection complexity and localization complexity. In the case of cylindrical landmarks, the robot can simply check for a rectangular shape of a certain colour to detect the presence of the landmark; however, localization becomes more elaborate because the robot cannot determine its bearing towards the landmark from a single frame. The need for accurate localization with a single camera obliges the application to require a pair of images of the same landmark taken at two distinct positions that are somewhat far apart. The alternative approach to landmark selection facilitates landmark localization while complicating landmark detection: a distinctive shape or feature can be drawn on the landmarks to be distinguished by the application. Due to variations in shape, size, and perspective, landmark detection becomes more involved; however, localization is possible with only one captured image.
The CV application commences operation by executing a stage of illumination normalization, which maintains a mean illumination value of 100 for the "raw" image. This ensures that natural lighting affects neither landmark detection nor localization [4].
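
As an illustration, this stage can be sketched as below, where the frame's mean grey level is simply rescaled to the target value of 100. The sketch uses the modern OpenCV C++ API, whereas the implementation itself was built on the IplImage-based C API; the gain-only model is an assumption for brevity.

#include <opencv2/opencv.hpp>
#include <algorithm>

cv::Mat normalizeIllumination(const cv::Mat& raw, double targetMean = 100.0) {
    cv::Mat grey;
    cv::cvtColor(raw, grey, cv::COLOR_BGR2GRAY);
    double currentMean = cv::mean(grey)[0];        // mean intensity of the frame
    double gain = targetMean / std::max(currentMean, 1.0);
    cv::Mat normalized;
    raw.convertTo(normalized, -1, gain, 0.0);      // scale all channels, clamped to [0, 255]
    return normalized;
}
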
After illumination normalization, the image is further processed for the specific colours of the given landmarks. This demands selective background elimination on the basis of chromatic properties (i.e. all blobs incomparable in colour to the landmark are whitened). This process can be done in numerous ways, including the two developed so far. The first methodology compares each pixel to a "generic" colour range that is distinctive of the landmarks themselves. The other approach makes use of the correlation between landmark pixels and their neighbours via a normalized correlation matrix, which is excerpted from a typical subsection of the landmark image. Although simpler in form and implementation, the first method requires pixel-to-pixel processing with no regard for what the surrounding pixels may yield. The second method, on the other hand, is burdened with a relatively longer processing time, yet it renders more detail about the landmark while eliminating the contrast factor from the images.
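
A compact sketch of the two approaches is given below (OpenCV C++ API). The colour bounds and the landmark template patch are assumed inputs; the tuned values used on the robot are omitted.

#include <opencv2/opencv.hpp>

// Method 1: per-pixel comparison against a "generic" landmark colour range;
// pixels outside the range are whitened, as described above.
cv::Mat eliminateByColourRange(const cv::Mat& img,
                               const cv::Scalar& lo, const cv::Scalar& hi) {
    cv::Mat mask, inverse, out = img.clone();
    cv::inRange(img, lo, hi, mask);                   // 255 where the pixel matches
    cv::bitwise_not(mask, inverse);
    out.setTo(cv::Scalar(255, 255, 255), inverse);    // whiten everything else
    return out;
}

// Method 2: normalized correlation against a typical subsection of the
// landmark image; high-scoring neighbourhoods are kept as landmark candidates.
cv::Mat correlationMap(const cv::Mat& img, const cv::Mat& landmarkPatch) {
    cv::Mat scores;
    cv::matchTemplate(img, landmarkPatch, scores, cv::TM_CCORR_NORMED);
    return scores;    // values near 1.0 indicate landmark-like regions
}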

After chromatic background rejection, the "crude" image is analysed by a dilating morphological filter, which tends to remove noisy artefacts evident in the residual image. It also renders a slightly "smoother" contour of the landmark, if present, since it expands the majority colour within a sampling window traversing the whole image. In this regard, sharp and/or jagged edges are removed, thus facilitating the corner/line detectors applied later.
Moreover, a simple median-smoothing filter is applied to the pre-processed image to remove any other insignificant noise due to random fluctuations. As a consequence, the resulting image after preliminary landmark recognition must be void of any colour range other than the one being tested for (red, green, or blue).
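
These two filtering steps can be sketched as follows (OpenCV C++ API; the kernel sizes are illustrative assumptions, not the values tuned on the robot).

#include <opencv2/opencv.hpp>

cv::Mat cleanSegmentedImage(const cv::Mat& segmented) {
    cv::Mat dilated, smoothed;
    cv::Mat kernel = cv::getStructuringElement(cv::MORPH_RECT, cv::Size(3, 3));
    cv::dilate(segmented, dilated, kernel);    // expand the majority colour, smooth the contour
    cv::medianBlur(dilated, smoothed, 5);      // remove isolated noisy pixels
    return smoothed;
}
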
This colour analysis can be performed in a variety of ways depending on the preferred colour space (i.e. the Red Green Blue [RGB], Hue Saturation Intensity [HSI], or Chrominance Luminance [ChLu] space) [5]. The optimal colour space should allow image processing to be performed independently of changing environmental factors. Despite the fact that most images are represented as RGB sequences, this colour space remains sensitive to lighting, which may disturb the exact detection of a certain colour. For example, a picture of a red landmark under bright light will exhibit a totally different RGB range than the same landmark under normal lighting. As a better approach, either the HSI or the ChLu space can be used directly to eliminate lighting sensitivity. The resultant analysed image can be compared to the raw one shown in Figure 1 below.

Fig. 1 Background elimination for a green colour landmark

The remaining problem to be resolved is also the most crucial for correct landmark detection. It has been shown that much of the essential information in an image is stored in its edge map, and that edge structures have an apparent relevance in biological vision systems [9]. In addition, the edge information in an image tends to be robust under changes in illumination or related camera parameters. For these reasons, edge structure has been used extensively in computational vision. A variety of edge detectors are currently available, including the edge detector proposed by Canny as published in the OpenCV library [6].
This algorithm is utilized as the basis for the line detection used in landmark detection. In fact, the edge points located by the Canny algorithm are fitted to suitable lines, according to certain user-defined threshold parameters, using Hough line detection [9], and printed onto the processed image for visual verification. However, not all the found lines are appropriate for the purpose of landmark detection: only near-vertical lines (i.e. with a directional angle along the length of the image ranging from 45° to 90°) are selected as candidates. This restriction is valid because it reduces the number of lines to be processed for landmark suitability later on, thus making the process more compatible with real-time operation. These induced lines (as seen in Figure 2) are analysed according to their respective directional vectors in the image plane to decide whether they are "parallel" up to a certain extent defined by another user-defined threshold. This takes into account the inaccuracies that may result from both the Canny and Hough methods and the relative slant of the landmark itself with respect to the robot's camera.

Fig. 2 Hough borders of the detected landmark

The average number of pixels separating the two "parallel" lines farthest from each other is an essential input to the Chromatic Distance Estimation process, a heuristic approach that approximates the distance between the robot and the landmark being visualized.
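
The edge-and-line stage can be summarized by the following sketch: Canny edges, probabilistic Hough lines, and the 45°-90° orientation filter described above. The thresholds shown are illustrative assumptions rather than the tuned values.

#include <opencv2/opencv.hpp>
#include <cmath>
#include <vector>

std::vector<cv::Vec4i> nearVerticalLines(const cv::Mat& grey) {
    cv::Mat edges;
    cv::Canny(grey, edges, 50, 150);                          // edge map
    std::vector<cv::Vec4i> lines, kept;
    cv::HoughLinesP(edges, lines, 1, CV_PI / 180, 40, 30, 5); // fitted segments
    for (const cv::Vec4i& l : lines) {
        double angle = std::atan2(std::abs(double(l[3] - l[1])),
                                  std::abs(double(l[2] - l[0]))) * 180.0 / CV_PI;
        if (angle >= 45.0 && angle <= 90.0)                   // keep near-vertical lines only
            kept.push_back(l);
    }
    return kept;
}
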
III. SELF-LOCALIZATION: TARGETING AND TRIANGULATION

The localization process is subdivided into two sequential tasks, beginning with the determination of the first two landmarks with respect to the robot's local frame, followed by world-frame conversion. These two tasks are explained in detail in the following subsections.

A. Targeting

In this procedure, two camera pictures are used, while the respective positions of the robot are assumed to be obtained from the odometry sensors. The robot angle is the angle by which the robot has turned from the direction it was facing at the moment the application was executed.
Using Euclidean geometry based on the camera angles, the position of the landmark (the red circle in Figure 3) can be determined.
In order to prevent errors due to approximating the robot as a point object, the separation between the robot's center of mass and the camera was approximated to be 20 cm. This factor was crucial for calibrating the readings provided by the camera against those provided by the robot encoders.

Fig. 3 Targeting the landmark from two distinct positions
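
To illustrate the targeting geometry, the sketch below intersects the two bearing rays cast from the two robot poses (taken from odometry) to recover the landmark position in the local frame. The Pose structure and the function interface are illustrative assumptions; angles are in radians.

#include <cmath>

struct Pose { double x, y, theta; };    // odometry pose in the local frame

// bearing1/bearing2: camera bearings to the landmark relative to each pose.
// Returns false when the two rays are nearly parallel and no fix is possible.
bool targetLandmark(const Pose& p1, double bearing1,
                    const Pose& p2, double bearing2,
                    double& lx, double& ly) {
    double a1 = p1.theta + bearing1;    // absolute ray directions
    double a2 = p2.theta + bearing2;
    double denom = std::sin(a2 - a1);   // cross product of the ray directions
    if (std::fabs(denom) < 1e-6) return false;
    // Solve p1 + t1*(cos a1, sin a1) = p2 + t2*(cos a2, sin a2) for t1.
    double t1 = ((p2.x - p1.x) * std::sin(a2) -
                 (p2.y - p1.y) * std::cos(a2)) / denom;
    lx = p1.x + t1 * std::cos(a1);
    ly = p1.y + t1 * std::sin(a1);
    return true;
}
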
B. Triangulation Method for Localization

After recording the local positions of the two landmarks via targeting, enough information is available to estimate the global position of the robot. This task can be accomplished by three equivalent methods that calculate the robot's position with respect to each landmark in the global reference frame. The robot's position can then be better estimated as the weighted average of the results of the following methods, each of which uniquely determines a triangle (see Figure 4):
1. Using two sides and the angle between them
2. Using three sides and no angles
3. Using one side and the two adjacent angles

Fig. 4 Triangulation using two landmark positions

In fact, this triangulation method is the core functional component of our localization procedure, since it attempts to determine an efficient estimate of the robot's global position from the knowledge of the absolute locations of the landmarks in the world coordinate frame. However, it assumes that the robot's camera detects the same landmark in two different scenes or frames. In case this condition is not fulfilled, the heuristic Chromatic Distance Estimation method is utilized as a means of providing an estimate of the distance between the robot and the landmark. This estimation makes use of prior experiments that were conducted offline by examining the number of pixels spanned by the landmark (via colour referencing) at various distances from the camera.
After conducting multiple test runs at each distance, a discrete graphical representation of the experimental data can be extracted, which is used during the online localization process to determine the approximate distance to the landmark via linear interpolation. This method assumes proportionality in the correlation between the number of pixels and the distance separating the camera from the landmark. Even though this assumption is an approximation, it renders a simple and rapid method for distance estimation using computer vision. Figure 5 shows the inversely proportional relationship between the number of pixels representing the landmark width and the robot-to-landmark distance.

Fig. 5 Graph showing the inverse relationship between landmark width (in pixels) and distance from the camera (in meters)
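
The lookup itself reduces to linear interpolation over the offline table, as in the sketch below. The sample pairs are placeholders consistent with an inverse relationship, not the calibrated values.

#include <vector>
#include <utility>
#include <cstddef>

// (landmark width in pixels, distance in meters), sorted by decreasing width.
static const std::vector<std::pair<double, double>> kWidthToDistance = {
    {320, 1.0}, {160, 2.0}, {80, 4.0}, {40, 8.0}, {27, 12.0}};

double estimateDistance(double widthPx) {
    const auto& t = kWidthToDistance;
    if (widthPx >= t.front().first) return t.front().second;   // clamp near end
    if (widthPx <= t.back().first)  return t.back().second;    // clamp far end
    for (std::size_t i = 1; i < t.size(); ++i) {
        if (widthPx >= t[i].first) {  // widthPx lies between samples i-1 and i
            double f = (widthPx - t[i].first) / (t[i - 1].first - t[i].first);
            return t[i].second + f * (t[i - 1].second - t[i].second);
        }
    }
    return t.back().second;           // unreachable; keeps all paths covered
}
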
As a consequence of this localization process, the robot must be able to transform any position in its initial local frame to the desired global frame, in which the landmarks are defined. For this purpose, an ideal transformation operator must be determined and updated to allow this conversion to occur. This can be summarized by the following transformation matrix:

$$ {}^{W}_{L}T = \begin{bmatrix} {}^{W}_{L}R & \mathbf{p} \\ \mathbf{0}^{T} & 1 \end{bmatrix}, \qquad \mathbf{p} = \begin{bmatrix} p_x \\ p_y \end{bmatrix} $$

In this expression, ${}^{W}_{L}R$ denotes the rotation matrix between the two coordinate frames, while $(p_x, p_y)$ represents the translation between them (the origin of the local frame expressed in the world frame). This matrix is used in the position relation ${}^{W}P = {}^{W}_{L}T \, {}^{L}P$, which transforms the position of the robot in the local frame ($^{L}P$) to its position in the world frame ($^{W}P$).
This transformation operator can be calculated using the given world coordinates of the two landmarks (i.e. $W_1$ and $W_2$) and their respective local coordinates (i.e. $L_1$ and $L_2$) that were previously estimated:

$$ {}^{W}_{L}T = \begin{bmatrix} \cos\theta & -\sin\theta & p_x \\ \sin\theta & \cos\theta & p_y \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} A & -B & p_x \\ B & A & p_y \\ 0 & 0 & 1 \end{bmatrix} \qquad (1) $$

where $X = \begin{bmatrix} A & B & p_x & p_y \end{bmatrix}^{T}$ is the solution of the linear system $KX = b$, with

$$ K = \begin{bmatrix} L_{x1} & -L_{y1} & 1 & 0 \\ L_{y1} & L_{x1} & 0 & 1 \\ L_{x2} & -L_{y2} & 1 & 0 \\ L_{y2} & L_{x2} & 0 & 1 \end{bmatrix} \quad \text{and} \quad b = \begin{bmatrix} W_{x1} \\ W_{y1} \\ W_{x2} \\ W_{y2} \end{bmatrix} \qquad (2) $$
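
Since two landmarks determine the four unknowns exactly, equation (2) can be solved directly; the sketch below uses OpenCV's linear solver, though any 4x4 solver would do. The function name and types are ours, for illustration only.

#include <opencv2/opencv.hpp>

// L1, L2: landmark positions in the local frame; W1, W2: in the world frame.
// Returns X = (A, B, px, py) = (cos(theta), sin(theta), px, py).
cv::Mat solveTransform(cv::Point2d L1, cv::Point2d W1,
                       cv::Point2d L2, cv::Point2d W2) {
    cv::Mat K = (cv::Mat_<double>(4, 4) << L1.x, -L1.y, 1, 0,
                                           L1.y,  L1.x, 0, 1,
                                           L2.x, -L2.y, 1, 0,
                                           L2.y,  L2.x, 0, 1);
    cv::Mat b = (cv::Mat_<double>(4, 1) << W1.x, W1.y, W2.x, W2.y);
    cv::Mat X;
    cv::solve(K, b, X, cv::DECOMP_LU);   // exact solve of K X = b
    return X;
}
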
IV. CLIENT-SERVER APPLICATION real-time setting.
The mobile servers constituting the underlying client-server control architecture for the mobile platform, embodied in the Pioneer 3-AT Operating System software and embedded on the robot's microcontroller, manage the low-level tasks of robot control and operation, including motion as well as acquiring sensor information and driving accessory components such as the PTZ camera.
It is the job of an intelligent client running on a connected PC to perform the full gamut of robotics control strategies and tasks, such as obstacle detection and avoidance, localization, feature recognition, mapping, and PTZ camera control. Nearest the robot, ARIA's ArDeviceConnection class, at the command of the application code, establishes and maintains a communication channel with the robot server, packaging commands (ArRobotPacketSender) and decoding responses (ArRobotPacketReceiver) in the safe and reliable packet formats (ArRobotPacket) prescribed by the client-server protocols.
On the other hand, ACTS provides a server port for retrieving JPEG-compressed images. This is useful for remote viewing of the video data without the overhead of a large bandwidth. The SAV server has an associated client, which is supplied as part of the ACTS package. The server can be started automatically by the Linux system, but it must then be shut down manually if frame grabbing is to be done on the system for other purposes (vision processing). Hence, we cannot make use of the SAV application supplied by ACTS.
Our main challenge was to create a visual platform to best illustrate the operation and analysis of the frames captured by our robot. For that purpose, we implemented a client-server application in which the server runs on a Linux platform while the client operates in a regular WIN32 environment. The client is a graphical user interface (see Figure 6) illustrating the various analytical milestones in the camera's image-processing domain. In that respect, our server was configured to run on the local host of the robot, communicating with a connected client through a TCP socket over a wireless network (WLAN). TCP sockets provide reliable connection and transfer of information and mask data interference from other servers operating on the network. The server's main operation is to send the logged images captured by the robot during its run: it fetches the IplImage structure information for each image, along with its analysis data, and sends them sequentially to the remotely connected client.
The ACTS client communicates with the server (IP address: 192.168.1.2) through the TCP socket over the WLAN. Hence, the client retrieves the IplImage structure and opens a window showing the retrieved frames sequentially in order to create a strip of frames. Our client-server application provides a means to connect interactively to the robot while it runs alongside this application. It lets the user compare the analysed images to the initial ones, while displaying relevant coordinates and landmark properties in a real-time setting.

Fig. 6 Client-Server Interface
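
The following sketch illustrates the server side of this link: one frame's raw pixel buffer is written to an established TCP socket behind a simple width/height/type header. The framing convention is our assumption for illustration (the actual server also ships the per-frame analysis data), and the modern cv::Mat stands in for the IplImage structure.

#include <arpa/inet.h>
#include <sys/socket.h>
#include <cstdint>
#include <opencv2/opencv.hpp>

bool sendFrame(int clientFd, const cv::Mat& frame) {
    cv::Mat img = frame.isContinuous() ? frame : frame.clone();
    uint32_t hdr[3] = {htonl(static_cast<uint32_t>(img.cols)),
                       htonl(static_cast<uint32_t>(img.rows)),
                       htonl(static_cast<uint32_t>(img.type()))};
    if (send(clientFd, hdr, sizeof hdr, 0) != (ssize_t)sizeof hdr) return false;
    size_t total = img.total() * img.elemSize(), sent = 0;
    while (sent < total) {                        // send() may write partially
        ssize_t n = send(clientFd, img.data + sent, total - sent, 0);
        if (n <= 0) return false;
        sent += static_cast<size_t>(n);
    }
    return true;
}
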
V. GIS ROBOT TRACKING APPLICATION

A Geographic Information System (GIS) is a computer-based information system that enables the capture, modelling, manipulation, retrieval, analysis, and presentation of geographically referenced data. The most important component in a GIS is the database, whose value lies in the quality and usability of its data. As technology continues to advance, flexible options have become available that provide a wider range of applications and a host of opportunities to fit any particular functionality. In this perspective, GIS technology has found its way into many robotics applications.
A real-time GIS tracking package was implemented for the Pioneer 3-AT robot. This package is an autonomous real-time GPS tracking system, which tracks the robot as it moves along its trajectory. Using this package, we could track the robot (as indicated by the triangular shapes connected by blue lines in Figure 7) and know its exact coordinates, its velocity, and the analysis generated during the specific run. In addition, a mapping option has been added to the overall functionality of the package, whereby the obstacles sensed by the robot along its trajectory are plotted onto the map as small black squares to distinguish them from actual robot positions. A proposed extension to this package is to allow the user to specify the robot's destination, with the shortest path to that position computed using Dijkstra's algorithm (sketched below).
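
For completeness, the proposed path computation is the textbook Dijkstra algorithm over a graph of map waypoints, as in the sketch below; building that graph from the GIS map and choosing the edge weights are outside the scope of this illustration.

#include <queue>
#include <vector>
#include <limits>
#include <utility>

using Edge = std::pair<int, double>;    // (neighbour index, edge length)

// Returns the shortest distance from src to every node in the graph.
std::vector<double> dijkstra(const std::vector<std::vector<Edge>>& graph, int src) {
    const double INF = std::numeric_limits<double>::infinity();
    std::vector<double> dist(graph.size(), INF);
    using QItem = std::pair<double, int>;          // (distance so far, node)
    std::priority_queue<QItem, std::vector<QItem>, std::greater<QItem>> pq;
    dist[src] = 0.0;
    pq.push({0.0, src});
    while (!pq.empty()) {
        auto [d, u] = pq.top(); pq.pop();
        if (d > dist[u]) continue;                 // stale queue entry
        for (const auto& [v, w] : graph[u]) {
            if (dist[u] + w < dist[v]) {
                dist[v] = dist[u] + w;
                pq.push({dist[v], v});
            }
        }
    }
    return dist;
}
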
Our package software provides different analysis options that help in tracking the robot at a certain time. Options available in this package include exporting the viewed map to JPEG and other compatible formats, printing a specific view of the map, loading shape files from previous runs, and zooming and panning facilities. Figure 7 illustrates a sample of the GIS software layout and interface.

Fig. 7 Example of the robot trajectory traced by the described software

VI. PROPOSED APPLICATIONS
The recent success of the Mars Pathfinder mission
highlights the fact that autonomous robotic systems utilizing
computer vision could be applied to demanding real-time
applications. Clearly, the development of autonomous robots will be a significant factor in the exploration of many environments, including outer space, the depths of the oceans, and radioactive, contaminated, or otherwise extreme sites. Autonomous robots are already in use in automated delivery systems in some hospitals and warehouses. To render robots an integral part of every household, practical, efficient, and cost-effective solutions must be constructed. We believe that the work presented here is a step in this direction.

VII. FUTURE WORK


During the research, a good deal of attention was paid to obtaining detected landmarks that were free of outliers. Future work would include proposing alternative methods for detecting the landmark. (This could include methods for selecting better or more appropriate thresholds, possibly determined via online learning processes.) Additional extensions to the localization problem, in which neither the robot nor the landmark positions are known, form another direction that could be followed by future research.

VIII. CONCLUSION

In this paper, we have presented a method for mobile robot localization using computer vision, whereby a robot placed in an unknown outdoor environment localizes itself and navigates successfully from source to destination while avoiding impeding obstacles. The method exhibits many advantages over recent feature-based methods. This was achieved by exploiting and analyzing the strengths of several available alternatives. Experimental results indicate that the method is promising for practical, real-time, real-world implementation.

REFERENCES
[1] A. Bonarini, P. Aliverti, and M. Lucioni, "An Omnidirectional Vision Sensor for Fast Tracking for Mobile Robots," IEEE Transactions on Instrumentation and Measurement, vol. 49, no. 3, June 2000.
[2] A. J. Davison, Y. Gonzalez Cid, and N. Kita, "Real-Time 3D SLAM with Wide-Angle Vision," Proc. IFAC Symposium on Intelligent Autonomous Vehicles, 2004.
[3] A. J. Davison, "Real-Time Simultaneous Localisation and Mapping with a Single Camera," Proc. ICCV, 2003.
[4] D. J. Spero and R. A. Jarvis, "3D Vision for Large-Scale Outdoor Environments," Proc. Australian Conference on Robotics and Automation, November 2002.
[5] G. N. DeSouza and A. C. Kak, "Vision for Mobile Robot Navigation: A Survey," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 2, February 2002.
[6] J. Baltes, "Localization for Mobile Robots using Lines," Seventh International Conference on Control, Automation, Robotics and Vision (ICARCV'02), December 2002.
