
Uwe Franke, Dariu Gavrila, Steffen Görzig, Frank Lindner, Frank Paetzold, and Christian Wöhler, Daimler-Benz Research

MOST COMPUTER-VISION SYSTEMS FOR VEHICLE GUIDANCE ARE FOR HIGHWAY SCENARIOS. DEVELOPING AUTONOMOUS OR DRIVER-ASSISTANCE SYSTEMS FOR COMPLEX URBAN TRAFFIC POSES NEW ALGORITHMIC AND SYSTEM-ARCHITECTURE CHALLENGES. TO ADDRESS THESE ISSUES, THE AUTHORS INTRODUCE THEIR INTELLIGENT STOP&GO SYSTEM AND DISCUSS APPROPRIATE ALGORITHMS AND APPROACHES FOR VISION-MODULE CONTROL.

In the early eighties, vision-based navigation systems drew on past experience gained in static-image processing. Because of the scarcity of computational power at that time, these systems typically took a picture, analyzed it, and then blindly drove some distance before stopping again for the next picture. Then, in 1986, Ernst Dickmanns demonstrated autonomous driving on highways, using only a few 8086 processors and Kalman filters to restrict the possible scene interpretations to those consistent with the considered systems' dynamics and with previously derived interpretations.1
During the last 10 years, numerous vision systems for lateral and longitudinal vehicle guidance, lane-departure warning, and collision avoidance have been developed all over the world.2-4 For example, the Daimler-Benz demonstrator vehicle VITA II, as shown in the European Prometheus project's final presentation, can drive autonomously on highways and perform overtaking maneuvers without any interaction.5

However, an even more attractive vision system would extend its support beyond highwaylike roads, to the everyday driver in city traffic. Imagine a system that can imitate a human driver: it not only keeps a constant distance from the vehicle in front, as a radar-based system would do, but can also follow that vehicle laterally. Moreover, it can stop at red traffic lights and stop signs, give the right of way to other vehicles if necessary, and try to avoid unpredictable hazards, such as children running across the street.

Intelligent Stop&Go is our first approach to building such a system. It includes stereo vision for depth-based obstacle detection and tracking and a framework for monocular detection and recognition of relevant objects, without requiring a trunk-sized supercomputer. We have limited the computational power in our demonstrator car, Urban Traffic Assistant (UTA), to three 200-MHz PowerPCs.

Urban applications and vision tasks

Our goal is for Intelligent Stop&Go to detect an appropriate leading vehicle and signal the driver that it's ready for autonomous following. The driver must respond to the system's signal to enable it to follow. If a vehicle cannot follow the leader (for example, because the lead vehicle is changing lanes), Intelligent Stop&Go must be able to drive autonomously for a limited distance while searching for a new leader. Also, a following vehicle will return control to its driver if it must stop for a stop sign or if it is the first vehicle stopped at a red traffic light.

Besides an Intelligent Stop&Go system, driver-assistance systems (such as rear-end collision avoidance or red-traffic-light recognition and warning) are also of interest for urban traffic.



Figure 1. Classifying pixels: (a) left and (b) right stereo image pair and (c) classified pixels for the left image.

The most important perception tasks that must be performed to build such systems are to:

- detect the leading vehicle and estimate its distance, speed, and acceleration in longitudinal and lateral directions;
- extract the lane's course, even if it lacks well-painted markings and does not show clothoidal geometry;
- detect and recognize small traffic signs and traffic lights in a highly colored environment;
- detect and classify different additional traffic participants, such as bicyclists or pedestrians; and
- detect stationary obstacles that limit the available free space, such as parked cars.

Stereo-based obstacle detection and tracking

Navigating in urban traffic requires an internal 3D map of the environment in front of the car. This map must include position and motion estimates of relevant traffic participants and potential obstacles. In contrast to the highway scenario, where you can concentrate on looking for the rear ends of leading vehicles, our system has to deal with a large number of different objects.

Past investigations have produced several schemes for object detection in traffic. Besides the 2D model-based techniques that search for rectangular, symmetric shapes, inverse-perspective mapping-based techniques,6 optical-flow-based approaches,7 and correlation-based stereo systems8 have also been tested.

The most direct method to derive 3D information is binocular stereo vision, for which correspondence analysis poses the key problem. Unfortunately, classical approaches such as area-based correlation techniques or edge-based approaches are computationally very expensive. To overcome this problem, we have developed a feature-based approach that meets our specific needs and runs in real time on a 200-MHz PowerPC 604.9

Our feature-based approach classifies each pixel according to the gray values of its four direct neighbors. It checks whether each neighbor is much brighter, is much darker, or has similar brightness to the considered central pixel. This leads to 81 (3^4) different classes, encoding edges and corners at different orientations.

For example, Figures 1a and 1b show a stereo image pair taken with our camera system. The cameras are separated by 30 cm. Figure 1c shows the left image's structure classification. Different gray values represent different structures, and pixels in homogeneous areas are assigned to the "white" class and ignored in the next step, correspondence analysis.

The correspondence analysis works on the feature images computed for the left and right images, reducing the search for possibly corresponding pixels to a simple test of whether two pixels belong to the same class. Thanks to the epipolar constraint and the fact that the cameras are mounted with parallel optical axes, we need to search for pixels with identical classes only on corresponding image rows.

Obviously, this classification scheme cannot guarantee uniqueness of the correspondences. If ambiguities occur, we choose the solution giving the smallest disparity (that is, the largest distance) to overcome this problem. This prevents wrong correspondences (such as those caused by periodic structures) from generating phantom obstacles close to the camera. In addition, we ignore measurements that violate the ordering constraint. The correspondence analysis produces a disparity image (see Figure 2), which is the basis for all subsequent steps.

Figure 2. Disparity image. A feature pixel's darkness is proportional to its disparity.
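To make these two steps concrete, the following minimal numpy sketch (our own illustration, not code from the system) classifies pixels by their four neighbors and then searches corresponding rows for the smallest-disparity match; the brightness threshold and disparity range are invented values.

```python
import numpy as np

BRIGHT, SIMILAR, DARK = 2, 1, 0  # ternary relation per neighbor

def classify_features(img, thresh=10):
    """Assign each inner pixel one of 81 (3^4) structure classes based on
    whether each of its four direct neighbors is much brighter, much
    darker, or similar in gray value. Class 40 means all neighbors are
    similar, i.e., a homogeneous ("white") pixel to be ignored later."""
    img = img.astype(np.int16)
    c = img[1:-1, 1:-1]
    classes = np.zeros_like(c)
    for weight, n in zip((1, 3, 9, 27),
                         (img[:-2, 1:-1], img[2:, 1:-1],
                          img[1:-1, :-2], img[1:-1, 2:])):
        rel = np.full(c.shape, SIMILAR, dtype=np.int16)
        rel[n > c + thresh] = BRIGHT
        rel[n < c - thresh] = DARK
        classes += weight * rel
    return classes

def row_disparities(left_cls, right_cls, max_disp=60, homogeneous=40):
    """For each feature pixel in the left image, scan the same row of the
    right image (epipolar constraint, parallel optical axes) and keep the
    match with the smallest disparity, i.e., the largest distance."""
    h, w = left_cls.shape
    disp = np.zeros((h, w), dtype=np.int16)
    for y in range(h):
        for x in range(w):
            cls = left_cls[y, x]
            if cls == homogeneous:
                continue
            for d in range(0, min(max_disp, x) + 1):  # smallest d first
                if right_cls[y, x - d] == cls:
                    disp[y, x] = d
                    break
    return disp
```

Taking the first (smallest-disparity) match mirrors the ambiguity rule described above: a periodic structure then produces at worst a too-distant match, never a phantom obstacle close to the camera.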
Structures on the road. To recognize road boundaries, knowing which structures in an image lie on the road's surface is helpful. We can easily extract such structures from the disparity image if we can assume that the road is (nearly) flat. If we know the camera orientation, we can determine that all points having a disparity in a certain interval around the expected value lie on the road, and that those with larger disparities belong to objects above the road.

Estimation of camera height and pitch angle. In practice, camera height and pitch angle do not remain constant during driving. Fortunately, we can efficiently estimate the relevant camera parameters by using the extracted road-surface points. Least-squares techniques or Kalman filtering can minimize the sum of the squared residuals between the expected and found disparities.

Figure 3 shows the results of a test drive: the pitch angle decreases during acceleration and increases during braking. Between 80 and 120 seconds, the speed remained nearly constant. The uneven road surface caused the pitch angle's oscillations.

Figure 3. Estimated camera pitch angle during acceleration and deceleration.
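As a concrete illustration of the least-squares variant: under the common flat-road model, a road point's disparity is, to first order, a linear function of its image row, so camera height and pitch follow from a two-parameter fit. The sketch below is ours; the focal length and sample points are made up (only the 30-cm baseline is from the article).

```python
import numpy as np

def estimate_height_and_pitch(rows_v, disparities, focal_px, baseline_m):
    """Least-squares fit of the flat-road disparity model d(v) = a*v + c.

    For a planar road, the disparity of a road pixel is (to first order)
    linear in its image row v, measured from the principal point, with
    a = B/H and c = a * f * pitch; hence H = B/a and pitch = c/(a*f)."""
    A = np.column_stack([rows_v, np.ones_like(rows_v)])
    (a, c), *_ = np.linalg.lstsq(A, disparities, rcond=None)
    height = baseline_m / a            # camera height above the road (m)
    pitch = c / (a * focal_px)         # pitch angle (rad), small angles
    return height, pitch

# Synthetic example with the article's 30-cm baseline; focal length and
# road points are assumptions: yields roughly (1.21 m, 0.0018 rad).
v = np.array([80.0, 120.0, 160.0, 200.0])
d = np.array([20.2, 29.9, 40.1, 49.8])
print(estimate_height_and_pitch(v, d, focal_px=800.0, baseline_m=0.30))
```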

Obstacle detection and tracking. If we remove all features on the road plane, we can generate a 2D depth map containing the remaining features. The depth map in Figure 4 covers 30 m × 8 m. The leading vehicle causes the two peaks at 15 m (distance) at the 0 m and -2 m lateral positions. We use this map to detect objects that are subsequently tracked. In each loop, already tracked objects are deleted from the depth map before the next detection.

Figure 4. Depth map: trees and the car cause the peaks.

The detection step roughly estimates an object's width. This step fits a rectangular box to the cluster of feature points that contributed to the depth map's extracted area. The system tracks this cluster from frame to frame. To estimate the obstacle's distance, the system computes a disparity histogram of the object's feature points. To obtain a disparity estimate with subpixel accuracy, the system fits a parabola to this histogram's maximum.
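The subpixel refinement can be written down compactly; this sketch (bin layout and names are our own choices) fits the standard three-point parabola through the histogram's maximum bin and its two neighbors.

```python
import numpy as np

def subpixel_disparity(disp_values, max_disp=60):
    """Estimate an object's disparity with subpixel accuracy: build the
    disparity histogram of its feature points and fit a parabola through
    the maximum bin and its two neighbors (three-point vertex formula)."""
    hist, _ = np.histogram(disp_values, bins=max_disp + 1,
                           range=(0, max_disp + 1))
    k = int(np.argmax(hist))
    if k == 0 or k == max_disp:        # no neighbors to fit through
        return float(k)
    y0, y1, y2 = hist[k - 1], hist[k], hist[k + 1]
    denom = y0 - 2 * y1 + y2
    if denom == 0:
        return float(k)
    offset = 0.5 * (y0 - y2) / denom   # vertex of the fitted parabola
    return k + offset

# The object's distance then follows from Z = f * B / d.
```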
The current version can consider an arbitrary number of objects. Kalman filters estimate the motion states (that is, speed and acceleration in longitudinal as well as lateral directions) of these objects based on their positions relative to the camera system.

The longitudinal motion parameters are the inputs for a distance controller. Figure 5 shows the results of a test drive in Esslingen, Germany. The desired distance consists of a safety distance of 10 meters and a time headway of 1 second. Note the small distance error when the leader stops.

Figure 5. Autonomous vehicle following in city traffic. Speed is measured in meters per second.
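In code, the setpoint described here is one line; we assume the headway term is scaled by the ego vehicle's current speed, which the article does not state explicitly.

```python
def desired_distance(ego_speed_mps, safety_distance_m=10.0, headway_s=1.0):
    """Desired following distance: a fixed safety distance plus a time
    headway, presumably scaled by the ego vehicle's speed."""
    return safety_distance_m + headway_s * ego_speed_mps

# At 10 m/s the controller aims for 20 m; at standstill it keeps the
# 10-m safety distance, consistent with the stop shown in Figure 5.
print(desired_distance(10.0))  # -> 20.0
```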
Object recognition

Essential to Intelligent Stop&Go's intelligence is its ability to recognize objects. Two classes of objects pertain to this application:

- infrastructure elements (road boundaries, road markings, traffic signs, and traffic lights) and
- traffic participants (pedestrians, motorcycles, bicycles, and vehicles).

How do we recognize various objects? Although devising a general framework is difficult, we often find ourselves applying two steps, detection and classification. The purpose of detection is to efficiently obtain a region of interest (ROI), that is, a region in image or parameter space that could be associated with a potential object. Besides the stereo approach, we have also developed detection methods based on shape, color, and motion. For example, we use shape and color cues to find potential traffic signs and arrows on the road, and motion to find potential pedestrians.

Once the system obtains an ROI, it applies more computationally intensive algorithms to recognize the object (that is, to estimate model parameters). In this application, objects have a wide variety of appearances because of shape variability, different viewing angles, and illumination changes. Because explicit models are seldom available, we derive models implicitly by learning from examples. Recognition is a classification process, and we have implemented a large set of classifiers for this purpose: polynomial classifiers, principal-component analysis, radial-basis functions, time-delay neural networks (TDNN), and support-vector machines (the latter in collaboration with the MIT Center for Biological and Computational Learning10). To provide test examples for these learning approaches to object recognition, we've recorded many hours of data on a digital video recorder while driving in urban traffic. We label interesting data segments offline.

Road recognition. Road recognition has two major objectives: to enable automatic road following and to improve lateral-control behavior. As a stand-alone application, road recognition enables automatic road following. For example, Intelligent Stop&Go must register when the leading vehicle makes lane changes so that it can return the following vehicle back to the driver's control.

Furthermore, the lateral-control behavior of Intelligent Stop&Go can be improved. As the standard solution, a lateral tow-bar controller provides lateral guidance. Intelligent Stop&Go needs to know the vehicle's relationship to the leading vehicle in terms of distance and angle, as measured by the stereo-vision system. A tow-bar-controlled vehicle approximately follows the leading vehicle's trajectory, but it tends to cut corners in narrow turns. We can minimize this undesirable behavior if we know the following vehicle's position relative to the lane.

Road boundaries and markings. Lanes in urban roads are often not as well-defined as those in highways. Lane boundaries often have poor visibility. Various objects, such as traffic participants or background infrastructure, clutter the image. The road topology comprises a variety of different lane elements, including stop lines, pedestrian crossings, and forbidden zones. A comprehensive geometrical road model cannot be readily defined. All these characteristics suggest the need for a data-driven, global-image analysis that robustly separates lane structures from background clutter.



Global detection analyzes polygonal contour images. A database organizes the polygons with respect to their length, orientation, position, and mutual spatial relations such as parallelism and collinearity. It provides fast filters for these attributes. We can specify arbitrary property combinations to detect possible road structures. A polynomial classifier subsequently classifies detected lane boundaries as curbs, markings, or clutter. Figure 6a shows an image of a road; Figure 6b shows the results of the lane-boundary recognition as dark lines. Regions where obstacles have been detected by the stereo-vision system are ruled out as possible positions of road structures.

Figure 6. (a) An urban road scenario and (b) its extracted polygonal edge image (light) and detected lane structures (dark).

Even though the entire global detection needs only 80 to 100 ms on a PowerPC 604, data-driven methods are computationally quite expensive. However, as we learned from the work on highway lane following, model-based estimation of road geometry is very efficient in that it needs to consider only small portions of the image when lane markings are locally tracked in image sequences. Our urban lane tracker combines global analysis with this local method. For as long as possible, it locally tracks boundaries detected by the global analysis to estimate the vehicle's lateral position and yaw angle in a few milliseconds. Global detection and local tracking work together to deal with the dynamic environment.11

Arrow recognition. The recognition of arrows on the road follows the two-step procedure, detection and classification, mentioned earlier. It uses shape and brightness cues in a region-based approach. The detection step consists of gray-value segmentation and filtering.

The gray-scale segmentation involves reducing the number of colors in the original image to a handful. In this application, we base this reduction on the minima and plateaus of the gray-scale histogram.

Following this gray-scale segmentation, we apply a color-connected-components (CCC) analysis to the segmented image. The algorithm produces a database containing information about all regions in the segmented image. Among the computed attributes are the area, bounding box, aspect ratio, length, and smoothness of contour, and additional features that we determine based on computational cost.

We have developed a query language for this region database, dubbed MetaCCC, which allows queries based on attributes of single regions as well as on relations between regions (for example, adjacency, proximity, enclosure, and collinearity). The filtering step thus involves formulating a query to select candidate regions from the database.
The resulting set is normalized for size and given as input to a radial-basis-function (RBF) classifier. Figure 7 shows the original image and the obtained result.

Figure 7. (a) Street scene displaying a direction arrow, and (b) the scene with the arrow segmented and classified.

Traffic signs. The urban scenario offers a number of challenges to traffic-sign recognition. First, we must extend the field of view, which results in lower effective image resolution. At the same time, the use of wide-angle lenses introduces significant lens distortion. Traffic signs are also not viewed head-on, as they are most of the time on highways. In the city, they must be recognized when they are lateral to the camera and appear skewed in the image. We have extended Intelligent Stop&Go to cope with these problems and have added the urban-specific signs that are relevant for our goals to the database.

Color. Our colored-traffic-sign-recognition system12 goes back to the Prometheus project and was originally developed for the highway scenario. Overall, it follows the same steps as arrow detection, that is, color segmentation, filtering, and classification. The color segmentation step involves pixel classification using a lookup table. We generated the lookup table offline in a training process using a polynomial classifier.

The segmentation process outputs pixels labeled "red," "blue," "yellow," and "uncolored." As before, we apply the CCC algorithm and formulate queries on the resulting regions, using the MetaCCC procedure. Typical queries would include searching for red regions of a particular shape that enclose an uncolored region.

The next step, classification, is done with an RBF classifier in a multistage process. We input a color-normalized pictograph extracted from the original image at the candidate locations provided by the MetaCCC procedure. The classification stages involve color, shape, and pixel values. Integration over time stabilizes the results.12
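A toy sketch of the lookup-table step: the table maps quantized RGB triples to color labels and is merely indexed at runtime. In the real system the table is trained offline with a polynomial classifier; here it is filled by hand, and the quantization and class codes are our own choices.

```python
import numpy as np

RED, BLUE, YELLOW, UNCOLORED = 1, 2, 3, 0

def segment_with_lut(image_rgb, lut):
    """Label every pixel via a precomputed lookup table that maps a
    quantized RGB triple to one of the color classes."""
    q = image_rgb // 8                  # quantize 256 -> 32 levels/channel
    return lut[q[..., 0], q[..., 1], q[..., 2]]

# A hand-filled toy table: everything "uncolored" except strong reds.
lut = np.full((32, 32, 32), UNCOLORED, dtype=np.uint8)
r, g, b = np.indices(lut.shape)
lut[(r > 20) & (g < 12) & (b < 12)] = RED

img = np.zeros((2, 2, 3), dtype=np.uint8)
img[0, 0] = (200, 30, 30)               # a red pixel
print(segment_with_lut(img, lut))       # [[1 0] [0 0]]
```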

Gray-scale. Clearly, color information is beneficial for traffic-sign recognition. Yet some open issues remain regarding color segmentation. The difficulty is that there is quite a range in which the same "true" traffic-sign color can appear in the image, depending on illumination conditions, that is, whether the camera looks toward or away from the sun and whether it's day or night, rainy or sunny. To determine how successful we can be without color cues, we are also working on a shape-based traffic-sign-detection system for gray-scale images. This system might also be useful for systems that cannot use color because of cost considerations.

The gray-scale-detection system works on edge features, rather than on region features. It uses a hierarchical template-matching technique based on distance transforms (DTs).13 DT-based matching allows the matching of arbitrary (binary) patterns in an image. These could be circles, ellipses, or triangles, but also nonparameterized patterns, for example, outlines of pedestrians.

Preprocessing involves reading a test image (see Figure 8a), computing the thresholded edge image (see Figure 8c), and computing its chamfer image, the DT image (see Figure 8d). The DT image has the same size as the binary edge image, but has nonbinary entries: each pixel contains the image distance to the nearest edge pixel of the corresponding binary edge image.

Figure 8. Stages of preprocessing and image matching: (a) an image, (b) template, (c) edge image, and (d) distance-transform image.

Image matching involves correlating a binary shape pattern (for example, Figure 8b) with the distance image. At each template location, a correlation measure gives the nearest-distance sum of the template points to image edge points. A low value denotes a good match (low dissimilarity); a zero value denotes a perfect match. If the measure is below a user-defined threshold, we consider that a pattern has been detected.
mentation. The difficulty i s that there is quite groups templates offline into ahierarchy based
a range in which the same “true” traffic-sign on their similarity.This way, the approach can
color can appear in the image, depending on simultaneously match multiple templates at
illumination conditions-that is, whether the coarse search levels, resulting in significant
camera looks toward or away from the sun speedup factors. The template hierajchy can
Figure 8. Stages of preprocessing and image matching: and whether it’s day or night, rainy or sunny. be constructed manually or generated auto-
(a) an image, (b) template, (c) edge image, and (d) dis- To determine how successful we can be with- matically from available examples (that is,
tance-transform image. out color cues, we are also working on a learning). Furthermore, matching distin-



Figure 9. Traffic sign detection: (a) day and (b) night (yellow dots denote intermediate results of the hierarchical search; final results are outlined in red).

Figure 9 illustrates the hierarchical approach. Yellow dots indicate where the match between the image and a (prototype) template of the template tree was good enough to consider matching with more specific templates (for example, the children-crossing sign) on a finer grid. The final detection results are outlined in red. We achieve typical detection rates of 90% on single frames, with 4% to 6% false positives. More than 95% of the false positives were rejected in a subsequent pictograph-classification stage, using an RBF network.

Traffic lights. Traffic-light recognition follows the same three steps used before: color segmentation, filtering, and classification. Color segmentation uses a simple lookup table to determine image parts in the traffic-light colors: red, yellow, and green. By applying the CCC algorithm to the segmented image, the filtering step selects only the bloblike regions whose areas lie in a certain range as possible traffic-light candidates.

The classification step uses a neural network with output neurons for the K different object classes to be distinguished; the output neuron with the highest activation denotes the class to which the object is assigned. For traffic lights, we have K = 2 classes: the "traffic light" and "non-traffic-light" classes.

We trained different classifiers for red, red-yellow, yellow, and green traffic lights. On a 200-MHz PowerPC, the algorithm runs at a rate of approximately six images per second, depending on the number of traffic-light candidates. In experiments, we found recognition rates of above 90%, with false-positive rates below 2%.

Figure 10. (a) Recognition of a red traffic light. Also, the cropped and normalized ROI (b) before and (c) after preprocessing it by means of a simulated Mahowald retina.
Pedestrians. We are also working to recognize the most vulnerable traffic participants: pedestrians. We can recognize pedestrians by either their shape or their characteristic walking patterns.14

Shape recognition. We are compiling a large set of pedestrian outlines to account for the wide variability of pedestrian shapes in the real world. Our first approach to pedestrian recognition by shape has involved blurring the pedestrian outlines in the database and performing a principal-component analysis on the resulting data set (see Figure 11). This results in a compact representation of the original image data in terms of eigenvectors. The first eigenvectors (and the mean) represent characteristic features of the pedestrian distribution; the last eigenvectors mostly represent noise (see Figure 12). For example, preliminary findings have shown that we need fewer than 10 eigenvectors to account for most of the variability in the pedestrian data.

To find pedestrians in a test image, we apply a gradient operator to the image and normalize for image energy. The resulting normalized gradient image is projected onto the pedestrian eigenspace by correlation with the first few eigenvectors of the data set. A threshold on the correlation value determines whether a pedestrian has been found or not.
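A compact sketch of the eigenspace machinery under stated assumptions: PCA via SVD of the centered training matrix, and a detection score defined as the energy a window's normalized gradient image retains after projection onto the first eigenvectors; this score is our reading of the thresholded correlation value described above.

```python
import numpy as np

def build_eigenspace(outline_images, k=10):
    """PCA on (blurred) pedestrian outlines: rows of X are flattened
    training images; keep the mean and the first k eigenvectors
    (k must not exceed the number of training images)."""
    X = np.stack([im.ravel() for im in outline_images]).astype(float)
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k]                  # rows of Vt are eigenvectors

def pedestrian_score(window, mean, eig):
    """Project a normalized gradient window onto the eigenspace; the
    energy captured by the first eigenvectors is the matching score,
    to be compared against a detection threshold."""
    v = window.ravel().astype(float) - mean
    v /= np.linalg.norm(v) + 1e-9        # normalize for image energy
    coeffs = eig @ v
    return float(np.sum(coeffs ** 2))
```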

Figure 12. Principal-component analysis: eigenvectors (a) 0 (mean), (b) 1, (c) 2, and (d) 25.

Figure 13 shows a matching result. Besides the correct solution, we also obtain a false positive on a window. The latter detection can be rejected by considering the camera positioning with respect to the road. The rejection rate can be further improved by using an additional classification step (for example, RBF or NN) on the eigenspace representations. Constantine Papageorgiou and colleagues have described another approach to pedestrian recognition,10 which we are integrating in our UTA demonstrator in a collaboration between MIT and Daimler-Benz Research.

Figure 13. Matching results: The pedestrian is correctly detected, but a false alarm occurs in the window above.

Motion recognition. We can expand the neural network used for traffic-light recognition into the temporal dimension; the input then consists of gray-scale image sequences. This results in a TDNN architecture with spatiotemporal receptive fields. The receptive fields act as filters to extract spatiotemporal features from the input image sequences. Thus, features are not hard-coded but learned during training. Subsequent classification relies on the filtered image sequences rather than on the raw image data. Currently, we use the TDNN for the recognition of pedestrian gait patterns. A detection step determines the approximate position of possible pedestrians; we have two methods at our disposal:

- Color clustering on monocular images in a combined color and position feature space.15 A fast polynomial classifier selects the clusters possibly containing a pedestrian's legs by evaluating temporal changes of a shape-dependent cluster feature.
- 3D segmentation by stereo vision. As described previously, the stereo algorithm determines bounding boxes circumscribing obstacles in the scene.

The TDNN then performs a final verification on the candidate region sequence (see Figure 14). We are examining whether the TDNN approach can be used for segmentation-free object and motion detection and recognition.

Figure 14. Pedestrian recognition: (a) three frames of a sequence of stereo images (only the left image of each pair is shown) on which a 3D segmentation has been performed to determine the bounding boxes shown (the time-delay neural network recognized the corresponding image regions as the legs of a pedestrian); (b) a resulting cropped and scaled image sequence to be fed into the TDNN for final verification.
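To illustrate what a spatiotemporal receptive field computes, here is a naive numpy sketch: each unit is a small 3D kernel spanning a few consecutive frames. In the actual TDNN the kernels are learned during training and stacked in layers; the tanh activation and all shapes below are illustrative.

```python
import numpy as np

def spatiotemporal_responses(seq, kernels):
    """Apply spatiotemporal receptive fields (3D kernels spanning tl
    frames) to an image sequence; a classifier then works on these
    filtered responses instead of raw pixels.

    seq: array (T, H, W); kernels: array (N, tl, kh, kw).
    Returns (N, T - tl + 1, H - kh + 1, W - kw + 1)."""
    T, H, W = seq.shape
    N, tl, kh, kw = kernels.shape
    out = np.zeros((N, T - tl + 1, H - kh + 1, W - kw + 1))
    for n in range(N):
        for t in range(T - tl + 1):
            for y in range(H - kh + 1):
                for x in range(W - kw + 1):
                    patch = seq[t:t + tl, y:y + kh, x:x + kw]
                    out[n, t, y, x] = np.tanh(np.sum(patch * kernels[n]))
    return out
```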
Putting things together

In our Prometheus demonstrator VITA II (which enabled autonomous driving on highways), we assigned each vision module to a subset of processors in a static configuration. The modules ran continually, even though some of their results were of no interest for vehicle guidance at that moment.

Using this architecture for computer vision in complex, downtown traffic would require even more resources. But why look for new traffic signs or continuously determine the lane width while the car is stopped at a red traffic light? When driving on a highway, why look for pedestrians and crosswalks? Although we were successful with the brute-force approach, such questions reveal some of its disadvantages:

- It lacks a concept for module control, for example, to focus resources on relevant tasks in specific situations.
- It does not scale for a larger number of modules.
- It does not have a uniform concept for the interconnection of modules.
- The adaptation of the system to new applications (such as enhanced cruise control) and the integration of new vision modules usually require extensive reimplementations.
- Reuse of old modules can be difficult because of nonexistent interfaces.



The growing complexity of autonomous systems requires an architecture that can cope with these problems. We have pursued the approach of making an explicit distinction between the modules (obstacle detection, vehicle control, and so on) and the connection of modules to perform a specific application (for example, autonomous stop-and-go driving).

A module interface connects each module to the system. This lets developers concentrate on solving computer-vision problems instead of worrying about how to connect their work to the application environment. Connecting a group of these modules forms an application. We can implement lane keeping, for example, by connecting the lane-detection module and the lateral-vehicle-control module.

This flexible architecture allows development of various applications without modifying the modules. Moreover, the system allows runtime connection changes. This enables economical use of computational resources and adaptation to the current situation. We can, for example, accomplish lane keeping on highways and switch dynamically to the Intelligent Stop&Go application in the city.16
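A toy sketch of this module/application split, with every class name and number invented: modules hide behind one interface, and an application is nothing more than a particular set of connections, which can be rewired at runtime.

```python
class Module:
    """Base class behind the uniform module interface: every vision or
    control module exchanges data only through named input connections."""
    def __init__(self, name):
        self.name, self.inputs = name, {}
    def connect(self, port, source):
        self.inputs[port] = source        # runtime rewiring is allowed
    def step(self):
        raise NotImplementedError

class LaneDetection(Module):
    def step(self):
        return {"lateral_offset": 0.1}    # placeholder measurement

class LateralControl(Module):
    def step(self):
        lane = self.inputs["lane"].step()
        return {"steer": -0.5 * lane["lateral_offset"]}

# An application is just a set of connected modules; switching from lane
# keeping to Intelligent Stop&Go means changing connections, not modules.
lane = LaneDetection("lane-detection")
ctrl = LateralControl("lateral-control")
ctrl.connect("lane", lane)
print(ctrl.step())                        # {'steer': -0.05}
```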
We have integrated all the modules described in this article into the system. Our demonstrator UTA is able to recognize traffic lights, traffic signs, overtaking vehicles, direction arrows, and pedestrian crossings, and to follow a leading vehicle autonomously. The system runs on three 200-MHz AIX PowerPCs for the computer-vision modules and a Windows Pentium II for the visualization of the results (see Figure 15). The next step is to adapt the system to multiple environments and applications, including

- autonomous driving on highways,
- speed-limit assistance (warns if the driver exceeds the road's speed limit),
- lane-departure warning, and
- enhanced cruise control (slows the vehicle down when a car in front falls below the desired speed and speeds it up again if no obstacle is ahead).

Figure 15. (a) Objects seen by the system and (b) their visualization. The touch screen acts as the interface between developer and vision modules.

THE EXPERIENCE WE'VE GAINED from testing Intelligent Stop&Go on urban roads strengthens our confidence that computer vision will go downtown, soon bringing vision-based driver assistance and autonomous driving to the city.

In this article, we have focused on computer vision, but some related issues are also of great interest for this kind of application:

- Angle of view. We need a wide-angle lens to see traffic lights and signs when we're close to them. However, the precision of stereo-based distance measurement and the performance of object detection and recognition (in particular, traffic-sign recognition) increase with the focal length. We expect that these problems can be solved with high-resolution chips of 1,000 × 1,000 or 2,000 × 2,000 pixels rather than with rotatable cameras, vario-lenses, or multicamera systems.
- Camera dynamics. A second sensor problem is the insufficient dynamic range of the common charge-coupled device. The new logarithmic CMOS chips (high-dynamic-range chips) promise a way out of this dilemma. On sunny days, these chips will help us see structures in shadowed areas. At night, they'll improve vision when bright lights glare into the camera and confuse the automatic camera control.
- Application-specific hardware. To bring intelligent vision systems to the market, we also have to address appropriate hardware cost. Analog chips for early vision steps and modern FPGA (field-programmable gate array) technology offer possible solutions for future mobile vision systems. We are investigating the possibility of using FPGAs to speed up edge detection and the distance transform.

In addition to these technical issues, important legal (that is, liability) and acceptance issues must be resolved before our customers can drive vehicles with an Intelligent Stop&Go system into the city.

References

1. E.D. Dickmanns and A. Zapp, "A Curvature-Based Scheme for Improving Road Vehicle Guidance by Computer Vision," Proc. SPIE Conf. Mobile Robots, 1986, pp. 161-168.
2. J. Weber et al., "New Results in Stereo-Based Automatic Vehicle Guidance," Proc. Intelligent Vehicles Conf. '95, IEEE Press, Piscataway, N.J., 1995, pp. 530-535.
3. U. Franke et al., "The Daimler-Benz Steering Assistant: A Spin-off from Autonomous Driving," Proc. Intelligent Vehicles Conf. '94, IEEE Press, 1994, pp. 120-126.
4. D. Pomerleau and T. Jochem, "Rapidly Adapting Machine Vision for Automated Vehicle Steering," IEEE Expert, Vol. 11, No. 2, Apr. 1996, pp. 19-27.
5. B. Ulmer, "VITA II: Active Collision Avoidance in Real Traffic," Proc. Intelligent Vehicles Conf. '94, IEEE Press, 1994, pp. 1-6.
6. A. Broggi, "Obstacle and Lane Detection on ARGO," Proc. ITSC '97: IEEE Conf. Intelligent Transportation Systems (CD-ROM), IEEE Press, Piscataway, N.J., 1997.
7. W. Enkelmann, "Robust Obstacle Detection and Tracking by Motion Analysis," Proc. ITSC '97: IEEE Conf. Intelligent Transportation Systems (CD-ROM), IEEE Press, 1997.
8. K. Saneyoshi, "3-D Image Recognition System by Means of Stereoscopy Combined with Ordinary Image Processing," Proc. Intelligent Vehicles Conf. '94, IEEE Press, 1994, pp. 13-18.
9. U. Franke and I. Kutzbach, "Fast Stereo Based Object Detection for Stop&Go Traffic," Proc. Intelligent Vehicles Conf. '96, IEEE Press, 1996, pp. 339-344.
10. C. Papageorgiou, M. Oren, and T. Poggio, "A General Framework for Object Detection," Proc. Int'l Conf. Computer Vision, IEEE Press, 1998, pp. 555-562.
11. F. Paetzold and U. Franke, "Road Recognition in Urban Environment," Proc. IEEE Conf. Intelligent Transportation Systems, IEEE Press, 1998, pp. 87-91.
12. R. Janssen et al., "Hybrid Approach for Traffic Sign Recognition," Proc. Intelligent Vehicles Conf. '93, IEEE Press, 1993, pp. 390-397.
13. D. Gavrila, "Multifeature Hierarchical Template Matching Using Distance Transforms," Proc. ICPR '98: Int'l Conf. Pattern Recognition, IEEE Press, 1998, pp. 439-444.
14. D.M. Gavrila, "The Visual Analysis of Human Movement: A Survey," to be published in Computer Vision and Image Understanding, Vol. 73, No. 1, 1999.
15. B. Heisele and C. Wöhler, "Motion-Based Recognition of Pedestrians," Proc. ICPR '98: Int'l Conf. Pattern Recognition, IEEE Press, 1998, pp. 1325-1330.
16. S. Görzig and U. Franke, "ANTS: Intelligent Vision in Urban Traffic," Proc. IEEE Conf. Intelligent Transportation Systems, IEEE Press, 1998, pp. 545-549.

Uwe Franke is a research scientist with the Daimler-Benz Research Department, Esslingen, Germany, and is responsible for the Intelligent Stop&Go project. His research interests include vision-based lane recognition for autonomous vehicles, lane-departure warning systems, sensor fusion of vision and radar, and truck platooning. He received his Dipl.-Ing. and PhD, both in electrical engineering, from the Technical University of Aachen. Contact him at Daimler-Benz Research, T728, D-70546 Stuttgart, Germany; franke@dbag.stg.daimlerbenz.com.

Dariu Gavrila is a research scientist at the Daimler-Benz Research Center in Ulm, Germany, where he works on vision-based systems for intelligent vehicle navigation. His research interests include model-based object recognition in computer vision and tracking humans for application in virtual reality and intelligent human-machine interactions. He received his MS in computer science from the Free University, Amsterdam, and his PhD in computer science from the University of Maryland at College Park. Contact him at Daimler-Benz Research, T728, D-70546 Stuttgart, Germany; gavrila@dbag.ulm.daimlerbenz.com.

Steffen Görzig is working on his PhD thesis at Daimler-Benz Research. His main interests include parallel programming and multiagent systems. He received his Diploma in computer science from the University of Stuttgart, Germany. Contact him at Daimler-Benz Research, T728, D-70546 Stuttgart, Germany; goerzig@dbag.stg.daimlerbenz.com.

Frank Lindner is a research scientist at the Daimler-Benz Research Center, Ulm. His research interests include classification theory and its applications. He received his Diploma in computer science from the University of Stuttgart. Contact him at Daimler-Benz Research, T728, D-70546 Stuttgart, Germany; lindner@dbag.ulm.daimlerbenz.com.

Frank Paetzold is working on his PhD thesis at the Daimler-Benz Research Department, Esslingen. His research interests focus on computer vision for autonomous vehicle navigation. He received his Diploma in electrical engineering from the University of Bochum. Contact him at Daimler-Benz Research, T728, D-70546 Stuttgart, Germany; paetzold@dbag.stg.daimlerbenz.com.

Christian Wöhler is working on his PhD thesis at the Daimler-Benz Research Center, Ulm. His research interests include object recognition and motion analysis on image sequences by time-delay neural networks. He studied physics at the universities of Würzburg, Germany, and Grenoble, France, and received his Diploma in physics from the University of Würzburg. Contact him at Daimler-Benz Research, T728, D-70546 Stuttgart, Germany; woehler@dbag.ulm.daimlerbenz.com.

