NOVEMBER/DECEMBER 1998 41
… features. The depth map in Figure 4 covers 30 m × 8 m. The leading vehicle causes the two peaks at 15 m (distance) at the 0 m and -2 m lateral positions. We use this map to detect objects that are subsequently tracked. In each loop, already tracked objects are deleted from the depth map before the next detection.

The detection step roughly estimates an object's width. This step fits a rectangular box to the cluster of feature points that contributed to the depth map's extracted area. The system tracks this cluster from frame to frame. To estimate the obstacle's distance, the system computes a disparity histogram of the object's feature points. To obtain a disparity estimate with subpixel accuracy, the system fits a parabola to this histogram's maximum.

The current version can consider an arbitrary number of objects. Kalman filters estimate the motion states (that is, speed and acceleration in longitudinal as well as lateral directions) of these objects based on their positions relative to the camera system.

The longitudinal motion parameters are the inputs for a distance controller. Figure 5 shows the results of a test drive in Esslingen, Germany. The desired distance consists of a safety distance of 10 meters and a time headway of 1 second. Note the small distance error when the leader stops.

Figure 3. Estimated camera pitch angle during acceleration and deceleration. (Horizontal axis: time in seconds.)

Figure 4. Depth map: trees and the car cause the peaks. (Horizontal axis: lateral position of detected objects, in meters.)

Object recognition

Essential to Intelligent Stop&Go's intelligence is its ability to recognize objects. Two classes of objects pertain to this application:

• infrastructure elements (road boundaries, road marks, traffic signs, and traffic lights) and
• traffic participants (pedestrians, motorcycles, bicycles, and vehicles).

How do we recognize various objects? Although devising a general framework is difficult, we often find ourselves applying two steps, detection and classification. The purpose of detection is to efficiently obtain a region of interest: a region in image or parameter space that could be associated with a potential object. Besides the stereo approach, we have also developed detection methods based on shape, color, and motion. For example, we use shape and color cues to find potential traffic signs and arrows on the road, and motion to find potential pedestrians.

Once the system obtains an ROI, it applies more computationally intensive algorithms to recognize the object (that is, to estimate model parameters). In this application, objects have a wide variety of appearances because of shape variability, different viewing angles, and illumination changes. Because explicit models are seldom available, we derive models implicitly by learning from examples. Recognition is a classification process, and we have implemented a large set of classifiers for this purpose: polynomial classifiers, principal-component analysis, radial-basis functions, time-delay neural networks (TDNNs), and support-vector machines (the latter in collaboration with the MIT Center for Biological and Computational Learning10). To provide test examples for these learning approaches to object recognition, we've recorded many hours of data on a digital video recorder while driving in urban traffic. We label interesting data segments offline.

Road recognition. Road recognition has two major objectives: to enable automatic road following and to improve lateral-control behavior. As a stand-alone application, road recognition enables automatic road following. For example, Intelligent Stop&Go must register when the leading vehicle makes lane changes so that it can return the following vehicle to the driver's control.

Furthermore, the lateral-control behavior of Intelligent Stop&Go can be improved. As the standard solution, a lateral tow-bar controller provides lateral guidance. Intelligent Stop&Go needs to know the vehicle's relationship to the leading vehicle in terms of distance and angle, as measured by the stereo-vision system. A tow-bar-controlled vehicle approximately follows the leading vehicle's trajectory, but it tends to cut corners in narrow turns. We can minimize this undesirable behavior if we know the following vehicle's position relative to the lane.

Road boundaries and markings. Lanes in urban roads are often not as well-defined as those in highways. Lane boundaries often have poor visibility. Various objects, such as traffic participants or background infrastructure, clutter the image. The road topology comprises a variety of different lane elements, including stop lines, pedestrian crossings, and forbidden zones. A comprehensive geometrical road model cannot be readily defined. All these characteristics suggest the need for a data-driven approach.
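The subpixel disparity estimate described earlier (a parabola fitted through the maximum of the object's disparity histogram) can be sketched as follows. This is a minimal illustration, not the authors' implementation; the 1-pixel bin width is an assumption.

```python
import numpy as np

def subpixel_disparity(disparities):
    """Estimate an object's disparity with subpixel accuracy.

    Builds a histogram of the feature-point disparities (assumed
    1-pixel bins) and fits a parabola through the histogram maximum
    and its two neighbors, returning the parabola's vertex.
    """
    disparities = np.asarray(disparities)
    lo, hi = int(disparities.min()), int(disparities.max()) + 1
    edges = np.arange(lo, hi + 2) - 0.5          # bin centers lo..hi
    hist, _ = np.histogram(disparities, bins=edges)
    k = int(np.argmax(hist))
    if k == 0 or k == len(hist) - 1:
        return float(edges[k] + 0.5)             # peak at border: no fit
    y0, y1, y2 = hist[k - 1], hist[k], hist[k + 1]
    denom = y0 - 2 * y1 + y2
    offset = 0.0 if denom == 0 else 0.5 * (y0 - y2) / denom
    return float(edges[k] + 0.5 + offset)
```

The vertex offset formula is the standard three-point parabola interpolation; with counts 2, 5, and 3 around a peak at 15 pixels, for example, the estimate shifts slightly toward the heavier neighbor.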
Figure 6. (a) An urban road scenario and (b) its extracted polygonal edge image (light) and detected lane structures (dark).
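The distance controller described earlier tracks a desired distance built from a 10 m safety distance plus a 1 s time headway. A minimal sketch of that error signal, with the sign convention an assumption (the actual control law is not given in the text):

```python
def distance_error(measured_distance_m, follower_speed_mps,
                   safety_distance_m=10.0, time_headway_s=1.0):
    """Distance error for the longitudinal controller.

    Desired distance = safety distance + time headway * own speed,
    as in the Esslingen test drive (10 m and 1 s). A positive value
    means the gap is larger than desired.
    """
    desired = safety_distance_m + time_headway_s * follower_speed_mps
    return measured_distance_m - desired
```

At 10 m/s (36 km/h) the desired gap is 20 m; when the leader stops and the follower's speed reaches zero, the desired gap reduces to the 10 m safety distance.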
… and the obtained result.

Traffic signs. The urban scenario offers a number of challenges to traffic-sign recognition. First, we must extend the field of view, which results in lower effective image resolution. At the same time, the use of wide-angle lenses introduces significant lens distortion. Traffic signs are also not viewed head-on, as they are most of the time on highways. In the city, they must be recognized when they are lateral to the camera and appear skewed in the image. We have extended Intelligent Stop&Go to cope with these problems and have added the urban-specific signs that are relevant for our goals to the database.

Color. Our colored-traffic-sign-recognition system12 goes back to the Prometheus project and was originally developed for the highway scenario. Overall, it follows the same steps as arrow detection: color segmentation, filtering, and classification. The color-segmentation step involves pixel classification using a lookup table. We generated the lookup table offline in a training process using a polynomial classifier.

The segmentation process outputs pixels labeled "red," "blue," "yellow," and "uncolored." As before, we apply the CCC algorithm and formulate queries on the resulting regions, using the MetaCCC procedure. Typical queries would include searching for red regions of a particular shape that enclose an uncolored region.

The next step, classification, is done with an RBF classifier in a multistage process. We input a color-normalized pictograph extracted from the original image at the candidate locations provided by the MetaCCC procedure. The classification stages involve color, shape, and pixel values. Integration over time stabilizes the results.12

Figure 7. (a) Street scene displaying a direction arrow, and (b) the scene with the arrow segmented and classified.

… shape-based traffic-sign-detection system for gray-scale images. This system might also be useful for systems that cannot use color because of cost considerations.

The gray-scale detection system works on edge features, rather than on region features. It uses a hierarchical template-matching technique based on distance transforms (DTs).13 DT-based matching allows the matching of arbitrary (binary) patterns in an image. These could be circles, ellipses, or triangles, but also nonparameterized patterns, for example, outlines of pedestrians.

Preprocessing involves reading a test image (see Figure 8a), computing the thresholded edge image (see Figure 8c), and computing its chamfer image, the DT image (see Figure 8d). The DT image has the same size as the binary edge image, but has nonbinary entries: each pixel contains the image distance to the nearest edge pixel of the corresponding binary edge image.

Image matching involves correlating a binary shape pattern (for example, Figure 8b) with the distance image. At each template location, a correlation measure gives the nearest-distance sum of the template points to image edge points. A low value denotes a good match (low dissimilarity); a zero value denotes a perfect match. If the measure is below a user-defined threshold, we consider that a pattern has been detected.

The advantage of matching a template with the DT image is that the resulting similarity measure is smoother as a function of the template parameters (transformation and shape). This enables various efficient search algorithms to lock onto the correct solution. It also allows more variability between a template and an object of interest in the image. Matching with the edge image (or the unsegmented gradient image, not shown here), on the other hand, typically provides strong peak responses but rapidly declining off-peak responses, which do not facilitate efficient search.
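The DT-based matching described above can be sketched in a few lines. This is an illustrative sketch, not the deployed code: a brute-force Euclidean DT stands in for the chamfer computation, and the measure is the mean rather than the raw sum of nearest distances.

```python
import numpy as np

def distance_transform(edge_image):
    """Brute-force DT: each pixel receives the Euclidean distance to
    the nearest edge pixel (chamfer approximations are used in
    practice for speed). edge_image is a 2-D bool array."""
    edges = np.argwhere(edge_image)                 # (M, 2) edge coords
    rows, cols = np.indices(edge_image.shape)
    pixels = np.stack([rows.ravel(), cols.ravel()], axis=1)
    d = np.linalg.norm(pixels[:, None, :] - edges[None, :, :], axis=2)
    return d.min(axis=1).reshape(edge_image.shape)

def chamfer_measure(dt_image, template_points, offset):
    """Average distance of the shifted template edge points to the
    nearest image edge; 0 denotes a perfect match, and a value below
    a user-defined threshold counts as a detection."""
    pts = np.asarray(template_points) + np.asarray(offset)
    return float(dt_image[pts[:, 0], pts[:, 1]].mean())
```

Because the DT varies smoothly, shifting a template off its true location raises the measure gradually rather than abruptly, which is exactly what lets coarse-to-fine search lock onto the correct solution.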
A refined version of the matching scheme distinguishes features by type and computes separate DTs for each type (for example, by edge orientation).13

Figure 9 illustrates the hierarchical approach. Yellow dots indicate where the match between the image and a (prototype) template of the template tree was good enough to consider matching with more specific templates (for example, the children-crossing sign) on a finer grid. The final detection results are outlined in red. We achieve typical detection rates of 90% on single frames, with 4% to 6% false positives. More than 95% of the false positives were rejected in a subsequent pictograph-classification stage, using an RBF network.

Traffic lights. Traffic-light recognition follows the same three steps used before: color segmentation, filtering, and classification. Color segmentation uses a simple lookup table to determine image parts in the traffic-light colors: red, yellow, and green. By applying the CCC algorithm to the segmented image, the filtering step selects only the bloblike regions whose areas lie in a certain range as possible traffic-light candidates.

The classifier has output neurons for the K different object classes to be distinguished; the output neuron with the highest activation denotes the class to which the object is assigned. For traffic lights, we have K = 2 classes: the "traffic light" and "non-traffic-light" classes.

We trained different classifiers for red, red-yellow, yellow, and green traffic lights. On a 200-MHz PowerPC, the algorithm runs at a rate of approximately six images per second, depending on the number of traffic-light candidates. In experiments, we found recognition rates above 90%, with false-positive rates below 2%.

Figure 10. (a) Recognition of a red traffic light. Also, the cropped and normalized ROI (b) before and (c) after preprocessing it by means of a simulated Mahowald retina.

Pedestrians. We are working to recognize the most vulnerable traffic participants: pedestrians. We can recognize pedestrians by either their shape or their characteristic walking patterns.14

Shape recognition. We are compiling a large set of pedestrian outlines to account for the wide variability of pedestrian shapes in the real world. Our first approach to pedestrian recognition by shape has involved blurring the pedestrian outlines in the database and performing a principal-component analysis on the resulting data set (see Figure 11). This results in a compact representation of the original image data in terms of eigenvectors. The first eigenvectors (and the mean) represent characteristic features of the pedestrian distribution; the last eigenvectors mostly represent noise (see Figure 12). For example, preliminary findings have shown that we need fewer than 10 eigenvectors to account for most of the variability in the pedestrian data.

To find pedestrians in a test image, we apply a gradient operator to the image and normalize for image energy. The resulting normalized gradient image is projected onto the pedestrian eigenspace by correlation with the first few eigenvectors of the data set. A threshold on the correlation value determines whether a pedestrian has been found.
Figure 12. Principal-component analysis: eigenvectors (a) 0 (mean), (b) 1, (c) 2, and (d) 25.
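The eigenspace test that Figure 12 illustrates can be sketched as follows. This is a hedged sketch, not the authors' code: the data layout (one flattened gradient image per row), the value of `k`, and the use of the projection norm as the "correlation value" are assumptions.

```python
import numpy as np

def fit_eigenspace(examples, k=10):
    """PCA on a stack of flattened, normalized gradient images.

    Returns the mean image and the first k eigenvectors; per the
    text, fewer than 10 eigenvectors capture most of the pedestrian
    variability.
    """
    X = np.asarray(examples, dtype=float)
    mean = X.mean(axis=0)
    # SVD of the centered data yields the covariance eigenvectors
    # as the rows of vt, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:k]

def eigenspace_score(patch, mean, eigvecs):
    """Norm of the patch's projection onto the first eigenvectors;
    thresholding this value decides whether a pedestrian is present
    (the threshold itself is application-specific)."""
    coeffs = eigvecs @ (np.asarray(patch, dtype=float) - mean)
    return float(np.linalg.norm(coeffs))
```

A patch that resembles the training distribution projects strongly onto the leading eigenvectors and scores high; a patch orthogonal to them scores near zero.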
Figure 13 shows a matching result. Besides the correct solution, we also obtain a false positive on a window. The latter detection can be rejected by considering the camera positioning with respect to the road. The rejection rate can be further improved by using an additional classification step (for example, RBF or NN) on the eigenspace representations. Constantine Papageorgiou and colleagues have described another approach to pedestrian recognition,10 which we are integrating in our UTA demonstrator in a collaboration between MIT and Daimler-Benz Research.

Motion recognition. We can expand the neural network used for traffic-light recognition into the temporal dimension; the input then consists of gray-scale image sequences. This results in a TDNN architecture with spatiotemporal receptive fields. The receptive fields act as filters to extract spatiotemporal features from the input image sequences. Thus, features are not hard-coded but learned during training. Subsequent classification relies on the filtered image sequences rather than on the raw image data. Currently, we use the TDNN for the recognition of pedestrian gait patterns. A detection step determines the approximate position of possible pedestrians; we have two methods at our disposal:

• Color clustering on monocular images in a combined color and position feature space.15 A fast polynomial classifier selects the clusters possibly containing a pedestrian's legs by evaluating temporal changes of a shape-dependent cluster feature.
• 3D segmentation by stereo vision. As described previously, the stereo algorithm determines bounding boxes circumscribing obstacles in the scene.

The TDNN then performs a final verification on the candidate region sequence (see Figure 14). We are examining whether the TDNN approach can be used for segmentation-free object and motion detection and recognition.

Putting things together

In our Prometheus demonstrator VITA II
(which enabled autonomous driving on highways), we assigned each vision module to a subset of processors in a static configuration. The modules ran continually, even though some of their results were of no interest for vehicle guidance at that moment.

Using this architecture for computer vision in complex, downtown traffic would require even more resources. But why look for new traffic signs or continuously determine the lane width while the car is stopped at a red traffic light? When driving on a highway, why look for pedestrians and crosswalks?

Although we were successful with the brute-force approach, such questions reveal some of its disadvantages:
Figure 15. (a) Objects seen by the system and (b) their visualization. The touch screen acts as the interface between developer and vision modules.
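The situation-dependent activation that the questions above motivate can be illustrated with a toy dispatch table. The module and situation names here are hypothetical; they are not the actual module set or scheduling mechanism of the system described.

```python
# Hypothetical module and situation names, for illustration only.
RELEVANT_MODULES = {
    "stopped_at_light": {"traffic_lights", "lead_vehicle"},
    "urban_driving": {"traffic_lights", "traffic_signs", "pedestrians",
                      "lane", "lead_vehicle"},
    "highway_driving": {"lane", "lead_vehicle", "traffic_signs"},
}

def select_modules(situation):
    """Return the vision modules worth running in a given driving
    situation, instead of running every module continually; lead-
    vehicle tracking is kept as a conservative default."""
    return RELEVANT_MODULES.get(situation, {"lead_vehicle"})
```

On a highway, for instance, the pedestrian module would simply not be scheduled, freeing its processors for the modules that matter there.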
… length. We expect that these problems can be solved with high-resolution chips of 1,000² or 2,000² pixels rather than with rotatable cameras, vario-lenses, or multicamera systems.

• Camera dynamics. A second sensor problem is the insufficient dynamic range of the common charge-coupled device. The new logarithmic CMOS chips (high-dynamic-range chips) promise a way out of this dilemma. On sunny days, these chips will help us see structures in shadowed areas. At night, they'll improve vision when bright lights glare into the camera and confuse the automatic camera control.

• Application-specific hardware. To bring intelligent vision systems to the market, we also have to address appropriate hardware cost. Analog chips for early vision steps and modern FPGA (field-programmable gate array) technology offer possible solutions for future mobile vision systems. We are investigating the possibility of using FPGAs to speed up edge detection and the distance transform.

In addition to these technical issues, important legal (that is, liability) and acceptance issues must be resolved before our customers can drive vehicles with an Intelligent Stop&Go system into the city.

References

1. E.D. Dickmanns and A. Zapp, "A Curvature-Based Scheme for Improving Road Vehicle Guidance by Computer Vision," Proc. SPIE Conf. Mobile Robots, 1986, pp. 161-168.

2. J. Weber et al., "New Results in Stereo-Based Automatic Vehicle Guidance," Proc. Intelligent Vehicles Conf. '95, IEEE Press, Piscataway, N.J., 1995, pp. 530-535.

3. U. Franke et al., "The Daimler-Benz Steering Assistant: A Spin-off from Autonomous Driving," Proc. Intelligent Vehicles Conf. '94, IEEE Press, 1994, pp. 120-126.

4. D. Pomerleau and T. Jochem, "Rapidly Adapting Machine Vision for Automated Vehicle Steering," IEEE Expert, Vol. 11, No. 2, Apr. 1996, pp. 15-27.

5. B. Ulmer, "VITA II: Active Collision Avoidance in Real Traffic," Proc. Intelligent Vehicles Conf. '94, IEEE Press, 1994, pp. 1-4.

6. A. Broggi, "Obstacle and Lane Detection on ARGO," Proc. ITSC '97: IEEE Conf. Intelligent Transportation Systems (CD-ROM), IEEE Press, Piscataway, N.J., 1997.

7. W. Enkelmann, "Robust Obstacle Detection and Tracking by Motion Analysis," Proc. ITSC '97: IEEE Conf. Intelligent Transportation Systems (CD-ROM), IEEE Press, 1997.

8. K. Saneyoshi, "3-D Image Recognition System by Means of Stereoscopy Combined with Ordinary Image Processing," Proc. Intelligent Vehicles Conf. '94, IEEE Press, 1994, pp. 13-18.

9. U. Franke and I. Kutzbach, "Fast Stereo Based Object Detection for Stop&Go Traffic," Proc. Intelligent Vehicles Conf. '96, IEEE Press, 1996, pp. 339-344.

10. C. Papageorgiou, M. Oren, and T. Poggio, "A General Framework for Object Detection," Proc. Int'l Conf. Computer Vision, IEEE Press, 1998, pp. 555-562.

11. F. Paetzold and U. Franke, "Road Recognition in Urban Environment," Proc. IEEE Conf. Intelligent Transportation Systems, IEEE Press, 1998, pp. 87-91.

12. R. Janssen et al., "Hybrid Approach for Traffic Sign Recognition," Proc. Intelligent Vehicles Conf., IEEE Press, 1993, pp. 390-397.

13. D. Gavrila, "Multifeature Hierarchical Template Matching Using Distance Transforms," Proc. ICPR '98: Int'l Conf. Pattern Recognition, IEEE Press, 1998, pp. 439-444.

14. D.M. Gavrila, "The Visual Analysis of Human Movement: A Survey," to be published in Computer Vision and Image Understanding, Vol. 73, No. 1, 1999.

15. B. Heisele and C. Wöhler, "Motion-Based Recognition of Pedestrians," Proc. ICPR '98: Int'l Conf. Pattern Recognition, IEEE Press, 1998, pp. 1325-1330.

16. S. Görzig and U. Franke, "ANTS: Intelligent Vision in Urban Traffic," Proc. IEEE Conf. Intelligent Transportation Systems, IEEE Press, 1998, pp. 545-549.

Uwe Franke is a research scientist with the Daimler-Benz Research Department, Esslingen, Germany, and is responsible for the Intelligent Stop&Go project. His research interests include vision-based lane recognition for autonomous vehicles, lane-departure warning systems, sensor fusion of vision and radar, and truck platooning. He received his Dipl.-Ing. and PhD, both in electrical engineering, from the Technical University of Aachen. Contact him at Daimler-Benz Research, T728, D-70546 Stuttgart, Germany; franke@dbag.stg.daimlerbenz.com.

Dariu Gavrila is a research scientist at the Daimler-Benz Research Center in Ulm, Germany, where he works on vision-based systems for intelligent vehicle navigation. His research interests include model-based object recognition in computer vision and tracking humans for application in virtual-reality and intelligent human-machine interactions. He received his MS in computer science from the Free University, Amsterdam, and his PhD in computer science from the University of Maryland at College Park. Contact him at Daimler-Benz Research, T728, D-70546 Stuttgart, Germany; gavrila@dbag.ulm.daimlerbenz.com.

Steffen Görzig is working on his PhD thesis at Daimler-Benz Research. His main interests include parallel programming and multiagent systems. He received his Diploma in computer science from the University of Stuttgart, Germany. Contact him at Daimler-Benz Research, T728, D-70546 Stuttgart, Germany; goerzig@dbag.stg.daimlerbenz.com.

Frank Lindner is a research scientist at the Daimler-Benz Research Center, Ulm. His research interests include classification theory and its applications. He received his Diploma in computer science from the University of Stuttgart. Contact him at Daimler-Benz Research, T728, D-70546 Stuttgart, Germany; lindner@dbag.ulm.daimlerbenz.com.

Frank Paetzold is working on his PhD thesis at the Daimler-Benz Research Department, Esslingen. His research interests focus on computer vision for autonomous vehicle navigation. He received his Diploma in electrical engineering from the University of Bochum. Contact him at Daimler-Benz Research, T728, D-70546 Stuttgart, Germany; paetzold@dbag.stg.daimlerbenz.com.

Christian Wöhler is working on his PhD thesis at the Daimler-Benz Research Center, Ulm. His research interests include object recognition and motion analysis on image sequences by time-delay neural networks. He studied physics at the universities of Würzburg, Germany, and Grenoble, France, and received his Diploma in physics from the University of Würzburg. Contact him at Daimler-Benz Research, T728, D-70546 Stuttgart, Germany; woehler@dbag.ulm.daimlerbenz.com.