A Real-Time Vehicle Detection and Tracking System in Outdoor Traffic Scenes


Xin Li¹, XiaoCao Yao¹, Yi L. Murphey¹, Robert Karlsen², Grant Gerhart²

¹ Department of Electrical and Computer Engineering, The University of Michigan-Dearborn,
Dearborn, Michigan 48128-1491, U.S.A.
Voice: 313-593-5028, Fax: 313-593-9967, yilu@umich.edu

² U.S. Army TARDEC, Warren, MI 48397-5000

Abstract

This paper presents a moving vehicle detection and tracking system, MVDT, for real-time operation in outdoor scenes. MVDT consists of three major components: road detection, vehicle detection, and vehicle tracking. The road detection algorithm utilizes a plane-fitting feature, the vehicle detection uses both segmented blob and Snakes blob features in a neural network classifier, and a fast vehicle tracking algorithm is designed to locate vehicles in consecutive image frames. We show through experiments that MVDT is effective in detecting moving vehicles in various outdoor scenes and can indeed reach real-time operation requirements.

1. Introduction

ITSs (Intelligent Transportation Systems) have been an active research area for the last decade. Moving vehicle detection is an important component of an ITS: it can be used in driver assistance systems and Adaptive Cruise Control (ACC) to warn drivers of potential collisions. It is therefore an important technology for improving traffic safety and transportation efficiency.

Our research is focused on developing a real-time moving vehicle detection system for operation in outdoor scenes. Several facts make this task rather challenging:
• Real-time performance requires efficient algorithms.
• Cameras mounted in a moving vehicle are not static, so most methods used by surveillance systems cannot be applied here.
• Scenes are dynamic, so less prior knowledge can be utilized.
• Illumination and contrast may change, even in the same scene at different times.
• Objects' shapes and colors have broad ranges.

Vision-based systems developed for this task can be categorized into two classes: optical flow based techniques and non-monocular image processing based techniques. Examples of research that uses optical flow techniques for obstacle detection and tracking can be found in the ROMA system [1] and ASSET-2 (A Scene Segmenter Establishing Tracking v2) [2]. However, optical flow based methods fail when the relative motion between objects and cameras is small. Non-monocular image processing methods can overcome this drawback. Examples of stereo vision based ITS research can be found in the UTA (Urban Traffic Assistance) project by DaimlerChrysler [3] and the ARGO prototype vehicle [4].

We have developed a real-time computer vision system, MVDT, which uses both stereo vision imagery and motion cues for moving vehicle detection and tracking. Figure 1 illustrates the computational steps in MVDT. MVDT contains two major components: road detection, and vehicle detection and tracking.

Figure 1: System architecture of MVDT.

The road detection algorithm (Road Model) is designed to detect either well-structured or unstructured roads using grayscale images. The objective of Road Model is to detect and remove road regions in an image in real time, so that later processes can focus on the ROI (Region Of Interest). The input is the depth image of the scene calculated from the stereo camera system. The segmentation algorithm extracts the regions that are likely to be vehicles, and the feature extraction algorithm calculates vehicle features to be used by the neural network system for vehicle classification. A fast vehicle tracking algorithm is presented to locate vehicles in consecutive image frames. After tracking K frames, the MVDT system returns to the Road Model, followed by vehicle detection. The combination of vehicle detection and tracking makes real-time operation possible. Experiments have been conducted on image sequences captured on highways and rural roads, and the results are presented.



2. Road detection

We have developed a road detection algorithm that attempts to identify regions of roads in a disparity image generated from stereo imagery and then segment road areas from other parts of the scene. The roads can be either structured or non-structured. The algorithm is designed based on the following four assumptions:
• There is only one connected road area in each image.
• Roads always appear in the lower part of images.
• Pixel values within a small area of the road do not change much.
• Road areas are reasonably flat.
Figure 2. Illustration of plane fitting in image grids: an X*Y image is divided into grids G_{0,0} ... G_{H-1,W-1}, each of M*N pixels; P(x, y) is a pixel within grid G_{r,c}.
The road detection algorithm utilizes road features called plane fitting errors, which are obtained as follows. An input image of size X*Y is uniformly divided into grids, where each grid contains M*N pixels (see Figure 2). The plane fitting error is calculated for each grid by the following procedure:
[Step 1] Each grid G_{r,c} (see Figure 2) within a road region can be modeled by a linear function:

P(x, y) = A*x + B*y + C    (1)

where (x, y) are the coordinates of a pixel within this grid, P(x, y) is the disparity value of the corresponding pixel, and A, B and C are the coefficients to be determined.
[Step 2] For each image grid, fit all M*N points to the function in Eq. (1) and obtain the coefficients (A, B, C) for grid G_{r,c} by using a least squares fitting method.
[Step 3] The plane fitting error Err_{r,c} of grid G_{r,c} is calculated by:

Err_{r,c} = \sum_{x=x'}^{x'+M-1} \sum_{y=y'}^{y'+N-1} (P(x, y) - PlaneFitting(x, y))^2

where (x', y') are the coordinates of the bottom-left pixel within grid G_{r,c}, and PlaneFitting(x, y) is the fitted value obtained by

PlaneFitting(x, y) = A*x + B*y + C
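For concreteness, the following Python/NumPy sketch implements Steps 1-3 per grid; the function names, the grid-iteration helper, and the default grid size are ours, not from the paper:

```python
import numpy as np

def plane_fit_error(grid):
    """Least-squares fit of P(x, y) = A*x + B*y + C over one grid of
    disparity values; returns (A, B, C) and the fitting error
    Err = sum((P - PlaneFitting)^2) over all pixels in the grid."""
    ny, nx = grid.shape
    ys, xs = np.mgrid[0:ny, 0:nx]
    # Design matrix with one row [x, y, 1] per pixel.
    A_mat = np.column_stack([xs.ravel(), ys.ravel(), np.ones(nx * ny)])
    coeffs, *_ = np.linalg.lstsq(A_mat, grid.ravel(), rcond=None)
    fitted = A_mat @ coeffs                      # PlaneFitting(x, y)
    err = float(np.sum((grid.ravel() - fitted) ** 2))
    return coeffs, err                           # (A, B, C), Err_{r,c}

def grid_errors(disparity, M=16, N=16):
    """Divide an X*Y disparity image into M*N-pixel grids and compute
    the plane fitting error of each grid (M = N = 16 is our choice)."""
    Y, X = disparity.shape
    errs = np.empty((Y // N, X // M))
    for r in range(Y // N):
        for c in range(X // M):
            block = disparity[r * N:(r + 1) * N, c * M:(c + 1) * M]
            _, errs[r, c] = plane_fit_error(block)
    return errs
```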
Based on the plane fitting error at each grid (r, c), we develop the following road detection algorithm.

The road detection algorithm, Road Model, uses grids as units and searches for road areas from lower rows to higher rows. At each row, it first finds a grid that is likely to be within the road region and uses it as a "seed". A grid is considered a seed if its plane fitting error is within the road range [T1, T2]. T2 is set to make sure the grid has a small plane fitting error, and T1 is used to eliminate totally dark areas, which often occur under trees, or totally bright saturated areas. The algorithm then searches leftward from the "seed", and then rightward, to grow the road region. A grid is considered as belonging to the road region if its plane fitting error is within the road range, or if it is close to an adjacent grid that is considered a road grid. The procedure continues to grow in this way until no more road grids are found. Figure 3 shows three examples of roads found by Road Model. The grids are the road region found by Road Model, while the white grids are the seed grids. Road Model is successful in detecting road regions in front of the prime vehicle. Note that in Fig. 3(b) Road Model stopped at the lane lines, which is not our concern within this project.

Figure 3. Road regions detected by Road Model. (a) is a freeway, (b) is a city street, and (c) is a dirt road.
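A sketch of the seed-and-grow search over the grid errors from grid_errors above. The paper does not give the exact "close to an adjacent road grid" test or the seed-selection rule, so the tolerance T3 and the center-most-seed choice below are our assumptions:

```python
import numpy as np

def detect_road(errs, T1, T2, T3):
    """Row-wise seed-and-grow road detection over grid plane-fitting
    errors. A seed grid satisfies T1 <= Err <= T2; growth accepts a
    grid whose error is in range or, under our reading of the paper's
    adjacency test, within tolerance T3 of its accepted neighbor."""
    rows, cols = errs.shape
    road = np.zeros((rows, cols), dtype=bool)
    for r in range(rows - 1, -1, -1):        # bottom image rows first
        seeds = [c for c in range(cols) if T1 <= errs[r, c] <= T2]
        if not seeds:
            break                            # no more road grids found
        s = min(seeds, key=lambda c: abs(c - cols // 2))  # center-most seed (our choice)
        road[r, s] = True
        for step in (-1, 1):                 # grow leftward, then rightward
            c = s + step
            while 0 <= c < cols:
                in_range = T1 <= errs[r, c] <= T2
                near_road = abs(errs[r, c] - errs[r, c - step]) <= T3
                if not (in_range or near_road):
                    break
                road[r, c] = True
                c += step
    return road
```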


3. Vehicle detection and tracking

Vehicle detection consists of the following processes: image segmentation; vehicle detection, which consists of feature extraction and neural network classification; and vehicle tracking.

3.1. Image segmentation

The image segmentation algorithm uses the depth image to segment possible vehicle regions. A depth image provides distance information of objects in a scene. The assumption used by the segmentation algorithm is that distance values within an object should be similar. Based on this assumption, we expect the first derivative of depth values within a given object to be small, but the derivatives at object boundaries to be more significant.

The formulas we use to calculate the first derivative in the x and y directions are as follows:

Der_X(x, y) = (Depth(x + h, y) - Depth(x - h, y)) / (2h)
Der_Y(x, y) = (Depth(x, y + h) - Depth(x, y - h)) / (2h)

where Depth(x, y) is pixel (x, y)'s value in the depth image, and h ∈ {1, 2, 3, ...}.
The segmentation algorithm is outlined as follows:
INPUT: A depth image.
[Step 1] Eliminate the road area marked by Road Model.
[Step 2] Eliminate pixels with large distance values.
[Step 3] For each pixel (x, y):
    if ( !( Der_X(x, y) == 0 && Der_Y(x, y) == 0 ) )
        Seg_depth(x, y) = 0;
    else
        Seg_depth(x, y) = Depth(x, y);
[Step 4] Apply the connected component algorithm to Seg_depth to obtain object regions.
OUTPUT: A segmented depth image, Seg_depth.
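A runnable sketch of Steps 1-4 together with the central-difference derivatives defined above, in Python with NumPy and SciPy; the distance threshold max_dist, the spacing h = 1, and the use of SciPy's connected-component labeling are our choices:

```python
import numpy as np
from scipy import ndimage

def segment_depth(depth, road_mask, max_dist, h=1):
    """Steps 1-4 of the segmentation algorithm. road_mask is the pixel
    mask of the road area found by Road Model; max_dist and h = 1 are
    our parameter choices."""
    seg = depth.astype(float).copy()
    seg[road_mask] = 0                        # Step 1: remove road area
    seg[depth > max_dist] = 0                 # Step 2: drop distant pixels
    # Central-difference derivatives Der_X and Der_Y with spacing h.
    der_x = np.zeros_like(seg)
    der_y = np.zeros_like(seg)
    der_x[:, h:-h] = (depth[:, 2 * h:] - depth[:, :-2 * h]) / (2 * h)
    der_y[h:-h, :] = (depth[2 * h:, :] - depth[:-2 * h, :]) / (2 * h)
    # Step 3: keep a pixel only where both derivatives vanish, i.e. in
    # the flat interior of an object; zero it elsewhere.
    seg[(der_x != 0) | (der_y != 0)] = 0
    # Step 4: connected components over the surviving pixels.
    labels, n_blobs = ndimage.label(seg > 0)
    return seg, labels, n_blobs
```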

Figure 4 shows an example generated by the segmentation algorithm. The left image is the depth image of the same scene as Figure 3(a), and the image on the right is the Seg_depth image. The segmented objects are shown in Figure 6 (a) and (c).

Figure 4. An example of depth image segmentation. (a) is the original depth image, and (b) is the segmented depth image, Seg_depth.

3.2. Vehicle detection using a neural network

Vehicle detection is performed by a neural network classifier. We use a feed-forward network with one hidden layer and backpropagation as the learning algorithm. The critical part of a neural network classifier implementation is to specify effective object features, which are used as input feature vectors to the neural network. We define 17 features for each region: (Aspect_Ratio, Occupancy_Ratio, Depth_Std_Deviation, 7 Blob_Moments, 7 Snakes_Moments), where

Aspect_Ratio = Y / X,  Occupancy_Ratio = R / (X * Y)

X, Y and R are shown in Figure 5, and Depth_Std_Deviation is the standard deviation of depth values within a segmented blob.

Both Blob_Moments and Snakes_Moments are the invariant moment features [5], calculated from segmented blobs and Snakes blobs respectively. The Snakes blobs are generated by using a circle centered on the bounding box of the segmented blob as the initial shape (see Figure 5), and then applying the Snakes algorithm described in [6] to the segmented blob.
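A sketch of assembling the 17-element feature vector for one region; OpenCV is our stand-in for computing Hu's seven invariant moments [5], and the Snakes blob mask is assumed to have been produced elsewhere by an implementation of [6]:

```python
import numpy as np
import cv2  # used here only for Hu's seven invariant moments

def blob_features(blob_mask, snake_mask, depth):
    """Build the 17-element feature vector for one segmented region.
    blob_mask / snake_mask are binary masks of the segmented blob and
    its Snakes blob; depth is the depth image."""
    ys, xs = np.nonzero(blob_mask)
    X = xs.max() - xs.min() + 1              # bounding-box width
    Y = ys.max() - ys.min() + 1              # bounding-box height
    R = len(xs)                              # area of the segmented region
    aspect_ratio = Y / X
    occupancy_ratio = R / (X * Y)
    depth_std = depth[blob_mask > 0].std()   # Depth_Std_Deviation
    hu_blob = cv2.HuMoments(
        cv2.moments(blob_mask.astype(np.uint8), binaryImage=True)).ravel()
    hu_snake = cv2.HuMoments(
        cv2.moments(snake_mask.astype(np.uint8), binaryImage=True)).ravel()
    return np.concatenate(([aspect_ratio, occupancy_ratio, depth_std],
                           hu_blob, hu_snake))   # 3 + 7 + 7 = 17 features
```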
The reason for using both the segmented and the Snakes blobs is that Snakes blobs can be quite different from segmented blobs, as illustrated in Figure 6. The Snakes blobs usually provide smooth boundaries and more accurate shape descriptions (see Figure 6 (a) and (b)). However, in some cases the Snakes blobs can miss the fine details of object shapes, as shown in Figure 6 (c) and (d). The blob in Figure 6(c) is clearly not a vehicle, but the Snakes blob shown in 6(d), which is generated from 6(c), looks more like a vehicle blob, which is not desirable.

Figure 5. Illustration of the initial shape for generating Snakes blobs: a circle centered on the bounding box (of width X and height Y) of the segmented image region R.

Figure 6. (a) Segmented car blob; (b) the Snakes blob of (a); (c) segmented non-car blob; (d) the Snakes blob of (c).

3.3. Vehicle tracking

The motivation of vehicle tracking is to bypass the relatively time-consuming operation of vehicle detection on certain image frames. We have developed the following tracking algorithm. Assume that the vehicle detection system detects O_t vehicle blobs in image frame F_t. At frame F_{t+1}, we only track these O_t vehicle blobs. In order to make the tracking efficient, we define an N*N match window within each vehicle blob V_i and a search window S. Let (x_i, y_i) be the center of V_i; then the location of V_i in F_{t+1} is found by calculating the following least Mean Absolute Error (MAE) at every point (i, j) within the search window S in image frame F_{t+1}:


MAE(i, j) = (1 / N^2) \sum_{p=0}^{N-1} \sum_{q=0}^{N-1} | F_t(x_i + p, y_i + q) - F_{t+1}(x_i + i + p, y_i + j + q) |

The location of V_i is at (x_i + i', y_i + j') if MAE(i', j') is the minimum among all points within the search window S in image frame F_{t+1}.
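A minimal sketch of this block-matching step, following the MAE formula literally; treating S as a search radius in offsets (i, j) and skipping windows that fall outside the frame are our choices:

```python
import numpy as np

def track_blob(Ft, Ft1, xi, yi, N, S):
    """Locate blob Vi (at (xi, yi) in frame Ft) in frame Ft+1 by
    minimizing the mean absolute error over an N*N match window,
    searching displacements (i, j) in [-S, S]."""
    tpl = Ft[yi:yi + N, xi:xi + N].astype(float)   # N*N window in frame t
    best, best_off = np.inf, (0, 0)
    for j in range(-S, S + 1):
        for i in range(-S, S + 1):
            cand = Ft1[yi + j:yi + j + N, xi + i:xi + i + N].astype(float)
            if cand.shape != tpl.shape:            # window leaves the frame
                continue
            mae = np.abs(tpl - cand).sum() / (N * N)
            if mae < best:
                best, best_off = mae, (i, j)
    return xi + best_off[0], yi + best_off[1]      # (xi + i', yi + j')
```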
The tracking operation can be carried out in the next K frames after the t-th frame, K ≥ 1. After tracking through K frames, the Road Model and vehicle detection system can be applied to image frame t + K + 1 again. The cycle of detection and tracking can be repeated throughout the entire image sequence. However, in order to avoid missing new vehicles entering the scene, the tracking should be applied sparingly and K should be small.
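The detection/tracking cycle can be expressed as a short scheduling loop; detect and track below are caller-supplied stand-ins for the MVDT stages described above, not functions from the paper:

```python
def run_mvdt(frames, detect, track, K=1):
    """Alternate full detection (Road Model + segmentation + NN
    classification) with K frames of MAE tracking."""
    blobs = []
    for t, frame in enumerate(frames):
        if t % (K + 1) == 0:
            blobs = detect(frame)                  # frames t, t+K+1, ...
        else:
            blobs = [track(frame, b) for b in blobs]  # K tracked frames
        yield blobs
```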
4. Experiments

We tested the MVDT system on image sequences captured in various outdoor scenes. The neural network system has an architecture of 17 inputs, 5 hidden nodes and 1 output. The image sequences for training and testing were captured at a rate of 24 fps. The training data consist of 263 car blobs and 1470 non-car blobs, segmented from image sequences taken on highways and city streets. In the testing data, there are 84 car blobs and 519 non-car blobs. The test data is totally blind, since its blobs were generated from two image sequences taken at different times and on different roads from the training data.
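For illustration, the 17-5-1 feed-forward network trained with backpropagation could be set up as below; scikit-learn's MLP is our stand-in for the paper's network, and the random arrays are placeholders for the real car/non-car feature vectors:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# Placeholder data standing in for the 263 car and 1470 non-car
# training blobs; each row is a 17-element feature vector.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(1733, 17))
y_train = rng.integers(0, 2, size=1733)

# Feed-forward network with 17 inputs, one hidden layer of 5 nodes,
# and 1 output, trained by backpropagation (stochastic gradient descent).
clf = MLPClassifier(hidden_layer_sizes=(5,), activation='logistic',
                    solver='sgd', max_iter=2000)
clf.fit(X_train, y_train)
print(clf.predict(X_train[:5]))   # 1 = car, 0 = non-car
```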
Table 1 shows the system performance on the training and test data. It is encouraging to see that the system performance on the blind test data is very close to its performance on the training data. Figure 7 shows the vehicles detected by MVDT on two images in the test sequences. We measured the computational time of MVDT running on a Pentium M 1.3 GHz with 512 MB of memory under MS Windows XP; the experiment showed that MVDT reached about 14 frames per second when the tracking program was called at every 4th frame with K = 1.

Figure 7. Two examples of vehicles detected by the MVDT system from the test image sequences.
We have presented a vehicle detection and tracking system, MVDT, and the major algorithms deployed by MVDT. We showed through experiments that MVDT is effective in detecting moving vehicles in various outdoor scenes and can indeed reach real-time operation requirements.

Table 1. Performance of the MVDT system

           Non-Car   Car     Correct   Error   Undecided
Training   91.8%     98.9%   95.44%    4.43%   0.13%
Testing    94.0%     97.8%   94.5%     5.5%    0.0%

5. Acknowledgment

This work is supported in part by a grant from the Education Foundation of TRW Inc., and a DoD SBIR contract.

6. References

[1] W. Kruger, W. Enkelmann, S. Rossle, "Real-time estimation and tracking of optical flow vectors for obstacle detection", Proceedings of the IEEE Intelligent Vehicles Symposium '98, Stuttgart, Germany, Oct. 1998, pp. 341-346.

[2] S.M. Smith, J.M. Brady, "ASSET-2: Real-time motion segmentation and shape tracking", IEEE Transactions on Pattern Analysis and Machine Intelligence, 17(8), 1995, pp. 814-829.

[3] U. Franke, A. Joos, "Real-time Stereo Vision for Urban Traffic Scene Understanding", Proceedings of the IEEE Intelligent Vehicles Symposium 2000, Dearborn, MI, USA, Oct. 3-5, 2000, pp. 273-278.

[4] A. Broggi, M. Bertozzi, G. Conte, and A. Fascioli, "ARGO Prototype Vehicle", in L. Vlacic, F. Harashima, and M. Parent, editors, Intelligent Vehicle Technologies, chapter 14, pp. 445-493, Butterworth-Heinemann, London, UK, Jun. 2001.

[5] M.K. Hu, "Visual pattern recognition by moment invariants", IEEE Trans. Inform. Theory, 8, 1962, pp. 179-187.

[6] D. Williams, M. Shah, "A Fast Algorithm for Active Contours and Curvature Estimation", Computer Vision, Graphics and Image Processing, Vol. 55, No. 1, Jan. 1992, pp. 14-26.
