
A Novel Algorithm for Estimating Vehicle Speed from Two Consecutive Images

Xiao Chen He and Nelson H. C. Yung


Department of Electrical and Electronic Engineering
The University of Hong Kong, Pokfulam Road, Hong Kong
{xche, nyung}@eee.hku.hk

Abstract

In this paper, we present a new algorithm for estimating individual vehicle speed based on two consecutive images captured from a traffic safety camera system. Its principles are: first, both images are transformed from the image plane to 3D world coordinates based on the calibrated camera parameters. Second, the difference of the two transformed images is calculated, so that the background is eliminated and the vehicles in the two images are mapped onto one image. Finally, a block feature of the vehicle closest to the ground is matched to estimate the vehicle's travel distance and speed. Experimental results show that the proposed method exhibits good and consistent performance. When compared with speed measurements obtained from a speed radar, the averaged estimation errors are 3.27% and 8.51% for day-time and night-time test examples respectively, which are better than previously published results. The proposed algorithm can be easily extended to work on image sequences.

1. Introduction

Vehicle speed is a fundamental traffic variable that is essential to both macroscopic and microscopic traffic analysis [1]. The current state-of-the-art in speed measurement technologies includes magnetic inductive loop detectors, magnetic strips, Doppler radar, infrared sensors and laser sensors [1], [2], [3]. Inductive loop detectors and strips are considered intrusive sensors that need to be installed at a fixed location in the road, whereas infrared, radar and laser sensors are non-intrusive sensors that do not require digging up the road. As many of these speed measurement technologies also use digital image sequences to record the event, estimating vehicle speed from the image sequence presents a natural extension of the existing methods, and also serves to verify the accuracy of the measurement technology concerned. As a result, several image-based vehicle speed estimation techniques have been proposed in the past. Some of them estimate speed by placing reference geometrical shapes either on the road physically or on the screen image, and then estimating the travel times of vehicles between the referenced shapes, either through a perspective image view or a view perpendicular to the road plane [4], [5], [6]. Others estimate pixel speed and convert it to ground-truth speed by a geometrical transformation [7]. Dailey & Li [8] proposed an algorithm based on uncalibrated cameras and a known vehicle length distribution; obviously, a large estimation error may result when the distribution is not representative. Cathey & Dailey [9], [10] used a one-parameter camera calibration model to 'straighten' images and to compute an image-to-roadway scale factor, from which a cross-correlation algorithm is applied to the straightened image to measure travel distance and estimate average traffic speed. Schoepflin & Dailey [11], [12] proposed a calibration method based on a known distance perpendicular to the road and a cross-correlation technique to estimate the mean speed after tracking a group of vehicles in a lane. Pai et al. [13] proposed an integrated method for mean traffic speed estimation, in which a block matching algorithm is used to estimate the moving distance of a vehicle between two consecutive frames; they also proposed a fast searching algorithm, called adaptive windowing prediction, based on the vehicle speed determined in previous frames, to improve real-time computational efficiency.

In summary, our survey reveals that most vision-based speed estimation methods estimate average traffic speed over a period of time with an error rate of over 10% compared with the reference value. Such an error rate is considered large for any practical use.

Figure 1. Diagram of the proposed algorithm

Another limitation is that these methods are usually based on a sequence of video images, for updating the background reference [14] or computing forward and backward differences between frames for motion tracking [15]. The errors due to day-night transitions or general weather changes can be large unless the updating is frequent enough, which has to be traded off against computational complexity. In fact, in most road safety systems, vehicle speed is monitored by radar or other non-visual sensors, and two high-quality consecutive images are then captured when speeding is detected. In these systems, although traffic video may also be available, it is of low resolution, so an automatic vehicle speed estimation method based on the two high-quality consecutive images is a logical choice. Moreover, if speed can be estimated from two images, it is trivial to do the same from an image sequence. For these reasons, we propose a new two-image vehicle speed estimation algorithm in this paper, and compare it against the speed measured by a Doppler speed radar system operating at 34.3 GHz as a reference. Compared with video-based algorithms, the proposed algorithm is fast, needs less storage, and is reasonably accurate.

This paper is organized as follows: the detailed approach is presented in Section 2, experimental results are presented in Section 3, and conclusions are drawn in Section 4.

2. Proposed Algorithm

2.1. General principle

The block diagram of the proposed algorithm is depicted in Fig. 1. To begin with, the two consecutive images, taken through a perspective view, are transformed from their 2D image plane to 3D world coordinates using camera parameters calculated from lane-marking calibration [16]. As no height information is immediately available from the two images, only the X-Y plane in 3D is considered for all subsequent estimation. Image differencing is then applied to the two reconstructed images so that the two vehicles are overlaid on one resultant image. Since the time difference between the two images is small, this effectively removes the background features, but it also results in a large amount of noise and holes along the boundaries of the vehicles. These are removed by a carefully selected threshold and morphological operations to obtain a combined vehicle mask. After that, a block matching algorithm is applied, which includes specifying a block and matching it within a search window. As the 3D X-Y plane is given in physical dimensions (meters in this case), the distance between the two matched blocks denotes the distance traveled by the vehicle between the two consecutive images. Speed is then estimated by dividing this value by the time elapsed between the two images.

2.2. 2D to 3D transformation

The purpose of camera calibration is to establish a relationship between the 3-D world coordinates and the corresponding 2-D image coordinates. In this paper, we adopt the camera model proposed in [16] and [17], which is depicted in Fig. 2. In essence, this model describes the extrinsic parameters (camera height h, pan angle p, tilt angle t and swing angle s) and an intrinsic parameter (the focal length f). The definitions of these parameters are given below.

Figure 2. Perspective camera model.

Figure 3. Transform of the 2-D image to the X-Y plane in 3-D. (a) Original 1st image; (b) original 2nd image; (c)-(d) reconstructed images of (a) and (b) on the X-Y plane respectively.
Pan angle is the horizontal angle from the Y axis of the world coordinate system to the projection of the optical axis of the camera on the X-Y plane, measured in the anti-clockwise direction. Tilt angle is the vertical angle of the optical axis of the camera with respect to the X-Y plane of the world coordinate system. Swing angle is the rotation angle of the camera about its optical axis. Focal length is the distance from the image plane to the center of the camera lens along the optical axis of the camera, while camera height is the perpendicular distance from the center of the camera lens to the X-Y plane. These parameters can be estimated using existing road lane markings, without having to measure them physically [16]. Without detailed derivation, the mapping between the image plane and the world plane is quoted below.

Let Q = (X_Q, Y_Q, Z_Q) be an arbitrary point in the 3-D world coordinates and q = (x_q, y_q) be the corresponding 2-D image coordinates of Q. The forward mapping, which transforms a point in the world coordinates to a point in the image coordinates, is given by

x_q = \frac{f\,[X_Q(\cos p \cos s + \sin p \sin t \sin s) + Y_Q(\sin p \cos s - \cos p \sin t \sin s) + Z_Q \cos t \sin s]}{-X_Q \sin p \cos t + Y_Q \cos p \cos t + Z_Q \sin t + h/\sin t}, \quad (1)

y_q = \frac{f\,[X_Q(-\cos p \sin s + \sin p \sin t \cos s) + Y_Q(-\sin p \sin s - \cos p \sin t \cos s) + Z_Q \cos t \cos s]}{-X_Q \sin p \cos t + Y_Q \cos p \cos t + Z_Q \sin t + h/\sin t}. \quad (2)

If point Q lies on the X-Y plane, Z_Q is zero, and X_Q and Y_Q can be calculated from (x_q, y_q) as

X_Q = \frac{h\,[\sin p\,(x_q \sin s + y_q \cos s)/\sin t + \cos p\,(x_q \cos s - y_q \sin s)]}{x_q \cos t \sin s + y_q \cos t \cos s + f \sin t}, \quad (3)

Y_Q = \frac{h\,[-\cos p\,(x_q \sin s + y_q \cos s)/\sin t + \sin p\,(x_q \cos s - y_q \sin s)]}{x_q \cos t \sin s + y_q \cos t \cos s + f \sin t}. \quad (4)

Equations (1)-(4) define the transformation between the world and image coordinates.
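As a concrete illustration of Equations (3) and (4), the short sketch below maps an image point onto the road plane. It is a minimal sketch only: the calibration values in the example call are hypothetical placeholders, angles are assumed to be in radians, x_q, y_q and f are assumed to be in pixels, and h in meters.

```python
import math

def image_to_ground(xq, yq, f, h, p, t, s):
    """Map an image-plane point (xq, yq) to a road-plane point (XQ, YQ), Z = 0,
    using Equations (3) and (4). xq, yq, f are in pixels, h in meters, and the
    pan p, tilt t and swing s angles in radians; the result is in meters."""
    num_x = h * (math.sin(p) * (xq * math.sin(s) + yq * math.cos(s)) / math.sin(t)
                 + math.cos(p) * (xq * math.cos(s) - yq * math.sin(s)))
    num_y = h * (-math.cos(p) * (xq * math.sin(s) + yq * math.cos(s)) / math.sin(t)
                 + math.sin(p) * (xq * math.cos(s) - yq * math.sin(s)))
    den = xq * math.cos(t) * math.sin(s) + yq * math.cos(t) * math.cos(s) + f * math.sin(t)
    return num_x / den, num_y / den

# Hypothetical calibration values, for illustration only.
XQ, YQ = image_to_ground(xq=120.0, yq=-80.0, f=1000.0, h=8.0,
                         p=math.radians(5), t=math.radians(30), s=math.radians(1))
print(XQ, YQ)
```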
Given an image as depicted in Fig. 3(a), lane markings that form a parallelogram [16] can be used to estimate the camera parameters, from which Equations (1)-(4) can be solved. In other words, every pixel of the two consecutive images can be transformed to 3D coordinates (with Z = 0) using Equations (3) and (4), and the corresponding images in the X-Y plane can be reconstructed, as depicted in Fig. 3(c) and 3(d).
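As an illustration of this reconstruction step, the sketch below inverse-warps an image onto a regular grid on the road plane using the forward mapping of Equations (1) and (2) with Z_Q = 0, so that each ground-plane cell samples its source pixel. It assumes the image origin lies at the principal point (taken here as the image center) with the y axis pointing up; the grid extent, resolution and calibration values shown are placeholders rather than the paper's settings.

```python
import numpy as np
import cv2

def world_to_image(X, Y, f, h, p, t, s):
    """Forward mapping of Equations (1)-(2) for road-plane points (Z = 0).
    X, Y may be NumPy arrays in meters; returns image-plane coordinates in pixels."""
    sp, cp, st, ct, ss, cs = np.sin(p), np.cos(p), np.sin(t), np.cos(t), np.sin(s), np.cos(s)
    den = -X * sp * ct + Y * cp * ct + h / st
    xq = f * (X * (cp * cs + sp * st * ss) + Y * (sp * cs - cp * st * ss)) / den
    yq = f * (X * (-cp * ss + sp * st * cs) + Y * (-sp * ss - cp * st * cs)) / den
    return xq, yq

def reconstruct_topdown(img, f, h, p, t, s, x_range, y_range, res):
    """Resample `img` onto the X-Y road plane at `res` pixels per meter."""
    xs = np.arange(x_range[0], x_range[1], 1.0 / res)
    ys = np.arange(y_range[0], y_range[1], 1.0 / res)
    X, Y = np.meshgrid(xs, ys)
    xq, yq = world_to_image(X, Y, f, h, p, t, s)
    # Assumed shift from image-plane coordinates (origin at image center, y up)
    # to pixel indices (origin at top-left, y down).
    map_x = (xq + img.shape[1] / 2.0).astype(np.float32)
    map_y = (img.shape[0] / 2.0 - yq).astype(np.float32)
    return cv2.remap(img, map_x, map_y, cv2.INTER_LINEAR)

# Hypothetical usage: 10 px/m over a 20 m x 60 m patch of road ahead of the camera.
# img = cv2.imread("frame1.png")
# top1 = reconstruct_topdown(img, f=1000.0, h=8.0, p=np.radians(5),
#                            t=np.radians(30), s=np.radians(1),
#                            x_range=(-10, 10), y_range=(20, 80), res=10)
```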

Since both Fig. 3(c) and 3(d) represent the view along the Z-axis perpendicular to the road plane, the distance traversed by the vehicle from 3(c) to 3(d) represents the actual distance traveled. However, it should be noted that this is only true for points on the road plane. Points that are not on the road plane do not map to their true positions, as height information is not available. Therefore, the matching problem is not simply to find the most correlated blocks between the two vehicle images, as in [10] and [12], but also to make sure that the feature points in the block lie on the road surface. The most reliable points in these images for this purpose are the points where the wheels touch the ground. Unfortunately, these points are extremely difficult to determine. For this reason, our subsequent estimation is based on the shadow that the back bumper casts on the ground. Although this feature is relatively easy to find, it also represents a source of error due to road color changes, illumination changes and shadows.

2.3. Vehicle Mask Overlay

If a background reference image is available, we can easily extract the vehicle mask in each image and estimate the vehicle position directly [18], [19]. On the premise that no background reference image is available, we regard one of the two images as the background reference, perform conventional vehicle mask extraction on the other image, and finally obtain an overlaid vehicle mask. In this paper, a simple vehicle mask extraction method is adopted, which consists of three steps: temporal differencing, thresholding and morphological operations.

First, we overlay one image onto the other by taking the absolute difference between the two images, which results in the image depicted in Fig. 4(a). For color images, we obtain the absolute difference in each color channel first and then average the channels to a gray-level image. The image depicted in Fig. 4(a) is then thresholded to produce a binary image, as depicted in Fig. 4(b). The threshold, T, is a predefined value chosen to accommodate small pixel intensity variations. After that, morphological opening is used to clean up isolated points and morphological closing is used to join disconnected vehicle components, resulting in the extracted vehicle mask shown in Fig. 4(c).

Figure 4. Overlaid vehicle mask from image difference. (a) Absolute difference of Fig. 3(c) and (d); (b) binary image after thresholding; (c) extracted vehicle mask after morphological operations.
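A minimal OpenCV sketch of this three-step mask extraction is given below; the threshold of 30 follows the setting reported in Section 3, while the structuring-element size is an illustrative assumption.

```python
import cv2
import numpy as np

def extract_vehicle_mask(img1, img2, thresh=30, kernel_size=5):
    """Overlaid vehicle mask from two reconstructed (top-down) images:
    per-channel absolute difference averaged to gray, thresholding, then
    morphological opening (remove isolated points) and closing (join parts)."""
    diff = cv2.absdiff(img1, img2)                 # temporal differencing
    if diff.ndim == 3:
        diff = diff.mean(axis=2).astype(np.uint8)  # average color channels to gray
    _, binary = cv2.threshold(diff, thresh, 255, cv2.THRESH_BINARY)
    kernel = np.ones((kernel_size, kernel_size), np.uint8)
    opened = cv2.morphologyEx(binary, cv2.MORPH_OPEN, kernel)   # clean isolated points
    mask = cv2.morphologyEx(opened, cv2.MORPH_CLOSE, kernel)    # join vehicle components
    return mask

# Hypothetical usage with the two reconstructed road-plane images:
# mask = extract_vehicle_mask(top1, top2, thresh=30)
```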
In general, the case depicted in Fig. 4 represents the typical scenario. By computing the connected components in the binary image, we obtain two blobs that represent the masks of the two individual vehicles. From the distance between the bottom edges of the two blobs, the displacement of the vehicle between the two consecutive frames can be roughly estimated; this estimate is then refined by the block matching of Section 2.4. However, for vehicles that are traveling at low speed or are long, or when the two consecutive images are taken a shorter time apart, it is possible that the individual vehicle masks merge with each other, creating the exceptional case depicted in Fig. 5. Speed can still be estimated in this case, except that the search range for block matching has to be enlarged. The following sub-section discusses how the matching is performed and how the search range is enlarged in this situation.

Figure 5. Exceptional case in which individual vehicle masks merge together. (a) Reconstructed 1st image; (b) reconstructed 2nd image; (c) binary image after thresholding; (d) extracted vehicle mask after morphological operations.

2.4. Block Matching and Speed Estimation

In this section, the block matching algorithm (BMA) discussed in [20] is modified for our purpose. First, a source block is defined in terms of a horizontal base line, a fixed height and a variable width according to the size of the mask, as depicted in Fig. 6. The blob shown in Fig. 6 is part of the lower blob from Fig. 4(c), in which a horizontal base line is drawn across the lowest part of the blob. From the base line, an upper boundary k pixels above the line and a lower boundary also k pixels below the line are defined. As the upper boundary line intersects the vehicle mask, the left and right boundaries are defined by the width w of the mask intersected by the upper boundary line. This defines a source block of size 2k×w.

Figure 6. Definition of the source block.
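The source-block construction can be sketched as follows, assuming the binary mask passed in contains only the blob of interest; border handling and the fallback when the upper boundary row is empty are simplifying assumptions.

```python
import numpy as np

def source_block(mask, k=10):
    """Define the source block from a binary vehicle-mask blob.
    The base line is the lowest row containing mask pixels; the block spans
    k rows above and below it, and its width w is taken from the mask pixels
    on the upper boundary row. Returns (top, left, height, width)."""
    rows = np.where(mask.any(axis=1))[0]
    base = rows.max()                       # horizontal base line: lowest mask row
    upper = max(base - k, 0)                # upper boundary, k pixels above
    lower = min(base + k, mask.shape[0] - 1)
    cols = np.where(mask[upper] > 0)[0]     # mask intersected by the upper boundary
    if cols.size == 0:                      # fallback to the base row if empty
        cols = np.where(mask[base] > 0)[0]
    left, right = cols.min(), cols.max()
    return upper, left, lower - upper + 1, right - left + 1   # approx. 2k x w block
```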
With these boundaries, the block defined in the first reconstructed image serves as the source block, and the search is performed in the second reconstructed image. The mean absolute difference (MAD) between the source block and a candidate block is employed as the matching criterion. In the case of Fig. 4(c), we already have a rough estimate of the vehicle displacement, so the search can be limited to a search window S of size (2sx+2k)×(2sy+w) centered at the estimated displacement. sx is usually set to a small value, aiming to reduce the error due to the rough estimation of the bottom lines, whereas sy is determined by the degree of transverse vehicle displacement to be tolerated; if lane changes are allowed, sy would be set to the lane width. In the case where the two blobs are merged, as in Fig. 5(d), the displacement cannot be estimated beforehand; therefore, we have to search for the best matching block in a larger window of size [2sx+2k+(dmax−dmin)]×(2sy+w), where dmax and dmin are the maximum and minimum displacements respectively, derived from prior knowledge such as the theoretical maximum and minimum speeds of the vehicle. Considering that the illumination conditions across the two images may vary (due to flash, building shadows, etc.), and that lane markings may affect the matching, we normalize the blocks by enhancing their contrast and saturating their high-intensity parts, as shown in Fig. 7, before calculating the MAD.
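To make the enlarged window concrete, the fragment below derives dmax and dmin from assumed speed bounds; the bounds and the block width are hypothetical, while R and t follow the settings of Section 3.

```python
# Extra search length for the merged-mask case, derived from assumed speed bounds.
# R: reconstructed-image resolution (pixels per meter); dt: inter-image time (s).
R, dt = 10.0, 0.5                      # values used in Section 3
v_min, v_max = 30.0, 150.0             # assumed speed bounds in km/h (illustrative)
d_min = v_min / 3.6 * dt * R           # minimum displacement in pixels
d_max = v_max / 3.6 * dt * R           # maximum displacement in pixels
k, sx, sy, w = 10, 20, 10, 60          # w is the measured block width (hypothetical)
window_h = 2 * sx + 2 * k + (d_max - d_min)   # enlarged window along the travel direction
window_w = 2 * sy + w
print(round(d_min), round(d_max), round(window_h), window_w)
```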

Figure 7. Travel distance estimation.

Figure 8. Estimated speed compared with radar-measured speed. (a) Test examples in daytime; (b) test examples in nighttime.
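A bare-bones version of the MAD search described in this sub-section might look like the following sketch; block normalization (contrast enhancement and saturation of high intensities) is omitted, and the border handling is a simplifying assumption.

```python
import numpy as np

def mad_match(src_block, img2, top, left, sx=20, sy=10, est_dy=0, est_dx=0):
    """Find the displacement (dy, dx) of the source block in the second
    reconstructed image by minimizing the mean absolute difference (MAD)
    inside a window centered at the roughly estimated displacement."""
    bh, bw = src_block.shape
    best, best_d = np.inf, (0, 0)
    for dy in range(est_dy - sx, est_dy + sx + 1):
        for dx in range(est_dx - sy, est_dx + sy + 1):
            r, c = top + dy, left + dx
            if r < 0 or c < 0 or r + bh > img2.shape[0] or c + bw > img2.shape[1]:
                continue                   # skip candidates falling outside the image
            cand = img2[r:r + bh, c:c + bw]
            mad = np.mean(np.abs(cand.astype(np.float32) - src_block.astype(np.float32)))
            if mad < best:
                best, best_d = mad, (dy, dx)
    return best_d, best

# The pixel displacement PD is then the length of the matched offset, e.g.:
# (dy, dx), _ = mad_match(block, top2_gray, top, left, est_dy=round((d_min + d_max) / 2))
# PD = (dy**2 + dx**2) ** 0.5
```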
Once a block is matched, the distance traveled in pixels, PD, by the vehicle between the two images is given by the distance between the source block and the matched block. The vehicle speed is this distance normalized to a physical distance in meters, divided by the time taken to traverse it, which is given by

v = \frac{P_D / R}{t},

where R is the resolution of the reconstructed images, t is the time interval between the two images, and v is the estimated speed.
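For instance, with the settings reported in Section 3 (R = 10 pixels/meter and t = 0.5 s), a hypothetical matched displacement of P_D = 139 pixels would give v = (139/10)/0.5 = 27.8 m/s, i.e. roughly 100 km/h.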
3. Experimental results and analysis

In this experiment, we evaluate the accuracy of the proposed method in estimating individual vehicle speed from real traffic images in both day-time and night-time, by comparing the estimated values with the speed measured by a Doppler speed radar system. All the test images were taken by a surveillance camera 0.5 seconds apart, at a size of 1280×1024 pixels with 24-bit color depth. We first compute the reconstructed images as described in Section 2.2 at a resolution of 10 pixels/meter, and then threshold the 256-gray-level frame difference with a threshold of 30 to generate the vehicle mask. k, sx and sy in the block matching stage were set to 10, 20 and 10 pixels respectively.

Figure 9. Typical test examples. (a) High-speed daytime case; (b) low-speed daytime case; (c) night-time case.

For the 31 sets of day-time images, the speeds estimated by the proposed method and the reference speeds from radar are plotted in Fig. 8(a). The reference speeds extend from 41 to 121 km/h. It can be seen that the estimation error in each case ranges from 0 to 6.72 km/h. The average error rate is 3.27%, with the higher estimation error rates occurring at low speed. Typical daytime examples are depicted in Fig. 9, where Fig. 9(a) depicts a high-speed case and Fig. 9(b) a low-speed case. In general, the location of the base line appears to be accurate and the matching of the source block is correct.

For the 33 sets of night-time images, the results are plotted in Fig. 8(b). Again, the reference speeds range from 41 to 122 km/h, and it can be seen that the estimation error is larger than in the day-time cases, with the absolute error of each case varying from 0.36 to 14.44 km/h, giving an average error of 8.51% compared with the speed from radar. It should be noted that for night-time images the bumper shadows are difficult to extract correctly in most cases, since their color or intensity is very similar to that of the road. This is illustrated by a typical night-time example in Fig. 9(c). In this example, the source block is found to be one corner of the bumper, and the matching is correctly performed on the 2nd image. However, as the bumper is above the ground, its height introduces an error into the estimation. As such, the deterioration of estimation performance for night-time cases is not entirely unexpected. Comparing our results with other vision-based speed estimation methods [8]-[12], the proposed method achieves rather good daytime estimation, whereas the others report over 10% error. As no others have attempted night-time cases, no reasonable comparison can be made there.

4. Conclusion and future direction

In conclusion, a novel method for estimating individual vehicle speed using two consecutive images from a traffic surveillance camera has been proposed. Compared with the speed from radar, the averaged estimation error for day-time cases is 3.27%, while for night-time cases it is 8.51%. Future work will focus on the block matching accuracy and the robustness of the algorithm against large changes in ambient illumination conditions.

5. References

[1] C. Sun and S. G. Ritchie, "Individual vehicle speed estimation using single loop inductive waveforms," Journal of Transportation Engineering-ASCE, vol. 125, pp. 531-538, 1999.
[2] Y. H. Wang and N. L. Nihan, "Freeway traffic speed estimation with single-loop outputs," Advanced Traffic Management Systems and Automated Highway Systems 2000, pp. 120-126, 2000.
[3] D. L. Woods, B. P. Cronin, and R. A. Hamm, "Speed Measurement with Inductance Loop Speed Traps," Texas Transportation Institute, Research Report FHWA/TX-95/1392-8, 1994.
[4] K. W. Dickinson and R. C. Waterfall, "Video Image Processing for Monitoring Road Traffic," in Proceedings of the IEE International Conference on Road Traffic Data Collection, pp. 5-7, 1984.
[5] R. Ashworth, D. G. Darkin, K. W. Dickinson, M. G. Hartley, C. L. Wan, and R. C. Waterfall, "Application of Video Image Processing for Traffic Control Systems," in Proceedings of the Second International Conference on Road Traffic Control, London, UK, 1985.
[6] S. Takaba, M. Sakauchi, T. Kaneko, Hwang Byong Won, and T. Sekine, "Measurement of traffic flow using real time processing of moving pictures," in Proceedings of the 32nd IEEE Vehicular Technology Conference, vol. 32, pp. 488-494, San Diego, California, USA, 1982.
[7] Z. Houkes, "Measurement of speed and time headway of motor vehicles with video camera and computer," in Proceedings of the 6th IFAC/IFIP Conference on Digital Computer Applications to Process Control, pp. 231-237, 1980.
[8] D. J. Dailey and L. Li, "An algorithm to estimate vehicle speed using uncalibrated cameras," in Proceedings of the IEEE/IEEJ/JSAI International Conference on Intelligent Transportation Systems, pp. 441-446, 1999.
[9] F. W. Cathey and D. J. Dailey, "One-parameter camera calibration for traffic management cameras," in Proceedings of the 7th International IEEE Conference on Intelligent Transportation Systems, pp. 865-869, 2004.
[10] F. W. Cathey and D. J. Dailey, "A novel technique to dynamically measure vehicle speed using uncalibrated roadway cameras," in Proceedings of the IEEE Intelligent Vehicles Symposium, pp. 777-782, 2005.
[11] T. N. Schoepflin and D. J. Dailey, "Dynamic camera calibration of roadside traffic management cameras for vehicle speed estimation," IEEE Transactions on Intelligent Transportation Systems, vol. 4, pp. 90-98, 2003.
[12] T. N. Schoepflin and D. J. Dailey, "Algorithms for calibrating roadside traffic cameras and estimating mean vehicle speed," in Proceedings of the IEEE Intelligent Vehicles Symposium, pp. 60-65, 2004.
[13] T. W. Pai, W. J. Juang, and L. J. Wang, "An adaptive windowing prediction algorithm for vehicle speed estimation," in Proceedings of the IEEE Conference on Intelligent Transportation Systems, pp. 901-906, 2001.
[14] A. H. S. Lai and N. H. C. Yung, "A fast and accurate scoreboard algorithm for estimating stationary backgrounds in an image sequence," in Proceedings of the IEEE International Symposium on Circuits and Systems, vol. 4, pp. 241-244, 1998.
[15] C. Vieren, F. Cabestaing, and J. G. Postaire, "Catching moving objects with snakes for motion tracking," Pattern Recognition Letters, vol. 16, pp. 679-685, 1995.
[16] X. C. He and N. H. C. Yung, "A new method for solving ill-condition in vanishing point based camera calibration," to be published in Optical Engineering, 2006.
[17] G. S. K. Fung, N. H. C. Yung, and G. K. H. Pang, "Camera calibration from road lane markings," Optical Engineering, vol. 42, pp. 2967-2977, 2003.
[18] A. H. S. Lai, G. S. K. Fung, and N. H. C. Yung, "Vehicle type classification from visual-based dimension estimation," in Proceedings of the IEEE Conference on Intelligent Transportation Systems, pp. 201-206, 2001.
[19] W. W. L. Lam, C. C. C. Pang, and N. H. C. Yung, "Highly accurate texture-based vehicle segmentation method," Optical Engineering, vol. 43, pp. 591-603, 2004.
[20] A. Murat Tekalp, "Block-Based Methods," in Digital Video Processing, Chapter 6, Prentice-Hall Signal Processing Series, Prentice Hall PTR, 1995.
