A High Efficient System For Traffic Mean Speed Estimation From MPEG Video (10.1109@AICI.2009.358)

2009 International Conference on Artificial Intelligence and Computational Intelligence

A High Efficient System for Traffic Mean Speed Estimation from MPEG Video
Fu Yuan Hu
Department of Electronics & Informatics
Suzhou University of Science and Technology
Suzhou City, Jiangsu Prov., P.R. China
Email: fuyuanhu@gmail.com
Xing Fa Dong
Department of Electronics & Informatics
Suzhou University of Science and Technology
Suzhou City, Jiangsu Prov., P.R. China
Email: dongxfa@mail.usts.edu.cn

Jian Wang
Department of Electronics & Informatics
Suzhou University of Science and Technology
Suzhou City, Jiangsu Prov., P.R. China
Email: wanjiansuzhou@sina.com

acquisition, as well as low cost [11].

Most existing vision systems for monitoring road traffic rely on either (i) virtual-loop-based counting or (ii) vehicle
tracking. A virtual loop is composed of manually defined
detection lines or a bounding area.
By emulating the functionality of an induction loop
detector, this type of system uses changes in the image
profiles (corresponding to vehicles crossing the lines or the
bounding area) to count the number of vehicles and estimate their average speed [14]. These methods are simple
and fast, but not flexible, as they require manual settings by
an operator. Tracking-based approaches generally consist of
three steps: vehicle detection [17], vehicle tracking, and
traffic parameter calculation [18], [4], [2], [9], [10], [11].
In these approaches, the images of individual cars need to
be separated, which limits their applicability
in real situations with changing lighting conditions, traffic
congestion and vehicle occlusion [7].
To alleviate the above-mentioned limitations
of vision-based systems, traffic analysis approaches
based on estimated motion vectors have recently been proposed.
These methods do not require a close view for analyzing
traffic scenes and have a reduced computational
cost. Among such methods, Yu et al. [15], [13], [16]
proposed several versions of an algorithm to quickly extract
the average vehicle speed from MPEG compressed video within
a time window. They assume that the vehicle motion is
homogeneous in the temporal domain and project the motion
vectors from the image plane onto the ground plane using
an affine camera model, with the camera calibration performed
manually. This approach has a very low computational cost, since it avoids fully decoding the MPEG
stream, computing trajectories, and detecting vehicles.
The interested reader is referred to [1], [8], [13]
for the aspects of MPEG parsing for the extraction of
motion vectors and DCT coefficients and their use for

Abstract-In this paper, we present a vision-based traffic
measurement system allowing automatic traffic flow segmentation, camera calibration and traffic information estimation.
The system quickly estimates the mean vehicle speed directly from
MPEG motion vectors. Although extensive work has been
done on extracting and using motion information from MPEG
video data in the compressed domain, to the best of our knowledge only
a few works have been dedicated to the use of MPEG motion
vectors for traffic analysis. The proposed system is stable and
handles camera vibrations and illumination changes. The paper
describes the main principles of our system together with
representative qualitative and quantitative results.
Keywords-intelligent transportation systems; motion vectors;
traffic measurement system; vehicle speed

Nowadays, traffic analysis is one of the challenging societal and economic problems related to transportation in
industrialized countries. The last few years have seen a
growing interest in intelligent transportation systems (ITS).
Within ITS, traffic surveillance/measurement systems
(TMS) are gaining interest within the research community
as well as industrial and governmental institutions. Traffic surveillance systems must quickly provide traffic managers
and drivers with information such as vehicle speed and density.
Apart from real-time operations, traffic data is also
an important source of information for long-term planning
and traffic management activities.
Vehicle speed is a fundamental piece of traffic information that
is essential to both macroscopic and microscopic traffic
analysis [6]. The current state of the art in speed measurement
technologies includes magnetic inductive loop
detectors, magnetic strips, laser sensors, etc. Recently,
vision-based approaches have also been used. Compared
with other non-intrusive TMS, vision-based traffic measurement systems (VTMS) have many advantages, including
portability, easy installation and operation, rich information
978-0-7695-3816-7/09 $26.00 2009 IEEE
DOI 10.1109/AICI.2009.358

Hichem Sahli
Department of Electronics & Informatics
Vrije Universiteit Brussel (VUB)
Brussel, Belgium
Email: hsahli@etro.vub.ac.be


motion vector smoothing. Using motion vectors, other authors have proposed traffic analysis methods based on
stochastic process models [3], [7], neural networks [?],
and deterministic approaches [12]. MPEG parsing and traffic
status classification are outside the scope of this paper.
For the estimation of the average vehicle speed, we
propose using the 3-D vector field defined on the road
surface, describing the motion of each 3-D point between
two time steps. It can be seen as an extension of the motion
vector or optical flow to 3-D, the latter being the projection of the
3-D scene flow onto the images, resulting in a 2-D vector field.
The remainder of the paper is organized as follows. In
Section II, we describe the proposed approach. Section III
presents the camera calibration strategy, Section IV defines
the traffic mean speed estimation, Section V illustrates some experimental results, and finally Section VI gives some conclusions
and future work.

Figure 1. The overall diagram of the proposed algorithm.

1) Camera Model: Here we consider a camera
placed at an unknown height and orientation with respect to
the X-Y ground plane (assumed flat). Any point (X, Y, 0)
on the ground plane is projected onto the pixel (x, y) of the
image plane following [10], [5], [16]:


A. General Principle
Fig. 1 depicts the proposed system. It consists of two
processing steps:
- an off-line processing step dealing with the automatic
  estimation of the camera calibration parameters using the standardized road marker dimensions. This
  step consists of two modules. First, the road area is
  determined by segmenting the motion vectors. Then, having
  determined the road area, we estimate the background
  road image, from which the road markers are detected;
  calibration points, corresponding to the corners
  of the road markers, are used to estimate the camera
  calibration parameters.
- an on-line module, which takes as input the
  smoothed motion vectors (the 2D image velocity)
  and estimates, on the road area determined during the
  off-line step, the vehicles' 3D velocity using the camera
  calibration parameters.

$$x = \frac{L_1 X + L_2 Y + L_3}{L_7 X + L_8 Y + 1} \tag{1}$$

$$y = \frac{L_4 X + L_5 Y + L_6}{L_7 X + L_8 Y + 1} \tag{2}$$


The parameters $L_i$, $i = 1, \dots, 8$, are the camera parameters of
the projective model. They are estimated using at least four
pairs of calibration points that determine the correspondence
between the ground plane and the image plane. In Section III,
we propose an automatic method for obtaining these calibration points by making use of the white markers along
the road/lanes, the standardized road width, and the marker
lengths and widths, as illustrated in Fig. 2.
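
As an illustration of how Eq. (1) and Eq. (2) can be solved for the eight parameters, the following minimal Python sketch multiplies out each point correspondence into two linear equations and solves the resulting 8x8 system for exactly four point pairs. The function names (`solve_linear`, `estimate_camera_params`, `project`) are hypothetical and are not taken from the paper; with more than four pairs, a least-squares solver would presumably be used instead.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("degenerate calibration points")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def estimate_camera_params(ground_pts, image_pts):
    """Return [L1, ..., L8] from exactly four (X, Y) <-> (x, y) pairs."""
    if len(ground_pts) != 4 or len(image_pts) != 4:
        raise ValueError("this sketch handles exactly four point pairs")
    A, b = [], []
    for (X, Y), (x, y) in zip(ground_pts, image_pts):
        # Eq. (1): x * (L7 X + L8 Y + 1) = L1 X + L2 Y + L3
        A.append([X, Y, 1.0, 0.0, 0.0, 0.0, -x * X, -x * Y])
        b.append(x)
        # Eq. (2): y * (L7 X + L8 Y + 1) = L4 X + L5 Y + L6
        A.append([0.0, 0.0, 0.0, X, Y, 1.0, -y * X, -y * Y])
        b.append(y)
    return solve_linear(A, b)


def project(L, X, Y):
    """Apply Eq. (1) and Eq. (2) to a ground-plane point (X, Y)."""
    den = L[6] * X + L[7] * Y + 1.0
    return ((L[0] * X + L[1] * Y + L[2]) / den,
            (L[3] * X + L[4] * Y + L[5]) / den)
```

In such a sketch, the ground-plane coordinates would come from the standardized marker and road dimensions, and the image coordinates from the detected marker corners.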

B. 2D to 3D Transformation
Before discussing the traffic mean speed estimation, we
need to define the notation for the camera projection parameters, the 3D scene velocity and the 2D image velocity. MPEG
motion vectors, or optical flow, typically estimate 2D pixel
motion, but when combined with known depth information
they can yield estimates of the 3D velocity. The 2D flow or motion vector at pixel (block) (x, y) in an image can be written
M(x, y) = (u, v), with motion magnitude MV(x, y) =
$\sqrt{u^2 + v^2}$. The corresponding 3D velocity of the pixel
(block) in 3D space is denoted E(X, Y, Z) = (U, V, W). In
the following we define both the relationship between a
point (X, Y, Z) in 3D space and its projection (x, y), and
the relationship between M and E.

Figure 2. Required road marking parameters.

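2) 3D Velocity Estimation: Ignoring the vehicle's height,
we can denote the velocity along the X and Y axes as:

$$U = \frac{X_2 - X_1}{\Delta t}, \qquad V = \frac{Y_2 - Y_1}{\Delta t}$$

where $(X_1, Y_1)$ and $(X_2, Y_2)$ are the positions of a vehicle at
two successive times separated by $\Delta t$. Using Eq. (1) and Eq. (2), and after some
manipulation, we get the 3D velocity as a function of the 2D
flow (u, v):

$$U = \frac{(\tilde{L}_1 \tilde{u} + \tilde{L}_2 \tilde{v})(\tilde{L}_4 x + \tilde{L}_5 y + \tilde{L}_6) - (\tilde{L}_4 \tilde{u} + \tilde{L}_5 \tilde{v})(\tilde{L}_1 x + \tilde{L}_2 y + \tilde{L}_3)}{(\tilde{L}_4 x + \tilde{L}_5 y + \tilde{L}_6)^2} \tag{3}$$

$$V = \frac{(\tilde{L}_7 \tilde{u} + \tilde{L}_8 \tilde{v})(\tilde{L}_4 x + \tilde{L}_5 y + \tilde{L}_6) - (\tilde{L}_4 \tilde{u} + \tilde{L}_5 \tilde{v})(\tilde{L}_7 x + \tilde{L}_8 y + \tilde{L}_9)}{(\tilde{L}_4 x + \tilde{L}_5 y + \tilde{L}_6)^2} \tag{4}$$

where $(\tilde{u}, \tilde{v}) = \mathrm{FrameRate} \cdot (u, v)$ and the parameters $\tilde{L}_j$, $j = 1, \dots, 9$,
are functions of the projection parameters $L_i$, $i = 1, \dots, 8$,
in Eq. (1) and Eq. (2).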
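
The mapping from 2D flow to ground-plane velocity can be sketched in Python as follows. This is only an illustrative sketch under two assumptions that the paper does not spell out: the inverse parameters are obtained, up to scale, as the adjugate of the 3x3 matrix built from the eight projection parameters, and they are ordered as X numerator, common denominator, Y numerator. All function names are hypothetical.

```python
def inverse_params(L):
    """Return the nine inverse parameters from [L1, ..., L8].

    Assumption: they are the adjugate of [[L1, L2, L3], [L4, L5, L6],
    [L7, L8, 1]], ordered X numerator, denominator, Y numerator.
    """
    a, b, c, d, e, f, g, h = L
    i = 1.0
    row_x = [e * i - f * h, c * h - b * i, b * f - c * e]
    row_y = [f * g - d * i, a * i - c * g, c * d - a * f]
    row_w = [d * h - e * g, b * g - a * h, a * e - b * d]
    return row_x + row_w + row_y


def image_to_ground(Lt, x, y):
    """Map an image point (x, y) back to ground-plane coordinates (X, Y)."""
    den = Lt[3] * x + Lt[4] * y + Lt[5]
    return ((Lt[0] * x + Lt[1] * y + Lt[2]) / den,
            (Lt[6] * x + Lt[7] * y + Lt[8]) / den)


def ground_velocity(Lt, x, y, u, v, frame_rate):
    """Ground-plane velocity (U, V) of the block at (x, y) with flow (u, v).

    (u, v) is in pixels per frame; it is scaled by the frame rate and
    pushed through the quotient rule applied to image_to_ground.
    """
    ut, vt = frame_rate * u, frame_rate * v
    den = Lt[3] * x + Lt[4] * y + Lt[5]
    num_x = Lt[0] * x + Lt[1] * y + Lt[2]
    num_y = Lt[6] * x + Lt[7] * y + Lt[8]
    d_den = Lt[3] * ut + Lt[4] * vt
    U = ((Lt[0] * ut + Lt[1] * vt) * den - d_den * num_x) / den ** 2
    V = ((Lt[6] * ut + Lt[7] * vt) * den - d_den * num_y) / den ** 2
    return U, V
```

Because the expression is the exact derivative of the image-to-ground mapping, it can be checked against a finite difference of `image_to_ground` over a small image displacement.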

Figure 3. Motion Vectors Filtering, First row: frame 12 and frame 299;
Second row: MPEG motion vectors; Third row: smoothed motion vectors


changes as well as camera vibration. To alleviate
this problem, we use the temporal difference history (over several
images) to accommodate these changes.
Fig. 4 illustrates the result of the above described step,
namely, motion-based road region segmentation, background
image estimation, and the obtained road image, without
moving objects.

In this section we give an overview of the different steps
for estimating the camera parameters $L_i$, $i = 1, \dots, 8$.
A. Road Region Segmentation From Motion Vectors
Motion estimation in MPEG is based on block-matching criteria, which are known to be sensitive to noise.
To derive reliable results, it is necessary to filter the raw
motion vectors before further processing. In our approach,
we use a spatio-temporal median filter to smooth the
motion vectors. Fig. 3 illustrates the results of the motion
vector filtering.
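
A spatio-temporal median filter of this kind might look like the following Python sketch. The paper does not give the window size or the exact filter form, so the component-wise median, the spatial `radius`, the temporal `history`, and the function name are all assumptions made for illustration.

```python
from statistics import median


def smooth_motion_vectors(frames, radius=1, history=1):
    """Component-wise spatio-temporal median filter (illustrative sketch).

    frames[t][r][c] is the motion vector (u, v) of block (r, c) in frame t.
    Each output vector is the median of u and of v over a window of
    (2 * radius + 1)^2 blocks and 2 * history + 1 frames, clipped at borders.
    """
    T, H, W = len(frames), len(frames[0]), len(frames[0][0])
    out = []
    for t in range(T):
        t_range = range(max(0, t - history), min(T, t + history + 1))
        frame_out = []
        for r in range(H):
            r_range = range(max(0, r - radius), min(H, r + radius + 1))
            row_out = []
            for c in range(W):
                c_range = range(max(0, c - radius), min(W, c + radius + 1))
                window = [frames[tt][rr][cc]
                          for tt in t_range for rr in r_range for cc in c_range]
                row_out.append((median(p[0] for p in window),
                                median(p[1] for p in window)))
            frame_out.append(row_out)
        out.append(frame_out)
    return out
```

With a window of this kind, an isolated spurious vector surrounded by consistent neighbours is replaced by the neighbourhood value.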
The smoothed motion vectors are used as inputs to the
road region segmentation module. In this step we try to
detect road pixels, characterized by the fact that at these
locations the motion vectors behave as a step
function over time. Indeed, a road pixel (block) has a motion
state (moving/static) that changes dynamically as vehicles
move along the road. Thus, we define a matrix DM(x, y) which
counts the number of state changes of a given pixel (x, y)
during a certain time (number of frames). Pixels with a
certain number of changes (a parameter set empirically between 6 and 14) are considered road pixels, denoted
Road(x, y).
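
The state-change counting can be sketched in Python as below. This is a sketch under stated assumptions: the moving/static state of each block is taken as already available (for instance by thresholding the motion magnitude, which the paper does not detail), and the empirical parameter is interpreted as a minimum number of changes. The function name and the default `min_changes` value are hypothetical.

```python
def segment_road(moving_masks, min_changes=10):
    """Count moving/static state changes per block and threshold them.

    moving_masks[t][r][c] is True when block (r, c) is moving in frame t.
    Returns (dm, road): dm[r][c] is the number of state changes over the
    sequence, and road[r][c] is True when dm[r][c] >= min_changes.
    """
    H, W = len(moving_masks[0]), len(moving_masks[0][0])
    dm = [[0] * W for _ in range(H)]
    for prev, cur in zip(moving_masks, moving_masks[1:]):
        for r in range(H):
            for c in range(W):
                if prev[r][c] != cur[r][c]:
                    dm[r][c] += 1
    road = [[count >= min_changes for count in row] for row in dm]
    return dm, road
```

A block that never changes state, such as a roadside area, accumulates no changes and is excluded from the road region.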
Having detected the road area, we estimate the background image (corresponding to the scene without moving
vehicles). The temporal difference between successive images is
the simplest way to separate background pixels from moving
ones. However, this method does not handle illumination
Figure 4. Road extraction: (a) image frame, (b) extracted road region, (c) background image, (d) road image.

B. Road Marking Extraction

Having extracted the road image, the next step is the
extraction of the road markers and the detection of some
corner points, as illustrated in Fig. 2. The different steps of
this processing are:
- first, Canny edge detection is applied to the road image,
- then a progressive probabilistic Hough transform is
  applied, and we keep only the longest detected lines,
- we select all the parallel lines, which are the borders of the
  road markings,
- finally, we fit a triangle to the line segments supported
  by the detected parallel linear structures.
Fig. 5 illustrates the different steps of the detection of
the road markers on the image of Fig. 4(d).
Finally, we estimate the camera parameters by solving
Eq. (1) and Eq. (2), using as calibration points the
detected marker corners and the associated world coordinate
points obtained using the standardized dimensions of the
markers and the road width, as illustrated in Fig. 2.



Figure 6. 3D speed calculation without removing noise.






Figure 5. Control point detection: (a) lane marking segmentation, (b) example of detected elongated parallel lines, (c) detected markers.

Figure 7. 3D speed calculation based on motion vectors: (a) the car is entering the given region, (b) the car is moving in the given region, (c) the car is moving in the given region, (d) the car is leaving the given region.


Having estimated the camera parameters, we can estimate
the 3D velocity as given by Eq. (3) and Eq. (4). The
average vehicle speed is then determined either over the
segmented road region or over a given region of interest (ROI) in
the image, as follows:

$$\bar{S} = \frac{1}{N} \sum_{x,y} \sqrt{U^2(x, y) + V^2(x, y)}$$

where $\bar{S}$ denotes the mean speed and N is the number of moving pixels (blocks) of the road
region or the defined ROI.
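
The averaging step can be illustrated with the short Python sketch below. It assumes that a block counts as moving when its ground-plane speed exceeds a small threshold `min_speed`, which is a hypothetical parameter not given in the paper, as is the function name.

```python
from math import hypot


def mean_speed(velocity, region, min_speed=0.0):
    """Mean ground-plane speed over the moving blocks of a region.

    velocity[r][c] is the ground-plane velocity (U, V) of block (r, c);
    region[r][c] is True inside the road region or ROI. Blocks whose speed
    does not exceed min_speed are treated as static and are not counted.
    """
    speeds = [hypot(U, V)
              for vel_row, reg_row in zip(velocity, region)
              for (U, V), inside in zip(vel_row, reg_row)
              if inside and hypot(U, V) > min_speed]
    return sum(speeds) / len(speeds) if speeds else 0.0
```

Returning zero when no block is moving is a design choice of this sketch, so that an empty road yields a defined value.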

different images is approximately the same, except for the last
case, where the vehicle is leaving the ROI.
Then, we show the mean vehicle speed and the
ratio of motion vectors for 1500-frame image sequences.
Fig. 8(a) shows the ratio of moving pixels in the given region.
Fig. 8(b) shows the mean vehicle speed
per frame in the given region, computed from the motion vectors
and the camera model parameters. From Fig. 8, we notice that
the vehicle speed does not change significantly within a short time
window when vehicles are present.
Finally, we examine the distribution of the vehicle speed estimated from motion vectors. Fig. 9 shows the distribution of the estimated
speed for 1500-frame image sequences. It is symmetric about
its mean, which suggests that the estimated speeds are approximately
normally distributed.

Here we evaluate the vehicle speed estimation using
real video recorded in Belgium. We estimate the 3D speed from the motion
vectors after removing noise. Noise affects the
accuracy of the mean vehicle speed, as shown in
Fig. 6: even when there are no vehicles, the estimated speed is not equal
to zero. Subsequently, we compute the 3D mean
speed in a given region and the corresponding parameters after noise removal.
First of all, we show the results in the given region for motion
vectors. From Fig. 7, the average speed
is 28.4117 m/s, 28.4277 m/s, 26.2403 m/s and 15.7060 m/s,
respectively. Although the distance between the car and
the camera becomes shorter, the estimated speed for the
In this paper, we presented a vision-based traffic measurement system using MPEG motion vectors. The proposed
approach allows automatic traffic flow segmentation, camera
calibration and estimation of traffic information such as the average
vehicle speed. The approach is robust and cost-effective.
Further work would be the assessment of the estimation


