Accurate Background Modeling For Moving

Abstract— Fast and accurate foreground detection in video sequences is the first step in many computer vision applications. In this paper, we propose a new method for background modeling that operates in color and gray spaces and uses entropy information to obtain a pixel state card. Our method is recursive and does not require a training period to handle the various problems that arise when classifying pixels into either foreground or background. First, it analyzes the pixel state card to build a dynamic matrix, which is then used to selectively update the background model. Second, our method eliminates noise and holes from the moving areas, removes uninteresting moving regions, and refines the shape of foregrounds. A comparative study through quantitative and qualitative evaluations shows that our method can detect foreground efficiently and accurately in videos, even in the presence of various problems including sudden and gradual illumination changes, a shaking camera, background component changes, ghosts, and varying foreground speed.

Keywords— moving object detection; background modeling; dynamic matrix

I. INTRODUCTION

Moving object detection in complex scenes is an active research topic in computer vision. Related research areas include video compression and indexing, robotics, remote monitoring, etc. This diversity is justified by the complexity of the problem, due to various challenges still incompletely resolved, such as sudden and gradual illumination changes, a shaking camera, background component changes, ghosts, and varying foreground speed.

The core problem tackled by the various methods for moving object detection in the literature [1-10] is identifying the pixels belonging to moving objects (the foreground).

We organize the contributions reported in the literature into four classes, with a categorization based on inter-frame processing: methods based on inter-frame difference (IFD), those based on background modeling (BM), methods based on optical flow (OF), and hybrid methods.

Inter-frame difference [1,2] is the simplest approach. However, it may fail to detect all relevant pixels of certain types of foreground, since it uses only the previous frame. The optical flow-based approach [10] is quite efficient: it detects foregrounds without a priori knowledge about the background, but real-time detection is difficult to achieve without specialized hardware.

Background modeling-based methods [3-9] are the most popular approach. However, their efficiency depends on an "ideal background model", which is not easy to obtain and is, at the same time, easily influenced by the environment. Thus, the well-known methods in this category [3,4] require a training period (off-line), free of foreground objects, to build a background model. Such techniques are often impractical in real life. In addition, the relocation of background components after the training period and abrupt illumination changes are only slowly integrated into the model. Therefore, an on-line background modeling approach [5,6] using a dynamic matrix of significant changes yields effective background estimation compared to other methods. The performance and success of this technique depend on the process used to extract and update the dynamic matrix.

In this paper, we propose a new on-line method for background modeling. In our work, the background is represented by a simple recursive model. The selective update stage of this model is based on a dynamic matrix and is constructed from two levels: the frame level and the pixel level. For each input frame, the decision in the dynamic matrix update uses pixel state cards obtained by IFD based on a Dynamic Spatio-Temporal Entropy Image (DSTEI) [2]. Foregrounds are obtained by subtracting each new frame from this model based on a dynamic threshold. Some post-treatments finalize the detection by eliminating noise and holes from the moving areas, removing uninteresting moving regions, and refining foreground shapes.

The remainder of this paper is organized as follows: Section 2 describes the proposed method. The efficiency and accuracy of the proposed method are illustrated by an exhaustive experimental evaluation and comparison results in Section 3. Finally, the proposed approach is summarized and future work directions are presented in Section 4.

II. PROPOSED METHOD

In our work, we adopt a background modeling (BM) approach to extract the foreground from frames. Note that a background model can be non-recursive or recursive. A non-recursive model is represented by a buffer of frames and a sliding window [4][7]; a recursive model maintains one or several models updated with each input frame [3][6][8]. Compared to the non-recursive model, the recursive model requires less storage and adapts quickly to sudden and progressive background changes, whereas in a non-recursive model such changes are considered foreground objects and persist for a long time.
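The distinction between the two model types can be sketched as follows; the function names, the blending factor, and the median choice are illustrative assumptions, not prescriptions from the paper:

```python
import numpy as np

def update_recursive(model, frame, alpha=0.3):
    """Recursive model: blend the new frame into a single running model.
    Constant memory; adapts at a rate set by alpha."""
    return (1.0 - alpha) * model + alpha * frame

def background_non_recursive(buffer):
    """Non-recursive model: per-pixel median over a sliding window of
    frames. Needs the whole buffer in memory."""
    return np.median(np.stack(buffer), axis=0)

# Toy 2x2 gray frames
model = np.full((2, 2), 100.0)
frame = np.full((2, 2), 120.0)
model = update_recursive(model, frame)          # each pixel -> 106.0
bg = background_non_recursive(
    [np.full((2, 2), v) for v in (100.0, 110.0, 120.0)])  # each pixel -> 110.0
```

The recursive update touches only one stored image per frame, which is why it adapts quickly and cheaply; the non-recursive median must keep (and re-scan) the whole window.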
Note that the loop iterations converge rapidly, after three or four iterations, to determine the value of th.

B. Model initialization

As illustrated in Fig. 1, the first step of our approach is model initialization (ς_RGB^t) by the absolute successive …

… pixels from those caused by noise, such as background object edges. For this, we calculate the spatio-temporal entropy based on a labeling technique applied to neighboring pixels in a spatio-temporal sliding window. The pixel labeling technique for a frame ψ^t is based on two temporal differences (Δ_GRAY^{t,t−1} and Δ_GRAY^{t−1,t−2}) between the frames ψ_GRAY^t, ψ_GRAY^{t−1} and ψ_GRAY^{t−2}, according to:

    Δ_GRAY^{t,t−1}   = |ψ_GRAY^t − ψ_GRAY^{t−1}|
    Δ_GRAY^{t−1,t−2} = |ψ_GRAY^{t−1} − ψ_GRAY^{t−2}|        (2)

We build for each pixel a spatio-temporal sliding window (S) defined by:

    S = {(i, j)_k | |i − x| < ⌊w/2⌋, |j − y| < ⌊w/2⌋, 0 ≤ t − k < L}

with (w, w) and L the sizes of S in height, width and depth respectively; here w = L = 3. The w²·L neighbors of Π_xy^t are denoted Π_ij^{t−1,t−2} in Δ_GRAY^{t−1,t−2} and Π_ij^{t,t−1} in Δ_GRAY^{t,t−1}. The state labeling technique (Fig. 2) determines the labels of S from Δ_GRAY^{t,t−1} and Δ_GRAY^{t−1,t−2}. A state label (e) belongs to {0, 1, 2}. We initialize window L1 (see Fig. 2) with e = {2}; the labels of L2 and L3 are computed by comparing Π_ij^{t−1,t−2} and Π_ij^{t,t−1} with the relative thresholds Δ_TH^{t−1,t−2} and Δ_TH^{t,t−1} obtained by τ(I). Depending on the comparison results, we attribute labels {0, 1, 2} to L2 and L3, as detailed in Fig. 2. For each Π_xy, the probability density function P_{x,y,e} of a label (e) is defined by:

    P_{x,y,e} = H_{x,y,e} / N        (3)

with N the number of labels in S and H_{x,y,e} the number of occurrences of label (e) in S. The spatio-temporal entropy χ_xy^t of Π_xy^t is obtained by:

    χ_xy^t = − Σ_{e=0}^{2} P_{x,y,e} log(P_{x,y,e})        (4)

Each χ_xy^t is compared with the threshold χ_TH^t to build the pixel state card as in (5):

    E_xy^t = 0  if χ_xy^t ≤ χ_TH^t
    E_xy^t = 1  if χ_xy^t > χ_TH^t        (5)

(1) Compute the absolute inter-frame differences Δ_GRAY^{t−1,t−2} and Δ_GRAY^{t,t−1} between the three frames ψ_GRAY^t, ψ_GRAY^{t−1} and ψ_GRAY^{t−2}.
(2) Compute the spatio-temporal pixel entropy χ^t of frame ψ^t.
(3) Compute the automatic threshold χ_TH^t by the function τ(I).

    M_xy^t = 1  if E_xy^t = 0
    M_xy^t = 0  if E_xy^t = 1        (6)

    Δ_GRAY^{t,t−1}   = |ψ_GRAY^t − ψ_GRAY^{t−1}|
    Δ_GRAY^{t−1,t−2} = |ψ_GRAY^{t−1} − ψ_GRAY^{t−2}|        (7)

3) Update background model: The background model update involves two levels, as in (8): the frame level and the pixel level. We start with the frame level to achieve a fast background update when there are no moving objects in the field of view; this makes the method robust even under abrupt camera shaking or illumination changes [5]. If the model update condition at the frame level is not satisfied, the pixel level handles the adaptation process through a linear model. By analyzing M^t at both levels, our method is accurate and fast with respect to scene-changing problems.

    ς_RGB^t = ψ^t                       if Σ M^t / (m·n) ≥ 0.9
    otherwise, for each pixel ς_xy^t:
        if M_xy^t = 0 then ς_xy^t = 0.7·ς_xy^t + 0.3·ψ_xy^t        (8)

D. Foreground segmentation

In order to obtain accurate foreground segmentation results, we start by building a foreground mask. We then use post-processing to improve the detection results by grouping moving pixels into regions, eliminating noise and holes from them, removing uninteresting moving regions, and refining the shape of foregrounds.

1) Building foreground mask: We subtract the input frame (ψ^t) from the background model (ς_RGB^t) in the RGB channels, as in (9). Let ξ_GRAY^{x,y,t} denote the subtraction result after converting ξ_RGB^t to gray level, where the subscripts x, y and t represent the spatio-temporal pixel position, and let ξ_TH^t be a dynamic threshold calculated by τ(I). Equation (10) gives the expression used to compute the foreground mask (ξ01^t).

    ξ_RGB^t = ψ_RGB^t − ς_RGB^t        (9)

    ξ01^{x,y,t} = 1  if ξ_GRAY^{x,y,t} > ξ_TH^t
    ξ01^{x,y,t} = 0  otherwise        (10)
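A minimal sketch of the entropy-based state card of (3)-(6); the window handling is simplified (one flat array of labels per pixel) and the function names are illustrative, not from the paper:

```python
import numpy as np

def pixel_entropy(labels):
    """Entropy of the label distribution {0,1,2} inside one
    spatio-temporal window S, as in (3) and (4)."""
    n = labels.size
    probs = np.array([(labels == e).sum() / n for e in (0, 1, 2)])
    probs = probs[probs > 0]              # 0*log(0) = 0 by convention
    return float(-(probs * np.log(probs)).sum())

def state_card(entropy_map, chi_th):
    """Pixel state card of (5): 1 where change is significant, else 0."""
    return (entropy_map > chi_th).astype(np.uint8)

def dynamic_matrix(E):
    """Dynamic matrix of (6): update allowed (1) exactly where E = 0."""
    return 1 - E

# A window whose 27 labels all agree has zero entropy (static pixel);
# a maximally mixed window has high entropy (changing pixel).
static = np.full(27, 2)
moving = np.array([0] * 9 + [1] * 9 + [2] * 9)
ent = np.array([[pixel_entropy(static), pixel_entropy(moving)]])
E = state_card(ent, chi_th=0.5)
M = dynamic_matrix(E)
```

The inversion in (6) is the key coupling: the model of (8) is refreshed only at pixels whose state card reports no significant change, so foreground pixels never contaminate the background.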
Figure 2. State labeling technique
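Building the foreground mask of (9) and (10) can be sketched as below; the interpretation of τ(I) as a two-class iterative threshold and the gray-conversion weights are assumptions made for illustration:

```python
import numpy as np

def tau(values, eps=0.5):
    """Assumed reading of tau(I): split values into two classes around
    th, move th to the midpoint of the class means, loop until stable."""
    th = float(values.mean())
    while True:
        c1, c2 = values[values > th], values[values <= th]
        if c1.size == 0 or c2.size == 0:
            return th
        new_th = 0.5 * (c1.mean() + c2.mean())
        if abs(new_th - th) < eps:
            return new_th
        th = new_th

def foreground_mask(frame_rgb, model_rgb):
    """(9): subtract the model from the frame in RGB; (10): threshold
    the gray-level difference with the dynamic threshold from tau."""
    diff = np.abs(frame_rgb.astype(np.float64) - model_rgb.astype(np.float64))
    gray = diff @ np.array([0.299, 0.587, 0.114])   # assumed gray weights
    return (gray > tau(gray.ravel())).astype(np.uint8)

model = np.zeros((2, 2, 3))
frame = np.zeros((2, 2, 3))
frame[0, 0] = (200.0, 200.0, 200.0)   # one strongly changed pixel
mask = foreground_mask(frame, model)
```

Because the threshold is recomputed from the difference image itself, the same code adapts to bright and dim scenes without a hand-tuned constant, which matches the paper's claim that the threshold loop settles in a few iterations.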
isolated pixels and small regions can be removed after grouping connected components. Holes inside the connected components are filled in by morphological operations. Moreover, in order to increase the precision of foreground objects, we apply a refinement step (Algorithm 2).

The automatic threshold computation function (Section A) is also applied to the input frame. As noted earlier, the goal of this function is to classify the matrix values (here, the input frame in gray level) into two classes (C1 and C2) according to their values. Let C1 denote the foreground pixels and C2 the background pixels. Foreground pixel detection is much improved during the threshold computation loop. Once the threshold is established, we compute the binary mask (ω01^t) of C1.

Algorithm 2: Refinement technique

    For each region R_i^ξ
        For each pixel ξ01^{x,y,t}
            If ω01^{x,y,t} = 1 and ξ01^{x,y,t} = 1 then e = j(ω01^{x,y,t})
            Else if ω01^{x,y,t} = 1 and ξ01^{x,y,t} = 0 then
                if j(ω01^{x,y,t}) = e then
                    ξ01^{x,y,t} = 1

… recursive BM with KDE [4], and (4) the hybrid method of [9]. Each method is considered a reference in its category. Second, we selected various popular sequences with typical conditions: (a) sudden and (b) gradual illumination changes, (c) shaking camera, (d) background component changes, (e) ghost, and (f) foreground speed. Third, the methods are tested and evaluated on this benchmark suite of video sequences. The following describes the validation techniques and conditions and presents the obtained results. Typical frames are used for evaluation.

A. Validation and evaluation conditions

Quantitative scores (QS) are used for the quantitative analysis of our results. Two metrics are used: Precision and Recall. The best technique should have the highest rates of Precision and Recall. In addition, we measure the number of frames (N) necessary so that large background changes no longer appear as foreground.
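The grouping of moving pixels into connected components and the removal of small regions can be sketched as follows (4-connectivity assumed; hole filling by morphology is omitted for brevity):

```python
import numpy as np
from collections import deque

def connected_components(mask):
    """4-connected component labeling of a binary mask via BFS."""
    labels = np.zeros(mask.shape, dtype=int)
    current = 0
    for sx, sy in zip(*np.nonzero(mask)):
        if labels[sx, sy]:
            continue
        current += 1
        labels[sx, sy] = current
        q = deque([(sx, sy)])
        while q:
            x, y = q.popleft()
            for nx, ny in ((x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)):
                if (0 <= nx < mask.shape[0] and 0 <= ny < mask.shape[1]
                        and mask[nx, ny] and not labels[nx, ny]):
                    labels[nx, ny] = current
                    q.append((nx, ny))
    return labels, current

def remove_small_regions(mask, min_size):
    """Drop connected components smaller than min_size pixels."""
    labels, n = connected_components(mask)
    out = np.zeros_like(mask)
    for k in range(1, n + 1):
        if (labels == k).sum() >= min_size:
            out[labels == k] = 1
    return out

mask = np.array([[1, 1, 0, 0],
                 [1, 1, 0, 0],
                 [0, 0, 0, 1]], dtype=np.uint8)
clean = remove_small_regions(mask, min_size=3)   # isolated pixel dropped
```

In practice a library routine (e.g. an OpenCV connected-components call) would replace the BFS, but the effect is the same: isolated noise pixels disappear while coherent moving regions survive.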
In order to illustrate the advantages of our approach, we present the results in four figures: in Fig. 3, we show some detection results issued from the four methods in the presence of challenges (a)-(d). Fig. 4 presents detection results at the moment of sudden illumination changes. Fig. 5 and Fig. 6 illustrate detection results for the ghost and the foreground speed challenges, respectively. Next, we interpret these results.

Figure 5. Detection results for challenge (e)

Figure 6. Detection results for challenge (f): scene 1 in line 1 and scene 2 in line 2

… This is done by the frame-level update of the background model.

In Fig. 3, the results of the other BM methods ((2), (3)) and of the hybrid method present small misclassifications. This is confirmed by their Recall rates in TABLE II.

The adaptation to the relocation of a background component is achieved successfully by the majority of the evaluated methods. Method (3) needs additional frames to complete its adaptation (TABLE III). Fig. 5 shows that the ghost does not appear in our result but remains visible in the BM ((2) and (3)) and hybrid (4) methods.

TABLE IV presents results for scenes containing different (f) foreground speeds. The Precision and Recall rates show that our method detects foreground pixels effectively in the presence of different foreground speeds. In fact, our method presents the best precision rate for this challenge.

In TABLE V, we present the number of frames (N) necessary so that large background changes no longer appear as foreground. The results show that our method needs only a few frames to adapt to background changes compared to the other methods. Furthermore, our method is not influenced by gradual illumination changes or a shaking camera.
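The Precision and Recall scores used in the evaluation can be computed directly from binary masks; a minimal sketch with illustrative names:

```python
import numpy as np

def precision_recall(detected, ground_truth):
    """Precision = TP / (TP + FP); Recall = TP / (TP + FN),
    computed from binary foreground masks."""
    tp = np.logical_and(detected == 1, ground_truth == 1).sum()
    fp = np.logical_and(detected == 1, ground_truth == 0).sum()
    fn = np.logical_and(detected == 0, ground_truth == 1).sum()
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

gt = np.array([[1, 1], [0, 0]])
det = np.array([[1, 0], [1, 0]])   # one hit, one miss, one false alarm
p, r = precision_recall(det, gt)
```

Precision penalizes false alarms while Recall penalizes misses, which is why the text asks for both to be high at once rather than either alone.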
IV. CONCLUSION
In this paper, a novel and accurate method for BM is presented. Our method addresses various difficult situations such as sudden and gradual illumination changes, a shaking camera, background component changes, ghosts, and varying foreground speed. To do so, our on-line background modeling step operates at both the pixel level and the frame level. In addition, our method relies on spatio-temporal analyses to update a dynamic matrix. The automatic threshold computation function adapts the decision to the events in the scene and improves the quality of the results. Post-processing completes the work by filtering out uninteresting moving pixels and refining foreground shapes.

Our method was evaluated on various video files under different conditions. According to the presented results, we conclude that a robust computer vision application can be built on top of our method. In fact, our method combines rapid adaptation to background changes with a high precision rate in most experiments.

As future work, we will study the reactivity of our method in the presence of other challenges such as shadows cast by the foreground, camouflage, and foreground occluded by fixed or moving objects.
References

[1] R. Lillestrand, "Techniques for change detection," IEEE Trans. on Computers, 21(7), pp. 654–659, 1972.
[2] M. Chang and Y. Cheng, "Motion Detection by Using Entropy Image and Adaptive State-Labeling Technique," IEEE Inter. Symposium on Circuits and Systems, New Orleans, pp. 3667–3670, 2007.
[3] W. E. L. Grimson, C. Stauffer, R. Romano, and L. Lee, "Using adaptive tracking to classify and monitor activities in a site," Proc. Conf. on Computer Vision and Pattern Recognition, 1998.
[4] A. Elgammal, R. Duraiswami, D. Harwood, and L. S. Davis, "Background and foreground modeling using nonparametric kernel density estimation for visual surveillance," Proc. IEEE, 90(7), pp. 1151–1163, Jul. 2002.
[5] T. Yang, Z. Li, Q. Pan, and J. Li, "Real-Time and Accurate Segmentation of Moving Objects in Dynamic Scene," MM'04, New York, USA, pp. 10–16, 2004.
[6] T. Soumya, "A Moving Object Segmentation Method for Low Illumination Night Videos," Proc. of WCECS, October 22–24, San Francisco, USA, 2008.
[7] I. Cohen and G. Medioni, "Detecting and tracking moving objects for video surveillance," Proc. of CVPR, pp. 319–325, 1999.
[8] R. Collins, A. Lipton, T. Kanade, H. Fujiyoshi, D. Duggins, Y. Tsin, D. Tolliver, N. Enomoto, and O. Hasegawa, "A System for Video Surveillance and Monitoring," VSAM Technical Report CMU-RI-00-12, Robotics Institute, Carnegie Mellon University, May 2000.
[9] D. Zhou and H. Zhang, "Modified GMM background modeling and optical flow for detection of moving objects," Inter. Conf. on Systems, Man, and Cybernetics, 2005.
[10] B. D. Lucas and T. Kanade, "An iterative image registration technique with an application to stereo vision," Proc. of the Inter. Joint Conference on Artificial Intelligence, pp. 674–679, 1981.
[11] T. Kanungo, D. M. Mount, N. S. Netanyahu, C. D. Piatko, R. Silverman, and A. Y. Wu, "An Efficient k-Means Clustering Algorithm: Analysis and Implementation," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 24, no. 7, July 2002.