
2010 Digital Image Computing: Techniques and Applications

Accurate Background Modeling For Moving Object Detection in a Dynamic Scene


Salma Kammoun Jarraya, Mohamed Hammami and Hanene Ben-Abdallah
Sfax University
MIRACL
Sfax, Tunisia
Salma.kammoun@fsegs.rnu.tn, Mohamed.Hammami@fss.rnu.tn , Hanene.Benabdallah@fsegs.rnu.tn

Abstract— Fast and accurate foreground detection in video sequences is the first step in many computer vision applications. In this paper, we propose a new method for background modeling that operates in color and gray spaces and that uses entropy information to obtain the pixel state card. Our method is recursive and does not require a training period to handle the various problems that arise when classifying pixels as either foreground or background. First, it analyzes the pixel state card to build a dynamic matrix; the latter is used to selectively update the background model. Second, our method eliminates noise and holes from the moving areas, removes uninteresting moving regions and refines the shape of foregrounds. A comparative study through quantitative and qualitative evaluations shows that our method can detect foreground efficiently and accurately in videos, even in the presence of various problems including sudden and gradual illumination changes, shaking camera, background component changes, ghost, and foreground speed.

Keywords— moving object detection; background modeling; dynamic matrix

I. INTRODUCTION

Moving object detection in complex scenes is an active research topic in computer vision. The related research areas include video compression and indexing, robotics, remote monitoring, etc. This diversity is justified by the complexity of the problem, which is due to various challenges still incompletely resolved: sudden and gradual illumination changes, shaking camera, background component changes, ghost, and foreground speed.

The core problem tackled by the various methods for moving object detection in the literature [1-10] is identifying the pixels belonging to moving objects (the foreground).

We organize the contributions reported in the literature into four classes, with a categorization based on inter-frame processing: methods based on inter-frame difference (IFD), those based on background modeling (BM), methods based on optical flow (OF), and hybrid methods.

Inter-frame difference [1,2] is the simplest approach. However, it may not detect all relevant pixels of some types of foreground, since it uses only the previous frame. The optical flow-based approach [10] is quite effective: it detects foregrounds without a priori knowledge about the background, but real-time detection is difficult to achieve without specialized hardware. Background modeling-based methods [3-9] are the most popular approach. However, their efficiency depends on an "ideal background model", which is not easy to obtain and, at the same time, is easily influenced by the environment. Thus, the well-known methods in this category [3,4] require an off-line training period free of foreground objects to build a background model; such a requirement is often impractical in real life. In addition, the relocation of background components after the training period and abrupt illumination changes are only slowly integrated into the model. Therefore, an on-line background modeling [5,6] using a dynamic matrix of significant changes yields effective background estimation compared to other methods. The performance and success of this technique depend on the process used to extract and update the dynamic matrix.

In this paper, we propose a new on-line method for background modeling. In our work, the background is represented by a simple recursive model. The selective update stage of this model is based on a dynamic matrix and operates at two levels: the frame level and the pixel level. For each input frame, the decision to update the dynamic matrix uses pixel state cards obtained by IFD based on a Dynamic Spatio-Temporal Entropy Image (DSTEI) [2]. Foregrounds are obtained by subtracting each new frame from this model using a dynamic threshold. Some post-treatments finalize the detection by eliminating noise and holes from the moving areas, removing uninteresting moving regions and refining foreground shapes.

The remainder of this paper is organized as follows: Section 2 describes the proposed method. The efficiency and accuracy of the proposed method are illustrated by an exhaustive experimental evaluation and comparison in Section 3. Finally, the proposed approach is summarized and future work directions are presented in Section 4.

II. PROPOSED METHOD

In our work, we adopt a background modeling (BM) approach to extract the foreground from frames. Note that the model can be non-recursive or recursive. A non-recursive model is represented by a buffer of frames and a sliding window [4][7]; a recursive model maintains one model, or multiple models, updated with each input frame [3][6][8]. Compared to the non-recursive model, the recursive model requires less storage and adapts quickly to sudden and progressive background changes, whereas in a non-recursive model these changes are treated as foreground objects and persist for a long time.
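The storage and adaptivity trade-off between the two model families can be illustrated with a minimal per-pixel sketch (plain Python; the buffer size, blending factor and step-change input are illustrative choices, not values from the paper):

```python
from collections import deque
from statistics import median

class NonRecursiveModel:
    """Sliding-window model: a buffer of recent values per pixel [4][7]."""
    def __init__(self, size=9):
        self.buffer = deque(maxlen=size)      # storage grows with the window
    def update(self, value):
        self.buffer.append(value)
    def background(self):
        return median(self.buffer)            # e.g. a temporal median

class RecursiveModel:
    """Recursive model: a single running estimate per pixel [3][6][8]."""
    def __init__(self, first_value, alpha=0.5):
        self.estimate = float(first_value)    # constant storage
        self.alpha = alpha
    def update(self, value):
        # each new observation is blended in, so the estimate tracks
        # sudden background changes within a few frames
        self.estimate = (1 - self.alpha) * self.estimate + self.alpha * value

# A step change in the background (100 -> 200): after three frames the
# recursive estimate has moved most of the way to 200, while the window
# median still reports the old value.
nr, rec = NonRecursiveModel(size=9), RecursiveModel(100, alpha=0.5)
for v in [100, 100, 100, 100, 100, 200, 200, 200]:
    nr.update(v)
    rec.update(v)
```

This is the behavior described above: in the non-recursive model a background change persists as apparent foreground until it has filled most of the buffer.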

978-0-7695-4271-3/10 $26.00 © 2010 IEEE 52


DOI 10.1109/DICTA.2010.18
The computed model is constantly updated, making it possible to adapt it to changes that can occur in the background. Given a new frame, there are two kinds of mechanisms to update the background model: 1) selective update, which adds a pixel value to the model only if the pixel is classified as background with significant changes; and 2) blind update, which simply adds the new pixel values to the model through a combination rule. Blind update may lead to poor detection of foreground pixels (more false negatives), as they erroneously become part of the model. On the other hand, selective update enhances the detection of foreground pixels, since they are not added to the model. However, any incorrect detection decision will result in persistent incorrect detections later; this effect is reduced by an efficient decision method.

In our work, we represent the background by a recursive model and update it with a selective technique.

The flowchart of our approach (Fig. 1) comprises three main stages: 1) model initialization, to build the initial background model; 2) model update, to adapt the model to the changes that can occur in the background; and 3) foreground segmentation, to obtain the moving objects.

Figure 1. Main stages of our approach

Before detailing these steps, we first present the automatic threshold computation function used throughout our approach to compute thresholds automatically.

A. Automatic threshold computation function

Generally, a threshold is computed either as the average of the maximum and the minimum or as a mean value picked from histograms. Our automatic threshold computation function (τ(I)) takes a matrix (I) and classifies its values into two classes (C1 and C2) according to their values.

During the classification we maintain two variables, last and current, which represent the value of the threshold in the last and current iterations, respectively. The current value is initialized to the median value between the minimum and maximum values of the matrix (I); it is then updated by taking the median value between the medians of the two classes C1 and C2. The update iterates until the threshold is established: current = last.

This principle is inspired by the k-means clustering algorithm [11], which attempts to find the centers of natural clusters in the data. Note that the loop converges rapidly, after three or four iterations, to the value of the threshold.

B. Model initialization

As illustrated in Fig. 1, the first step of our approach is model initialization (ς^t_RGB) by the absolute successive differences of the first three input frames (ψ^t_RGB, ψ^{t-1}_RGB, ψ^{t-2}_RGB) in the RGB color space, according to (1). The basic idea of this initialization comes from the assumption that the pixel value at a moving object position changes across successive frames. Thus, over time, background pixels occluded by foreground pixels in some frames become uncovered in others. We detect these pixels by subtracting the first frame from the second, and the resulting pixel values are then removed from the third frame:

ς^t_RGB = Δ^{t,t-1,t-2}_RGB = ψ^t_RGB - ψ^{t-1}_RGB - ψ^{t-2}_RGB.   (1)

C. Model update

For each input frame, we proceed in three steps to detect the pixels that are candidates to be updated in the background model. We start by building a pixel state card (Ε^t) in order to update a dynamic matrix (Μ^t). We then analyze Μ^t to select background pixels with significant changes, and the background model (ς^t_RGB) is updated for the selected pixels. In the following, we describe the details of the model update step.

1) Building the pixel state card: This step distinguishes background pixels (assigned 1) from foreground pixels (assigned 0). For this classification, IFD based on thresholding pixel intensities is used in many works [5,6][8]. Despite its simplicity in extracting background pixels, any pixel change in the background can easily be classified as foreground by temporal differencing. In addition, the decision is made at the pixel level, without taking the pixel's neighbors into account, which may give results without any spatial coherence.

Since the performance of the next step (updating the dynamic matrix) depends on this classification, we use an IFD based on the dynamic spatio-temporal entropy image [2]. The advantage of this technique is that the decision is made by analyzing the pixel's neighbors in a spatio-temporal sliding window to obtain the entropy of each pixel. We next describe this technique (Algorithm 1), which computes the pixel state card (Ε^t).

Algorithm 1: Building the pixel state card (Ε^t)
(1) Compute the absolute inter-frame differences Δ^{t,t-1}_GRAY and Δ^{t-1,t-2}_GRAY between the three frames ψ^t_GRAY, ψ^{t-1}_GRAY and ψ^{t-2}_GRAY.
(2) Compute the spatio-temporal pixel entropy (χ^t) of frame ψ^t.
(3) Compute the automatic threshold χ_TH by the function τ(I).
(4) For each pixel Π_xy of the current frame ψ^t: if χ^t(Π_xy) < χ_TH then Ε^t_xy = 1, else Ε^t_xy = 0.

Classical spatio-temporal entropy computation is based on the distribution of gray levels in a window. However, it cannot differentiate large entropies caused by moving pixels from those caused by noise, such as background object edges.
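The automatic threshold computation function of Section II-A can be sketched directly from its description (plain Python; the input is flattened to a list of values, and the tie-breaking of values falling exactly on the threshold is our assumption):

```python
from statistics import median

def tau(values):
    """Automatic threshold computation tau(I): start from the midpoint of
    min and max, then repeatedly move the threshold to the midpoint of the
    medians of the two classes it induces, until it stops changing."""
    current = (min(values) + max(values)) / 2.0
    for _ in range(100):                          # safety cap; the paper
        c1 = [v for v in values if v < current]   # observes convergence in
        c2 = [v for v in values if v >= current]  # three or four iterations
        if not c1 or not c2:                      # degenerate split: stop
            break
        last, current = current, (median(c1) + median(c2)) / 2.0
        if current == last:                       # threshold established
            break
    return current

# On clearly bimodal data the threshold lands between the two clusters.
th = tau([1, 2, 3, 2, 1, 10, 11, 12, 11, 10])    # -> 6.5
```

Like the k-means analogy in the text, each iteration re-centers the threshold between the two class representatives, so it settles quickly on well-separated data.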
For this reason, we compute the spatio-temporal entropy with a labeling technique over the neighboring pixels in a spatio-temporal sliding window. The pixel labeling technique for a frame ψ^t is based on the two temporal differences (Δ^{t,t-1}_GRAY and Δ^{t-1,t-2}_GRAY) between the frames ψ^t_GRAY, ψ^{t-1}_GRAY and ψ^{t-2}_GRAY, computed according to:

Δ^{t,t-1}_GRAY = |ψ^t_GRAY - ψ^{t-1}_GRAY|,  Δ^{t-1,t-2}_GRAY = |ψ^{t-1}_GRAY - ψ^{t-2}_GRAY|.   (2)

We build for each pixel a spatio-temporal sliding window (S) defined by:

S = {(i, j)_k | |i - x| < ⌊w/2⌋, |j - y| < ⌊w/2⌋, 0 ≤ t - k < L}

with (w, w) and L the sizes of S in height, width and depth, respectively; here w = L = 3. The w²·L neighbors of Π^t_xy are denoted Π^{t-1,t-2}_ij in Δ^{t-1,t-2}_GRAY and Π^{t,t-1}_ij in Δ^{t,t-1}_GRAY. A state labeling technique (Fig. 2) determines the labels of S from Δ^{t,t-1}_GRAY and Δ^{t-1,t-2}_GRAY. A state label (e) belongs to {0, 1, 2}. We initialize window L1 (see Fig. 2) with e = {2}; the labels of L2 and L3 are computed by comparing Π^{t-1,t-2}_ij and Π^{t,t-1}_ij with the relative thresholds Δ^{t-1,t-2}_TH and Δ^{t,t-1}_TH obtained by τ(I). Depending on the comparison results, we attribute the labels {0, 1, 2} to L2 and L3 as detailed in Fig. 2.

For each Π_xy, the probability P_{x,y,e} of a label (e) is defined by:

P_{x,y,e} = H_{x,y,e} / N.   (3)

with N the number of labels in S and H_{x,y,e} the number of occurrences of label (e) in S. The spatio-temporal entropy χ^t_xy of Π^t_xy is obtained by:

χ^t_xy = - Σ_{e=0}^{2} P_{x,y,e} log(P_{x,y,e}).   (4)

Each χ^t_xy is compared with the threshold χ_TH to build the pixel state card (Ε^t) according to:

Ε^t_xy = 1 if χ^t_xy < χ_TH;  Ε^t_xy = 0 if χ^t_xy ≥ χ_TH.   (5)

2) Updating the dynamic matrix: For the first iteration, we initialize the dynamic matrix by (6); for each subsequent input frame (iteration > 1), the update of Μ^t is achieved by (7).

Μ^t_xy = 1 if Ε^t_xy = 0;  Μ^t_xy = 0 if Ε^t_xy = 1.   (6)

3) Updating the background model: The background model update operates at two levels, as in (8): the frame level and the pixel level. We start with the frame level to achieve a fast background update when there are no moving objects in the field of view; this makes the method robust even to abrupt camera shaking or illumination changes [5]. If the model update condition at the frame level is not satisfied, the pixel level handles the adaptation through a linear model. By analyzing Μ^t at both levels, our method is accurate and fast in the face of scene changes.

ς^t_RGB = ψ^t  if Σ_{x,y} Μ^t_xy / (m·n) ≥ 0.9;
otherwise, for each pixel with Μ^t_xy = 0:  ς^t_xy = 0.7·ς^{t-1}_xy + 0.3·ψ^t_xy.   (8)

D. Foreground segmentation

To obtain accurate foreground segmentation results, we start by building a foreground mask. We then use post-processing to improve the detection results by grouping moving pixels into regions, eliminating noise and holes from them, removing uninteresting moving regions and refining the shape of foregrounds.

1) Building the foreground mask: We subtract the input frame (ψ^t) from the background model (ς^t_RGB) in the RGB channels as in (9). Let ξ^{x,y,t}_GRAY denote the subtraction result after converting ξ^t_RGB to gray level, where the subscripts x, y and t represent the spatio-temporal pixel position, and let ξ^t_TH be a dynamic threshold calculated by τ(I). Equation (10) gives the expression of the foreground mask (ξ^t_01).

ξ^t_RGB = ψ^t_RGB - ς^t_RGB.   (9)

ξ^{x,y,t}_01 = 0 if ξ^{x,y,t}_GRAY < ξ^t_TH;  ξ^{x,y,t}_01 = 1 if ξ^{x,y,t}_GRAY ≥ ξ^t_TH.   (10)

2) Post-processing: The aim of the post-processing steps is to improve the candidate foreground mask by reducing the misclassifications. We use standard binary image processing operations to handle these problems: isolated pixels and small regions are removed after grouping connected components, and holes inside the connected components are filled by morphological operations.
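The two-level update of (8) and the mask construction of (9)-(10) can be sketched per pixel (plain Python on nested lists; working in a single gray channel and thresholding an absolute difference are simplifications on our part, while the 0.9 ratio and the 0.7/0.3 blend follow the text):

```python
def update_model(model, frame, M, rho=0.9, alpha=0.3):
    """Two-level selective update in the spirit of Eq. (8).
    M is the dynamic matrix (1 = significant change at that pixel)."""
    h, w = len(model), len(model[0])
    changed = sum(sum(row) for row in M)
    if changed / (h * w) >= rho:              # frame level: a global change
        return [row[:] for row in frame]      # re-seeds the whole model
    # pixel level: blend only where no significant change was flagged
    return [[(1 - alpha) * model[y][x] + alpha * frame[y][x]
             if M[y][x] == 0 else model[y][x]
             for x in range(w)] for y in range(h)]

def foreground_mask(frame, model, th):
    """Eqs. (9)-(10): threshold the frame/model difference into a 0/1 mask."""
    return [[1 if abs(f - m) >= th else 0 for f, m in zip(frow, mrow)]
            for frow, mrow in zip(frame, model)]

model = [[100, 100], [100, 100]]
frame = [[110, 200], [110, 100]]
M     = [[0,   1],   [0,   0]]    # only pixel (0,1) changed significantly
new_model = update_model(model, frame, M)     # pixel-level branch is taken
mask = foreground_mask(frame, model, th=50)   # -> [[0, 1], [0, 0]]
```

Note how the flagged pixel is excluded from the blend, which is exactly the selective-update behavior motivated in Section II: candidate foreground never leaks into the model.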
Figure 2. State labeling technique

Moreover, to increase the precision of the foreground objects, we apply a refinement step given by Algorithm 2.

The automatic threshold computation function (Section II-A) is also applied to the input frame. As noted above, the goal of this function is to classify the matrix values (here, the input frame in gray level) into two classes (C1 and C2) according to their values. Let C1 denote the foreground pixels and C2 the background pixels. Foreground pixel detection improves over the iterations of the threshold computation loop. Once the threshold is established, we compute the binary mask (ω^t_01) of C1.

Algorithm 2: Refinement technique
For each region R^i_ξ
  For each pixel ξ^{x,y,t}_01
    If ω^{x,y,t}_01 = 1 and ξ^{x,y,t}_01 = 1 then e = j(ω^{x,y,t}_01)
    Else if ω^{x,y,t}_01 = 1 and ξ^{x,y,t}_01 = 0 then
      if j(ω^{x,y,t}_01) = e then ξ^{x,y,t}_01 = 1

The refinement technique determines the relationship between ω^t_01 and ξ^t_01. Our extensive experimental study shows that ω^t_01 captures moving pixels well; however, it also contains too many misclassifications (false positives). Therefore, our goal is to use ω^t_01 to improve the ξ^t_01 mask without including these false detections.

The mask ω^t_01 is segmented into moving regions by the same technique. The moving regions identified in ξ^t_01 and ω^t_01 are denoted R^i_ξ and R^j_ω, respectively, where (i) and (j) are the labels of the regions; all pixels of a region share the same label. In our refinement technique, a controlled "OR" logic is applied between each region in ξ^t_01 and its corresponding region in ω^t_01: for each region in ξ^t_01, we recover its label (j) in ω^t_01, and the pixels carrying this label in ω^t_01 are assigned 1 in ξ^t_01.

III. EXPERIMENTAL RESULTS

To evaluate the performance of our method, we first implemented four well-known moving object detection methods: (1) IFD based on DSTEI [2], (2) recursive BM with GMM [3], (3) non-recursive BM with KDE [4], and (4) the hybrid method of [9]. Each method is considered a reference in its category. Second, we selected various popular sequences with typical conditions: (a) sudden and (b) gradual illumination changes, (c) shaking camera, (d) background component changes, (e) ghost, and (f) foreground speed. Third, the methods were tested and evaluated on this benchmark suite of video sequences. The following describes the validation techniques and conditions and presents the obtained results. Typical frames are used for evaluation.

A. Validation and evaluation conditions

Quantitative scores (QS) are used for the quantitative analysis of our results. Two metrics are used: Precision and Recall; the best technique should have the highest rates of both. In addition, we measure the number of frames (N) necessary for large background changes to no longer appear as foreground.

Figure 3. Detection results for challenges ((a)-(d))

Figure 4. Detection results at the moment of (a)
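The controlled "OR" of Algorithm 2 can be sketched with a small connected-component pass (plain Python; matching a ω-region's label j(ω) through region overlap is our reading of the algorithm, not necessarily the authors' exact implementation):

```python
def label_regions(mask):
    """4-connected component labeling; labels start at 1, 0 = background."""
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    n = 0
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not labels[y][x]:
                n += 1                     # start a new region, flood-fill it
                stack = [(y, x)]
                labels[y][x] = n
                while stack:
                    cy, cx = stack.pop()
                    for ny, nx in ((cy-1, cx), (cy+1, cx), (cy, cx-1), (cy, cx+1)):
                        if 0 <= ny < h and 0 <= nx < w \
                                and mask[ny][nx] and not labels[ny][nx]:
                            labels[ny][nx] = n
                            stack.append((ny, nx))
    return labels

def refine(xi, omega):
    """Controlled OR: omega-regions confirmed by at least one foreground
    pixel of xi are copied wholly into xi; unconfirmed omega-regions
    (assumed false positives) are discarded."""
    h, w = len(xi), len(xi[0])
    wlab = label_regions(omega)
    confirmed = {wlab[y][x] for y in range(h) for x in range(w)
                 if xi[y][x] and wlab[y][x]}
    return [[1 if xi[y][x] or wlab[y][x] in confirmed else 0
             for x in range(w)] for y in range(h)]

xi    = [[0, 1, 0, 0], [0, 0, 0, 0]]    # precise but incomplete mask
omega = [[1, 1, 0, 1], [0, 0, 0, 1]]    # sensitive mask with a false region
refined = refine(xi, omega)             # -> [[1, 1, 0, 0], [0, 0, 0, 0]]
```

The isolated ω-region on the right, having no support in ξ, is dropped, while the region touching a confirmed pixel completes the foreground shape.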
To illustrate the advantages of our approach, we present the results in four figures: Fig. 3 shows detection results produced by the four methods in the presence of challenges ((a)-(d)); Fig. 4 presents detection results at the moment of a sudden illumination change; Fig. 5 and Fig. 6 illustrate detection results for the ghost and foreground speed challenges, respectively. Next, we interpret these results.

Figure 5. Detection results for challenge (e)

Figure 6. Detection results for challenge (f): scene 1 in line 1 and scene 2 in line 2

B. Qualitative analysis

TABLE I presents the Precision and Recall rates obtained by the methods for (a) sudden and (b) gradual illumination changes, a few frames after the event. Our method and method (3) have the best precision rates (TABLE I) for (a) sudden illumination changes, but our method models the background accurately, with few misclassifications, at the moment of the event (Fig. 4), whereas the other methods ((1)-(4)) classify the changed background appearance as foreground (Fig. 4). The Precision and Recall rates for (b) gradual illumination changes show that the model update in our method is achieved successfully: the background model incorporates progressive changes in real time.

The quantitative scores (QS) for (c) shaking camera and (d) background component changes are presented in TABLE II and TABLE III, respectively. Our method easily classifies the large areas set in motion by the shaking camera as background changes (Fig. 3 and TABLE II). This is done by the frame-level update of the background model. In Fig. 3, the results of the other BM methods ((2), (3)) and the hybrid method show some misclassifications, which is confirmed by their Recall rates in TABLE II.

The adaptation to the relocation of a background component is achieved successfully by the majority of the evaluated methods; method (3) needs additional frames to complete its adaptation (TABLE III). Fig. 5 shows that the ghost does not appear in our result but remains visible for the BM ((2) and (3)) and hybrid (4) methods.

TABLE IV presents the results for scenes containing different (f) foreground speeds. The Precision and Recall rates show that our method detects foreground pixels effectively in the presence of different foreground speeds; in fact, it yields the best precision rate for this challenge.

In TABLE V, we present the number of frames (N) necessary for large background changes to no longer appear as foreground. The results show that our method needs only a few frames to adapt to background changes compared to the other methods. Furthermore, our method is not influenced by gradual illumination changes or camera shaking.

TABLE I. QS FOR CHALLENGES (A) AND (B)

            Precision          Recall
Methods     (a)      (b)       (a)      (b)
(1)         0.4795   1         0.8518   1
(2)         0.6644   1         0.1226   0.7251
(3)         0.8957   1         0.1565   0.8943
(4)         0.4356   1         0.2512   1
(our)       0.7711   1         0.6482   1

TABLE II. QS FOR CHALLENGE (C)

Methods     (1)   (2)      (3)      (4)      (our)
Precision   1     1        1        1        1
Recall      1     0.8961   0.7469   0.8287   1

TABLE III. QS FOR CHALLENGES (D) AND (E)

Methods     (1)   (2)   (3)      (4)   (our)
Precision   1     1     1        1     1
Recall      1     1     0.7800   1     1

TABLE IV. QS FOR CHALLENGE (F)

            Precision            Recall
Methods     scene 1   scene 2    scene 1   scene 2
(1)         0.7882    0.4954     0.4827    0.6020
(2)         0.7397    0.2499     0.4114    0.7252
(3)         0.9078    0.5046     0.8073    0.6868
(4)         0.5873    0.1395     0.9523    0.9855
(our)       0.9993    0.6190     0.8005    0.7466

TABLE V. QUANTITATIVE EVALUATION RESULTS

            Challenges
Methods     (a)     (b)     (c)     (d)-(e)
(1)         N=7     N=1     N=1     N=1
(2)         N=+13   N=+33   N=+33   N=35
(3)         N=+13   N=+50   N=+19   N=51
(4)         N=13    N=8     N=4     N=14
(our)       N=2     N=0     N=0     N=14
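The paper does not spell out how Precision and Recall are computed; the standard pixel-wise definitions, which we assume here, can be written as:

```python
def precision_recall(detected, truth):
    """Pixel-wise Precision and Recall between a detected binary mask and a
    ground-truth mask (1 = foreground)."""
    tp = fp = fn = 0
    for drow, trow in zip(detected, truth):
        for d, t in zip(drow, trow):
            if d and t:
                tp += 1      # true positive: detected and real foreground
            elif d:
                fp += 1      # false positive: detected, but background
            elif t:
                fn += 1      # false negative: missed foreground
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return precision, recall

p, r = precision_recall([[1, 1, 0], [0, 1, 0]], [[1, 0, 0], [0, 1, 1]])
# here p == r == 2/3: one false positive and one missed pixel
```

Under these definitions, the "best technique" criterion of Section III-A simply asks for both ratios to be as close to 1 as possible.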
IV. CONCLUSION

In this paper, a novel and accurate method for BM is presented. Our method addresses various difficult situations such as sudden and gradual illumination changes, shaking camera, background component changes, ghost, and foreground speed. To do so, our on-line background modeling step operates at both the pixel level and the frame level. In addition, our method relies on spatio-temporal analysis to update a dynamic matrix. The automatic threshold computation function makes decisions that adapt to the events in the scene and improves the quality of the results. Post-processing completes the work by filtering out uninteresting moving pixels and refining the foreground shapes.

Our method was evaluated on various video files under different conditions. According to the presented results, we conclude that a robust computer vision application can be built on our method. In fact, our method combines rapid adaptation to background changes with a high precision rate in most experiments.

As future work, we will examine the reactivity of our method in the presence of other challenges such as shadows cast by the foreground, camouflage, and foreground occluded by fixed or moving objects.

References
[1] R. Lillestrand, "Techniques for change detection," IEEE Trans. on Computers, 21(7), pp. 654-659, 1972.
[2] M. Chang and Y. Cheng, "Motion Detection by Using Entropy Image and Adaptive State-Labeling Technique," IEEE Inter. Symposium on Circuits and Systems, New Orleans, pp. 3667-3670, 2007.
[3] W. E. L. Grimson, C. Stauffer, R. Romano and L. Lee, "Using adaptive tracking to classify and monitor activities in a site," Proc. Conf. Computer Vision and Pattern Recognition, 1998.
[4] A. Elgammal, R. Duraiswami, D. Harwood and L. S. Davis, "Background and foreground modeling using nonparametric kernel density estimation for visual surveillance," Proc. IEEE, 90(7), pp. 1151-1163, Jul. 2002.
[5] T. Yang, Z. Li, Q. Pan and J. Li, "Real-Time and Accurate Segmentation of Moving Objects in Dynamic Scene," MM'04, New York, USA, pp. 10-16, 2004.
[6] T. Soumya, "A Moving Object Segmentation Method for Low Illumination Night Videos," Proc. of WCECS, October 22-24, San Francisco, USA, 2008.
[7] I. Cohen and G. Medioni, "Detecting and tracking moving objects for video surveillance," Proc. of CVPR, pp. 319-325, 1999.
[8] R. Collins, A. Lipton, T. Kanade, H. Fujiyoshi, D. Duggins, Y. Tsin, D. Tolliver, N. Enomoto, and O. Hasegawa, "A System for Video Surveillance and Monitoring," VSAM, Technical Report CMU-RI-TR-00-12, Robotics Institute, Carnegie Mellon University, May 2000.
[9] D. Zhou and H. Zhang, "Modified GMM background modeling and optical flow for detection of moving objects," Inter. Conf. on Systems, Man, and Cybernetics, 2005.
[10] B. D. Lucas and T. Kanade, "An iterative image registration technique with an application to stereo vision," Proc. of the Inter. Joint Conference on Artificial Intelligence, pp. 674-679, 1981.
[11] T. Kanungo, D. M. Mount, N. S. Netanyahu, C. D. Piatko, R. Silverman, and A. Y. Wu, "An Efficient k-Means Clustering Algorithm: Analysis and Implementation," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 24, no. 7, July 2002.
