

Unattended Object Intelligent Analyzer for Consumer Video Surveillance
Thi Thi Zin, Member, IEEE, Pyke Tin, Hiromitsu Hama, Member, IEEE,
and Takashi Toriu, Member, IEEE

Abstract — Consumer video camera surveillance, together with the continuous advancement of image processing technologies, is emerging across the consumer world of applications. Technology for detecting objects left unattended in consumer environments such as shopping malls, airports, and railway stations has resulted in successful commercialization, worldwide sales, and international awards. However, as a consumer video application, the need is now greater than ever for a surveillance system that is robustly and effectively automated. In this paper, we propose an intelligent vision-based analyzer for semantic analysis of objects left unattended in relation with human behaviors from a monocular surveillance video, captured by a consumer camera in cluttered environments. Our analyzer employs visual cues to robustly and efficiently detect unattended objects, which are usually considered a potential security breach in public safety from terrorist explosive attacks. The proposed system consists of three processing steps: (i) object extraction, involving a new background subtraction algorithm based on a combination of periodic background models with shadow removal and quick lighting-change adaptation, (ii) classification of the extracted objects as stationary or dynamic, and (iii) investigation of the classified objects by using a running average of the static foreground masks to calculate a confidence score for the decision about the event (either an unattended object or a very still person). We show attractive experimental results, highlighting the system efficiency and classification capability, obtained with our real-time consumer video surveillance system for public safety applications in big cities.

Index Terms — consumer video surveillance, intelligent analyzer, unattended object, multiple background model.

This work was supported in part by SCOPE: Strategic Information and Communications R&D Promotion Program (10103768). Thi Thi Zin, Pyke Tin and T. Toriu are with the Graduate School of Engineering, Osaka City University, Osaka, Japan (e-mail: {thithi, pyketin}@ip.info.eng.osaka-cu.ac.jp, toriu@info.eng.osaka-cu.ac.jp). H. Hama is with the R&D Center of 3G Search Engine, Incubator, Research Center for Industry Innovation, Osaka City University, Osaka, Japan (e-mail: hama@ado.osaka-cu.ac.jp).

I. INTRODUCTION

Consumer surveillance cameras are cheap and ubiquitous. The advent of smart consumer cameras with higher processing capabilities has now made it possible to design video surveillance systems which can contribute to the safety of people in the home and in public places such as shopping malls, airports, and railway stations. Terrorist attacks have become a critical threat to public safety; in particular, explosive attacks with unattended packages are repeatedly concentrated on such public places. A key function in such a surveillance system is the understanding of human behavior in relation with objects left unattended in public places.

In this context, visual surveillance for human-behavior understanding has been investigated worldwide as an active research topic [1]. In particular, automated video surveillance systems for robustly and effectively detecting unattended objects are attracting worldwide attention in many contexts, especially in the consumer world of applications. Such systems should provide sufficiently high accuracy while enabling real-time performance. Thus, a prime goal of automated visual surveillance is to obtain a live description of what is happening in a monitored area and to take (or trigger) appropriate action. Not always appreciated is that visual tasks people find straightforward can sometimes represent major challenges for the computer. The computational effort and complexity involved in simply “following” someone through an extended video sequence is enormous, and a truly robust and reliable tracker has yet to be developed. Compounding the problem, public areas under surveillance often have fluctuating and variable lighting conditions, people are frequently occluded by other people or structures, and people may temporarily leave the monitored area. Each of these factors can add tremendous difficulty to the task. The automated analysis of unattended objects in relation with human behavior in the surrounding area is the subject of this paper, with the aim of exploring efficient algorithms for consumer use.

As a consumer video application, automatic unattended object detection in relation with suspicious human behavior requires sufficiently high accuracy, and its computational complexity should allow real-time performance. For such a system, we need to analyze not only the motion of people and objects, but also the posture of the person, such as carrying bags or leaving bags unattended, as postures can provide important clues for understanding motives and intentions. Hence, accurate detection and recognition of various human postures in relation with objects contribute to the scene understanding, so that we can organize consumer video surveillance as described in Fig. 1.

A. Literature on Unattended Object Surveillance Video Analysis

In the past, many approaches based on background subtraction were proposed [2]-[8]. Such methods differ mainly in the type of background model and in the procedure used to update the model.

Fig. 1. Consumer video surveillance system.

Among them, a mixture of Gaussian distributions has been used for modeling the pixel intensities in [3], [4]. In [5] the authors proposed a simple background subtraction method based on logarithmic intensities of pixels. They claimed results superior to traditional difference algorithms, which make the problem of threshold selection less critical. In [6] a prediction-based online method for modeling dynamic scenes is proposed. The approach seems to work well, although it needs a supervised training procedure for the background modeling and requires hundreds of images without moving objects. Adaptive kernel density estimation is used in [7] for a motion-based background subtraction algorithm that detects moving objects against complex backgrounds, but the computational cost is relatively high. In [8] the authors used spectral, spatial and temporal features, incorporated in a Bayesian framework, to characterize the background appearance at each pixel. Their method seems to work well in the presence of both static and dynamic backgrounds.

Although many researchers focus on background subtraction, few papers can be found in the literature on foreground analysis [9], [10]. Reference [11] analyzed the foreground as moving object, shadow, and ghost by combining motion information. The computational cost is relatively expensive for real-time video surveillance systems because of the computation of optical flow. In [10] the authors described a background subtraction system to detect moving objects in a wide variety of conditions, and a second system to detect objects moving in front of moving backgrounds. In their work, a gradient-based method is applied to the static foreground regions to determine the type of each static region as an unattended or removed object (ghost). It does this by analyzing the change in the amount of edge energy associated with the boundaries of the static foreground region between the current frame and the background image. The performance of this method can strongly depend on the technique used to update the background and, moreover, it can fail in the presence of non-uniform objects. The existing methods can also be divided into two categories according to their use of one or more background subtraction models, and each category can further be subdivided into two classes: one based on frame-to-frame analysis [12], [13] and the other based on sub-sampled analysis [14]. For example, a statistical model of the background is used to detect foreground regions and to eliminate object shadows [15]. A two-background-model system is discussed in [16], [17] for the detection of stationary objects. In many surveillance scenarios, the initial background contains objects that are later removed from the scene or left in the scene. Correctly classifying whether a foreground blob corresponds to an unattended object, a removed object, or a still person is an essential problem in background modeling, but most existing systems neglect it.

B. Consumer Requirements of Unattended Object Surveillance Video Analyzers

We live in a consumer surveillance society. In all the rich countries of the world, everyday life is suffused with surveillance encounters, not merely from dawn to dusk but 24/7. In a world becoming ever more attuned to potential security threats, the detection of unattended baggage is a key capability of any surveillance system. This leads to consumer requirements that include an immediate identification of neglected baggage and a simultaneous assessment of the circumstances of its abandonment, which can signify the difference between effective control and potential chaos. The main benefits for consumers from unattended object surveillance video analyzers are:
1. Identification of suspicious, unattended baggage within moments of its abandonment,
2. Fast recognition of the moment of abandonment to determine whether a threat exists.

Thus, the impacts of unattended object surveillance video analyzers on the consumer world of applications are enormous, especially where security and safety are concerned. In this respect, the major challenges for consumer applications are as follows.
• The detection and analysis of stationary and non-stationary objects should have sufficient accuracy for consumer acceptance and expectation.
• High processing efficiency, achieving (near) real-time operation with low-cost consumer video cameras.
• A conversion of visual results to a real-world space can facilitate the analysis of special events such as the presence of a very still person, cases of robbery or theft, and potential suicide bombings.

In the sequel, we discuss the above in some more detail. To address the challenging problem of accurately detecting and analyzing stationary and non-stationary objects and achieving high-level event analysis from monocular video sequences, the system should provide analysis at different semantic levels. A joint analysis tool is required to bridge the gaps between pixel-level, object-level and event-level analysis and classification. Our system has been designed such that it incorporates multiple levels of background models and motion analysis from the object level onwards. The system can be utilized in surveillance applications with analysis results at all four levels.

For the second challenge, we have organized an evaluation of our method by partly embedding it in a new experimental real-time video content-analysis system. The evaluation has proved its efficiency, as it achieves near real-time performance. Moreover, resource management is applied to optimize the content-analysis processing, even when not all resource requirements of all components can be satisfied at the same time. Regarding the third challenge, we introduce a real-world application scheme for scene understanding. The location and posture of persons are visualized in a virtual world after investigating context knowledge. An accurate and realistic reconstruction in a virtual space can significantly contribute to scene understanding, for example in crime-evidence collection and healthcare behavior analysis. Therefore, it is interesting to extend scene-reconstruction functionality in advanced surveillance applications, as consumers require more semantic results than the conventional visualization that most existing systems provide.

In this paper, as a contribution to the scene understanding of crime-evidence collection for consumer requirements, we present a new intelligent analyzer with reference to unattended suspicious objects. Keeping consumer acceptance as a key component, and to make the analyzer more efficient and effective, we introduce a new periodic background modeling that does not require object tracking. Moreover, our system does not require object initialization, tracking, or offline training. It accurately segments objects even if they are fully occluded. The system is able to deal with people who stop and sit for extended periods of time without regularly detecting them as unattended objects. A logic-based system is introduced to classify detected objects as either an unattended object or a still person.

The rest of the paper is organized as follows: an overview of the proposed system is provided in Section II. In Section III, we develop innovative periodic concept based background models and stationary region detection. The classification of detected object types is presented in Section IV. Section V covers some experimental results on standard datasets as well as our real-world surveillance scenarios. Finally, concluding remarks and discussions are presented in Section VI.

II. OVERVIEW OF PROPOSED UNATTENDED OBJECT INTELLIGENT ANALYZER

This section presents an overview of the proposed unattended object analyzer, for which the block diagram is shown in Fig. 2. It is a multi-level event-analysis system which consists of three conceptual components, each being briefly explained below.

A. Preprocessing

Periodic background modeling, together with a stochastic likelihood image and moving object detection, is implemented. Each image of the video covering individual human bodies and static objects is segmented to extract the ‘blobs’ representing foreground objects. In this processing, the periodic concept based backgrounds with periods of Short Length (SL) and Long Length (LL) are automatically built and updated by temporal statistical analysis. The main motivation is that recently changed pixels that stay static after they changed can be distinguished from the actual background pixels and the pixels corresponding to moving regions by analyzing the intensity variance at different temporal scales. We employ the mixture of the periodic models along with a Stochastically Varied (SV) likelihood image background and update them based on stable history maps and difference history maps. After motion detection, a shadow removing procedure is performed on each image in order to discard shadow points that generally deform the shape of the moving objects. Intensity and texture information are integrated to remove shadows and to make the algorithm work under quick lighting changes.

B. Stationary Object Detection Processes

A matching algorithm is employed to detect whether an object has remained unattended long enough to trigger an alert. Moreover, a mixture of multiple statistical models is used to analyze the foreground as moving objects, unattended objects, removed objects (ghosts), or still persons while detecting the backgrounds. Different thresholds are used to obtain the foreground mask (for moving objects) and the static region mask (for stationary objects).

C. Classifying Process for Object Type

For the stationary region mask, a segmentation method is developed to detect the type of the static region (unattended, removed, or still person), significantly outperforming previous techniques. Only those unattended/removed objects that meet the user-defined alert requirements will trigger the alerts. With the method proposed in this paper, our system can be more robust to illumination changes and dynamic backgrounds, and it can also work well even if the video images are of low quality. In addition, a rule-based classifier is used to distinguish the unattended object from a still-standing person, which is a problem that is not solved in previous approaches.
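To make the data flow between these three components concrete, the following Python sketch outlines one possible per-frame wiring of the analyzer; the class, method and parameter names are illustrative placeholders and are not taken from the paper.

```python
import numpy as np

class UnattendedObjectAnalyzer:
    """Hypothetical skeleton of the three-stage analyzer of Fig. 2."""

    def __init__(self, preprocessor, stationary_detector, classifier):
        self.preprocessor = preprocessor                 # SL/LL/SV updating, shadow removal, blob extraction
        self.stationary_detector = stationary_detector   # static-region mask and alert timing
        self.classifier = classifier                     # unattended / removed / still-person rules

    def process_frame(self, frame: np.ndarray):
        # (i) object extraction: update the background models and extract foreground blobs
        foreground_mask, static_mask, blobs = self.preprocessor.update(frame)
        # (ii) stationary object detection on the static-region mask
        candidates = self.stationary_detector.update(static_mask, blobs)
        # (iii) classify each candidate and keep only events meeting the alert requirements
        events = [self.classifier.classify(c, frame) for c in candidates]
        return [e for e in events if e is not None]
```

The concrete models behind these placeholder components are detailed in Sections III and IV.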

Fig. 2. Unattended Object Intelligent Analyzer (block diagram: preprocessing with multiple background updating (SL, LL, SV), shadow removal, object extraction and change detection; stationary object detection; and object-type classification with abandoned/removed/still-person requirement checks, decision making and alert triggering).

III. PERIODIC BACKGROUND MODELING COMBINED WITH STOCHASTIC IMAGE

Most existing surveillance systems start with a period of empty scenes to facilitate the construction of the original background. This way of starting a surveillance system is hardly applicable to real-world phenomena, especially in the consumer world. In order to facilitate consumer application areas such as crime-evidence collection, we develop a mathematical periodic concept for background maintenance and subtraction as a labeling problem in a series of images. Specifically, we establish three reference background models:
1. Short Length periodic updated background model (SL),
2. Long Length periodic updated background model (LL),
3. Stochastically Varied likelihood image model (SV).
For the first two backgrounds, SL and LL, the user can adjust the time interval between updates of the reference background frames to adapt to different needs and environments. Furthermore, both backgrounds update dynamically: the first one is updated frequently while the second one has a slower update rate according to the changes in the environment. We then aggregate the frame-wise motion statistics into a stochastically varied likelihood image by updating the pixel-wise values at each frame.

The periodic concept was originally used in the theory of Markov chains for the classification of states, and it has become a very successful tool in solving queuing and waiting-line problems. To the best of our knowledge, however, this concept has not previously been applied to image processing technologies and human behavior analysis, so it is completely novel in this respect. In our case the main motivation is that recently changed pixels that stay static after they changed can be distinguished from the actual background pixels and from the pixels corresponding to moving regions by analyzing the intensity variance at different temporal scales. At any given time, any given pixel is not only one element of a particular pixel process, but also one element of an image. Contextual constraints, both temporal and spatial, are necessary for robust labeling. To model the temporal and spatial contextual information, our background model has two components: one component processes images at the pixel level and the other processes images at the frame level.

In the pixel-level process, a background is determined by maintaining the most consistent states of each pixel within a certain time. With such a background, the changed pixels which do not fit the requirement are obtained; pixel color and pixel intensity information are also used in the background process. Similarly, moving objects, lighting changes, and reflections on floors and walls need to be cleared up efficiently so that only stationary objects remain in the scene. Moreover, to avoid an exhaustive scan of all possible bounding boxes, we first introduce two criteria to screen out a small number of suspected regions. To become an unattended object, two conditions should be satisfied: first, it should be a foreground object; second, it should remain static in recent frames. This means that by comparing the original background with the moving foreground regions, we can hypothesize whether a pixel corresponds to an unattended item or not. On the other hand, a stolen item is originally part of the background; when it is taken away from the scene, we can determine whether a pixel belongs to a stolen object by the same principle. However, the background image cannot always maintain a static state; it must update with the changing circumstances.

A. Periodically Updated Background Models

The first frame of the input video is used to initialize both SL and LL in our application, and an improved adaptive background updating method is applied by constructing two maps of pixel history. The first map is the Stable history Map (SM), which represents the number of times a pixel has been stable in consecutive frames.

For the nth image frame in a video sequence, a pixel is said to be stable if |I_n(x, y) − I_{n−1}(x, y)| < Th_s and unstable otherwise, where Th_s is a pre-defined threshold. We can then define the updating rule for SM as follows:

  SM(x, y) = SM(x, y) + 1   if |I_n(x, y) − I_{n−1}(x, y)| < Th_s,
             0              otherwise.                                        (1)

The initial value of each pixel in SM is set to 0. If a pixel is in the object plane, it is marked as unstable and its value is reset to 0. The second map is the Difference history Map (DM), which represents the number of times a pixel has been significantly different from the background in consecutive frames. It is the condition for a stationary object becoming part of the background. By this definition, we observe that DM = n − SM, where n stands for the total number of frames in the sequence.

The initial value of each pixel in DM is 0. If the pixel belongs to the object plane, its value increases by 1. Based on the information from both maps, and taking the still object and uncovered background situations into account, the backgrounds are adaptively updated frame-by-frame by:

  SL_n(x, y) = I_n(x, y)                              if SM(x, y) ≥ Th_f and DM(x, y) ≥ Th_f,
               SL_{n−1}(x, y)                          if SM(x, y) ≥ Th_f and DM(x, y) = 0,          (2)
               (1 − α) SL_{n−1}(x, y) + α I_n(x, y)    if SM(x, y) = 0.

SL_n(x, y) and SL_{n−1}(x, y) represent the short periodic updated background pixel values at position (x, y) in the current and previous frames. In the same way, LL_n(x, y) and LL_{n−1}(x, y) represent the long length periodic updated background at position (x, y), and the corresponding updating rules are

  LL_n(x, y) = SL_n(x, y)                              if SM(x, y) ≥ Th_f and DM(x, y) ≥ Th_f,
               LL_{n−1}(x, y)                           if SM(x, y) ≥ Th_f and DM(x, y) = 0,          (3)
               (1 − β) LL_{n−1}(x, y) + β SL_n(x, y)    if SM(x, y) = 0,

where Th_f is a predefined threshold value and α, β are the learning rates of the two backgrounds.

At every frame, we estimate the short length periodic foreground (SF) and the long length periodic foreground (LF) by comparing the current frame I with the background models SL and LL. We obtain two binary foreground maps SF and LF, where a value of 1 at (x, y) indicates that the pixel (x, y) has changed. LF shows the variations in the scene that were not there before, including the moving objects, temporarily static objects, moving shadows, noise, and illumination changes that the background models fail to adapt to. The foreground SF contains the moving objects, noise, etc.; however, it does not show the temporarily static regions that we want to detect.

According to the updating rules, even if the foreground changes at a fast pace, it will not affect the background, but if the foreground is stationary, it will gradually merge into the background. In this way, we prevent the background model from being polluted by pixels which logically do not belong to the background scene. Moreover, the intensity of each pixel of SL or LL is closely connected with the corresponding foreground. Furthermore, the following inferences are made.
1. SF_n(x, y) = 1 and LF_n(x, y) = 1: the pixel (x, y) indicates that a new moving object has come into the scene, and it does not belong to either background. In this case, SL adapts itself to relatively consistent changes, but it does not learn temporary color changes due to motion of the objects. Thus, such a pixel is marked as SF(x, y) = 1 in the short periodic foreground. Since LL is updated less frequently, a temporary change cannot alter it either, and the pixel is also marked as LF(x, y) = 1 in the long periodic foreground mask.
2. SF_n(x, y) = 1 and LF_n(x, y) = 0: the pixel (x, y) is part of a detected object which has changed and then returned to its original value, under the assumption that adapting the original background to the detected object takes longer than the change period. If a pixel that was part of the scene background is occluded for some time and then uncovered, the long periodic foreground will still be zero, LF(x, y) = 0; LL is updated less frequently and hence is not responsive enough to adapt to the new color during the occlusion. Yet SL is responsive and adapts itself during the occlusion, which causes SF(x, y) = 1.
3. SF_n(x, y) = 0 and LF_n(x, y) = 1: the pixel (x, y) is a scene background pixel that was occluded before. A stationary pixel will be blended into SL, i.e. SF(x, y) = 0, if it stays stationary long enough, assuming this duration is not long enough to blend the pixel into the scene background. As a result, the long periodic foreground will be one, LF(x, y) = 1. This is the expected case for left-behind items.
4. SF_n(x, y) = 0 and LF_n(x, y) = 0: the pixel (x, y) matches both backgrounds, which means that there is no change in the scene. In this condition, only background updating operates.

We can describe these inferences by using a finite state machine concept, as shown in Fig. 3.
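For illustration, the update of the two history maps and of the SL and LL backgrounds can be written compactly with NumPy, as in the sketch below. It is a minimal sketch under the operator reading given in (1)-(3); the threshold and learning-rate values, and the interpretation of DM as a per-pixel count of frames differing from the background, are assumptions for the example rather than values from the paper.

```python
import numpy as np

def update_periodic_backgrounds(frame, prev_frame, SM, DM, SL, LL,
                                th_s=10.0, th_f=50, alpha=0.05, beta=0.01):
    """One step of the SL/LL update of (1)-(3). All inputs are float32 arrays of
    the same HxW shape; th_s, th_f, alpha, beta are illustrative parameter values."""
    stable = np.abs(frame - prev_frame) < th_s             # stability test used in (1)
    SM = np.where(stable, SM + 1, 0)                       # eq. (1): run length of stable frames
    differs = np.abs(frame - SL) > th_s                    # pixel significantly differs from background
    DM = np.where(differs, DM + 1, 0)                      # difference history map (one reading of DM)

    absorb = (SM >= th_f) & (DM >= th_f)                   # long-static change: absorb into background
    keep = (SM >= th_f) & (DM == 0)                        # stable and consistent with the background
    blend = (SM == 0)                                      # unstable pixel: exponential blending

    SL_new = SL.copy()
    SL_new[absorb] = frame[absorb]                                      # eq. (2), first case
    SL_new[keep] = SL[keep]                                             # eq. (2), second case
    SL_new[blend] = (1 - alpha) * SL[blend] + alpha * frame[blend]      # eq. (2), third case

    LL_new = LL.copy()
    LL_new[absorb] = SL_new[absorb]                                     # eq. (3), first case
    LL_new[keep] = LL[keep]                                             # eq. (3), second case
    LL_new[blend] = (1 - beta) * LL[blend] + beta * SL_new[blend]       # eq. (3), third case

    # binary foreground maps: SF against the short background, LF against the long one
    SF = (np.abs(frame - SL_new) > th_s).astype(np.uint8)
    LF = (np.abs(frame - LL_new) > th_s).astype(np.uint8)
    return SM, DM, SL_new, LL_new, SF, LF
```

The four SF/LF combinations returned here correspond to the hypotheses enumerated above.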
B. Stochastically Varied Likelihood Image Model (SV)

Although the relationship between the two backgrounds and their relative foregrounds has been discussed above, the case SF_n(x, y) = 0 and LF_n(x, y) = 1 is essential for detection. Under this condition, a pixel (x, y) may correspond to a static object, because the changed pixel has already been blended into the short background (and thus has left SF_n) but has not remained long enough to be blended into the long background (and thus still appears in LF_n). Thus we construct a stochastically varied likelihood image based on the two previous updating models as follows.

Fig. 3. Hypotheses on LF and SF.

  SV(x, y) = SV(x, y) + 1   if P(SF(x, y) = 0 ∩ LF(x, y) = 1) = 1,
             SV(x, y) − k   otherwise,                                        (4)
             0              if SV(x, y) would fall below 0,
             max            if SV(x, y) would exceed max,

where max and k are positive numbers and P(·) is the probability measure of an event; the last two cases clip SV(x, y) to the range [0, max].

The likelihood image enables removing noise in the detection process. It also controls the minimum time required to assign a static pixel as an unattended item. For each pixel, the likelihood image collects the evidence of its being an unattended item. Whenever this evidence rises to a preset level, we mark the pixel as an unattended-item pixel and raise an alarm flag. The evidence threshold max is defined in terms of the number of frames and can be chosen depending on the desired responsiveness and noise characteristics of the system. If the foreground detection process produces noisy results, higher values of max should be preferred; high values of max decrease the false alarm rate. On the other hand, the higher the preset level, the longer the minimum duration a pixel takes to be classified as part of an unattended item. The construction of the three backgrounds is illustrated in Fig. 4.

Fig. 4. Proposed background model.

The decay parameter k governs how fast the likelihood should decrease if no evidence is provided. It also determines the responsiveness of the system in case the unattended item is removed, in which case the pixels return to their original background values before the detection, or are blended into the scene background. The decay parameter can be set proportional to the evidence threshold. This means only a single parameter is needed for the likelihood image. Neither the backgrounds nor their mixture models depend on the preset values of the stochastic likelihood image. This makes the detection robust against variations of the evidence and decay parameters, which can be set comfortably without struggling to fine-tune the overall system.
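The evidence accumulation of (4) amounts to a per-pixel counter with a linear decay and clipping. The short sketch below shows this reading; the parameter values and the alarm-mask output are illustrative assumptions, not the authors' settings.

```python
import numpy as np

def update_likelihood(SV, SF, LF, k=2, max_evidence=150):
    """One step of the stochastically varied likelihood image of (4).
    SV is a float array; SF and LF are binary foreground maps; k and
    max_evidence (the 'max' threshold) are illustrative values."""
    evidence = (SF == 0) & (LF == 1)              # temporarily static pixel: collect evidence
    SV = np.where(evidence, SV + 1, SV - k)       # otherwise decay by k per frame
    SV = np.clip(SV, 0, max_evidence)             # clip to [0, max] as in the last two cases of (4)
    alarm_mask = SV >= max_evidence               # pixels marked as unattended-item pixels
    return SV, alarm_mask
```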
C. Shadow Removing

After the background subtraction, only the blobs whose area is greater than a certain threshold are maintained. Unfortunately, each preserved blob contains not only the corresponding moving object but also its own shadows. The presence of shadows is a serious problem for a motion detection system, because they alter the real size and shape of the objects. This problem is more complex in indoor contexts, where shadows are emphasized by the presence of many reflective objects; in addition, shadows can appear in every direction, on the floor and on the walls but also on the ceiling, so typical shadow-removing algorithms, which assume shadows lying in a plane orthogonal to the human plane, cannot be used. To prevent all these problems, correct shapes of the objects must be extracted, and for this purpose a shadow removing algorithm is implemented.

The shadow removing approach described here starts from the assumption that a shadow is a uniform decrease of the illumination of a part of an image due to the interposition of an object with respect to a bright, point-like illumination source. From this assumption, we can note that shadows move with their own objects but also that they do not have a fixed texture, as real objects do: they are half-transparent regions which retain the representation of the underlying background surface pattern. Therefore, our aim is to examine the parts of the image that have been detected as moving regions by the previous segmentation step but whose texture is substantially unchanged with respect to the corresponding background. Formally, we evaluate, for each candidate point (x, y), the ratio R = I_n(x, y) / B_n(x, y), where I_n(x, y) and B_n(x, y) are the intensity values of the pixel (x, y) in the current image and in the background image, respectively. After this, pixels with a uniform ratio are removed. The output of this phase provides an image with the real shape of the detected objects, without noise or shadows.
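As an illustration of this ratio test, the sketch below marks candidate foreground pixels whose intensity ratio to the background corresponds to a roughly constant attenuation and removes them from the mask. The attenuation band used here is an assumption for the example; the paper does not specify the accepted range, and the full method also uses texture information.

```python
import numpy as np

def remove_shadows(frame, background, foreground_mask, low=0.4, high=0.95):
    """Suppress shadow pixels in a binary foreground mask using the ratio
    R = I_n(x, y) / B_n(x, y). The (low, high) attenuation band is illustrative."""
    ratio = frame.astype(np.float32) / np.maximum(background.astype(np.float32), 1.0)
    # a shadow darkens the background by a roughly uniform factor below 1
    shadow = (ratio >= low) & (ratio <= high) & (foreground_mask > 0)
    cleaned = foreground_mask.copy()
    cleaned[shadow] = 0
    return cleaned
```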

IV. UNATTENDED OBJECT ANALYZING PROCESS

In video surveillance, one of the most important applications is to distinguish an unattended or removed object from a still person. To do so, each extracted moving object is classified into one of four types, Temporary Static Object (TS), Moving Person (MP), Still Person (SP), or Unattended Object (UO), starting from an initial label of Unknown (U), using a simple rule-based classifier suitable for real-time processing. The classifier uses features such as the velocity of a blob and an exponent running average. To classify, we used three critical assumptions:
1. An unattended object does not move by itself,
2. An unattended object has an owner, and
3. The size of an unattended object is probably smaller than that of a person.
Whenever objects are detected, they are initially classified as Unknown. Then, using the velocity of the moving object, the Unknown is classified as a person or a UO. That is to say, if an Unknown moves at a velocity higher than a threshold value Thv for several consecutive frames, it is identified as a Moving Person. If the Unknown's velocity is below the threshold velocity TLv, it is classified as TS. If an Unknown is identified as TS, UO and Still Person are distinguished by using the Exponent Running Average (ERA): if ERA is greater than a predefined threshold value The, the TS is classified as a still person; otherwise it is an unattended object.
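The decision logic of this rule-based classifier can be summarized in a few lines. The sketch below is a hypothetical rendering: the thresholds Thv, TLv and The come from the text, but their numeric values, the "several consecutive frames" count, and the field names are illustrative.

```python
from dataclasses import dataclass

@dataclass
class TrackedBlob:
    velocity: float      # blob speed (pixels/frame)
    fast_frames: int     # consecutive frames with velocity above Thv
    era: float           # Exponent Running Average over the static foreground mask

def classify(blob: TrackedBlob, th_v=5.0, t_lv=0.5, th_e=0.7, several=5):
    """Returns 'MP' (Moving Person), 'SP' (Still Person), 'UO' (Unattended Object)
    or 'U' (Unknown) following the rules described above; thresholds are placeholders."""
    if blob.velocity > th_v and blob.fast_frames >= several:
        return "MP"                               # fast for several consecutive frames
    if blob.velocity < t_lv:                      # below TLv: Temporary Static object (TS)
        return "SP" if blob.era > th_e else "UO"  # ERA > The -> still person, else unattended
    return "U"
```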
V. EXPERIMENTAL RESULTS

Experiments were carried out in a public transportation environment. The test video sequences were taken with an ordinary video camera in international airports. The environments were chosen at random, and no special background conditions were imposed. We recorded five video sequences in crowded environments, some of them in front of a check-in gate. They contain complex scenarios with multiple people sitting, standing and walking at variable speeds; some people sit in a very still position. This type of environment is very common in daily life. Even though most existing methods do not take such realistic situations into account, the proposed method handles these cases successfully. We also considered partial occlusion, and at some moments complete occlusion. All videos contain instances of unattended objects of various shapes and of still people, taken from different venues, and each video sequence tests a different viewpoint. Moreover, we have also confirmed the performance of our method on the PETS 2006 datasets. In total, we tested 20 video sequences covering various public areas in real-time environments. The images used here have 320×240 pixel (QVGA) resolution, and the frame rate is 10 fps.

Fig. 5 shows the detection results for our own video sequences. In Fig. 5(a), one person brings a bag and leaves it on a stone block. As visible, after it is detected as an unattended object, temporary occlusions due to moving people do not cause the system to fail. One person carrying a bag comes into the scene while multiple people are walking and standing; afterwards, the bag is detected as an unattended object. The first two images in Fig. 5(b) illustrate the short and long periodic backgrounds producing the initial nominated region of the stationary object. We can see that the two backgrounds alone cannot detect the static object separately, but our proposed method, which reinforces the two backgrounds with the stochastically updated background, enables us to detect the unattended object correctly and separately, as shown in Fig. 5(b). The detection results of the proposed method on the PETS 2006 dataset and on our own video sequences are also presented in Fig. 6(a) and Fig. 6(b), respectively. These results show that the proposed method works well on both datasets.

We have also compared the proposed method with some traditional methods. The methods compared in our experiment are (i) a single background model, (ii) a dual background model and (iii) multiple background models (the proposed method). Each scene includes at least 2000 frames, and its first 10 frames are used for initialization. According to our experimental results, the single background model and the dual background model cannot handle the background changes, but the multiple background models with stochastically updated background reinforcement detect object regions accurately compared with the other traditional methods.

From our experimental work, we also observed that the single background model is sensitive to short-term illumination changes; it results in erroneous detection of the ground surface, the walls and so on. On the other hand, the dual background model is robust to short-term illumination changes, but it detects not only the object regions but also pixels surrounding the objects. Considering the characteristics of these two models, the advantages of the single background model match the disadvantages of the dual background model, and vice versa. Neither traditional model can detect the object location exactly, whereas our multiple background approach has a clear advantage in this respect, which is the most important factor for unattended object detection. Moreover, our method works well without imposing any restrictions on the initialization, so it is useful for surveillance applications even when a pure background image is not available.

VI. CONCLUSIONS AND FUTURE WORKS

We have proposed an innovative periodic concept based framework that enables multifunctional unattended object analysis in a human surveillance system for consumer use. The system can also be applied to detecting special events such as recording a burglary or robbery, or monitoring school zone safety for school children, thereby contributing to the safety of people in homes and schools. Moreover, it achieves an object recognition accuracy of about 95%, enabling a realistic implementation of a surveillance system or further analysis of human behavior. The occlusion problem is also thoroughly tackled and successfully dealt with in its various aspects. The proposed object detection method works surprisingly well on well-known public datasets. Due to its simplicity, the computational effort is kept low and no training steps are required.

ACKNOWLEDGMENT

We thank the SCOPE project members and the students of Physical Electronics and Informatics of Osaka City University for their participation in producing the test videos.

Fig. 5. Example of one video sequence: (a) input frames (frame nos. 165, 560, 822, 1151, 1435, 1721, 1973), (b) unattended object detection (short and long periodic backgrounds and foregrounds, stochastic foreground, initial nominated region of the stationary object, and the final detection).

Fig. 6. Example of test video sequences: (a) sequence from the PETS 2006 datasets (frame nos. 370, 675, 897, 993) and (b) sequence from our own datasets (frame nos. 165, 500, 1800, 2700).

REFERENCES
[1] T. B. Moeslund, A. Hilton, and V. Kruger, "A survey of advances in vision-based human motion capture and analysis," Computer Vision and Image Understanding, vol. 104, pp. 90-126, 2006.
[2] L. Li and M. K. H. Leung, "Fusion of two different motion cues for intelligent video surveillance," Electrical and Electronic Technology (TENCON), vol. 1, pp. 19-22, Aug. 2001.
[3] W. E. L. Grimson, C. Stauffer, R. Romano, and L. Lee, "Using adaptive tracking to classify and monitor activities in a site," Proc. of CVPR 98, Santa Barbara, CA, USA, pp. 22-29, Jun. 1998.
[4] C. Stauffer and W. E. L. Grimson, "Learning patterns of activity using real-time tracking," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 22, no. 8, pp. 747-757, Aug. 2000.
[5] Q.-Z. Wu and B.-S. Jeng, "Background subtraction based on logarithmic intensities," Pattern Recognition Letters, vol. 23, no. 13, pp. 1529-1536, Nov. 2002.
[6] A. Monnet, A. Mittal, N. Paragios, and V. Ramesh, "Background modeling and subtraction of dynamic scenes," Proc. of IEEE Int. Conf. on Computer Vision (ICCV), Nice, France, pp. 1305-1312, Oct. 2003.

[7] A. Mittal and N. Paragios, "Motion-based background subtraction using adaptive kernel density estimation," Proc. of CVPR, pp. 302-309, 2004.
[8] L. Li, W. Huang, I. Y. H. Gu, and Q. Tian, "Statistical modeling of complex backgrounds for foreground object detection," IEEE Trans. on Image Processing, vol. 13, no. 11, pp. 1459-1472, Nov. 2004.
[9] R. Cucchiara, C. Grana, M. Piccardi, and A. Prati, "Detecting moving objects, ghosts, and shadows in video streams," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 25, no. 10, pp. 1337-1342, Oct. 2003.
[10] J. Connell, A. W. Senior, A. Hampapur, Y.-L. Tian, L. Brown, and S. Pankanti, "Detection and tracking in the IBM People Vision System," IEEE Intl. Conf. on Multimedia and Expo (ICME04), Jun. 2004.
[11] S. M. Smith, "A new class of corner finder," Proc. of 3rd British Machine Vision Conf., pp. 139-148, 1992.
[12] S. Cheng, X. Luo, and S. M. Bhandarkar, "A multiscale parametric background model for stationary foreground object detection," IEEE Workshop on Motion and Video Computing, 2007.
[13] F. Porikli, Y. Ivanov, and T. Haga, "Robust abandoned object detection using dual foregrounds," EURASIP Journal on Advances in Signal Processing, vol. 2008, no. 1, pp. 1-11, 2008.
[14] R. Miezianko and D. Pokrajac, "Detecting and recognizing abandoned objects in crowded environments," Proc. of Computer Vision Systems, pp. 241-250, 2008.
[15] P. Spagnolo, A. Caroppo, M. Leo, T. Martiriggiano, and T. D'Orazio, "An abandoned/removed objects detection algorithm and its evaluation on PETS datasets," Proc. of IEEE Conf. on Video and Signal Based Surveillance (AVSS'06), pp. 17-21, 2006.
[16] F. Porikli, "Detection of temporarily static regions by processing video at different frame rates," Proc. of IEEE Conf. on Advanced Video and Signal Based Surveillance (AVSS'07), pp. 236-241, 2007.
[17] W. Lao, J. Han, and P. H. N. de With, "Automatic surveillance analyzer using trajectory and body-based modeling," Digest of Technical Papers: Intl. Conf. on Consumer Electronics (ICCE), Las Vegas, NV, USA, Jan. 10-14, pp. 1-2, 2009.
[18] W. Lao, J. Han, and P. H. N. de With, "Automatic video-based human motion analyzer for consumer surveillance system," IEEE Trans. on Consumer Electronics, vol. 55, no. 2, pp. 591-598, May 2009.
[19] J. Han, D. Farin, P. H. N. de With, and W. Lao, "Real-time video content analysis tool for consumer media storage system," IEEE Trans. on Consumer Electronics, vol. 52, no. 3, pp. 870-878, Aug. 2006.
[20] Y. Cho, S. O. Lim, and H. S. Yang, "Collaborative occupancy reasoning in visual sensor network for scalable smart video surveillance," IEEE Trans. on Consumer Electronics, vol. 56, no. 3, pp. 1997-2003, Aug. 2010.

BIOGRAPHIES

Thi Thi Zin received the B.Sc. degree (with honor) in Mathematics in 1995 from Yangon University, Myanmar, and the M.I.Sc. degree in Computational Mathematics in 1999 from the University of Computer Studies, Yangon, Myanmar. She received her Master and Ph.D. degrees in Information Engineering from Osaka City University, Osaka, Japan, in 2004 and 2007, respectively. From 2007 to 2009, she was a Postdoctoral Research Fellow of the Japan Society for the Promotion of Science (JSPS). She is now a specially appointed Assistant Professor at the Graduate School of Engineering, Osaka City University. Her research interests include human behavior understanding, ITS, and image recognition. She is a member of IEEE.

Pyke Tin received the B.Sc. degree (with honor) in Mathematics in 1965 from the University of Mandalay, Myanmar, the M.Sc. degree in Computational Mathematics in 1970 from the University of Rangoon, Myanmar, and the Ph.D. degree in stochastic processes and their applications in 1976 from Monash University, Australia. He was the Rector of the University of Computer Studies, Yangon, and Professor of Computational Mathematics. He is now a visiting Professor at the Graduate School of Engineering, Osaka City University, Osaka, Japan. His research interests include image search engines, queueing systems and their applications to computer vision, and stochastic processes and their applications to image processing.

Hiromitsu Hama received the B.E., M.E. and Ph.D. degrees in electrical engineering from Osaka University, Osaka, Japan, in 1968, 1970 and 1983, respectively. He is currently an emeritus Professor at Osaka City University, and continues his R&D activity at the incubator of Osaka City University. His research interests are in the areas of next-generation search engines, surveillance systems, ITS (Intelligent Transport Systems), smile and laughter science, image processing, computer vision, and reconstruction of the 3D world. He is a member of IEEE, IEICE, IIITE, J-FACE and The Society for Humor Sciences.

Takashi Toriu received the B.Sc. degree in 1975, and the M.Sc. and Ph.D. degrees in physics from Kyoto University, Kyoto, Japan, in 1977 and 1980, respectively. He was a researcher at Fujitsu Laboratories Ltd. from 1982 to 2002, and he is now a Professor at Osaka City University. His research interests are in the areas of image processing, computer vision, and especially the modeling of human visual attention. He is a member of IEEE, IEICE, IPSJ, ITE and IEEJ.
