
ARTICLE IN PRESS

Real-Time Imaging 10 (2004) 9–22

Time-scale change detection applied to real-time abnormal
stationarity monitoring

Didier Aubert (a), Frédéric Guichard (b), Samia Bouchafa (c,*)

(a) INRETS, 2 av. du Gal Malleret-Joinville, 94114 Arcueil Cedex, France
(b) Poseidon Technologies, 3 rue Nationale, 92100 Boulogne, France
(c) Institut d'Électronique Fondamentale, Université Paris Sud, Bât. 220, 91405 Orsay Cedex, France

Abstract

This paper presents two algorithms that are robust with respect to global contrast changes: one detects changes; the other detects stationary people or objects in image sequences obtained via a fixed camera. The first is based on a level set representation of images and exploits its suitable properties under image contrast variation. The second makes use of the first, at different time scales, to discriminate between the scene background, the moving parts and the stationarities. This latter algorithm is justified by and tested in real-life situations; the detection of abnormal stationarities in public transit settings, e.g. subway corridors, is presented herein, with assessments carried out on a large number of real-life situations.
© 2003 Elsevier Ltd. All rights reserved.

1. Introduction

This paper presents a device developed as part of the European CROwd MAnagement with Telematic Imaging and Communication Assistance (CROMATICA) and Pro-active Integrated systems for Security Management by Technological, Institutional and Communication Assistance (PRISMATICA) projects. Their aim is to assist, through the use of computer imaging tools, public transit operators in their daily tasks, such as incident and anomaly detection. This kind of detection is useful for reacting quickly to events capable of disturbing network management (e.g. entry into forbidden areas, illegal track crossing, counter-flow movements), decreasing user safety (e.g. dizziness, fighting, vandalism, drug deals, unattended objects, overcrowded platforms, persons requiring assistance) or decreasing the system's attractiveness (presence of vagrants, beggars, illicit vendors, graffitists).

In this paper, we concentrate on the detection of abnormal stationarities in subway corridors using an image processing device. By abnormal stationarity, we mean excessive periods of immobility in places such as corridors, where long stays are generally uncommon. Prolonged stationarity may indicate the presence of: a person suffering from illness, a person fallen and unable to rise, a thief waiting for a potential victim, a drug dealer, an illicit vendor, a beggar, a vagrant, an abandoned object or graffiti being painted on a wall.

This paper is organized as follows. We first present our application and its constraints. We emphasize the importance of defining a method that is effective under low contrast and stable under contrast change. In Section 3, we describe a "change-detector" algorithm based on specific primitives: the level lines of the image. As shown in [1-3], this representation is, to a certain extent, contrast-independent, a feature which allows us to design a detection based on image "geometry" rather than on image "contrast". We show that the intersections or the disappearance of level lines between images and a specially built "reference" are indications of changes in the images (novel/moving objects). We then derive a real-time algorithm from these items. This algorithm does not require any arbitrary threshold, especially in the primitive extraction process. The only necessary parameters are the historical time T and the occurrence threshold To, which need to be fixed from the application constraints. In short, these parameters allow us to state: "This event has been present in the images for less than To units of time within the last T units of time; a change has therefore transpired."

*Corresponding author.
E-mail addresses: didier.aubert@inrets.fr (D. Aubert), fguichard@poseidon.fr (F. Guichard), samia.bouchafa@ief.u-psud.fr (S. Bouchafa).
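The rule just stated can be sketched in code. The following Python fragment is ours and hypothetical, not the paper's implementation; it assumes per-pixel boolean "event present" maps as input and flags an event as a change when it has been present for fewer than To of the last T frames:

```python
from collections import deque
import numpy as np

def make_change_detector(T, To, shape):
    """Flag a per-pixel event as 'change' when it has been present for
    fewer than To of the last T frames (the rule stated above).
    `present` is a boolean array saying where the event is observed."""
    window = deque(maxlen=T)            # last T binary observations
    counts = np.zeros(shape, dtype=int)

    def update(present):
        nonlocal counts
        if len(window) == T:            # slide the window: drop oldest frame
            counts -= window[0]
        window.append(present.astype(int))
        counts += window[-1]
        # change: currently present, but seen fewer than To times in T frames
        return present & (counts < To)

    return update
```

Feeding one boolean map per frame, a pixel stops being reported as a change once its occurrence count reaches To.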

1077-2014/$ - see front matter © 2003 Elsevier Ltd. All rights reserved.
doi:10.1016/j.rti.2003.10.001

In Section 4, we demonstrate how to use the same change detector algorithm to detect stationary objects by just adjusting these two parameters. More precisely, we observe that a comparison of the results obtained by applying the change detector algorithm twice, with both a long and a short historical time, yields a satisfactory instantaneous indication of the stationary objects. A summation of stationarities over time enables us to estimate, at each pixel, a stop duration which is then used to trigger an alarm once a threshold has been reached. In our application, we have considered a 2-min duration, as established by monitoring operators. We also point out that the algorithm's structure naturally remedies the possible occlusions of such objects by a moving crowd.

Lastly, we present some experimental work and display the level of performance obtained by applying the system in real-life situations.

2. The aim of this application

In order to detect an incident or abnormality, various means are currently being used by transit network officials. The most common means of detecting, or at least confirming, that an incident has occurred is based on closed-circuit television (CCTV) cameras. However, with this equipment, operators are only able to visualize a few cameras on the site at any given time. Moreover, continuously watching screens is rather ineffectual, tedious and boring. For all of these reasons, it is useful to build a system able to detect events from all the cameras simultaneously. In this case, only the pertinent images are sent on to the operators. As a consequence, operator workload gets reduced and the number of detected incidents rises. By increasing the speed of detection, the actions undertaken are more effective, as is the level of safety.

2.1. Previous vision-related approaches

Various applications require the detection of stationarities (e.g. the detection of people waiting for a green light at intersections, the detection of motionless crowds indicating an overcrowded situation).

The general approach consists of constructing a reference image that represents the normal gray-level (or any other measurement, for that matter) state of the background scene. The difference with the current image then shows the non-permanent objects, such as pedestrians (see [4,5]). An analysis of the motion of these objects, and sometimes even their shape [5,6], can serve to indicate which ones are stationary. However, motion estimation, as well as shape analysis, is not very straightforward within the context of our application. In [7], the authors propose to discriminate stationarities by considering points that belong neither to the background nor to the estimated instantaneous parts in motion. While not explicitly stated, we can remark from their study the implicit use of two time scales for discriminating changes. Motion detection can be considered as a quick change (small time scale) and stationarities as a change occurring over a given time period (intermediate time scale). Our algorithm will make explicit use of this property, by implementing a time-scale adaptable change detector.

Another original approach [8,9] proceeds in a very different fashion by focusing on vertical head (or body) oscillations during the movement of crowds. The authors propose to capture these oscillations (over time) within the frequency domain. A stationary crowd can thereby be detected by the absence of any such frequency patterns. Unfortunately, this method only detects the global stationarity of the scene and has not been designed to detect isolated stationary objects. Lastly, these systems exhibit a strict set of hypotheses regarding crowd density, camera location and orientation, in order to avoid occlusions, detect vertical oscillations and, in some cases, assume a rough pedestrian model.

2.2. Problems arising with this type of application

2.2.1. Lighting conditions
The quality of the camera used, the lighting environment, dust and the transmission links all suggest that a system for this type of application must be able to accommodate poor-quality images. Due to a generally low level of light, images tend to display noise, and the contrast between a person and the background is often minor. Conversely, some cameras may be positioned close to lights, with the resulting images likely to be partially saturated. Other signal disruptions are generated along the transmission links (e.g. oversized cables, switches, electromagnetic sources). In addition, the system may be subjected to illumination variations (e.g. contrast adjustment of the camera itself, modification to the lighting environment, outdoor scenes).

2.2.2. Frequent occlusions
A corridor ceiling is generally too low to obtain a camera view angle that limits occlusion. Thus, when a stationary person is partially or completely occluded by other people or a crowd, the detection process may encounter difficulty in detecting this person and measuring the corresponding stop duration.

2.2.3. Movements of "stationary" people
Furthermore, a stationary person is rarely completely motionless. In fact, even a person standing still moves his legs, arms, etc., thereby adding complexity to the detection process.

3. Change detector algorithm

This section focuses on the "change detector", which constitutes a key generic piece of the overall algorithm proposed for detecting stationary objects. We assume herein that the video device is observing a fixed scene, with objects moving, disappearing and appearing. In the following discussion, the term "background" will denote the fixed components of the scene, whereas the other components will be referred to as "novelties". The purpose of a "change detector" is, for a given observation, to decide whether each pixel belongs to the background or to the novelties. The change detector is to be distinguished from the motion detector in the following manner: the change detector is designed to detect new objects/events present in the scene for less than a fixed period of time, whereas the motion detector will only detect moving objects. While all motion implies change, the converse statement is not always true.

As a consequence, methods based on motion detection (see [10-13]) or on motion estimation [14] cannot be applied directly. The use of such methods to switch from motion detection to change detection would at least entail tracking over time those objects [15] that had experienced motion in the past. This step clearly involves complex operations, in particular in crowded and partially occluded environments.

It should also be pointed out that methods involving multiple-camera set-ups (see e.g. [16]) are, for the time being, not being given consideration by network managers for cost reasons. In the future, however, they could provide an appropriate problem-solving approach.

In response to the change detection problem, the classical method would be to build a reference image representing the background state [7]. When the illumination is constant or known, this reference can be built by averaging (not necessarily in a linear fashion) over time the gray-level values at each pixel [4,9,13,17,18]. The averaging time interval depends on a duration threshold that distinguishes between new and permanent events. Similarly, depending on the situation, gray-level values of the image may be replaced by other measurements, such as spatial gradients, wavelet coefficients, edge maps, etc. The essential point herein is to use a measurement technique that yields constant values over time in the background part of the given images.

Intensity-based measurements are, however, contrast-dependent, a feature which makes this method sensitive to light changes, especially when measurements are accumulated over time (e.g. several minutes). Suggested approaches (e.g. the use of logarithm differences between two successive images [4], reference images based on gradient amplitude [5]) have been tried in order to overcome light changes, but so far have met with little success. As pointed out in [7], the natural change in gray-level distributions (or in any measurements based on these distributions) does not enable selecting any universal thresholds for discriminating between background and novelties.

Moreover, constructing the permanent ("reference") situation from quantitative measurements is not an easy task: a simple averaging over time may yield values that are affected by both the moving objects (especially in a crowded scene) and contrast changes. Instead, it is preferable to use qualitative information at each pixel. For this purpose, we will establish a limited list of characteristics at each pixel and measure their occurrence over time, as an alternative to averaging their values.

The use of edge maps, such as the zero-crossings of the Laplacian or of a Canny-Deriche edge detector, constitutes a major step towards withstanding contrast changes [11,17]. Since "edges" (roughly) correspond to large intensity variations, the presence of an edge at a pixel is more stable than the gray-level value itself. Unfortunately, as usually computed, the selection of such edges is in fact made on the basis of intensity (and derivative) criteria. This selection is contrast-dependent; therefore, either it is sensitive to contrast change or it discards the low-contrast zones (or both). This is not an intrinsic drawback of the gradient or Laplacian, but a problem due to the way they are usually calculated in the computer vision community.

Indeed, two masks Mx and My are generally used to estimate the horizontal and vertical gradient components. Quite often, they integrate a smoothing term to filter noise (Prewitt, Sobel, Kirsch masks, for instance). This is the first drawback of the method, since this filtering depends strongly on the contrast and alters edge locations. The gradient G is thus computed by convolving the image with each of the masks:

$$G(p) = \begin{pmatrix} G_x(p) \\ G_y(p) \end{pmatrix} = \begin{pmatrix} I(p) * M_x \\ I(p) * M_y \end{pmatrix}.$$

The orientation is then:

$$\theta_G(p) = \arctan\left(\frac{G_y(p)}{G_x(p)}\right).$$

Let us consider, for instance, the following image,

and the following Prewitt masks:

$$M_x = \begin{pmatrix} -1 & -1 & -1 \\ 0 & 0 & 0 \\ 1 & 1 & 1 \end{pmatrix} \quad\text{and}\quad M_y = \begin{pmatrix} -1 & 0 & 1 \\ -1 & 0 & 1 \\ -1 & 0 & 1 \end{pmatrix}.$$

Around the considered vertex, the pixel values are

$$\begin{pmatrix} a & b & b \\ a & c & c \\ a & c & c \end{pmatrix}.$$

Thus, the gradient is

$$G(p) = \begin{pmatrix} -2b + 2c \\ -3a + b + 2c \end{pmatrix} = 2(c-b)\begin{pmatrix} 1 \\ 0 \end{pmatrix} + (b-a)\begin{pmatrix} 0 \\ 1 \end{pmatrix} + 2(c-a)\begin{pmatrix} 0 \\ 1 \end{pmatrix}.$$

The terms (c-b), (b-a) and (c-a) correspond to the number of level lines between, respectively, c and b, b and a, and c and a. Thus, the computed gradient orientation encompasses both geometric information (the vectors) and contrast (the number of level lines). This is the second drawback of the classical method. We can illustrate this result by changing the contrast (variation in the a, b, c values) in the example above:

$$\begin{pmatrix} 10 & 15 & 15 \\ 10 & 70 & 70 \\ 10 & 70 & 70 \end{pmatrix} \Rightarrow G = \begin{pmatrix} 110 \\ 125 \end{pmatrix} \Rightarrow \arctan(125/110) = 0.85,$$

$$\begin{pmatrix} 5 & 50 & 50 \\ 5 & 90 & 90 \\ 5 & 90 & 90 \end{pmatrix} \Rightarrow G = \begin{pmatrix} 80 \\ 215 \end{pmatrix} \Rightarrow \arctan(215/80) = 1.22.$$

Even if the geometry is the same, the gradient orientation estimated in the traditional way leads to two different results (a difference of 0.37, which roughly corresponds to pi/8) depending on the contrast.

To conclude, we believe that it is first necessary to separate contrast information from geometrical information. Due to the variation over time of the former, the uncertainty is raised as to whether the information available on the latter allows deriving a reliable change detector. This issue is addressed below in a section split into two parts: we first describe the level-lines representation of the images, which enables separating between contrast (the levels) and geometry (the lines). From this representation, we then derive a simple measurement based on geometry. Secondly, we describe how to combine these measurements in constructing a "change detector".

3.1. Local and not contrast-based image measurements

In this subsection, we set out to define a simple image measurement that remains stable under contrast change. Such a measurement allows us to discriminate between changes due to object motion and those due to illumination effects.

3.1.1. Observations
Let us begin by assuming a total of N observations Ii of a background-only scene Sc (i.e. the scene without any novelty). These images are obtained by a succession of operations that transforms Sc into a discrete and sampled set of data Ii. Among these operations: illumination conditions, lens smoothing, sampling, contrast adjustment, quantization. Due to this series of operations, the images Ii differ from one another.

As seen in [19-21], we can approximate all these operations by means of sampling followed by a global contrast change. This approximation is of course very rough and is only acceptable for video devices in which the smoothing due to the lens has a small spatial support. Nowadays, as experimentally proven, this procedure may be considered adequate for certain special applications.

Let us denote by Sd the sampled version of Sc, with the same sampling as the image; the previous approximation then reveals that all observations Ii can be deduced from Sd through a global contrast change (except for new shadows and spotlighting). The term "global" connotes an intensity value change within the entire image that preserves the relative levels of illumination. In other words, for each i, a non-decreasing function gi (which models the contrast change) exists such that

$$I_i = g_i(S_d). \quad (1)$$

Since all gi can differ, a gray-level comparison would not produce any tangible result. Let us introduce at this point the level-lines representation of the image.

3.1.2. The level-lines representation [1-3,22]
Let I(i,j) denote the intensity of image I at pixel location (i,j). X_lambda is the level set of pixels with intensity

Fig. 1. (left) An image from a London underground sequence and (right) the level lines associated with the image (only 16 level lines displayed).

greater than or equal to lambda, i.e.:

$$X_\lambda I = \{P = (i,j) \ \text{such that}\ I(P) \geq \lambda\}. \quad (2)$$

We shall refer to the boundary of this set as the "lambda level line" (see Fig. 1 for an example of level lines). A level line is composed (in the case of discrete images) of a finite number of Jordan curves. As a result of (2), the level sets of an image are included in one another, i.e. if lambda >= mu, then X_lambda I is a subset of X_mu I. Level lines therefore do not intersect.

3.1.3. Level lines and contrast changes
What is the relationship between the scene Sd and its observation (image Ii)?
If I = g(Sd), with g being a contrast change, then:

$$\{X_\lambda I\}_{\lambda \in \{0,\ldots,255\}} \subset \{X_\lambda S_d\}_{\lambda \in \mathbb{R}}. \quad (3)$$

This would suggest that all level sets of the observed image are indeed level sets of the scene. The operation that switches from the scene to the image can be considered as the simple removal of some level sets and a possible change in their levels. The converse inclusion in relation (3) might not hold true wherever the contrast change exhibits a flat part (flat beyond the quantification level). In the worst case, g(i) = 0 for all i, which makes I completely black, so that only one level set remains (that of the entire picture).

What does the intersection of level lines represent? Relation (3) yields the following:

$$\{X_\lambda I_1\}_{\lambda \in \{0,\ldots,255\}} \subset \{X_\lambda S_d\}_{\lambda \in \mathbb{R}} \quad\text{and}\quad \{X_\lambda I_2\}_{\lambda \in \{0,\ldots,255\}} \subset \{X_\lambda S_d\}_{\lambda \in \mathbb{R}}.$$

Since the level sets of Sd are included in one another, a contrast change in the scene can create or remove some of the level lines, yet cannot create a piece of level line in one image that intersects a piece of level line in another. Thus, an intersection of level lines is an indicator that something other than a contrast change has occurred between images I1 and I2 and, as such, proves to be a useful indicator for a change detection algorithm.

3.1.4. Disturbance effects on level lines
Various types of disturbances may affect the level lines. It is necessary to identify them and to determine their consequences on the level lines in order to adapt our measurement techniques.

The smoothing effect, due to the lens, has already been studied [20]. The impact on the level lines is a displacement of 2 pixels at most with a standard-quality lens. Our experiments have shown that such an impact is indeed very low.

Quantification affects the pixel gray level by ±1 quantization level and thus influences the number of level lines passing through the pixel. This disturbance is mainly visible in homogeneous areas, where the level lines fluctuate.

The noise within the image may have two consequences. The holes or peaks in the intensity dimension of the image corresponding to the noise will generate small, isolated level sets. Noise may also locally disturb the direction of the level lines.

Appearing shadows generate additional level lines. While in many video surveillance applications shadows may disturb the detection process, in the present application they can serve to signal the presence of a key object. In some of our experiments, for example, the system was able to detect a stationary person behind a pillar due solely to the shadow being cast. When a light bulb burns out, when another light suddenly gets turned on, or when the illumination changes within the scene due to atmospheric conditions, certain shadows and highlights may appear. In our outdoor applications (i.e. road traffic measurement, Automatic Incident Detection, estimation of vehicle queue length at road junctions), shadows and backlighting are generally discarded due to both their nature [23] and to the fact that such a result appears suddenly at a given location without any previous tracking [24].
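The invariance expressed by relations (2) and (3) is easy to check numerically. The following Python sketch is ours (a toy illustration, not the paper's implementation); it verifies that a global non-decreasing contrast change can remove or relabel level sets but cannot create new geometry:

```python
import numpy as np

def upper_level_sets(img, levels=range(256)):
    """The family {X_lambda} of Eq. (2): for each level, the boolean
    mask of pixels with intensity >= lambda."""
    return {lam: (img >= lam) for lam in levels}

# A toy image and a global (non-decreasing) contrast change g.
img = np.array([[10, 15, 15],
                [10, 70, 70],
                [10, 70, 70]], dtype=np.uint8)
g = lambda v: np.clip(v // 2 + 40, 0, 255).astype(np.uint8)  # monotone
img2 = g(img)

sets1 = {tuple(m.flatten()) for m in upper_level_sets(img).values()}
sets2 = {tuple(m.flatten()) for m in upper_level_sets(img2).values()}
# Relation (3): every level set of the contrast-changed image is already
# a level set of the original; only the levels move, the geometry stays.
assert sets2 <= sets1
```

Only the levels attached to the sets change under g; the set geometry, and hence the level lines, are preserved.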

Fig. 2. (left) The original image and (middle and right) display of all pixels on which a level line with the quantized orientation (every p/8) passes, as
indicated by the arrow inside the circle.
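The per-pixel measurement illustrated in Fig. 2 (quantized level-line orientations) can be sketched as follows. This numpy version is ours: it uses finite differences, so it only approximates the half-tangent computation described in the text, and the names and the choice Z = 16 are assumptions for illustration:

```python
import numpy as np

Z = 16  # number of quantized directions, i.e. every pi/8 (our choice)

def level_line_directions(img):
    """Per-pixel quantized direction of the level line (the iso-intensity
    curve), i.e. the direction perpendicular to the local gradient.
    Returns -1 where no level line passes (locally flat image)."""
    gy, gx = np.gradient(img.astype(float))
    tangent = np.arctan2(gx, -gy)          # gradient rotated by +pi/2
    bins = np.round(tangent / (2 * np.pi / Z)).astype(int) % Z
    flat = (gx == 0) & (gy == 0)           # no level line through flat pixels
    return np.where(flat, -1, bins)
```

Applied to an image and to a contrast-changed version of it, the quantized directions coincide wherever they are defined, which is the invariance the method relies on.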

3.1.5. Local measurements based on level lines
Working with curves (level lines) is not very convenient in practice, especially for real-time applications. We propose herein to construct at each pixel an "abstract" of the geometry of the neighboring level lines. Such a measurement must be local in order to avoid mixing up the background part and the changing part. Since any local measurement based on the level-lines geometry is admissible, we choose to compute the direction of the half-tangent of each level line passing through each pixel. Consequently, several orientations may be present at any given pixel; these orientations are then quantized for easier storage and use (see Fig. 2 for an example of such a measurement) and for added resistance to direction changes due to noise.

Naturally, we can consider the level-line tangent computation as a different way to estimate gradient orientation. However, it is done in a more suitable way, since the geometry of the level line is not affected by a global contrast change.

3.2. An algorithm based on reference data

Comparing two observations is more restrictive than comparing an observation directly with the scene Sd. If one observation (piece of level line) is present in one image and not in the other, this may be due to a contrast change. In contrast, if we compare an observation directly with the background of the scene Sd (which contains all the level sets), a new piece of level line could only have been created by motion or noise. To enhance change detection, we have decided to build a reference data set that approximates the background of the scene, modulo the contrast. We would like to consider this reference, therefore, as a kind of union of all the level lines of the observed images. In the absence of any motion and noise, this union will become an approximation of all the level lines of the background part of the scene Sd.

3.2.1. Building and updating the reference data
The reference is to reflect the states of the measurements corresponding to the background. Let us first introduce some notation:

* theta_0, ..., theta_{Z-1}: the Z possible quantized directions (e.g. every pi/8).
* f_t(P, theta_k): a value in {0,1} indicating whether the direction theta_k exists at the pixel P at time t.
* $F_{t \leq T}(P, \theta_k) = \sum_{t=1}^{T} f_t(P, \theta_k)$: the number of times the direction theta_k exists at P during the period T (the time "history").

One very simple way to determine whether a local orientation of the level lines passing through each pixel is permanent enough to be introduced into the reference involves calculating its occurrence over a given temporal sliding window. The update consists of re-computing, at every moment t, the number of occurrences of each direction associated with point P:

$$F_{i \leq t}(P, \theta_k) = \sum_{i=1}^{t} f_i(P, \theta_k) = F_{i \leq t-1}(P, \theta_k) + f_t(P, \theta_k), \quad\text{or}\quad \mu \, F_{i \leq t-1}(P, \theta_k) + (1-\mu) \, f_t(P, \theta_k) \ \text{with}\ \mu = \frac{t}{t+1}.$$

A direction with a large number (threshold T0) of occurrences will thus be considered as belonging to the

background R. (See Fig. 3 for an example of the reference data obtained.)

$$\text{If } F_{t \leq T}(P, \theta_k) > T_0 \text{ then } R(P, \theta_k) = 1; \text{ otherwise } R(P, \theta_k) = 0.$$

Fig. 3. Example of a reference image: (left) an image extracted from a 200-image sequence and (right) the reference image deduced from the reference data, displaying in black those pixels with at least one orientation that exceeds the occurrence threshold (T0 = 50 images).

3.2.2. Minimal relevant occurrence threshold
The threshold To has to be carefully set in order to avoid considering certain moving objects as part of the background. It is generally set on the basis of experience, depending on the behavior of the moving objects (average speed, image rate, etc.). Nonetheless, a natural lower bound on To is provided by the following consideration.

Let us suppose we are looking at a fixed background scene (i.e. no moving objects) plus some random phenomena (such as noise); what then is the minimal threshold To such that no randomly created direction will be considered as background? Let us call Pr the probability that a random phenomenon generates a given direction at a given point. The probability that a random phenomenon creates this direction at least s times over T images is then given by the following expression:

$$p(s, T) = \sum_{k=s}^{T} \binom{T}{k} \, Pr^k \, (1 - Pr)^{T-k}.$$

Unfortunately, Pr is not directly computable, since we have no a priori knowledge of the cause of the directions observed (noise, moving object, background, etc.). However, what can be directly measured is the probability (P_observed) that a direction appears at a pixel through a sequence of n images of the visualized scene:

$$P^{i,k}_{\text{observed}} = \frac{\text{occurrences of } \theta_k \text{ within the image } i}{\text{number of pixels in } i}.$$

Since P_observed overestimates Pr (P_observed is, by construction, greater than Pr), we can consider that

$$P_{\text{observed}} = \min_{k \in \{0,\ldots,Z-1\}} \ \min_{i \in \{0,\ldots,n-1\}} P^{i,k}_{\text{observed}}.$$

Considering that we have Z possible directions per pixel and Np pixels per image, the expectation of such an event occurring becomes:

$$E = Z \, N_p \, p_{\text{observed}}(s, T).$$

Such an expectation must be less than 1; thus, we choose To larger than Tomin, where Tomin satisfies the following:

$$Z \, N_p \, p_{\text{observed}}(T_{o\min}, T) < 1.$$

Similarly, we cannot choose To to be too large, because background elements disturbed by noise would then not get incorporated into the reference. This yields a similar upper bound for the occurrence threshold: To is to be chosen smaller than Tomax = T - Tomin. We conclude that To has to be chosen within the range [Tomin, T - Tomin]. In order to be usable, this range must be non-empty, a condition which is always satisfied with a minimal number of images. A high value of To reduces the number of events (presences of an orientation) stored in the reference. As a consequence, the comparison with a new image will increase the detection rate, while also increasing the number of false alarms. Conversely, a low value will allow orientations with few occurrences to enter the reference, causing the detection rate to decrease, as well as the number of false alarms. For applications, it is necessary to choose an empirical tradeoff between detection rate and false alarm rate.

In quantitative terms, for T = 200 typical (256 x 256) images and with Z = 32, the minimal number of occurrences over time to be considered as background is Tomin = 20, which is 10% of the total number of images. With the same number of pixels and directions, and setting Tomin/T = 3%, we would then require a minimal number of approximately 1000 images. The parameter T is often set by the number of available images provided from the last few minutes, last few hours, last few days, etc. The best initial approach would be to choose this time frame as wide as possible. In any case, it must be wide enough to ensure that the occurrence threshold range is not empty.

3.2.3. Detection
While for building and updating the reference data we consider all level lines, to ensure obtaining as complete a reference as possible, to avoid the "quantification" disturbance during the detection process only

Fig. 4. (left) An image; (middle) the corresponding detection, without filtering, obtained with reference data built from 200 images and T0 ¼ 100 and
(right) the same detection for (white and black) areas larger than 10 pixels.

level lines with at least 2 levels (quantization error + 1) are taken into account. For an observed piece of such a level line, two possibilities can arise:

1. The level line exists in the reference, meaning it is not a detection. In fact, it is necessary to take into account the immediate vicinity of the level lines when comparing them to the reference, to tolerate a small displacement of the level-line location due to the "smoothing effect".
2. If it does not exist, then a detection is possible. If the piece intersects a level line stored in the reference, then detection is certain. Unfortunately, a detection based solely on the location of a level-line intersection is often, in practice, ineffective. Whenever the background is uniform in intensity up to a certain noise level, such detection is not possible: this is the case, for instance, with roads or corridors. For the other observations, there are two possible causes:
3. either the reference is sufficiently complete, meaning therefore it is a detection, or
4. the reference is incomplete and uncertainty remains as to whether it is a detection or an effect of contrast change.

Given a direction θ at pixel P, we check whether this direction occurs in the reference, up to both its level of precision and the quantization of the directions stored in the reference (i.e. whether R(P, θ) = 1 or not). If such is not the case, then the point is said to be detected (see Fig. 4), which means that either the point is not part of the background, or the point is part of the background but the reference is incomplete. Let us also note that a point may simultaneously exhibit several detections, corresponding to several directions that are not in the reference.

3.2.4. Optional filtering process
If necessary, noise can be removed from the detection (see Fig. 4). Morphological filters or median filters can be used for this purpose. If the localization of the detection is to be preserved, we prefer discarding the small isolated detection areas [22] that can be attributed with high probability to noise. We used an empirical setting of 10 pixels for 256 x 256 images.

3.2.5. Diagram of the algorithm
See Fig. 5.

Fig. 5. Diagram of the change detector: acquisition of the current image; extraction, for each pixel, of the local orientations of the level lines; comparison with the reference (built with historical time T and occurrence threshold T0, and continuously updated); detection image; optional filtering process.

3.2.6. Some good properties
In order to demonstrate the robustness of our detection on outdoor scenes and to highlight the benefit of using level-lines measurements, we have conducted the following set of experiments.

The aim of the first experiment is to test the change detection capability of our system on an outdoor scene. To ensure that the system is tested under different illumination conditions, we used two image sequences recorded 6 months apart. As shown in Fig. 6, our method yields good results. Of course, in addition to the vehicles, we also detect shadows and changes in vegetation.

The aim of the second experiment is to demonstrate the capability of encompassing various states of the scene all within the same reference. We built a reference data set from three different image sequences of the

Fig. 6. (top left) Sequence of images used to build the references; (top middle) sequence of images used to test the detection (shot 6 months before the
top left sequence) and (top right) detection obtained using our method.

Fig. 7. A single reference data set has been constructed by combining the three sequences from the same intersection. The left and middle sequences
were recorded on the same windy day but at different times. The right sequence was taken about 6 months later on a sunny day. In the bottom row,
we display (in black) those pixels with level-line directions not contained in the reference data.
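The test behind these figures — flagging a pixel when its quantized level-line direction was not observed often enough in the reference — can be sketched as follows. This is a minimal illustration under our own naming, not the paper's code:

```python
from collections import Counter, defaultdict

def build_reference(history, t0):
    """history: one dict per frame mapping pixel -> quantized direction bin.
    A direction enters the reference at a pixel only if it occurred at
    least t0 times over the window (the occurrence threshold T0)."""
    counts = defaultdict(Counter)
    for frame in history:
        for p, d in frame.items():
            counts[p][d] += 1
    return {p: {d for d, n in c.items() if n >= t0} for p, c in counts.items()}

def detect(frame, reference):
    """A pixel is flagged when its current direction is absent from the
    reference: either a real change, or an incomplete reference."""
    return {p for p, d in frame.items() if d not in reference.get(p, set())}

# toy history: direction bin 1 is stable at (0, 0); bin 5 appears only once
history = [{(0, 0): 1, (0, 1): 2}] * 3 + [{(0, 0): 5}]
ref = build_reference(history, t0=2)
print(detect({(0, 0): 5, (0, 1): 2}, ref))  # {(0, 0)}
```

The occurrence threshold is what makes the reference tolerant of intermittent events (leaves moving on trees, passing vehicles) while still absorbing the several lighting states shown in Fig. 7.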

We built a reference data set from three different image sequences of the same intersection. Two sequences were recorded during a windy day at two different times, while the third was recorded about 6 months later on a sunny day. As seen in Fig. 7, all of the vehicles and pedestrians indicated by arrows have been well detected. The other detections are due to leaves moving on trees. Thus, the reference data automatically takes into account the various lighting configurations, which confers great stability on the detection with respect to illumination changes. Our algorithm is used herein to control the traffic light at this intersection.

3.2.7. Conclusion
Let us conclude this section by summarizing the main features of the change detector algorithm.
The level-lines representation can be compared with an "edge map" representation, yet with the following distinctions:

* The level-lines representation is a complete representation (in that it serves to reconstruct the image, up to a given contrast change). This particular feature allows the algorithm to work even under low contrast, once the moving object can be distinguished from the background by a single gray-quantification level.
* Its extraction involves no thresholds or parameters.
* Its response to a global change of contrast can be described, whereas this tends not to be possible with other "edge map" representations (e.g. zero-crossings of the Laplacian).

The local configuration of the level lines around each point provides a qualitative description (which again is obtained without introducing any parameters). This feature enables "counting" the occurrences of each configuration over time in order to create an occurrence-based reference, as opposed to an average value-based reference. The pertinence of a configuration is measured herein by its occurrence in time rather than by its ability to surpass a spatial or energy amplitude threshold (as is the case with a classical edge map extraction).

In sum, the entire algorithm involves just two parameters that we believe are naturally "built in" to a change detector: the historical time T and an occurrence threshold T0. At the end of the process, it is possible to eliminate small detected areas so as to remove noise; this step then adds another optional parameter: the area threshold.

4. An abnormal stationarity detection system

The extracted level lines can be classified into one of the following categories: those that belong to the scene background, those that correspond to moving objects, and those that correspond to stationary objects.
We propose using the "presence duration" as a means of discriminating between these three categories. The background is naturally assumed to remain unchanged for a very long period of time. Conversely, moving objects (a moving crowd) will not yield stable configurations even over short periods of time. In between these two extremes, stationarity is characterized by objects that remain at approximately the same place over an intermediate period of time. This set-up then involves the use of a change algorithm with two time periods: a short one to detect moving objects, and an intermediate one to detect both moving objects and stationarities. From this information, we can construct an image containing "stationarity areas". Summing the occurrences of the stationary state at each pixel over time enables estimating the "stop duration". It should be obvious that a stop measurement is possible regardless of changes in the person/object position inside this entire area or at least part of it. Above a given threshold on this duration, an alarm signal is sent to an operator.
Our system may be summarized by the diagram of Fig. 8.

Fig. 8. Diagram of the "abnormal stationarity" detection system devised herein: the input images feed the short-term and long-term change detections, whose combination yields the stationary areas; these update the stationary duration, and the stationarity analysis (alarm duration threshold = 2 min, initialization database) triggers the alarm.

Let us now present the algorithm in greater detail.

4.1. Detection of stationary areas

4.1.1. Short-term change detection ("moving parts")
By applying the change detector with a short-term reference Rshort (computed by taking a sliding temporal window of T = 0.5 s and T0 = 0.1 s), areas in the image containing moving objects (representing either a person or an object present at the same place in the scene for less than 0.5 s) are detected.
The result is a binary image representing the short-term changes (Mt):

If ∃ θk such that ft(P, θk) = 1 and Rshort(P, θk) = 0, then Mt(P) = 1; otherwise Mt(P) = 0.

According to the previous discussion about the occurrence threshold, the number of images considered to create the reference is not high enough to ensure that noise will not disturb the observation.

4.1.2. Long-term change detection ("novelties")
Once again, we could have used the change detector as is, with a higher T and T0, in order to detect the novelties corresponding to a person or an object present at the same place in the scene for less than 5 min. Instead, we elected to use the previously computed short-term detection in the following way: the long-term reference is only updated at pixels where no motion occurs. At peak hours, the scene is admittedly very crowded (over 90% of the time). By updating references at only the non-moving pixels, we strengthen the quality of the reference data.
With these slight changes, we apply the change detector with a long-term reference Rlong (computed by taking a sliding temporal window of T = 10 min and T0 = 5 min) to detect novelties. The result is the binary long-term change image (Nt):

If ∃ θk such that ft(P, θk) = 1 and Rlong(P, θk) = 0, then Nt(P) = 1; otherwise Nt(P) = 0.

The lower bound for T and T0 has been influenced by the objective of sending an alarm within 3 min of detection. A stationary object thus has to be detected (point 1 in the St map) for at least 5 min (2 min to detect and 3 min to send the alarm).

4.1.3. Detection of the stationary areas
By removing the motion areas from the novelties, only the people and objects remaining stationary for at least 0.5 s are left. This step is performed by simply subtracting Mt from Nt:

St(P) = Nt(P) − Mt(P).
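Since Mt and Nt are binary maps, the subtraction above amounts to keeping the pixels flagged by the long-term detector but not by the short-term one. A minimal sketch (our names, binary images as nested lists):

```python
def stationary_areas(n_t, m_t):
    """St(P) = Nt(P) - Mt(P): 1 where a long-term novelty is present and
    no short-term motion occurs, i.e. candidates stationary for >= 0.5 s."""
    return [[1 if n == 1 and m == 0 else 0 for n, m in zip(nr, mr)]
            for nr, mr in zip(n_t, m_t)]

n_t = [[1, 1, 0],  # long-term novelties
       [0, 1, 1]]
m_t = [[1, 0, 0],  # short-term motion ("moving parts")
       [0, 0, 1]]
print(stationary_areas(n_t, m_t))  # [[0, 1, 0], [0, 1, 0]]
```

Writing the difference as "Nt and not Mt" (rather than a signed subtraction) keeps the result binary even at pixels where motion is flagged without a long-term novelty.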

4.2. Measurement of the stop duration

This functional attribute integrates stationarities over time in order both to compute the stop duration and to locate stationary areas.

4.2.1. Estimation of the stationary duration
The estimation of the stationary duration and its update are carried out by an exponential average, in accordance with the following formula:

index_of_duration(P)_t = (1 − α) · index_of_duration(P)_(t−1) + α   if St(P) = 1;
index_of_duration(P)_t = (1 − α) · index_of_duration(P)_(t−1)       otherwise.

For a pixel P motionless for the last t images, it can thus easily be verified that

index_of_duration(P)_t = 1 − (1 − α)^t,   0 ≤ index_of_duration ≤ 1,

where t is the number of previously observed images and α the value controlling the evolution speed of the "index_of_duration" (an index whereby "0" means "nothing present" and "1" means "always present").
α is determined such that "index_of_duration" equals a threshold Td when the target maximal duration of stationarity (2 min) has been reached. For instance, α = 1 − (1 − 0.7)^(1/(10 × 2 × 60)) ≈ 0.001003 when considering a target stop duration of 2 min, a Td of 70% and a 10 image/s computational rate. Td is chosen such that any detection is stable (the "index_of_duration" may oscillate between Td and 1.0 without any loss of detection) and such that a detection disappears as quickly as possible when a stationary person/object leaves the scene (the detection ends when the "index_of_duration" drops below Td again).
By computing the stop duration in this manner, we may lose a real detection each time a target area is occluded by a moving crowd. As a consequence, oscillations (detection/no detection) may be obtained. To overcome this problem, we propose freezing all measurements in the target area for as long as the crowd is causing occlusion. In this way, we are able to preserve the estimated duration until the area becomes visible again, at which time the duration increases or decreases depending on the presence or absence of stationarity at the considered location. This procedure adds robustness to the system and stability to the measurements, even when subjected to frequent occlusions.
In the system, occlusion is characterized by the motion map (Mt). In general, occlusions are caused by a person or a crowd passing in front of the stationary object. This definition covers not only occlusions but also the movements of stationary people, which tend to affect different parts of the body over time:

toccult(P, t) = (1 − β) · toccult(P, t − 1) + β · Mt(P).

4.2.2. Correction of the duration estimation
By updating the stop duration only when the target is visible, we are able to stabilize the detection and hold the target even when it is masked by a crowd. In doing so, unfortunately, the estimated stop duration is lower than the real one in the presence of occlusion. It then becomes necessary to correct the estimated stop duration in order to avoid introducing detection delays. In most instances, this bias can be reduced and even removed: it is simply necessary to increase the evolution speed α of the "duration" according to the number of images for which the computation has not been conducted (toccult). Thus, α is no longer a constant and is to be evaluated for each new image and for each pixel P by means of the following formulation:

α(P)_t = 1 − (1 − Td)^(1/t′)

and

t′ = t − min(toccult(P, t), 50% t),

where t′ ≤ t corresponds to the new number of times an area must be observed motionless before triggering the alarm. However, if we merely set t′ = t − toccult, then, when toccult is close to t, the decision to trigger the alarm will be taken even though the area is seen motionless very infrequently, a situation which may lead to false alarms. To prevent this situation from arising, we have arbitrarily stipulated that a stationary area must be observed for at least 50% of t before taking any action. The result is a "zero" delay (actually ±10 s from our vantage point) when the occlusion rate is less than 50%, but the delay rises for higher occlusion rates.

4.3. Detection of abnormal stationarities

The last functional attribute serves to identify both the potentially abnormal stationarities (based on the stop duration previously measured) and the localization of stop areas. An alarm is then triggered in the event of abnormality.
Each pixel whose stationarity duration exceeds 2 min is labeled as abnormal. Currently, the detection of just one pixel is sufficient to trigger the alarm (in practice, the accumulation over time yields very reliable information at each single pixel). The alarm consists of a sound to alert the operator as well as on-screen information to localize the abnormal stationarity on the monitor.
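The duration bookkeeping of Sections 4.2.1 and 4.2.2 can be condensed into a short numerical sketch: an exponential average of the stationarity map, with α re-derived each frame from the occlusion-corrected horizon t′ = t − min(toccult, 50% t). This is our own illustration of the formulas above, not the original implementation:

```python
def alpha_for(td, t_prime):
    # alpha chosen so that index_of_duration reaches Td after t' frames
    return 1.0 - (1.0 - td) ** (1.0 / t_prime)

def corrected_horizon(t, t_occult):
    # never credit more than 50% of the horizon to occlusion
    return t - min(t_occult, 0.5 * t)

def update_index(index, stationary, alpha):
    # exponential average of the binary stationarity observation
    return (1.0 - alpha) * index + (alpha if stationary else 0.0)

# Td = 70%, 2-min target at 10 images/s -> t = 1200 frames, no occlusion
t = 10 * 2 * 60
a = alpha_for(0.7, corrected_horizon(t, 0))  # ~0.001003, as in the text
idx = 0.0
for _ in range(t):                           # 2 min of stationarity
    idx = update_index(idx, True, a)
print(round(idx, 3))  # 0.7 -> the alarm threshold Td is reached
```

The closed form index_of_duration = 1 − (1 − α)^t makes the check immediate: after t′ = 1200 motionless frames, the index equals exactly Td = 0.7, and the 50% cap in corrected_horizon prevents t′ from collapsing under near-total occlusion.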

This visual information is conveyed by a red rectangle surrounding the area of stationarity considered to be abnormal.

5. Assessment and results

Our system has been tested on several real-life situations. It runs at a rate of more than 10 images/s on a Pentium 166 MHz-based PC equipped with a digitization board. During the PRISMATICA project, our application (plus three others: intrusion detection, occupancy rate estimation and queue length measurement) was integrated into a small industrial PC (a Celeron 500 MHz) that can simultaneously handle up to four video sources at a rate of 5 Hz.
Several such devices may be linked, via an Ethernet network, to a supervising system generally located in the control room, which supports the centralized HMI of the system and sends the configuration data and commands.
In order to generate a wide variety of situations (station design, crowd density, location of stationary people/objects, positioning and orientation of the camera), we chose to videotape scenes at the Paris Metro (RATP) station "Havre-Caumartin". In fact, this subway station includes a large number of different corridor configurations (straight and curved, long and short, wide and narrow, light and dark backgrounds) and different crowd patterns, due to its proximity to large department stores. In order to increase the variability of the processed scenes even further and demonstrate the transposability of the system to other networks, videotapes supplied by other operators were also used. This approach resulted in more than 224 h of videotape footage and over 400 real-life stationarity situations, some of which have been quite difficult to deal with.
In order to assess our system, we began by manually recording (time code of the events, approximate location of the stationarity in the image, and duration of the stationarity) all stationarity events lasting more than 2 min. Results obtained by the system were then compared with the recorded data. The differences observed give: the non-detection rate (real-life situations not detected by the system), the number of erroneous detections (detected areas without any real-life justification), and the detection delay. Lastly, the number of stationarities detected versus the total number of real-life stationarities yields the detection rate of the system.
Table 1 summarizes the results obtained from this comparison.

Table 1
Performance of the abnormal stationarity detection system
Number of stationary situations: 436
Number of detections: 427 (98%)
Non-detections: 9 (2%)
Erroneous detections: 0
Detection delay (when occlusion < 50%): ±10 s

As observed in these results, the system is able to handle the problems affecting this type of application. The results obtained have fulfilled the requirements (>90% detection and fewer than 5 false alarms per day) and demonstrated the ability to cope with some very complex situations (see Fig. 9) which, to our knowledge, have not been handled by competing systems.
The non-detections obtained can be explained not only by a low contrast, but also in many instances by a short stationarity duration (around 2.5 min) and a very high occlusion rate (>90%). In all cases except one, the system detected the stop, but the measured stationarity duration did not reach the threshold above which the alarm is triggered.
The remaining problem, then, is the delay in detection for occlusion rates above 50%. When this situation arises, a delay in detection can occur and, in the worst case we encountered, may reach approximately 6 min. (This case occurred for a person in the back of the scene, completely occluded more than 90% of the time.) In such a situation, however, even for an operator to see the person proves difficult.

6. Conclusion

The purpose of the automated abnormal stationarity detection system presented herein is to improve the quality of service and crowd management in public transit networks. The levels of performance obtained, even for very complex scenes (very dense crowds, movement of stationary people, low contrast, or subjects positioned in distant locations), surpass the requirements imposed by the operators.
To obtain such performance, we developed the specifications of a new "change" detector based on the representation of images by their level lines. This approach has enabled producing stable results under illumination changes, detecting objects even under low contrast, and generating a most effective reference data set. This algorithm remains quite generic and has been successfully introduced into some of our other applications, such as traffic monitoring at intersections [24], the detection of counter-flow people motion [25] and queue length measurements. A number of these systems are now available on the market.
Given that operators seem to be showing increasing interest in the system, it is only natural to assume that one day it will be installed in some networks.

Fig. 9. Examples of abnormal stationary object detection (black squares and dots). (a) A stationary person detected in the back of the scene within a
low-contrast and moderately crowded environment; the detection delay is +1 min. (b) The visible part of a seated person detected within a
moderately crowded environment; the detection delay is +10 s. (c) Two people detected in a very crowded concourse, hence the occlusion rate is very
high for this sequence. (d) The same people after several movements and a displacement without loss of detection at any time; the detection delay is
+30 s. (e) An unattended bag detected in a very crowded corridor; the detection delay is +20 s. (f) A seated person detected in the back of the scene
in a normally crowded hall; zero detection delay.

Such an installation, however, will require a reorganization of the alarm mechanism; among other features, a fast, reliable and effective means of intervention must be implemented.

Acknowledgements

The work presented herein has been undertaken within the scope of the CROMATICA and PRISMATICA projects. It has been supported by an EC grant as part of the 4th and 5th PCRD framework programs. The authors are especially grateful to Ms. B. George and S.-S. Ieng for their valuable assistance during the validation phase of the system.

References

[1] Guichard F, Morel JM. Partial differential equations and image iterative filtering. Tutorial, ICIP 95, Washington DC, 1995.

[2] Maragos P. A representation theory for morphological image and signal processing. IEEE Transactions on Pattern Analysis and Machine Intelligence 1989;11(6).
[3] Serra J. Image analysis and mathematical morphology. New York: Academic Press; 1982.
[4] Oscarsson E. TV-camera detecting pedestrians for traffic light control. Proceedings of IMEKO, 1982. p. 275–82.
[5] Reading IAD, Wan CL, Dickinson KW. Developments in pedestrian detection. Traffic Engineering and Control 1995;538–42.
[6] Reading IAD, Wan CL, Dickinson KW. Detection of pedestrians at puffin crossings using computer vision. Road Traffic Monitoring and Control 1996;23–5.
[7] Gibbins D, Newsam GN, Brooks MJ. Detecting suspicious background changes in video surveillance of busy scenes. IEEE Workshop on Applications of Computer Vision, 1996.
[8] Davies AC, Yin JH, Velastin SA. Crowd monitoring using image processing. Electronics & Communication Engineering Journal 1995;22–6.
[9] Velastin SA, Yin JH, Davies AC, Vicencio-Silva MA, Allsop RE, Penn A. Automated measurement of crowd density and motion using image processing. Proceedings of the Seventh IEE International Conference on Road Traffic Monitoring and Control, London, UK, 26–28 April 1994. p. 127–32.
[10] Black MJ, Anandan P. A model for the detection of motion over time. ICCV 90, 1990. p. 33–7.
[11] Fathy M, Siyal MY. A combined edge detection and background differencing image processing approach for real-time traffic analysis. Road and Transport Research 1995;4(3):1025–39.
[12] Jain R. Dynamic scene analysis. Pattern Recognition 2 1985;1:125–67.
[13] Tekalp AM. Digital video processing. Englewood Cliffs, NJ: Prentice-Hall; 1995.
[14] Irani M, Anandan P. A unified approach to moving object detection in 2D and 3D scenes. IEEE Transactions on Pattern Analysis and Machine Intelligence 1998;20(6).
[15] Hager GD, Belhumeur PN. Efficient region tracking with parametric models of geometry and illumination. IEEE Transactions on Pattern Analysis and Machine Intelligence 1998;20(10).
[16] Kanade T, Collins RT, Lipton AJ, Burt P, Wixson L. Advances in cooperative multi-sensor video surveillance. DARPA98, 1998. p. 3–24.
[17] Hoose N. Computer image processing in traffic engineering. Taunton: Research Studies Press; 1991.
[18] Siyal MY, Fathy M, Darkin CG. Image processing algorithms for detecting moving objects. ICARCV'94 1994;3:1719–23.
[19] Bouchafa S. Contrast-invariant motion detection: application to abnormal crowd behavior detection in subway corridors. Ph.D. dissertation, Univ. Paris VI-INRETS, 1998.
[20] Caselles V, Coll B, Morel JM. Topographic maps and contrast changes in natural images. IJCV 1999;33(1):5–27.
[21] Guichard F, Bouchafa S, Aubert D. Change detector based on level sets. ISMM 2000, Palo Alto, 26–29 June 2000.
[22] Monasse P, Guichard F. Fast computation of a contrast-invariant image representation. IEEE Transactions on Image Processing 2000;9(5):860–72.
[23] Wixson L. Illumination assessment for vision-based traffic monitoring. Proceedings of the International Conference on Pattern Recognition, Track 3: Applications and Robotic Systems, August 1996.
[24] Aubert D, Boillot F. Automatic measurements of traffic flow by image processing: application to the traffic regulation in towns. RTS (Research Transport Safety), No. 62, January–March 1999.
[25] Bouchafa S, Aubert D, Bouzar S. Crowd motion estimation and motionless detection in subway corridors by image processing. IEEE Conference on Intelligent Transport Systems, 1997.
