Background Modeling using Pan-Tilt Camera
by
Aneesh Sharma
Rahul Singhal
May 2010
CERTIFICATE
It is certified that the work contained in the B.Tech. Project entitled “Background
modeling using Pan-Tilt camera” by Aneesh Sharma (Y06UC012) and Rahul
Singhal (Y06UC089) has been carried out under my supervision and that this work
has not been submitted elsewhere for a degree.
Abstract
The use of autonomous pan-tilt cameras as opposed to static cameras can dramatically enhance the range and effectiveness of surveillance systems, but effective tracking in such pan-tilt scenarios remains a challenge. Existing approaches to constructing background models fail here, since they are designed for static cameras. In this work we estimate the camera motion parameters and use them to update the background model online in the presence of scene activity. Camera motion is estimated as the median of the optical flow field computed between consecutive frames. Foreground regions are detected as changes with respect to a per-pixel single-Gaussian model, and are thus skipped during background model construction.
Dedicated to our parents
Acknowledgments
We would like to thank all our colleagues for keeping us sane, our parents for supporting us through all of this, and Mr. Prithwijit Guha for his support and his valuable suggestions, without which we would not have been able to carry out this work.
Contents
1 Introduction 1
2 Foreground Extraction 4
3 Inter-frame Motion Estimation 7
4 Mosaiced Background Model 15
5 Conclusion 22
List of Figures
Chapter 1
Introduction
Active video surveillance typically involves several static cameras so as to obtain a better perception of the situation [3]. Such a system is expensive and not feasible in some scenarios. The use of active pan-tilt (PT) cameras in such scenarios reduces the number of cameras required to monitor a given environment. During operation, each PT camera provides a high virtual resolution over a large area, which makes it possible to track activities across that area and capture high-resolution imagery around the tracked objects. Pan-tilt motion not only widens the viewpoint compared to a static camera, but also improves coverage and performance. Beyond surveillance scenarios, such systems are also relevant to vision systems for car driving and mobile robot navigation, where an active vision system can quickly construct a scene background model and interpret agents and activity in the scene [4].
A network of such active cameras could be used to model large scenes and to reconstruct events and activities within a large area. Pan-tilt cameras, as opposed to static cameras, enhance the range and effectiveness of surveillance systems [2], but at the same time background modeling becomes an issue. Conventional background models for static cameras make no use of knowledge about inter-frame motion, and thus fail to segment the foreground effectively when there is relative motion between the camera and the objects.
Figure 1.1: Stationary camera vs. pan-tilt camera. A number of stationary cameras are required to monitor a wide area, while a single pan-tilt camera is sufficient for the task. Note the lines showing the field of view.
Chapter 2
Foreground Extraction
Optical flow, or optic flow, is the pattern of apparent motion of objects, surfaces, and edges in a visual scene caused by the relative motion between an observer (an eye or a camera) and the scene. Techniques such as motion detection, object segmentation, time-to-collision and focus-of-expansion calculation, motion-compensated encoding, and stereo disparity measurement make use of this motion of object surfaces and edges.
For a 2D+t dimensional case (3D or n-D cases are similar), a pixel at location (x, y, t) with intensity I(x, y, t) will have moved by δx, δy and δt between the two image frames, and the following image constraint equation can be given:

I(x + δx, y + δy, t + δt) = I(x, y, t)   (3.1)

Assuming the movement to be small, the image constraint at I(x, y, t) can be developed with a Taylor series to get:

I(x + δx, y + δy, t + δt) = I(x, y, t) + (∂I/∂x) δx + (∂I/∂y) δy + (∂I/∂t) δt + higher-order terms   (3.2)

Dividing through by δt and neglecting the higher-order terms,

(∂I/∂x)(δx/δt) + (∂I/∂y)(δy/δt) + ∂I/∂t = 0   (3.4)

which results in

(∂I/∂x) vx + (∂I/∂y) vy + ∂I/∂t = 0   (3.5)

where vx, vy are the x and y components of the velocity or optical flow of I(x, y, t), and ∂I/∂x, ∂I/∂y and ∂I/∂t are the derivatives of the image at (x, y, t) in the corresponding directions. Writing Ix, Iy and It for these derivatives, the constraint becomes

Ix vx + Iy vy + It = 0   (3.6)
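The optical flow constraint (3.5) can be checked numerically with finite differences. The following is a minimal numpy sketch; the function name and the one-frame δt convention are our own choices, not part of the original implementation:

```python
import numpy as np

def gradients(f0, f1):
    """Image derivatives entering the optical flow constraint:
    central-difference spatial gradients of f0 and the temporal
    difference to the next frame f1 (with delta-t = one frame)."""
    Ix = np.gradient(f0, axis=1)  # dI/dx
    Iy = np.gradient(f0, axis=0)  # dI/dy
    It = f1 - f0                  # dI/dt
    return Ix, Iy, It

# A horizontal ramp shifted right by one pixel has flow (vx, vy) = (1, 0);
# the constraint Ix*vx + Iy*vy + It should then vanish everywhere.
f0 = np.tile(np.arange(8.0), (8, 1))
f1 = f0 - 1.0
Ix, Iy, It = gradients(f0, f1)
assert np.allclose(Ix * 1.0 + Iy * 0.0 + It, 0.0)
```

Because the ramp is linear, the finite differences are exact here; on real images the constraint only holds approximately.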
The Horn-Schunck method of estimating optical flow is a global method which in-
troduces a global constraint of smoothness to solve the aperture problem.
It assumes smoothness in the flow over the whole image. Thus, it tries to
minimize distortions in flow and prefers solutions which show more smoothness.
The flow is formulated as a global energy functional which is then minimized. For two-dimensional image streams this functional is given as:
E = ∫∫ [ (Ix u + Iy v + It)² + α² (|∇u|² + |∇v|²) ] dx dy   (3.7)
where Ix , Iy and It are the derivatives of the image intensity values along the x, y
and time dimensions respectively, V~ = [u, v]T is the optical flow vector, and the
parameter α is a regularization constant. Larger values of α lead to a smoother
flow. This functional can be minimized by solving the associated Euler-Lagrange
equations. These are

∂L/∂u − ∂/∂x (∂L/∂ux) − ∂/∂y (∂L/∂uy) = 0   (3.8)

∂L/∂v − ∂/∂x (∂L/∂vx) − ∂/∂y (∂L/∂vy) = 0   (3.9)

which yield

Ix (Ix u + Iy v + It) − α² Δu = 0   (3.10)

Iy (Ix u + Iy v + It) − α² Δv = 0   (3.11)

where subscripts again denote partial differentiation and Δ = ∂²/∂x² + ∂²/∂y² denotes the Laplace operator.
Approximating the Laplacians as weighted differences between a pixel's value and a local average of its neighbors, Δu ≈ ū − u and Δv ≈ v̄ − v, leads to the iterative scheme

u^(k+1) = ū^k − Ix (Ix ū^k + Iy v̄^k + It) / (α² + Ix² + Iy²)
v^(k+1) = v̄^k − Iy (Ix ū^k + Iy v̄^k + It) / (α² + Ix² + Iy²)

where the superscript k+1 denotes the next iteration, which is to be calculated, and k is the last calculated result. This is in essence the Jacobi method applied to the large, sparse system that arises when solving for all pixels simultaneously.
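The Horn-Schunck iteration can be sketched in a few lines of numpy. This is a simplified illustration rather than the exact implementation used in this work: the 4-neighbour averaging kernel and the frame-difference derivatives are our own choices.

```python
import numpy as np

def horn_schunck(f0, f1, alpha=1.0, n_iter=100):
    """Horn-Schunck optical flow via Jacobi-style iteration of the
    update equations: at each step the flow at a pixel is pulled
    toward the local average, corrected by the data term."""
    Ix = np.gradient(f0, axis=1)
    Iy = np.gradient(f0, axis=0)
    It = f1 - f0
    u = np.zeros_like(f0)
    v = np.zeros_like(f0)
    for _ in range(n_iter):
        # 4-neighbour averages of the current flow estimate
        u_bar = (np.roll(u, 1, 0) + np.roll(u, -1, 0) +
                 np.roll(u, 1, 1) + np.roll(u, -1, 1)) / 4.0
        v_bar = (np.roll(v, 1, 0) + np.roll(v, -1, 0) +
                 np.roll(v, 1, 1) + np.roll(v, -1, 1)) / 4.0
        num = Ix * u_bar + Iy * v_bar + It
        den = alpha ** 2 + Ix ** 2 + Iy ** 2
        u = u_bar - Ix * num / den
        v = v_bar - Iy * num / den
    return u, v
```

On a horizontal ramp shifted right by one pixel the iteration converges to the true uniform flow (u, v) = (1, 0).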
Figure 3.1: Horn-Schunck output showing flow vectors in the x and y directions.
In contrast, the Lucas-Kanade method introduces an additional constraint on the optical flow by assuming the flow to be constant in a local neighborhood around the central pixel under consideration at any given time.
The Lucas-Kanade method remains one of the most popular two-frame differential methods for motion estimation (also called optical flow estimation). It assumes a locally constant flow and is based on the optical flow equation. The additional constraint needed to estimate the flow field is introduced by assuming that the flow (vx, vy) is constant in a small window of size m × m with m > 1, centered at pixel (x, y). Numbering the pixels within the window as 1 . . . n, with n = m², a set of equations can be found:

Ix(p1) vx + Iy(p1) vy = −It(p1)
. . .
Ix(pn) vx + Iy(pn) vy = −It(pn)

With this there are more than two equations for the two unknowns, and thus the system is over-determined. Writing it as A v = b, with the i-th row of A equal to [Ix(pi), Iy(pi)] and bi = −It(pi), the least-squares solution is

v = (AᵀA)⁻¹ Aᵀ b
Figure 3.2: Lucas-Kanade output showing flow vectors in the x and y directions.
As generally more equations are available for flow estimation than needed (an over-determined system), the Lucas-Kanade algorithm can be used in combination with statistical methods to improve performance in the presence of outliers, as in noisy images. A statistical analysis marks the outliers, and the flow is then estimated from the remaining equations, or the equations are weighted accordingly.
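The windowed least-squares solve can be illustrated as follows. `lucas_kanade_at` is a hypothetical helper name of our own, and the sketch solves the system for a single window rather than densely over the image:

```python
import numpy as np

def lucas_kanade_at(f0, f1, x, y, m=5):
    """Solve the over-determined Lucas-Kanade system v = argmin |Av - b|
    for one m x m window centered at pixel (x, y), via least squares."""
    Ix = np.gradient(f0, axis=1)
    Iy = np.gradient(f0, axis=0)
    It = f1 - f0
    h = m // 2
    win = np.s_[y - h:y + h + 1, x - h:x + h + 1]
    A = np.stack([Ix[win].ravel(), Iy[win].ravel()], axis=1)  # n x 2
    b = -It[win].ravel()
    v, *_ = np.linalg.lstsq(A, b, rcond=None)
    return v  # [vx, vy]
```

For the solution to be well defined the window must contain gradients in more than one direction (AᵀA of full rank), which is exactly the aperture problem the local assumption addresses.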
Comparisons
• In pyramidal LK, the whole image is broken down into different layers (a pyramid), and flow is estimated from the coarsest layer down to the finest.
Figure 3.3: Pyramidal Lucas-Kanade output showing flow vectors in the x and y directions.
Chapter 4
Mosaiced Background Model
This chapter deals with efficient background model construction for pan-tilt cameras. As there is relative motion between the camera and the objects, conventional methods of background modeling fail here. The proposed algorithm combines concepts from conventional background modeling with inter-frame motion estimation, and can be divided into the following stages.
Flow estimation gives a complete view of the relative shift between camera and objects, and thus helps in the construction of the background model. The flow between two consecutive frames gives an idea of the overlapping and the non-overlapping regions. There are various existing methods for calculating flow between frames (chapter 3). Among these methods, the pyramidal Lucas-Kanade approach is an effective and robust way to compute flow between frames. It is usually carried out in a coarse-to-fine iterative manner: the spatial derivatives are first computed at a coarse scale in scale-space (or a pyramid), one of the images is warped by the computed deformation, and iterative updates are then computed at successively finer scales. One characteristic of the Lucas-Kanade algorithm, shared by other local optical flow algorithms, is that it does not yield a very high density of flow vectors; the flow information fades out quickly across motion boundaries, and the inner parts of large homogeneous areas show little or no motion. Its advantage is its comparative robustness in the presence of noise.

Figure 4.2: Flow density, with the x and y axes showing flows in different directions.
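As stated in the abstract, the global camera (pan/tilt) shift is then taken as the median of the flow field. A minimal sketch (the function name is our own):

```python
import numpy as np

def camera_shift(flow_u, flow_v):
    """Global camera shift estimated as the median of the per-pixel
    flow field; the median is robust to the minority of flow vectors
    that belong to independently moving foreground objects."""
    return float(np.median(flow_u)), float(np.median(flow_v))

# A mostly uniform flow of 2 px, with a few outlier pixels from a
# moving object, still yields the camera shift (2, 0).
u = np.full((10, 10), 2.0)
u[0, :3] = 9.0
v = np.zeros((10, 10))
assert camera_shift(u, v) == (2.0, 0.0)
```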
The goal of foreground extraction is a precise segmentation minimizing false negatives and false positives. The tracking process establishes correspondences between the segmented objects and the objects being tracked from previous frames. Depending on the technique, tracking can be clearly separated from segmentation (when previous foreground information is not used for the segmentation) or can be implicit in the foreground segmentation (when a priori information about the object is used).
We model the background by analyzing each pixel (i, j) of the image. The background model consists of a mean and a variance for each pixel value. The figure illustrates the system idea, where each pixel is modeled with a Gaussian distribution.
The mean and variance for each frame (µt, σt²) are updated as follows:

µt+1 = (1 − α) µt + α It   (4.1)
σ²t+1 = (1 − α) σt² + α d   (4.2)
d = (It − µt)²   (4.3)

where It is the value of the pixel under analysis in the current frame, µt and σt² are the mean and variance of the Gaussian distribution respectively, and α is the update rate, which we have chosen as 0.03.
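A sketch of the per-pixel update follows. The foreground threshold k (in standard deviations) is a hypothetical parameter of our own; the text does not specify the value used:

```python
import numpy as np

ALPHA = 0.03   # update rate chosen in the text
K = 2.5        # foreground threshold in std deviations (our assumption)

def update_model(mean, var, frame, alpha=ALPHA, k=K):
    """Per-pixel single-Gaussian background update. A pixel is marked
    foreground when its squared distance d exceeds k^2 * variance;
    only background pixels are blended into the model."""
    d = (frame - mean) ** 2
    fg = d > (k ** 2) * var
    bg = ~fg
    mean[bg] = (1 - alpha) * mean[bg] + alpha * frame[bg]
    var[bg] = (1 - alpha) * var[bg] + alpha * d[bg]
    return mean, var, fg
```

Skipping foreground pixels in the blend is what keeps moving objects from being absorbed into the background model.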
This is the final step of processing, which leads to the construction of a mosaiced image of the whole background. At this stage, a background model of twice the height and thrice the width of a regular frame is first constructed, assuming that the maximum shift along the x and y directions will always be less than the frame width and half the frame height respectively. The first frame of each data set is taken as a reference and learned as is. Each subsequent frame is classified into overlapping and non-overlapping regions. The non-overlapping (new) regions are learned directly into the background model, while overlapping regions are passed through the foreground extraction process and learned into the background model with the foreground objects skipped. The background model is updated dynamically at each pixel position using its mean and variance. Finally, the output is a mosaiced image consisting of the background with the foreground parts left out.
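The canvas allocation and the learn-as-is branch for non-overlapping regions can be sketched as follows. The placement convention and the NaN marking of uninitialized pixels are our own simplifications; overlapping regions would additionally go through the per-pixel Gaussian update described earlier:

```python
import numpy as np

def init_mosaic(h, w):
    """Background mosaic of twice the frame height and thrice the
    frame width, as described above; NaN marks uninitialized pixels."""
    return np.full((2 * h, 3 * w), np.nan)

def paste(mosaic, frame, dy, dx):
    """Place a frame into the mosaic at the accumulated (dy, dx)
    shift. Pixels that are still NaN are non-overlapping (new)
    regions and are learned as is; already-seen pixels are kept."""
    h, w = frame.shape
    region = mosaic[dy:dy + h, dx:dx + w]
    new = np.isnan(region)
    region[new] = frame[new]
    return mosaic
```

In the full algorithm the kept pixels would instead be updated with the running mean/variance, with foreground pixels skipped.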
Figure 4.5: Mosaic output of test case. Black represents uninitialized region
Chapter 5
Conclusion
The pan-tilt camera samples sub-regions of a larger scene whose background model mosaic is obtained by stitching the pixel-wise intensity distributions of the background regions. We estimate the camera motion as the median value of the flow vectors computed using the pyramidal LK approach. The flow value gives an estimate of the shift between object and camera. The flow values in the x and y directions are combined with the single-Gaussian background model to construct the mosaic.

Bibliography
[1] Berthold K.P. Horn and Brian G. Schunck. Determining optical flow. Artificial Intelligence, volume 17, 1981.
[2] Arindam Biswas, Prithwijit Guha, Amitabha Mukerjee, and K.S. Venkatesh. In-
trusion detection and tracking with pan-tilt cameras. In Proceedings of the Third
International Conference on Visual Information Engineering, 2006.
[3] A. F. Bobick. Movement, activity, and action: The role of knowledge in the
perception of motion. Philosophical Transactions Royal Society London B, 1997.
[5] J.L. Barron, D.J. Fleet, and S. Beauchemin. Performance of optical flow techniques. International Journal of Computer Vision, volume 12, pages 43–77, 1994.
[6] B.D. Lucas and T. Kanade. An iterative image registration technique with an ap-
plication to stereo vision. In International Joint Conferences on Artificial Intelligence,
pages 674–679, 1981.
[7] C. Stauffer and W.E.L. Grimson. Adaptive background mixture models for real-
time tracking. In IEEE Computer Society Conference on Computer Vision and Pattern
Recognition, volume 2, page 252, June 1999.
[8] Z. Zivkovic. Improved adaptive gaussian mixture model for background sub-
traction. In Proceedings of the 17th International Conference on Pattern Recognition,
volume 2, pages 28–31, 2004.