Background Modeling using Pan-Tilt Camera
by
Aneesh Sharma
Rahul Singhal
May 2010
CERTIFICATE
It is certified that the work contained in the B.Tech. Project entitled “Background
modeling using Pan-Tilt camera” by Aneesh Sharma (Y06UC012) and Rahul
Singhal (Y06UC089) has been carried out under my supervision and that this work
has not been submitted elsewhere for a degree.
Abstract
The use of autonomous pan-tilt cameras as opposed to static cameras can dramatically enhance the range and effectiveness of surveillance systems, but effective tracking in such pan-tilt scenarios remains a challenge. Existing approaches to constructing background models fail here, since they are designed for static cameras. In this work we estimate the camera motion parameters and use them to update the background model online in the presence of scene activity. Camera motion is estimated as the median of the optical flow field computed between consecutive frames. Foreground regions are detected as changes with respect to a per-pixel single-Gaussian model, and are thus skipped during background model construction.
Dedicated to our parents
Acknowledgments
We would like to thank all our colleagues for keeping us sane, our parents for supporting us through all of this, and Mr. Prithwijit Guha for his support and his valuable suggestions, without which we would not have been able to carry out this work.
Contents
1 Introduction 1
2 Foreground Extraction 4
3 Inter-frame Motion Estimation 7
4 Mosaiced Background Model 15
5 Conclusion 22
List of Figures
Chapter 1
Introduction
Active video surveillance typically involves several static cameras so as to obtain a better perception of the situation [3]. Such a system is expensive and not feasible in some scenarios. The use of active pan-tilt (PT) cameras in such scenarios reduces the number of cameras required to monitor a given environment. During operation, each PT camera provides a high virtual resolution over a large area, which makes it possible to track activities across that area and capture high-resolution imagery around the tracked objects. Pan-tilt motion not only widens the viewpoint compared to a static camera, but also improves coverage and performance. Beyond surveillance scenarios, such systems are also relevant to vision systems for car driving and mobile robot navigation, where an active vision system can quickly construct a scene background model and interpret agents and activity in the scene [4].
A network of such active cameras could be used to model large scenes and to reconstruct events and activities within a large area. Pan-tilt cameras, as opposed to static cameras, enhance the range and effectiveness of surveillance systems [2], but at the same time background modeling becomes an issue. Conventional background models for static cameras make no use of knowledge about inter-frame motion, and thus fail to segment the foreground effectively when there is relative motion between the camera and the objects.
Figure 1.1: Stationary camera vs. pan-tilt camera. A number of stationary cameras are required to monitor a wide area, while a single pan-tilt camera is sufficient for the task. Note the lines showing the field of view.
Chapter 2
Foreground Extraction
Optical flow, or optic flow, is the pattern of apparent motion of objects, surfaces, and edges in a visual scene caused by the relative motion between an observer (an eye or a camera) and the scene. Techniques such as motion detection, object segmentation, time-to-collision and focus-of-expansion calculation, motion-compensated encoding, and stereo disparity measurement make use of this motion of object surfaces and edges.
For a 2D+t dimensional case (3D or n-D cases are similar), a pixel at location (x, y, t) with intensity I(x, y, t) will have moved by δx, δy and δt between the two image frames, and the following image constraint equation can be given:

I(x + δx, y + δy, t + δt) = I(x, y, t)   (3.1)

Assuming the movement to be small, the image constraint at I(x, y, t) can be developed with a Taylor series to get:

I(x + δx, y + δy, t + δt) = I(x, y, t) + (∂I/∂x) δx + (∂I/∂y) δy + (∂I/∂t) δt + higher-order terms   (3.2)

Dividing through by δt and neglecting the higher-order terms,

(∂I/∂x)(δx/δt) + (∂I/∂y)(δy/δt) + ∂I/∂t = 0   (3.4)

which results in

(∂I/∂x) vx + (∂I/∂y) vy + ∂I/∂t = 0   (3.5)

where vx, vy are the x and y components of the velocity or optical flow of I(x, y, t), and ∂I/∂x, ∂I/∂y and ∂I/∂t are the derivatives of the image at (x, y, t) in the corresponding directions. Writing Ix, Iy and It for these derivatives, the constraint becomes

Ix vx + Iy vy + It = 0   (3.6)
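The optical flow constraint (3.5) can be checked numerically with finite differences. The following is a minimal numpy sketch; the function name and the one-frame δt convention are our own choices, not part of the original implementation:

```python
import numpy as np

def gradients(f0, f1):
    """Image derivatives entering the optical flow constraint:
    central-difference spatial gradients of f0 and the temporal
    difference to the next frame f1 (with delta-t = one frame)."""
    Ix = np.gradient(f0, axis=1)  # dI/dx
    Iy = np.gradient(f0, axis=0)  # dI/dy
    It = f1 - f0                  # dI/dt
    return Ix, Iy, It

# A horizontal ramp shifted right by one pixel has flow (vx, vy) = (1, 0);
# the constraint Ix*vx + Iy*vy + It should then vanish everywhere.
f0 = np.tile(np.arange(8.0), (8, 1))
f1 = f0 - 1.0
Ix, Iy, It = gradients(f0, f1)
assert np.allclose(Ix * 1.0 + Iy * 0.0 + It, 0.0)
```

Because the ramp is linear, the finite differences are exact here; on real images the constraint only holds approximately.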
The Horn-Schunck method of estimating optical flow is a global method which in-
troduces a global constraint of smoothness to solve the aperture problem.
It assumes smoothness in the flow over the whole image. Thus, it tries to
minimize distortions in flow and prefers solutions which show more smoothness.
The flow is formulated as a global energy functional which is then minimized. For two-dimensional image streams this functional is given as:
E = ∫∫ [ (Ix u + Iy v + It)² + α² (|∇u|² + |∇v|²) ] dx dy   (3.7)
where Ix , Iy and It are the derivatives of the image intensity values along the x, y
and time dimensions respectively, V~ = [u, v]T is the optical flow vector, and the
parameter α is a regularization constant. Larger values of α lead to a smoother
flow. This functional can be minimized by solving the associated Euler-Lagrange
equations. These are

∂L/∂u − ∂/∂x (∂L/∂ux) − ∂/∂y (∂L/∂uy) = 0   (3.8)

∂L/∂v − ∂/∂x (∂L/∂vx) − ∂/∂y (∂L/∂vy) = 0   (3.9)

which yield

Ix (Ix u + Iy v + It) − α² Δu = 0   (3.10)

Iy (Ix u + Iy v + It) − α² Δv = 0   (3.11)

where subscripts again denote partial differentiation and Δ = ∂²/∂x² + ∂²/∂y² denotes the Laplace operator.
Approximating the Laplacians as weighted differences between a pixel's value and a local average of its neighbors, Δu ≈ ū − u and Δv ≈ v̄ − v, leads to the iterative scheme

u^(k+1) = ū^k − Ix (Ix ū^k + Iy v̄^k + It) / (α² + Ix² + Iy²)
v^(k+1) = v̄^k − Iy (Ix ū^k + Iy v̄^k + It) / (α² + Ix² + Iy²)

where the superscript k+1 denotes the next iteration, which is to be calculated, and k is the last calculated result. This is in essence the Jacobi method applied to the large, sparse system that arises when solving for all pixels simultaneously.
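The Horn-Schunck iteration can be sketched in a few lines of numpy. This is a simplified illustration rather than the exact implementation used in this work: the 4-neighbour averaging kernel and the frame-difference derivatives are our own choices.

```python
import numpy as np

def horn_schunck(f0, f1, alpha=1.0, n_iter=100):
    """Horn-Schunck optical flow via Jacobi-style iteration of the
    update equations: at each step the flow at a pixel is pulled
    toward the local average, corrected by the data term."""
    Ix = np.gradient(f0, axis=1)
    Iy = np.gradient(f0, axis=0)
    It = f1 - f0
    u = np.zeros_like(f0)
    v = np.zeros_like(f0)
    for _ in range(n_iter):
        # 4-neighbour averages of the current flow estimate
        u_bar = (np.roll(u, 1, 0) + np.roll(u, -1, 0) +
                 np.roll(u, 1, 1) + np.roll(u, -1, 1)) / 4.0
        v_bar = (np.roll(v, 1, 0) + np.roll(v, -1, 0) +
                 np.roll(v, 1, 1) + np.roll(v, -1, 1)) / 4.0
        num = Ix * u_bar + Iy * v_bar + It
        den = alpha ** 2 + Ix ** 2 + Iy ** 2
        u = u_bar - Ix * num / den
        v = v_bar - Iy * num / den
    return u, v
```

On a horizontal ramp shifted right by one pixel the iteration converges to the true uniform flow (u, v) = (1, 0).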
Figure 3.1: Horn-Schunck output showing flow vectors in the x and y directions.
In contrast, the Lucas-Kanade method introduces an additional constraint on the optical flow by assuming the flow to be constant in a local neighborhood around the central pixel under consideration at any given time.
The Lucas-Kanade method remains one of the most popular two-frame differential methods for motion estimation (also called optical flow estimation). It assumes a locally constant flow and is based on the optical flow equation. The additional constraint needed to estimate the flow field is introduced by assuming that the flow (vx, vy) is constant in a small window of size m × m with m > 1, centered at pixel (x, y). Numbering the pixels within the window as 1 . . . n, with n = m², a set of equations can be found:

Ix(p1) vx + Iy(p1) vy = −It(p1)
. . .
Ix(pn) vx + Iy(pn) vy = −It(pn)

With this there are more than two equations for the two unknowns, and thus the system is over-determined. Writing it as A v = b, with the i-th row of A equal to [Ix(pi), Iy(pi)] and bi = −It(pi), the least-squares solution is

v = (AᵀA)⁻¹ Aᵀ b
Figure 3.2: Lucas-Kanade output showing flow vectors in the x and y directions.
As generally more equations are available for flow estimation than needed (an over-determined system), the Lucas-Kanade algorithm can be used in combination with statistical methods to improve performance in the presence of outliers, as in noisy images. A statistical analysis marks the outliers, and the flow is then estimated from the remaining equations, or the equations are weighted accordingly.
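The windowed least-squares solve can be illustrated as follows. `lucas_kanade_at` is a hypothetical helper name of our own, and the sketch solves the system for a single window rather than densely over the image:

```python
import numpy as np

def lucas_kanade_at(f0, f1, x, y, m=5):
    """Solve the over-determined Lucas-Kanade system v = argmin |Av - b|
    for one m x m window centered at pixel (x, y), via least squares."""
    Ix = np.gradient(f0, axis=1)
    Iy = np.gradient(f0, axis=0)
    It = f1 - f0
    h = m // 2
    win = np.s_[y - h:y + h + 1, x - h:x + h + 1]
    A = np.stack([Ix[win].ravel(), Iy[win].ravel()], axis=1)  # n x 2
    b = -It[win].ravel()
    v, *_ = np.linalg.lstsq(A, b, rcond=None)
    return v  # [vx, vy]
```

For the solution to be well defined the window must contain gradients in more than one direction (AᵀA of full rank), which is exactly the aperture problem the local assumption addresses.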
Comparisons
• In pyramidal LK, the whole image is broken down into different layers (a pyramid), and flow is estimated from the coarsest layer down to the finest.
Figure 3.3: Pyramidal Lucas-Kanade output showing flow vectors in the x and y directions.
Chapter 4
Mosaiced Background Model
This chapter deals with efficient background model construction for pan-tilt cameras. As there is relative motion between the camera and the objects, conventional methods of background modeling fail here. The proposed algorithm combines concepts from conventional background modeling with inter-frame motion estimation, and can be divided into the following stages.
Flow estimation gives a complete view of the relative shift between camera and objects, and thus helps in the construction of the background model. The flow between two consecutive frames gives an idea of the overlapping and the non-overlapping regions. There are various existing methods for calculating flow between frames (chapter 3). Among these methods, the pyramidal Lucas-Kanade approach is an effective and robust way to compute flow between frames. It is usually carried out in a coarse-to-fine iterative manner: the spatial derivatives are first computed at a coarse scale in scale-space (or a pyramid), one of the images is warped by the computed deformation, and iterative updates are then computed at successively finer scales. One characteristic of the Lucas-Kanade algorithm, shared by other local optical flow algorithms, is that it does not yield a very high density of flow vectors; the flow information fades out quickly across motion boundaries, and the inner parts of large homogeneous areas show little or no motion. Its advantage is its comparative robustness in the presence of noise.

Figure 4.2: Flow density, with the x and y axes showing flows in different directions.
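As stated in the abstract, the global camera (pan/tilt) shift is then taken as the median of the flow field. A minimal sketch (the function name is our own):

```python
import numpy as np

def camera_shift(flow_u, flow_v):
    """Global camera shift estimated as the median of the per-pixel
    flow field; the median is robust to the minority of flow vectors
    that belong to independently moving foreground objects."""
    return float(np.median(flow_u)), float(np.median(flow_v))

# A mostly uniform flow of 2 px, with a few outlier pixels from a
# moving object, still yields the camera shift (2, 0).
u = np.full((10, 10), 2.0)
u[0, :3] = 9.0
v = np.zeros((10, 10))
assert camera_shift(u, v) == (2.0, 0.0)
```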
The goal of foreground extraction is a precise segmentation minimizing false negatives and false positives. The tracking process establishes correspondences between the segmented objects and the objects being tracked from previous frames. Depending on the technique, tracking can be clearly separated from segmentation (when previous foreground information is not used for the segmentation) or can be implicit in the foreground segmentation (when a priori information about the object is used).
We model the background by analyzing each pixel (i, j) of the image. The background model consists of a mean and a variance for each pixel value. The figure illustrates the system idea, where each pixel is modeled with a Gaussian distribution.
The mean and variance for each frame (µt, σt²) are updated as follows:

µt+1 = (1 − α) µt + α It   (4.1)
σ²t+1 = (1 − α) σt² + α d   (4.2)
d = (It − µt)²   (4.3)

where It is the value of the pixel under analysis in the current frame, µt and σt² are the mean and variance of the Gaussian distribution respectively, and α is the update rate, which we have chosen as 0.03.
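A sketch of the per-pixel update follows. The foreground threshold k (in standard deviations) is a hypothetical parameter of our own; the text does not specify the value used:

```python
import numpy as np

ALPHA = 0.03   # update rate chosen in the text
K = 2.5        # foreground threshold in std deviations (our assumption)

def update_model(mean, var, frame, alpha=ALPHA, k=K):
    """Per-pixel single-Gaussian background update. A pixel is marked
    foreground when its squared distance d exceeds k^2 * variance;
    only background pixels are blended into the model."""
    d = (frame - mean) ** 2
    fg = d > (k ** 2) * var
    bg = ~fg
    mean[bg] = (1 - alpha) * mean[bg] + alpha * frame[bg]
    var[bg] = (1 - alpha) * var[bg] + alpha * d[bg]
    return mean, var, fg
```

Skipping foreground pixels in the blend is what keeps moving objects from being absorbed into the background model.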
This is the final step of processing, which leads to the construction of a mosaiced image of the whole background. At this stage, a background model of twice the height and thrice the width of a regular frame is first constructed, assuming that the maximum shift along the x and y directions will always be less than the frame width and half the frame height respectively. The first frame of each data set is taken as a reference and learned as is. Each subsequent frame is classified into overlapping and non-overlapping regions. The non-overlapping (new) regions are learned directly into the background model, while overlapping regions are passed through the foreground extraction process and learned into the background model with the foreground objects skipped. The background model is updated dynamically at each pixel position using its mean and variance. Finally, the output is a mosaiced image consisting of the background with the foreground parts left out.
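The canvas allocation and the learn-as-is branch for non-overlapping regions can be sketched as follows. The placement convention and the NaN marking of uninitialized pixels are our own simplifications; overlapping regions would additionally go through the per-pixel Gaussian update described earlier:

```python
import numpy as np

def init_mosaic(h, w):
    """Background mosaic of twice the frame height and thrice the
    frame width, as described above; NaN marks uninitialized pixels."""
    return np.full((2 * h, 3 * w), np.nan)

def paste(mosaic, frame, dy, dx):
    """Place a frame into the mosaic at the accumulated (dy, dx)
    shift. Pixels that are still NaN are non-overlapping (new)
    regions and are learned as is; already-seen pixels are kept."""
    h, w = frame.shape
    region = mosaic[dy:dy + h, dx:dx + w]
    new = np.isnan(region)
    region[new] = frame[new]
    return mosaic
```

In the full algorithm the kept pixels would instead be updated with the running mean/variance, with foreground pixels skipped.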
Figure 4.5: Mosaic output of test case. Black represents uninitialized region
Chapter 5
Conclusion
The pan-tilt camera samples sub-regions of a larger scene whose background model mosaic is obtained by stitching the pixel-wise intensity distributions of the background regions. We estimate the camera motion as the median value of the flow vectors computed using the pyramidal LK approach. The flow value gives an estimate of the shift between object and camera. The flow values in the x and y directions are combined with the single-Gaussian background model to construct the mosaic.

Bibliography
[1] Berthold K.P. Horn and Brian G. Schunck. Determining optical flow. Artificial Intelligence, volume 17, 1981.
[2] Arindam Biswas, Prithwijit Guha, Amitabha Mukerjee, and K.S. Venkatesh. In-
trusion detection and tracking with pan-tilt cameras. In Proceedings of the Third
International Conference on Visual Information Engineering, 2006.
[3] A. F. Bobick. Movement, activity, and action: The role of knowledge in the
perception of motion. Philosophical Transactions Royal Society London B, 1997.
[5] J.L. Barron, D.J. Fleet, and S. Beauchemin. Performance of optical flow techniques. International Journal of Computer Vision, volume 12, pages 43–77, 1994.
[6] B.D. Lucas and T. Kanade. An iterative image registration technique with an ap-
plication to stereo vision. In International Joint Conferences on Artificial Intelligence,
pages 674–679, 1981.
[7] C. Stauffer and W.E.L. Grimson. Adaptive background mixture models for real-
time tracking. In IEEE Computer Society Conference on Computer Vision and Pattern
Recognition, volume 2, page 252, June 1999.
[8] Z. Zivkovic. Improved adaptive gaussian mixture model for background sub-
traction. In Proceedings of the 17th International Conference on Pattern Recognition,
volume 2, pages 28–31, 2004.