
Multiple Human Object Tracking using Background
Subtraction and Shadow Removal Techniques

S. Saravanakumar#1, A. Vadivel#2, C. G. Saneem Ahmed*3
#Department of Computer Applications, National Institute of Technology
*Department of Electronics and Communication, National Institute of Technology
Tiruchirappalli, Tamilnadu, India.
1somansk@yahoo.com
2vadi@nitt.edu
3saneem89@gmail.com

Abstract— The main objective of this paper is to develop a multiple human object tracking approach based on motion estimation and detection, background subtraction, shadow removal and occlusion detection. A reference frame is initially captured and treated as the background information. When a new object enters the frame, the foreground and background information are separated using the reference frame as the background model. Very often, shadows in the scene merge with the foreground object and make the tracking process complex. In this approach, morphological operations are used to identify and remove the shadow. Occlusion is one of the most common events in object tracking, and the centroid of each object is used to detect occlusions and to identify each object separately. Video sequences captured in the laboratory have been tested with the proposed algorithm, and the algorithm works efficiently even in the event of occlusion.

Keywords— Background modeling and subtraction, human motion detection, shadow removal, occlusion detection.

I. INTRODUCTION

In computer vision, object tracking is considered one of the most important tasks. Various methods have been proposed and reported, in both academia and industry, for a large number of real-time applications. Object tracking methods may broadly be categorized as template-based, probabilistic and pixel-wise. A template-based method represents the object in a form suitable for tracking, a probabilistic method uses an intelligent search strategy to locate the target object, and a pixel-wise method tracks the target using similarity matching techniques. Among these approaches, the template-based approach is found to be suitable for many real-time applications [1, 2]. In this category of tracking methods, the similarity between the predefined target and a translated candidate region is calculated. However, this method often fails under object transformations such as translation, rotation and scaling, because the target object is selected as a constant-size template. To handle this issue, varying templates are used. The inclusion of background pixels into the template, however, introduces a positioning error, and this positioning error keeps accumulating as the template is updated.

In the template-based category, the mean-shift method [3] and kernel-based tracking [4] have been proposed, where the color histogram of the target object is constructed using a kernel density estimation function. Since the color histogram is invariant to rotation, scaling and translation, it is considered a suitable feature for handling changes in the scale, rotation and translation of the target object. Tracking is carried out by comparing the color histogram of the template with that of the target object. However, the mean-shift method is not suitable for 3-D target objects or monochromatic objects: for a monochromatic target, even a small variation in illumination produces a narrow histogram pattern, and tracking often fails.

In the object tracking problem, object representation is a difficult aspect. Various ways of representing or describing target objects have been proposed, such as object appearance [1, 2], image features [5, 6], target contour [7, 8] and color histogram [4]. In both the appearance-based and the color-histogram-based approaches, the region of the object has to be defined in order to describe the target; thus, if some background pixels are mixed into the defined region, the tracking may fail.

For tracking non-rigid objects, probabilistic tracking methods have given better performance. Some approaches in this category can be found in [13, 14, 15, 19]. In one probabilistic method [13], a motion detector, a region tracker, a head detector and an active shape tracker are combined for tracking pedestrians. The assumption made in this method is that no people are moving in the background. Since this method uses the contour as one of its features, the initial contour definition is difficult for target objects with complicated contours.

Object tracking is also performed by predicting the object position from past information and the predicted current position. These types of methods combine statistical computation with a parameter vector [11, 12, 16, 17]. However, for real-time object tracking systems, it is found to be difficult to construct the proper feature vectors. This method has been extended by Khan et al. [11] to deal with the problem of interacting targets. The Markov Random

978-1-4244-8594-9/10/$26.00 ©2010 IEEE
Field (MRF) has been used for modelling the interactions, which is achieved by adding an interaction-weighted factor. However, in this method the tracking fails when targets overlap.

In contrast to model-based tracking methods, pixel-wise tracking methods are data-driven and do not require a prior model of the target. A parallel K-means clustering algorithm [18] has been used by Heisele et al. [9, 10] for segmenting color image sequences, with the moving region identified as the target. However, the method is computationally expensive due to the large number of clusters. Similarly, another K-means based autoregressive model has been proposed in which the clustering is performed only on the positive samples; thus, tracking failure cannot be detected and failure recovery may not be possible. For tracking, the image pixels are divided into target and non-target pixels, and the K-means clustering algorithm is applied to these pixels [20]. However, this method cannot deal with appearance changes of the target object such as size, pose, etc. In addition, the computational cost is proportional to the number of non-target points.

It is understood from the above discussion that pixel-based methods are robust against background interfusion. In this kind of method, failure detection and automatic failure recovery can be carried out effectively.

A very fundamental and critical task in computer vision is the detection and tracking of moving objects in video sequences. Possible applications are as follows. (i) Visual surveillance: a human action recognition system processes image sequences captured by video cameras monitoring sensitive areas such as banks, department stores, parking lots and country borders to determine whether one or more humans are engaged in suspicious or criminal activity. (ii) Content-based video retrieval: a human behavior understanding system scans an input video and outputs the actions or events specified in a high-level language. This application is very useful for sportscasters to quickly retrieve important events in particular games. (iii) Precise analysis of athletic performance: video analysis of athlete action is becoming an important tool for sports training, since it involves no intervention with the athlete.

In all these applications, fixed cameras are used with respect to a static background (e.g. a stationary surveillance camera), and the common approach of background subtraction is used to obtain an initial estimate of moving objects. First, background modeling is performed to yield a reference model. This reference model is used in background subtraction, in which each video frame is compared against the reference model to determine possible variation. The variations between the current video frame and the reference frame, in terms of pixels, signify the existence of moving objects. This variation, which also represents the foreground pixels, is further processed for object localization and tracking. Ideally, background subtraction should detect real moving objects with high accuracy, limiting false negatives (objects not detected) as much as possible. At the same time, it should extract as many pixels of the moving objects as possible while avoiding shadows, static objects and noise.

Shadows of the foreground objects are very common in the detection step and produce undesirable consequences. For example, shadows connect different people walking in a group, generating a single object (typically called a blob) as the output of background subtraction. In such a case, it is more difficult to isolate and track each person in the group. Several techniques for shadow detection in video sequences exist [21].

The main objective of this paper is to develop an algorithm that can detect human motion at a certain distance for object tracking applications. We carry out various tasks such as motion detection, background modeling and subtraction, foreground detection, shadow detection and removal, morphological operations and occlusion identification.

The paper is organized as follows. In Section II, the object segmentation of the video frames is presented, including background subtraction, shadow detection and removal, and occlusion detection. Section III presents the experimental results, and the paper is concluded in the last section.

II. BACKGROUND SUBTRACTION

Human motion analysis and detection are the foremost tasks in computer-vision-based problems. Human detection aims at segmenting the regions corresponding to people from the entire image. It is a significant issue in a human motion analysis system, since subsequent processes such as tracking and action recognition follow the motion detection. The motion detection and foreground object extraction algorithm consists of several sequential processes, which are described in the flow chart shown in Fig. 1.

In general, the Sum of Absolute Differences (SAD) algorithm is used for background modelling, which is based on frame differencing techniques. It is mathematically represented as

    D(t) = (1/N) Σ |I(t_i) − I(t_j)|    (1)

where N is the number of pixels in the frame (also used as a scaling factor), the sum runs over all pixels, and I(t_i) and I(t_j) are the frames at times i and j respectively. D(t) is the normalized sum of absolute differences for that time instance. In the ideal case, when there is no motion, the following condition holds:

    I(t_i) = I(t_j) and D(t) = 0    (2)

A. Background Subtraction

Background subtraction is a popular technique to segment out the objects of interest in a frame. This technique involves subtracting an image that contains the object from a previous background image that has no foreground objects of interest. The area of the image plane where there is a significant difference between these images indicates the pixel locations of the moving objects [22]. These objects, which are represented by groups of pixels, are then separated from the background image by using a thresholding technique.
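The frame-differencing test of Eqs. (1)-(2) and the thresholding step described above can be sketched as follows. This is a minimal NumPy illustration; the threshold values are assumed placeholders, not values from the paper:

```python
import numpy as np

def normalized_sad(frame_i, frame_j):
    """Eq. (1): sum of absolute differences between two frames, scaled by N."""
    n = frame_i.size  # N, the number of pixels, used as the scaling factor
    diff = np.abs(frame_i.astype(np.int32) - frame_j.astype(np.int32))
    return diff.sum() / n

def motion_detected(frame_i, frame_j, motion_thresh=2.0):
    """Eq. (2): with no motion D(t) is (near) zero; otherwise flag motion."""
    return normalized_sad(frame_i, frame_j) > motion_thresh

def foreground_mask(frame_rgb, background_rgb, T=30):
    """Per-pixel test: a mean absolute R, G, B difference from the
    background model above threshold T marks a foreground pixel."""
    diff = np.abs(frame_rgb.astype(np.int32) - background_rgb.astype(np.int32))
    return (diff.mean(axis=2) > T).astype(np.uint8)  # 1 = foreground
```

For identical frames D(t) = 0 and no motion is reported; a frame differing from the background in one region yields a mask that is 1 exactly over that region.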

80 2010 International Conference on Signal and Image Processing


[Flow chart: sequence of video frames → motion detected? (if no, end) → background modelling → background subtraction / foreground object extraction → shadow detection and removal → morphology process → occlusion detection → draw bounding box and human object tracking]
Fig.1. Flow chart for the human object tracking

The mode model was chosen to perform the background modeling, as it provides better results. If the absolute difference between the current pixel and the mode-modelled background pixel is larger than a threshold, that pixel is considered a foreground pixel [23, 24]. The RGB values of the current frame's pixels are subtracted from those of the background model frame, and the mean of the absolute differences of the red, green and blue values is computed. If this absolute difference is greater than the threshold, the pixel is a foreground pixel; otherwise it is a background pixel. Foreground pixels are detected as

    D(x, y) = 1, if |I(x, y) − B(x, y)| > T; 0, otherwise    (3)

where I is the current pixel intensity value, B is the background intensity value and T is the foreground threshold.

Fig.2. Background subtraction and moving object identification: (a) Video Frame, (b) Background, (c) Background Subtracted Image and (d) B/W Frames Showing Objects

Fig. 2 shows the video frames used for background subtraction and moving object identification. In Fig. 2(b), the background frame that has been used for constructing the background model is shown. The foreground and background information are identified and finally subtracted to identify the objects present in the foreground frame, shown in Fig. 2(c). In Fig. 2(d), the final outputs are shown, where the objects present in the frame are converted to black and white pixels for effective identification of the objects.

B. Shadow Detection and Removal

Once the foreground object is identified, each foreground pixel is checked as to whether it is part of a shadow or of the object. This process is necessary, since the shadows of some background objects may get combined with the foreground object, which makes the object tracking task complicated. For a pixel (x, y) in a shadowed region, the Normalized Cross-Correlation (NCC) in a neighboring region B(x, y) is computed, and the shadow can be detected using the test given below:

    NCC(x, y) ≥ L_ncc    (4)

where L_ncc is a fixed threshold. If L_ncc is low, several foreground pixels corresponding to moving objects may be misclassified as shadows. On the other hand, selecting a larger value for L_ncc results in fewer false positives, but pixels belonging to actual shadows may not be detected [25].

The NCC for a pixel at position (i, j) is given by



    NCC(i, j) = ER(i, j) / √(EB(i, j) · ET_ij)    (5)

where

    ER(i, j) = Σ_{n=−N..N} Σ_{m=−N..N} B(i+n, j+m) · T_ij(n, m)    (6)

    EB(i, j) = Σ_{n=−N..N} Σ_{m=−N..N} B(i+n, j+m)²    (7)

and

    ET_ij = Σ_{n=−N..N} Σ_{m=−N..N} T_ij(n, m)²    (8)

Here B(i, j) is the background image and T_ij(n, m) is the template on the current image, i.e. the neighborhood of pixel (i, j) in the current frame.

Fig.3. Background subtraction and shadow removal: (a) Video Frame, (b) Background Subtraction, (c) BW Image Showing Objects and (d) Shadow Detection

Fig. 3 shows video frames with shadows and their identification. The foreground and background objects are identified and are shown in Fig. 3(b) and (c). In Fig. 3(c), Eqs. (5)-(8) are applied and the shadows are identified; the results are shown in Fig. 3(d). In this approach, two dilation operations and one erosion operation were carried out, and objects having an area of less than 0.5% of the total image area were removed.

C. Occlusion Detection

When two moving objects come close to each other, the background-subtracted frame shows them as a single object. This situation is called occlusion, and it creates problems while tracking the two objects. In this approach, an algorithm is proposed for detecting the occlusion; it reports the frame number where the occlusion has taken place. If the number of objects in the frame increases suddenly, it indicates the entry of new objects into the frame or the separation of occluded objects. Conversely, a sudden reduction in the number of objects present in the frame indicates the occlusion of two or more objects or the exit of objects from the frame. This situation is tested experimentally and is depicted in Fig. 4(a)-(d).

Fig.4. Occlusion detection and background subtraction: (a) Video Frame, (b) Background Subtraction, (c) BW Image Showing Objects and (d) Occlusion Detection

As Fig. 4(c) shows, the two objects are combined and treated as a single object.

III. EXPERIMENTAL RESULTS

The experimental results are presented to show that the proposed methods can achieve promising performance in background subtraction and foreground object extraction. The system detects and tracks the moving objects exactly. In this approach, the background scene is modeled using a set of background image frames, which basically consists of 5-30 consecutive frames. The object pixels are segmented out from the background, followed by post-processing morphological operations such as dilation and erosion to eliminate noisy pixels, thus producing better results.

Fig. 5(a)-(f) shows the object tracking process. This video was made in our laboratory, and multiple objects were tracked. In Fig. 5(a), the original video frame is shown. The background frame is presented in Fig. 5(b) and has been used for background modeling. Using Figs. 5(a) and (b), the foreground and background information are extracted and subtracted to obtain the target object, which is shown in Fig. 5(c). The tracking process of the target objects is shown in Fig. 5(d)-(f). It is observed that the target objects are bounded by a rectangular boundary, and the proposed work detects the objects effectively.
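Putting Eqs. (4)-(8) together, the per-pixel shadow test of Section II-B can be sketched as below. The window half-size N and the threshold L_ncc are assumed example values, and the square root in the denominator follows the usual normalized cross-correlation form:

```python
import numpy as np

def ncc(background, frame, i, j, N=2):
    """Eqs. (5)-(8): correlation between the background patch B and the
    template T_ij, the (2N+1)x(2N+1) patch of the current frame at (i, j)."""
    B = background[i - N:i + N + 1, j - N:j + N + 1].astype(np.float64)
    T = frame[i - N:i + N + 1, j - N:j + N + 1].astype(np.float64)
    er = (B * T).sum()             # Eq. (6)
    eb = (B * B).sum()             # Eq. (7)
    et = (T * T).sum()             # Eq. (8)
    return er / np.sqrt(eb * et)   # Eq. (5)

def is_shadow_candidate(background, frame, i, j, L_ncc=0.95, N=2):
    """Eq. (4): pre-classify the pixel as shadow when NCC >= L_ncc."""
    return ncc(background, frame, i, j, N) >= L_ncc
```

A shadowed pixel darkens the background roughly by a constant factor, so its patch stays strongly correlated with the background; as noted in Section II-B, lowering L_ncc misclassifies moving foreground as shadow, while raising it misses real shadows.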

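The clean-up reported above (two dilations, one erosion, then removal of blobs smaller than 0.5% of the image area) can be sketched without any image-processing library; the 4-neighbourhood structuring element is an assumed choice, not specified in the paper:

```python
import numpy as np
from collections import deque

def dilate(m):
    """One 4-neighbourhood binary dilation."""
    out = m.copy()
    out[1:, :] |= m[:-1, :]
    out[:-1, :] |= m[1:, :]
    out[:, 1:] |= m[:, :-1]
    out[:, :-1] |= m[:, 1:]
    return out

def erode(m):
    """One 4-neighbourhood binary erosion (boundary handled loosely)."""
    out = m.copy()
    out[1:, :] &= m[:-1, :]
    out[:-1, :] &= m[1:, :]
    out[:, 1:] &= m[:, :-1]
    out[:, :-1] &= m[:, 1:]
    return out

def clean_mask(mask):
    """Two dilations, one erosion, then drop connected components
    smaller than 0.5% of the image area, as in the experiments."""
    m = erode(dilate(dilate(mask)))
    min_area = 0.005 * mask.size
    keep = np.zeros_like(m)
    seen = np.zeros_like(m)
    for x in range(m.shape[0]):
        for y in range(m.shape[1]):
            if m[x, y] and not seen[x, y]:
                blob, q = [], deque([(x, y)])   # flood-fill one component
                seen[x, y] = True
                while q:
                    i, j = q.popleft()
                    blob.append((i, j))
                    for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        a, b = i + di, j + dj
                        if (0 <= a < m.shape[0] and 0 <= b < m.shape[1]
                                and m[a, b] and not seen[a, b]):
                            seen[a, b] = True
                            q.append((a, b))
                if len(blob) >= min_area:       # keep only large blobs
                    for i, j in blob:
                        keep[i, j] = True
    return keep
```

On a mask containing one real object and one isolated noise pixel, the object survives and the noise pixel is removed by the area filter.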


Fig.5. Object tracking process: (a) Video Sequences, (b) Background Model, (c) Background Subtraction, (d) Multiple Object Tracking with Bounding Box, (e) & (f) Occlusion Detection

A. Comparative results

The Kalman filter approach [26] is used for detecting and tracking the human objects in the video sequences, and this algorithm is compared with the proposed approach for performance evaluation. Fig. 6(a) shows the tracking strategy of the Kalman filter algorithm and Fig. 6(b) shows the tracking strategy of the proposed technique.

While comparing the performance of the Kalman filter and the proposed approach, the following issues are observed. (i) In most of the frames, moving human objects that remain static over consecutive frames are not tracked by the Kalman filter algorithm. In the proposed approach, the background is modelled from the first frame over some "n" frames; thus, the background model helps the proposed approach to track static human objects without failure. (ii) The computation time for tracking the human objects in the video sequences using the proposed approach is very low compared to the Kalman filter algorithm.

Fig.6. Performance comparison: (a) Kalman filter's results, (b) Proposed background subtraction and shadow removal results
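The counting rule used for occlusion detection in Section II-C, which also drives the bookkeeping in these experiments, can be sketched as follows; the function names and labels are illustrative, not from the paper:

```python
def classify_count_change(prev_count, curr_count):
    """Frame-to-frame blob count rule from Section II-C: an increase
    signals new objects entering or occluded objects separating; a
    decrease signals the onset of occlusion or objects leaving."""
    if curr_count > prev_count:
        return "entry_or_separation"
    if curr_count < prev_count:
        return "occlusion_or_exit"
    return "no_change"

def occlusion_events(counts_per_frame):
    """Report the frame numbers where the blob count dropped, i.e.
    where an occlusion (or an exit) is flagged."""
    return [k for k in range(1, len(counts_per_frame))
            if counts_per_frame[k] < counts_per_frame[k - 1]]
```

For example, `occlusion_events([1, 2, 2, 1, 2])` flags frame 3, where two blobs merged into one; the later rise back to two blobs would be classified as a separation.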



IV. CONCLUSION

In this paper, an approach capable of detecting motion and extracting object information, with humans as the objects, has been described. The algorithm involves modeling the desired background as a reference model, which is later used in background subtraction to produce the foreground pixels, i.e. the deviation of the current frame from the reference frame. The deviation, which represents the moving objects within the analyzed frame, is further processed to localize the objects and extract their information. Occlusion has also been dealt with effectively.

ACKNOWLEDGMENT

The work done by Dr. A. Vadivel is supported by a research grant from the Department of Science and Technology, India, under Grant SR/FTP/ETA-46/07 dated 25th October, 2007 and DST/TSG/ICT/2009/27.

REFERENCES

[1] Crane, H.D. and Steele, C.M., "Translation-tolerant Mask Matching using Noncoherent Reflective Optics", Pattern Recognition, vol.1, no.2, pp.129-136.
[2] Grassl, C., Zinsser, T. and Niemann, H., "Illumination Insensitive Template Matching with Hyperplanes", Proc. 25th Pattern Recognition Symposium (DAGM '03), vol. 2781 of Lecture Notes in Computer Science, pp.273-280, Springer-Verlag, Magdeburg, Germany.
[3] Comaniciu, D., Ramesh, V. and Meer, P., "Real-time Tracking of Non-rigid Objects using Mean Shift", IEEE Conference on Computer Vision and Pattern Recognition (CVPR'00), vol.2, pp.142-149.
[4] Comaniciu, D., Ramesh, V. and Meer, P., "Kernel-Based Object Tracking", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.25, no.5, pp.564-577.
[5] Collins, R. and Liu, Y., "On-line Selection of Discriminative Tracking Features", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.27, no.10, pp.1631-1643.
[6] Nguyen, H.T. and Smeulders, A., "Tracking aspects of the foreground against the background", European Conference on Computer Vision (ECCV), Proceedings, vol.2, pp.446-456.
[7] Kass, M., Witkin, A. and Terzopoulos, D., "Snakes: active contour models", International Journal of Computer Vision, vol.1, no.4, pp.321-331.
[8] Isard, M. and Blake, A., "Contour tracking by stochastic propagation of conditional density", European Conference on Computer Vision, Proceedings, vol.1, pp.343-356.
[9] Heisele, B., "Motion-based Object Detection and Tracking in Color Image Sequences", Asian Conference on Computer Vision.
[10] Heisele, B., Kressel, U. and Ritter, W., "Tracking Non-Rigid Moving Objects Based on Color Cluster Flow", Conference on Computer Vision and Pattern Recognition, Proceedings, pp.253-257.
[11] Khan, Z., Balch, T. and Dellaert, F., "An MCMC-based Particle Filter for Tracking Multiple Interacting Targets", 8th European Conference on Computer Vision (ECCV), Proceedings, vol.4, pp.279-290.
[12] Tao Zhao and Ram Nevatia, "Tracking Multiple Humans in Crowded Environment", IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'04), vol.2, pp.406-413.
[13] Siebel, N.T. and Maybank, S., "Fusion of Multiple Tracking Algorithms for Robust People Tracking", 7th European Conference on Computer Vision (ECCV), Proceedings, vol. IV, pp.373-387.
[14] Ying Wu, Gang Hua and Ting Yu, "Switching Observation Models for Contour Tracking in Clutter", IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'03), vol.1, pp.295-302.
[15] Li, P. and Zhang, T., "Visual Contour Tracking Based on Particle Filter", Image and Vision Computing, vol.21, pp.111-123.
[16] Vermaak, J., Perez, P., Gangnet, M. and Blake, A., "Towards Improved Observation Models for Visual Tracking: Selective Adaptation", 7th European Conference on Computer Vision (ECCV), vol.1, pp.645-660.
[17] Sidenbladh, H. and Black, M.J., "Learning Image Statistics for Bayesian Tracking", IEEE International Conference on Computer Vision (ICCV), vol.2, pp.709-716.
[18] Hartigan, J. and Wong, M., "Algorithm AS 136: A K-Means Clustering Algorithm", Journal of the Royal Statistical Society, Series C (Applied Statistics), vol.28, no.1, pp.100-108.
[19] Yilmaz, A., Li, X. and Shah, M., "Object Contour Tracking Using Level Sets", Asian Conference on Computer Vision (ACCV), Korea.
[20] Hua, C., Wu, H., Chen, Q. and Wada, T., "K-means Tracker: A General Algorithm for Tracking People", Journal of Multimedia, vol.1, no.4, pp.46-53.
[21] R. Cucchiara, C. Grana, M. Piccardi and A. Prati, "Detecting moving objects, ghosts, and shadows in video streams", IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(10):1337-1342, October 2003.
[22] I. Haritaoglu, D. Harwood and L.S. Davis, "W4: Real-time surveillance of people and their activities", IEEE Transactions on Pattern Analysis and Machine Intelligence, 2000, pp.809-830.
[23] Guisheng Yin, Yanbo Li and Jing Zhang, "The Research of Video Tracking System Based on Virtual Reality", International Conference on Internet Computing in Science and Engineering, 2008.
[24] Swantje Johnsen and Ashley Tews, "Real-Time Object Tracking and Classification Using a Static Camera", Proceedings of the IEEE ICRA Workshop on People Detection and Tracking, Kobe, Japan, May 2009.
[25] Julio Cezar Silveira Jacques Jr., Claudio Rosito Jung and Soraia Raupp Musse, "Background Subtraction and Shadow Detection in Grayscale Video Sequences", IEEE Proceedings of the XVIII Brazilian Symposium on Computer Graphics and Image Processing (SIBGRAPI'05), 2005.
[26] Shiuh-Ku Weng, Chung-Ming Kuo and Shu-Kang Tu, "Video object tracking using adaptive Kalman filter", Journal of Visual Communication and Image Representation, vol.17, no.6, pp.1190-1208, 2006.

