Abstract— This paper provides a benchmark of the performance of 23 classical and state-of-the-art background subtraction (BS) algorithms on visible range and near infrared range videos in the Singapore Maritime dataset. Importantly, our study indicates the limitations of the conventional performance evaluation criteria for maritime vision and proposes new performance evaluation criteria that are better suited to this problem. This paper provides insight into the specific challenges of BS in maritime vision. We identify four open challenges that plague BS methods in the maritime scenario. These include spurious dynamics of water, wakes, ghost effect, and multiple detections. Poor recall and extremely poor precision of all the 23 methods, which have been otherwise successful for other challenging BS situations, allude to the need for new BS methods custom designed for maritime vision.

Index Terms— Maritime vehicles, autonomous automobiles, background subtraction, object detection, computer vision.

I. INTRODUCTION

…of high spatio-temporal correlation, as foreground objects and waves present high spatio-temporal correlations [3]. See [1] and [4] for a more extensive discussion of the challenges of the maritime vision problem, including BS.

In Table I, we compare the performance of 22 algorithms from Sobral's background subtraction library [5] on four datasets: the background models challenge (BMC) dataset [6], the change detection (CD) dataset [7], and the Singapore Maritime (SM) dataset [1] containing both visible (SMV) and infrared (SMIR) videos. The open source availability of the source codes and the previous benchmarking results [5], [7] were the reasons for the selection of these methods. The details of computing the precision and recall are discussed in sections IV and V. We mention here that the true positive detections used for computing the precision and recall in Table I are determined using an intersection over union (IOU) ratio of more than 0.5. The best performing methods for…
TABLE I
THE PERFORMANCE FOR BMC [5], CD [7], SMV AND SMIR DATASETS. THE DATA CORRESPONDS TO 22 METHODS IN SECTION III
TABLE II
DETAILS OF THE SINGAPORE-MARITIME DATASET USED IN THIS PAPER

…of researchers in this exciting and challenging problem with significant impact on the future of maritime technology.

The dataset and methods used in this study are described in sections II and III, respectively. Our methodology is presented in section IV. The metrics for analyzing the performance of object detection through BS are presented in section V. Results, both quantitative and qualitative, are presented in section VI. A discussion on the study is presented in section VII. The study is concluded in section VIII.

II. DATASET

The Singapore Maritime dataset [1] is composed of videos acquired in high definition format (1080 × 1920 pixels) using Canon 70D cameras around Singapore waters. The dataset has on-shore (camera fixed on a platform) and on-board (camera on-board a moving vessel) videos. The ground truths (GT) for the objects were annotated by volunteers unrelated to the project, using annotation tools developed in Matlab. The objects were enclosed in bounding boxes. The dataset and annotation files of the GTs are available on the project webpage https://sites.google.com/site/dilipprasad/home/singapore-maritime-dataset. We use only on-shore videos in this paper to focus on inherent problems of BS in the maritime scenario that are unrelated to the motion of the ship on which the camera is mounted. The on-shore videos are further categorized into visible range and near infrared (NIR) range videos. The NIR videos are acquired using a modified Canon 70D camera with its hot filter removed and a visible range blocking filter added. The details of the dataset are given in Table II.

III. METHODS USED FOR BENCHMARKING

As mentioned earlier, we use 22 BS methods whose source codes are available at https://github.com/andrewssobral/bgslibrary. We also include a recent method, namely generalized fused lasso foreground modeling [16], whose Matlab source code is available. We performed parameter optimization to achieve reasonable performance since the default parameters were completely unsuitable for the maritime problem. We categorize the methods used in this study according to the survey in [5] and briefly review each method in this section. These categories are: basic methods that employ basic statistical techniques for modeling the dynamics of the background; methods that model the background using Gaussian distributions, a large and expanding family of methods; methods that model the background using complex statistical distributions other than Gaussian, also an expanding family; methods that use texture and color descriptors, comprising the family of the popular local binary and ternary patterns; and other machine learning based methods, which employ mathematical models of the background that can be solved as some form of optimization problem incorporating priors. Existing maritime background subtraction methods often ignore the temporal information and estimate the background for each frame independently [1]. In a few cases, Gaussian models have been used with limited success, see [1]. We also note the independent multimodal background subtraction method (IMBS) [17], which has been developed specifically for the maritime problem. Default parameters have been used for all the methods being benchmarked, unless stated otherwise. For further details of the methods, please see [5], [16], and references therein.

We note that the BS methods considered in this paper are not exhaustive and it would be ideal to include all BS approaches. Interested readers are referred to review articles [5], [18]–[20], [21]–[24], references therein, and other recent articles [25]–[45]. These methods have not been tested for various reasons, such as unavailability of source code, insufficient information for faithful reproduction of the algorithm, unsuitability to the maritime environment, and small increment over the methods included in this study. In this sense, we deem that benchmarking on these 23 methods is sufficient to obtain insights about the effectiveness of BS methods in the maritime domain and opens the field for richer evaluation by other researchers active in the development of BS approaches in computer vision.

A. Basic BS Methods

1) Mean Background [8]: This method computes the background in the current frame as the mean image of the last T frames. The binary foreground mask is computed using a simple threshold of the difference between the current frame and the background. Due to the default small value of T, it can adapt fast but can only detect those objects which are subject to significant movement in these frames.

2) Adaptive Median Background [14]: This method initializes the background as the median of the first T0 frames. Then, after every T frames, the background is updated in selected regions such that the updated intensities in these regions are
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.
closer to the corresponding intensities in the current frame. The background regions to update are determined adaptively in each update interval. This method is suitable for slowly varying backgrounds and relatively fast moving objects.

3) Prati's Median Background [46]: This method is different from [14] in three major ways. First, it updates its background model using a temporal median filter over the last T frames and the last estimate of the background. Second, it computes the thresholds for determining background and foreground adaptively. And third, it uses a fast bootstrapping method for background initialization which involves partitioning the image region into blocks, adaptively adjusting blocks that demonstrate high difference in consecutive frames, and quickly converging to stable blocks such that a stable initialization of the background is obtained. Through these features, it is expected to perform better in vacillating backgrounds such as those encountered in maritime videos.

B. Methods That Use Gaussian Background Models

1) Grimson's Gaussian Mixture Model (GMM) [47]: Grimson's GMM models the temporal distribution of the intensities (or color values) at a pixel as a mixture of Gaussian distributions, which are adaptively updated using an online K-means approximation approach. Then the spatial distribution of the parameters and weights of the fitted GMMs is used to adaptively estimate the background as the GMMs update after each frame. Large deviations from the current background model indicate foreground pixels. This approach allows for a variety of backgrounds with large standard deviations (such as rustle of leaves or water ripples) and long term temporal changes to be effectively modeled.

2) Wren's Gaussian Average (GA) [48]: Conceptually, Wren's GA is similar to Grimson's GMM, but it fits only one Gaussian distribution per pixel. The mean values of the Gaussian distributions at the pixels are used to model the background. Online learning of the background is made possible through the update of the Gaussian distributions for all the pixels in each frame. Such a method accommodates long term slow temporal changes in the background but allows only one form of background distribution per pixel. Thus, in comparison to Grimson's GMM, it is expected to be less effective in dealing with wakes as background.

3) Simple Gaussian [49]: This implementation is conceptually similar to Wren's GA [48], but it utilizes likelihood maximization for updating the parameters of the Gaussian distribution at each pixel.

4) Zivkovic's Adaptive GMM (AGMM) [13]: Zivkovic's AGMM models both background and foreground as GMMs. Thus, although the GMMs can be fit for each pixel, Gaussian distributions with small weights would indicate the foreground. The need to include a new Gaussian distribution in the GMM is also an indicator of the foreground. In addition to the online learning of the background through this concept, Zivkovic's AGMM incorporates forgetting of the old background model through the use of a suitable constant weight for updating the parameters and weights of the GMMs.

5) Mixture of Gaussians (MoG) [5]: This is a hybrid of Grimson's GMM [47] and Zivkovic's AGMM [13], where the GMM model and update are derived from [47] but the number of Gaussian distributions used to represent the background is determined through [13].

6) Fuzzy Gaussian [9]: This method models the background as a Gaussian low pass filter and adaptively updates the background using a fuzzy running average scheme. The decision on foreground pixels is also taken using fuzzy logic.

7) T2 Fuzzy GMM (T2FGMM) [15]: It models the background as a GMM with either uncertain means (UM) or uncertain variances (UV) specified by fuzzy logic. The background is made adaptive by updates of the fuzzy functions for mean or variance.

C. Methods That Use Other Statistical Background Models

1) Kernel Density Estimation (KDE) [12]: KDE models the background as composed of non-parametric kernels, thus allowing non-Gaussian distributions of the background to be modeled. The adaptation of the background occurs as an online tweak of the probability density function. The intensities with larger probabilities correspond to background. In the current implementation, a Gaussian kernel of zero mean is used.

2) VuMeter [50]: VuMeter is a KDE based non-parametric background model. However, instead of using Gaussian kernels [12], it uses Kronecker delta kernels. A simple temporal update of the model is used and foreground detection is further cast as likelihood maximization of Gaussian particle filters representing the foreground segmentations.

3) Independent Multimodal Background Subtraction (IMBS) [17]: IMBS has three components. The first component is an on-line clustering algorithm. The RGB values observed at a pixel are represented by histograms of variable bin size. This allows for modeling of non-Gaussian and irregular intensity patterns. The second component is a region-level understanding of the background for updating the background model. The regions with persistent foreground for a certain number of frames are included as background in the updated background model. The third component is a noise removal module that helps to filter out false detections due to shadows, reflections, and boat wakes. It models wakes as outliers of the foreground, forming a second background model specifically for such outliers. It is the only method in this list which has been developed specifically for the maritime computer vision problem.

D. Methods That Use Texture and Color Descriptors

1) Texture as Background [51]: The background is modeled as composed of textures described using histogram vectors (HVs) of local binary patterns (LBPs). At each pixel, the weights of the LBPs are computed. The weight vector is compared against the stored HVs of the background. A small intersection implies a foreground pixel. A large intersection prompts the update of the HVs of the LBPs of the background.

2) Multicue [52]: It uses multiple cues such as texture, color, and region appearance. A texture model with scene adaptive LBP descriptors is used to detect the initial foreground regions. Color value statistics are integrated with
texture descriptors to refine the results of texture based foreground segmentations. Lastly, the Hausdorff distance between the edge map and the segmented foreground regions is used to refine the foreground. As already seen in Fig. 3, Multicue may not perform well for the maritime problem if the texture is not appropriately modeled.

3) Local Binary Similarity Segmenter (LOBSTER) [53]: LOBSTER uses local binary patterns with self-similarity throughout the images. It is adapted from ViBe [54] and replaces ViBe's pixel-intensity level description of the background with spatiotemporal similarity of local binary patterns. The model is updated using simple rules based on the distance between the similarity patterns.

4) Self-Balanced Sensitivity Segmenter (SuBSENSE) [11]: SuBSENSE employs pixel level change detection using comparisons of colors and local binary similarity patterns. It uses an adaptive feedback mechanism for updating the background features as the background dynamics change with time.

E. Other Machine Learning Based Methods

1) Eigen-Background [10]: In this technique, M eigenimages of the last N frames are incrementally computed with each frame. These eigenimages encode static and large scale dynamic features, including long term temporal changes such as in illumination conditions. Thus, they are suitable for small foreground objects and are expected to deal with wakes which persist for a long duration.

2) Adaptive Self Organizing Maps (SOM) [55]: In order to mimic human neural systems, it uses the hue-saturation-value color space and trains a neural network-based self-organizing map in which the background at each pixel is represented by the weights of multiple neurons in an intermediate layer. It is shown in [55] to be amenable to moving backgrounds, gradual illumination variations, camouflage, and shadows.

3) Fuzzy Adaptive SOM (ASOM) [56]: It has two major enhancements over adaptive SOM. First, it introduces a spatial coherence variant to its neural network. Second, it uses fuzzy logic to make soft decisions about classifying pixels as background or foreground. Consequently, this method has lower false positives and better robustness in comparison to adaptive SOM.

4) Sigma Delta [57]: The Sigma-Delta background employs a recursive non-linear operator called the Σ-Δ filter. This filter emulates a simple counter that adds one to the output if the current signal is more than the previous signal and deducts one from the output if the current signal is less than the previous signal [58]. The method is made robust using a spatio-temporal regularizer. Multiple Σ-Δ filters with different time constants are used to accommodate complex background dynamics.

5) Generalized Fused Lasso Foreground Modeling (GFLFM) [16]: This method is similar to eigen-background in representing the background as a low-ranked sub-space. Foreground estimation is made robust through generalized fused lasso regularization and incorporation of a penalty for total variation. The Matlab code of the method, provided on the project website of [16], is used for generating the results. The computation time of GFLFM is prohibitively long. Thus, we have provided results for visible range videos only.

IV. METHODOLOGY

All videos are down-sampled in spatial dimensions by a factor of 2 for methods in the BGS library and by a factor of 4 for GFLFM. Each BS method generates a binary image in which the background is assigned 0 and the foreground region is assigned 1. All the results are generated on a system with an Intel Xeon E5-1650 CPU @ 3.2 GHz and 64 GB RAM. The holes in the foreground binary mask are filled and closed regions are detected. The bounding boxes of the closed regions are determined. Bounding boxes of dimensions smaller than 10 pixels are considered as spurious detections due to noise and water dynamics and are removed from further processing. Each of the remaining bounding boxes is considered as a detected object.

For each pair of detected object (DO) and ground truth object (referred to as GT), we compute a performance metric C (see section V for the metrics considered). We note that it is preferable to use the shape segmentation of the GT and DO. However, obtaining shape segmentations manually for all the frames of maritime videos is tedious and resource demanding, and ultimately a heuristic exercise. Thus, we use the bounding boxes of the GT and DO objects in maritime videos for performance evaluation. The metrics have been defined such that if the DO-GT pair does not qualify certain thresholds, the value of the metric is assigned to be −∞. Then, we use the Hungarian algorithm [59] to find one-to-one correspondences (i.e., only one DO should be considered as representing a GT) between the detected objects and the ground truth objects. We use 1 − C as the input argument for the Hungarian method. Indeed, the pairs of DO-GT with metric value −∞ do not result in any correspondences. The other possible correspondences are ranked to minimize the value of 1 − C after all correspondences are complete. The total number of correspondences is used as the number of true positives (TP). The number of detected objects for which correspondences could not be determined corresponds to false positives (FP). Similarly, the number of ground truth objects for which correspondences could not be found is used as false negatives (FN).

Then precision, recall, and F-score are determined as follows:

Precision = (Σ_{t=1}^{T} TP_t) / (Σ_{t=1}^{T} (TP_t + FP_t)); Recall = (Σ_{t=1}^{T} TP_t) / (Σ_{t=1}^{T} (TP_t + FN_t))  (1)

F-score = (2 · Precision · Recall) / (Precision + Recall)  (2)

where t denotes the frame and T is the total number of frames.

V. METRICS FOR PERFORMANCE EVALUATION

In computer vision, object detection is usually evaluated by the metric known as Intersection over Union (IOU), which is defined as

IOU = Area(GT ∩ DO) / Area(GT ∪ DO)  (3)
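The matching and scoring pipeline described above can be sketched in code. The following is an illustrative reimplementation, not the authors' code: the corner-format bounding boxes, the helper names, and the use of SciPy's linear_sum_assignment as the Hungarian solver are our own assumptions, and the metric C is taken to be IOU with the 0.5 qualification threshold used for Table I.

```python
# Illustrative sketch of the DO-GT matching of Sections IV-V, with C = IOU.
# Box format (x1, y1, x2, y2) and function names are assumptions, not the
# paper's code.
import numpy as np
from scipy.optimize import linear_sum_assignment  # Hungarian algorithm [59]

BIG = 1e6  # stands in for the paper's -inf metric (cost 1 - C becomes huge)

def iou(a, b):
    """Intersection over Union of two axis-aligned boxes, Eq. (3)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter > 0 else 0.0

def match_frame(dos, gts, thresh=0.5):
    """One-to-one DO-GT correspondences for one frame; returns (TP, FP, FN)."""
    if not dos or not gts:
        return 0, len(dos), len(gts)
    cost = np.full((len(dos), len(gts)), BIG)
    for i, d in enumerate(dos):
        for j, g in enumerate(gts):
            c = iou(d, g)
            if c >= thresh:          # pairs failing the threshold never match
                cost[i, j] = 1.0 - c
    rows, cols = linear_sum_assignment(cost)   # minimizes total 1 - C
    tp = sum(1 for r, c in zip(rows, cols) if cost[r, c] < BIG)
    return tp, len(dos) - tp, len(gts) - tp

def scores(per_frame):
    """Precision, recall, F-score accumulated over frames, Eqs. (1)-(2)."""
    tp = sum(f[0] for f in per_frame)
    fp = sum(f[1] for f in per_frame)
    fn = sum(f[2] for f in per_frame)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f
```

Note that, as in Eq. (1), the per-frame (TP, FP, FN) counts are summed over all T frames before the ratios are taken, rather than averaging per-frame precision and recall.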
Fig. 1. The ticks indicate true detections while the crosses indicate false detections. (a) Metric CIOU. (b) Metric CIOG. (c) Metric CBEP.
Fig. 2. Comparison of metrics for three methods providing superior quantitative results with balanced precision and recall on visible range videos.
Fig. 4. Illustration of why the metric Cbottom may really be needed for some cases. Black boxes indicate the ground truth. The LOBSTER method is used to detect the objects (red and green boxes). Green and red boxes indicate the DOs identified as true positives and false positives, respectively.
Fig. 6. Statistics of the performance metrics for visible range videos. The colors denote the different types of methods and match the colors used in Table III.
TABLE III
MEAN VALUES OF PRECISION, RECALL, AND F-SCORE OF DIFFERENT METHODS FOR VISIBLE RANGE VIDEOS
…maritime problem. Here again, the role of temporal scales of learning becomes relevant. Stationary or slow moving objects correspond to almost no additional learning over a long period of time except due to illumination changes or occlusion. This property may be useful in detecting stationary objects in the maritime background.

c) Methods that use other statistical background models: Among these methods (Fig. 8(b)), KDE is more sensitive to SDOW, although not as sensitive as simple Gaussian, fuzzy Gaussian, etc. KDE also suffers from the ghost effect. While IMBS is better among these methods for detecting foreground pixels, even those related to stationary or slow moving foreground objects, it will benefit from some form of region growing techniques as post-processing. We note that IMBS is ineffective for modeling wakes as foreground, as seen in the second example in Fig. 8(b). It suffers from a considerably alleviated form of the ghost effect, where it is sensitive to recent motion but not to prolonged history, as seen in the first example in Fig. 8(b).

d) Methods that use texture and color descriptors for background: Except the basic texture method, the other three methods are very effective in suppressing SDOW and wakes. Multicue and LOBSTER suffer from the ghost effect, although LOBSTER is less severely affected in comparison to the other methods that are afflicted by the ghost effect. Multicue has a tendency of forming foreground blobs much larger than the foreground. In fact, Multicue may easily bleed the foreground blobs into the water region and incorrectly detect a large portion of water as foreground. This is evident in the second example of Fig. 8(d) as well as Fig. 1(c). On one hand, the Multicue method ensures that the suppressed background is unlikely to have any foreground objects. On the other hand, Multicue makes processing of the detected foreground region for finer detection of foreground objects mandatory. Lastly, SuBSENSE is effective in detecting fast moving foreground objects and suppressing all forms of maritime background. It however suppresses the stationary and slow moving objects as background, which is undesirable.

e) Machine learning based methods: All machine learning based methods, except the sigma-delta and GFLFM methods, suffer from the ghost effect. The sigma-delta method suffers from false positives due to wakes. GFLFM is effective in detecting fast moving foreground objects and suppressing SDOW. It, however, suppresses the stationary and slow moving objects as background. It also suffers from MD and an inability to model wakes as background. All the machine learning methods, except GFLFM, are affected by SDOW to some extent.

3) Summary of Results in Visible Range: The quantitative and qualitative results appear conflicting at a cursory glance at Tables III and IV. However, they are actually consistent and quite insightful, as we discuss here. A major observation from Table III is that all methods have poor precision, i.e., they generate a large number of false positive detections. The largest number of false detections is due to the incapability of the current methods to deal with SDOW. The second largest contributor to false detections is wakes and the third largest contributor is MD. Among all the methods used in this dataset, only LOBSTER is effective in dealing with these three causes of false detections. In general, methods that use texture
Fig. 8. Qualitative comparison of the methods using different examples. Yellow boxes indicate the ground truth while the blue boxes indicate the detected
objects. Magenta colored text indicates failure to deal with wakes and blue colored text indicates ghost effect.
and color descriptors (Multicue, LOBSTER, and SuBSENSE) are better at dealing with these issues. Thus, it is not surprising that their precision is superior to that of the other methods.

MD is potentially the easiest problem to tackle among all these issues. If MD is not significant, some form of region growing or graph cut methods may provide simple solutions to the MD problem. However, modeling SDOW as background is more difficult; only 9 methods out of 22 can model SDOW as background. Modeling wakes as background is even tougher; only 6 methods out of 22 are capable of dealing with wakes. We do note that there have been some works on detecting water regions that are considered as background [60], [61]. Lastly, the ghost effect is related to the temporal scales of learning and the motion characteristics of the vessels. Potentially, multiple temporal scales of learning and the incorporation of additional regional cues may help in suppressing the ghost effect.

Four inferences can be drawn from these analyses. First, no non-regional pixel-only method can alone be useful in the maritime vision problem. The incorporation of additional background descriptors, potentially with regional characteristics, is essential. Second, background modeling at multiple temporal scales may be quite critical for suppression of the ghost effect. Third, background suppression for maritime vision problems needs additional region growing or region cleaning procedures to effectively deal with false detections due to SDOW, wakes, and MD. Fourth, new background modeling approaches for SDOW and wakes should be developed.

B. NIR Range Videos

1) Quantitative Evaluation: The performance of the methods for the NIR range videos is compared in Fig. 9 and Table V. Similar to the visible range videos, the precision…
Fig. 9. Statistics of the performance metrics for NIR range videos. The colors denote the different types of methods and match the colors used in Table V.
…across the scene due to the non-linear mapping of the world space coordinates to the sensor coordinates. Further, glint, speckle, color variations induced by illumination conditions, and underwater topography together form spurious dynamics of water with large variation in the spatial scales in the image. The methods that are able to model SDOW well (for example SuBSENSE [11]) usually intensify the issue of MD. MD is often a result of the edge regions of the foreground being assigned to the background. In most situations, this happens because the method models drastic variations as background in order to model SDOW, and the edge regions of the foreground objects also demonstrate such drastic variations.

B. Modeling Wakes

Our study shows that modeling wakes as background is difficult for existing BS methods. Even IMBS [17], developed specifically for maritime vision, shows failure in modeling wakes as background. As with SDOW, methods that model wakes as background suffer from MD.

C. Ghost Effect

The speeds of foreground objects in maritime vision problems vary greatly in both the physical units as well as image units. As a consequence, the temporal scales of learning which are suitable for certain speeds are unsuitable for other speeds. This causes a ghost effect related to the historical trail of the foreground objects for which the temporal scales of learning were unsuitable.

D. Intensity Only Data and Infrared Spectrum

Our results show deteriorated performance of most methods when applied to near infrared (NIR) range videos instead of visible range videos. The poor performance of most methods even in the visible range, and the worse performance in the NIR range, indicate the need for the design of new BS algorithms that can provide better performance for maritime vision in the absence of color cues.

E. Potential Solutions

Improvement in modeling SDOW and wakes involves a trade-off with MD. A potential solution is to choose methods that are effective in modeling SDOW and wakes as the core and address the issue of MD in the post-processing. For instance, growing regions of multiple detection and merging regions with local homogeneous properties after region growing could be used to reduce multiple detections.
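As an illustration of the kind of post-processing suggested here, the following sketch merges detected bounding boxes that overlap or nearly touch, so that fragments of one vessel collapse into a single detection. This is our own minimal sketch, not a procedure from the paper: the gap tolerance and the iterative pairwise fusion rule are assumptions, and a real implementation would also test local homogeneity before fusing.

```python
# Sketch of multiple-detection (MD) reduction by merging nearby boxes.
# The `gap` tolerance (pixels) is an assumed parameter, not from the paper.

def boxes_close(a, b, gap=5):
    """True if boxes (x1, y1, x2, y2) overlap or lie within `gap` pixels."""
    return not (a[2] + gap < b[0] or b[2] + gap < a[0] or
                a[3] + gap < b[1] or b[3] + gap < a[1])

def merge_detections(boxes, gap=5):
    """Repeatedly fuse close boxes into their common bounding box."""
    boxes = [tuple(b) for b in boxes]
    merged = True
    while merged:                      # iterate until no fusion happens
        merged = False
        out = []
        while boxes:
            cur = boxes.pop()
            for i, other in enumerate(boxes):
                if boxes_close(cur, other, gap):
                    # absorb `cur` into the common bounding box of the pair
                    boxes[i] = (min(cur[0], other[0]), min(cur[1], other[1]),
                                max(cur[2], other[2]), max(cur[3], other[3]))
                    merged = True
                    break
            else:                      # no close partner: keep as-is
                out.append(cur)
        boxes = out
    return boxes
```

For example, two fragments of one hull, (0, 0, 10, 10) and (12, 0, 20, 10), fuse into (0, 0, 20, 10), while a distant object is left untouched. A homogeneity check on the water/vessel statistics inside the fused box could be added to avoid merging genuinely distinct objects.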
TABLE V
MEAN VALUES OF PRECISION, RECALL, AND F-SCORE OF DIFFERENT METHODS FOR NIR RANGE VIDEOS
Fig. 11. Qualitative comparison of various methods for example frames of NIR range videos. Yellow boxes indicate the ground truth while the blue boxes
indicate the detected objects. Magenta colored text indicates failure to deal with wakes and blue colored text indicates ghost effect.
[10] N. M. Oliver, B. Rosario, and A. P. Pentland, "A Bayesian computer vision system for modeling human interactions," IEEE Trans. Pattern Anal. Mach. Intell., vol. 22, no. 8, pp. 831–843, Aug. 2000.
[11] P.-L. St-Charles, G.-A. Bilodeau, and R. Bergevin, "Flexible background subtraction with self-balanced local sensitivity," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. Workshops, Jun. 2014, pp. 414–419.
[12] A. Elgammal, D. Harwood, and L. Davis, "Non-parametric model for background subtraction," in Proc. Eur. Conf. Comput. Vis., 2000, pp. 751–767.
[13] Z. Zivkovic, "Improved adaptive Gaussian mixture model for background subtraction," in Proc. 17th Int. Conf. Pattern Recognit., vol. 2, Aug. 2004, pp. 28–31.
[14] N. J. B. McFarlane and C. P. Schofield, "Segmentation and tracking of piglets in images," Brit. Mach. Vis. Appl., vol. 8, no. 3, pp. 187–193, 1995.
[15] Z. Zhao, T. Bouwmans, X. Zhang, and Y. Fang, "A fuzzy background modeling approach for motion detection in dynamic backgrounds," in Multimedia and Signal Processing (Communications in Computer and Information Science), vol. 346, F. L. Wang, J. Lei, R. W. H. Lau, and J. Zhang, Eds. Berlin, Germany: Springer, 2012.
[16] B. Xin, Y. Tian, Y. Wang, and W. Gao, "Background subtraction via generalized fused lasso foreground modeling," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2015, pp. 4676–4684.
[17] D. Bloisi and L. Iocchi, "Independent multimodal background subtraction," in Proc. Int. Conf. Comput. Modeling Objects Presented Images, Fundamentals, Methods Appl., 2012, pp. 39–44.
[18] A. Vacavant, L. Tougne, L. Robinault, and T. Chateau, "Special section on background models comparison," Comput. Vis. Image Understand., vol. 122, pp. 1–3, May 2014.
[19] T. Bouwmans, J. Gonzàlez, C. Shan, M. Piccardi, and L. Davis, "Special issue on background modeling for foreground detection in real-world dynamic scenes," Mach. Vis. Appl., vol. 25, no. 5, pp. 1101–1103, 2014.
[20] T. Bouwmans, "Traditional and recent approaches in background modeling for foreground detection: An overview," Comput. Sci. Rev., vols. 11–12, pp. 31–66, May 2014.
[21] W. Kim and C. Jung, "Illumination-invariant background subtraction: Comparative review, models, and prospects," IEEE Access, vol. 5, pp. 8369–8384, 2017.
[33] G. Ramirez-Alonso, J. Ramirez-Quintana, and M. I. Chacon-Murguia, "Temporal weighted learning model for background estimation with an automatic re-initialization stage and adaptive parameters update," Pattern Recognit. Lett., vol. 96, pp. 34–44, Sep. 2017.
[34] G. Han, J. Wang, and X. Cai, "Background subtraction based on modified online robust principal component analysis," Int. J. Mach. Learn. Cybern., vol. 8, no. 6, pp. 1839–1852, 2017.
[35] D. Berjón, C. Cuevas, F. Morán, and N. García, "Real-time nonparametric background subtraction with tracking-based foreground update," Pattern Recognit., vol. 74, pp. 156–170, Feb. 2018.
[36] M. Balcilar and A. Sonmez, "Background estimation method with incremental iterative re-weighted least squares," Signal, Image Video Process., vol. 10, no. 1, pp. 85–92, 2016.
[37] M. Dkhil, A. Wali, and A. M. Alimi, "An intelligent system for road moving object detection," in Proc. Int. Conf. Hybrid Intell. Syst., vol. 420, 2016, pp. 189–198.
[38] S. Javed, S. H. Oh, A. Sobral, T. Bouwmans, and S. K. Jung, "Background subtraction via superpixel-based online matrix decomposition with structured foreground constraints," in Proc. IEEE Int. Conf. Comput. Vis. Workshop, Dec. 2015, pp. 930–938.
[39] D.-H. Kim and J.-H. Kim, "Effective background model-based RGB-D dense visual odometry in a dynamic environment," IEEE Trans. Robot., vol. 32, no. 6, pp. 1565–1573, Dec. 2016.
[40] J. Son, S. Kim, and K. Sohn, "Fast illumination-robust foreground detection using hierarchical distribution map for real-time video surveillance system," Expert Syst. Appl., vol. 66, pp. 32–41, Dec. 2016.
[41] Y. Yang, Q. Zhang, P. Wang, X. Hu, and N. Wu, "Moving object detection for dynamic background scenes based on spatiotemporal model," Adv. Multimedia, vol. 2017, May 2017, Art. no. 5179013.
[42] H. Sajid and S.-C. S. Cheung, "Universal multimode background subtraction," IEEE Trans. Image Process., vol. 26, no. 7, pp. 3249–3260, Jul. 2017.
[43] D. D. Bloisi, A. Pennisi, and L. Iocchi, "Parallel multi-modal background modeling," Pattern Recognit. Lett., vol. 96, pp. 45–54, Sep. 2017.
[44] R. Trabelsi, I. Jabri, F. Smach, and A. Bouallegue, "Efficient and fast multi-modal foreground-background segmentation using RGBD data," Pattern Recognit. Lett., vol. 97, pp. 13–20, Oct. 2017.
[45] T. Akilan, Q. M. J. Wu, and Y. Yang, "Fusion-based foreground enhancement for background subtraction using multivariate multi-
[22] S. R. E. Datondji, Y. Dupuis, P. Subirats, and P. Vasseur, “A survey of model Gaussian distribution,” Inf. Sci., vols. 430–431, pp. 414–431,
vision-based traffic monitoring of road intersection,” IEEE Trans. Intell. Mar. 2018.
Transp. Syst., vol. 17, no. 10, pp. 2681–2698, Oct. 2016.
[46] S. Calderara, R. Melli, A. Prati, and R. Cucchiara, “Reliable background
[23] S. K. Choudhury, P. K. Sa, S. Bakshi, and B. Majhi, “An evalu- suppression for complex scenes,” in Proc. ACM Int. Workshop Video
ation of background subtraction for object detection vis-a-vis miti- Surveill. Sensor Netw., 2006, pp. 211–214.
gating challenging scenarios,” IEEE Access, vol. 4, pp. 6133–6150, [47] C. Stauffer and W. E. L. Grimson, “Adaptive background mixture models
2017. for real-time tracking,” in Proc. IEEE Comput. Soc. Conf. Comput. Vis.
[24] Z. Chen, T. Ellis, and S. A. Velastin, “Vision-based traffic surveys in Pattern Recognit., vol. 2, Jun. 1999, p. 252.
urban environments,” J. Electron. Imag., vol. 25, no. 5, p. 051206, [48] C. R. Wren, A. Azarbayejani, T. Darrell, and A. P. Pentland, “Pfinder:
2016. Real-time tracking of the human body,” IEEE Trans. Pattern Anal. Mach.
[25] S. Varadarajan, H. Wang, P. Miller, and H. Zhou, “Fast convergence of Intell., vol. 19, no. 7, pp. 780–785, Jul. 1997.
regularised region-based mixture of Gaussians for dynamic background [49] Y. Benezeth, P.-M. Jodoin, B. Emile, H. Laurent, and C. Rosenberger,
modelling,” Comput. Vis. Image Understand., vol. 136, pp. 45–58, “Review and evaluation of commonly-implemented background subtrac-
Jul. 2015. tion algorithms,” in Proc. 19th Int. Conf. Pattern Recognit., Dec. 2008,
[26] K. L. Chan, “Detection of foreground in dynamic scene via two-step pp. 1–4.
background subtraction,” Mach. Vis. Appl., vol. 26, no. 6, pp. 723–740, [50] Y. Goya, T. Chateau, L. Malaterre, and L. Trassoudaine, “Vehicle
2015. trajectories evaluation by static video sensors,” in Proc. IEEE Intell.
[27] A. K. S. Kushwaha and R. Srivastava, “Maritime object segmentation Transp. Syst. Conf., Sep. 2006, pp. 864–869.
using dynamic background modeling and shadow suppression,” Comput. [51] M. Heikkila and M. Pietikäinen, “A texture-based method for modeling
J., vol. 59, no. 9, pp. 1303–1329, Sep. 2016. the background and detecting moving objects,” IEEE Trans. Pattern
[28] G. Gemignani and A. Rozza, “A robust approach for the background Anal. Mach. Intell., vol. 28, no. 4, pp. 657–662, Apr. 2006.
subtraction based on multi-layered self-organizing maps,” IEEE Trans. [52] S. J. Noh and M. Jeon, “A new framework for background subtrac-
Image Process., vol. 25, no. 11, pp. 5239–5251, Nov. 2016. tion using multiple cues,” in Proc. Asian Conf. Comput. Vis., 2012,
[29] Z. Zeng, J. Jia, Z. Zhu, and D. Yu, “Adaptive maintenance scheme for pp. 493–506.
codebook-based dynamic background subtraction,” Comput. Vis. Image [53] P.-L. St-Charles and G.-A. Bilodeau, “Improving background subtraction
Understand., vol. 152, pp. 58–66, Nov. 2016. using local binary similarity patterns,” in Proc. IEEE Winter Conf. Appl.
[30] Z. Qu and X.-L. Huang, “The foreground detection algorithm combined Comput. Vis., Mar. 2014, pp. 509–515.
the temporal–spatial information and adaptive visual background extrac- [54] O. Barnich and M. Van Droogenbroeck, “ViBe: A universal background
tion,” Imag. Sci. J., vol. 65, no. 1, pp. 49–61, 2017. subtraction algorithm for video sequences,” IEEE Trans. Image Process.,
[31] C. Cuevas, R. Martínez, D. Berjón, and N. García, “Detection of vol. 20, no. 6, pp. 1709–1724, Jun. 2011.
stationary foreground objects using multiple nonparametric background- [55] L. Maddalena and A. Petrosino, “A self-organizing approach to back-
foreground models on a finite state machine,” IEEE Trans. Image ground subtraction for visual surveillance applications,” IEEE Trans.
Process., vol. 26, no. 3, pp. 1127–1142, Mar. 2017. Image Process., vol. 17, no. 7, pp. 1168–1177, Jul. 2008.
[32] Z. Zhong, B. Zhang, G. Lu, Y. Zhao, and Y. Xu, “An adaptive back- [56] L. Maddalena and A. Petrosino, “A fuzzy spatial coherence-based
ground modeling method for foreground segmentation,” IEEE Trans. approach to background/foreground separation for moving object detec-
Intell. Transp. Syst., vol. 18, no. 5, pp. 1109–1121, May 2017. tion,” Neural Comput. Appl., vol. 19, no. 2, pp. 179–186, 2010.
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.
[57] A. Manzanera and J. C. Richefeu, “A new motion detection algorithm based on Σ–Δ background estimation,” Pattern Recognit. Lett., vol. 28, no. 3, pp. 320–328, 2007.
[58] S. R. Norsworthy, R. Schreier, and G. C. Temes, Delta-Sigma Data Converters: Theory, Design, and Simulation. Hoboken, NJ, USA: Wiley, 1996.
[59] H. W. Kuhn, “The Hungarian method for the assignment problem,” Naval Res. Logistics Quart., vol. 2, nos. 1–2, pp. 83–97, Mar. 1955.
[60] D. Gao, V. Mahadevan, and N. Vasconcelos, “On the plausibility of the discriminant center-surround hypothesis for visual saliency,” J. Vis., vol. 8, no. 7, p. 13, 2008.
[61] A. Mumtaz, W. Zhang, and A. B. Chan, “Joint motion segmentation and background estimation in dynamic scenes,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2014, pp. 368–375.

Lily Rachmawati received the B.Eng. and Ph.D. degrees from the National University of Singapore in 2004 and 2009, respectively. She has seven years of research experience in maritime technology. She is currently a Staff Technologist with Rolls-Royce, Singapore.