Multi-Object Tracking Errors Minimisation by Visual Similarity and Human Joints Detection

M. J. Gómez-Silva, J.M. Armingol, A. de la Escalera

{magomezs, armingol, escalera}@ing.uc3m.es


Intelligent Systems Lab (LSI) Research Group, Universidad Carlos III de Madrid, Leganés, Madrid, Spain

Keywords: Multi-Object Tracking, Data Association, Convolutional Pose Machine, Degree of Appearance Similarity, Detection-by-Tracking.

Abstract

Multi-object tracking for video surveillance is a challenging task, especially in complex real-world scenes, where crossing people can easily be mismatched and occlusions cause some tracks to be lost. Moreover, tracking-by-detection approaches are badly affected by the inaccuracies of the chosen detector, which greatly increase the number of both false positives and missed objects. This paper presents a methodology to reduce such error rates by means of a human shape validation filter, a detection-by-tracking strategy and a novel matching score, which fuses the geometric data given by the detection of every person's joints with the visual similarity with respect to a previously saved reference. The resulting tracking performance enhancements are shown in a comparative evaluation, which also proves the capacity of this method to deal with the faults of the provided detections.

1 Introduction

Tracking multiple people is one of the main tasks of an Intelligent Surveillance System (ISS), whose purpose is to improve and automate the management of the information provided by an increasing number of security cameras. The multi-object tracking (MOT) task consists of visually finding the location of multiple individuals and conserving their identities in a sequence.

Most MOT algorithms address this problem following the 'tracking-by-detection' paradigm, which joins a people detector and a data association algorithm to match each detection with its corresponding identity. The application of tracking-by-detection approaches in crowded scenes, with occlusions and with people who change trajectory or cross paths, produces certain tracking errors, whose number is greatly increased by the lack of precision inherited from the detector used.

There are three main types of tracking errors: false negatives, or missed agents; false positives, caused by the association of an identity with some object different from the agents to track; and identity switches, due to the association of the wrong identity with a person. The definition of these errors and the protocol used to measure them are established in [2].

Many research efforts focused on reducing tracking errors exploit temporal coherency [1] by extracting people tracklets, i.e. sets of frames, sometimes even requiring future frames. In contrast, this paper proposes a frame-by-frame association method that allows online tracking without requiring the training of models for every individual. This is achieved by the fusion of three strategies, explained below, which constitute the main contributions of this paper.

1. A novel formulation to score the matching of a certain identity with a certain detection, by the fusion of a geometric and a visual metric. The geometric metric is computed from the distances between the joint locations of the individuals and those in the previous frame, and the visual metric measures the degree of appearance similarity between the new detections and the reference previously saved for each tracked agent. The way in which these metrics are combined adapts to the situations of disappearing agents and crowded scenes, resulting in a reduction of the identity switches.

2. A specialised search for every missed individual following a detection-by-tracking approach. A Convolutional Pose Machine (CPM) [16] is used to search for a human shape in the location where each missed agent is expected to be found. This strategy therefore reduces the number of false negatives caused by the people detector.

3. A filter to validate the human shapes of the detected objects, which notably decreases the false positive rate.

The proposed approach has been tested on the CAVIAR¹ dataset [4], proving the improvements in tracking performance brought by the cited contributions. The results present the proposed MOT algorithm as a versatile one, adaptable to different detectors and robust against their error rates, providing successful results in comparison with other tracking-by-detection approaches.

The rest of the paper is organised as follows. Section 2 gives an overview of the related state-of-the-art methods. Section 3 describes the proposed solution, and its analysis and experimental results are presented in Section 4. Finally, the conclusions are presented in Section 5.

¹ The dataset is publicly available under http://homepages.inf.ed.ac.uk/rbf/CAVIARDATA1/.

8th International Conference on Imaging for Crime Detection and Prevention, ICDP-2017, Madrid 13-15 Dec. 2017
IET Digital Library 25
2 Related Work

A great deal of research has aimed to improve MOT performance by enhancing one or several of these four key parts of the tracking: the detections provided, the way in which each identity is represented or modelled, and the scoring metric and the association method used to match the detections with their identities.

Some of the most commonly used data association methods are based on Bayesian probability, such as Joint Probabilistic Data Association Filters (JPDAF) [14] and Multiple Hypotheses Tracking (MHT), recently revisited by [10]. Instead, a global association method is proposed here, based on the Hungarian method [11], which finds the detection-identity assignment that minimises a certain matching cost. In [8], part of the data association is solved by the Hungarian method, where the cost is obtained from the detection locations.

Data association methods based on the detection locations are usually improved by first estimating those locations with a filter algorithm. The literature presents two main configurations for them: a decentralized one, which assigns a tracker to each object, like the Decentralized Particle Filter (DPF) [5], and a centralized one, where all the agent states form a single representation, such as the Tracker Hierarchy [19] and the Reversible Jump Markov Chain Monte Carlo - Particle Filter (RJMCMC) [9]. In this work, the head location of every tracked identity is predicted by a Kalman Filter [17], widely used to estimate the state of an agent from a set of measurements over time.

In crowded scenes, a location-based or motion-based association method, such as [12], may struggle with agents that change trajectory or cross paths. To solve that, this work adapts the matching score formulation to such situations, considering not only the joint locations of every person but also the degree of appearance similarity with its identity [6].

Other approaches exploit temporal consistency with batch tracking, like [1], using a tracklet to provide the correct matching. Some batch algorithms, such as the one presented in [15], achieve extraordinary results on complex surveillance scenarios. However, they require future frames, which makes online tracking impossible. Instead, the matching metric used in this paper allows a frame-by-frame identification of the agents. Other online approaches require learning a model for every individual to track [18], so a certain number of frames is needed before the models become reliable. On the contrary, our method uses only one pre-trained model, able to visually identify any agent in critical situations, avoiding switches between the identities of crossing agents.

On the other hand, many efforts have focused on enhancing human detection methods to avoid the loss of some tracks and the starting of incorrect ones, both of which badly affect the performance of tracking-by-detection approaches such as the one proposed in this paper. In [1], the position and articulation of the limbs are used to model every individual. This paper proposes the use of the Convolutional Pose Machine (CPM), presented in [16], to search for a human joints structure for agents missed by the detector, and to validate the detected ones.

3 Proposed Approach

In this section, the proposed tracking algorithm is presented. Firstly, a general view of the designed architecture and data structures is given. Subsequently, the three main modules, Human Shape Validation, Data Association, and Detection-by-Tracking, are described in detail.

3.1 Architecture

The architecture of the proposed tracking approach is shown in Figure 1, where two iterations of the algorithm are represented. In every iteration, the inputs are two sets: one is the list of identities to track, hereinafter called agents, and the other contains the detections given by a previous people detection stage. Most of the false detections are then removed by a Human Shape Validation filter. This module uses a CPM [16] to detect the joint locations of every detection, and then analyses the coherence of such locations according to some physical human constraints.

The next stage is the data association process, which assigns every remaining detection to its corresponding agent, whose representation is updated by that detection and constitutes the input of the next frame.

However, there are not always as many detections as agents. When a person appears for the first time in the scene, as represented in the first frame of Figure 1, this causes a detection that is not matched with any of the existing agents, and it must be added to the list of agents to be tracked in the following frames. Moreover, the detector sometimes misses an individual, as shown in the second frame of Figure 1. Therefore, there is not always a detection for every agent, so after the association process the detection of each non-matched agent is searched for by the CPM. This strategy has been called detection-by-tracking because joints are searched for around the expected location of the agent, which is predicted by a Kalman Filter. If the detected joints present a human shape, the agent is considered as found and its representation is updated.

The output of every iteration is the updated list of agents to track, whose identities are represented by different colours.

3.2 Human and Agent Structure

The proposed association method is based on localisation and visual data about the searched agents and the found detections. For that reason, two different types of structures have been designed to hold the necessary data for both sets. Each detection is represented by a structure named Human, formed by:

• A RoI (Region of Interest) structure describing a box bounding a human in a frame, with the upper left pixel position $(p_x, p_y)$ and its dimensions.

• A vector, $J$, where each element $j_k$ is the location $(j_{k,x}, j_{k,y})$ of the joint $k$ in the image. The length of $J$ is 15, corresponding to these 15 human joints: head, neck, shoulder_r, elbow_r, wrist_r, shoulder_l, elbow_l, wrist_l, hip_r, knee_r, ankle_r, hip_l, knee_l, ankle_l, waist, where the sub-indexes r and l mean right and left, respectively.
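The Human structure described above can be illustrated with a minimal Python sketch. The class layout, the field names (`px`, `py`, `width`, `height`, `joints`) and the choice of `None` for a joint that has not been located are assumptions made for illustration; the paper does not specify an implementation.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# The 15 joints listed in Section 3.2, in the order given in the text.
JOINT_NAMES = (
    "head", "neck",
    "shoulder_r", "elbow_r", "wrist_r",
    "shoulder_l", "elbow_l", "wrist_l",
    "hip_r", "knee_r", "ankle_r",
    "hip_l", "knee_l", "ankle_l",
    "waist",
)

Point = Tuple[float, float]  # (x, y) location of a joint in the image


@dataclass
class RoI:
    """Bounding box: upper left pixel position and dimensions."""
    px: int
    py: int
    width: int
    height: int


@dataclass
class Human:
    """One detection: its bounding box and its vector of joints, J."""
    roi: RoI
    # Assumption: None marks a joint that has not been located yet.
    joints: List[Optional[Point]] = field(
        default_factory=lambda: [None] * len(JOINT_NAMES)
    )

    def joint(self, name: str) -> Optional[Point]:
        """Return the location of the joint with the given name."""
        return self.joints[JOINT_NAMES.index(name)]


# Hypothetical detection with only the head located so far.
detection = Human(roi=RoI(px=120, py=40, width=13, height=26))
detection.joints[JOINT_NAMES.index("head")] = (126.0, 43.0)
```

Keeping the joints in a fixed-length list indexed by joint number mirrors the vector $J$ of the text, so that joint $k$ of an agent and joint $k$ of a detection can be compared directly.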

Figure 1. Proposed architecture.

Every agent is represented by a structure called Agent, whose main attributes are the following:

• A Human structure (defined above), updated by the corresponding detection in every iteration.

• An identity number, ID.

• A Tracker structure to manage the Kalman filter associated with the agent. This structure contains two main variables: its state, $\hat{x}$, defined by the location and velocity of the agent's head, and the measurement, $\hat{z}$, updated with the head location of its corresponding detection.

• An image called VisualReference, showing the agent's appearance the last time that it was matched.

• A flag named Matched that indicates whether an agent has been matched or not in the association process.

3.3 Human Shape Validation, HSV

The vector of joints, $J^{hd_i}$, of every detection is filled by a human joints detection algorithm, whose input is the zone delimited by its bounding box, $RoI^{hd_i}$, in the current frame. The joints detector is based on the CPM presented in [16], which has been adapted to be independent of the scale, that is, of the RoI size.

Even when the given RoI delimits a frame region where there is no human shape, the CPM tries to find its joints. In such cases, the given joints do not follow the physical structure of a human body. For that reason, a Human Shape Validation (HSV) filter has been designed, coding some constraints to ensure coherence among the human joint locations in a walking pose, with the aim of eliminating those detections that do not correspond to people.

3.4 Data Association

The aim of the data association process is to find the correct assignment between the available human detections, given by an array, $H$, of Human structures, and the agents saved in an array, $A$, of Agent structures. A global association method is proposed here, where all the possible matchings are considered at the same time, instead of independently searching for the best one for each agent.

To achieve that goal, a matrix, $M$, is created to collect the costs of matching every agent, $a_i$, with every human detection, $h_j$. The problem of finding the assignment with the minimum total matching cost is solved by an optimisation algorithm based on the Hungarian Method [11].

The cost matrix, $M$, combines the information about the agents' localisation and their visual appearance, which is collected by the matrices $D$ and $S$, respectively. $D$ stores the weighted mean Euclidean distance from the joint locations of every agent to the joints of every human detection, as defined by (1), where $d_E$ is the Euclidean distance between two points. $W$ is a vector where each element, $w_k$, is the weight given to the joint $k$. $E$ is a vector where each element, $e_k$, takes the value 1 if the joint $k$ has been detected and 0 otherwise.

$S$ collects the Degree of Appearance Similarity (DOAS) between every agent and every human detection, as shown in (2). The DOAS is computed as the distance from a multidimensional point to the boundary hyperplane learnt by a pre-trained Support Vector Machine (SVM), as explained in [6]. That multidimensional point is calculated by a descriptor based on a Volume Local Binary Pattern (VLBP), which compares an agent's VisualReference with the region of the current frame delimited by a human detection's RoI.
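The weighted mean joint distance stored in $D$ can be sketched in a few lines of Python. This is only an illustration under stated assumptions: joints are given as (x, y) tuples, an undetected joint is represented by `None`, a joint counts as detected ($e_k = 1$) only when it is available for both the agent and the detection, and an infinite distance is returned when no joint can be compared. None of these conventions comes from the paper.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]


def weighted_joint_distance(
    agent_joints: Sequence[Optional[Point]],
    human_joints: Sequence[Optional[Point]],
    weights: Sequence[float],
) -> float:
    """Weighted mean Euclidean distance between corresponding joints.

    Joint k contributes only when e_k = 1, which is taken here to mean
    that it is available for both the agent and the detection.
    """
    numerator = 0.0
    denominator = 0.0
    for j_agent, j_human, w_k in zip(agent_joints, human_joints, weights):
        if j_agent is None or j_human is None:
            continue  # e_k = 0: the joint is left out of both sums
        numerator += math.dist(j_agent, j_human) * w_k
        denominator += w_k
    # Assumption: with no comparable joint the distance is treated as infinite.
    return numerator / denominator if denominator > 0 else math.inf


# Hypothetical three-joint example (the paper uses K = 15 joints).
agent: List[Optional[Point]] = [(0.0, 0.0), (0.0, 10.0), None]
human: List[Optional[Point]] = [(3.0, 4.0), (0.0, 10.0), (5.0, 5.0)]
w = [2.0, 1.0, 1.0]

d = weighted_joint_distance(agent, human, w)  # (5*2 + 0*1) / (2 + 1)
```

Normalising by the sum of the weights of the detected joints keeps the distance comparable between detections with different numbers of visible joints, which matters when people are partially occluded.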

$$d_{ij} = \frac{\sum_{k=1}^{K} d_E\left(j_k^{a_i}, j_k^{h_j}\right) w_k e_k}{\sum_{k=1}^{K} w_k e_k} \qquad (1)$$

$$s_{ij} = DOAS\left(R_k^{a_i}, R_k^{h_j}\right) \qquad (2)$$

Once $D$ and $S$ have been computed, their combination to create $M$, defined by (3), can take three different formulations to adapt it to some critical situations.

First, to make the assignment task easier, those detections whose head locations lie outside a radial perimeter of radius $T_d$ around the expected head location of a query agent, that is, whose head distance $hd_{ij}$ exceeds $T_d$, are penalised with a high matching cost, $P$. The expected head location of each agent is predicted by the Kalman filter algorithm from the data of the agent's Tracker structure. For the detections located inside the perimeter, the matching value corresponds to the weighted mean distance between the joint locations, $d_{ij}$, given by the matrix $D$.

However, if the joint distances from several detections to a certain agent are similar, meaning that they differ by less than a certain threshold $T_{am}$, the matching for that agent is considered ambiguous, as defined by the function $f_{am}(a_i)$ in (4). When this situation happens, for example in crowded scenes with crossing people, the matching cost is not just $d_{ij}$; instead, $d_{ij}$ is multiplied by a term derived from the corresponding DOAS, $s_{ij}$, given by the matrix $S$. The DOAS metric takes values in the range (-1, 1), where a higher value means greater similarity. Since our method penalises dissimilarity, the multiplying value is not $s_{ij}$ itself but its difference with respect to the maximum value found in the matrix $S$.

$$m_{ij} = \begin{cases} P & (hd_{ij} > T_d) \\ d_{ij} & (hd_{ij} \le T_d) \wedge (f_{am}(a_i) = 0) \\ d_{ij}\,(\max(S) - s_{ij}) & (hd_{ij} \le T_d) \wedge (f_{am}(a_i) = 1) \end{cases} \qquad (3)$$

$$f_{am}(a_i) = \begin{cases} 1 & \exists j : \left((d_{ij} - d_{ij'}) < T_{am}\right) \wedge (j \neq j') \\ 0 & \text{otherwise} \end{cases}, \qquad d_{ij'} := \min(d_i) \qquad (4)$$

Once the matrix $M$ is computed, the Hungarian Method finds the best assignment, so that only one detection is matched with each agent and vice versa. The solution is given through a matrix, $R$, where the elements corresponding to a matching take the value 0.

Only those agents that have been correctly matched, meaning that their Matched flag is up, must be updated with their corresponding detection. However, when the detection corresponding to a certain agent is not available, some other, incorrect detection could be assigned to it. To avoid improperly updating an agent $a_i$, its Matched flag, $Matched^{a_i}$, is not raised unless the corresponding matching cost is under a certain threshold $T_c$ and the corresponding element, $r_{ij}$, of the resulting matrix $R$ is 0, as defined by (5).

$$Matched^{a_i} = \begin{cases} 1 & \exists j \mid (r_{ij} = 0) \wedge (m_{ij} < T_c) \\ 0 & \text{otherwise} \end{cases} \qquad (5)$$

3.5 Detection-by-Tracking, DBT

For those agents whose corresponding detection was not found by the detector, a human shape is searched for in the frame area where they were expected to be. This strategy is called detection-by-tracking (DBT) because the Kalman Filter algorithm [17] is used to predict the agents' locations. The joints detector explained in Section 3.3 is then used to find human joints around each predicted location.

Although the DBT strategy cannot detect new people who have never been tracked before, it reduces the number of lost tracks, since it allows a new detection to be found for missed agents. In addition, the HSV filter (Section 3.3) is used again to ensure that the corresponding human detection has been found, avoiding false positives in this second people detection.

4 Experimental results

In this section, an analysis of the improvements produced by each of the presented contributions is given, as well as a comparison with other tracking-by-detection methods. Moreover, the dataset and the evaluation metrics used to measure the tracking performance are described.

4.1 Test Dataset

Our test dataset consists of six sequences belonging to the CAVIAR dataset [4], which present some of the most challenging situations for tracking: crossing people (EnterExistCrossingPaths1 and 2), individuals changing trajectory (OneLeaveShopReenter1 and 2), and groups and people reappearing in the scene, where up to 10 different identities appear (OneShopOneWait1 and 2). For all the experiments, the proposed approach has been forced to track every individual in those sequences, as long as their detections were bigger than 13x26 pixels and less than 50% occluded.

4.2 Evaluation Metrics

To evaluate the performance of our approach, the Multi-Object Tracking Accuracy (MOTA) [2], defined by (6), has been calculated. $F_N$, $F_P$ and $Id_{SW}$ are the sums of the numbers of false negatives, $F_{N,t}$, false positives, $F_{P,t}$, and identification switches, $Id_{SW,t}$, respectively, divided by the sum of the number of ground truth objects, $g_t$, at every frame, $t$, as defined by (7), (8) and (9).

$$MOTA = 1 - (F_N + F_P + Id_{SW}) \qquad (6)$$

$$F_N = \frac{\sum_t F_{N,t}}{\sum_t g_t} \qquad (7)$$

$$F_P = \frac{\sum_t F_{P,t}}{\sum_t g_t} \qquad (8)$$

$$Id_{SW} = \frac{\sum_t Id_{SW,t}}{\sum_t g_t} \qquad (9)$$
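The evaluation metrics of Section 4.2 can be computed from per-frame counts as in the following Python sketch. The dictionary keys (`gt`, `fn`, `fp`, `idsw`) and the example counts are hypothetical and serve only to illustrate the arithmetic; they are not results from the paper.

```python
from typing import Dict, Iterable, Tuple


def clear_mot_rates(frames: Iterable[Dict[str, int]]) -> Tuple[float, float, float, float]:
    """Return (MOTA, FN, FP, IdSW) from per-frame counts.

    Each frame is a dict with the number of ground truth objects ('gt'),
    false negatives ('fn'), false positives ('fp') and identity
    switches ('idsw') observed in that frame.
    """
    frames = list(frames)
    total_gt = sum(f["gt"] for f in frames)
    if total_gt == 0:
        raise ValueError("at least one ground truth object is required")
    fn_rate = sum(f["fn"] for f in frames) / total_gt
    fp_rate = sum(f["fp"] for f in frames) / total_gt
    idsw_rate = sum(f["idsw"] for f in frames) / total_gt
    mota = 1.0 - (fn_rate + fp_rate + idsw_rate)
    return mota, fn_rate, fp_rate, idsw_rate


# Hypothetical three-frame sequence with 10 ground truth objects in total.
sequence = [
    {"gt": 4, "fn": 1, "fp": 0, "idsw": 0},
    {"gt": 4, "fn": 1, "fp": 1, "idsw": 0},
    {"gt": 2, "fn": 0, "fp": 0, "idsw": 1},
]
mota, fn_rate, fp_rate, idsw_rate = clear_mot_rates(sequence)
# Rates: FN = 2/10, FP = 1/10, IdSW = 1/10, so MOTA is 0.6 up to rounding.
```

Because every rate is normalised by the total number of ground truth objects rather than averaged per frame, frames with more people weigh more in the final score.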

4.3 Results

One of the main goals of this work is to obtain a MOT approach that is robust against the failures of the detections given as input, and therefore versatile enough to work with any people detector. Our method has been fed by three detectors: an implementation of the HOG-SVM people detector [3], one based on the upper-body model trained by [7] for a Haar-Cascade classifier, and the fusion of the two previous ones.

The tracking errors presented by our method are mainly caused by the detector, as Table 1 shows: in the first case, 80.08% of the $F_N$ are caused by the HOG-SVM detector, and in the second case, 94.40% of the $F_P$ are inherited from the HAAR-Cascade detector. Moreover, our tracking algorithm is not only able to avoid increasing the main errors presented by the detectors (which means 100% performance) but also reduces such error rates. The reduced error rates are marked with an asterisk in Table 1. The high rate of $F_P$ presented by the HOG-SVM detector is decreased, as is the $F_N$ of the HAAR-Cascade detector, and both metrics are improved for the fusion of their detections. For that reason, the third detector has been selected to conduct the rest of the experiments.

Table 1. Detectors and MOT error rates.

                   Detector              Detector+MOT
Detector           FN        FP          FN        FP
HOG-SVM            10,465    11,577      13,067    610*
HAAR-Cascade       8,846     472         6,753*    500
FUSION             6,279     10,427      6,207*    1,043*

To measure the performance improvement caused by the DBT and the HSV modules, four different versions of our MOT algorithm (see Figure 1) have been tested over the mentioned sequences (Section 4.1), and the results are shown in Figure 2. In the first version, both modules have been removed, so the tracking task is performed only by the global data association module. The second and third versions include the DBT and HSV modules, respectively, and the fourth version corresponds to the complete proposed algorithm. The DBT strategy notably reduces the $F_N$, which was its purpose, although the $F_P$ increases. On the other hand, the HSV filter remarkably decreases $F_P$, as intended, but increases $F_N$. However, the combination of both strategies, in version 4, greatly improves both metrics, resulting in the enhancement of the tracking performance represented by the MOTA value.

Figure 2. Effects of DBT and HSV modules on the MOTA.

The low number of identity switches, $Id_{SW}$, is due to the proposed global association method based on the novel formulation of the matching score (Section 3.4). To prove that fact, the performance of our complete MOT algorithm (referred to as Global Association in Figure 3) has been compared with a version where the data association is performed independently for each agent, and based only on its head location (Individual Association). This experiment proves that the proposed association method reduces not only $Id_{SW}$ but also $F_N$ and $F_P$, leading to a prominent enhancement of the MOTA value.

Figure 3. Comparison of Data Association methods.

The results obtained for the EnterExistCrossingPaths and OneShopOneWait sequences are presented in Table 2. The resulting performance has been compared with other tracking-by-detection methods: Decentralized Particle Filters (DPF) [13], Tracker Hierarchy (Hierarchy) [19], and Reversible Jump Markov Chain Monte Carlo - Particle Filter (RJMCMC) [9]. These methods present high error rates, which are generally decreased by the proposed strategies, resulting in a remarkable improvement of the MOTA.

Table 2. MOT results comparison on .

                   FN      FP      IdSW    MOTA
EnterExistCrossingPaths sequence
Proposed MOT       0.34    0.07    0.02    0.57
DPF                0.41    0.37    0.00    0.22
Hierarchy          0.63    0.12    0.02    0.25
RJMCMC             0.40    0.08    0.01    0.51
OneShopOneWait sequence
Proposed MOT       0.49    0.09    0.05    0.37
DPF                0.45    0.45    0.01    0.09
Hierarchy          0.67    0.04    0.03    0.26
RJMCMC             0.69    0.15    0.02    0.14

5 Conclusions

In this paper, the problem of minimising the most common Multi-Object Tracking errors has been addressed by the design of three strategies, implemented in a people tracking application. A novel matching score formulation, adaptable to the situations of crossing, missed and occluded individuals, has been proposed. Its use in a global data association method has provided a notable reduction of the identity switches. Furthermore, the rate of improperly tracked objects has been decreased by searching for human shape coherence in the joint locations of every detection, using the designed Human Shape Validation

filter. In addition, the problem of losing some individuals has been managed with a detection-by-tracking method, based on the proper scale adaptation of a Convolutional Pose Machine.

The proposed combination of these contributions has resulted in a versatile and robust Multi-Object Tracking method with a highly improved tracking performance, able not only to cope with the failures of the detections used to feed it but also to decrease them. The comparison with other tracking-by-detection approaches has provided successful results for the presented method, whose frame-by-frame data association allows online tracking of any agent, without previous or online learning of dedicated models for them.

Acknowledgements

This work was supported by the Spanish Government through the CICYT project (TRA2013-48314-C3-1-R), (TRA2015-63708-R) and Ministerio de Educación, Cultura y Deporte para la Formación de Profesorado Universitario (FPU14/02143), and Comunidad de Madrid through SEGVAUTO-TRIES (S2013/MIT-2713).

References

[1] Mykhaylo Andriluka, Stefan Roth, and Bernt Schiele. People-tracking-by-detection and people-detection-by-tracking. In Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on, pages 1–8. IEEE, 2008.

[2] Keni Bernardin and Rainer Stiefelhagen. Evaluating multiple object tracking performance: the CLEAR MOT metrics. EURASIP Journal on Image and Video Processing, 2008(1):1–10, 2008.

[3] Navneet Dalal and Bill Triggs. Histograms of oriented gradients for human detection. In Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on, volume 1, pages 886–893. IEEE, 2005.

[4] Robert Fisher, Jose Santos-Victor, and James Crowley. CAVIAR: Context aware vision using image-based active recognition, 2005.

[5] David Gerónimo Gomez, Frédéric Lerasle, and Antonio López Peña. State-driven particle filter for multi-person tracking. In Advanced Concepts for Intelligent Vision Systems, pages 467–478. Springer, 2012.

[6] María José Gómez-Silva, José María Armingol, and Arturo de la Escalera. Multi-object tracking with data association by a similarity identification model. In 7th International Conference on Imaging for Crime Detection and Prevention (ICDP 2016), pages 25–30. IET, 2016.

[7] Hannes Kruppa, Modesto Castrillon-Santana, and Bernt Schiele. Fast and robust face finding via local context. In Joint IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance, October 2003.

[8] Chang Huang, Bo Wu, and Ramakant Nevatia. Robust object tracking by hierarchical association of detection responses. In European Conference on Computer Vision, pages 788–801. Springer, 2008.

[9] Zia Khan, Tucker Balch, and Frank Dellaert. MCMC-based particle filtering for tracking a variable number of interacting targets. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(11):1805–1819, 2005.

[10] Chanho Kim, Fuxin Li, Arridhana Ciptadi, and James M Rehg. Multiple hypothesis tracking revisited. In Proceedings of the IEEE International Conference on Computer Vision, pages 4696–4704, 2015.

[11] Harold W Kuhn. The Hungarian method for the assignment problem. Naval Research Logistics Quarterly, 2(1-2):83–97, 1955.

[12] Niall McLaughlin, Jesus Martinez Del Rincon, and Paul Miller. Enhancing linear programming with motion modeling for multi-target tracking. In Applications of Computer Vision (WACV), 2015 IEEE Winter Conference on, pages 71–77. IEEE, 2015.

[13] Patrick Perez, Jaco Vermaak, and Andrew Blake. Data fusion for visual tracking with particles. Proceedings of the IEEE, 92(3):495–513, 2004.

[14] Christopher Rasmussen and Gregory D. Hager. Probabilistic data association methods for tracking complex visual objects. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(6):560–576, 2001.

[15] Bing Wang, Gang Wang, Kap Luk Chan, and Li Wang. Tracklet association with online target-specific metric learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1234–1241, 2014.

[16] Shih-En Wei, Varun Ramakrishna, Takeo Kanade, and Yaser Sheikh. Convolutional pose machines. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4724–4732, 2016.

[17] Greg Welch and Gary Bishop. An introduction to the Kalman filter. 1995.

[18] Min Yang and Yunde Jia. Temporal dynamic appearance modeling for online multi-person tracking. Computer Vision and Image Understanding, 153:16–28, 2016.

[19] Jianming Zhang, Liliana Lo Presti, and Stan Sclaroff. Online multi-person tracking by tracker hierarchy. In Advanced Video and Signal-Based Surveillance (AVSS), 2012 IEEE Ninth International Conference on, pages 379–385. IEEE, 2012.
