Chapter 3 CYTED Book
Epistemic Uncertainty Quantification in Human Trajectory Prediction
1 Introduction
Video monitoring and autonomous driving are key technologies behind the future of
Intelligent Transportation and, more generally, Smart Cities. For those applications
and for many more, the developed systems critically need to predict the short-term
motion of pedestrians, e.g. to anticipate anomalous crowd motion through a video
M. Canche et al.
Modeling human motion has a long history, taking its roots, in particular, in the
simulation community, where researchers have proposed hand-tailored models such
as the Social forces model [12], in which the motion of agents is seen has the result
of attractive and repulsive forces acting on these agents. Many other approaches
have been explored, such as cellular automata [3] or statistical models like Gaussian
Processes [34, 35]. More recently, following the rise of deep learning techniques,
data-driven methods have turned the field upside down and have allowed huge improvements.
Epistemic Uncertainty Quantification in Human Trajectory Prediction 3
positions of other pedestrians. . . ), but here we limit the scope of our approach to
using only the past positions, expressed in world relative coordinates to the position
at $T_{obs}$. We also focus on the case of parametric, deterministic regression models, i.e.
models $\pi$ parameterized by parameters $\mathbf{w}$. Finally, we consider the case of forecasting
systems producing an estimate $\hat{\Sigma}^{\mathbf{w}}_{T_{obs}+1:T_{pred}}$ of the associated aleatoric uncertainty.
Hence, our target model has the form
Here, in addition to the aleatoric uncertainties that the formulation above allows
us to cope with, we would also like to quantify the underlying epistemic uncertainties.
This section gives a short introduction to Bayesian deep learning and shows how it
can be used to produce estimates of both aleatoric and epistemic uncertainties.
\[
p(\mathbf{w}\mid\mathcal{D}) = \frac{p(\mathbf{y}\mid\mathbf{x},\mathbf{w})\,p(\mathbf{w})}{\int p(\mathbf{y}\mid\mathbf{x},\mathbf{w})\,p(\mathbf{w})\,\mathrm{d}\mathbf{w}}. \tag{3}
\]
where the 𝑁 weights w𝑖 are sampled from the posterior over the weights 𝑝(w | D).
In the trajectory prediction problem we described in the previous section, x is a time
series of observed positions and y is a time series of future positions.
\[
p(\mathbf{y}\mid\mathbf{x},\mathcal{D}) = \frac{1}{N}\sum_{i=1;\,\mathbf{w}_i\sim p(\mathbf{w}\mid\mathcal{D})}^{N} \mathcal{N}\big(\mathbf{y};\,\hat{\mathbf{y}}^{\mathbf{w}_i}(\mathbf{x}),\,\hat{\boldsymbol{\Sigma}}^{\mathbf{w}_i}(\mathbf{x})\big). \tag{6}
\]
From this representation, estimates for the first two moments of the posterior
distribution over y, given x and D, can be obtained as
Fig. 2 Uncertainty representation over the predicted position at timestep 𝑇𝑝𝑟𝑒𝑑 = 12. The blue
trajectory corresponds to the 𝑇𝑜𝑏𝑠 observed positions (used as an input for the prediction system),
the red trajectory corresponds to the 𝑇𝑝𝑟𝑒𝑑 ground truth future positions, the green trajectories are
the predictions and the corresponding covariances (estimated as in Eq. 1, at 𝑇𝑝𝑟𝑒𝑑 ) are depicted
in pink. Left: Output from a deterministic model. Middle and Right: Outputs from a Bayesian
model (in this case, a variational model described in Section 5.1.2). The right subfigure depicts the
covariance resulting from the mixture in the middle.
\[
\hat{\mathbf{y}}(\mathbf{x}) \triangleq \frac{1}{N}\sum_{i=1}^{N} \hat{\mathbf{y}}^{\mathbf{w}_i}(\mathbf{x}), \tag{7}
\]
\[
\hat{\boldsymbol{\Sigma}}_{\mathbf{y}}(\mathbf{x}) \triangleq \frac{1}{N}\sum_{i=1}^{N} \hat{\boldsymbol{\Sigma}}^{\mathbf{w}_i}(\mathbf{x}) + \frac{1}{N}\sum_{i=1}^{N} \big(\hat{\mathbf{y}}^{\mathbf{w}_i}(\mathbf{x}) - \hat{\mathbf{y}}(\mathbf{x})\big)\big(\hat{\mathbf{y}}^{\mathbf{w}_i}(\mathbf{x}) - \hat{\mathbf{y}}(\mathbf{x})\big)^{T}, \tag{8}
\]
where $\hat{\mathbf{y}}^{\mathbf{w}_i}(\mathbf{x})$, $\hat{\boldsymbol{\Sigma}}^{\mathbf{w}_i}(\mathbf{x})$ is the output of the forecasting model for the input x at the
specific operating point $\mathbf{w}_i$. The volume of the variance matrix $\hat{\boldsymbol{\Sigma}}_{\mathbf{y}}(\mathbf{x})$ gives us a first,
rough way to quantify the uncertainty over our output.
To get a more fine-grained representation of the uncertainty, we can also simply
keep the mixture of Gaussians of Eq. 6.
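The Monte Carlo moments of Eqs. 7 and 8 can be computed directly from the $N$ sampled outputs. Below is a minimal NumPy sketch (the function name is ours, not part of the implementation described later in this chapter); note how the total covariance splits into an averaged aleatoric term and an epistemic term given by the spread of the sampled means.

```python
import numpy as np

def predictive_moments(means, covs):
    """Combine N sampled model outputs (Eqs. 7-8).
    means: (N, 2) array of predicted positions y_hat^{w_i}(x)
    covs:  (N, 2, 2) array of aleatoric covariances Sigma_hat^{w_i}(x)
    Returns the predictive mean and total covariance (aleatoric + epistemic)."""
    y_bar = means.mean(axis=0)                                # Eq. 7
    aleatoric = covs.mean(axis=0)                             # first term of Eq. 8
    d = means - y_bar                                         # deviations of the sampled means
    epistemic = (d[:, :, None] * d[:, None, :]).mean(axis=0)  # second term of Eq. 8
    return y_bar, aleatoric + epistemic
```

With two samples whose means disagree along $x$, the epistemic term inflates only the $x$ variance, which is exactly the behavior illustrated in Fig. 2 (right).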
As an illustration, in Fig. 2, we depict both uncertainty representations for Eqs. 6
(middle), 7 and 8 (right) for one single sampled predicted trajectory.
To ease the depiction and the explanations about the uncertainty calibration
process, we will only consider the uncertainty at the last predicted position $T_{pred}$,
i.e. we will consider the $2 \times 2$ variance matrix
\[
\hat{\Sigma}^{\mathbf{w}}_{T_{pred}}. \tag{9}
\]
ing [7]. However, it is not easy to generalize or extrapolate these methods to other
areas. In [20], a method is proposed to re-calibrate the output of any 1D regression
algorithm, including Bayesian neural networks. It is inspired by Platt scaling [30]
and uses a specific subset of recalibration data, together with isotonic regression, to
correct the uncertainty estimates. However, like other uncertainty calibration methods,
it only addresses regression with a single dependent variable to adjust, whereas our
trajectory prediction problem outputs sequences of pairs of variables, which represent
the position of the pedestrian in a 2D plane. In Section 4.1, we first recall the case
of 1D regression as presented in Kuleshov et al. [20]. The generalization of the
uncertainty calibration method to the case of multivariate regression, with several
dependent variables to adjust, is described in Section 4.2.
\[
\hat{P}(\alpha) = \frac{1}{K}\,\big|\{y_k \mid F_k(y_k) \le \alpha,\ k = 1,\dots,K\}\big| \tag{14}
\]
is the proportion of the data within the calibration set where $y_k$ is less than the $\alpha$-th
quantile of $F_k$. The pseudocode of this calibration procedure is presented
in Algorithm 1; for more details, we refer the reader to [20].
\[
\mathcal{C} = \big\{\big([\pi(\mathbf{x}_k)](y_k),\ \hat{P}([\pi(\mathbf{x}_k)](y_k))\big)\big\}_{k=1}^{K}, \tag{15}
\]
where
\[
\hat{P}(\alpha) = \frac{1}{K}\,\big|\{y_k \mid [\pi(\mathbf{x}_k)](y_k) \le \alpha,\ k = 1,\dots,K\}\big|. \tag{16}
\]
3: Train an isotonic regression model 𝛾 on C.
4: Output: Auxiliary regression model 𝛾 : [0, 1] → [0, 1].
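Algorithm 1 can be sketched as follows. This is a self-contained NumPy illustration, not the authors' implementation: it assumes Gaussian 1D forecasts (so that $[\pi(\mathbf{x}_k)](y_k)$ is a Gaussian CDF evaluation), and the isotonic fit is written as a basic pool-adjacent-violators pass instead of a library call.

```python
import numpy as np
from math import erf, sqrt

def gauss_cdf(y, mu, sigma):
    # CDF of a 1D Gaussian forecast, playing the role of F_k = [pi(x_k)]
    return 0.5 * (1.0 + erf((y - mu) / (sigma * sqrt(2.0))))

def build_calibration_set(y, mu, sigma):
    """Steps 1-2 of Algorithm 1: per-example CDF levels F_k(y_k) and the
    empirical proportion P_hat at or below each level (Eqs. 15-16)."""
    levels = np.array([gauss_cdf(yk, m, s) for yk, m, s in zip(y, mu, sigma)])
    p_hat = np.array([(levels <= a).mean() for a in levels])
    return levels, p_hat

def isotonic_fit(x, y):
    """Step 3: nondecreasing fit of y against x (pool-adjacent-violators)."""
    order = np.argsort(x)
    xs = x[order]
    vals = list(np.asarray(y, dtype=float)[order])
    wts = [1] * len(vals)
    i = 0
    while i < len(vals) - 1:
        if vals[i] > vals[i + 1]:   # monotonicity violated: pool the two blocks
            v = (vals[i] * wts[i] + vals[i + 1] * wts[i + 1]) / (wts[i] + wts[i + 1])
            vals[i], wts[i] = v, wts[i] + wts[i + 1]
            del vals[i + 1], wts[i + 1]
            i = max(i - 1, 0)       # re-check the previous block
        else:
            i += 1
    return xs, np.repeat(vals, wts)
```

The auxiliary model $\gamma$ is then obtained by interpolating the fitted values, e.g. with `np.interp`.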
The notion of quantile used in the 1D regression case presented above is specific
to the 1D case, hence we propose here to extend it to higher dimensions. For this
purpose, we make use of the 𝛼-highest density regions (HDR) concept [14], defined
for a probability density function 𝑓 as follows
\[
R(f_\alpha) = \{\mathbf{y}\ \text{s.t.}\ f(\mathbf{y}) \ge f_\alpha\},
\]
where $1-\alpha$ defines an accumulated density over this region, which defines $f_\alpha$:
\[
\mathbb{P}(\mathbf{y} \in R(f_\alpha)) = 1 - \alpha. \tag{17}
\]
The interpretation is straightforward: for a value $\alpha$, the HDR $R(f_\alpha)$ is the
region containing all values of y where the density is greater than $f_\alpha$, with $f_\alpha$
chosen such that the total density accumulated over this region is exactly $1-\alpha$. When
$\alpha \to 0$, $f_\alpha \to 0$ and the HDR tends to the support of the density function; when
$\alpha \to 1$, $f_\alpha \to \max_{\mathbf{y}} f(\mathbf{y})$ and the HDR tends to the empty set.
i.e., we add the values $f^{(i)}$ (Equation 18) in descending order until the sum is
greater than or equal to $1-\alpha$. The last value $f^{(i)}$ added to this descending sum
is the estimate of $f_\alpha$. This method to estimate $f_\alpha$ is the one used
in the experiments of Section 5.
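The descending-sum estimator just described can be written compactly. The sketch below is our illustration, assuming the values $f^{(i)}$ of Eq. 18 are density evaluations normalized so that they sum to one:

```python
import numpy as np

def estimate_f_alpha(density_values, alpha):
    """Estimate the HDR threshold f_alpha (Section 4.2): sort the values
    f^(i) in descending order and accumulate their normalized mass until it
    reaches 1 - alpha; the last value added is the estimate of f_alpha."""
    f_sorted = np.sort(density_values)[::-1]
    cum = np.cumsum(f_sorted) / np.sum(f_sorted)   # normalized descending sum
    idx = np.searchsorted(cum, 1.0 - alpha)        # first index reaching 1 - alpha
    return f_sorted[min(idx, len(f_sorted) - 1)]
```

As expected from the limits discussed above, $\alpha \to 1$ returns the largest density value and $\alpha \to 0$ the smallest.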
Supposing that for each example $k$ our model outputs a probability density
function $f_k$ representing the uncertainty on its answer, we can generalize the notion
of a well-calibrated output by stating that the model is well calibrated when, for
any $\alpha \in [0,1]$, the proportion of ground-truth data falling into the $100(1-\alpha)\%$-HDR
is roughly $1-\alpha$. Translating this into equations:
\[
\frac{\sum_{k=1}^{K}\mathbb{I}\{\mathbf{y}_k \in R(f_\alpha)\}}{K} = \frac{\sum_{k=1}^{K}\mathbb{I}\{f_k(\mathbf{y}_k) \ge f_\alpha\}}{K} \xrightarrow[K\to\infty]{} 1-\alpha \quad \forall \alpha \in [0,1]. \tag{20}
\]
2: For each $k = 1,\dots,K$, compute the level $\hat{\alpha}_k$ such that
\[
\mathbb{P}\big(\mathbf{y} \in R(f_k(\mathbf{y}_k))\big) = 1 - \hat{\alpha}_k. \tag{21}
\]
3: Construct the dataset:
\[
\mathcal{C} = \big\{\big(\hat{\alpha}_k,\ \hat{P}_\alpha(\hat{\alpha}_k)\big)\big\}_{k=1}^{K} \tag{22}
\]
where
\[
\hat{P}_\alpha(\alpha) = \frac{\big|\{\mathbf{y}_k \mid f_k(\mathbf{y}_k) \ge f_\alpha,\ k=1,\dots,K\}\big|}{K} \tag{23}
\]
4: Train an isotonic regression model $\gamma$ on $\mathcal{C}$.
5: Output: Auxiliary regression model $\gamma : [0,1] \to [0,1]$.
\[
\mathbb{P}\big(\mathbf{y} \in R(f_k(\mathbf{y}_k))\big) = 1 - \hat{\alpha}_k, \tag{24}
\]
where
\[
\hat{P}_\alpha(\alpha) = \frac{\big|\{\mathbf{y}_k \mid f_k(\mathbf{y}_k) \ge f_\alpha,\ k=1,\dots,K\}\big|}{K} \tag{26}
\]
denotes the fraction of data belonging to the $100(1-\alpha)\%$ HDR. As in the case
of 1D regression, we use isotonic regression as the regression model to calibrate
on these data. This calibration model corrects the $\alpha$ values that define the HDRs so
that the condition of Eq. 20 is fulfilled. The whole procedure is described in
Algorithm 2.
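The per-example levels $\hat{\alpha}_k$ of Eq. 21 and the empirical coverage $\hat{P}_\alpha$ of Eq. 23 can be sketched as follows. This is our illustration under the assumption of 2D Gaussian predictive densities, for which the HDR level of a point has the closed form $\hat{\alpha}_k = \exp(-m_k^2/2)$, with $m_k$ the Mahalanobis distance (function names are ours):

```python
import numpy as np

def hdr_levels_gaussian(y, mu, cov):
    """Eq. 21 for 2D Gaussian predictions: the HDR whose density threshold
    is f_k(y_k) has coverage 1 - exp(-m_k^2 / 2), so alpha_hat_k = exp(-m_k^2 / 2).
    y, mu: (K, 2) arrays; cov: (K, 2, 2) array of predicted covariances."""
    d = y - mu
    m2 = np.einsum('ki,kij,kj->k', d, np.linalg.inv(cov), d)  # squared Mahalanobis
    return np.exp(-m2 / 2.0)

def empirical_coverage(alpha_hat, alphas):
    """Eq. 23: fraction of examples whose ground truth lies inside the
    100(1 - alpha)% HDR, i.e. whose level alpha_hat_k is at least alpha."""
    return np.array([(alpha_hat >= a).mean() for a in alphas])
```

Fitting an isotonic regression on the pairs $(\hat{\alpha}_k, \hat{P}_\alpha(\hat{\alpha}_k))$, as in step 4 of Algorithm 2, then yields the correction $\gamma$.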
5 Experiments
For all the experiments described in this section, we use as the base deterministic
model a simple LSTM-based sequence-to-sequence model, described hereafter,
that outputs a single trajectory prediction with its associated uncertainty.
\[
\mathbf{e}_t = \mathbf{W}_{emb}\,\boldsymbol{\delta}_t, \tag{27}
\]
which is fed to a first encoding LSTM layer referred to as $\mathcal{L}_{enc}$, which encodes the
sequence $\mathbf{e}_{1:T_{obs}}$. Its hidden state is denoted as $\mathbf{h}^e_t \in \mathbb{R}^{d_c}$ and is updated recursively
according to the following scheme
On the decoding side, another LSTM layer $\mathcal{L}_{dec}$ is used to perform the prediction
of the subsequent steps, for $t = T_{obs}+1,\dots,T_{pred}$,
where the encodings $\mathbf{e}_t$ for $t > T_{obs}$ encode the previously predicted $\hat{\boldsymbol{\delta}}_t$ at test time,
or the ground truth $\boldsymbol{\delta}_t \triangleq \mathbf{p}_t - \mathbf{p}_{t-1}$ for $t > T_{obs}$ at training time (principle of
teacher forcing). The outputs $\hat{\kappa}_t \in \mathbb{R}^3$ (Eq. 31) are used to define a $2\times 2$ variance matrix $\hat{\boldsymbol{\Sigma}}_t$
over the result, so that the parameter optimization (corresponding to Eq. 2) is written
as the minimization of the following negative log-likelihood
\[
\mathbf{w}^* = \arg\min_{\mathbf{w}}\; -\sum_{k=1}^{K}\ \sum_{t=T_{obs}+1}^{T_{pred}} \log \mathcal{N}\Big(\mathbf{p}_{k,t};\ \mathbf{p}_{k,T_{obs}} + \sum_{\tau=1}^{t} \hat{\boldsymbol{\delta}}^{\mathbf{w}}_{k,\tau},\ \sum_{\tau=1}^{t} \hat{\boldsymbol{\Sigma}}^{\mathbf{w}}_{k,\tau}\Big), \tag{33}
\]
where $k$ spans the indices in our training dataset (or mini-batch), which contains $K$ examples.
This deterministic model has as its parameters $\mathbf{w} \triangleq (\mathbf{W}_{emb}, \mathbf{W}_{enc}, \mathbf{W}_{dec}, \mathbf{W}_{out})$.
Finally, to reconstruct the trajectory, for $t > T_{obs}$ we simply apply
\[
\hat{\mathbf{p}}_t = \mathbf{p}_{T_{obs}} + \sum_{\tau=1}^{t} \hat{\boldsymbol{\delta}}_\tau. \tag{34}
\]
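The two ingredients of Eqs. 33 and 34, the cumulative-sum reconstruction of positions and the Gaussian log-likelihood of one predicted step, can be sketched in NumPy as follows (function names are ours; the actual model in this chapter is an LSTM trained end-to-end):

```python
import numpy as np

def reconstruct(p_obs, deltas):
    """Eq. 34: rebuild absolute positions from the last observed position
    p_obs (shape (2,)) and predicted displacements deltas (shape (T, 2))."""
    return p_obs + np.cumsum(deltas, axis=0)

def gaussian_nll(p_true, p_hat, cov):
    """One negative log-likelihood term of Eq. 33, for a 2D Gaussian with
    mean p_hat and covariance cov evaluated at the ground truth p_true."""
    d = p_true - p_hat
    m2 = d @ np.linalg.inv(cov) @ d
    return 0.5 * (m2 + np.log(np.linalg.det(cov)) + 2.0 * np.log(2.0 * np.pi))
```

Summing `gaussian_nll` over prediction steps and training examples gives the objective minimized in Eq. 33.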
For our experiments, we have used the five sub-datasets forming the dataset known
as ETH-UCY [29, 23], which is a standard benchmark in HTP. These sub-datasets
are referred to as ETH-Univ, ETH-Hotel, UCY-Zara01, UCY-Zara02, UCY-Univ.
In all the experiments presented hereafter, we use the Leave-One-Out methodology,
where four of the sub-datasets provide the training/validation data and the remaining
one is used for testing.
The base model described above and its Bayesian variants are implemented with
$T_{obs} = 8$, $T_{pred} = 12$, embeddings of dimension $d_e = 128$ and hidden size for
Table 1 Negative log-likelihoods (lower is better) over the different sub-datasets for deterministic,
ensemble, dropout and variational models.

Table 2 Calibration metrics before applying the calibration process (lower is better). Deterministic
model (Det.) vs. Dropout (Bayes) as the Bayesian approach.

             MACE          RMSCE         MA
             Det.   Bayes  Det.   Bayes  Det.   Bayes
Eth-Hotel    0.126  0.120  0.144  0.152  0.133  0.126
Eth-Univ     0.069  0.044  0.081  0.059  0.071  0.045
Ucy-Zara01   0.063  0.082  0.075  0.113  0.066  0.086
Ucy-Zara02   0.110  0.079  0.145  0.109  0.114  0.082
Ucy-Univ     0.099  0.104  0.115  0.112  0.104  0.109
the LSTMs $d_c = 256$. The mini-batch size was set to 64, and early stopping on the
validation data was used to end training.
The different models are all implemented in PyTorch [28]. Note that the
implementation of the ensembles does not require additional software layers. This is
not the case for the other two Bayesian techniques (Variational and Dropout), which
require significant modifications to the standard code. For this purpose, we have
made use of the Bayesian-Torch library [19], which implements the Bayesian versions
of the neural network layers required for our model (LSTMs and Dense layers). All the
Bayesian variants have been used at test time with $N = 5$ samples.
First, we compare the quality of the uncertainties coming from the deterministic
network and from its Bayesian counterparts. To do so, we evaluate, for each sub-dataset,
the average Negative Log-Likelihood (NLL) of the ground truth data under
the output predictive distribution. This distribution is a Gaussian in the deterministic
case, corresponding to Eq. 35, and a mixture of Gaussians in the Bayesian variants,
corresponding to Eq. 36. The results are presented in Table 1. As can be seen,
the Bayesian variants tend to give significantly lower NLL values, which
indicates that they represent the uncertainties over their outputs more faithfully.
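The NLL under the equal-weight Gaussian mixture produced by the $N$ posterior samples (cf. Eq. 6) can be evaluated with a standard log-sum-exp. The following is our hedged NumPy sketch, not the chapter's implementation:

```python
import numpy as np

def mixture_nll(y, means, covs):
    """-log (1/N) sum_i N(y; mu_i, Sigma_i) for an equal-weight 2D Gaussian
    mixture, computed with log-sum-exp for numerical stability."""
    logps = []
    for mu, S in zip(means, covs):
        d = y - mu
        m2 = d @ np.linalg.inv(S) @ d
        logps.append(-0.5 * (m2 + np.log(np.linalg.det(S)) + 2.0 * np.log(2.0 * np.pi)))
    logps = np.array(logps)
    m = logps.max()                     # shift before exponentiating
    return -(m + np.log(np.mean(np.exp(logps - m))))
```

Averaging this quantity over a test sub-dataset gives the NLL values reported in Table 1.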
Then, we compare the uncertainty calibration quality between the outputs of the
deterministic model and of a Bayesian model, here the Dropout implementation.
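The metrics reported in Tables 2 to 4 are not defined explicitly in the text; under their common definitions (MACE: mean absolute calibration error; RMSCE: root mean squared calibration error; MA: miscalibration area), which we assume here, they can be computed from a calibration curve as follows:

```python
import numpy as np

def calibration_metrics(expected, observed):
    """Assumed standard definitions (our assumption, not stated in the text):
    MACE  = mean absolute gap between expected and observed confidence levels,
    RMSCE = root mean squared gap,
    MA    = miscalibration area, i.e. trapezoidal integral of the absolute
            gap over the expected levels."""
    err = np.abs(observed - expected)
    mace = err.mean()
    rmsce = np.sqrt(((observed - expected) ** 2).mean())
    ma = np.sum((err[:-1] + err[1:]) / 2.0 * np.diff(expected))  # trapezoid rule
    return mace, rmsce, ma
```

A perfectly calibrated curve (observed equal to expected) yields zero for all three metrics.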
Table 3 Calibration metrics before/after calibration (lower is better). Deep ensembles as the
Bayesian approach.

             MACE         RMSCE        MA
Eth-Hotel    0.146/0.017  0.174/0.020  0.152/0.015
Eth-Univ     0.069/0.014  0.081/0.018  0.072/0.012
Ucy-Zara01   0.076/0.019  0.092/0.026  0.080/0.017
Ucy-Zara02   0.122/0.019  0.162/0.024  0.128/0.016
Ucy-Univ     0.140/0.011  0.156/0.013  0.147/0.008
We follow the calibration process proposed in Section 4.3 and use the three Bayesian
variants described above to perform the experiments. In Fig. 3, we depict an
example of how the calibration is applied on the ETH-UCY dataset, with ETH-Hotel
as the testing data. The left side shows the raw calibration dataset C (see Algorithm 2)
before calibration, and the right side depicts the same data after applying calibration,
i.e. after determining the transformation $\gamma$.
Then, in Figs. 4, 5 and 6, we show the application of the function $\gamma$ on data that
are not part of the calibration dataset. Here the curves are constructed by sampling
$\alpha$ regularly and by evaluating $\hat{P}_\alpha(\alpha)$ according to Eq. 23. As can be seen, the
calibration process results in well-calibrated uncertainties, in the sense that
for any $\alpha \in [0,1]$, the proportion of ground-truth data falling into the $100(1-\alpha)\%$-HDR
is roughly $1-\alpha$. Note that, before calibration, the purple curves indicate that in
most cases (except UCY-Zara02), the uncertainties were generally under-estimated.
This means that, for a given value of $\alpha$, the rate of ground-truth points y at which
the output p.d.f. satisfies $f(\mathbf{y}) > f_\alpha$ is significantly lower than $1-\alpha$
(i.e. the purple curve lies well under the diagonal).
Fig. 3 Illustration of the calibration process under a LOO scheme on the ETH-UCY dataset, with
ETH-Hotel as the testing data. On the left is the dataset $\{(\hat{\alpha}_t, \hat{P}_\alpha(\hat{\alpha}_t))\}_{t=1}^{T}$, estimated as described
in Section 4.3, that we use for calibration. Each blue dot corresponds to one of these $t$. The $\hat{\alpha}_t$
on the $x$ axis gives the level of confidence measured for ground-truth data under the predicted
distribution, while the $\hat{P}_\alpha(\hat{\alpha}_t)$ on the $y$ axis gives an empirical estimate of the proportion of cases
for which this level $\hat{\alpha}_t$ is passed. For the data presented here, the system is under-confident for
values of $\alpha < 0.5$ and over-confident for values of $\alpha > 0.5$.
Fig. 4 Application of the calibration process with the Ensemble-based Bayesian approach (see
Section 5.1.2) on the 5 datasets considered in the Leave-One-Out testing process: from left to right
and top to bottom, ETH-Univ, ETH-Hotel, UCY-Zara01, UCY-Zara02, UCY-Univ. On the 𝑥 axis
is the confidence level 1 − 𝛼, sampled regularly on the [0, 1] interval, and on the 𝑦 axis is the
observed proportion of data that correspond to this confidence region 𝑅 𝛼 .
6 Conclusions
We have presented in this chapter what is, to the best of our knowledge, the first
attempt to introduce Bayesian Deep Learning into the problem of Human Trajectory
Prediction. We use it so that the prediction system copes not only with aleatoric
uncertainties over its predictions, but also with epistemic uncertainties related, among
others, to the finiteness of the training data. We have also proposed a novel approach
Fig. 5 Application of the calibration process with the Variational Bayesian approach (see Sec-
tion 5.1.2) on the 5 datasets considered in the Leave-One-Out testing process: from left to right and
top to bottom, ETH-Univ, ETH-Hotel, UCY-Zara01, UCY-Zara02, UCY-Univ. On the 𝑥 axis is the
confidence level 1 − 𝛼, sampled regularly on the [0, 1] interval, and on the 𝑦 axis is the observed
proportion of data that correspond to this confidence region 𝑅 𝛼 .
Fig. 6 Application of the calibration process with the Dropout Bayesian approach (see Sec-
tion 5.1.2) on the 5 datasets considered in the Leave-One-Out testing process: from left to right and
top to bottom, ETH-Univ, ETH-Hotel, UCY-Zara01, UCY-Zara02, UCY-Univ. On the 𝑥 axis is the
confidence level 1 − 𝛼, sampled regularly on the [0, 1] interval, and on the 𝑦 axis is the observed
proportion of data that correspond to this confidence region 𝑅 𝛼 .
Table 4 Calibration metrics before/after calibration (lower is better). Dropout as the Bayesian
approach.

             MACE         RMSCE        MA
Eth-Hotel    0.120/0.013  0.152/0.018  0.126/0.011
Eth-Univ     0.044/0.011  0.059/0.014  0.045/0.008
Ucy-Zara01   0.082/0.015  0.113/0.018  0.086/0.013
Ucy-Zara02   0.079/0.020  0.109/0.026  0.082/0.017
Ucy-Univ     0.104/0.015  0.112/0.020  0.109/0.013
architectures (e.g. attention-based networks instead of recurrent networks) and using more
contextual data (e.g., maps of the surrounding area).
References
1. Alexandre Alahi, Kratarth Goel, Vignesh Ramanathan, Alexandre Robicquet, Li Fei-Fei, and
Silvio Savarese. Social lstm: Human trajectory prediction in crowded spaces. In 2016 IEEE
Conference on Computer Vision and Pattern Recognition (CVPR), pages 961–971, 2016.
2. Charles Blundell, Julien Cornebise, Koray Kavukcuoglu, and Daan Wierstra. Weight uncer-
tainty in neural network. In International Conference on Machine Learning, pages 1613–1622,
2015.
3. C Burstedde, K Klauck, A Schadschneider, and J Zittartz. Simulation of pedestrian dynam-
ics using a two-dimensional cellular automaton. Physica A: Statistical Mechanics and its
Applications, 295(3-4):507–525, Jun 2001.
4. Tianqi Chen, Emily Fox, and Carlos Guestrin. Stochastic gradient hamiltonian monte carlo. In
International conference on machine learning, pages 1683–1691, 2014.
5. Yarin Gal and Zoubin Ghahramani. Dropout as a bayesian approximation. In 33rd International
Conference on Machine Learning, ICML 2016, volume 3, pages 1661–1680, 2016.
6. Francesco Giuliari, Irtiza Hasan, Marco Cristani, and Fabio Galasso. Transformer networks
for trajectory forecasting, 2020.
7. Tilmann Gneiting and Adrian E. Raftery. Weather forecasting with ensemble methods. Science,
310(5746):248–249, 2005.
8. Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil
Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial networks, 2014.
9. Alex Graves. Practical variational inference for neural networks. In Advances in neural
information processing systems, pages 2348–2356, 2011.
10. Agrim Gupta, Justin Johnson, Li Fei-Fei, Silvio Savarese, and Alexandre Alahi. Social gan:
Socially acceptable trajectories with generative adversarial networks. In Proc. of the Int. Conf.
on Computer Vision and Pattern Recognition (CVPR), pages 2255–2264, 06 2018.
11. Bobby He, Balaji Lakshminarayanan, and Yee Whye Teh. Bayesian deep ensembles via the
neural tangent kernel. Advances in Neural Information Processing Systems, 33:1010–1022,
2020.
12. Dirk Helbing and Péter Molnár. Social force model for pedestrian dynamics. Physical Review
E, 51(5):4282–4286, May 1995.
13. Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural Comput.,
9(8):1735–1780, November 1997.
14. Rob J. Hyndman. Computing and graphing highest density regions. The American Statistician,
50:120–126, 1996.
15. Boris Ivanovic and Marco Pavone. The trajectron: Probabilistic multi-agent trajectory modeling
with dynamic spatiotemporal graphs, 2019.
16. Alex Kendall and Yarin Gal. What uncertainties do we need in bayesian deep learning for
computer vision? In Proc. of the Annual Conference on Neural Information Processing Systems
(NIPS’17), page 5580–5590. Curran Associates Inc., 2017.
17. Jason Kong, Mark Pfeiffer, Georg Schildbach, and Francesco Borrelli. Kinematic and dynamic
vehicle models for autonomous driving control design. In 2015 IEEE Intelligent Vehicles
Symposium (IV), pages 1094–1099, 2015.
18. Vineet Kosaraju, Amir Sadeghian, Roberto Martín-Martín, Ian Reid, S. Hamid Rezatofighi,
and Silvio Savarese. Social-bigat: Multimodal trajectory forecasting using bicycle-gan and
graph attention networks, 2019.
19. Ranganath Krishnan, Pi Esposito, and Mahesh Subedar. Bayesian-torch: Bayesian neural net-
work layers for uncertainty estimation. https://github.com/IntelLabs/bayesian-torch, January
2022.
20. Volodymyr Kuleshov, Nathan Fenner, and Stefano Ermon. Accurate uncertainties for deep
learning using calibrated regression, 2018.
21. Balaji Lakshminarayanan, Alexander Pritzel, and Charles Blundell. Simple and scalable pre-
dictive uncertainty estimation using deep ensembles. In Proceedings of the 31st International
Conference on Neural Information Processing Systems, NIPS’17, page 6405–6416, Red Hook,
NY, USA, 2017. Curran Associates Inc.
22. Namhoon Lee, Wongun Choi, Paul Vernaza, Christopher B. Choy, Philip H. S. Torr, and
Manmohan Chandraker. Desire: Distant future prediction in dynamic scenes with interacting
agents, 2017.
23. A. Lerner, Yiorgos L. Chrysanthou, and D. Lischinski. Crowds by example. Computer
Graphics Forum, 26(3), 2007.
24. Matteo Lisotto, Pasquale Coscia, and Lamberto Ballan. Social and scene-aware trajectory
prediction in crowded spaces, 2019.
25. David JC MacKay. A practical bayesian framework for backpropagation networks. Neural
computation, 4(3):448–472, 1992.
26. Wesley J Maddox, Pavel Izmailov, Timur Garipov, Dmitry P Vetrov, and Andrew Gordon
Wilson. A simple baseline for bayesian uncertainty in deep learning. In Advances in Neural
Information Processing Systems, pages 13153–13164, 2019.
27. Radford M Neal. Bayesian learning for neural networks. PhD thesis, University of Toronto,
1995.
28. Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan,
Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas
Kopf, Edward Yang, Zachary DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy,
Benoit Steiner, Lu Fang, Junjie Bai, and Soumith Chintala. Pytorch: An imperative style, high-
performance deep learning library. In H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-
Buc, E. Fox, and R. Garnett, editors, Advances in Neural Information Processing Systems 32,
pages 8024–8035. Curran Associates, Inc., 2019.
29. S. Pellegrini, A. Ess, K. Schindler, and L. van Gool. You’ll never walk alone: Modeling social
behavior for multi-target tracking. In 2009 IEEE 12th International Conference on Computer
Vision, pages 261–268, 2009.
30. J. Platt. Probabilistic outputs for support vector machines and comparisons to regularized
likelihood methods. Advances in Large Margin Classifiers, 10(3), 1999.
31. Amir Sadeghian, Vineet Kosaraju, Ali Sadeghian, Noriaki Hirose, Hamid Rezatofighi, and
Silvio Savarese. Sophie: An attentive gan for predicting paths compliant to social and physical
constraints. In Proc. of the Int. Conf. on Computer Vision and Pattern Recognition (CVPR),
pages 1349–1358, 06 2019.
32. Tim Salzmann, Boris Ivanovic, Punarjay Chakravarty, and Marco Pavone. Trajectron++:
Dynamically-feasible trajectory forecasting with heterogeneous data. In Computer Vision –
ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings,
Part XVIII, page 683–700, Berlin, Heidelberg, 2020. Springer-Verlag.
33. Kihyuk Sohn, Xinchen Yan, and Honglak Lee. Learning structured output representation using
deep conditional generative models. In Proceedings of the 28th International Conference on
Neural Information Processing Systems - Volume 2, NIPS’15, page 3483–3491, Cambridge,
MA, USA, 2015. MIT Press.
34. Christopher Tay and Christian Laugier. Modelling smooth paths using gaussian processes.
In Proc. of the Int. Conf. on Field and Service Robotics, Chamonix, France, 2007.
35. Pete Trautman, Jeremy Ma, Richard M. Murray, and Andreas Krause. Robot navigation in
dense human crowds: Statistical models and experimental studies of human-robot cooperation.
The International Journal of Robotics Research, 34(3):335–356, 2015.
36. Daksh Varshneya and G. Srinivasaraghavan. Human trajectory prediction using spatially aware
deep attention models, 2017.
37. Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez,
Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. In I. Guyon, U. V. Luxburg,
S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, editors, Advances in
Neural Information Processing Systems, volume 30. Curran Associates, Inc., 2017.
38. Anirudh Vemula, Katharina Muelling, and Jean Oh. Social attention: Modeling attention in
human crowds. 2018 IEEE International Conference on Robotics and Automation (ICRA),
pages 1–7, 2018.
39. Max Welling and Yee W Teh. Bayesian learning via stochastic gradient langevin dynamics.
In Proceedings of the 28th international conference on machine learning (ICML-11), pages
681–688, 2011.
40. Andrew G Wilson and Pavel Izmailov. Bayesian deep learning and a probabilistic perspective
of generalization. Advances in neural information processing systems, 33:4697–4708, 2020.
41. Ruqi Zhang, Chunyuan Li, Jianyi Zhang, Changyou Chen, and Andrew Gordon Wilson. Cycli-
cal stochastic gradient mcmc for bayesian deep learning. International Conference on Learning
Representations, 2020.
42. Tianyang Zhao, Yifei Xu, Mathew Monfort, Wongun Choi, Chris Baker, Yibiao Zhao, Yizhou
Wang, and Ying Nian Wu. Multi-agent tensor fusion for contextual trajectory prediction, 2019.