
ISPRS Open Journal of Photogrammetry and Remote Sensing 4 (2022) 100013

Contents lists available at ScienceDirect

ISPRS Open Journal of Photogrammetry and Remote Sensing
journal homepage: www.journals.elsevier.com/isprs-open-journal-of-photogrammetry-and-remote-sensing

Detection of anomalous vehicle trajectories using federated learning


Christian Koetsier a,*, Jelena Fiosina b, Jan N. Gremmel c, Jörg P. Müller b, David M. Woisetschläger c, Monika Sester a
a Institute of Cartography and Geoinformatics, Leibniz University Hannover, Appelstraße 9A, Hannover, D-30167, Germany
b Institute of Informatics, Clausthal Technical University, Julius-Albert Str. 4, Clausthal-Zellerfeld, D-38678, Germany
c Chair of Services Management, Braunschweig Technical University, Muehlenpfordtstr. 23, Braunschweig, D-38106, Germany

A R T I C L E I N F O

Keywords:
Machine learning
Federated learning
Anomaly detection
Vehicle trajectories

A B S T R A C T

Nowadays, mobile positioning devices, such as global navigation satellite systems (GNSS), but also external sensor technology like cameras, allow an efficient online collection of trajectories, which reflect the behavior of moving objects, such as cars. The data can be used for various applications, e.g., traffic planning or updating maps, which need many trajectories to extract and infer the desired information, especially when machine or deep learning approaches are used. Often, the amount and diversity of necessary data exceeds what can be collected by individuals or even single companies. Currently, data owners, e.g., vehicle producers or service operators, are reluctant to share data due to data privacy rules or because of the risk of sharing information with competitors, which could jeopardize the data owner’s competitive advantage. A promising approach to exploit data from several data owners without directly accessing the data is the concept of federated learning, which allows collaborative learning by exchanging only model parameters rather than raw data.
In this paper, we address the problem of anomaly detection in vehicle trajectories and investigate the benefits of using federated learning. To this end, we apply several state-of-the-art learning algorithms, such as the one-class support vector machine (OCSVM) and isolation forest, thus solving a one-class classification problem. Based on these learning mechanisms, we propose and verify a federated architecture for the collaborative identification of anomalous trajectories at several intersections. We demonstrate that the federated approach is beneficial not only for improving the overall anomaly detection accuracy, but also for each individual data owner. The experiments show that federated learning increases the anomaly detection accuracy from an average AUC-ROC score of 97% for individual intersections to up to 99% with cooperation.
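
The AUC-ROC score used as the accuracy measure can be read as the probability that a randomly chosen anomalous trajectory receives a higher anomaly score than a randomly chosen normal one. The following is a minimal pure-Python sketch of that pairwise definition; the function name `auc_roc` and the example scores and labels are hypothetical and only illustrate the metric, they are not the evaluation code or data of this study.

```python
def auc_roc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    scores: anomaly scores, higher means more anomalous (assumed convention).
    labels: 1 for anomalous, 0 for normal.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one anomalous and one normal sample")

    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0   # anomaly ranked above normal sample
            elif p == q:
                wins += 0.5   # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Made-up example: two anomalous and three normal trajectories.
example_scores = [0.9, 0.8, 0.3, 0.2, 0.1]
example_labels = [1, 0, 1, 0, 0]
print(auc_roc(example_scores, example_labels))
```

In this toy example five of the six anomalous/normal pairs are ranked correctly. The quadratic pairwise loop is chosen for readability; a rank-based computation would be preferable for large test sets.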

* Corresponding author.
E-mail address: christian.koetsier@ikg.uni-hannover.de (C. Koetsier).

https://doi.org/10.1016/j.ophoto.2022.100013
Received 18 October 2021; Received in revised form 24 February 2022; Accepted 28 February 2022
Available online 9 March 2022
2667-3932/© 2022 Published by Elsevier B.V. on behalf of International Society of Photogrammetry and Remote Sensing (isprs). This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

1. Introduction

Advancements in safety systems over the past decades have made vehicles gradually safer, preventing accidents or protecting passengers from injuries after accidents. Despite all technological advancements, the main reason for traffic accidents remains human error, which can be reflected in anomalous movement behavior (e.g., sudden stop or reverse driving). Anomaly detection in vehicle trajectories is an essential component of advanced driver assistance systems and traffic monitoring systems. Road safety is increased by adapting and enhancing vehicle systems in order to prevent traffic accidents or warn other participants. In principle, anomalous behavior could be identified from vehicle trajectories. The information from internal (e.g., vehicle speed, acceleration or turning angle) and external environmental vehicle sensors (i.e., global navigation satellite systems (GNSS), light detection and ranging (LiDAR) scanners and cameras) can be coupled with the information from other vehicles, but also other sources such as surveillance cameras or traffic flow statistics. Anomalies are usually defined as irregular patterns that differ significantly from the mainstream, a problem that can be considered as a learning problem. However, it is still difficult to predict anomalies with today’s assistance systems. Even though various machine learning (ML) methods exist to detect anomalous behavior, such systems are not yet ready for the market because the data needed to train these models exists only in the form of isolated islands throughout the whole industry (Yang et al., 2019). Aggregation of various and diverse data sources has the potential to allow for more precise anomaly detection than using only individual partners’ data. However, this would require some form of inter-organizational cooperation. Therefore, this paper utilizes a federated learning approach to show in a series of experiments how data from different sources can be

coupled and processed without sharing the data, exchanging only model parameters, while still achieving equally satisfactory anomaly detection. However, data owners such as car manufacturers or service operators are currently reluctant to share data with other organizations, due to data privacy rules or because of the risk of sharing information with competitors, which could jeopardize the data owner’s competitive advantage. Research from the field of inter-organizational collaboration indicates that cooperating in data sharing alliances among organizations from different industries could facilitate the development of vehicle safety systems (Faems et al., 2020). This is because the probability of gathering a representative set of scenarios for relevant situations of vehicle trajectories increases. Jointly collecting the data in a shared database would also decrease data collection costs. As data exists mostly in the form of isolated islands throughout the industry, jointly collecting and processing data is especially relevant (Yang et al., 2019).

Therefore, we propose to use a federated learning approach following Konecny et al. (2016), McMahan et al. (2017) and Yang et al. (2019) as an opportunity to overcome this obstacle through distributed and anonymous data processing using data from several intersections, which are especially prone to anomalous behavior. Furthermore, we will demonstrate that following a federated learning approach leads to results that are as good as those of centralized models, compensating for the disadvantages of individual organizations in collecting data. Cross-silo federated learning is considered (Kairouz et al., 2021), and the inter-organizational collaboration is organized without sharing the data, exchanging only model parameters, in order to adhere to data privacy. Hence, we address a general concept of federated learning based on the detection of anomalous vehicle trajectories. We propose a federated learning architecture and a parameter synchronization algorithm for the considered problem. The goal of the paper is to demonstrate the conceptualized benefits of federated collaboration using the example of anomaly detection at intersections. Therefore, besides the technological perspective, we provide an interdisciplinary view by adding a management perspective to discuss the potential of the proposed approach. In preliminary work, the suitability of ML approaches for the anomaly detection task was investigated. A federated version of the OCSVM approach was proposed and a proof of concept was conducted with a small exemplary dataset (Koetsier et al., 2021). Hence, our research questions in this study are: 1) Can isolation forest (iForest) (Liu et al., 2012) and one-class support vector machines (OCSVM) (Schölkopf et al., 1999) be effectively applied to different intersections? 2) Can the methods be generalized to data that is not equally distributed, such that a model trained on one intersection achieves a comparable anomaly detection accuracy on other intersections? 3) Can federated learning improve the detection accuracy of anomalous vehicle trajectories in general, and additionally also that of each individual partner? 4) What are potential obstacles and advantages for an individual firm to adopt and engage in an inter-organizational collaboration based on a federated learning approach?

To answer these questions, we perform a series of experiments that rely on a dataset containing 5314 trajectories from 3 different intersections. We provide a novel federated algorithm for an OCSVM model to detect anomalous behavior within the data and compare the results to centralized approaches like isolation forest (iForest) models. We show that the federated approach leads to equally satisfactory results compared to the ‘ideal’ case in which all data is centrally available. Our results demonstrate that the federated approach is an opportunity for firms engaging in a data sharing alliance to access a wider resource pool while adhering to data privacy rules.

The paper is organized as follows. In Section 2 we describe the related work. Section 3 states the problem and describes the learning approach as well as the methodology. Section 4 presents the considered dataset and its preparation. Section 5 contains the experimental setup and experimental results. The discussion of our results follows in Section 6. Section 7 draws final conclusions and gives an outlook on future work.

2. Related work

Various vehicle producers are interested in incorporating assistance systems in their vehicles that predict anomalous behavior in order to prevent traffic accidents or warn other participants. However, to achieve better accuracy, a larger installed base for measuring data is necessary. Since market adoption (and replacement of older models) is inherently slow in the car market, cooperation of different car manufacturers and other stakeholders (e.g., cities or fleet operators) could increase the amount of trajectory data collected. Besides evolving the assistance systems, the resulting insights from data cooperation could be beneficial for various stakeholders. However, sharing data with other stakeholders represents a form of inter-organizational relationship between two or more organizations in which one firm has to reveal data first in order to receive data in return, so that sharing data is associated with certain obstacles (Zhang et al., 2019a). In short, firms are usually concerned about the actual value and quality of the shared data, about a fair distribution between participants and, moreover, about information leakage to competitors (Audretsch and Belitski, 2020; Wang et al., 2021). However, advancing an innovative solution in the field of safety and security technology is more likely to succeed in cooperation, and a data sharing alliance is beneficial for participating firms under certain conditions (Davis, 2016). Concerning the value and quality of shared data, the main reason to engage in a data sharing alliance is the possibility to access unique resources of different partner organizations, which impacts the firm’s internal resource base and innovation capability (Chesbrough and Brunswicker, 2014). However, the utility of sharing more of the same data might have diminishing returns for certain use cases. A collaboration between different types of stakeholders of unequal size, e.g., car manufacturers, suppliers, cities, fleet operators and insurance companies, makes it possible to build a database encompassing a larger spectrum of different use cases or to add missing data to existing ones. To summarize, engaging in a data sharing cooperation can be considered beneficial for a focal firm when it receives access to valuable resources in exchange, and it is thus possible to mobilize internal resources to access the data (Davis, 2016; Ghoshal et al., 2020).

Works like Djenouri et al. (2019) and Belhadi et al. (2020) present an overview of anomaly detection methods and show that the broad field can be tackled from different sides with various methods. Trajectory anomaly detection methods are generally divided into metric-based and learning-based algorithms. The first category is based on explicit models or on hand-crafted features. An example of an explicit model is described in Huang et al. (2014), which models anomalies as probabilistic combinations of different elementary anomalies, e.g., driving detours. The hand-crafted approaches work in two steps: 1) defining normal behavior with representative trajectories and 2) comparing the target trajectories with the representative trajectories based on distance or density metrics. For instance, Knorr et al. (2000) study the problem of detecting distance-based anomalies in a multi-dimensional dataset. Lee et al. (2008) develop an approach to detect anomalies by calculating the distance, defined by a distance-density-hybrid-frame, of each segment of the target trajectory to all other trajectories in the dataset. Zhu et al. (2015) extract the top-k popular trajectories and derive the anomaly score from the edit distance between the target trajectory and the popular trajectories. Additionally, Ge et al. (2011) propose an approach combining distance and density to detect fraudulent taxi trajectories. This second kind of metric-based approach is widely used in many applications, but it has drawbacks: the representative trajectories are always selected manually (i.e., by frequency or density), which makes the approach highly dependent on the current dataset and hard to apply to other situations.

Advanced ML-based sequence modelling methods are widely used in practice. Anomalous trajectories are frequently learned on the basis of regular models. Clustering-based methods, like Morris and Trivedi (2011), Kumar et al. (2017) and Belhadi et al. (2021), learn path models for normal patterns via grouping trajectory data in an unsupervised


manner. Cheng et al. (2018) consider anomalous trajectory detection in mixed spaces, and Li et al. (2018) utilize constructed probabilistic models to detect anomalies in both whole and partial trajectories. Further, Ma et al. (2019) and Smolyak et al. (2020) demonstrate a bi-directional generative adversarial network (BiGAN) for anomaly detection. In addition, ensembles of generative adversarial networks for improved anomaly detection are considered in Han et al. (2021).

One-class support vector machines (OCSVM) (Schölkopf et al., 1999) are one of the state-of-the-art approaches for novelty detection (or anomaly detection) in ML, due to their flexibility in fitting complex nonlinear boundaries between normal and novel data (Zhang et al., 2006; Ramyar et al., 2016). In Yang et al. (2018), an OCSVM is successfully applied for anomaly detection in the Internet of Things. Ruff et al. (2018) propose several other one-class classification methods based on deep learning. Recent research by Ma et al. (2021) presents a model for anomaly detection in network traffic based on a kernel SVM. Kaplan and Alptekin (2020) show that in some cases an SVM gives more accurate results than deep learning models when solving one-class classification problems.

In Vikram (2020) the iForest method is successfully applied to detect outliers and probable attacks in network traffic. Hofmockel and Sax (2018) show that the iForest method outperforms a replicator neural network approach for anomaly detection in the communication between vehicles and a server backend exchanging raw vehicle sensor data. A different coupling of iForest and k-means clustering is introduced in Laskar et al. (2021). Shouyu et al. (2020) use iForest to effectively detect various kinds of anomalies in power grid dispatching. Further, a parallel version of iForest implemented in SPARK/Scala is proposed in Tao et al. (2018). Dridi et al. (2021) present a spatio-temporal anomaly detection mechanism for mobile network management using a combination of ML techniques including OCSVM, support vector regression and recurrent neural networks, which outperforms iForest and auto-regressive integrated moving average models. In addition, a transformed iBAT approach based on iForest for trajectory anomaly detection was proposed in Zhang et al. (2011).

Commonly used ML techniques require data processing on a central server. In contrast, Yang et al. (2019) show that a federated learning approach allows for decentralized ML, which adheres to data privacy. Ito et al. (2020) propose an on-device federated learning approach based on an autoencoder model for detecting anomalies cooperatively in vehicle trajectories at a macro level. A federated learning approach based on a generative adversarial recurrent network, which adapts to the dynamic change of users’ normal driving behavior, is considered in Zhang et al. (2019b). Here, various anomalies connected with the speed, orientation and angular rate are considered. A federated generative adversarial network is suggested by Rasouli et al. (2020), with the main idea to couple federated learning and a classical generative adversarial network to analyze both non-independent and identically distributed data. Recently, Fiosina (2021) presented an explainable horizontal learning architecture for collaborative taxi travel time prediction. A federated SVM-based approach for a linear kernel SVM is proposed in Bakopoulou et al. (2021), and the findings show its advantages over local SVM models applied for mobile packet classification. Furthermore, a privacy-preserving SVM using nonlinear kernels on horizontally partitioned data is proposed in Yu et al. (2006). A good overview of federated learning and its open problems is given in Kairouz et al. (2021).

A very important problem in federated learning is privacy, because ML model parameters exchanged between parties in an FL system still contain sensitive information, which can be exploited in some privacy attacks (Truong et al., 2021). In this paper, we do not focus on this problem. However, there are a number of publications researching this aspect. State-of-the-art privacy-preserving techniques, e.g., encryption, secure aggregation, secure shuffling or differential privacy, are discussed to this end in Kairouz et al. (2021). An important challenge that arises in peer-to-peer federated learning is to prevent any client from reconstructing the private data of another client from its shared updates while maintaining a good level of utility for the learned models (Kairouz et al., 2021). Differential privacy is the standard approach to mitigate such privacy risks, which can be achieved by having each client add noise locally (Huang et al., 2015; Bellet et al., 2018). However, this often comes at a large cost in utility. To achieve better trade-offs between privacy and utility in peer-to-peer federated learning, one can rely on decentralization itself to amplify differential privacy guarantees, i.e., by considering appropriate relaxations of local differential privacy (Cyffers and Bellet, 2021).

3. Methodology

In the first step, adequate ML models were identified and implemented that provide a high anomaly detection accuracy on the given data. Identifying outliers in data is referred to as outlier or anomaly detection and can be solved as a one-class classification problem. We implemented semi-supervised learning algorithms that attempt to model normal driving behavior patterns in order to classify new patterns as either normal or abnormal. Based on previous research and related work we selected two techniques, OCSVM and iForest, which are described in the following. The two techniques were selected to compare their performance in an ‘ideal’ baseline scenario, assuming all data is available and used. In the second step, a federated learning architecture had to be designed and developed. This was done for the OCSVM only, due to the fact that a federated version of iForest is not obvious, because it is not a parametric approach.

3.1. Learning approaches

Support Vector Machine (SVM): The SVM is a supervised learning technique introduced by Vapnik (1998) that performs remarkably robustly on sparse and noisy data, which makes it the first choice for several applications. The SVM classifies objects by constructing hyperplanes in a multidimensional space that separate cases with different class labels. Its iterative procedure constructs an optimal hyperplane with the aim of minimizing an error function. First, we consider a traditional two-class SVM. Let $\Omega = \{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$, where $x_i \in \mathbb{R}^d$ is the i-th input data point and $y_i \in \{-1, 1\}$ is the i-th output pattern, indicating the class affiliation. The SVM can also create a non-linear decision boundary by projecting the data through a non-linear function $\varphi$ to a space with a higher dimension. This means that data points that cannot be separated by a straight line in their original space $I$ are transformed to a feature space $F$, in which there can be a ‘straight’ hyperplane that separates the data points of one class from those of the other. The hyperplane is represented by the equation $w^T x + b = 0$, with $w \in F$ and $b \in \mathbb{R}$. The constructed hyperplane determines the margin between the classes; all data points of class $-1$ are on one side, and all data points of class $1$ are on the other side. The distance from the closest point of each class to the hyperplane is equal. Therefore, the constructed hyperplane searches for the maximal margin (also referred to as ‘separating power’) between the classes. Slack variables $\xi_i$ are introduced to prevent the SVM classifier from over-fitting to noisy data, thereby creating a soft margin: slack variables allow some data points to lie within the margin. The constant $C > 0$ describes the trade-off between maximizing the margin and the number of training data points within that margin (and thus training errors). The objective function of the SVM classifier is the following minimization formulation:

$$\min_{w,\,b,\,\xi} \; \frac{\|w\|^2}{2} + C \sum_{i=1}^{n} \xi_i \quad \text{s.t.} \quad y_i \left( w^T \varphi(x_i) + b \right) \geq 1 - \xi_i, \;\; \xi_i \geq 0, \;\; \text{for all } i = 1, \ldots, n \tag{1}$$

Lagrange multipliers are used to solve the minimization problem. The function $K(x, x_i) = \varphi(x)^T \varphi(x_i)$ is a kernel function. Popular choices


for the kernel function are linear, polynomial and sigmoidal, but in most cases the Gaussian radial basis function (rbf) is selected:

$$K(x, x') = \exp\left( - \frac{\|x - x'\|^2}{2\sigma^2} \right), \tag{2}$$

where $\sigma \in \mathbb{R}$ is a kernel parameter and $\|x - x'\|$ is the dissimilarity measure. Special transformations, using the initial data to approximate a kernel map, are required. For example, the Nyström algorithm constructs an approximate feature map for an arbitrary kernel using a subset of the data as basis (Williams and Seeger, 2001).

The OCSVM is a natural extension of the SVM approach (Schölkopf et al., 1999). For identifying ‘suspicious’ observations, an OCSVM estimates a distribution that encompasses most of the observations and labels as suspicious those that lie outside a certain range with respect to a suitable metric. The OCSVM estimates a probability distribution function which makes most of the observed data more likely than the rest and separates these observations by the largest possible margin. The learning phase is computationally intensive because the training of an OCSVM involves a quadratic programming problem, but once the decision function is determined, it can be used to predict the class label of new test data effortlessly. The quadratic programming minimization function is slightly different from the original one stated above in Eq. (1), but the similarity is still clear:

$$\min_{w,\,\rho,\,\xi} \; \frac{\|w\|^2}{2} + \frac{1}{\nu n} \sum_{i=1}^{n} \xi_i - \rho \quad \text{s.t.} \quad w \cdot \varphi(x_i) \geq \rho - \xi_i, \;\; \xi_i \geq 0, \;\; \text{for all } i = 1, \ldots, n \tag{3}$$

The main difference from the standard SVM is that it classifies the data in a semi-supervised manner and does not provide the usual hyperparameters for tuning the margin, like $C$. Instead, it provides a hyperparameter $\nu$ that controls the sensitivity of the support vectors and should be tuned to the approximate ratio of outliers in the data. Thus, this method creates a hyperplane characterized by $w$ and $\rho$, which has maximal distance from the origin in feature space $F$ and separates all the data points from the origin. An effective stochastic sub-gradient descent algorithm (Pegasos) for solving the optimization problem of a standard SVM was proposed in Shalev-Shwartz et al. (2007). This algorithm allows incremental learning and provides results comparable to a standard sequential minimal optimization approach. The limitation of the Pegasos algorithm is that it can be used only with a linear kernel, thus the Nyström transformation is required (Williams and Seeger, 2001).

Isolation Forest (iForest): iForest is a tree-based non-parametric anomaly detection algorithm (Liu et al., 2012). iForest is very similar to Random Forests and is built based on an ensemble of decision trees for a given dataset; however, there are some differences. iForest separates each point from the other points randomly and constructs a tree based on the number of splits needed for each point (tree node). Outliers (anomalies) appear closer to the root of the tree and inliers (normal data) appear at greater depth. Thus, iForest identifies anomalies as the observations with the shortest average path lengths on the isolation trees. The following procedure is applied for each isolation tree: a) randomly select a feature and b) split the data points by randomly selecting a value between the maximum and the minimum of the selected feature. The partition of observations is repeated recursively until all observations are isolated. Then, normal and abnormal data points are distinguished based on the average path length. Shorter paths indicate anomalies, while longer paths indicate normal observations. The iForest method needs an anomaly score to measure the degree of anomaly of a single data point. This measure lies in the range from 0 to 1. The anomaly score is defined as:

$$s(x, n) = 2^{-\frac{E(h(x))}{c(n)}}, \tag{4}$$

where $E(h(x))$ is the average of $h(x)$, which is the path length from the root node to the external node $x$, while $c(n)$ is the average of $h(x)$ given $n$ and is used to normalize $h(x)$. iForest also uses a so-called contamination parameter, which sets the expected percentage of points in the data to be anomalous, similar to the parameter $\nu$ in the OCSVM. An advantage of iForest is its fast anomaly detection. In addition, this approach requires less memory compared to other algorithms. iForest isolates anomalies in the data points instead of profiling normal data points. As abnormal data points mostly have shorter tree paths than normal data points, trees in the iForest do not need to have a large depth. Therefore, a smaller max_depth can be used, which results in even lower memory requirements. As shown by Liu et al. (2008), the iForest algorithm also works very well with small datasets.

3.2. Federated approach

The obtained results show that both centralized and local OCSVM models give accurate results for the given dataset. Moreover, parametric ML methods like linear regression, SVM or neural networks can be successfully applied for federated learning, especially as they support sequential or incremental learning and use the Stochastic Gradient Descent (SGD) optimizer to find the optimal parameter values. By contrast, it is impossible to construct a federated version of iForest, because it is not parametric, is not optimized with the SGD algorithm and does not support sequential learning. Thus, in Koetsier et al. (2021) we proposed a novel federated algorithm for an OCSVM model. The classical OCSVM, which is usually applied for anomaly detection, does not support sequential learning and therefore could not be used for the federated version. Instead, we used OCSVM modifications with optimization steps performed with the SGD algorithm, which enables sequential learning. This modified OCSVM was applied in the federated algorithm. Furthermore, there is no realization of a non-linear (kernel) OCSVM based on an SGD SVM algorithm. In summary, we combined a prior non-linear rbf transformation with a standard linear OCSVM model and the SGD incremental learning algorithm, which allowed us to introduce a federated non-linear OCSVM. The OCSVM introduced above is used to design a federated architecture for detecting abnormal trajectories.

According to Yang et al. (2019) and Kairouz et al. (2021), federated learning can be categorized into horizontal (sample-based) federated learning, vertical (feature-based) federated learning and federated transfer learning, based on the data distribution among the various parties. In horizontal or sample-based federated learning, datasets share the same feature space but differ in their data samples (sample space) (Fiosina, 2021). In vertical or feature-based federated learning, datasets share the same sample space but differ in feature space. In federated transfer learning, datasets differ in both sample and feature space, having only small intersections. In this study, we consider a horizontal federated learning case.

Two principal designs of a federated architecture are shown in Fig. 1, a centralized one (top) and a decentralized one (bottom). For the centralized design: At the first level, each partner within the data sharing alliance gathers local data from its own data acquisition sources (e.g., from vehicle sensors or surveillance cameras). At the second level, the raw data is used for local anomaly detection models that are built and trained with locally available data. At the third level, the local models are periodically synchronized with a cloud server. The disadvantage of this approach is that all organizations involved in the data sharing alliance have to agree on one central entity, creating a dependence on the central server. Its failure would disrupt the training process of all clients (Roy et al., 2019). Alternatively, we propose at the bottom of Fig. 1 a federated learning architecture that does not rely on a central cloud instance, but enables the partners to exchange and synchronize model parameters directly. Both approaches are possible, and it is likely that the chosen approach depends on the partners involved in a data sharing alliance. As a proof of concept, a centralized cloud server is implemented to receive model parameters from the involved organizations (Fig. 1, top). The parameters are processed and used for training the method. The updated parameters are then sent back to the individual learning models. Note that our base OCSVM contains parameters that


need to be synchronized at the third federated model aggregation level. To this end, synchronization and learning algorithms had to be devised, which are based on adoption and further development of the proposed approach in Bakopoulou et al. (2021).

The federated learning process is described formally as follows: Let N participants {F_i}_{i=1}^{N} own datasets {T_i}_{i=1}^{N}, such that T = ∪_{i=1}^{N} T_i is the whole dataset. Each participant F_i divides its dataset T_i = T_i^{TR} ∪ T_i^{TE} into a training set T_i^{TR} and a test set T_i^{TE}. We train the models on the T_i^{TR} sets only and use the whole test set ∪_{i=1}^{N} T_i^{TE} to check the quality of the models for anomaly detection. Each participant has a local OCSVM model, denoted OCSVM_i, and these models are periodically synchronized with each other with the help of the central server. As we consider epochal incremental learning, OCSVM_i^{<epoch>} is the local OCSVM model of participant F_i for the current epoch, (w_i^{<epoch>}, ρ_i^{<epoch>}) are the current parameters of OCSVM_i^{<epoch>}, (Δw_i^{<epoch>}, Δρ_i^{<epoch>}) are the parameter updates after the current batch of data, and (Δw_FD^{<epoch>}, Δρ_FD^{<epoch>}) are the aggregated parameter updates. The training process is described in Algorithm 1. Note that it is not necessary, and also expensive, to synchronize all models at each epoch. Therefore, the models of participants that will be synchronized at each epoch are selected randomly in accordance with a pre-defined number of synchronizations num_synch at each step. Finally, the synchronized models are used locally by each participant for the anomaly detection.

4. Datasets

To demonstrate our concept, we chose three exemplary intersections from the INTERACTION dataset (Zhan et al., 2019): the intersections DR_USA_Intersection_EP (EP), DR_USA_Roundabout_SR (SR) and DR_USA_Intersection_MA (MA). Those intersections have been chosen as they have trajectories of various complex movement behaviors, represented by the trajectory length, shape, driving speed and heading direction (Table 1). Both MA, with 2982 trajectories, and SR, with 965 trajectories, contain consecutively recorded data. EP consists of two consecutively recorded datasets, DR_USA_Intersection: EP0 and EP1, which are merged. A visualization of the intersections can be seen in Fig. 2.

Since the dataset provides only the vehicle trajectories and no ground truth anomaly labels, we manually created them. To annotate the data, we observed the provided video material and labeled anomalies and potentially dangerous situations within the trajectories by considering trajectory shape, speed, heading and interaction. In compliance with the valid traffic regulations, the anomaly classes shown in Table 2 have been applied. It has to be noted that an anomaly type is not exclusive to an event. One labeled anomaly could contain multiple classes, e.g., wrong driving direction and near collision. This is especially relevant for interaction related anomalies. While in this work we focus on analyzing individual trajectory segments, we can still detect these events since they occur together with other anomaly classes like unreasonable or sudden stop (or others).

In a preprocessing step, we eliminated recording errors and trajectories which were cut off due to the start and end of the record, to consider only full trajectories entering and leaving the scene. At the same time we split the trajectories into segments of equal length l of 25 m with an overlap with the previous segment of 30%. This is shown in Fig. 3 and ensured a continuous representation in a way that no possible events are cut. Additionally, to decide on a segment length, we performed preliminary experiments testing varying lengths. While ‘too small’ (<10 m) trajectories gave significantly worse results, larger segments gave comparable results. ‘Too short’ trajectory segments most likely cut the anomaly events so that no segment contains the ‘full event’. Further, the splitting into trajectory segments enabled a general analysis of the trajectories across the different intersections, and additionally also increased the amount of available training data. Within the dataset
various data features for each point of the trajectory are given, namely
relative position, speed, acceleration as well as relative heading for the
trajectory classification. Using trajectory segments as well as relative
feature attributes is a way to generalize the movement behavior beyond
a local scenario (e.g., elimination of directional information). Thus, the
trained model can ultimately be applied for different intersections. In
order to process the trajectory segments, the segments were interpolated
to contain the same fixed number of points. For our experiment, we set
the number of points n to 32. The number of segments in each dataset
after preprocessing is presented in Table 1.
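
The segmentation step described above can be illustrated with a minimal sketch. It only covers the geometric part (25 m segments, 30% overlap, resampling to 32 points by linear interpolation along the path). The function names, the handling of the trailing remainder (dropped here) and the restriction to 2D positions are assumptions made for illustration; speed, acceleration and heading would need to be interpolated analogously.

```python
import math


def cumulative_lengths(points):
    """Arc length from the first point to every point of a polyline."""
    dists = [0.0]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        dists.append(dists[-1] + math.hypot(x1 - x0, y1 - y0))
    return dists


def point_at(points, dists, s):
    """Linearly interpolate the position at arc length s along the polyline."""
    for i in range(1, len(points)):
        if s <= dists[i] or i == len(points) - 1:
            span = dists[i] - dists[i - 1]
            t = 0.0 if span == 0 else (s - dists[i - 1]) / span
            t = min(max(t, 0.0), 1.0)
            (x0, y0), (x1, y1) = points[i - 1], points[i]
            return (x0 + t * (x1 - x0), y0 + t * (y1 - y0))
    return points[-1]  # polyline with a single point


def split_into_segments(points, seg_len=25.0, overlap=0.30, n_points=32):
    """Cut a trajectory into overlapping segments with a fixed number of points.

    Assumption: a trailing piece shorter than seg_len is dropped.
    """
    dists = cumulative_lengths(points)
    total = dists[-1]
    step = seg_len * (1.0 - overlap)  # distance between segment starts
    segments = []
    k = 0
    while k * step + seg_len <= total + 1e-9:
        start = k * step
        segments.append([
            point_at(points, dists, start + seg_len * j / (n_points - 1))
            for j in range(n_points)
        ])
        k += 1
    return segments


# Usage: a straight 100 m trajectory sampled every metre.
trajectory = [(float(i), 0.0) for i in range(101)]
segments = split_into_segments(trajectory)
print(len(segments), len(segments[0]))
```

With these settings consecutive segments start 17.5 m apart, so each segment shares 7.5 m (30% of 25 m) with its predecessor.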
In order to inspect the characteristics of the data, the data distribution of the trajectories has been determined as a t-distributed stochastic neighbor embedding (t-SNE) and visualized in Fig. 4. As expected, it can be seen that data from the same intersection (EP) recorded at different times (EP0, EP1) has the same distribution (Fig. 4, top).

Table 1
Description of the intersections from the INTERACTION dataset (Zhan et al., 2019).

Name                      Figure   Number of trajectories   Number of segments   Video length (min)
DR_USA_Intersection_EP    2 (a)    1367                     4991                 66.53
DR_USA_Intersection_MA    2 (b)    2982                     11786                107.37
DR_USA_Roundabout_SR      2 (c)    965                      5092                 40.90

Fig. 1. Federated anomaly detection architecture using a centralized server for aggregation (top) and peer-to-peer aggregation architecture (bottom) (Koetsier et al., 2021).
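
One synchronization epoch of the centralized variant in Fig. 1 (top) can be sketched as follows. This is a minimal illustration, not the authors' Algorithm 1: the element-wise mean as aggregation rule, the broadcast of the aggregated update to all participants, and the names `Participant`, `aggregate`, `federated_epoch` and `local_update` are assumptions. The local OCSVM training step is passed in as a callable because its details are not reproduced here.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Tuple

Update = Tuple[List[float], float]  # (delta_w, delta_rho)


@dataclass
class Participant:
    w: List[float]  # weight vector of the local OCSVM
    rho: float      # offset of the local OCSVM


def aggregate(updates: List[Update]) -> Update:
    """Element-wise mean of the received updates (assumed aggregation rule)."""
    n = len(updates)
    dim = len(updates[0][0])
    delta_w = [sum(u[0][d] for u in updates) / n for d in range(dim)]
    delta_rho = sum(u[1] for u in updates) / n
    return delta_w, delta_rho


def federated_epoch(
    participants: List[Participant],
    local_update: Callable[[int, Participant], Update],
    num_synch: int,
    rng: random.Random,
) -> List[int]:
    """Select num_synch participants at random, aggregate their updates,
    and apply the aggregated update to every participant."""
    selected = rng.sample(range(len(participants)), num_synch)
    updates = [local_update(i, participants[i]) for i in selected]
    delta_w, delta_rho = aggregate(updates)
    for p in participants:
        p.w = [wi + dwi for wi, dwi in zip(p.w, delta_w)]
        p.rho += delta_rho
    return selected


# Usage with a dummy local update (hypothetical, for illustration only).
def dummy_update(index: int, participant: Participant) -> Update:
    return [float(index + 1), 0.0], 0.5 * (index + 1)


partners = [Participant(w=[0.0, 0.0], rho=0.0) for _ in range(3)]
chosen = federated_epoch(partners, dummy_update, num_synch=3, rng=random.Random(0))
print(sorted(chosen), partners[0].w, partners[0].rho)
```

With `num_synch = 1` the aggregation reduces to broadcasting the update of a single randomly selected participant, which is the situation the peer-to-peer variant in Fig. 1 (bottom) relies on.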


Fig. 2. Example intersections EP (a), MA (b), SR (c) from the INTERACTION Dataset (Zhan et al., 2019).

Table 2
Anomaly classes and potentially dangerous situations.

Speed related anomalies       Location related anomalies       Interaction related anomalies
• Unreasonable stop           • Reverse driving                • Near collision
• Sudden stop                 • U-turn                         • Evading
• Driving too fast            • Wrong driving direction        • Ignoring right of way
• Driving unreasonably slow   • (On street) parking maneuver   • Not stopping on red light, stop or yield sign

Fig. 4. t-SNE plot of the trajectories of intersections EP0+EP1 (top) and EP+MA+SR (bottom), where NT are normal trajectories and AT abnormal trajectories.

Fig. 3. Data preprocessing step: trajectory splitting into segments (Koetsier et al., 2021).

Further, the normal trajectories from different intersections (EP, MA, SR) do not share the same distribution (Fig. 4, bottom). Also, it can be seen that the anomalies of all datasets form a separate cluster and do not lie inside the clusters of normal trajectories, which makes their detection a solvable task. This data basis shows the potential of collaboration, which is a good prerequisite for a federated approach.

5. Experiments and results

5.1. Experimental setup

To evaluate our concept we conducted a number of experiments. In preparation, each dataset T (Section 4) was divided into two sets T = T^{Norm} ∪ T^{Abnorm}, where the set T^{Norm} contains all normal trajectory segments and the set T^{Abnorm} contains all abnormal trajectory segments. Then, the training set T^{TR} was formed from a randomly chosen 85% of T^{Norm} and the test set T^{TE} from the remaining 15% of T^{Norm} combined with all trajectory segments of T^{Abnorm}. This led to the training sets and test sets described in Table 3.

To evaluate and compare the performance of the different methods, we used the AUC-ROC metric (Fawcett, 2006). The AUC-ROC represents the area under the receiver operating characteristic (ROC) curve, which describes the true positive rate against the false positive rate at all threshold settings. The AUC-ROC score can therefore range from 0 to 1. A value of 1 represents the best possible classification. A value of 0.5 defines the worst score, which is equivalent to a perfect random classifier. Values below an AUC-ROC score of 0.5 describe a negative predictive power and thus can be reversed to derive a positive predictive power.

All experiments were carried out 200 times each, with the corresponding sets randomly shuffled in every run. We compared the methods using the average AUC-ROC score along with its standard error. Moreover, we verified the statistical significance of the improvements given by each alternative approach (mean difference) by performing an ANOVA test. When the p-value obtained from the ANOVA analysis fell below the chosen significance level (p ≤ 0.05), we concluded that the differences between the means of two experiments are statistically significant.

5.2. Design of experiments

A set of experiments was conducted to evaluate the overall achievable quality of anomaly detection and subsequently to evaluate the effect and quality of the federated approach. To prove our claim that data

Table 3
Training set and test set sizes of the data set by intersection.

Intersection   Dataset |T|   Training set |T^{TR}|   Test set |T^{TE}| (anomalies |T^{Abnorm}|)
EP             4991          4104                    887 (162)
MA             11786         9984                    1802 (39)
SR             5092          4313                    779 (17)
Total          21869         18401                   3468 (218)
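
The split described in Section 5.1 and summarized in Table 3 can be sketched as follows. The function name `split_dataset`, the integer segment ids used as placeholder data, and rounding the 85% share down are assumptions for illustration; with this rounding the resulting sizes coincide with the per-intersection rows of Table 3.

```python
import random


def split_dataset(segments, is_abnormal, train_percent=85, seed=0):
    """Split one dataset into a training set of normal segments only and a
    test set of the remaining normal segments plus all abnormal segments.

    Returns (train, test), where test is a list of (segment, label) pairs
    and label 1 marks an abnormal segment.
    """
    rng = random.Random(seed)
    normal = [s for s, a in zip(segments, is_abnormal) if not a]
    abnormal = [s for s, a in zip(segments, is_abnormal) if a]
    rng.shuffle(normal)
    n_train = len(normal) * train_percent // 100  # assumption: round down
    train = normal[:n_train]
    test = [(s, 0) for s in normal[n_train:]] + [(s, 1) for s in abnormal]
    rng.shuffle(test)
    return train, test


# Dataset sizes from Table 3: (|T|, number of abnormal segments).
sizes = {"EP": (4991, 162), "MA": (11786, 39), "SR": (5092, 17)}
for name, (total, n_abnormal) in sizes.items():
    segments = list(range(total))                    # placeholder segment ids
    flags = [i < n_abnormal for i in range(total)]   # placeholder labels
    train, test = split_dataset(segments, flags)
    print(name, len(train), len(test), sum(label for _, label in test))
```

Because the training set contains only normal segments, the one-class models never see an anomaly during training; anomalies appear only in the test set.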


owners benefit from exchanging model parameters, respectively parameter updates (gradients), and therefore do not need to share raw data but only aggregated network parameters, we performed two sets of experiments: we distributed the training data among a given number of data owners where a) each partner exclusively owns a portion of the data for one specific intersection and b) each partner owns a mixed subset of the three intersections. Federation can be organized in several ways and for several reasons, like increasing diversity and preventing over-fitting. In both experiments we investigate the two possibilities of cooperation between data owners in a federated manner. With a) we simulate a case when each data owner only possesses data from one distribution, e.g., partners have data acquired only in one city or only at a certain junction (traffic surveillance camera) or partners owning data of specific types (e.g., collected with logistics vehicles vs. normal cars). In b), we simulate the case that one data owner, in contrast to a), possesses a different amount of data from multiple intersections, e.g., partners have a fleet of vehicles (like car manufacturers or mobility service operators, with experienced and non-experienced drivers), which are not equally distributed. Additionally, we compared the results to a baseline experiment, in which the model was trained on all the data, to investigate if data owners not only benefit from exchanging information, but also can reach close to optimal results. This experiment analyzed only a theoretical optimum since every data owner needs to share its whole data, which in reality most likely would not happen.

5.3. Determination of hyperparameters

We applied the three described methods for each experiment, choosing the hyperparameters by grid-search. For iForest we varied the contamination parameter in the interval [0.01, 0.05]. For the OCSVM we explored different kernels (linear, polynomial, rbf), different values of parameter η0 in the interval [0.01, 0.9] and ν in the interval [0.01, 0.4]. Further, we tested different learning rate schedules (constant, optimal, invscaling, adaptive), see Table 4b. For the federated learning approach we varied the pre-defined number of synchronizations num_synch at each step, starting from the number of all participants N, decreasing until only one randomly selected participant sends its parameter updates to the central server. The finally chosen parameters are presented in Table 4a.

5.4. Local learning and transfer

We first checked the anomaly detection accuracy on each individual intersection using iForest and OCSVM, such that each model is trained and tested at one intersection. Furthermore, we analyzed whether the model can be also transferred to other intersections, such that a model is trained at one intersection and tested at another intersection.

The results of the first two local experiments with iForest and OCSVM regarding the anomaly detection accuracy and its generalization potential are shown in Table 5. In general, it can be observed that the models achieve a very high classification accuracy. The results show that, while both methods achieve a similar anomaly detection accuracy, OCSVM outperforms iForest: the OCSVM models reach higher mean AUC-ROC scores, a difference found to be statistically significant by ANOVA, with a standard error that is smaller by a factor of around 10. It can be seen that similar AUC-ROC scores could be reached for each individual intersection: EP with 98.5%, MA with 98.4% and SR with 99.2%, showing that the method is suitable to detect anomalies within intersections. Furthermore, the results show very good transfer characteristics: although the accuracy obtained when training and testing on the same intersection cannot be fully reached when the model is trained on one intersection and tested on another, the detection still works well: the AUC-ROC score for EP decreases from 98.5% to 98.3% (MA) and 98.2% (SR), MA decreases from 98.4% to 98.1% (EP) and 98.0% (SR) and SR decreases from 99.2% to 98.8% (EP) and 98.5% (MA).

Table 4
The optimal hyperparameters used for anomaly detection.

(a) Hyperparameters for iForest and OCSVM
iForest           contamination = [0.02, 0.03], n_estimators = 100, max_samples = auto, max_features = 1, warm_start = False
OCSVM local       kernel = rbf, ν = [0.02, 0.03], Nyström approx. (γ = .003), epochs = 20, power_t = 0.5, η0 = 0.02, learning_rate: “invscaling”, optimizer = SGD, warm_start = True
OCSVM federated   the same as for local model, num_synch = 1‥N, depending on experiment

(b) Learning rate schedules for OCSVM with SGD
constant     η_t = η_0
optimal      η_t = 1 / (α (t_0 + t))
invscaling   η_t = η_0 / t^{power_t}
adaptive     η_t = η_0 until the stopping criterion is reached, then η_t = η_t / 5

5.5. Centralized learning

As a preparation for the federated learning experiments and as an ‘ideal’ baseline experiment, we used a centralized approach assuming that the whole dataset is centrally available. Experiments were conducted in which the models were trained on each individual intersection and evaluated on the combined test sets of all intersections. These results were compared with models trained and tested on the combined set of all intersections. The results reported in Table 6 show that sharing information with other data owners is beneficial. As expected, the anomaly detection accuracy is worse for each individual partner, ranging from an AUC-ROC score of 92.0% (SR) to 97.5% (EP) for iForest and 98.1% (EP) to 98.7% (SR) for OCSVM, in comparison to the scenario assuming that the whole dataset is centrally available (EP+MA+SR) with an AUC-ROC score up to 98.9%. This indicates that the cooperation can improve the overall anomaly detection accuracy. Furthermore, comparing Table 6 with the results from Table 5 it can be seen that each intersection performs worse on the combined test set in comparison to the test set from the individual intersections. In case of iForest the AUC-ROC score of EP decreases from 98.2% to 97.3%, MA from 97.7% to 95.9% and SR from 97.3% to 92.2%. In case of OCSVM the AUC-ROC score of EP decreases from 98.5% to 98.1%, MA from 98.4% to 98.1% and SR from 99.2% to 98.7%. This also supports the benefit of cooperation.

In addition, we investigated whether the unequal number of data points in the datasets has an effect. To investigate whether the different dataset sizes have an impact on the overall anomaly detection accuracy, we tested the performances on the original training set sizes of all intersections (unbalanced) and a balanced training set with an equal number of segments based on the smallest training set size. Thereby, we achieved similar results for both methods used as reported in Table 6, showing that a balancing of the training sets is not necessary in this case.

5.6. Federated learning: overall benefit

The next experiment aimed to evaluate the potential benefits of a federated learning approach. Therefore, we divided the training set among the organizations so that each intersection belongs to one organization. In this way, each data owner only has access to data from its own vehicle fleet or sensors and contributes with its model parameters to the federated learning solution. The experiment results are presented in Table 7. Note that Table 7 lists the whole training set (EP+MA+SR) for these models; however, this is not exactly true for federated learning, because the partners exchange the parameters but not the data, so the whole training set (EP+MA+SR) is reached only indirectly. We can see that the federated approach for three


Table 5
AUC-ROC scores for all different local model combinations and datasets.

Training set    EP          MA          SR          EP          EP          MA          MA          SR          SR
Test set        EP          MA          SR          MA          SR          EP          SR          EP          MA
iForest local   .982±.012   .977±.017   .973±.021   .972±.014   .968±.007   .952±.019   .930±.018   .887±.020   .968±.013
OCSVM local     .985±.001   .984±.001   .992±.002   .983±.001   .982±.001   .981±.002   .980±.002   .988±.003   .985±.001

Table 6
AUC-ROC scores of local models for different datasets in comparison to centralized models.

Training set, T^{TR} / Test set, T^{TE}   EP / EP+MA+SR   MA / EP+MA+SR   SR / EP+MA+SR   EP+MA+SR / EP+MA+SR
iForest local (unbalanced)                .973±.012       .959±.013       .922±.014       .987±.002
iForest local (balanced)                  .975±.013       .961±.012       .920±.015       .988±.002
OCSVM local (unbalanced)                  .981±.003       .981±.004       .987±.002       .989±.001
OCSVM local (balanced)                    .983±.001       .981±.002       .987±.001       .989±.001
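
For reference, the AUC-ROC scores and standard errors reported in Tables 5 and 6 can be computed as in the following minimal sketch. It uses the rank interpretation of the AUC-ROC (the probability that a randomly chosen abnormal segment receives a higher anomaly score than a randomly chosen normal one, with ties counted as one half). The function names and the example numbers are illustrative assumptions, not the authors' evaluation code.

```python
import math


def auc_roc(scores, labels):
    """AUC-ROC from anomaly scores (higher = more anomalous).

    labels: 1 for abnormal (positive), 0 for normal (negative).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mean_and_standard_error(values):
    """Mean and standard error of the mean over repeated runs."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, math.sqrt(variance / n)


# Usage with made-up scores from three hypothetical runs.
runs = [
    auc_roc([0.9, 0.8, 0.3, 0.2], [1, 1, 0, 0]),
    auc_roc([0.9, 0.3, 0.6, 0.7], [1, 1, 0, 0]),
    auc_roc([0.9, 0.6, 0.6, 0.2], [1, 1, 0, 0]),
]
print(runs, mean_and_standard_error(runs))
```

The pairwise comparison is quadratic in the number of segments, which is unproblematic for test sets of the size listed in Table 3; a sort-based variant would be preferable for much larger sets.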

partners with their complete datasets makes it possible to achieve the same anomaly detection accuracy as the ‘ideal’ baseline model described above, which is 98.9%.

Further, we analyzed how an increasing number of participants influences the prediction accuracy (Table 7); we randomly and equally divided each dataset among two partners, thus obtaining 6 partners. The accuracy decreased slightly, while still allowing the anomalies to be detected with 98.7% accuracy. Distributing the data in an analogous way among 15 partners, so that each partner has only 20% of the data of one of the datasets, leads to an accuracy of around 98.7%.

The last row of Table 7 represents the situation when the partners do not have the data from one intersection but from a subset of several intersections. In this experiment, the first partner had half of a training set from EP and half of a training set from SR and so on. Note that the distribution of data for each partner was not equal. The experimental result in this case shows an anomaly detection accuracy comparable to the previous results, of around 98.9%.

5.7. Federated learning: individual benefit of partners

Our next aim was to show how each partner benefits from the collaboration in federated learning. We considered the same case as previously with 3 partners, who own the data from one intersection. The experiment shows that training with all data and applying it to the own dataset in a local fashion (‘ideal’ scenario) achieves a slightly higher accuracy than doing it in a federated manner (Table 8). However, in order to evaluate the benefit of the federated learning, this accuracy has to be compared with the results from Table 5, where no access to the partners’ data was possible: e.g., the owner of data set EP yields an accuracy of 98.5% with the local approach, and 99.1% with the federated one; MA improved from 98.4% to 98.7%. This indicates that the cooperation can not only improve the overall anomaly detection accuracy but is also beneficial for each individual partner. Both improvements are proven to be statistically significant by the ANOVA test.

5.8. Federated learning: synchronization

Furthermore, we researched how the federated learning architecture could be optimized and the number of required synchronizations could be reduced. In Table 9 the two previously described experiments with 3 and 15 partners are considered. We reduced the number of synchronizations num_synch, which means how many partners send their parameters to the central server. Thus, for 3 partners if num_synch = 3 all partners share their parameters at each epoch. The results show that for 3 participants even when only one random partner sends its parameters to the server the accuracy remains the same, at 98.9%. However, the learning rate in case of num_synch is slightly lower (Fig. 5). Furthermore, we experimented with 15 partners and detected the threshold in the value of num_synch. It was num_synch = 5, which is 33% of partners (the same 33% as num_synch = 1 for 3 participants), which still led to the same accuracy. Decreasing num_synch to 1 reduced the accuracy to 98.4%.

The obtained experimental results with the federated OCSVM allowed us to make modifications in the standard federated learning architecture and to propose an alternative design to share the model parameters, as introduced in the bottom part of Fig. 1. Thus, we showed (Fig. 5) that for our data it is not necessary to perform a parameter aggregation step at the central server, but it is sufficient to broadcast the parameters from a randomly selected participant (num_synch = 1) at each time unit. As presented, this has no effect (for 3 participants) or little effect (for 15 participants) on the model accuracy; it is only a different way of exchanging parameters. This decentralized exchange would not be possible if all the parameters from all participants had to be broadcast, because of the big network load. In contrast to this, when only one participant transmits its data at every epoch, it becomes possible with less data transfer.

5.9. Overall comparison of the centralized and federated approach

Fig. 6 presents a t-SNE plot for the classification of trajectory segments using the federated OCSVM (N = 3) (top) and the (local) iForest (N = 1) (bottom). The true positives represent correctly classified abnormal segments, true negatives are the normal segments. False positives describe normal segments which were classified as abnormal segments, and false negatives abnormal segments which were classified as normal segments. Overall, the anomalies (red) and normal trajectory segments (blue) can be visually differentiated well. Remarkably, the anomalies therefore seem to have similar attributes, even though
Table 7
AUC-ROC scores of federated OCSVM models.

Training set, T^{TR}                                        EP+MA+SR
Test set, T^{TE}                                            EP+MA+SR
Exclusive training sets
3 partners - (EP; MA; SR)                                   .988±.001
6 partners - 2x (0.5 EP; 0.5 MA; 0.5 SR)                    .987±.001
15 partners - 5x (0.2 EP; 0.2 MA; 0.2 SR)                   .987±.001
Merged training sets
3 partners (0.5 EP+0.5 SR; 0.5 MA+0.5 SR; 0.5 EP+0.5 MA)    .989±.003

Table 8
AUC-ROC scores for federated models on different datasets.

Training set, T^{TR}                          EP+MA+SR    EP+MA+SR    EP+MA+SR    EP+MA+SR
Test set, T^{TE}                              EP          MA          SR          EP+MA+SR
OCSVM local                                   .992±.001   .988±.001   .993±.001   .989±.001
OCSVM federated, 3 partners - (EP; MA; SR)    .991±.001   .987±.001   .992±.001   .988±.001


Table 9
AUC-ROC scores for federated models depending on the number of synchronizations, num_synch.

OCSVM federated models                       Training set, T^{TR}   Test set, T^{TE}   num_synch   AUC-ROC score
3 partners - (EP; MA; SR)                    EP+MA+SR               EP+MA+SR           3, 2, 1     .988±.001
15 partners - 5x (0.2 EP; 0.2 MA; 0.2 SR)    EP+MA+SR               EP+MA+SR           15, 10      .987±.001
                                                                                       5           .985±.004
                                                                                       1           .984±.004

Fig. 5. Federated OCSVM model training with number of partners and different
parameter synchronization numbers.

Fig. 6. t-SNE plot for the classification of trajectory segments, federated OCSVM (N = 3) (top), iForest N = 1 (bottom).

Fig. 7. Confusion matrix with fed OCSVM, N = 3 (left), N = 15 (right).

different anomaly classes (Section 4) are taken from a real world dataset, since they form a cluster in the t-SNE representation and are not scattered within the normal trajectory segments. This proves that shallow learning provides a good anomaly detection accuracy (99%) for our dataset.

Fig. 7 (left) shows the confusion matrix for the best performing method (fed. OCSVM, N = 3). We can see that only 1 out of 218 abnormal segments was falsely identified as normal, and 69 out of 3250 normal segments were wrongly categorized as abnormal. Fig. 7 (right) shows the confusion matrix for a federated OCSVM model with 15 partners. As the experiments show, the smaller the training set is for each individual partner, the less accurately normal trajectories can be identified as such.

5.10. Qualitative results

Fig. 8 illustrates some qualitative results, in which anomalies have been successfully identified. Fig. 8 (left) shows a right of way violation of car 99 leaving the parking lot and causing car 96 to make a sudden stop. Fig. 8 (middle) presents an on street parking maneuver of car 483, causing following cars to overtake and thereby narrowing down the street, leading to a traffic jam. Fig. 8 (right) demonstrates an actual accident: car 31 stops in front of the intersection, reverses without any visible reason and thereby crashes into car 34. The false positives often represent cases where a normal slowdown or waiting situation has erroneously been classified as an anomaly. Also, in some cases a sharp turn into a parking lot was wrongly classified, most likely because these situations are similar to U-turn behavior.

6. Discussion

In this paper, we use ML methods to detect anomalous behavior in vehicle trajectories, which is possible with very high accuracy. We presented a federated approach for exchanging and processing data by sharing only model parameters. Thereby, we proposed and verified a federated learning reference architecture, which is especially relevant when data is distributed among various data owners, which do not want to share their raw data, but still want to have the benefit of exploiting the information from other data owners’ data. This enables organizations to share data while adhering to data privacy rules. This approach has the potential to enable and facilitate data sharing alliances.

Considering our first research question, the experimental results show that shallow learning models like OCSVM and iForest provide accurate anomaly detection for the considered datasets. In the local scenarios for each individual intersection, AUC-ROC scores between 98.4% and 99.2% were reached. Observing the t-SNE plot (Figs. 4 and 6 (right)) one can see that the anomalies are very well clustered and thus are separable from the normal observations, which could be the reason


Fig. 8. Examples of correctly identified anomalies.

why shallow learning methods were so successful.

Regarding our second research question, it could be shown that the chosen methods can be generalized for not equally distributed data, such that a model trained at one intersection achieves a comparable anomaly detection accuracy on other intersections. This could be achieved by using normalized, location independent data features like relative position and heading, speed and acceleration. When the training set and the test set are from different intersections, the anomaly detection accuracy is comparable but slightly worse than when the datasets are from the same intersection. This is to be expected since the training and test set are from different datasets and thus have different distributions.

For the third research question, we found evidence that federated learning improves the detection accuracy of anomalous vehicle trajectories and also of each individual partner. We showed that federated learning can improve the anomaly detection accuracy for individual data owners from 97% to 99% when each partner possesses the dataset of one intersection. For larger alliances, when each partner owns and contributes less data of the overall available data, the benefit increases even further. Hence, the detection accuracy will improve when applying a federated learning approach. The federated OCSVM model leads to the same near 99% accuracy as with the whole dataset, independent of the number of participants. This underlines that sharing more of the same data implies diminishing returns of utility and emphasizes the importance of sharing data with different stakeholders in order to retrieve a larger spectrum of various scenarios. Furthermore, we considered the situation when each partner had the data from a subset of intersections and the distribution of data for each partner still remained not equal. The experimental result in this case showed an anomaly detection accuracy comparable to the previous results, of around 98.9%. Moreover, our experiments show that the number of partners which send their parameters to the server at each epoch can be significantly reduced. Thus, we concluded that we can get almost the same anomaly identification accuracy when only 33% randomly selected participants send their data to the server at each epoch. For the case with 15 partners, we found that if only one randomly selected partner transmits its parameters, an accuracy of around 98.4% can still be achieved. This considerably reduces the communication load of the federated server and opens up the perspective of using a peer-to-peer federated learning architecture for this kind of application (Fig. 1 (bottom)).

Considering the fourth research question, our results indicate that individual data owners benefit from adopting a federated learning approach for sharing model parameters as proposed in our paper. This is especially true when relying on fragmented data islands for detecting anomalies in trajectories otherwise, so that sharing data offers an adequate return by receiving data. Through collaboration the organization gains access to data that it would otherwise not be able to obtain, or only at great cost. We demonstrate that this is the case when an organization has a fraction of data from every intersection as well as when having only fractions of data from some intersections. In addition, organizations that have access to larger portions of data could also be encouraged to share data because we could show that due to collaboration each individual needs to provide less data to still reach excellent performances, whereby local data labeling procedures can be reduced. Thereby, large data providers could also gain data from smaller, but more specific data owners and complete their own dataset. This makes it possible to increase detection accuracy, which is especially relevant for a technology of safety and assistance.

However, there are some potential obstacles that could inhibit the data sharing. Such a cooperation might not be equally useful for all partners of a potential sharing alliance. Since it is likely that the partners will differ in terms of available data, data processing capacity or actual demand, the distribution mechanism between participants should feature mechanisms for a fair compensation and safe exchange (Ghoshal et al., 2020). This could be especially relevant in case that critical data owners like cities with access to unique data (e.g., from surveillance cameras) might not have a superficial benefit from sharing data because of no direct compensation. Hence, the compensation could depend on the extent and quality of the data shared. Here, especially the diversity of data contributed could be an additional quality asset. Moreover, cooperating with organizations from different industries could increase transaction costs for initiating and maintaining the relationship. However, this could also enable new or strengthen established business models in the respective industry of a focal organization. These obstacles provide future research opportunities considering the implementation of a federated learning approach.

Overall, our findings indicate that sharing data is advantageous. This implies that the federated approach has the potential to compensate the disadvantage of not having all data available in one data source through collaboration. These findings are in line with e.g., Davis (2016) or Ghosal et al. (2020), though we show a novel method of how to enable and facilitate the inter-organizational exchange.

7. Conclusion

In this study, we successfully proposed and verified a federated learning architecture and the corresponding federated OCSVM for the collaborative identification of anomalous trajectories at intersections. This approach achieves an accuracy comparable with the ‘ideal’ centralized approach of around 98%–99%, reduces the local data labeling load, keeps individual data private and provides, with an increasing number of partners, more accurate anomaly detection models than the local models of each partner.

This is especially relevant as in practice datasets often exist in form of isolated islands, which implies that the information necessary for detecting anomalies is distributed among different stakeholders. These stakeholders are likely to vary in size and type. With the federated approach, every stakeholder knows only a fraction of the ‘truth’, so that sharing data with other stakeholders is likely to be beneficial to achieve an innovative solution in the field of safety and security technology. Following a federated learning approach, only relevant data parameters will be exchanged, so that data privacy is guaranteed. In sum, the proposed approach contributes to increasing road safety by detecting anomalous behavior through sharing model parameters. Therefore, assistance systems and safety systems can be further developed and prevent accidents.

In future research, we will address the detection accuracy for individual anomaly classes and estimate the efficiency of the federated

10
C. Koetsier et al. ISPRS Open Journal of Photogrammetry and Remote Sensing 4 (2022) 100013

approach in this case. Additionally, because of a relation between the
chosen segment length and number of points and the results of the anomaly
detection part, we will perform an in-depth analysis of the minimum
requirements for the segment length and number of points of a segment in
future work. Also, investigations regarding the compensation for each
partner's contributions will be undertaken. One aspect would be to consider
the degree of novelty or complementarity of the provided data. Further, we
aim to label and make use of the full INTERACTION dataset to investigate the
performance across intersections from different parts of the world,
reflecting various, also culturally grounded, driving behaviors.

Declaration of competing interests

The authors declare that they have no known competing financial interests or
personal relationships that could have appeared to influence the work
reported in this paper.

Acknowledgement

The research was funded by the Lower Saxony Ministry of Science and Culture
under grant number ZN3493 within the Lower Saxony “Vorab” of the Volkswagen
Foundation and supported by the Center for Digital Innovations.

References

Audretsch, D.B., Belitski, M., 2020. The limits to collaboration across four of the most innovative UK industries. Br. J. Manag. 31 (4), 830–855.
Bakopoulou, E., Tillman, B., Markopoulou, A., 2021. Fedpacket: A Federated Learning Approach to Mobile Packet Classification. IEEE Transactions on Mobile Computing.
Belhadi, A., Djenouri, Y., Lin, J.C.-W., Cano, A., 2020. Trajectory outlier detection: algorithms, taxonomies, evaluation, and open challenges. ACM Trans. Manage. Inf. Syst. 11 (3).
Belhadi, A., Djenouri, Y., Srivastava, G., Cano, A., Lin, J.C.-W., 2021. Hybrid group anomaly detection for sequence data: application to trajectory data analytics. IEEE Trans. Intell. Transport. Syst. 1–12.
Bellet, A., Guerraoui, R., Taziki, M., Tommasi, M., 2018. Personalized and private peer-to-peer machine learning. In: Proceedings of the 21st International Conference on Artificial Intelligence and Statistics (AISTATS) 2018 84.
Cheng, H., Sester, M., 2018. Mixed traffic trajectory prediction using LSTM-based models in shared space. In: Mansourian, A., Pilesjö, P., Harrie, L., van Lammeren, R. (Eds.), Geospatial Technologies for All. Springer Int. Pub., Cham, pp. 309–325.
Chesbrough, H., Brunswicker, S., 2014. A fad or a phenomenon? The adoption of open innovation practices in large firms. Res. Technol. Manag. 57 (2), 16–25.
Cyffers, E., Bellet, A., 2021. Privacy Amplification by Decentralization. arXiv:2012.05326.
Davis, J.P., 2016. The group dynamics of interorganizational relationships: collaborating with multiple partners in innovation ecosystems. Adm. Sci. Q. 61 (4), 621–661.
Djenouri, Y., Belhadi, A., Lin, J.C.-W., Djenouri, D., Cano, A., 2019. A survey on urban traffic anomalies detection algorithms. IEEE Access 7, 12192–12205.
Dridi, A., Boucetta, C., Hammami, S.E., Afifi, H., Moungla, H., 2021. Stad: spatio-temporal anomaly detection mechanism for mobile network management. IEEE Trans. Netw. Serv. Manag. 18 (1), 894–906.
Faems, D., de Visser, M., Andries, P., Looy, B.V., 2020. Technology alliance portfolios and financial performance: value-enhancing and cost-increasing effects of open innovation. J. Prod. Innovat. Manag. 27 (6), 785–796.
Fawcett, T., 2006. An introduction to ROC analysis. Pattern Recogn. Lett. 27 (8), 861–874.
Fiosina, J., 2021. Explainable federated learning for taxi travel time prediction. In: Proc. of the 7th Int. Conf. on Vehicle Technology and Intelligent Transport Systems: VEHITS2021. SciTePress, pp. 670–677.
Ge, Y., Xiong, H., Liu, C., Zhou, Z.-H., 2011. A taxi driving fraud detection system. In: IEEE 11th Int. Conf. on Data Mining, pp. 181–190.
Ghoshal, A., Kumar, S., Mookerjee, V., 2020. Dilemma of data sharing alliance: when do competing personalizing and non-personalizing firms share data. Prod. Oper. Manag. 29 (8), 1918–1936.
Han, X., Chen, X., Liu, L.-P., 2021. GAN ensemble for anomaly detection. Proc. AAAI Conf. Artif. Intell. 35 (5), 4090–4097.
Hofmockel, J., Sax, E., 2018. Isolation forest for anomaly detection in raw vehicle sensor data. In: Proceedings of the 4th International Conference on Vehicle Technology and Intelligent Transport Systems - VEHITS, INSTICC. SciTePress, pp. 411–416.
Huang, H., Zhang, L., Sester, M., 2014. A recursive bayesian filter for anomalous behavior detection in trajectory data. In: Connecting a Digital Europe through Location and Place. Springer, pp. 91–104.
Huang, Z., Mitra, S., Vaidya, N., 2015. Differentially private distributed optimization. In: Proc. of the 2015 Int. Conf. on Distributed Computing and Networking, ICDCN ’15. Association for Computing Machinery, New York, NY, USA.
Ito, R., Tsukada, M., Matsutani, H., 2020. An On-Device Federated Learning Approach for Cooperative Anomaly Detection. arXiv:2002.12301.
Kairouz, P., et al., 2021. Advances and open problems in federated learning. Found. Trend Mach. Learn. 14 (1–2), 1–210.
Kaplan, M., Alptekin, S., 2020. An improved BiGAN based approach for anomaly detection. Procedia Comput. Sci. 176, 185–194.
Knorr, E., Ng, R., Tucakov, V., 2000. Distance-based outliers: algorithms and applications. The VLDB J. 8, 237–253.
Koetsier, C., Fiosina, J., Gremmel, J.N., Sester, M., Müller, J.P., Woisetschläger, D., 2021. Federated cooperative detection of anomalous vehicle trajectories at intersections. In: Proc. of the 4th ACM SIGSPATIAL Int. Workshop on Advances in Resilient and Intelligent Cities, ARIC ’21. Association for Computing Machinery, New York, NY, USA, pp. 13–22.
Konečný, J., McMahan, H.B., Yu, F.X., Richtarik, P., Suresh, A.T., Bacon, D., 2016. Federated learning: strategies for improving communication efficiency. In: Proc. of NIPS Workshop on Private Multi-Party Machine Learning.
Kumar, D., Bezdek, J.C., Rajasegarar, S., Leckie, C., Palaniswami, M., 2017. A visual-numeric approach to clustering and anomaly detection for trajectory data. Vis. Comput. 33 (3), 265–281.
Laskar, M.T.R., Huang, J.X., Smetana, V., Stewart, C., Pouw, K., An, A., Chan, S., Liu, L., 2021. Extending isolation forest for anomaly detection in big data via k-means. ACM Trans. Cyber-Phys. Syst. 5 (4), 26.
Lee, J.-G., Han, J., Li, X., 2008. Trajectory outlier detection: a partition-and-detect framework. In: Proc. of the 2008 IEEE 24th Int. Conf. on Data Engineering, ICDE ’08. IEEE Computer Society, USA, pp. 140–149.
Li, X., Zhao, K., Cong, G., Jensen, C.S., Wei, W., 2018. Deep representation learning for trajectory similarity computation. In: 2018 IEEE 34th Int. Conf. on Data Engineering (ICDE), pp. 617–628.
Liu, F.T., Ting, K.M., Zhou, Z.-H., 2008. Isolation forest. In: Proc. of IEEE Int. Conf. on Data Mining, pp. 413–422.
Liu, F.T., Ting, K.M., Zhou, Z.-H., 2012. Isolation-based anomaly detection. ACM Trans. Knowl. Discov. Data 6 (1), 39.
Ma, C., Miao, Z., Li, M., Song, S., Yang, M.-H., 2019. Detecting anomalous trajectories via recurrent neural networks. In: Jawahar, C., Li, H., Mori, G., Schindler, K. (Eds.), Computer Vision – ACCV 2018. Springer Int. Pub., Cham, pp. 370–382.
Ma, Q., Sun, C., Cui, B., Jin, X., 2021. A novel model for anomaly detection in network traffic based on kernel support vector machine. Comput. Secur. 104, 102215.
McMahan, H.B., Moore, E., Ramage, D., y Arcas, B.A., 2017. Federated learning of deep networks using model averaging. Artif. Intell. Stat. 54, 1273–1282.
Morris, B.T., Trivedi, M.M., 2011. Trajectory learning for activity understanding: unsupervised, multilevel, and long-term adaptive approach. IEEE Trans. Pattern Anal. Mach. Intell. 33 (11), 2287–2301.
Ramyar, S., Homaifar, A., Karimoddini, A., Tunstel, E., 2016. Identification of anomalies in lane change behavior using one-class svm. In: 2016 IEEE International Conference on Systems, Man, and Cybernetics (SMC), pp. 4405–4410.
Rasouli, M., Sun, T., Rajagopal, R., 2020. FedGAN: Federated Generative Adversarial Networks for Distributed Data. arXiv:2006.07228.
Roy, A.G., Siddiqui, S., Pölsterl, S., Navab, N., Wachinger, C., 2019. Braintorrent: A Peer-To-Peer Environment for Decentralized Federated Learning. CoRR abs/1905.06731.
Ruff, L., Vandermeulen, R., Goernitz, N., Deecke, L., Siddiqui, S.A., Binder, A., Müller, E., Kloft, M., 2018. Deep one-class classification. In: Dy, J., Krause, A. (Eds.), Proceedings of the 35th International Conference on Machine Learning, vol. 80 of Proceedings of Machine Learning Research, PMLR, pp. 4393–4402.
Schölkopf, B., Williamson, R., Smola, A., Shawe-Taylor, J., Platt, J., 1999. Support vector method for novelty detection. In: Proc. of the 12th Int. Conf. on Neural Information Processing Systems, NIPS’99. MIT Press, Cambridge, MA, USA, pp. 582–588.
Shalev-Shwartz, S., Singer, Y., Srebro, N., 2007. Pegasos: primal estimated sub-gradient solver for svm. In: Proc. of ICML, pp. 807–814.
Shouyu, L., Zhang, K., Fang, W., Zhou, Z., Hu, R., Zhu, W., Li, Y., Wang, Y., Hou, J., 2020. Anomaly detection of power grid dispatching platform based on isolation forest and k-means fusion algorithm. J. Phys. Conf. 1601, 22010.
Smolyak, D., Gray, K., Badirli, S., Mohler, G., 2020. Coupled IGMM-GANs with applications to anomaly detection in human mobility data. ACM Trans. Spatial Algorithm. Syst. 6, 1–14.
Tao, X., Peng, Y., Zhao, F., Zhao, P., Wang, Y., 2018. A parallel algorithm for network traffic anomaly detection based on isolation forest. Int. J. Distributed Sens. Netw. 14 (11), 1550147718814471.
Truong, N., Sun, K., Wang, S., Guitton, F., Guo, Y., 2021. Privacy preservation in federated learning: an insightful survey from the GDPR perspective. Comput. Secur. 110, 102402.
Vapnik, V.N., 1998. Statistical Learning Theory. Wiley-Interscience.
Vikram, A., Mohana, 2020. Anomaly detection in network traffic using unsupervised machine learning approach. In: 2020 5th International Conference on Communication and Electronics Systems (ICCES), pp. 476–479.
Wang, Z., Zheng, Z.E., Jiang, W., Tang, S., 2021. Blockchain-enabled data sharing in supply chains: model, operationalization, and tutorial. Prod. Oper. Manag. 30 (7), 1965–1985.
Williams, C., Seeger, M., 2001. Using the Nyström method to speed up kernel machines. In: Advances in Neural Information Processing Systems, vol. 13. MIT Press, pp. 682–688.
Yang, Q., Liu, Y., Chen, T., Tong, Y., 2019. Federated machine learning: concept and applications. ACM Trans. Intell. Syst. Technol. 10 (2), 12.
Yang, K., Kpotufe, S., Feamster, N., 2021. An Efficient One-Class SVM for Anomaly Detection in the Internet of Things. CoRR abs/2104.11146.
Yu, H., Jiang, X., Vaidya, J., 2006. Privacy-preserving SVM using nonlinear kernels on horizontally partitioned data. In: Proc. of the 2006 ACM Symposium on Applied Computing.
Zhan, W., Sun, L., Wang, D., Shi, H., Clausse, A., Naumann, M., Kümmerle, J., Königshof, H., Stiller, C., de La Fortelle, A., Tomizuka, M., 2019. INTERACTION Dataset: an INTERnational, Adversarial and Cooperative moTION Dataset in Interactive Driving Scenarios with Semantic Maps. arXiv:1910.03088 [cs, eess].
Zhang, X., Gu, C., Lin, J., 2006. Support vector machines for anomaly detection. In: 2006 6th World Congress on Intelligent Control and Automation, vol. 1, pp. 2594–2598.
Zhang, D., Li, N., Zhou, Z.-H., Chen, C., Sun, L., Li, S., 2011. Ibat: Detecting Anomalous Taxi Trajectories from GPS Traces, pp. 99–108.
Zhang, J., Jiang, H., Wu, R., Li, J., 2019a. Reconciling the dilemma of knowledge sharing: a network pluralism framework of firms’ R&D Alliance Network and Innovation Performance. J. Manag. 45 (7), 2635–2665.
Zhang, X., Qiao, M., Liu, L., Xu, Y., Shi, W., 2019b. Collaborative cloud-edge computation for personalized driving behavior modeling. In: Proc. of the 4th ACM/IEEE Symposium on Edge Computing, SEC ’19. Association for Computing Machinery, New York, NY, USA, pp. 209–221.
Zhu, J., Jiang, W., Liu, A., Liu, G., Zhao, L., 2015. Time-dependent Popular Routes Based Trajectory Outlier Detection, pp. 16–30.
