Fig. 3. Schema of context encoding with HDC operations. Inputs are (scalar) sensor values from different sensors at different time-steps. The blue part shows the encoding across multiple sensors, the red part the encoding across multiple time-steps. All shown vectors are from the same vector space. The output is a single vector that encodes all information and can, e.g., be fed to a (feed-forward neural network) classifier.
B. Vector representation of time series sensor signals

Fig. 3 visualizes our approach of encoding a time series of sensor signals in the substrate of high-dimensional vectors using fractional binding. In the first step, we represent each sensor type by a randomly chosen vector SENSORi. Given values vi,t measured by sensor SENSORi for i = 1, ..., n at time-step t, we use fractional binding to encode all sensor measurements into one representational vector by

\[
M_t = \bigoplus_{i=1}^{n} \mathrm{SENSOR}_i \otimes \mathrm{BASE}^{\,s \cdot v_{i,t}}, \tag{3}
\]

where BASE is the initial base vector for the scalar-value encoding, which is also randomly chosen and the same for all sensor types. The parameter s is the aforementioned scaling factor. In the second step, we again assign a randomly chosen vector TIMEt to represent each step t of the time series to be encoded. To encode the temporal context in high-dimensional vectors, we bind each measurement vector Mt with the time-stamp vector TIMEt representing the time-step t, i.e., Mt ⊗ TIMEt. Ultimately, bundling or summation of all vectors representing the sensor values at each time-step yields our final context vector, i.e.,

\[
C = \bigoplus_{t \in I} M_t \otimes \mathrm{TIME}_t
  = \bigoplus_{t \in I} \Big( \bigoplus_{i=1}^{n} \mathrm{SENSOR}_i \otimes \mathrm{BASE}^{\,s \cdot v_{i,t}} \Big) \otimes \mathrm{TIME}_t, \tag{4}
\]

where I denotes the time interval, i.e., all time-steps t of the current sliding window. The result is a high-dimensional vector C containing all sensor values combined with the corresponding sensor type for every time-step in the current sliding window I. Finally, we standardize the resulting vectors per dimension by their mean and standard deviation, since [41] showed promising results with this approach.
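For illustration, Eqs. (3) and (4) can be sketched in a few lines of NumPy. The sketch assumes an FHRR-style implementation in which every hypervector is stored as its D phase angles (cf. Sec. III-C): binding adds angles, fractional binding multiplies them by a scalar, and bundling sums the corresponding unit-circle complex numbers and keeps only the angles of the result. These representational details are assumptions made for the sketch, not a description of our exact implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 2048          # dimensionality of the hypervectors
s = 6.0           # scaling factor of the fractional binding

def random_hv(D):
    """Random hypervector, stored as D phase angles in [-pi, pi)."""
    return rng.uniform(-np.pi, np.pi, size=D)

def bind(a, b):
    """Binding: element-wise complex multiplication, i.e. addition of the angles."""
    return np.angle(np.exp(1j * (a + b)))

def frac_bind(base, r):
    """Fractional binding BASE^r: scaling of the angles by the scalar r."""
    return np.angle(np.exp(1j * r * base))

def bundle(hvs):
    """Bundling: sum of the unit-circle complex numbers, keeping only the angles."""
    return np.angle(np.sum(np.exp(1j * np.asarray(hvs)), axis=0))

# one random vector per sensor type, one per time-step, one shared BASE vector
n_sensors, n_steps = 9, 64
SENSOR = [random_hv(D) for _ in range(n_sensors)]
TIME = [random_hv(D) for _ in range(n_steps)]
BASE = random_hv(D)

def encode_window(values):
    """values: array of shape (n_steps, n_sensors) -> context vector C (Eqs. 3 and 4)."""
    per_step = []
    for t in range(n_steps):
        M_t = bundle([bind(SENSOR[i], frac_bind(BASE, s * values[t, i]))
                      for i in range(n_sensors)])          # Eq. (3)
        per_step.append(bind(M_t, TIME[t]))                # bind with the time-stamp vector
    return bundle(per_step)                                # Eq. (4)

C = encode_window(rng.standard_normal((n_steps, n_sensors)))
print(C.shape)  # (2048,) real-valued angles, ready for a feed-forward classifier
```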
1) Artificial Neural Networks (ANNs): We use an ANN in a supervised training setup to map the high-dimensional vector C to a driving style class. Since the different sensor types and the temporal context are already encoded in the input vector, a simple fully-connected feed-forward network with one hidden layer is sufficient. Although we work in the space of complex numbers, it should be noted that the vector C only stores the angles of the complex numbers on the unit circle. Therefore, it is a vector of real numbers that can be processed by standard neural networks. Sec. IV-A provides implementation details.

2) Spiking Neural Networks (SNNs): In contrast to the rate-based ANNs used in "traditional" machine learning, Spiking Neural Networks (SNNs) employ discrete spikes rather than average firing rates to encode information and are therefore often referred to as the third generation of neural networks. In theory [42], SNNs have at least the same computational power as traditional neural networks of similar size. One major hindrance for the widespread adoption of SNNs, however, is that, due to the binary output of the spiking neurons, the neural network is not differentiable and thus standard training procedures such as backpropagation are not directly applicable. One option to circumvent this problem is to train the network with a differentiable approximation of the spiking neurons and replace this approximation with the actual spiking neurons during inference [7]. Spiking neurons as algorithmic substrate are appealing for tasks with tight restrictions regarding power consumption, particularly automotive applications, since they promise to be more energy-efficient when deployed on neuromorphic hardware. For our SNN in this paper, we adopt the ANN's architecture with some alterations in terms of hyper-parameters.
IV. EXPERIMENTS

We will first present the experimental setup, including datasets, compared algorithms, as well as training and parameter settings. This is followed by an evaluation of driving style classification performance, data efficiency, parameter influences, and runtime.

A. Experimental setup

1) UAH-DriveSet data set: Since we primarily compare our proposed method with the LSTM approach from [4], we use the same data set for experimental evaluation: Romera et al. [3] introduced a data set with more than 8 h of driving scenes for driving behavior analysis and classification. The sensor values are collected using a smartphone with a monitoring application (shown in Fig. 4). Six different drivers generate data for two types of roads: secondary and motorway. All recorded sequences are labeled with one of three classes: normal, aggressive, or drowsy driving style.

Fig. 4. Screenshot from the UAH-DriveSafe reader tool, which recorded the data with a smartphone. Image source: [3].

The provided data contains raw sensor values from GPS and IMU, but also additional pre-processed data such as road lanes and vehicle detections from the smartphone camera. We use the same 9 different sensor values as [4]: IMU data for acceleration AccX, AccY, AccZ as well as Roll, Pitch, and Yaw; Speed from GPS; and Distance to the vehicle ahead and the number of detected vehicles obtained from the camera. We also use the same data preprocessing as [4]: resampling to a constant frequency followed by per-dimension standardization to mean 0 and standard deviation 1. To generate the sequence samples, Saleh et al. [4] used rolling windows with a size of 64 time-steps and 50 % overlap. The resulting total number of sequence samples is 9499. For training of the prediction models (HDC does not require training) we use the same random training and test split with a ratio of 2:1.
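As a rough sketch of this window generation (the resampling itself is omitted, the toy sampling rate and the helper names are ours and not taken from [4]):

```python
import numpy as np

def standardize(x):
    """Per-dimension standardization to mean 0 and standard deviation 1."""
    return (x - x.mean(axis=0)) / x.std(axis=0)

def rolling_windows(x, size=64, overlap=0.5):
    """Cut a (T, n_sensors) sequence into windows of `size` steps with 50 % overlap."""
    step = int(size * (1.0 - overlap))
    return np.stack([x[t:t + size] for t in range(0, len(x) - size + 1, step)])

# toy example: a 10-minute recording of 9 sensors at an assumed rate of 10 Hz
recording = np.random.randn(6000, 9)
windows = rolling_windows(standardize(recording))
print(windows.shape)   # (num_samples, 64, 9)
```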
2) Parameters and training setup: The default setup for the HDC encoding uses 2048-dimensional vectors and scaling s = 6 in the fractional binding. The ANN classifier uses 40 neurons in the hidden layer and 75 % dropout between the input and the hidden layer. These parameters result in a very similar number of parameters for ANN training compared to the LSTM [4] (cf. Table IV). An evaluation of these parameters is the topic of Sec. IV-D. The ANN was implemented in Keras and Tensorflow and was trained with a learning rate of 0.001, a cross-entropy loss function, the Adam optimizer, and a batch size of 1500 for 2000 epochs.
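A minimal Keras sketch of such a classifier is given below; the hidden-layer activation and the softmax output are assumptions, while the dimensionality, dropout rate, optimizer, learning rate, batch size, and number of epochs follow the setup above. With 2048 inputs, 40 hidden neurons, and 3 classes, the model has (2048+1)·40 + (40+1)·3 = 82,083 trainable parameters, matching Table IV.

```python
from tensorflow import keras

def build_classifier(input_dim=2048, hidden=40, n_classes=3):
    """Feed-forward classifier on top of the HDC context vector C."""
    inputs = keras.Input(shape=(input_dim,))
    x = keras.layers.Dropout(0.75)(inputs)                    # 75 % dropout between input and hidden layer
    x = keras.layers.Dense(hidden, activation="relu")(x)      # single hidden layer (activation assumed)
    outputs = keras.layers.Dense(n_classes, activation="softmax")(x)
    model = keras.Model(inputs, outputs)
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=0.001),
                  loss="sparse_categorical_crossentropy",     # integer class labels
                  metrics=["accuracy"])
    return model

model = build_classifier()
# model.fit(C_train, y_train, batch_size=1500, epochs=2000)   # hypothetical encoded vectors and labels
```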
Similar to our ANN model, the SNN's architecture embodies one hidden layer of neurons. In this work, we use the simple Leaky Integrate-and-Fire (LIF) spiking neuron model [43]. Since networks of spiking neurons are not differentiable, we train the network with the rate approximation of the LIF neuron using back-propagation for 200 epochs with a batch size of 500, a dropout rate of 50 %, the cross-entropy loss function, and the RMSprop optimizer with a learning rate of 0.001. We implemented our SNN in the Nengo simulator [44] and its extension Nengo-DL [8], which integrates deep learning approaches from Tensorflow, and replace the rate approximation with the actual LIF spiking neuron model during inference. Empirical evaluations similar to those of Table III showed that the SNN achieved its best performance with 1000 neurons in the hidden layer.
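The following toy NumPy sketch illustrates the underlying idea of training with a rate approximation and running actual spiking neurons at inference; it is not our Nengo-DL pipeline, and the time constants and the normalized threshold of 1 are illustrative defaults only.

```python
import numpy as np

tau_rc, tau_ref, dt = 0.02, 0.002, 0.001  # membrane time constant, refractory period, step size

def lif_rate(J):
    """Smooth rate response of the LIF neuron (the differentiable proxy used during training)."""
    if J <= 1.0:
        return 0.0
    return 1.0 / (tau_ref + tau_rc * np.log1p(1.0 / (J - 1.0)))

def lif_spikes(J, T=1.0):
    """Spiking LIF simulation (used at inference): spike count over T seconds of constant input J."""
    v, refractory, count = 0.0, 0.0, 0
    for _ in range(int(T / dt)):
        if refractory > 0.0:
            refractory -= dt              # neuron is silent during the refractory period
            continue
        v += dt / tau_rc * (J - v)        # leaky integration of the input current
        if v >= 1.0:                      # threshold crossing: emit a spike and reset
            count += 1
            v = 0.0
            refractory = tau_ref
    return count

J = 1.5
print(lif_rate(J), lif_spikes(J))  # rate approximation vs. measured spikes per second (both ~42)
```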
3) Evaluated approaches: The most related approach from the literature is the LSTM from [4] that directly operates on the raw sensor data (pre-processed as described in Sec. IV-A.1, but still in the raw form of multivariate time series data). Fig. 5 provides an overview of all evaluated methods, which are different combinations of vector encodings and prediction models.

[Fig. 5: Overview of the evaluated approaches: vector encodings computed from the multivariate time series data, combined with different prediction models (e.g., the LSTM).]

Encodings: We compare our HDC encoding approach with two alternative approaches to generate vector representations from multivariate time series data: Concatenate Input Sequences contain the raw input data from [4], but all sensor sequences are concatenated together (similar to the time series processing in [45]); hence, the outcome is a 64 × 9 = 576-dimensional vector. Spectral feature vectors are built with a discrete Fourier transformation for each sequence and sensor type, and the resulting magnitude of each frequency is concatenated to a final spectral feature vector.
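A sketch of the two baseline encodings is shown below; the exact concatenation order and the use of the one-sided spectrum are assumptions made for the sketch.

```python
import numpy as np

def concat_encoding(window):
    """Concatenate Input Sequences: flatten the (64, 9) window into one 576-dim vector."""
    return window.reshape(-1)

def spectral_encoding(window):
    """Spectral features: per-sensor DFT, concatenating the magnitudes of all frequencies."""
    return np.abs(np.fft.rfft(window, axis=0)).reshape(-1, order="F")

window = np.random.randn(64, 9)
print(concat_encoding(window).shape)    # (576,)
print(spectral_encoding(window).shape)  # (297,) = 33 frequency magnitudes x 9 sensors
```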
Prediction models: Each of these vector encodings can be combined with different prediction models. Besides the FF-ANN and SNN described in Sec. III-C, we evaluate the following baseline models: SVM is a Support Vector Machine classifier; we use the standard Matlab implementation with an error-correcting output code for multi-class classification. k-NN is a simple k-nearest-neighbour classifier using a stored lookup table of all training samples that outputs the most frequently occurring class among the k neighbours (similar to [2]).

4) Metric: We use a standard evaluation procedure based on ground-truth information (labels) about the class corresponding to a given sequence. The evaluation metric is the weighted F1-score for multiple classes, computed by Python Scikit-learn's report function.
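With Scikit-learn, this metric can be computed as follows (the labels are toy values for illustration only):

```python
from sklearn.metrics import classification_report, f1_score

y_true = ["normal", "aggressive", "drowsy", "normal", "drowsy", "normal"]
y_pred = ["normal", "aggressive", "normal", "normal", "drowsy", "drowsy"]

print(f1_score(y_true, y_pred, average="weighted"))  # single weighted F1-score
print(classification_report(y_true, y_pred))         # per-class precision/recall/F1 report
```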
B. Driving style classification results

The main result of this paper is presented in Table I, which shows driving style classification results on the UAH dataset. To account for randomness within ANNs, we repeat each experiment 10 times and report means and standard deviations. We observe that the combination of HDC with the simple ANN prediction model achieves the best results. The results are almost preserved when using a smaller number of HDC dimensions (reduction from 2048 to 1024), which also leads to a significantly smaller number of parameters in the ANN (reduction from 82,083 to 41,123, cf. Table IV).

TABLE I. Driving style classification results of all models (F1 score on the UAH test set). Best results are in bold font. All learned models were run 10 times; results are mean and standard deviation.

                                              UAH-DriveSet [3]
vector encod.   prediction model   HDC dim.   Secondary    Motorway     Full
HDC (ours)      ANN (ours)         2048       0.99 ±0.00   0.94 ±0.00   0.94 ±0.00
                                   1024       0.98         0.90         0.90
                                   576        0.97 ±0.00   0.90 ±0.00   0.89 ±0.00
                SNN (ours)         2048       0.99 ±0.01   0.94 ±0.00   0.92 ±0.02
                                   1024       0.93         0.88         0.91
                SVM                576        0.84         0.72         0.71
                kNN                576        0.93         0.91         0.90
(none)          LSTM [4]           -          0.91 ±0.05   0.82 ±0.03   0.88 ±0.04
Concat          ANN                -          0.77 ±0.01   0.62 ±0.01   0.61 ±0.01
                SVM                -          0.51         0.35         0.30
                kNN                -          0.89         0.82         0.84
Spect           ANN                -          0.67 ±0.04   0.67 ±0.02   0.64 ±0.03
                SVM                -          0.70         0.71         0.61
                kNN                -          0.88         0.75         0.81
Note that the performance of the SNN with the same number of neurons (40) in the hidden layer drops to 0.85 after conversion to spikes, compared to the rate-based network. By increasing the number of neurons in the hidden layer to 1000, we are able to reach a comparable performance of 0.92; hence, the SNN requires a much larger number of hidden neurons to achieve similar results. However, we expect that our SNN will provide significant improvements in terms of energy efficiency on dedicated neuromorphic hardware such as Intel's Loihi system [46], where one chip is able to process up to 130,000 spiking neurons with significantly lower power consumption than traditional computing hardware [9].
The Concat and Spect vector encodings each create a 576-dimensional vector. For a fair comparison, we also report the performance of the HDC approach when using the same number of dimensions in combination with the SVM and kNN prediction models. The results demonstrate the general potential of HDC encodings of multivariate time series data beyond the combination with neural networks.
The results from Table I were obtained using the exact same training and evaluation procedure as in [4]. In addition to the random train-test split from [4], we also evaluated a 3-fold cross-validation with three independent blocks of data. These three blocks contain the consecutively ordered driver records (one block contains at least one complete driver record). During cross-validation, we used two blocks for training and one for testing. Using these blocks ensures that there are previously unseen drivers in the test data. Results are reported in Table II. Although the HDC-encoded vectors with the ANN achieve better classification results than the LSTM network, the overall results show that this is a considerably more challenging task than the random split from [4], since all models show a significant decrease in performance compared to the random splits from Table I. This indicates that the recorded sessions are either relatively dissimilar to each other or the number of training samples is too small to generalize to completely unseen sessions, e.g., an unknown driver.

TABLE II. F1-scores with 3-fold cross-validation on the UAH-DriveSet [3].

vector encod.   prediction model   HDC dim.   Split 1      Split 2      Split 3
HDC (ours)      ANN                2048       0.46 ±0.01   0.54 ±0.01   0.41 ±0.01
                SNN                2048       0.46 ±0.01   0.54 ±0.05   0.40 ±0.02
                kNN                576        0.45         0.45         0.41
(none)          LSTM [4]           -          0.43 ±0.02   0.48 ±0.04   0.41 ±0.03
Spect           kNN                -          0.42         0.41         0.40
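For clarity, the blocked rotation used for Table II can be sketched as follows; the real block boundaries follow complete driver records rather than equal thirds, which are used here only for illustration.

```python
import numpy as np

def blocked_three_fold(n_samples):
    """Split consecutively ordered samples into three contiguous blocks and rotate:
    two blocks for training, one for testing (unseen drivers in the test block)."""
    edges = np.linspace(0, n_samples, 4, dtype=int)
    blocks = [np.arange(edges[k], edges[k + 1]) for k in range(3)]
    for test_block in range(3):
        train_idx = np.concatenate([blocks[k] for k in range(3) if k != test_block])
        yield train_idx, blocks[test_block]

for train_idx, test_idx in blocked_three_fold(9499):
    print(len(train_idx), len(test_idx))
```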
C. Data efficiency

To evaluate the required amount of training data, we analyzed the data efficiency of the LSTM model [4] and our proposed HDC approach. We varied the training volume between 20 % and 100 % of the full training set; the test set is unaltered. The results in Fig. 6 show that the HDC-ANN is considerably more data efficient than the LSTM. This means that learning the internal time representation in this recurrent LSTM requires more training examples than generating the temporal structure through the systematic application of HDC (which does not require training) and then using a simple feed-forward ANN.

Fig. 6. Data efficiency of the original LSTM-ANN and the HDC-ANN (with 2048 dimensions, scaling of 6, and 40 neurons in the ANN hidden layer). The plot shows the F1 score as a function of the training volume (20 % to 100 % of the training set).
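A sketch of the training-volume subsampling used for this experiment (whether the original subsampling was uniform or stratified per class is not stated; this sketch draws a uniform random subset):

```python
import numpy as np

def subsample_training_set(X_train, y_train, volume, rng=np.random.default_rng(0)):
    """Keep a random fraction `volume` (0.2 ... 1.0) of the training samples; the test set is unchanged."""
    idx = rng.choice(len(X_train), size=int(volume * len(X_train)), replace=False)
    return X_train[idx], y_train[idx]
```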
D. Parameter Evaluation

The previous experiments used the default parameters as described in Sec. IV-A.2. Table III shows the influence on the classification performance of varying the number of dimensions, the scaling parameter of the fractional binding, and the number of neurons in the hidden layer of the ANN. The results indicate that the scale parameter of the fractional binding is the only parameter which is potentially hard to choose. Fortunately, the performance degrades smoothly when moving away from the sweet spot of the default value 6. For the two other parameters (number of dimensions and number of neurons), increasing their values tends to improve the performance. However, this also changes the number of weights in the ANN, as provided in Table IV. For the default setup of 2,048 dimensions and 40 neurons, the number of weights is comparable to the LSTM [4].

TABLE III. Results of our HDC approach with ANN prediction model and different parameter ranges. Data set is the UAH-DriveSet [3]. Best results are highlighted in bold.
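The parameter counts in Table IV are consistent with a single-hidden-layer network with a D-dimensional input, H hidden neurons, and 3 output classes (weights plus biases per layer); a quick sanity check:

```python
def ff_params(D, H, classes=3):
    """Trainable parameters of the feed-forward net: (D+1)*H weights and biases into the
    hidden layer plus (H+1)*classes into the output layer."""
    return (D + 1) * H + (H + 1) * classes

print(ff_params(2048, 40))   # 82,083 (default setup, cf. Table IV)
print(ff_params(1024, 40))   # 41,123
print(ff_params(512, 20))    # 10,323
```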
TABLE IV. Number of trainable parameters for our HDC model. The original LSTM model has 81,703 trainable parameters.

                 # hidden neurons:   20       40       60        80        100
# Dim. = 512                         10,323   20,643   30,963    41,283    51,603
# Dim. = 1024                        20,563   41,123   61,683    82,243    102,803
# Dim. = 2048                        41,043   82,083   123,123   164,163   205,203

E. Complexity and runtime

Table V shows runtime measures of the original LSTM model from [4] and the HDC models using a Nvidia GTX 1080 Ti GPU.

TABLE V. Computing time on a PC with an NVIDIA GeForce GTX 1080 Ti on the UAH-DriveSet. The second term in HDC training represents the encoding time for the context vector. Inference times for the SNN are in parentheses since they are computed by simulation on a conventional GPU. (1) and (2) denote different inference methods; see text.