HDC-MiniROCKET: Explicit Time Encoding in Time Series Classification with Hyperdimensional Computing
Abstract—Classification of time series data is an important task for many application domains. One of the best existing methods for this task, in terms of accuracy and computation …

Fig. 1: A simple 2-class time series classification problem: Gaussian noise with a single sharp peak either in the first half (class 1) or second half (class 2) of the signal.
… W_i, resulting in 119 · 84 = 9,996 different binary vectors c_{k,d,b} ∈ {0, 1}^T (T is the length of the input time series x). Again, for the details, please refer to [9]. The final step of MiniROCKET is to compute each of the 9,996 dimensions of the output descriptor y^PPV as the mean …

This is just another way to compute output vectors that point in the same directions as the MiniROCKET output – which preserves cosine similarities between vectors. But much more importantly, it encourages the integration of a second HDC operation, binding ⊗, as described in the following section.
B. From MiniROCKET to HDC-MiniROCKET

HDC-MiniROCKET extends MiniROCKET by using the HDC binding operation ⊗ to also encode the timestamp t of each feature F_t^BP in the output vector (without increasing the vector size).

In HDC, the inputs to the binding operation ⊗ are two vectors from the same vector space and the output is a single vector, again of the same size, that is non-similar to each of the input vectors. We will use the property that binding is similarity preserving, i.e., ∀a, b_1, b_2 ∈ V : sim(a ⊗ b_1, a ⊗ b_2) ≈ sim(b_1, b_2). A typical usage of binding is to encode key-value pairs. In the MAP architecture, binding is implemented by simple element-wise multiplication [16].
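To illustrate this similarity-preserving property, the following minimal NumPy sketch (illustrative only, not part of the original implementation; the dimensionality and the random bipolar vectors are chosen purely for demonstration) compares similarities before and after binding:

import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

rng = np.random.default_rng(0)
D = 9996  # dimensionality of the (HDC-)MiniROCKET descriptors

# Random bipolar hypervectors (MAP architecture)
a = rng.choice([-1.0, 1.0], size=D)
b1 = rng.choice([-1.0, 1.0], size=D)
b2 = rng.choice([-1.0, 1.0], size=D)

# Binding in MAP is element-wise multiplication
ab1, ab2 = a * b1, a * b2

# Similarity preservation: sim(a ⊗ b1, a ⊗ b2) ≈ sim(b1, b2)
print(cosine(ab1, ab2), cosine(b1, b2))
# The bound vector is quasi-orthogonal to its inputs
print(cosine(a, ab1), cosine(b1, ab1))

For bipolar vectors the first two similarities are even identical, since the squared entries of a cancel in the dot product.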
The output of HDC-MiniROCKET, y^HDC, is computed by:

y^{HDC} = \bigoplus_{t=1}^{T} \left( F_t^{BP} \otimes P_t \right) = \sum_{t=1}^{T} (1 - 2F_t)\, P_t \qquad (4)

where P_t is a systematic encoding of the timestamp t in a vector of the same length as F_t^BP, i.e. 9,996-dimensional. Before we provide details on the creation of P_t, we want to emphasize that all vectors, including y^HDC, are of the same length and that all operations are simple and efficient local operations. In the MAP architecture, ⊕ is simple element-wise addition and ⊗ is element-wise multiplication.
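Written as array operations, Eq. 4 reduces to a single weighted sum over time. The following minimal sketch is illustrative only (not the original code); F stands in for the binary MiniROCKET features per time step and P for the timestamp encodings whose construction is described next:

import numpy as np

T, D = 500, 9996                 # time steps and descriptor dimensionality
rng = np.random.default_rng(0)

F = rng.integers(0, 2, size=(T, D))      # placeholder binary features F_t
P = rng.standard_normal((T, D))          # placeholder timestamp encodings P_t

# Eq. 4: bundle (sum) the bound vectors (1 - 2*F_t) * P_t over all time steps
y_hdc = ((1 - 2 * F) * P).sum(axis=0)
print(y_hdc.shape)                        # (9996,)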
The vectors P_t are intended to encode temporal position by making early events dissimilar to later ones. For creating such hypervectors, we use fractional binding [38]. It uses a predefined real-valued hypervector v of dimensionality D and transfers a scalar value p to a hypervector by applying the following equation:

v_p := \mathrm{IDFT}\left( \left( \mathrm{DFT}_j(v) \right)^{p} \right)_{j=0}^{D-1} \qquad (5)

DFT and IDFT are the Discrete Fourier Transform and the Inverse Discrete Fourier Transform. The resulting vector v_p represents a systematically encoded hypervector of the scalar p, which means that the procedure preserves the similarity of scalar values (Euclidean distance) within the hyperdimensional vector space (cosine similarity).

To be able to adjust the temporal similarity of feature vectors, we introduce a parameter s, which influences the graded similarity between consecutive timestamps. In HDC-MiniROCKET, the scalar value p_t at time t in a time series of length T is given by p_t = t · s / T. The scale factor s adjusts how fast the similarity of timestamp encodings decreases with increasing difference in t. This is visualized in Fig. 2. If s is equal to 0, all timestamp encodings are the same. s = 1 means the similarity decreases with increasing difference of timestamps and only reaches zero for the difference between the first and the last timestamp of the whole time series. s = 2 means that the similarity already reaches zero after half of the length, and so on. s weights the importance of temporal position – the higher its value, the more dissimilar the timestamps become.

Fig. 2: Different graded similarities for timestamp encodings.
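A minimal NumPy sketch of this timestamp encoding (Eq. 5 with p_t = t · s / T) is shown below. It is illustrative only; in particular, constructing the seed vector v from unit-magnitude random DFT phases is an assumption of this sketch, chosen so that v_p keeps an approximately constant norm:

import numpy as np

D = 9996   # dimensionality of the descriptor
T = 500    # length of the time series
s = 1.0    # scale factor of the temporal similarity

rng = np.random.default_rng(42)

# Assumption: the seed vector v is defined via unit-magnitude random DFT
# phases, so raising its spectrum to the power p only rotates the phases.
phases = rng.uniform(-np.pi, np.pi, size=D // 2 + 1)
phases[0] = 0.0                      # keep the DC component real
V = np.exp(1j * phases)              # DFT of the seed hypervector v

def fractional_bind(p):
    """Eq. 5: v_p = IDFT((DFT(v))^p), a systematic encoding of the scalar p."""
    return np.fft.irfft(V ** p, n=D)

# Timestamp encodings P_t for p_t = t*s/T, t = 1..T
P = np.stack([fractional_bind(t * s / T) for t in range(1, T + 1)])

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

# Graded similarity (cf. Fig. 2): nearby timestamps stay similar,
# distant timestamps become dissimilar.
print(cosine(P[0], P[1]), cosine(P[0], P[T // 2 - 1]), cosine(P[0], P[-1]))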
This systematic encoding of timestamps is exploited in HDC-MiniROCKET in Eq. 4. Although binding ⊗ and bundling ⊕ can be implemented as simply as element-wise multiplication and addition, these operations have partially surprising properties in very high-dimensional vector spaces such as the 9,996-dimensional MiniROCKET features. In HDC-MiniROCKET, the underlying mechanism is the similarity-preserving property of the HDC binding operation ⊗ from the beginning of this section: the similarity of two feature vectors, each bound to the encoding of their timestamp, gradually decreases with increasing difference of the timestamps. This information is maintained when bundling (⊕) feature vectors to create the output y^HDC, since the output of bundling is similar to each input. For more details on these underlying mechanisms of hyperdimensional computing, please refer to one of [22], [25], [30].

As described earlier, the proposed temporal encoding creates a more general variant of MiniROCKET – if parameter s is set to 0, the cosine similarities based on HDC-MiniROCKET are identical to those of MiniROCKET. The higher s, the more distinctive is the temporal context inside HDC-MiniROCKET's feature calculation. The selection of the value of s will be discussed and evaluated in Sec. V-C.

C. Efficient implementation

A low computational complexity is an important feature of MiniROCKET. HDC-MiniROCKET can be implemented with the same (in fact, even slightly lower) computational complexity during inference.

The efficient implementation of HDC-MiniROCKET builds on two simple observations: First, P_t can be pre-computed for all t since it only depends on the fixed seed vector v and the scaling parameter s, i.e. it is independent of the signal x. Second, we can omit all additions and multiplications when computing F_t^BP ⊗ P_t = (1 − 2F_t) P_t in Eq. 4. The term (1 − 2F_t) in Eq. 4 can be computed directly by replacing the binarization in MiniROCKET from Eq. 2 by:

c'_{k,d,b} = \begin{cases} 1 & \text{if } c_{k,d} > B_b \\ -1 & \text{if } c_{k,d} < B_b \end{cases} \qquad (6)

This can also directly integrate the multiplication with P_t:

c''_{k,d,b} = \begin{cases} P_{t,i} & \text{if } c_{k,d} > B_b \\ -P_{t,i} & \text{if } c_{k,d} < B_b \end{cases} \qquad (7)

Here, P_{t,i} is the i-th dimension of the vector time encoding of time step t. Finally, the i-th dimension of the output of HDC-MiniROCKET is then simply:

y_i^{HDC} = \sum_{t=1}^{T} c''_{k,d,b} \qquad (8)

In summary, the computational steps of HDC-MiniROCKET are:
1) Compute the MiniROCKET convolution outputs c_{k,d} for each time step.
2) Compare each of them to a set of bias values B_b to select either a positive or negative precomputed time encoding P_{t,i} (Eq. 7).
3) Compute the sum over all time steps (Eq. 8), as sketched below.
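A minimal sketch of this sign-select-and-sum computation for a single output dimension (one kernel/dilation/bias combination) follows; the random arrays are placeholders for the actual convolution outputs and time encodings, not values from the original code:

import numpy as np

def hdc_feature_dimension(c, B_b, P_i):
    """Sketch of Eqs. 6-8 for one output dimension i, i.e. one
    (kernel k, dilation d, bias b) combination.

    c   : (T,) convolution outputs c_{k,d} over time
    B_b : scalar bias value
    P_i : (T,) i-th dimension of the precomputed time encodings P_t
    """
    # Eqs. 6/7: instead of binarizing to {0, 1} and multiplying afterwards,
    # directly select +P_{t,i} or -P_{t,i} by comparing with the bias.
    signed = np.where(c > B_b, P_i, -P_i)
    # Eq. 8: the output dimension is the plain sum over all time steps.
    return signed.sum()

# Toy usage with random placeholders:
rng = np.random.default_rng(0)
T = 500
print(hdc_feature_dimension(c=rng.standard_normal(T), B_b=0.0,
                            P_i=rng.standard_normal(T)))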
Other than the precomputation of the time encodings P_t, there are no additional computations compared to MiniROCKET. The PPV computation of the original MiniROCKET even requires an additional division per output dimension; however, if cosine similarities are used, then this division could also be omitted in the original MiniROCKET.

V. EXPERIMENTAL EVALUATION

We evaluate the performance of the proposed HDC-MiniROCKET in two different setups: (1) the simple synthetic dataset classification task from the introduction and Fig. 1, and (2) the 128 datasets from the UCR benchmark for univariate time series classification. Finally, we will analyze the computational efficiency of our implementation.

We use a Python implementation based on the code of MiniROCKET from the Python library sktime¹, which contains the original code² of [9]. For classification, we also use the same ridge regression classifier as the original MiniROCKET [9].

¹ sktime, GitHub, https://github.com/alan-turing-institute/sktime
² minirocket, GitHub, https://github.com/angus924/minirocket
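For illustration, a minimal version of this pipeline could look as follows. It is only a sketch of the MiniROCKET baseline (sktime's MiniRocket transformer plus the ridge classifier); the HDC time encoding described above is not part of the sktime transformer and would have to be integrated into the feature computation itself (cf. Eq. 7):

import numpy as np
from sklearn.linear_model import RidgeClassifierCV
from sktime.transformations.panel.rocket import MiniRocket

# Placeholder data; depending on the sktime version, X may also need to be
# a nested pandas DataFrame instead of a 3D array (instances, channels, time).
X_train = np.random.randn(100, 1, 500)
y_train = np.random.randint(0, 2, size=100)
X_test = np.random.randn(50, 1, 500)

minirocket = MiniRocket()                 # yields the 9,996-dimensional features
minirocket.fit(X_train)
F_train = minirocket.transform(X_train)
F_test = minirocket.transform(X_test)

# Ridge regression classifier, as used for the original MiniROCKET
clf = RidgeClassifierCV(alphas=np.logspace(-3, 3, 10))
clf.fit(F_train, y_train)
y_pred = clf.predict(F_test)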
A. Synthetic Dataset

As described before, the global pooling by PPV in MiniROCKET can neglect potentially important temporal information if it is not well captured by the dilated convolutions. To demonstrate possible problems of such behavior, we constructed a simple synthetic dataset, which consists of two classes with high temporal dependency. Example signals for both classes are shown in Fig. 1. Each signal consists of a single sharp peak either in the first (class 1) or second (class 2) half of the signal and additional Gaussian noise (µ = 0, σ = 1). The peak is created using an approximate delta function, which is based on the normal distribution function:

\delta(t) = \frac{1}{\sqrt{2\pi a}} \cdot e^{-\frac{t^2}{2a}} \qquad (9)

The parameter a influences the shape of the peak. We use a = 0.03, which creates a high and sharp peak that resembles a Dirac impulse.

The length of the time series is 500 time steps. For each time step, we create one sample in the dataset with the peak centered at this position. Accordingly, the dataset D consists of 500 samples, 250 samples of each class.
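A minimal sketch of how such a dataset can be generated is shown below; the normalization of the time axis in Eq. 9 to [0, 1] and the simple additive combination of peak and noise are assumptions of this sketch, not specifications from the paper:

import numpy as np

T = 500      # length of each time series
a = 0.03     # peak shape parameter of Eq. 9

def delta_peak(center):
    """Approximate delta function (Eq. 9), centered at time step `center`.
    Assumption: the time axis is normalized to [0, 1]."""
    t = np.linspace(0.0, 1.0, T) - center / T
    return np.exp(-t**2 / (2 * a)) / np.sqrt(2 * np.pi * a)

rng = np.random.default_rng(0)
X = np.zeros((T, T))
y = np.zeros(T, dtype=int)
for i in range(T):
    noise = rng.normal(0.0, 1.0, size=T)      # Gaussian noise, mu=0, sigma=1
    X[i] = delta_peak(center=i) + noise
    y[i] = 1 if i < T // 2 else 2             # class 1: peak in the first half

print(X.shape, np.bincount(y)[1:])            # (500, 500), [250 250]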
Classification with a random split of 80% training and 20% test data leads to the accuracies in the second column of Table I (“standard case”). It can be seen that MiniROCKET can correctly classify only 65% of the test samples, while the HDC version with s = 1 can achieve 97%. An even more drastic result is seen if we select a particularly challenging subset from the samples, i.e., those class-1 and class-2 sample pairs of the original MiniROCKET-transformed signals that have high similarity and can easily be confused. The result is a dataset with 250 highly similar data samples that leads to the classification accuracies in the rightmost column of Table I (again using an 80%-20% split). In this challenging scenario, MiniROCKET without global temporal encoding is not significantly better than random guessing. However, when HDC-based temporal encoding is added to MiniROCKET, it can solve the classification problem with 94.1% accuracy.

TABLE I: Results on the synthetic dataset for random guessing and MiniROCKET without and with temporal encoding.

Method                 | Accuracy [%] standard case | Accuracy [%] challenging case
random guess           | 50.0                       | 50.0
MiniROCKET             | 65.0                       | 56.8
HDC-MiniROCKET (s = 1) | 97.0                       | 94.1

To better illustrate the effect of the time encoding, Fig. 3 shows the two similarity matrices of the original MiniROCKET descriptors (left) and the HDC-MiniROCKET descriptors (right). The i-th row as well as the i-th column both correspond to the signal from the dataset with the peak at time step i. The matrix entries are the cosine similarities. The blue and red signals exemplify the shape of samples of classes 1 and 2. The first half of the rows and columns of the similarity matrices refer to class 1 and the second half to class 2. In the left similarity matrix, it can be seen that for the original MiniROCKET without explicit temporal coding, class 1 signals (red) sometimes resemble class 2 signals (blue) and vice versa. In contrast, the right similarity matrix separates the two different classes by explicit temporal coding – class 1 signals are less likely to be similar to class 2 signals.

Fig. 3: Similarity matrices on the synthetic dataset. (left) Original MiniROCKET, (right) HDC-MiniROCKET with explicit temporal encoding. Bright pixels indicate high and dark pixels low similarity. On top of and to the left of the matrices, examples of the synthetic signals are overlaid, colored red for class 1 and blue for class 2.
B. UCR benchmark

To evaluate the practical benefit of the proposed HDC-MiniROCKET, we show results on the 128 datasets from the UCR benchmark [13]. This standard benchmark has previously been used to compare different time series classification algorithms and contains datasets from a broad range of domains [13]. Since MiniROCKET showed excellent performance (especially computational efficiency) on this dataset collection compared to other state-of-the-art approaches like [5], [6], we only compare against MiniROCKET in our experiments.

Fig. 5 shows the classification accuracies on all UCR datasets for MiniROCKET in comparison to HDC-MiniROCKET. The blue colored dots show the performance of HDC-MiniROCKET with an oracle that selects the optimal scale value s for each dataset from the set {0, 1, 2, ..., 6}. It is important to keep in mind that, depending on the underlying task and dataset, explicit time encoding can be intended and helpful or unintended and harmful. This oracle and a simple data-driven approach to select the scale s are discussed in the next Sec. V-C. The evaluation with the oracle is intended to demonstrate the potential of the explicit time encoding in HDC-MiniROCKET. Fig. 4 shows the relative change in accuracy for the individual UCR datasets. For 81 out of 128 datasets, there is an improvement by the explicit time encoding. Although the improvement is sometimes rather small (the average improvement across these 81 datasets is 3.1%), it also reaches more than 27% for one dataset. For those datasets without improvement, the performance is the same as for MiniROCKET (which is the special case of s = 0) – if we are able to select the optimal scale.

Fig. 4: Relative accuracy improvements for selecting the optimal s and for the automatically calculated s during training. Only datasets with a nonzero change are shown.
Fig. 5: Classification accuracies on the 128 UCR datasets when using MiniROCKET or HDC-MiniROCKET. There are two points for each dataset: blue is with an oracle that selects the best scale parameter, orange is with the simple cross-validation for parameter selection. For points in the top-left triangle, HDC-MiniROCKET is better.
C. Selecting the scale s

The evaluation from the previous section used an oracle to select the scale s. Since this time scale parameter is physically grounded, we assume that in a practical application there will be expert knowledge that guides the selection of this parameter. When choosing the scale s, it is important to keep two properties in mind:

(1) There is no single value of the scale parameter that works for all datasets. This is supported by Table II, which shows mean, worst-case and best-case performances across all 128 UCR datasets for each scale parameter. No scale choice is considerably better than all others. These average statistics show that the average benefit of scale selection across all datasets is rather small – however, as illustrated in Fig. 4, the benefit for individual datasets can be quite high.

(2) The influence of s is quite smooth. Fig. 6 shows the influence of different values of s on the performance on individual datasets. Small changes of s create rather small …
TABLE II: Results for different similarity parameters s on the UCR benchmark.

Param. s   | 0      | 1      | 2      | 3      | 4      | 5      | 6      | optimal s | automatically selected s
Mean       | 0.8540 | 0.8537 | 0.8500 | 0.8450 | 0.8407 | 0.8366 | 0.8336 | 0.8681    | 0.8569
Worst case | 0.2906 | 0.3080 | 0.3054 | 0.2788 | 0.1827 | 0.0865 | 0.0769 | 0.3080    | 0.2991
Best case  | 1.0    | 1.0    | 1.0    | 1.0    | 1.0    | 1.0    | 1.0    | 1.0       | 1.0
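The simple data-driven selection of s (cross-validation over the candidate scales, cf. the orange points in Fig. 5) could be sketched as follows; hdc_minirocket_transform is a hypothetical helper that stands in for the feature computation of Eq. 4 with a given scale and is not part of any existing library:

import numpy as np
from sklearn.linear_model import RidgeClassifierCV
from sklearn.model_selection import cross_val_score

def select_scale(X_train, y_train, hdc_minirocket_transform, scales=range(0, 7)):
    """Pick the scale s with the best cross-validated accuracy on the
    training data (a simple stand-in for the data-driven selection)."""
    best_s, best_score = 0, -np.inf
    for s in scales:
        F = hdc_minirocket_transform(X_train, s=s)   # hypothetical helper
        clf = RidgeClassifierCV(alphas=np.logspace(-3, 3, 10))
        score = cross_val_score(clf, F, y_train, cv=5).mean()
        if score > best_score:
            best_s, best_score = s, score
    return best_s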