UPTEC F 23021
Degree project, 30 credits (Examensarbete 30 hp)
June 2023
Evaluating clustering techniques in financial time series
Abstract
This degree project investigates different evaluation strategies for clustering methods
used to cluster multivariate financial time series. Clustering is a data mining technique
that partitions a data set so that data points within the same cluster are similar to one
another and dissimilar to data points in other clusters. By clustering the time series of
mutual fund returns, it is possible to help individuals select funds matching their current
goals and portfolio. It is also possible to identify outliers. These outliers could be mutual
funds that have not been classified accurately by the fund manager, or could indicate
potentially fraudulent practices.
To determine which clustering method is the most appropriate for a given data set, it is
important to be able to evaluate different techniques. Robust evaluation methods can
also assist in choosing parameters that ensure optimal performance. The evaluation
techniques investigated are conventional internal validation measures, stability measures,
visualization methods, and evaluation using domain knowledge about the data. The
conventional internal validation methods and stability measures were used to perform
model selection and find viable clustering method candidates. These results were then
evaluated using visualization techniques as well as qualitative analysis. The conventional
internal validation measures tested may not be appropriate for model selection among the
clustering methods, distance metrics, or data sets tested: the results often contradicted
one another or suggested trivial clustering solutions, in which the number of clusters is
either 1 or equal to the number of data points in the data set.
Similarly, a stability validation metric called the stability index typically favored clustering results
containing as few clusters as possible. The only method used for model selection that
consistently suggested clustering algorithms producing nontrivial solutions was the CLOSE
score. The CLOSE score was specifically developed to evaluate clusters of time series by
taking both stability in time and the quality of the clusters into account.
Cluster visualizations were used to display the clusters. Scatter plots were produced by
applying two dimension reduction methods to the data: Principal Component Analysis
(PCA) and t-Distributed Stochastic Neighbor Embedding (t-SNE). Additionally, cluster
evolution plots were used to display how the clusters evolve as different parts of the time
series are used to perform the clustering, emphasizing the temporal aspect of time series
clustering. Finally, the results
indicate that a manual qualitative analysis of the clustering results is necessary to finely tune the
candidate clustering methods. Performing this analysis highlights flaws in the other
validation methods, and allows the user to select the best method from a few candidates
based on the use case and the reason for performing the clustering.
Faculty of Science and Technology (Teknisk-naturvetenskapliga fakulteten), Uppsala University. Place of publication: Uppsala. Supervisor: Erik Brodin. Subject reader: Antônio Horta Ribeiro. Examiner: Tomas Nyberg.
Popular Science Summary (Populärvetenskaplig sammanfattning)
This report examined various methods for evaluating clustering algorithms applied to
data sets of financial time series. Clustering is a type of unsupervised learning, a class
of machine learning in which the data points in the data sets have no predetermined
labels; instead, the task is to find structures in the data set. Clustering means that the
data points are divided into different groups, or clusters, according to some form of
distance metric that is used to define how similar or dissimilar two data points are.
To select the optimal clustering method, evaluation methods are needed that can help
the user determine which method yields the optimal clusters. The methods treated
in this report are conventional internal validation methods, stability-based methods,
and visualization methods. The conventional internal validation methods consist of a
number of metrics whose common trait is that they are based solely on the clustering
results, that is, the distances between the different data points and the clusters to which
they have been assigned. Stability-based methods assess how stable the results of a
clustering algorithm are, either over time or when new data is introduced. Visualization
methods are, as the name suggests, methods for visualizing the clustering result. This
report presented both methods for visualizing the clusters themselves and a method
that shows how the clusters change over time as different parts of the time series are
clustered. Finally, the clustering results were analyzed qualitatively in order to interpret
them and to evaluate the other evaluation methods.
[…] use the distances from the data points in a cluster to the theoretical center of the
cluster; as a cluster grew larger and came to include more data points, this distance also
increased, and the measured quality therefore decreased.
Contents
1 Introduction
2 Background
3 Theory
  3.1 Financial time series
    3.1.1 Stylized facts of financial time series
  3.2 Clustering
    3.2.1 Clustering algorithms
  3.3 Measures of similarity
    3.3.1 Hellinger distance
    3.3.2 Kendall’s tau
  3.4 Robustness and stability
4 Data
  4.1 Mutual funds
  4.2 Stocks
  4.3 Data frequency and length
5 Methodology
  5.1 Clustering
  5.2 Evaluation methods
    5.2.1 Qualitative evaluation
    5.2.2 Internal clustering evaluation
    5.2.3 Measures of stability in time
    5.2.4 Replication stability evaluation
  5.3 Analysis
    5.3.1 Benchmarking
    5.3.2 Bootstrapping
    5.3.3 Pre-evaluation processing of cluster results
    5.3.4 Reference methods based on a priori knowledge
    5.3.5 Clustering experiments
6 Results
  6.1 Model selection
    6.1.1 Agglomerative clustering using Kendall’s tau
    6.1.2 Agglomerative clustering using the Hellinger distance
    6.1.3 Hybrid hierarchical clustering using Kendall’s tau
    6.1.4 Hybrid hierarchical clustering using the Hellinger distance
    6.1.5 Summary of model selection results
  6.2 Results of reference clustering methods
    6.2.1 Mutual fund data set
    6.2.2 Stock data set
  6.3 Results of tuned methods
    6.3.1 Agglomerative clustering using Kendall’s tau
    6.3.2 Agglomerative clustering using the Hellinger distance
    6.3.3 Hybrid hierarchical clustering using Kendall’s tau
    6.3.4 Hybrid hierarchical clustering using the Hellinger distance
    6.3.5 Summary and comparison to reference methods
    6.3.6 Selected clustering methods for mutual fund data
    6.3.7 Selected clustering methods for stock data
  6.4 Visualization of clustering results
    6.4.1 Scatter plots created using dimension reduction
    6.4.2 Cluster evolution plots
  6.5 Qualitative analysis of clustering results
    6.5.1 Comparing the clustering results
7 Discussion
  7.1 The clustering results
  7.2 Quantitative evaluation methods
    7.2.1 Conventional internal validation measures
    7.2.2 CLOSE
    7.2.3 Stability index
  7.3 Cluster visualization methods
    7.3.1 Scatter plots
    7.3.2 Cluster evolution plots
  7.4 Qualitative analysis using domain knowledge
  7.5 Limitations
  7.6 Future work
8 Conclusions
9 References
10 Appendix A
  10.1 Internal validation measures of bootstrapped time series
    10.1.1 Agglomerative clustering using Kendall’s tau
    10.1.2 Agglomerative clustering using the Hellinger distance
    10.1.3 Hybrid hierarchical clustering using Kendall’s tau
    10.1.4 Hybrid hierarchical clustering using the Hellinger distance
11 Appendix B
  11.1 Process description for performing clustering of financial time series
    11.1.1 Data inspection
    11.1.2 Model selection
1 Introduction
Clustering is a form of unsupervised machine learning used for partitioning data into
groups. This can help identify patterns in the data previously not observed by analysts.
The manner in which the data is partitioned is heavily dependent on the data and on
which features are chosen for clustering. The choice of features depends on why the
clustering is carried out in the first place, and different choices will lead to different
classifications of the data.
One special application of clustering techniques is the clustering of time series, particularly
financial time series. A general time series is simply data measured over a period of time
at evenly spaced intervals. Financial time series can describe a multitude of different
measured quantities, such as the price of a fund or stock over a period of time. There
are a number of properties that empirical financial time series display, referred to as the
stylized facts of financial time series [1]. When analyzing financial time series, it is
important to keep these properties in mind, since they are among the characteristics
that make financial time series unique.
Due to the nature of time series, the data has many dimensions, since each observation in
the series corresponds to one unique dimension. One of the main challenges of clustering
financial time series is therefore dimension reduction when selecting the features to cluster
on. One can, for example, calculate the similarity between the financial time series in
the data set and use these distances to cluster time series that are close to each other
into the same partition. Another approach is to use basic statistical measures such as
the mean, variance, or skewness of the distribution of returns of the time series.
The choice of clustering algorithm is another factor that heavily affects the result.
Many clustering algorithms rely on some defined similarity between the data points, so
the metric used to calculate this similarity is another choice that needs to be made.
Given the options mentioned above, as well as many more, the choice of clustering
method is not straightforward. To enable data-driven decision making when designing
clustering algorithms, it is therefore important to be able to evaluate clustering
techniques and see which design choices result in the best partitioning of the data. Bear
in mind that it is not always trivial to define which clustering result is optimal, and this
is the case when clustering financial time series. When a known cluster solution is not
available, the evaluation of the clustering results must rely on the properties of the
clusters themselves, such as the intra-cluster distances between data points, the distances
to other clusters, or stability.
The main objectives of this thesis are to determine methods and metrics that can
be used to evaluate clustering results, and to examine how these evaluation methods can
measure the "robustness" of a clustering algorithm in order to make its results as robust
as possible. Since financial time series by definition vary over time, the clustering result
of an algorithm likely varies depending on the time period used to perform the clustering.
Examining how these results change will help determine robustness, and can give insight
into how to develop better, more robust clustering techniques.
Our goal is to investigate how clustering techniques applied to financial time series can
be evaluated. Our specific objectives are to:
3. Using the metrics defined, compare the results of different clustering algorithms.
A framework for classifying financial time series using different clustering algorithms has
been implemented by Kidbrooke®, and it is a subset of these algorithms that will be
evaluated using the defined metrics. Kidbrooke® is a Swedish company providing financial
software for financial institutions [2]. This master's thesis is written at Kidbrooke® in
order to investigate how their clustering methods can be evaluated, so that they in turn
can use these evaluation methods to find new, better clustering methods for their
applications.
2 Background
In recent years, the number of available tools for planning and structuring the personal
finances of individuals has steadily increased. The digitalization of these tools has been
enabled by an increased use of mathematics, statistics, and machine learning. One use
case for these tools is to help individuals not only choose mutual funds to invest in, but
also to evaluate over time whether their choice of funds remains a good one.
Clustering algorithms can be used to classify a large number of mutual funds. The
resulting clusters can then be used to recommend mutual funds to customers: if a
customer owns a number of mutual funds within a cluster, better funds within that same
cluster can be recommended, similarly to how recommendation algorithms in other
applications operate.
Additionally, due to the time-dependent nature of financial time series, the resulting
clusters of each clustering algorithm may change over time. It is therefore of interest
to define a metric of robustness for clustering methods used on financial time series.
This enables the investigation of how time series move between clusters when the
clustering algorithms are applied to different time periods.
3 Theory
3.1 Financial time series
Time series are defined as sequences of ordered data T = ⟨t1 , t2 , ..., td ⟩, where T ∈ Rd
and d ∈ N+ . A multivariate time series is a set D = {T1 , T2 , ..., Tn } of n ∈ N+ time
series [3].
3.1.1 Stylized facts of financial time series
• Autocorrelations of returns are usually not significant, except when examining
shorter periods of time during intraday trading.
• The distribution of the returns is heavy tailed, meaning that extreme events are
more likely than if the returns had been normally distributed; in that case, events
with large negative or positive returns would have been much more unlikely.
• There exists an asymmetry when it comes to gains and losses. In empirical financial
time series data, large decreases in stock price can be observed while increases in
the same range are significantly less common.
• As the time period over which the returns are calculated increases, the distribution
of returns start to more and more resemble a Gaussian distribution.
• Even after correcting returns to account for volatility clustering, the distribution
of the returns still displays heavy tails. These tails are, however, less heavy than
in the distribution of the uncorrected returns.
• The autocorrelation function of the absolute value of the returns decays slowly.
This means that the absolute returns remain correlated even over long time lags,
with the correlation decreasing only slowly as the lag grows.
• The correlation between the majority of volatility measures and the returns of an
asset is negative. This is called the leverage effect.
• The trading volume of a financial instrument is correlated with the volatility.
• Measures of volatility calculated using coarser-grained data are able to predict
measures of volatility calculated using finer-grained data more accurately than the
other way around [1].
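The heavy-tail property above can be illustrated numerically. The following is only a synthetic sketch: a Student-t distribution with 3 degrees of freedom stands in for real return data (it is not fitted to any data set in this report), and scipy.stats.kurtosis with Fisher's definition, which is 0 for a normal distribution, measures tail heaviness.

```python
import numpy as np
from scipy.stats import kurtosis

rng = np.random.default_rng(0)
t_returns = rng.standard_t(df=3, size=100_000)   # heavy-tailed stand-in for returns
normal_returns = rng.standard_normal(100_000)    # Gaussian benchmark

# Excess kurtosis (Fisher's definition) is 0 for a normal distribution;
# heavy-tailed return distributions show clearly positive values.
print("t(3) excess kurtosis:", kurtosis(t_returns))
print("normal excess kurtosis:", kurtosis(normal_returns))
```

The t-distributed sample shows a large positive excess kurtosis, while the Gaussian sample stays near zero, mirroring the heavy-tails stylized fact.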
3.2 Clustering
The task of partitioning a data set into groups is called clustering. This partitioning is
carried out in such a way that the data points within the same cluster are similar to
each other in some way. Data points that are different from one another are partitioned
into separate clusters. The methods used when carrying out the partitioning of data
are called clustering algorithms. Clustering is a form of unsupervised learning, and a
useful technique for discovering patterns and common traits of unlabeled data points [4].
One application of clustering algorithms is partitioning financial time series. More for-
mally, when clustering time series data the goal is to partition multivariate time series
D = {T1 , T2 , ..., Tn }, n ∈ N+ into clusters C = {C1 , C2 , ..., Cm }, where Ci ⊆ D and
1 ≤ m ≤ n such that the time series in D are clustered together based on some metric
of similarity [3]. There exists a multitude of different clustering algorithms, as well as
different ways to measure similarity between data points of different kinds of data. Some
clustering algorithms perform clustering using the raw time series, while others utilize
some reduction method to reduce the dimensionality of the data [4].
In this section, the clustering methods evaluated in this report are presented. All
time series clustering algorithms discussed are based on a pairwise distance matrix of
the time series. For this reason, the distance measures used between time series are
also presented here.
[…] between each data point in the two clusters and uses the average of these distances.
Clusters are then merged such that this average distance between clusters is minimized.
Another example is single linkage, which minimizes the minimum distance between
data points in separate clusters [5]. An agglomerative clustering algorithm will be one of
the clustering methods evaluated in this degree project; its stop condition will be a
predetermined number of clusters, and when the algorithm reaches this number of
clusters, the partitioning task is considered finished. Divisive clustering, in stark contrast
to agglomerative clustering, starts with a single cluster containing all data points in the
data set. This cluster is then partitioned recursively into more and more clusters until
the stopping condition is met [4].
Another type of hierarchical clustering that will be evaluated is a method that will
be referred to as hybrid hierarchical clustering. This method combines elements from
agglomerative and divisive clustering algorithms, and incorporates a stopping criterion
that is based on the maximum distance within the clusters. An elaborate description of
this algorithm follows:
1. Begin by using agglomerative clustering to divide the entire data set into two
clusters. This step is similar to the first step of a divisive clustering algorithm,
where the entire data set is split in two.
2. For both clusters that were obtained in the previous step, check whether the max-
imum distance between the data points within the cluster exceeds the predefined
maximum distance threshold. If it does, this means that this cluster requires
further partitioning.
3. For the clusters whose internal distances exceed the maximum distance threshold,
an agglomerative clustering algorithm is used to again split the cluster into two
new clusters.
4. This process is repeated recursively until no clusters in the entire set of clusters
have members whose distance between each other exceed the predetermined max-
imum distance threshold. Once this condition has been fulfilled, the clustering
task is considered finished.
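The four steps above can be sketched in code. This is an illustrative implementation using SciPy's average-linkage routines on a precomputed distance matrix; the function name `hybrid_cluster` and the toy distance matrix are assumptions made for this example, not the project's actual implementation.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def hybrid_cluster(D, max_dist, indices=None):
    """Recursively split clusters whose internal maximum distance exceeds max_dist."""
    if indices is None:
        indices = np.arange(D.shape[0])
    # Step 2/4: stop splitting once the cluster is tight enough (or trivially small).
    if len(indices) < 2 or D[np.ix_(indices, indices)].max() <= max_dist:
        return [indices.tolist()]
    # Steps 1/3: split the current cluster in two with average-linkage clustering.
    sub = D[np.ix_(indices, indices)]
    Z = linkage(squareform(sub, checks=False), method="average")
    labels = fcluster(Z, t=2, criterion="maxclust")
    clusters = []
    for lab in (1, 2):
        clusters += hybrid_cluster(D, max_dist, indices[labels == lab])
    return clusters

# Toy distance matrix: points {0, 1} are close, {2, 3} are close, groups are far apart.
D = np.array([[0.0, 0.1, 1.0, 1.1],
              [0.1, 0.0, 0.9, 1.0],
              [1.0, 0.9, 0.0, 0.2],
              [1.1, 1.0, 0.2, 0.0]])
print(sorted(hybrid_cluster(D, max_dist=0.5)))  # → [[0, 1], [2, 3]]
```

With a maximum-distance threshold of 0.5, the initial split at step 1 separates the two tight pairs, and neither pair needs further partitioning, so the recursion stops after one level.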
In this degree project, the hierarchical clustering methods are implemented using average
linkage in order to limit the scope. Additionally, unlike Ward linkage, average linkage is
available for an arbitrary distance metric in scikit-learn's implementation of
agglomerative clustering [6].
Sequential clustering
Kidbrooke® uses a clustering technique called sequential clustering. Sequential clustering
uses multiple clustering methods to partition the data recursively. When sequentially
clustering a data set D, the first clustering method is applied to the whole data set,
partitioning the data into clusters C = {C1 , C2 , ..., Cm }, where m is the total number
of clusters after the first clustering round. In the next clustering round, the next
clustering algorithm is applied to each cluster Ci in order to partition the data further.
This methodology creates flexibility, since many clustering algorithms can be combined
in different ways, yielding different results. This sequential clustering algorithm will be
used as a reference method when experiments are performed. Assuming that the
sequential algorithm consists of two hierarchical clustering methods, it will have two
parameters, one for each clustering step. To find the optimal parameters for clustering
the data sets, the clustering result that the algorithm produces is evaluated for a range
of different parameter values. When the number of parameters is two rather than one,
the number of experiments that need to be evaluated increases drastically. Since the
main focus of this thesis is evaluation methods rather than the clustering methods
themselves, the sequential clustering method will be excluded from the majority of the
experiments, and will only be used as a reference method when comparing clusterings
of mutual funds.
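The two-stage procedure described above can be sketched as follows. Both stages here use average-linkage agglomerative clustering with a fixed number of clusters per stage; the actual methods Kidbrooke® combines may differ, and the function names and toy distance matrix are illustrative.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def agglo_labels(D, n_clusters):
    # Average-linkage agglomerative clustering on a precomputed distance matrix.
    Z = linkage(squareform(D, checks=False), method="average")
    return fcluster(Z, t=n_clusters, criterion="maxclust")

def sequential_cluster(D, n_first, n_second):
    idx = np.arange(D.shape[0])
    first = agglo_labels(D, n_first)         # first clustering round on the whole set
    clusters = []
    for lab in np.unique(first):
        members = idx[first == lab]
        if len(members) <= n_second:         # too small to split further
            clusters.append(members.tolist())
            continue
        # Second clustering round, applied within each first-round cluster.
        second = agglo_labels(D[np.ix_(members, members)], n_second)
        for lab2 in np.unique(second):
            clusters.append(members[second == lab2].tolist())
    return clusters

# Toy distance matrix with three tight pairs: {0, 1}, {2, 3}, {4, 5}.
D = np.array([[0.0, 0.1, 2.0, 2.0, 2.0, 2.0],
              [0.1, 0.0, 2.0, 2.0, 2.0, 2.0],
              [2.0, 2.0, 0.0, 0.1, 1.0, 1.0],
              [2.0, 2.0, 0.1, 0.0, 1.0, 1.0],
              [2.0, 2.0, 1.0, 1.0, 0.0, 0.1],
              [2.0, 2.0, 1.0, 1.0, 0.1, 0.0]])
print(sorted(sequential_cluster(D, n_first=2, n_second=2)))  # → [[0, 1], [2, 3], [4, 5]]
```

The first round separates {0, 1} from the rest; the second round then splits the remaining cluster into {2, 3} and {4, 5}, illustrating how the two parameters (one per stage) jointly determine the final partition.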
In the context of financial time series, P and Q are the distributions of the returns
of two time series. To find these distributions, the probability density function of the
returns is estimated by computing histogram bins for each time series' returns. To
calculate the histogram bins, the returns are divided into a number of sub-intervals,
and the number of return values that fall into each bin (sub-interval) is counted. The
probability density function is then defined for each bin by dividing the number of values
in that bin by the total number of return values. Since the Hellinger distance is a
measure of similarity between distributions, using this distance when clustering could
enable the separation of different types of mutual funds. Mutual funds of different kinds
have different objectives and strategies, and the choices made regarding these factors
affect the distribution of the returns. For example, the distribution of the returns of
equity mutual funds may have heavier tails than that of fixed-income mutual funds that
invest in bonds. This difference in distribution creates a distance between the two time
series, making it possible to divide them into separate partitions using a clustering
algorithm.
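The histogram procedure above can be made concrete. The sketch below assumes the standard discrete Hellinger distance, H(P, Q) = (1/√2) · (Σi (√pi − √qi)²)^(1/2); the function names and default bin count are illustrative choices, not the project's implementation.

```python
import numpy as np

def hellinger(p, q):
    # p and q are discrete probability vectors defined over the same bins.
    return float(np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)))

def hellinger_from_returns(r1, r2, bins=20):
    # Shared bin edges so both return histograms cover the same sub-intervals.
    edges = np.histogram_bin_edges(np.concatenate([r1, r2]), bins=bins)
    p, _ = np.histogram(r1, bins=edges)
    q, _ = np.histogram(r2, bins=edges)
    # Normalize the counts so each histogram sums to 1.
    return hellinger(p / p.sum(), q / q.sum())

# Identical distributions give distance 0; disjoint ones give the maximum, 1.
print(hellinger(np.array([0.5, 0.5]), np.array([0.5, 0.5])))  # → 0.0
print(hellinger(np.array([1.0, 0.0]), np.array([0.0, 1.0])))  # → 1.0
```

Because the distance is bounded in [0, 1], it can be used directly as an entry in a pairwise distance matrix for the clustering algorithms described earlier.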
[…] of data from the two rankings. Let one pair of observation pairs be (Xi , Yi ) and
(Xj , Yj ). This pair of observations is classified as concordant if Xj − Xi and Yj − Yi
have the same sign; otherwise, the pair is discordant [8].
The algorithms used for generating the distance matrix between time series based on
Kendall's tau use a specific version of the metric called Tau-b. This statistic is
implemented in the scientific computation library SciPy [9], and accounts for the cases
where Xi = Xj or Yi = Yj . The Tau-b metric is calculated as follows:
τB = (C − D) / √((C + D + T)(C + D + U)),    (2)
where C is the total number of concordant observation pairs and D is the number of
discordant pairs. T is the total number of tied values present in the first ranking, and
U is the number of tied values in the second ranking [10]. When using Kendall's tau
to measure the correlation between two financial time series, Xi and Yi in each observation
pair are the returns of the two time series at index i. A distance between two time series
T1 and T2 can then be defined as
dKendall (T1 , T2 ) = (1 − τB (T1 , T2 )) / 2.    (3)
Kendall's tau distance dKendall (T1 , T2 ) is then bounded in the range 0 ≤ dKendall (T1 , T2 ) ≤
1, where a distance of 0 indicates perfect agreement and a distance of 1 indicates perfect
inversion, or total disagreement. Because Kendall's tau is a measure of correlation, it
can help identify relationships between the financial time series in a data set. Partitioning
financial time series using Kendall's tau can facilitate the creation of a more diversified
portfolio, since investing in assets that have a low correlation with each other can help
minimize the risk of the portfolio. By partitioning according to Kendall's tau, clusters
of mutual funds that are highly correlated with one another can be identified; one can
then create a portfolio containing assets from different clusters in order to reduce the
overall risk.
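Putting equations (2) and (3) together, a pairwise Kendall's tau distance matrix can be sketched with SciPy's `kendalltau` (which computes Tau-b by default); the function name and toy return series below are illustrative assumptions.

```python
import numpy as np
from scipy.stats import kendalltau

def kendall_distance_matrix(returns):
    # returns: array of shape (n_series, n_observations), one row per time series.
    n = returns.shape[0]
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            tau, _ = kendalltau(returns[i], returns[j])  # Tau-b, equation (2)
            D[i, j] = D[j, i] = (1.0 - tau) / 2.0        # distance, equation (3)
    return D

r = np.array([[1.0, 2.0, 3.0, 4.0],
              [2.0, 4.0, 6.0, 8.0],    # perfectly concordant with the first series
              [4.0, 3.0, 2.0, 1.0]])   # perfectly discordant with the first series
D = kendall_distance_matrix(r)
print(D[0, 1], D[0, 2])  # → 0.0 1.0
```

Perfectly concordant series get distance 0 and perfectly discordant series get distance 1, matching the bounds stated above; the resulting symmetric matrix can be fed directly into the distance-based clustering algorithms of Section 3.2.1.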
Additionally, how appropriate a clustering method is for a given data set can be
evaluated by examining another type of clustering stability. This stability is based on
the similarity between the clustering results obtained when the same clustering method
is applied to two different data sets with the same characteristics. More formally, the
two data sets that are clustered are assumed to contain data drawn from the same
probability distribution [11]. A measure of this type of stability, henceforth referred to
as replication stability, is discussed in Section 5.2.4. As previously described in Section
3.1.1 regarding the stylized facts of financial time series, the distribution of returns is
typically heavy tailed to a varying degree. The return distributions of the time series in
the data sets will likely assume a range of different shapes depending on the historical
events of the underlying asset each time series describes. When examining this type of
stability, it is therefore important to be mindful of the properties of the two data sets
whose clustering results are being compared, and to ensure that the data consists of
similar types of financial time series.
4 Data
In this section, a description of the data that will be clustered is provided. To evaluate
the performance of the clustering algorithms, different kinds of financial time series will
be used. The purpose of this is to evaluate how the different clustering methods perform
on a wider variety of time series data. While the main purpose of this report is to
evaluate the clustering of mutual funds, mixing the types of financial time series in the
data set allows the performance to be evaluated for different scenarios.
The metadata of the different kinds of financial time series will also be utilized in
the evaluation of the clustering methods. Having access to metadata enables additional
analysis of the clustering result. For example, it is possible to investigate how well the
clustering techniques can partition the financial time series into clusters based on the
specific region of the financial time series. This facilitates the detection of outliers, such
as funds that have been poorly classified by the fund manager. Since the relevant
metadata available for financial time series differs between the kinds of underlying assets
that the time series represent, the metadata used for each data type is also presented
in this section. The returns of the financial time series are adjusted to have a mean
of 0, allowing the time series to be compared based on their patterns and fluctuations
rather than their mean values.
4.1 Mutual funds
The mutual funds that will be clustered are a subset of those available on the Swedish
market. By ensuring that a diverse collection of funds from different industries and
regions is clustered, it is possible to analyze, for example, how well the regional
classifications seem to fit each fund. The metadata available for the financial time series
describing mutual funds is presented in the following table. It will be used to perform a
reference clustering method and to facilitate qualitative analysis of the clusters, since
the metadata may help explain similarities between funds. For example, one can look
at the region of each fund in a cluster and determine that the cluster contains most of
the Swedish funds in the data set.
Data headers
Ticker
ISIN
Yearly fee
Name
Region
Asset class
Category
Currency
4.2 Stocks
As opposed to mutual funds, a stock describes the price of one share of a single company.
Stock prices tend to display greater volatility and unpredictability. There is no
diversification in a single stock, and its price is impacted by a significant number of
factors apart from the performance of the company. The stock price is ultimately based
on supply and demand, and is influenced by market sentiment, financial reports, and
news [13]. The portfolio of a mutual fund is affected by these phenomena as well, but
its diversification tends to soften the short-term effects.
Including stocks as part of the benchmark data enables the evaluation of clustering
methods on different kinds of financial time series, and provides insight into the clustering
tasks that each clustering algorithm is most fit to perform. It is entirely possible that a
clustering algorithm struggles to cluster mutual funds but performs significantly better
when clustering financial time series representing stock prices.
The time series of stocks that will be clustered represent a number of stocks of companies
listed on Nasdaq Stockholm. These stocks were chosen in order to investigate how well
the clustering algorithms partition the time series with respect to variables unrelated
to geographic region: since the companies listed on Nasdaq Stockholm are based in
Sweden, their main differences may be caused by other factors. The metadata available
for the financial time series of stocks is described in the table below, and will be used to
perform a reference clustering. As with the mutual funds, this data can aid in explaining
differences and similarities between stocks.
Data headers:
- Ticker
- SEDOL
- Name
- Industry sector
- GICS Industrial name
- GICS Sub industry name
- Currency
The number of observations in each financial time series will have a strong impact on the
result of the clustering algorithms. A time period of 5 years is chosen for the experiments
in this report. Due to the cyclical nature of the global economy, a relatively short time
frame of five years could possibly capture patterns and trends that would not be present
if the time series extended further back in time. Using a time period of 5 years also
enables the inclusion of funds and stocks which made an entry into the market in more
recent years, since their time series data does not extend as far back in time as funds or
stocks which have been present on the market longer.
5 Methodology
5.1 Clustering
The clustering methods described in Section 3.2.1 will be used to partition data sets
containing financial time series that represent mutual funds and stocks. The clustering
process starts by calculating the pairwise distance between the returns of the financial
time series in the data set.
Let a time series describing the pricing of an asset with n observations be T = {p_1, p_2, ..., p_n}.
The return at observation i is then calculated as:

r_i = \frac{p_i - p_{i-1}}{p_{i-1}}.  (4)
The returns of each asset are calculated and stored as time series of d observations
T = ⟨r1 , r2 , ..., rd ⟩. The return time series of each asset are then used to calculate the
distance between each asset using a metric of similarity, as described in Section 3.3.
This enables the creation of the distance matrix D, which is an M × M matrix where M
is the total number of time series in the data set.
D = \begin{pmatrix} d_{1,1} & d_{1,2} & \cdots & d_{1,M} \\ d_{2,1} & d_{2,2} & \cdots & d_{2,M} \\ \vdots & \vdots & \ddots & \vdots \\ d_{M,1} & d_{M,2} & \cdots & d_{M,M} \end{pmatrix}.
Once the distance matrix has been acquired, the clustering algorithm of choice is applied
in order to partition the data set into clusters. The clustering result of each method and
choice of hyperparameters is then saved to facilitate evaluation and further analysis of
the clusters.
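As a concrete illustration, the pipeline above (returns, pairwise distances, clustering) could be sketched as follows. The price data is random, the choice of 1 − Kendall's tau as the dissimilarity, and the use of average-linkage hierarchical clustering are illustrative assumptions, not the report's exact configuration.

```python
import numpy as np
from scipy.stats import kendalltau
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

# Hypothetical price data: 5 assets, 50 observations each.
rng = np.random.default_rng(0)
prices = np.cumprod(1 + 0.01 * rng.standard_normal((5, 50)), axis=1)

# Step 1: simple returns, Equation (4): r_i = (p_i - p_{i-1}) / p_{i-1}.
returns = np.diff(prices, axis=1) / prices[:, :-1]

# Step 2: pairwise M x M distance matrix; 1 - Kendall's tau is used here
# as one possible dissimilarity (the report considers several metrics).
M = len(returns)
D = np.zeros((M, M))
for i in range(M):
    for j in range(i + 1, M):
        tau, _ = kendalltau(returns[i], returns[j])
        D[i, j] = D[j, i] = 1 - tau

# Step 3: cluster on the precomputed distances (here: average linkage).
Z = linkage(squareform(D, checks=False), method="average")
labels = fcluster(Z, t=2, criterion="maxclust")
```

The clustering result (`labels`) can then be stored per method and hyperparameter choice, as described above.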
Dimension reduction
Since financial time series are high dimensional where each entry in the series adds an-
other dimension, it can be difficult to visualize the time series in a way that maintains
the differences between the different time series in a data set. This is due to the fact that
in order to create visualizations such as two dimensional scatter plots, the dimensions
of the data need to be reduced to two dimensions. This procedure is called dimension
reduction, and there exist many algorithms for carrying out this task. The goal when
performing dimension reduction is, apart from reducing the dimensions of data, to pre-
serve the structure of the data set so that as little information as possible is lost in
the dimension reduction process [14].
While performing dimension reduction on the data set enables the creation of visual-
izations, these visualizations can become misleading if too much information about the
local or global structure of the data is lost. Using our solar system as an example,
preserving the global structure could be likened to preserving the distances between the
planets. Preserving the local structure would instead be preserving the relative dis-
tances between planets and their moons [14]. In other words, the global structure is
the structure between the different clusters of the data and the local structure is the
internal structure of the clusters.
When the dimensions of the data are reduced to two dimensions, the projection may even
seem to display patterns and structures in the data that are not present in the original,
non-reduced data. Methods of dimension reduction tend to preserve either the global or
local structure. Due to this, a number of different dimension reduction methods will be
used to create data visualizations. One method of dimension reduction that preserves
global structure is Principal Component Analysis (PCA), and a method for local struc-
ture preservation is t-Distributed Stochastic Neighbor Embedding (t-SNE) [14]. Scatter
plots will be created using both methods to reduce the dimensions of the time series
data, and the cluster result in order to assign labels to the data points.
Let K be the number of dimensions of the principal subspace such that K < D. The
directions of this subspace can be defined using K D-dimensional vectors
{u_1, u_2, ..., u_K}. The variance of the projected data for each vector u_n is given by:

\frac{1}{N} \sum_{n=1}^{N} \{u_n^T x_n - u_n^T \bar{x}\}^2 = u_n^T S u_n.  (7)
The goal is then to maximize the variance with respect to un , constrained by the
normalization condition un T un = 1. A Lagrange multiplier λn is introduced, so that
the following can be maximized:
u_n^T S u_n + \lambda_n (1 - u_n^T u_n).  (8)
By differentiating with respect to u_n and setting the derivative to 0, the variance will be
stationary when
u_n^T S u_n = \lambda_n.  (9)
This means that the variance will be maximized when u_n is the eigenvector of S that
corresponds to the eigenvalue with the largest magnitude, λ_n. This eigenvector is known
as the first principal component, and additional principal components can be extracted
by using the eigenvector corresponding to the second largest eigenvalue, and so on [15].
The projection of the original data into this subspace is then defined by:
Z = XU, (10)
where U is the matrix where all columns are the K eigenvectors of the covariance ma-
trix S. In the context of plotting a two dimensional scatter plot, K = 2. The Z matrix
would in this case have the dimensions (N × 2), and could be visualized in a scatter plot.
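The mechanics of Equations (7)-(10) can be sketched in a few lines of NumPy; the data here is random and only illustrates the computation, not the report's return data.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((100, 10))      # N = 100 points in D = 10 dimensions

Xc = X - X.mean(axis=0)                 # center the data
S = np.cov(Xc, rowvar=False)            # D x D covariance matrix
eigvals, eigvecs = np.linalg.eigh(S)    # eigenvalues in ascending order

U = eigvecs[:, ::-1][:, :2]             # the K = 2 leading principal directions
Z = Xc @ U                              # Equation (10): the N x 2 projection
```

`Z` can be plotted directly as a two dimensional scatter plot, with colors given by cluster labels.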
The SNE algorithm defines a pairwise similarity p_{j|i} and then attempts to find a low di-
mensional mapping such that the corresponding similarities in the low dimensional space q_{j|i}
resemble the high dimensional distances as much as possible. The first step of the al-
gorithm is to convert the original, high dimensional distances between the data points
into conditional probabilities that can be viewed as similarity between points [16]. Let
the similarity between data points x_i and x_j in the high dimensional space be:

p_{j|i} = \frac{\exp(-\lVert x_i - x_j \rVert^2 / 2\sigma_i^2)}{\sum_{k \neq i} \exp(-\lVert x_i - x_k \rVert^2 / 2\sigma_i^2)},  (11)

where σ_i is the variance of a Gaussian distribution centered on data point i. This pa-
rameter is found by performing a binary search, such that σ i produces a value Pi with a
user specified fixed perplexity [16]. In the equation above, the distance between the data
points is the Euclidean distance. It is important to note that the method is versatile,
and other distance metrics can be used as well.
Here, the perplexity is defined as Perp(P_i) = 2^{H(P_i)}, where H(P_i) is the Shannon
entropy of P_i in bits. The conditional probability that indicates
similarity between data points y_i and y_j in the low dimensional space is defined as:
q_{j|i} = \frac{\exp(-\lVert y_i - y_j \rVert^2)}{\sum_{k \neq i} \exp(-\lVert y_i - y_k \rVert^2)}.  (14)
The goal of SNE is to find a mapping to a lower dimensional space such that the difference
between p_{j|i} and q_{j|i} is minimized. This is done by minimizing the cost function C, which is the sum
of the Kullback-Leibler divergences:
C = \sum_i KL(P_i \,\|\, Q_i) = \sum_i \sum_j p_{j|i} \log \frac{p_{j|i}}{q_{j|i}}.  (15)
The optimization is performed iteratively using gradient descent [16]. Once the opti-
mization is complete, the mapped data set will be Y, where each row is a lower dimensional
mapping of one data point.
t-SNE builds on this method. It uses a Student t-distribution to calculate the similarity
between two data points in the low dimensional space. Instead of minimizing
the cost function in Equation 15, one can minimize the Kullback-Leibler divergence
between a joint probability distribution in the high dimensional space called P , and a
joint probability distribution in the low dimensional space called Q [16]:
C = KL(P \,\|\, Q) = \sum_i \sum_j p_{ij} \log \frac{p_{ij}}{q_{ij}}.  (16)
The joint probabilities q_{ij} are calculated using a Student t-distribution with one degree
of freedom:

q_{ij} = \frac{(1 + \lVert y_i - y_j \rVert^2)^{-1}}{\sum_{k \neq l} (1 + \lVert y_k - y_l \rVert^2)^{-1}}.  (17)
The joint probabilities in the high dimensional space, p_{ij}, are defined as the symmetrized
conditional probabilities p_{ij} = \frac{p_{j|i} + p_{i|j}}{2n}, where n is the number of points in the data set.
Using a Student t-distribution to calculate the low dimensional joint probabilities allows
a moderate distance in the high dimensional space to be
transformed into a longer distance in the lower dimensional space due to the heavy tails
of the Student t-distribution, providing a more faithful low dimensional representation
[16].
PCA and t-SNE have been implemented in the Python machine learning library Scikit-
learn [17]. It is these implementations that will be used to perform dimension reduction
in this project. Since PCA is performed using the data vector of each data point, the
returns of each time series will be the input to be reduced. Since t-SNE is based on
finding lower dimensional mappings using the distances between the data points, a dis-
tance matrix describing the pairwise distances between the financial time series
will be used instead. In order to produce scatter plots of the clusters the data will be
reduced to two dimensions.
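A minimal sketch of how the Scikit-learn implementations might be invoked, assuming illustrative random return data; the report's actual preprocessing, perplexity, and distance metric may differ.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

rng = np.random.default_rng(2)
returns = rng.standard_normal((30, 100))   # 30 series of 100 return observations

# PCA is applied directly to the return vectors of the time series.
coords_pca = PCA(n_components=2).fit_transform(returns)

# t-SNE instead accepts a precomputed distance matrix, so any of the
# distance metrics discussed earlier could be plugged in here.
D = squareform(pdist(returns))             # Euclidean used for illustration
coords_tsne = TSNE(n_components=2, metric="precomputed", init="random",
                   perplexity=5, random_state=0).fit_transform(D)
```

Note that Scikit-learn requires `init="random"` when `metric="precomputed"`, since PCA initialization is undefined for a distance matrix.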
More formally, let S be a set of financial time series. Let T be the time period that will
be analyzed. T is then split into m subsequences τ_1, τ_2, ..., τ_m [18]. For each subsequence
of T, the returns of each financial time series during this period are used to
calculate the distance matrix between the time series. This distance matrix is then used
as input to a clustering method that partitions the data into clusters.
The set of nodes is defined as the set of clusters produced by clustering each time inter-
val. All clusters that only contain one element are excluded from this set. The relational
edges between all nodes in adjacent subsequences are computed, by adding an edge be-
tween them when the intersection of the time series within the two clusters is non-empty.
The weight assigned to each specific edge is the cardinality of the intersection [18].
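The node and edge construction described above could be sketched as follows, with hypothetical cluster memberships; `transition_edges` is an illustrative helper, not code from the report.

```python
# Nodes are the non-singleton clusters of two adjacent subsequences; an edge
# weight is the number of time series the two clusters share.
def transition_edges(clusters_prev, clusters_next):
    """Each argument maps a cluster id to the set of time-series ids it contains."""
    edges = {}
    for a, members_a in clusters_prev.items():
        if len(members_a) < 2:               # singleton clusters are excluded
            continue
        for b, members_b in clusters_next.items():
            if len(members_b) < 2:
                continue
            w = len(members_a & members_b)   # cardinality of the intersection
            if w > 0:
                edges[(a, b)] = w
    return edges

edges = transition_edges({"c1": {1, 2, 3}, "c2": {4}},
                         {"c3": {1, 2}, "c4": {3, 5}})
# "c2" is a singleton and gets no edges; "c1" overlaps both "c3" and "c4".
```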
In a similar vein, domain knowledge can help in outlier and anomaly detection. For ex-
ample, a person knowledgeable in mutual funds can inspect the clustering result and find
cases where mutual funds have been clustered together or apart unexpectedly.
For example, let one cluster contain three Swedish interest funds. The user observes
that a fourth interest fund has been placed in a different cluster, or even been classified
as an outlier. This would possibly warrant a more thorough investigation into the fourth
interest fund, since it has not been clustered with the other funds with similar strategies.
Since there is usually no ground truth available when performing clustering on financial
time series, domain knowledge could possibly be invaluable when determining the use-
fulness and quality of the clustering results. One of the main problems regarding this
type of cluster evaluation is the fact that it requires manual inspection. For this reason,
the method is less feasible when performing model selection and parameter tuning since
the number of clustering results to inspect becomes very large. Ideally, this type of eval-
uation would instead be used once quantitative scores have singled out a few clustering
method candidates.
Silhouette score
One evaluation metric that is high when the clusters in the result are clearly separated
and dense is the Silhouette score. The score is based on the Silhouette coefficient, which
is calculated using the cluster labels of each data point, as well as the distances between
the objects in the clusters. Here, a distance matrix created from some measure of time
series similarity is used.
Let a(i) be the average distance from an object i in a cluster A to all other objects in the
same cluster. Let d(i, C) be the average distance between an object i in cluster A and
all objects in a cluster C [20]. Finally, let:

b(i) = \min_{C \neq A} d(i, C).  (18)
Let the cluster where the minimum described in Equation 18 is found be B. This is the
cluster closest to the object i, apart from the cluster A where the object is currently
located. B is in this way determined to be the neighboring cluster of i. The Silhouette
coefficient for each sample in the data set can then be calculated as follows [20]:
s(i) = \frac{b(i) - a(i)}{\max\{a(i), b(i)\}}.  (19)
The Silhouette score can then be acquired by calculating the mean of all Silhouette
coefficients in the data set [21]:
S = \frac{1}{n} \sum_{i=1}^{n} s(i).  (20)
Calinski-Harabasz score
The Calinski-Harabasz score is also known as the Variance Ratio Criterion (VRC). Let
n be the total number of data points in the data set P = {p1 , p2 , ..., pn }. If each data
point (observation) has v features, the data matrix X has dimensions v × n. This ma-
trix can be written as X = (x̄1 , x̄2 , ..., x̄n ), where x̄i is the feature vector of data point
p_i. Let k be the total number of clusters C = {C_1, C_2, ..., C_k}. The score is defined as [22]:
VRC = \frac{BGSS}{WGSS} \cdot \frac{n - k}{k - 1},  (21)
where W GSS is the within-group sum of squares and BGSS is the between-group sum
of squares. These are calculated using the trace of two different matrices:
WGSS = tr(W_k),  (22)

W_k = \sum_{q=1}^{k} \sum_{\bar{x} \in C_q} (\bar{x} - c_q)(\bar{x} - c_q)^T,  (23)

BGSS = tr(B_k),  (24)

B_k = \sum_{q=1}^{k} n_q (c_q - c_P)(c_q - c_P)^T,  (25)

where C_q is the q:th cluster, c_q is the centroid of cluster q, n_q is the number of data
points in the cluster, and c_P is the center of the data set P.
Davies-Bouldin score
The Davies-Bouldin score is a similarity measure that defines a measure of cluster sep-
aration R. R will then be used to determine the average similarity between each cluster
and the cluster in the set that is the most similar to it [23]. Let a cluster C in a cluster
set contain the data points {X1 , X2 , ..., Xm } ∈ Ep . Here, Ep is a Euclidean space of p
dimensions. Let S(X_1, X_2, ..., X_m) be a measure of dispersion such that:

S_i = \left( \frac{1}{m} \sum_{j=1}^{m} \lVert X_j - A_i \rVert^q \right)^{1/q},  (26)

where A_i is the centroid of cluster i. Let M_{ij} be the distance between the centroids of
clusters i and j:

M_{ij} = \lVert A_i - A_j \rVert.  (27)

The measure of cluster separation is then defined as:

R_{ij} = \frac{S_i + S_j}{M_{ij}}.  (28)
The Davies-Bouldin score can then be defined as the average of the similarity measure
R of each cluster and the cluster closest to it:
\bar{R} = \frac{1}{N} \sum_{i=1}^{N} \max_{j \neq i} R_{ij}.  (29)
The optimal choice of cluster algorithm when only considering the Davies-Bouldin score
will thus be the algorithm which results in clusters that minimize Equation 29. A lower
score indicates that the similarity between the different clusters is smaller, which in-
dicates a better cluster separation [23].
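Both scores are available as built-ins in Scikit-learn. Note that these implementations operate on feature vectors and Euclidean centroids, whereas the report later substitutes medoids for non-Euclidean distances; the toy data below is illustrative only.

```python
import numpy as np
from sklearn.metrics import calinski_harabasz_score, davies_bouldin_score

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0, 0.1, (20, 2)),    # tight cluster near (0, 0)
               rng.normal(5, 0.1, (20, 2))])   # tight cluster near (5, 5)
labels = [0] * 20 + [1] * 20

ch = calinski_harabasz_score(X, labels)   # higher is better
db = davies_bouldin_score(X, labels)      # lower is better
```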
The centroid of a cluster or data set is defined as the arithmetic mean of all points
in the cluster or data set [24]. This works well when the dissimilarity between the
points in the data set is described by a Euclidean distance. In the case of clustering
time series however, this distance metric may not be the best choice. The Euclidean
distance between two time series T and S is the distances between each corresponding
observation:
d(T, S) = \sqrt{\sum_{i=1}^{n} (T_i - S_i)^2}.  (30)
This limits the comparison to time series of the exact same length. This limitation is not
unique to the Euclidean distance. The metric is also sensitive to noise and outliers
in the data, as well as to signal transformations [25]. In the interest of clustering financial
time series in order to potentially diversify a portfolio, measures such as the Hellinger
distance and Kendall's tau are likely more appropriate. Constructing a portfolio of funds
whose returns have a low or negative historic correlation can increase the diversification,
since the funds have historically responded differently to market conditions. This
can reduce the impact of individual fund return fluctuations. According to the styl-
ized facts of financial time series, there exists an asymmetry when it comes to gains
and losses. Large decreases in price are more common in empirical financial data than
large increases, and diversification by correlation is a method for mitigating the large
decreases. This is the main motivation why Kendall’s tau is appropriate for measuring
dissimilarity between financial time series. The Hellinger distance, on the other hand,
is a measure of similarity between the distributional properties of the time
series, which allows it to capture both shape and spread of the distributions. Since the
Hellinger distance is a measure between distributions, it is also not sensitive to outliers
in the data.
An issue that arises with these distances is that the concept of a centroid is not defined
in the context of non-Euclidean distances. In order to utilize the quantitative metrics
that depend on a center of a cluster or data set, the metrics need to be modified. Instead
of calculating centroids when computing the different metrics, the medoid will be
computed. The medoid of a set is the object for which the sum of its dissimilarities
to the other objects in the set is the smallest [26]. By using this definition of a cluster
center instead of the centroid, metrics that utilize centroids can also be extended to non-
Euclidean distances between objects. More formally, the medoid of cluster C containing
data points {p_1, p_2, ..., p_n} is calculated as follows for any distance metric d(p, q) [26]:
p_{medoid} = \arg\min_{q \in C} \sum_{i=1}^{n} d(q, p_i).  (31)
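Equation (31) translates directly to a few lines of NumPy given a precomputed distance matrix; `medoid_index` is an illustrative helper, not code from the report.

```python
import numpy as np

def medoid_index(D, members):
    """Return the member whose summed distance to the other members is smallest."""
    sub = D[np.ix_(members, members)]            # distances within the cluster
    return members[int(sub.sum(axis=1).argmin())]

D = np.array([[0.0, 1.0, 2.0],
              [1.0, 0.0, 1.5],
              [2.0, 1.5, 0.0]])
m = medoid_index(D, [0, 1, 2])   # point 1 has the smallest distance sum (2.5)
```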
one measure of robustness that will be evaluated. Do note that this evaluation measure
is an internal validation measure as well, but since the focus of the method lies on
the temporal stability of the results, it will be classified as a separate type of
metric compared to the scores in the previous section.
CLOSE score
When performing clustering for multiple successive time periods it is desirable that the
clustering method produces similar results for each time period, while also producing
reasonable clusters given the current data. Encapsulating both of these criteria into an
evaluation method will for this reason allow the user to find clustering methods that are
both robust over time and produce clusters of good quality. One such evaluation method is
presented by Klassen et al. [27]. The algorithm is called CLOSE, which is an acronym
for Cluster Over-Time Stability Evaluation. The method is designed to evaluate multi-
variate time series clustering. The CLOSE algorithm can be used to evaluate clustering
results of methods that produce crisp clusters, meaning that each data point in the
set is assigned to one cluster only. Additionally, it was created specifically to evaluate
evolutionary clustering results. In evolutionary clustering the time series that will be
clustered are split into subsequences, effectively creating k data sets where k is the num-
ber of subsequences the time series data has been split into. The clustering algorithm
to be evaluated is then applied to each subsequence data set, and the clustering result
is stored in the order of the subsequences. Klassen et al. designed the algorithm to
compare the cluster result similarity between all subsequences, rather than two succes-
sive sequences. This strategy enables the capturing of large changes in cluster contents
caused by small changes over longer periods of time. This way of quantifying change
in clusters is called over-time stability [27]. The CLOSE method is parameter free, and
makes no assumptions regarding the nature of the clustering algorithm that has pro-
duced the clusters. This facilitates the comparison of a wide range of different clustering
algorithms, granted that the clusters have been generated based on the same distance
metric.
The following notation is used by Klassen et al. [27] and will be used here as well
in order to describe the algorithm and its implementation. Let one time series be
T = {o_{t_1}, o_{t_2}, ..., o_{t_n}}, where o_{t_i} is one observation at time point i and n is the total num-
ber of observations. The set O = {o_{t_1,1}, ..., o_{t_n,m}} includes the vectors of all time series in
the data set, and O_{t_i} is used as shorthand for all data points at time i. Let a subsequence
of one time series be defined as T_{t_i,t_j,l} = {o_{t_i,l}, ..., o_{t_j,l}} where j > i. In other words, it
is a section of a time series T_l beginning at time t_i and ending at time t_j. The CLOSE
algorithm makes a distinction between data points o_{t_i,l} that are cluster members, and
points that are noise (also known as outliers). One cluster containing time series data
points based on subsequence i of each time series is denoted as C_{t_i,j} ⊆ O_{t_i}. Any data
point in O_{t_i} that has been partitioned into cluster C_{t_i,j} is a member of this cluster. A
data point in the same set that is not assigned to any cluster during time period t_i is
defined as noise. The second index j indicates which unique identifier the
cluster has in the entire evolutionary clustering result. Here, j ∈ {1, ..., N_C} where N_C
is the total number of clusters across all subsequence clusterings [27].
The CLOSE score of an evolutionary clustering result depends on the over-time stability
of each cluster in the clustering result. The over-time stability of each cluster is in turn
dependent on the subsequence scores of all data points that have been assigned to that
cluster. The subsequence score of data point o_{t_k,l} is defined as
subseq\_score(o_{t_k,l}) = \frac{1}{k_a} \sum_{i=1}^{k-1} p(cid(o_{t_i,l}), cid(o_{t_k,l})),  (32)
where k_a is the number of previous timestamps where the data point is assigned to a
cluster, and cid(o_{t_i,j}) is the cluster identity function that returns the cluster assignment
of data point o_j at time t_i. The function p describes the proportion of data points that
have remained together from one cluster in the previous timestamp to another cluster
in the current timestamp:
p(C_{t_i,a}, C_{t_j,b}) = \frac{|C_{t_i,a} \cap_t C_{t_j,b}|}{|C_{t_i,a}|}, \quad t_i < t_j.  (33)
Klassen et al. [27] describe \cap_t as the temporal cluster intersection, which is defined as

C_{t_i,a} \cap_t C_{t_j,b} = \{ o_{t_j,l} \mid o_{t_i,l} \in C_{t_i,a} \wedge o_{t_j,l} \in C_{t_j,b} \}.  (34)

In words, the temporal cluster intersection is the set of data points that have been
partitioned into the same clusters on different time intervals. An additional factor that
influences the over-time stability of a cluster is how many clusters from previous time
intervals that have merged when forming the cluster. More formally:
m(C_{t_k,i}) = |\{ C_{t_l,j} \mid t_l < t_k \wedge \exists a : o_{t_l,a} \in C_{t_l,j} \wedge o_{t_k,a} \in C_{t_k,i} \}|.  (35)
The over-time stability of a cluster is then defined as:

ot\_stability(C_{t_k,i}) = \frac{\frac{1}{|C_{t_k,i}|} \sum_{o_{t_k,l} \in C_{t_k,i}} subseq\_score(o_{t_k,l})}{m(C_{t_k,i})^{\frac{1}{k-1}}}.  (36)
Finally, the CLOSE score for an evolutionary clustering result ξ can be defined:
CLOSE(\xi) = \frac{1}{N_C} \left( 1 - \left( \frac{n}{N_C} \right)^2 \right) \sum_{C \in \xi} ot\_stability(C)\,(1 - quality(C)).  (37)
The function quality(C) is a measure of cluster quality. Klassen et al. suggest the usage
of the mean squared error [27], and this measure is therefore chosen for this degree
project. The mean squared error of each cluster is defined by the inner sum of Equation
23, where c_q is the medoid of cluster q.
The formula for the CLOSE score described in Equation 37 does not punish outliers
directly. Instead, one can choose to include an exploitation term in the equation, which
serves the purpose of punishing outliers more harshly. This exploitation term is defined
by Klassen et al. [27] as the number of data points that are assigned to clusters Nco
divided by all the data points in the entire data set, No :
exp\_term = \frac{N_{co}}{N_o},  (38)
CLOSE(\xi) = \frac{1}{N_C} \left( 1 - \left( \frac{n}{N_C} \right)^2 \right) \sum_{C \in \xi} ot\_stability(C)\,(1 - quality(C)) \cdot \frac{N_{co}}{N_o}.  (39)
It is worth noting that clustering methods that result in outliers are not inherently
undesirable, since it can be of great interest why a certain fund was not clustered with other funds
that have been labeled similarly by the fund managers. However, a clustering result with
a very large number of outliers may become less useful, as the purpose of performing
the clustering in the first place may be lost. For this reason, the exploitation term will
be used when performing calculations of CLOSE in this thesis.
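The proportion p of Equation (33) is simple to compute once cluster memberships are known; the sets of time-series ids below are hypothetical.

```python
def proportion(cluster_early, cluster_late):
    """Share of the earlier cluster's members that reappear in the later cluster."""
    stayed = cluster_early & cluster_late   # temporal cluster intersection
    return len(stayed) / len(cluster_early)

p = proportion({1, 2, 3, 4}, {1, 2, 3, 7})   # three of four series stayed together
```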
During evaluation, the financial time series that will be clustered will be split into sub-
sequences of equal size. Each subsequence will contain weekly data, and have a length
of 200 weeks. A better choice would possibly have been to let each subsequence have a
length of 260 weeks (5 years), which corresponds to the length of the time series in the
clustering results that are evaluated using the other evaluation methods in this report.
The main problem with this is that this approach would require that the entire time se-
ries would cover 20 years. The historical data available for the stocks in the benchmark
stock data set does not extend far enough back in time for many stocks. This would in
turn result in the stock data set shrinking drastically.
Since the CLOSE algorithm takes both temporal robustness as well as cluster quality into
account, this measure is deemed a promising candidate for selecting hyperparameters of
the clustering methods that will be evaluated using the validation methods described in
this report.
A classifier ϕ is then trained on data set X and the clustering result Y to perform
cluster label predictions. The trained classifier is then used to predict the cluster labels
of the other data set, X ′ . The predicted labeling ϕ(X ′ ) is used as a method to extend
the clustering solution Y of the training data set X to X ′ . The labels predicted by
the classifier can then be compared to the actual cluster assignments ξ(X ′ ). By quan-
tifying the dissimilarity between the predicted labels and the labels retrieved from the
clustering solution, a stability measurement for the clustering method can be calculated.
The choice of classifier will have a significant impact on the calculated stability of the
method, and it is therefore important that the classifier is chosen such that the misclas-
sification rate is as low as possible.
In order to compare the predicted clustering solution ϕ(X ′ ) and the actual clustering
labels Y ′ , Roth et al. proposes the normalized Hamming distance [11]:
d(\phi(X'), Y') := \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\{\phi(X'_i) \neq Y'_i\}.  (40)
The stability index is thus a measure of the average dissimilarity between the predicted
labels given the data set X' and the actual clustering result Y':

S(\xi) = E\left[ d(\phi(X'), Y') \right].  (41)

In order to compute this empirical expectation value, the entire data set is split in half
randomly r times. For
each random split of the data, both splits are clustered using the clustering algorithm
ξ, and the classifier is trained on one half of the data X and the clustering labels of
the training data Y . The predictions of the cluster labels of the other half of the data
set, ϕ(X ′ ), is then computed and the distance between ϕ(X ′ ) and Y ′ is calculated and
stored.
Once r splits resulting in r dissimilarity calculations have been performed, the empirical
average stability index of the cluster method can be computed as
\hat{S}(\xi) = \frac{1}{r} \sum_{i=1}^{r} d(\phi(X')_i, Y'_i).  (42)
In order to use the stability index to evaluate clustering methods equally, the stability
index needs to be normalized. A classification accuracy of 0.5 when the total number of
clusters is 2 indicates that the classifier does no better than simply guessing the
cluster label of data points. However, if the number of clusters is 50, an accuracy of 0.5
is significantly better. In order to facilitate a comparison between the stability indices
for clustering solutions that result in different numbers of clusters, the stability index
is normalized using the asymptotic random misclassification rate. This normalizing
factor can be estimated by producing two new clustering results by clustering n points
randomly twice, where the number of clusters k is equal to the number of clusters
received when applying the clustering algorithm to X and X ′ . Here, n is the number
of data points in X and X ′ . By calculating the Hamming distance between the two
random clustering solutions, the normalizing factor Ŝ(R_k) is acquired.
Using this, the normalized stability index can be calculated for clustering method ξ:
\bar{S}(\xi) := \frac{\hat{S}(\xi)}{\hat{S}(R_k)}.  (43)
When comparing multiple clustering methods during model selection, the choice of pa-
rameters that results in the smallest value of S̄(ξ) is the most stable of the clustering
methods being compared [11]. This is the best choice of parameters according to the
stability index.
The classifier chosen in this project is a K nearest neighbor classifier, or KNN. KNN is
a non parametric classifier that classifies data points according to the majority vote of
the K closest data points in the training data set. Given a data point and a training
the data set, the distances between the data point and all data points in the entire data
set is computed. The class assignments of the majority of the K closest data points in
the training set will determine the classification of the unknown data point [28]. The
distance metric that is chosen in this application is the same distance metric used to
perform the clustering.
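One round of the stability-index procedure might look like the following sketch. The two-blob data, the use of KMeans as the clustering method ξ, and a 1-NN classifier are illustrative assumptions; the report clusters distance matrices instead, and the minimum over label matchings below handles label permutation only in this toy two-cluster case.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(4)
data = np.vstack([rng.normal(0, 0.2, (40, 5)),
                  rng.normal(3, 0.2, (40, 5))])
idx = rng.permutation(len(data))
X, X_prime = data[idx[:40]], data[idx[40:]]      # random split in half

# Cluster both halves with the same clustering method.
Y = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
Y_prime = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_prime)

# Train the classifier on one half and transfer its labels to the other.
pred = KNeighborsClassifier(n_neighbors=1).fit(X, Y).predict(X_prime)

# Normalized Hamming distance, Equation (40); cluster ids are arbitrary,
# so for k = 2 we take the better of the two possible label matchings.
d = np.mean(pred != Y_prime)
d = min(d, 1 - d)
```

Repeating this over r random splits and dividing by the random misclassification rate yields the normalized index of Equation (43).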
5.3 Analysis
This section describes in more detail how the different experiments and evaluations were carried
out. It also describes the reference methods used for comparison.
5.3.1 Benchmarking
In order to evaluate the different clustering algorithms, a benchmark data set is needed.
Using a benchmark data set ensures that clustering algorithms are evaluated using the
same underlying data, and will therefore provide a fair comparison. Establishing a
benchmark data set containing different kinds of financial time series such as mutual
funds and stocks can also showcase how well the different clustering algorithms perform
on different kinds of data. This is due to the fact that a financial time series describing
a mutual fund will be different to a time series describing the price of a stock. For
example, prices of stocks tend to display more volatility than time series of mutual fund
prices due to the diversification of mutual funds. The meta data available will also differ
between stocks and mutual funds.
Performing clustering on mutual funds and stocks separately will showcase how well the
methods perform on different kinds of data. Some methods may perform better when
clustering mutual funds rather than stocks, or vice versa. The benchmark data sets will
therefore be one mutual fund data set, and one stock data set. The mutual fund data
set consists of 430 funds from the Swedish fund market. The stock data set is made up
of 116 stocks that are listed on Nasdaq Stockholm. The time series in both data sets
consist of 273 observations.
5.3.2 Bootstrapping
In order to construct confidence regions for scores received from quantitative evaluation
methods, a technique known as bootstrapping will be utilized. The technique is used
to create a number of bootstrap samples that are drawn from the same empirical dis-
tribution as the data points in the original data sets. Multiple bootstrap samples can
be created using the underlying data sets, and then clustered and evaluated in order to
estimate the median and variance of a given evaluation score [29]. By doing this, it is
possible to get an idea of how a clustering method may perform on new data that is similar
to the data in the bootstrapped data set.
It is important to ensure that the bootstrapped samples are able to successfully emulate
the empirical distribution of the original data. In the case of multivariate time series
such as the returns of mutual funds and stocks that are clustered in this project, a
bootstrapping method called the moving block bootstrap is used [30]. The method is a resampling
scheme applicable to dependent time series data. Instead of simply sampling individual
observations at a time to form a new bootstrapped sample, the moving block bootstrap
method samples blocks of consecutive observations from the time series. The purpose of
this is that the structural dependence within each sampled block is maintained. Once
the total length of all sampled blocks is the same as the length of the original time series,
the blocks are stitched together to form the bootstrapped sample.
A number of blocks from {B_1, ..., B_N} are selected randomly. This is accomplished by randomly
selecting the start indices of the blocks, and extracting l observations forward in time.
By concatenating the blocks of consecutive observations into a new sequence, the boot-
strapped sample Xn∗ can be acquired [30].
Since the time series in the data sets that are clustered in this thesis have a dependence
on one another, it is important that the blocks are extracted from the time series in
such a way that the temporal relationships between the multivariate time series are
maintained. For this reason, the starting indices and length of all blocks are determined
before the bootstrapping process is started. The same indices and block lengths are then
used to create bootstrap samples from all time series in the data sets. By performing the
moving block bootstrap in this manner, the temporal dependence between the financial
time series is transferred to their bootstrapped counterparts.
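A minimal sketch of this procedure, assuming the time series are stored as the rows of a matrix; the function name and block length are illustrative:

```python
import numpy as np

def moving_block_bootstrap(X, block_len, rng=None):
    """Moving block bootstrap for multivariate time series.

    X has shape (n_series, n_obs). The block start indices are drawn
    once and reused for every series, so the temporal dependence within
    blocks and the cross-sectional dependence between series are kept.
    """
    rng = np.random.default_rng(rng)
    n_series, n_obs = X.shape
    n_blocks = int(np.ceil(n_obs / block_len))
    # Randomly select start indices; each block covers block_len
    # consecutive observations forward in time.
    starts = rng.integers(0, n_obs - block_len + 1, size=n_blocks)
    cols = np.concatenate([np.arange(s, s + block_len) for s in starts])
    # Stitch the blocks together and trim to the original length.
    return X[:, cols[:n_obs]]

returns = np.random.default_rng(1).normal(size=(5, 273))  # 5 series, 273 obs
sample = moving_block_bootstrap(returns, block_len=13, rng=2)
```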
When calculating the internal validation measures as well as the CLOSE score during
model selection, both bootstrapped cluster results as well as the cluster results of the
actual data will be evaluated. In order to determine medians and confidence intervals
of the evaluation metrics, the bootstrapped evaluation will be performed a number of
times, where each iteration evaluates results of new bootstrap samples. The stability
index will on the other hand only be calculated for clusterings of the actual data sets.
The reason for this is the fact that the method relies on calculating the metric for
an ideally large number of random splits of the data. Due to limitations in computer
hardware, bootstrapping will not be used for this metric.
Internal validation measures can behave poorly in the presence of singleton clusters.
For this reason, all singleton clusters will be removed from the clustering result before
performing evaluation using internal validation measures. When performing evaluation
using the other methods presented in this report, the singleton clusters will still be
considered. The CLOSE score handles outliers using the exploitation term in Equation
39, and the stability index is normalized with respect to the total number of clusters in
Equation 43.
5.3.4 Reference methods based on a priori knowledge
In order to facilitate the evaluation of the clustering methods, a number of different
baseline clustering methods will be evaluated as well. The evaluation methods will then
be applied to the baseline results, and compared to the results of the clustering methods
that have achieved the best CLOSE score. The reference methods will be evaluated
using the internal validation measures, as well as the stability index.
Filter clustering
By utilizing the meta data of the financial time series, it is possible to perform filter
clustering. These clustering methods are executed by partitioning the financial time
series according to some chosen meta data. For the data set containing mutual funds,
the attributes that will be used to partition the data are the asset type, region, and
currency of the mutual funds. For the data set containing stocks, the data will instead
be partitioned according to GICS sector name [31]. GICS is an acronym for Global
Industry Classification Standard, and is an analysis framework with the purpose of clas-
sifying companies according to their sector, industry group, industry, and sub-industry.
The sector classification is the coarsest classification in the GICS framework, which will
theoretically result in a clustering result with fewer, larger clusters.
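Assuming the meta data is available in tabular form, filter clustering reduces to grouping on the chosen attributes. The table below is a toy example; the column names are hypothetical and do not come from the actual data sets:

```python
import pandas as pd

# Illustrative metadata table; the column names are hypothetical.
meta = pd.DataFrame({
    "fund":       ["A", "B", "C", "D", "E"],
    "asset_type": ["equity", "equity", "bond", "bond", "equity"],
    "region":     ["Sweden", "Sweden", "Global", "Global", "US"],
    "currency":   ["SEK", "SEK", "USD", "USD", "USD"],
})

# Filter clustering: every unique combination of the chosen attributes
# becomes its own cluster.
meta["cluster"] = meta.groupby(["asset_type", "region", "currency"]).ngroup()
```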
The filter clustering methods will not be evaluated using the CLOSE score. This is
due to the fact that the filter clustering will result in the exact same result regardless of
which time period it is being applied to, and will for this reason achieve an unreasonably
high CLOSE score.
The different sub classes of evaluation methods will be evaluated over different desirable
properties. These properties will differ between internal validation measures, qualitative
evaluation methods, and stability evaluation methods. It is not trivial to determine these
properties, since what is classified as an appropriate measure of similarity or a good clus-
tering result differs between applications and data sets. The purpose of performing the
clustering is to detect similar financial time series and partition them together, as well
as detecting outliers. One example of an outlier in this case could be if a Swedish equity
fund is partitioned into the same cluster as an American bond fund. For this reason,
it is not beneficial to use partitions according to meta data from the fund managers
such as the region or asset type of a mutual fund as a ground truth. Instead, the best
cluster result when comparing two methods is the result where the similarity of data
points within clusters is the highest, and the dissimilarity between data points in differ-
ent clusters is the highest.
The methods for cluster evaluation will be evaluated in terms of usability in analysis.
This way of evaluating a visualization method is admittedly a bit vague. When evalu-
ating the usability of a cluster visualization method, these are the criteria that will be
examined:
• The extent to which a plot enables a user to quickly and efficiently comprehend
the clustering results
• How well the visualization method scales for large data sets and clusters
All quantitative evaluation methods such as the internal validation measures, the CLOSE
score, and the stability index will be calculated for each tuned clustering method and will
be presented in tables. By comparing the results obtained from the variety of clustering
validation methods, it is possible to discuss their usefulness for evaluating clustering
results of financial time series.
Experiments
A number of different clustering algorithms will be applied to the benchmark data in
order for their result to be evaluated. The clustering methods and distance metrics that
will be evaluated are:
Table 3: Specification of the clustering methods and distance metrics that will be tuned
In order to compare the clustering methods in a way that is as fair as possible, the pa-
rameters of each method are fine-tuned with the purpose of maximizing the CLOSE score.
Other measures of stability or internal validation measures can be used as well, one can
for example use the "elbow" of the plot of the within cluster sum of squares in order
to find the optimal parameters for a model. This metric is utilized in the method com-
monly known as the Elbow method [32]. Due to the occasional ambiguity regarding the
exact number of clusters that represent the elbow in the plot, the method of maximizing
the CLOSE score is chosen. Most importantly, this method of cluster validation has
been specifically developed in order to evaluate clusterings of time series. In addition
to evaluating the temporal stability of the clustering results, the CLOSE score takes
the quality of the clusters into account as well. The optimal parameters will have to be
determined for each data set that is being clustered, due to the nature of different kinds
of financial time series.
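Since the CLOSE implementation is not reproduced here, the sketch below uses the silhouette score as a stand-in objective on toy data; the idea of refitting for each candidate parameter and keeping the one that maximizes a validation score is the same:

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
# Toy data with three well-separated groups.
X = np.vstack([rng.normal(c, 0.3, size=(30, 2)) for c in (0.0, 5.0, 10.0)])

# Grid search: refit for each candidate number of clusters and keep the
# parameter with the best score (silhouette stands in for CLOSE here).
best = max(
    range(2, 9),
    key=lambda k: silhouette_score(
        X, AgglomerativeClustering(n_clusters=k).fit_predict(X)
    ),
)
```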
The reason for performing different evaluations for different kinds of financial time series
is that the method that succeeds best at clustering mutual funds may not be as success-
ful when clustering stocks. When performing clustering analysis it is important to have
knowledge about the data that is being clustered, and which methods perform well
on this particular kind of data.
Once the different clustering methods have been fine-tuned to each data set, the cluster-
ing results of each clustering method applied to each data set in use are saved for further
evaluation. It is these clustering results that will be used when evaluating different
methods of cluster result evaluation. The collection of different cluster validation meth-
ods will then be used to draw conclusions regarding the performance of the evaluated
cluster methods, as well as the usefulness of the validation methods themselves for these
particular methods and data sets.
6 Results
6.1 Model selection
In this section, the CLOSE score, stability index and internal validation measures of
each method applied to each data set are presented for a range of parameters. Addi-
tionally, the CLOSE scores of the bootstrapped clustering results are included as well.
The CLOSE score of the bootstrapped results are displayed using a box plot. The box
plotted for each parameter choice ranges from the first quartile to the third quartile of
the data, and the median is shown as a green line inside the box. The whiskers of the
plot show the range of the data, extending no longer than 1.5(Q3 − Q1) from the edges
of the box. Data points that fall outside of this interval are plotted separately as outliers
[33].
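The whisker rule described above can be computed directly from the quartiles; the scores below are made up for illustration, and the convention mirrors the default box plot behavior in common plotting libraries [33]:

```python
import numpy as np

scores = np.array([0.31, 0.35, 0.36, 0.38, 0.40, 0.41, 0.44, 0.70])
q1, med, q3 = np.percentile(scores, [25, 50, 75])
iqr = q3 - q1

# Whiskers extend to the most extreme data points that still lie within
# 1.5 * IQR of the box edges; everything beyond is plotted as an outlier.
lo = scores[scores >= q1 - 1.5 * iqr].min()
hi = scores[scores <= q3 + 1.5 * iqr].max()
outliers = scores[(scores < lo) | (scores > hi)]
```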
The internal validation measures are applied to the clusters of bootstrapped samples,
and these can be viewed in the appendix. In the following results, each parameter has
been evaluated 15 times, using new bootstrapped samples for each evaluation. Ideally,
this number of evaluations is as large as possible. Due to limitations in computer hard-
ware, 15 evaluations is deemed sufficient for the purpose. The number of random splits
of the data when calculating the stability index is also 15 in the evaluation experiments
shown below.
The parameters that are selected for each clustering method for further evaluation are
the parameters that result in the highest CLOSE score. The internal validation measures
have all been normalized to be in the range 0 to 1 for each clustering method. The
Davies-Bouldin score has also been inverted, so that it is easily compared to the other
validation measures where a higher score is better. It is important to bear in mind
that this means that the internal validation measure plots cannot be compared to other
methods or distance metrics, and that the main point of including this plot is to showcase
which parameter yields the optimal clustering result according to these measures. The
actual values of the internal validation measures will be presented in the results of the
tuned methods.
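A sketch of this normalization, with made-up score curves; the Davies-Bouldin curve is inverted after normalization so that higher is better for all measures:

```python
import numpy as np

def normalize(scores):
    """Min-max normalize a score curve to the range [0, 1]."""
    scores = np.asarray(scores, dtype=float)
    return (scores - scores.min()) / (scores.max() - scores.min())

# Hypothetical score curves over a parameter grid.
silhouette = [0.10, 0.30, 0.25, 0.50]
davies_bouldin = [2.0, 1.2, 1.5, 0.8]   # lower is better

sil_n = normalize(silhouette)
# Invert Davies-Bouldin so that a higher value is better, like the
# other validation measures.
db_n = 1.0 - normalize(davies_bouldin)
```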
Figure 3: Stability index of clustering results of mutual funds
In Figure 1, it can be observed that the CLOSE score achieves its maximum value when
the number of clusters is approximately 50. This choice of cluster method parameter is
strengthened further when observing the bootstrapped result in Figure 2. According to
the stability index in Figure 3 the optimal number of clusters is the smallest number
of clusters tested, which in this case is four clusters. The internal validation scores in
Figure 4 seem to instead favor the largest amount of clusters tested, 150 clusters. It is
worth noting however, that the slope of all three scores increases drastically when the
number of clusters is approximately 50.
Stock data set
Figure 8: Internal validation scores of clustering results of stocks
For the stock data set, the highest CLOSE score is achieved when the number of clusters
is 16 clusters. Please also note that the difference in CLOSE score between the different
number of clusters is significantly smaller in Figure 5 than in Figure 1. The stability
index seems to increase linearly as the number of clusters increases in Figure 7, suggesting
that the optimal number of clusters is instead 6. The silhouette score and the Davies-
Bouldin index in Figure 8 seem to once again suggest that the largest number of clusters
is optimal. The Calinski-Harabasz index has a local maximum when the number of
clusters is 16, but the index suggests that the smallest number of clusters tested results
in the best results. Since the scores are normalized for the purpose of being comparable
in one plot, the actual values of the scores are not shown. The reader is referred to the
figures in Section 10.1.1 in the appendix, where the bootstrapped results are presented
in box plots.
Figure 10: CLOSE score of clustering results of bootstrapped mutual funds
The CLOSE score in Figure 9 indicates that the optimal number of clusters is 24, while
the bootstrapped results in Figure 10 instead seem to indicate that the optimal number
of clusters is 4. Other than this discrepancy, the two graphs are quite similar. Since
the median of the bootstrapped CLOSE score for 4 clusters is significantly higher than
the CLOSE score for 24 clusters, 4 clusters are chosen as the optimal parameter. This
choice is particularly motivated by the fact that the difference in CLOSE score between
4 and 24 clusters in Figure 9 is fairly small.
Figure 16: Internal validation scores of clustering results of stocks
As can be observed in Figure 13, the CLOSE score has two maxima when the number of clusters is
12 and 18. Due to the fact that the stability index is increasing steadily as the number
of clusters increases and the internal validation measures worsen, the smaller number of
clusters is picked as the optimal parameter.
Figure 18: CLOSE score of clustering results of bootstrapped mutual funds
This clustering method achieves its highest CLOSE score when the maximum distance
between the points in the clusters is 0.24. The stability index seen in Figure 19 instead
seems to suggest that the optimal maximum distance is significantly larger, resulting
in clusterings containing fewer and larger clusters. On the contrary, the internal val-
idation measures in Figure 20 suggest that the maximum distance is kept at a minimum.
Figure 24: Internal validation scores of clustering results of stocks
As can be observed in Figure 21, the maximum CLOSE score is achieved when the
maximum intra cluster distance is 0.4. Again, the stability index in Figure 23 indicates
essentially that the greater the allowed distance between the points in the clusters is,
the better. This is equivalent to selecting a clustering result with very few and large
clusters. When observing the internal validation scores in Figure 24, it appears that
the silhouette score and the Davies-Bouldin index favor a smaller distance between the
points in the clusters, while the Calinski-Harabasz index favors larger and fewer clusters.
Figure 26: CLOSE score of clustering results of bootstrapped mutual funds
As previously seen in Figure 9, the CLOSE score is the largest for the parameters that
result in fewer clusters when the clustering is performed using the Hellinger distance. In
this case, the CLOSE score has a maximum when the maximum intra cluster distance
is 0.75, as can be seen in Figure 25. The stability index in Figure 27 seems to support
this choice of parameter as well, since one of the local minima of the index is achieved
when the maximum distance is 0.75. The other minimum of the stability index occurs
when the maximum distance is 0.6, but this is approximately where the CLOSE score
has its global minimum. The silhouette score and the Davies-Bouldin index in Figure
28 also indicate that a larger maximum distance in the range [0.7, 0.9] achieves the best
score. The Calinski-Harabasz index suggests a smaller value of 0.4 instead. The volatile
movements of the normalized internal validation scores in Figure 28 are a result of the
fact that the values of the internal validation scores do not change much when changing
the parameters, which can be seen in Section 10.1.1 in the appendix.
Figure 31: Stability index of clustering results of stocks
The CLOSE score in Figure 29 does not change much once it has reached a value of
approximately 0.35. The maximum CLOSE score is approximately the same for the
parameter values 0.2 and 0.3. Since the stability index in Figure 31 indicates better
stability for 0.3, this value is chosen.
CLOSE score. Often, the internal validation measures contradict one another, and tend
to favor clustering results with a very large number of small clusters and many data
points not assigned to clusters at all.
Note that the filter clustering does not use a distance matrix to perform the clustering
task, and the results of the filter clustering have thus been calculated for both distance
metrics to allow comparison to the tuned clustering methods. The reference clustering
method primarily uses Kendall’s tau to perform the clustering, and will therefore be
evaluated using the Kendall’s tau distance between the time series. Please bear in mind
that the Davies-Bouldin score was previously inverted in the plots of internal validation
scores to facilitate comparison to the other scores. In the tables below, a smaller value
of the Davies-Bouldin index is a better score.
Table 4: Quantitative evaluation metrics of reference methods for fund data set

                         Sequential reference  Filter clustering  Filter clustering
                         clustering method     (Kendall’s tau)    (Hellinger distance)
CLOSE score              0.45                  N/A                N/A
Stability index          0.89                  0.72               0.73
Silhouette score         0.22                  -0.06              -0.25
Calinski-Harabasz index  2695                  699                171
Davies-Bouldin index     1.55                  3.39               4.21
6.2.2 Stock data set
Table 5: Quantitative evaluation metrics of reference methods for stock data set
Table 6: Quantitative evaluation metrics of tuned clustering method for both data sets
6.3.2 Agglomerative clustering using the Hellinger distance
Table 7: Quantitative evaluation metrics of tuned clustering method for both data sets
Table 8: Quantitative evaluation metrics of tuned clustering method for both data sets
Table 9: Quantitative evaluation metrics of tuned clustering method for both data sets
6.3.5 Summary and comparison to reference methods
When comparing the quantitative scores of the reference methods with the results of the
tuned clustering methods, it is clear that all tuned clustering methods outperform the filter
clustering methods. These results are expected, but it is noted how poorly the filter
clustering performs. This further motivates the need to perform clustering of financial
time series, rather than only relying on labels assigned to the instruments. Additionally,
it is noted that both the hybrid hierarchical clustering method and the agglomerative
clustering method outperform the reference clustering algorithm. During the qualitative
analysis of the clustering results, the clusterings produced by the tuned methods will be
briefly compared to the clusterings produced by the reference method.
The clustering method using the Hellinger distance with the highest performance on the mutual
fund data is determined to be the hybrid hierarchical clustering method in Table 9. This
clustering method results in a slightly higher CLOSE score compared to the agglomera-
tive method in Table 7. Other than this, the performance of the methods according to the
quantitative evaluation measures is very similar.
Finally, the evaluation metrics indicate that the agglomerative clustering method in
Table 7 is the best performing method when using the Hellinger distance to cluster the
stock data. This method achieves a higher CLOSE score than the hybrid hierarchical
method in Table 9, and a slightly worse stability index. However, the choice is motivated
further by the fact that the internal validation measures all indicate a better cluster
result for the agglomerative method.
6.4 Visualization of clustering results
In this section, visualizations of clustering results created by the previously selected
clustering methods will be exhibited.
Figure 33: Scatter plots showing reduced data points and cluster assignments for
mutual fund data. (a) Cluster result of agglomerative clustering using Kendall’s tau.
(b) Cluster result of the hybrid hierarchical method using the Hellinger distance.
Figure 34: Scatter plots showing reduced data points and cluster assignments for stock
data. (a) Cluster result of the hybrid hierarchical method using Kendall’s tau.
(b) Cluster result of agglomerative clustering using the Hellinger distance.
When observing the scatter plots of the fund data, it is difficult to see significant sepa-
ration between the clusters. Additionally, in Figure 33a, the data points that have been
classified as outliers by the clustering algorithm appear to be located very close and
even on top of data points assigned to clusters. Data points in clusters also appear to
be placed quite far apart, and the shapes of the clusters are hard to comprehend. This
task is easier in Figure 33b, since the number of clusters is much smaller. Here,
the clusters are simpler to tell apart, but they still appear oddly shaped.
Figure 34a, which displays the stock clustering result partitioned using Kendall’s tau is
also difficult to interpret for similar reasons as Figure 33a. When observing Figure 34b,
it appears that the cluster assignment of the data points are based on distance from the
center of the cluster of data points shown.
t-SNE
The scatter plots shown in this paragraph were acquired by using the t-SNE method
to perform dimension reduction. In contrast to PCA, t-SNE is used to find lower di-
mensional mappings of the data while maintaining the distances between the data points.
In order to find these mappings, the distance matrix used to perform the clustering is
used as input to the t-SNE algorithm in order to obtain the reduced data.
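In scikit-learn, this corresponds to running t-SNE with a precomputed distance matrix; the correlation-based distance below merely stands in for the Kendall’s tau and Hellinger matrices used in this project, and the data is simulated:

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 273))          # 50 time series, 273 observations

# Any square, non-negative distance matrix works; a correlation distance
# stands in for the matrices actually used in the thesis.
dist = np.clip(1.0 - np.corrcoef(X), 0.0, None)
np.fill_diagonal(dist, 0.0)

# metric="precomputed" makes t-SNE consume the distance matrix directly;
# PCA initialization is unavailable in that mode, so random init is used.
emb = TSNE(n_components=2, metric="precomputed", init="random",
           perplexity=10, random_state=0).fit_transform(dist)
```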
Figure 35: Scatter plots showing reduced data points and cluster assignments for
mutual fund data. (a) Cluster result of agglomerative clustering using Kendall’s tau.
(b) Cluster result of the hybrid hierarchical method using the Hellinger distance.
Figure 36: Scatter plots showing reduced data points and cluster assignments for stock
data. (a) Cluster result of the hybrid hierarchical method using Kendall’s tau.
(b) Cluster result of agglomerative clustering using the Hellinger distance.
When observing the figures showing cluster results of the mutual fund data set, it
is apparent that the scatter plots offer a bit more in terms of interpretability when
the dimension reduction is performed using t-SNE. While the outliers still appear to
be placed in separate clusters in Figure 35a, the clusters in the plot are more easily
discerned and separated. This figure offers the observer an intuition regarding the size
of the clusters, as well as the distance between the clusters themselves. Figure 35b
displays clearly separated, albeit unevenly shaped clusters. Here, an observer can more
easily comprehend the clustering results and see the separation between the clusters.
Each timestamp shown in the cluster evolution plots is based on clustering results of
subsequences of the financial time series, each with a total length of 260 weeks. The
start and end dates of each subsequence are displayed on the x-axis of the figures. The
mutual fund data set used to create the first two cluster evolution plots contains a total
of 275 financial time series. The stock data set is made up of 116 assets.
Figure 37 shows how the largest clusters are fairly equal in size, apart from the second to
last subsequence where the largest cluster becomes significantly larger than the others.
This appears to be due to the fact that two clusters in the previous subsequence merge
to form this cluster, allowing a fourth smaller cluster with 10 time series to be plotted
as well. In the last subsequence the largest cluster with 97 data points is split in two,
forming the two largest clusters in the final subsequence.
Figure 38: Cluster evolution of hybrid hierarchical clustering using the Hellinger
distance
In contrast to the previous figure, Figure 38 shows how the largest cluster is practi-
cally static throughout all subsequences. This cluster is also significantly larger than
all other clusters, containing between 83% and 92% of all data points in the entire data set.
Figure 39: Cluster evolution of hybrid hierarchical clustering using Kendall’s tau
The cluster evolution shown in Figure 39 shows how the cluster results are made up of
one large cluster and many smaller clusters in the first three subsequences. It can also be
observed however, that approximately half of the time series present in the cluster in the
second sequence have been partitioned into other smaller clusters by the last sequence.
Figure 40: Cluster evolution of agglomerative clustering using the Hellinger distance
Again, the cluster methods that use the Hellinger distance to partition the data seem-
ingly cluster the data into one very large cluster and a few smaller clusters. The result
in Figure 40 is, similarly to the cluster result in Figure 38, fairly static, where the majority
of the time series in the largest cluster stays in this cluster throughout all subsequences.
The cluster evolution plots may not be directly comparable to the scatter plots pre-
viously shown, since they display how the size of the clusters changes over time while the
purpose of the scatter plots is to show the cluster results in one time step. Regardless,
the cluster evolution plots seem to scale better to large data sets and clusters. Since
the data points are represented as numbers that indicate the number of time series in a
cluster rather than points in a two dimensional space, the plots become less crowded and
easier to comprehend. The evolution plot lacks details regarding the within cluster dis-
tances, but instead displays the relationships between clusters in different subsequences,
which might be of interest when performing time series clustering.
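Constructing such a plot essentially amounts to reducing each subsequence's label vector to cluster sizes; the sketch below uses hypothetical labels, where -1 marks noise points:

```python
from collections import Counter

# Hypothetical cluster labels per subsequence (window); -1 marks noise.
labels_per_window = [
    [0, 0, 0, 1, 1, 2, -1],   # window 1
    [0, 0, 1, 1, 1, 2, 2],    # window 2
    [0, 0, 0, 0, 1, 2, -1],   # window 3
]

# The evolution plot shows cluster sizes per window rather than plotting
# individual points, which is why it scales to large data sets.
sizes = [
    {c: n for c, n in Counter(w).items() if c != -1}
    for w in labels_per_window
]
```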
data set will be presented here.
The results obtained using the reference sequential clustering algorithm will be compared
to one of the clustering algorithms tested. The chosen method for comparison is the
agglomerative clustering method using Kendall’s tau distance, and the clustered data
set will be the mutual fund set. Only a selection of the clusters created by the different
algorithms is included below in order to maintain brevity.
In the reference clustering result, Swedish equity funds are primarily placed in cluster
1, with a few Nordic funds added as well. This might indicate that the Nordic equity
funds in this cluster are largely made up of Swedish stocks. Cluster 2 is a global and North
American equity fund cluster. This indicates that the global funds in this cluster contain
primarily North American stocks. Cluster 4 is a Swedish small cap cluster with a few
Nordic small cap funds. It is clear that this clustering algorithm is able to partition small
cap funds separately from the other Swedish equity funds. Cluster 13 is a technology
cluster, containing most of the technology funds in the data set, as well as a few growth
funds. This may be due to the fact that the growth funds present in the cluster have a
significant portion of their placements within companies developing technology. This is
the case for the Öhman Global Growth fund [34], for example. Cluster 23 consists of all
inflation linked bond funds. These should all be strongly linked to inflation, so this
clustering is expected. The total number of singleton clusters in the reference clustering
result is 97.
The agglomerative clustering result is reminiscent of the reference clustering result, but
the clusters are larger and some clusters that are present in the reference result seem
to have been merged in this clustering result. This clustering method does not partition
Swedish small cap funds separately, and instead places most Swedish equity funds in
cluster 1. The global equity cluster has also been merged with the technology cluster
that is present in the reference clustering algorithm. Cluster 7 consists of a range of
different interest funds. In the reference clustering result, these are partitioned further
into many different clusters, or classified as noise. The total number of singleton clusters
in this clustering result is 19.
As a comparison between the different distance metrics, the clustering result of mutual
funds using the Hellinger distance is also included below:
Table 12: Hybrid hierarchical method using the Hellinger distance
When it comes to the clustering result produced using the Hellinger distance, the results
are vastly different. Here, the funds have been split into three clusters. The first
cluster contains most equity funds in the entire data set. The second cluster contains
all bond funds, and the third cluster contains most interest funds. This indicates that
these different kinds of funds have distinct return distributions, but that no other significant
patterns could be discerned using the Hellinger distance.
7 Discussion
7.1 The clustering results
By analyzing the results in Section 6.3, it seems that clustering results acquired using
the Kendall’s tau distance between the time series resulted in more and smaller clusters
compared to the results acquired using the Hellinger distance. The Hellinger distance
has seemingly captured the patterns in the data that are specific for the equity funds,
bond funds, and interest funds. Since the Hellinger distance is a measure of similarity
between two distributions, the returns distributions may follow a certain pattern for
each of these fund types. By clustering using the Kendall’s tau distance, more distinct
patterns in the data were captured by the clustering algorithms. For example, the clus-
tering method that generated the result visible in Table 11 placed Swedish and global
equity funds into separate clusters. For this reason, using Kendall’s tau to partition
mutual funds appears to be a more efficient approach when the goal is portfolio opti-
mization and exploring more intricate patterns in the data set. When it comes to the
stock data set, the clustering methods using Kendall’s tau also seem to create a few
larger clusters and many small ones. This is likely due to the fact that the data set only
consists of Swedish stocks that all correlate to some degree.
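For concreteness, common forms of the two distances can be computed as below; the exact definitions used in the thesis may differ in detail, and the return series here are simulated:

```python
import numpy as np
from scipy.stats import kendalltau

rng = np.random.default_rng(0)
r1 = rng.normal(0.0, 0.01, size=273)                    # weekly returns, series 1
r2 = 0.8 * r1 + rng.normal(0.0, 0.005, size=273)        # a correlated series

# Kendall's tau distance: small when the two series move together.
tau, _ = kendalltau(r1, r2)
d_tau = 1.0 - tau

# Hellinger distance between the two return distributions, estimated
# from histograms over a shared set of bins.
bins = np.histogram_bin_edges(np.concatenate([r1, r2]), bins=20)
p, _ = np.histogram(r1, bins=bins)
q, _ = np.histogram(r2, bins=bins)
p = p / p.sum()
q = q / q.sum()
d_hell = np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))
```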
The filter clustering methods appear to display fairly poor performance when evaluated
using the quantitative metrics in Tables 4 and 5. This is another reason why other
types of clustering are needed in the first place to detect correlated mutual funds and
outliers. The filter clustering methods would not detect that the Nordic fund in cluster
1 in Table 10 is more correlated to the Swedish equity funds than the other Nordic funds
present in another cluster. An investor may then falsely believe that buying this Nordic
fund could increase portfolio diversity if the investor already owns Swedish equity funds.
It is important to note however that the filter clustering methods may perform better
if the data had been partitioned using other properties of the time series. For example,
the results may have been better if the mutual fund filter method only partitioned using
asset type and region instead of also grouping by currency. Making beneficial choices
regarding which properties to use when performing filter clustering requires more in-
tricate knowledge of the time series type in the data set. Using a different clustering
method circumvents this choice, and might for this reason be easier to implement.
The internal validation measures used in this project are altered in order to allow the use
of non-Euclidean distances. One reason why these validation metrics do not perform well
during model selection could be that the clustering is performed using distance metrics
that are different from the point-to-point comparison that using the Euclidean distance
would entail. Instead, the Hellinger distance measures the similarity between the return
distributions of two time series in this case, and Kendall’s tau quantifies the correlation
between two time series. It is entirely possible that the internal validation metrics would
have been more useful if the time series were compared using the Euclidean distance
instead. However, since one of the main purposes of performing the clustering in the first place is
portfolio diversification, measures such as the ones used throughout this project might
be more effective.
7.2.2 CLOSE
By observing the results in Section 6.1, it becomes apparent that the CLOSE score is
the only cluster validation method applied to the different clustering results that con-
sistently has a maximum for parameters that do not result in trivial clustering solutions.
Trivial clustering solutions in this case are those where the clustering method produces as few
clusters as possible, or as many clusters as possible. Additionally, it is the only validation
method applied in this project that takes both the stability over time and the quality
of the clusters into account. For example, a clustering result with a large number of small clusters may have high cluster quality if the clusters are dense. However, such a result may be less stable, since a smaller change in the distance between the time series is enough for the cluster assignments to change. Larger clusters, on the other hand, may be very stable as time progresses, but are likely to have lower quality, since the distances between the data points in a large cluster are likely greater than those between data points in smaller clusters.
When performing the qualitative analysis of the reference clustering results and results
acquired using the agglomerative clustering method in Section 6.5, the reference method
seems to produce a finer clustering solution compared to the agglomerative. This finer
partitioning of the financial time series is not necessarily the better solution, but it may
be easier to comprehend at first glance. Since the agglomerative clustering solution received a higher CLOSE score than the reference method, the temporal stability of this solution is likely higher for slightly larger clusters, while the cluster quality has not yet begun to decline significantly. The exploitation term in Equation 39 may also be the reason
why the reference method receives a lower CLOSE score. In the results in Section 6.5,
the number of singleton clusters is significantly higher in the reference result compared
to the result of the agglomerative method. As a result, the score of the reference method
is punished more harshly than the agglomerative method.
Despite the fact that the clusters produced when using the method with the highest
CLOSE score seem not to capture some of the finer patterns in the data sets, it is still the only quantitative validation method tested in this thesis that favors clustering solutions where more general patterns have been captured. For example, the clustering
solution in Table 11 fails to partition Swedish small cap funds separately from the other
equity funds but is still able to cluster most Swedish equity funds in one cluster. The
method has also resulted in a cluster consisting exclusively of interest funds.
When observing some of the plots of the CLOSE score during model selection, it can be
seen that the scores are similar for different parameter values. In Figure 1, the CLOSE
score does not differ much between N = 50 and N = 64. Another example is in Figure
17. Here, there is another local maximum where the maximum distance d = 0.16. The CLOSE score is approximately 0.625 at this maximum, and can be compared to the global maximum value of 0.66. While the score of the clustering result acquired by using a maximum intra-cluster distance of d = 0.16 is worse than the score when d = 0.24,
this would result in a solution where the clusters are smaller and the number of clusters
is larger. This result would possibly capture the finer patterns in the data, and separate
the large clusters, such as the Swedish equity cluster in Table 11 further.
The behaviour of the stability index may be caused by the fact that the classifier used to
predict the clustering labels of half of the data set is significantly more successful when
the number of different clustering labels in the result is small. Intuitively this makes
sense, since there are fewer labels that the classifier is able to pick. This effect should
theoretically be offset by the normalizing factor in Equation 43, but it still appears that
solutions with few, large clusters are favored by the stability index. Due to this, the cluster validation method may not be appropriate when clustering the data sets introduced in this thesis. However, it is not possible at this stage to draw conclusions regarding other data sets of financial time series.
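As a rough illustration of the protocol the stability index is based on, the sketch below follows the split–cluster–classify idea of Lange et al. [11] on generic data. KMeans, the 1-nearest-neighbor classifier, and the simple 1 − 1/k baseline normalization are simplifying assumptions, not the exact formulation used in this thesis.

```python
import numpy as np
from itertools import permutations
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier

def stability_disagreement(X, k, rng):
    """One split of a Lange-style stability protocol (a sketch).

    Cluster both halves independently, transfer the first half's
    solution to the second half with a classifier, and measure the
    label disagreement minimized over label permutations.
    """
    idx = rng.permutation(len(X))
    half = len(X) // 2
    X1, X2 = X[idx[:half]], X[idx[half:2 * half]]

    labels1 = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X1)
    labels2 = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X2)

    # Predict the second half's labels from the first half's solution.
    clf = KNeighborsClassifier(n_neighbors=1).fit(X1, labels1)
    predicted = clf.predict(X2)

    # Cluster labels are arbitrary, so minimize over permutations.
    best = min(
        np.mean(predicted != np.array([perm[l] for l in labels2]))
        for perm in permutations(range(k))
    )
    # Normalizing by a random-baseline disagreement (roughly 1 - 1/k)
    # is what should offset the advantage of having few labels.
    return best / (1.0 - 1.0 / k)

rng = np.random.default_rng(1)
# Two well-separated synthetic blobs: the normalized disagreement
# should be close to zero, i.e. a stable solution.
X = np.vstack([rng.normal(0, 0.3, (40, 2)), rng.normal(5, 0.3, (40, 2))])
score = stability_disagreement(X, k=2, rng=rng)
```

In practice the disagreement is averaged over many random splits, and the classifier choice can matter as much as the clustering algorithm itself.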
Unlike the CLOSE score, the stability index was not developed specifically for clusters
of time series. Time series have a temporal structure, and multivariate time series in
a data set also have a dependence on each other. This added complexity may require
additional considerations when designing a cluster validation method. It is also possible
that the stability index would have worked well in model selection if the data set had consisted of univariate time series, with no dependence on one another at all. This
theory is based on the fact that Lange et al. assume that the data points in the two splits of the data set are independent [11]. Since the multivariate time series data in the
mutual fund and stock data sets are not independent, this assumption does not hold for
this data type. The validation method was still included though, in order to investigate
its effectiveness on dependent time series data. Overall, it appears that it might be better to use methods that are developed specifically for time series and make no assumptions regarding the independence of the data.
Since the returns are used as input to the PCA algorithm, the data vector corresponding to each
time series is high dimensional. Each observation in the time series corresponds to one
individual dimension, meaning that as the number of observations increase in a time
series so does the dimensionality of the data. The data points shown in the scatter plots
in Section 6.4.1 consist of 273 dimensions. It appears that reducing the returns data
to two dimensions removes too much of the information, and the resulting plot becomes
difficult to interpret. Another issue with using PCA to reduce the dimensionality of the
time series data is the fact that according to the stylized facts of financial time series, the
returns data can exhibit some autocorrelation [1]. When performing PCA, it is usually
assumed that the data is independent in time [35]. If this assumption is broken, it may
negatively impact the descriptive ability of PCA. Vanhatalo and Kulahci [36] found
that if the data is autocorrelated, the number of principal components needed to keep a
determined fraction of the variability in the data could increase. This is one explanation for why the two principal components displayed in the scatter plots fail to preserve the
variability and patterns in the original data.
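The dimensionality issue can be illustrated with synthetic data standing in for the returns matrix; the sizes mirror the data set described above, but the values are random and purely for demonstration.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic stand-in for the returns data: 30 time series with
# 273 observations each, so every series is a 273-dimensional point.
rng = np.random.default_rng(2)
returns = rng.normal(0.0, 0.01, size=(30, 273))

# Reduce each 273-dimensional series to two principal components
# for a scatter plot; note that PCA treats the coordinates (here:
# time steps) as if they carried no temporal dependence.
pca = PCA(n_components=2)
coords = pca.fit_transform(returns)

# The explained-variance ratio shows how much information two
# components preserve; for noisy return series it tends to be low.
retained = pca.explained_variance_ratio_.sum()
```

A low `retained` value is exactly the symptom discussed above: two components simply cannot summarize hundreds of weakly structured dimensions.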
Compared to the scatter plots created using PCA, the plots whose data has been reduced using t-SNE appear to show a slightly improved separation between the clusters.
This is especially true in Figure 35b, where three clusters can easily be distinguished.
In Figure 35a, the clusters appear a bit more bunched together. At the same time, all
data points in a cluster are seemingly not plotted in the vicinity of one another. One
example is the two orange data points in the top middle of the figure. Additionally, the
data points determined to be outliers by the clustering algorithm appear to form their
own clusters. The same patterns can be observed in figures 36a and 36b. The clusters
are however slightly more difficult to make out in Figure 36a, since many of the data
points seem to have a fairly even distance to the points next to them. This may be
caused by the fact that the data set consists only of Swedish stocks, and that most of
the Swedish stocks tend to be correlated and influenced similarly by events. The slight
improvement in visualization quality over the scatter plots produced using PCA may
be caused by the fact that t-SNE is a nonlinear technique that makes no assumptions
regarding the time dependence of the data. The distances between the data points in
Equation 11 are Euclidean, but one can choose to use other distance metrics as well.
This flexibility allows visualizations to be created based on the distance metrics actually
used when performing the clustering, which seems to have resulted in a slightly better
visualization compared to PCA. Additionally, using Student's t-distribution to calculate the lower dimensional representation of the data makes the method more robust to outliers, due to
the heavy tails of the distribution. Longer distances between the time series in the high
dimensional space may transfer better to the lower dimensional mapping because of this.
While these scatter plots may be slightly easier to comprehend, they still provide a fairly
crowded cluster visualization with a limited usage in analysis. Increasing the number
of data points in the data set further would result in even more crowded plots, so the scaling capability of cluster visualization via scatter plots is questionable.
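The flexibility of embedding a precomputed, non-Euclidean distance matrix can be sketched as follows. The (1 − tau)/2 distance mapping and the synthetic two-group return data are illustrative assumptions; scikit-learn's `TSNE` requires `init="random"` when the metric is precomputed.

```python
import numpy as np
from scipy import stats
from sklearn.manifold import TSNE

# Synthetic return series forming two correlated groups, so the
# Kendall's tau distance matrix has visible block structure.
rng = np.random.default_rng(3)
base1, base2 = rng.normal(0, 0.01, 200), rng.normal(0, 0.01, 200)
series = np.array(
    [0.8 * base1 + rng.normal(0, 0.005, 200) for _ in range(10)]
    + [0.8 * base2 + rng.normal(0, 0.005, 200) for _ in range(10)]
)

# Pairwise Kendall's tau distances ((1 - tau)/2 mapping assumed).
n = len(series)
dist = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        tau, _ = stats.kendalltau(series[i], series[j])
        dist[i, j] = dist[j, i] = (1.0 - tau) / 2.0

# t-SNE accepts the precomputed, non-Euclidean distance matrix
# directly; init must be "random" in that case.
embedding = TSNE(
    n_components=2, metric="precomputed", init="random",
    perplexity=5, random_state=0,
).fit_transform(dist)
```

The resulting two-dimensional coordinates can then be scattered and colored by cluster label, matching the plots discussed above.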
The cluster evolution plot offers a different perspective, since it shows how the largest clusters evolve over time. Since this method of visualization shows the intersection between the clusters in adjacent subsequences, the method is
reminiscent of how the CLOSE score is calculated. A clustering result where the mem-
bers of the clusters are generally clustered together in the next timestamp is considered
a beneficial clustering result since it is consistent over time. In the cluster evolution
plots, this means that the edges from one cluster should be as few and large as possi-
ble since this indicates that most cluster members are transferred together to the next
cluster. On the contrary, if a cluster has many edges connected to many clusters in
the next subsequence, this indicates that the cluster has been split into many smaller
clusters. A clustering result with a higher CLOSE score is likely to display fewer changes per subsequence than a clustering result with a lower score. One issue with the cluster
evolution plots as they are displayed in Section 6.4.2 is that the time series are simply
represented as numbers, both in the edges and in the cluster nodes themselves. This
fact makes it impossible to get an intuition regarding which time series tend to follow
one another, or which ones are placed in different clusters after a split of a larger cluster.
However, this method of cluster visualization is considered the most informative out of
the methods explored in this thesis. In the context of time series clustering, temporal
stability is of great interest, considering that a clustering method that has been shown to produce approximately the same results over multiple different periods of time may be more reliable for future use as well. The cluster evolution plots provide
an intuition regarding the stability of the method, as well as an idea of the general
size of the clusters in the solution. One additional limitation of the method is that the
user is forced to determine a maximum number of clusters shown to prevent a messy
visualization with edges covering the figure. Another area where the method struggles is when one or two clusters are significantly larger than the others. An example of this
can be seen in Figure 40, where the smaller edges are barely visible.
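The edge weights underlying such a plot are simply intersection counts between clusterings in adjacent subsequences; a minimal helper (function name and dictionary representation are illustrative choices, not from the thesis) could look like this:

```python
from collections import defaultdict

def cluster_transitions(labels_t, labels_next):
    """Edge weights for one step of a cluster evolution plot (sketch).

    Both arguments map a time-series identifier to its cluster label
    in adjacent subsequences; the result counts how many members each
    cluster passes on to each cluster in the next subsequence.
    """
    edges = defaultdict(int)
    for ts_id, cluster in labels_t.items():
        if ts_id in labels_next:
            edges[(cluster, labels_next[ts_id])] += 1
    return dict(edges)

# Toy example: cluster A splits, while cluster B is carried over intact.
t0 = {"fund1": "A", "fund2": "A", "fund3": "A", "fund4": "B", "fund5": "B"}
t1 = {"fund1": "C", "fund2": "C", "fund3": "D", "fund4": "E", "fund5": "E"}
edges = cluster_transitions(t0, t1)
```

Few, large edges per cluster indicate the temporal stability discussed above; the split of A appears as two outgoing edges.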
7.5 Limitations
One limitation and source of error in this project was the lack of access to Swedish
stock data that extended far enough back in time. Due to this, the data set containing Swedish stocks became comparatively small, possibly affecting the clustering and evaluation
outcome adversely. The CLOSE score is even explicitly mentioned to have an increased
sensitivity when the sample size is small [27]. This may explain the consistently lower
CLOSE score of the clustering results of the stock data set. As can be observed in the
cluster evolution plots in Section 6.4.2, the cluster results of the stock data set consisted
of one large cluster and many smaller clusters. This may be caused by the fact that the differences between the distributions or correlations of the returns of Swedish stocks are likely not as significant as the differences between different kinds of funds. In a way,
performing clustering of this data set is similar to clustering the Swedish equity cluster shown in Table 11. A more equal comparison to the mutual fund data set would possibly have been to cluster stocks from markets other than the Swedish one as well.
It is likely that the different patterns in the data set would be more obvious, since the
different stocks would have been impacted differently by phenomena such as currency
value fluctuations.
Another limitation of this study is the fact that the cluster validation methods that
utilize the distance matrix between the time series to calculate a score are limited to
only using one type of distance at a time. This becomes an issue when the sequential
clustering method used as a reference in this study is evaluated. Since this clustering
method uses two different distance metrics to perform the clustering initially, the choice
of which distance metric to use for validation must be made. Using only one of the
distance metrics to evaluate the clustering method may not provide a fair evaluation of
the method. Ideally, a measure of cluster quality not dependent on the distance between the points in the cluster, or a way to combine the different distance matrices, would be
developed. This would enable a better comparison to the clustering methods that only
use one distance metric to perform the clustering.
The issue of handling data points that are not partitioned into clusters is another factor
that may impact the validation results. These data points are not necessarily outliers.
During model selection, a method with parameters that only create clusters if two data points are extremely close may consider some data points outliers, while another method may consider them part of a cluster. Instead of filtering these singleton clusters from the actual
clustering result, it may be a better approach to use some kind of outlier detection
technique to preemptively remove these points from the data set before performing the
clustering. This would ensure that the only points that are removed from the data set are actual outliers, and not just data points that a particular algorithm would not place in a cluster.
Finally, the results that were presented in this report are primarily based on the mutual
fund data set. While the stock data set provided interesting results as well, the interpretability of the labeled mutual fund data facilitated the qualitative analysis. If one were to perform the same analysis on clustering results containing stocks, more effort would have to go into analyzing the companies themselves. It is important to bear in mind that the results presented in this report are likely not universal for all kinds of financial time series, and conclusions should be drawn carefully.
7.6 Future work
First and foremost, it would be of great interest to study new cluster validation meth-
ods that are able to evaluate clustering results created using the sequential clustering
method more fairly. In this report, the Kendall’s tau distance matrix was chosen since
this is the distance metric used in the first of the two clustering rounds that make up
the sequential method. One possibility could be to calculate the quantitative validation
scores separately for each round of clustering, considering only the distance metric that
was used to perform the clustering in the round. The total score would then be the
mean of the scores of each round.
Another interesting area to study further would be additional development of the CLOSE
score. The method performed remarkably well for model selection when clustering fi-
nancial time series, but a version of the method that placed additional focus on cluster
quality rather than stability would be of interest. While stability of the method is im-
portant in order to produce reliable clustering results for different periods of time, this
particular use case focusing on portfolio diversification might call for more attention to
cluster quality.
Applying the validation methods presented in this report to a wider range of financial
time series data sets would be of interest in further studies. By doing so, it would be
possible to draw conclusions regarding a wider range of financial time series, and possi-
bly investigate whether some validation measures work better on one particular kind of
financial asset.
Relating to the visualization of the clusterings, further development of the cluster evo-
lution graph would be of great interest. One feature that might enhance the cluster
evolution plot further could be to select a few data points that differ from one another
in some way, and follow how these move from cluster to cluster as time progresses. One
way to do this would be to draw the cluster nodes as rectangles rather than small circles, in which the identifiers of the selected time series, as well as the total number of time series in the cluster, could be written.
8 Conclusions
Based on the results presented in this thesis, methods of cluster validation for clusters of financial time series must be chosen with intent, with regard to the data type being clustered, the algorithm used, and the primary use case of the clustering.
The goals of this thesis were to find metrics that could be useful when evaluating clusterings of financial time series, and to investigate the robustness of clustering methods and ways to quantify it. Three internal validation scores were used to evaluate the clusterings: the silhouette score, the Calinski-Harabasz index, and the Davies-Bouldin score. The
results in the model selection part of this thesis indicated that these metrics are not
useful when tuning the parameters of clustering methods used to cluster the data sets
presented in this report. The scores occasionally contradicted each other, but for most
of the methods tested the scores suggested that the optimal number of clusters to use
is either the largest or smallest number of clusters tested. This may be caused by the
fact that non-Euclidean distances were used to perform the clustering, and both the
Calinski-Harabasz index as well as the Davies-Bouldin score had to be adapted to use medoids instead of centroids in the score calculation. It is possible that this adaptation
was not sufficient to successfully validate the clusterings.
Similarly, the stability index was not successful at the task of model selection either.
Here, the index consistently suggested that the optimal parameter choice was the pa-
rameter that resulted in the smallest number of clusters. It was theorized that the issues
with the stability index are caused by the multivariate nature of the time series in the
data sets, since the authors of the method originally assume the independence of the
data points.
The CLOSE score combines the main goals of this thesis: it quantifies both the temporal robustness and the quality of the clusters themselves by using the mean squared error. The method shows that there is a trade-off during model selection, since the quality of the clusters decreases as the clusters become larger, while the temporal stability increases. By choosing the parameter that results in the highest CLOSE score, one can possibly acquire a clustering result that is both stable and contains clusters of
quality. For this reason, this metric is considered the most successful at quantifying
both robustness and cluster quality out of the quantitative validation metrics tested.
Two different methods of cluster visualization were tested in this thesis: scatter plots created using dimension reduction, and a plot displaying the evolution of clusters over time. Two methods of dimension reduction were applied to the data: Principal Component Analysis and t-Distributed Stochastic Neighbor Embedding. t-SNE was determined
to be the most effective method to visualize the distances between the time series in the
data set, but the scatter plots were determined to be difficult to interpret as well as not
scalable to very large data sets. Overall, it appears to be a difficult task to reduce high
dimensional data to only two dimensions and still retain sufficient information contained
in the original data.
While the cluster evolution plots provided no information regarding the distances between the data points in the cluster result, they instead provided a visualization of the temporal stability quantified by the CLOSE score. Using the cluster evolution plot, one
could observe how the clusters changed over time as well as how the time series moved
from one cluster to another as time progressed. Additionally, the plot allowed the user
to quickly comprehend the size of the clusters as well as the general division of data
points between the clusters as a whole. The cluster evolution plot was determined to
be the most appropriate method of cluster visualization, given the use case and data set.
Finally, the manual method of cluster validation using domain knowledge was discussed. It was found that domain knowledge is crucial in cluster validation, since it
allows the user to perform fine adjustments to cluster methods selected by methods
such as the CLOSE score in order to acquire the type of clustering most appropriate for
the use case. It was observed that using the clustering method that received the highest
CLOSE score resulted in clusters that captured more general patterns, but were more
temporally stable than the reference method. Again, it seems that there is a trade-off between temporal stability and cluster quality, and in the end it is up to the performer of the clustering to decide what is most important.
9 References
[1] R. Cont. “Empirical properties of asset returns: stylized facts and statistical issues”.
In: Quantitative Finance 1.2 (2001), pp. 223–236.
[2] Kidbrooke. About Us. 2023. url: https://kidbrooke.com/about.
[3] Anton Yeshchenko et al. “Comprehensive process drift detection with visual an-
alytics”. In: Conceptual Modeling: 38th International Conference, ER 2019, Sal-
vador, Brazil, November 4–7, 2019, Proceedings 38. Springer. 2019, pp. 119–135.
[4] Saeed Aghabozorgi, Ali Seyed Shirkhorshidi, and Teh Ying Wah. “Time-series
clustering – A decade review”. In: Information Systems 53 (2015), pp. 16–38.
[5] Odilia Yim and Kylee T Ramdeen. “Hierarchical cluster analysis: comparison of
three linkage measures and application to psychological data”. In: The quantitative
methods for psychology 11.1 (2015), pp. 8–21.
[6] Scikit learn Developers. Agglomerative Clustering. Last used: 2023-02-14. url:
https://scikit-learn.org/stable/modules/generated/sklearn.cluster.
AgglomerativeClustering.html.
[7] Clément L Canonne. “A short note on learning discrete distributions”. In: arXiv
preprint arXiv:2002.11457 (2020).
[8] Sorana-Daniela Bolboaca and Lorentz Jäntschi. “Pearson versus Spearman, Kendall’s
tau correlation analysis on structure-activity relationships of biologic active com-
pounds”. In: Leonardo Journal of Sciences 5.9 (2006), pp. 179–200.
[9] Pauli Virtanen et al. “SciPy 1.0: Fundamental Algorithms for Scientific Computing
in Python”. In: Nature Methods 17 (2020), pp. 261–272.
[10] SciPy Developers. Kendall’s tau. Last used: 2023-02-16. url: https : / / docs .
scipy.org/doc/scipy/reference/generated/scipy.stats.kendalltau.html.
[11] Tilman Lange et al. “Stability-Based Validation of Clustering Solutions”. In: Neural
Comput. 16.6 (2004), 1299–1323.
[12] Francesco Pattarin, Sandra Paterlini, and Tommaso Minerva. “Clustering financial
time series: An application to mutual funds style analysis”. In: Computational
Statistics & Data Analysis 47 (Sept. 2004), pp. 353–372.
[13] David R. Harper. Forces That Move Stock Prices. Last updated: 2022-07. url:
https://www.investopedia.com/articles/basics/04/100804.asp.
[14] Yingfan Wang et al. “Understanding how dimension reduction tools work: an em-
pirical approach to deciphering t-SNE, UMAP, TriMAP, and PaCMAP for data vi-
sualization”. In: The Journal of Machine Learning Research 22.1 (2021), pp. 9129–
9201.
[15] Christopher Bishop. Pattern Recognition and Machine Learning. Springer, 2006.
Chap. 12.1.
[16] Laurens van der Maaten and Geoffrey Hinton. “Visualizing Data using t-SNE”. In:
Journal of Machine Learning Research 9.86 (2008), pp. 2579–2605.
[17] F. Pedregosa et al. “Scikit-learn: Machine Learning in Python”. In: Journal of
Machine Learning Research 12 (2011), pp. 2825–2830.
[18] Argimiro Arratia and Alejandra Cabaña. “A Graphical Tool for Describing the
Temporal Evolution of Clusters in Financial Stock Markets”. In: Comput. Econ.
41.2 (2013), 213–231.
[19] Yanchi Liu et al. “Understanding of Internal Clustering Validation Measures”. In:
2010 IEEE International Conference on Data Mining (2010), pp. 911–916.
[20] Peter J. Rousseeuw. “Silhouettes: A graphical aid to the interpretation and vali-
dation of cluster analysis”. In: Journal of Computational and Applied Mathematics
20 (1987), pp. 53–65.
[21] Scikit learn Developers. Silhouette score. Last used: 2023-02-17. url: https://
scikit-learn.org/stable/modules/generated/sklearn.metrics.silhouette_
score.html.
[22] Tadeusz Caliński and J. A. Harabasz. “A Dendrite Method for Cluster Analysis”. In: Communications in Statistics - Theory and Methods 3 (Jan. 1974), pp. 1–27.
[23] David Davies and Don Bouldin. “A Cluster Separation Measure”. In: IEEE Trans-
actions on Pattern Analysis and Machine Intelligence PAMI-1 (May 1979), pp. 224
–227.
[24] Friedrich Leisch. “A toolbox for k-centroids cluster analysis”. In: Computational
statistics & data analysis 51.2 (2006), pp. 526–544.
[25] Carmelo Cassisi et al. “Similarity Measures and Dimensionality Reduction Tech-
niques for Time Series Data Mining”. In: Sept. 2012.
[26] Hae-Sang Park and Chi-Hyuck Jun. “A simple and fast algorithm for K-medoids
clustering”. In: Expert Systems with Applications 36.2, Part 2 (2009), pp. 3336–
3341.
[27] Gerhard Klassen, Martha Tatusch, and Stefan Conrad. “Cluster-based stability
evaluation in time series data sets”. In: Applied Intelligence (2022), pp. 1–24.
[28] Mahmoud Mousavi Shiri, Sadegh Bafandeh Imandoust, and Mohammad Bolan-
draftar Pasikhani. “Application of K-Nearest Neighbor (KNN) for Predicting Cor-
porate Financial Distress in Tehran Stock Exchange”. In: Monetary & Financial
Economics 20.6 (2013), pp. 48–66.
[29] B. Efron and R. Tibshirani. “Bootstrap Methods for Standard Errors, Confidence
Intervals, and Other Measures of Statistical Accuracy”. In: Statistical Science 1.1
(1986), pp. 54–75.
[30] S. N. Lahiri. Resampling Methods for Dependent Data. Springer Science & Business Media, 2003, pp. 25–29.
[31] MSCI. The Global Industry Classification Standard (GICS). Last used: 2023-05-07. url: https://www.msci.com/our-solutions/indexes/gics.
[32] Fan Liu and Yong Deng. “Determine the Number of Unknown Targets in Open
World Based on Elbow Method”. In: IEEE Transactions on Fuzzy Systems 29.5
(2021), pp. 986–995.
[33] The pandas development team. pandas.DataFrame.boxplot. Feb. 2020. url: https:
//pandas.pydata.org/docs/reference/api/pandas.DataFrame.boxplot.
html.
[34] Öhman Fonder. Öhman Global Growth. Last used: 2023-05-10. url: https : / /
www.ohman.se/fonder/fond/ohman-global-growth/.
[35] Bartolomeu Zamprogno et al. “Principal component analysis with autocorrelated
data”. In: Journal of Statistical Computation and Simulation 90.12 (2020), pp. 2117–
2135.
[36] Erik Vanhatalo and Murat Kulahci. “Impact of autocorrelation on principal com-
ponents and their use in statistical process control”. In: Quality and Reliability
Engineering International 32.4 (2016), pp. 1483–1500.
[37] Fabrice Daniel. “Financial time series data processing for machine learning”. In:
arXiv preprint arXiv:1907.03010 (2019).
[38] Hae-Sang Park and Chi-Hyuck Jun. “A simple and fast algorithm for K-medoids
clustering”. In: Expert Systems with Applications 36.2, Part 2 (2009), pp. 3336–
3341.
10 Appendix A
10.1 Internal validation measures of bootstrapped time series
10.1.1 Agglomerative clustering using Kendall’s tau
Mutual fund data set
Stock data set
Figure 48: Calinski-Harabasz index of bootstrapped time series
Figure 52: Davies-Bouldin index of bootstrapped time series
Stock data set
Figure 60: Calinski-Harabasz index of bootstrapped time series
Figure 64: Davies-Bouldin index of bootstrapped time series
11 Appendix B
11.1 Process description for performing clustering of financial
time series
This section provides a suggestion regarding the process of clustering financial time series. The suggested course of action is designed using the results and conclusions of this thesis, and it is by no means a definitive guide. For this reason, it is important to keep the specific properties of the time series data in mind, and to remain critical of the clustering result. As the conclusions of this thesis suggest, it is difficult to quantify which clustering method produces the optimal results. A qualitative evaluation of the clustering results themselves is thus a necessity, once a few candidate clustering methods have been suggested.
11.1.1 Normalization
As is normally done when performing many kinds of machine learning, the data is nor-
malized. In the case of financial time series, this can be done by calculating the returns
of the price series of each asset/instrument [37]. In order to ensure that the mean of the return series is zero, the mean value is subtracted from each observation in the return series.
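The normalization step above can be sketched in a few lines; simple period-over-period returns are used here, with log returns being a common alternative.

```python
import numpy as np

def demeaned_returns(prices):
    """Simple returns of a price series, shifted to zero mean.

    A sketch of the normalization described above: compute the
    period-over-period returns and subtract the series mean.
    """
    prices = np.asarray(prices, dtype=float)
    returns = prices[1:] / prices[:-1] - 1.0
    return returns - returns.mean()

# Toy price series: four observations give three demeaned returns.
r = demeaned_returns([100.0, 102.0, 101.0, 103.0])
```

The resulting series has mean zero by construction, which puts time series with different price levels on a comparable footing before clustering.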
11.1.2 Model selection
This part of the clustering process is essential in order to find the clustering method that
performs optimally when clustering the financial time series in the data set. I recom-
mend selecting a few different clustering algorithms in order to get a nuanced comparison
between the choices available. The clustering algorithms tested in this thesis seemingly performed adequately on the mutual funds and stocks; it is, however, recommended to explore the options available. One clustering method that might be of interest in further experiments is K-medoids [38]. This method is similar to K-means clustering, but uses medoids of the clusters instead of centroids. As discussed in Section 5.2.2, in the paragraph labeled Modifications for non-Euclidean distances between time series, this facilitates the use of non-Euclidean distance measures. The distance measures that
will be tested when performing model selection will also need to be determined in this
stage. The choice will depend heavily on the use case and the purpose of performing
the clustering in the first place. The results in this thesis indicate however that using
the Kendall’s tau distance as described in Section 3.3.2 results in finer, more numerous
clusters compared to the results when using the Hellinger distance. In this particular
case, Kendall’s tau was determined to be able to capture more complex patterns in the data, and was thus the more informative option.
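Why medoids facilitate non-Euclidean distances can be seen in a minimal K-medoids sketch: the algorithm only ever consults a precomputed distance matrix, never coordinate averages. This is a simplified illustration, not the full PAM-style algorithm of [38], and the toy block-structured distance matrix is an assumption for demonstration.

```python
import numpy as np

def k_medoids(dist, k, n_iter=50, seed=0):
    """Minimal K-medoids on a precomputed distance matrix (a sketch)."""
    rng = np.random.default_rng(seed)
    n = len(dist)
    medoids = rng.choice(n, size=k, replace=False)
    for _ in range(n_iter):
        # Assign each point to its nearest medoid.
        labels = np.argmin(dist[:, medoids], axis=1)
        # Move each medoid to the member minimizing intra-cluster distance.
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.where(labels == c)[0]
            if len(members) > 0:
                within = dist[np.ix_(members, members)].sum(axis=1)
                new_medoids[c] = members[np.argmin(within)]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    return np.argmin(dist[:, medoids], axis=1), medoids

# Toy distance matrix with two obvious groups; any metric producing
# such a matrix works, which is the point of using medoids.
dist = np.full((8, 8), 0.9)
dist[:4, :4] = 0.1
dist[4:, 4:] = 0.1
np.fill_diagonal(dist, 0.0)

labels, medoids = k_medoids(dist, k=2)
```

Because both the assignment and the update step operate purely on distances, any of the metrics discussed above (Kendall's tau distance, Hellinger distance) can be plugged in without modification.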
Once the clustering methods that will be used as candidates have been selected, tuning
the parameters for optimal performance when clustering the data set is performed. As
shown in the results section of this report, this can be done by using some quantita-
tive score that outputs a graph that can be interpreted in order to find the optimal
parameter value. For financial time series, the only metric that seemed to provide easily interpretable results was the CLOSE score. For this reason, this is the metric recommended to use in order to ensure that the clustering method that receives the highest score outputs partitions that are both stable in time and contain clusters of high quality. In general, one of the most significant conclusions of this thesis is that the evaluation metrics used need to be tailored according to both the data and the use
case. For this reason, further exploring validation methods specifically made for clusters
of time series is recommended.
Once a few promising clustering method candidates have been singled out, further vali-
dation can be done in order to select the method that is most useful for the particular
use case at hand. Here, cluster visualization methods such as the cluster evolution plot
described in Section 5.2.1 can prove useful since they provide an overview of the results
without showing too much detail. The results of this thesis also indicate that perform-
ing a qualitative analysis of the clustering results produced by the different clustering
methods can give insight into which method produces the most useful results. As dis-
cussed in this thesis, this approach requires considerably more domain knowledge than
simply applying a quantitative score of some kind to the clustering results. This method of evaluation is important, however, since the usefulness of a clustering method is difficult to quantify and will depend on the use case specified by the performer of the clustering.
Once a clustering method has been selected and is being used, it is also important to
consistently reevaluate the performance of the algorithm. Due to the nature of financial
time series, a clustering method that produced useful results a few months ago may
perform worse now. This is one of the reasons why the CLOSE score is useful, since
it effectively quantifies how stable over time a clustering method has been in the past.
However, this might not be a guarantee that the method will remain stable in the future, and regular reevaluation is appropriate in order to ensure optimal performance.