Fuzzy Clustering Based Segmentation of Time-Series
1 Introduction
A sequence of N observed data, x1 , . . . , xN , ordered in time, is called a time-
series. Real-life time-series can be taken from business, physical, social and be-
havioral science, economics, engineering, etc. Time-series segmentation addresses
the following data mining problem: given a time-series T , find a partitioning of
T to c segments that are internally homogeneous [1]. Depending on the application, the goal of segmentation may be to locate stable periods of time, to identify change points, or simply to compress the original time-series into a more compact representation [2]. Although many real-life applications require several variables to be tracked and monitored simultaneously, most segmentation algorithms analyze only a single time-variant variable [3].
The aim of this paper is to develop an algorithm that is able to handle
time-varying characteristics of multivariate data: (i) changes in the mean; (ii)
changes in the variance; and (iii) changes in the correlation structure among vari-
ables. Multivariate statistical tools, like Principal Component Analysis (PCA)
can be applied to discover such information [4]. PCA maps the data points into
a lower dimensional space, which is useful in the analysis and visualization of the
correlated high-dimensional data [5]. Linear PCA models have two particularly
desirable features: they can be understood in great detail and they are straight-
forward to implement. Linear PCA can give good prediction results for simple
time-series, but can fail to the analysis of historical data with changes in regime
M.R. Berthold et al. (Eds.): IDA 2003, LNCS 2810, pp. 275–285, 2003.
c Springer-Verlag Berlin Heidelberg 2003
276 J. Abonyi et al.
or when the relations among the variables are nonlinear. The analysis of such
data requires the detection of locally correlated clusters [6]. The proposed algo-
rithm does the clustering of the data to discover the local relationships among
the variables similarly to mixture of Principal Component Models [4].
The changes of the variables of a time-series are usually vague and are not focused at any particular time point. Therefore, it is not practical to define crisp bounds for the segments. For example, when humans visually analyze historical process data, they use expressions like "this point belongs to this operating point less, and to the other one more". A good example of this kind of fuzzy segmentation is how fuzzily the start and the end of early morning are defined. Fuzzy logic is widely used in various applications where the grouping
of overlapping and vague elements is necessary [7], and there are many fruitful
examples in the literature for the combination of fuzzy logic with time-series
analysis tools [8,2,9,10,16].
Time-series segmentation may be viewed as clustering, but with a time-
ordered structure. The contribution of this paper is the introduction of a new
fuzzy clustering algorithm which can be effectively used to segment large, multi-
variate time-series. Since the points in a cluster must come from successive time points, the time coordinate of the data must also be considered. One way to deal with time is to treat it as an additional variable and to cluster the observations using a distance between cases that incorporates time. Hence, the clustering is based on a distance measure that contains two terms: the first term measures how well a data point fits the given segment according to Gaussian fuzzy sets defined in the time domain, while the second term measures how far the data point lies from the hyperplane of the PCA model used to assess the homogeneity of the segments.
The remainder of this paper is organized as follows. Section 2 defines the time-
segmentation problem and presents the proposed clustering algorithm. Section 3
contains a computational example using historical process data collected during
the production of high-density polyethylene. Finally, conclusions are given in
Section 4.
data points should be grouped by their similarity, but with the constraint that all
points in a cluster must come from successive time points. In order to formalize
this goal, a cost function cost(S(a, b)) describing the internal homogeneity of the individual segments should be defined. This cost function can be an arbitrary function. For example, in [1,11] the sum of the variances of the variables in the segment was used as cost(S(a, b)). Usually, the cost function cost(S(a, b)) is defined based on the distances between the actual values of the time-series and the values given by a simple function (a constant or linear function, or a polynomial of
a higher but limited degree) fitted to the data of each segment. The optimal
c-segmentation simultaneously determines the ai , bi borders of the segments and
the θi parameter vectors of the models of the segments. Hence, the segmentation
algorithms minimize the cost of c-segmentation which is usually the sum of the
costs of the individual segments:
$\mathrm{cost}(S_T^c) = \sum_{i=1}^{c} \mathrm{cost}(S_i)$ .   (1)
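To make the cost function concrete, the variance-based choice of [1,11] can be sketched in a few lines of Python (a hedged illustration: the data and segment borders below are invented for the example, not taken from the paper):

```python
import numpy as np

def segment_cost(x, a, b):
    """Variance-based cost of segment S(a, b): sum of the per-variable
    variances of the samples x[a..b] (inclusive), as in [1,11]."""
    return np.var(x[a:b + 1], axis=0).sum()

def total_cost(x, borders):
    """Cost of a c-segmentation: sum of the individual segment costs.
    `borders` lists the last index of each segment."""
    cost, start = 0.0, 0
    for end in borders:
        cost += segment_cost(x, start, end)
        start = end + 1
    return cost

# synthetic bivariate series with one clear change point at k = 50
x = np.vstack([np.zeros((50, 2)), np.ones((50, 2))])
print(total_cost(x, [49, 99]) < total_cost(x, [24, 99]))  # correct split is cheaper
```

With the border placed exactly at the change point, each segment is constant and the total cost is zero; a misplaced border mixes the two regimes and raises the cost.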
Cost function (1) can be minimized by dynamic programming to find the segments (e.g. [11]). Unfortunately, dynamic programming is computationally intractable for many real data sets. Consequently, heuristic optimization techniques such as greedy top-down or bottom-up techniques are frequently used to find good but suboptimal c-segmentations:
Sliding window: A segment is grown until it exceeds some error bound. The process repeats with the next data point not included in the newly approximated segment. For example, a linear model is fitted on the observed period and the modelling error is analyzed [12].
Top-down method: The time-series is recursively partitioned until some stopping criterion is met [12].
Bottom-up method: Starting from the finest possible approximation, segments are merged until some stopping criterion is met [12].
Search for inflection points: Searching for primitive episodes located be-
tween two inflection points [5].
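The bottom-up strategy, for instance, can be sketched as follows (an illustrative Python sketch using the variance-based segment cost; the greedy merging rule shown is one common variant, not the exact procedure of [12]):

```python
import numpy as np

def bottom_up(x, c):
    """Greedy bottom-up c-segmentation: start from singleton segments and
    repeatedly merge the adjacent pair with the smallest cost increase."""
    segs = [(k, k) for k in range(len(x))]
    # sum of squared deviations of a candidate segment (variance * length)
    cost = lambda a, b: np.var(x[a:b + 1], axis=0).sum() * (b - a + 1)
    while len(segs) > c:
        # cost increase caused by merging each adjacent pair
        inc = [cost(segs[j][0], segs[j + 1][1]) - cost(*segs[j]) - cost(*segs[j + 1])
               for j in range(len(segs) - 1)]
        j = int(np.argmin(inc))
        segs[j:j + 2] = [(segs[j][0], segs[j + 1][1])]
    return segs

# a piecewise-constant toy series with three levels
x = np.concatenate([np.zeros(30), np.ones(30), 2 * np.ones(30)])[:, None]
print(bottom_up(x, 3))  # → [(0, 29), (30, 59), (60, 89)]
```

Merges inside a flat region cost nothing, so the greedy procedure exhausts them first and recovers the three level changes exactly.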
In the following subsection a different approach will be presented based on
the clustering of the data.
where $D_{i,k}^2(v_i^x, x_k)$ represents how far the $v_i^x$ mean of the variables in the i-th segment (the center of the i-th cluster) is from the $x_k$ data point; and $\beta_i(t_k) \in \{0, 1\}$ stands for the crisp membership of the k-th data point in the i-th segment:

$\beta_i(t_k) = \begin{cases} 1 & \text{if } s_{i-1} < k \le s_i \\ 0 & \text{otherwise} \end{cases}$   (3)
This equation is analogous to the typical error measure of standard k-means clustering, but in this case the clusters are restricted to contiguous segments of the time-series instead of Voronoi regions in Rn.
The changes of the variables of the time-series are usually vague and are not focused at any particular time point. As it is not practical to define crisp bounds for the segments, Gaussian membership functions, $A_i(t_k)$, are used to represent the fuzzy segments of the time-series, $\beta_i(t_k) \in [0, 1]$. This choice leads to the following compact formula for the normalized degree of fulfillment of the membership of the k-th observation in the i-th segment:

$A_i(t_k) = \exp\left(-\frac{(t_k - v_i^t)^2}{2\sigma_i^2}\right), \qquad \beta_i(t_k) = \frac{A_i(t_k)}{\sum_{j=1}^{c} A_j(t_k)}$   (4)
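Equation (4) is straightforward to evaluate. The following sketch (with invented segment centers and widths, not values from the paper) shows that the normalized memberships β_i(t_k) form a partition of unity over the segments:

```python
import numpy as np

# Gaussian fuzzy sets in time (Eq. (4)): A_i(t_k) describes the i-th segment,
# beta_i(t_k) is the normalized degree of fulfillment.
t = np.arange(100.0)                    # time coordinates t_k
v_t = np.array([20.0, 50.0, 80.0])      # illustrative segment centers v_i^t
sigma = np.array([10.0, 10.0, 10.0])    # illustrative segment widths sigma_i

A = np.exp(-0.5 * ((t[None, :] - v_t[:, None]) / sigma[:, None]) ** 2)
beta = A / A.sum(axis=0)                # normalize over the c segments

print(np.allclose(beta.sum(axis=0), 1.0))  # memberships sum to one at every t_k
```

Each column of `beta` sums to one, so every observation distributes its full membership among the fuzzy segments.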
For the identification of the $v_i^t$ centers and $\sigma_i^2$ variances of the membership functions, a fuzzy clustering algorithm is introduced. The algorithm, which is similar to the modified Gath–Geva clustering [13], assumes that the data can be effectively modelled as a mixture of multivariate (including time as a variable) Gaussian distributions, so it minimizes the sum of the weighted squared distances between the $z_k = [t_k, x_k^T]^T$ data points and the $\eta_i$ cluster prototypes:

$J = \sum_{i=1}^{c} \sum_{k=1}^{N} (\mu_{i,k})^m D_{i,k}^2(\eta_i, z_k) = \sum_{i=1}^{c} \sum_{k=1}^{N} (\mu_{i,k})^m D_{i,k}^2(v_i^t, t_k) \, D_{i,k}^2(v_i^x, x_k)$   (5)

where $\mu_{i,k}$ represents the degree of membership of the observation $z_k = [t_k, x_k^T]^T$ in the i-th cluster (i = 1, . . . , c), and $m \in [1, \infty)$ is a weighting exponent that determines the fuzziness of the resulting clusters (usually chosen as m = 2).
As can be seen, the clustering is based on a $D_{i,k}^2(\eta_i, z_k)$ distance measure which consists of two terms. The first one, $D_{i,k}^2(v_i^t, t_k)$, is the distance between the k-th data point and the $v_i^t$ center of the i-th segment in time:

$\frac{1}{D_{i,k}^2(v_i^t, t_k)} = \frac{1}{\sqrt{2\pi\sigma_i^2}} \exp\left(-\frac{1}{2} \frac{(t_k - v_i^t)^2}{\sigma_i^2}\right)$ ,   (6)

where the center and the standard deviation of the Gaussian function are

$v_i^t = \frac{\sum_{k=1}^{N} (\mu_{i,k})^m t_k}{\sum_{k=1}^{N} (\mu_{i,k})^m}, \qquad \sigma_i^2 = \frac{\sum_{k=1}^{N} (\mu_{i,k})^m (t_k - v_i^t)^2}{\sum_{k=1}^{N} (\mu_{i,k})^m}$ .   (7)
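The update (7) is a membership-weighted mean and variance. A minimal sketch (with hypothetical memberships invented for the illustration):

```python
import numpy as np

def time_prototype(mu_i, t, m=2.0):
    """Weighted mean and variance of Eq. (7) for one cluster:
    v_i^t and sigma_i^2 from the memberships mu_i of that cluster."""
    w = mu_i ** m
    v_t = np.sum(w * t) / np.sum(w)
    var = np.sum(w * (t - v_t) ** 2) / np.sum(w)
    return v_t, var

t = np.arange(10.0)
mu = np.where(t < 5, 0.9, 0.1)   # hypothetical memberships of one cluster
v_t, var = time_prototype(mu, t)
print(v_t)                        # pulled toward the first half of the series
```

Because the memberships are raised to the power m before averaging, points with low membership contribute very little to the prototype.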
The second term represents the distance between the cluster prototype and the data in the feature space:

$\frac{1}{D_{i,k}^2(v_i^x, x_k)} = \frac{\alpha_i \sqrt{\det(\mathbf{A}_i)}}{(2\pi)^{r/2}} \exp\left(-\frac{1}{2} (x_k - v_i^x)^T \mathbf{A}_i (x_k - v_i^x)\right)$ ,   (8)

where $\alpha_i$ represents the a priori probability of the cluster and $v_i^x$ the coordinate of the i-th cluster center in the feature space,

$\alpha_i = \frac{1}{N} \sum_{k=1}^{N} \mu_{i,k}, \qquad v_i^x = \frac{\sum_{k=1}^{N} (\mu_{i,k})^m x_k}{\sum_{k=1}^{N} (\mu_{i,k})^m}$ ,   (9)
and r is the rank of the $\mathbf{A}_i$ distance norm corresponding to the i-th cluster. The $\mathbf{A}_i$ distance norm can be defined in many ways. It is wise to scale the variables so that those with greater variability do not dominate the clustering. One can scale by dividing by standard deviations, but a better procedure is to use the statistical (Mahalanobis) distance, which also adjusts for the correlations among the variables. In this case $\mathbf{A}_i$ is the inverse of the fuzzy covariance matrix, $\mathbf{A}_i = \mathbf{F}_i^{-1}$, where
$\mathbf{F}_i = \frac{\sum_{k=1}^{N} (\mu_{i,k})^m (x_k - v_i^x)(x_k - v_i^x)^T}{\sum_{k=1}^{N} (\mu_{i,k})^m}$ .   (10)
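Equation (10) can be sketched as a membership-weighted outer-product average (the data and memberships below are synthetic, chosen only to exercise the formula):

```python
import numpy as np

def fuzzy_covariance(mu_i, X, v_x, m=2.0):
    """Fuzzy covariance matrix F_i of Eq. (10) for one cluster:
    membership-weighted covariance about the cluster center v_x."""
    w = mu_i ** m
    d = X - v_x                          # (N, n) deviations from the center
    return (w[:, None] * d).T @ d / w.sum()

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
mu = np.full(200, 0.5)                   # uniform memberships for illustration
F = fuzzy_covariance(mu, X, X.mean(axis=0))
print(np.allclose(F, F.T))               # a covariance matrix is symmetric
```

With uniform memberships the formula reduces to the ordinary sample covariance about the mean; unequal memberships tilt the matrix toward the points that belong most strongly to the cluster.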
When the variables are highly correlated, the $\mathbf{F}_i$ covariance matrix can be ill-conditioned and cannot be inverted. Recently, two methods have been worked out to handle this problem [14]. The first method is based on fixing the ratio between the maximal and minimal eigenvalues of the covariance matrix. The second method is based on adding a scaled identity matrix to the calculated covariance matrix. Both methods result in invertible matrices, but neither of them extracts the potential information about the hidden structure of the data.
Principal Component Analysis (PCA) is based on the projection of correlated high-dimensional data onto a hyperplane, which is useful for the visualization and the analysis of the data. This mapping uses only the first p nonzero eigenvalues and the corresponding eigenvectors of the covariance matrix, decomposed as $\mathbf{F}_i = \mathbf{U}_i \Lambda_i \mathbf{U}_i^T$, where the $\Lambda_i$ matrix contains the eigenvalues of $\mathbf{F}_i$ in its diagonal in decreasing order, and the $\mathbf{U}_i$ matrix contains the corresponding eigenvectors in its columns. The projection is $y_{i,k} = \Lambda_{i,p}^{-1/2} \mathbf{U}_{i,p}^T x_k$.
When the hyperplane of the PCA model has an adequate number of dimensions, the distance of the data from the hyperplane results only from measurement failures, disturbances and negligible information, so projecting the data onto this p-dimensional hyperplane does not cause a significant reconstruction error:

$Q_{i,k} = (x_k - \hat{x}_k)^T (x_k - \hat{x}_k) = x_k^T (\mathbf{I} - \mathbf{U}_{i,p} \mathbf{U}_{i,p}^T) x_k$ .   (11)
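The Q reconstruction error (11) of a centered data set can be sketched as follows. The example generates synthetic data lying near a 2-dimensional plane (an assumption made only for the illustration), so the error is small for p = 2 and large for p = 1:

```python
import numpy as np

def q_error(F, X, p):
    """Q reconstruction error of Eq. (11) for each row of the (already
    centered) data X, using the p leading eigenvectors of F."""
    lam, U = np.linalg.eigh(F)           # eigh returns ascending eigenvalues
    U_p = U[:, ::-1][:, :p]              # p leading eigenvectors
    resid = X - X @ U_p @ U_p.T          # (I - U_p U_p^T) x_k, row-wise
    return np.sum(resid ** 2, axis=1)

rng = np.random.default_rng(1)
# 500 points close to a random 2-dimensional plane in R^3, plus tiny noise
W = rng.normal(size=(2, 3))
X = rng.normal(size=(500, 2)) @ W + 1e-3 * rng.normal(size=(500, 3))
X -= X.mean(axis=0)
F = X.T @ X / len(X)
print(q_error(F, X, 2).max() < q_error(F, X, 1).mean())
```

Retaining both in-plane directions leaves only the noise component orthogonal to the plane, so the Q error collapses; dropping one of them leaves a large residual.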
The main idea of this paper is to use these measures to characterize the homogeneity of the segments:
Fuzzy-PCA-Q method: The distance measure is based on the Q model error. The $\mathbf{A}_i$ distance norm is the inverse of the matrix constructed from the n − p smallest eigenvalues and the corresponding eigenvectors of $\mathbf{F}_i$:

$\mathbf{A}_i^{-1} = \mathbf{U}_{i,n-p} \Lambda_{i,n-p} \mathbf{U}_{i,n-p}^T$ ,   (13)

Fuzzy-PCA-T² method: The distance measure is based on the Hotelling T² statistic; the norm is constructed from the p largest eigenvalues and the corresponding eigenvectors of $\mathbf{F}_i$:

$\mathbf{A}_i^{-1} = \mathbf{U}_{i,p} \Lambda_{i,p} \mathbf{U}_{i,p}^T$ ,   (14)
The most popular method to solve this problem is alternating optimization (AO), which consists of the application of Picard iteration through the first-order conditions for the stationary points of (5). These can be found by adjoining the constraints (15) to J by means of Lagrange multipliers [15]:

$J(\mathbf{Z}, \mathbf{U}, \eta, \lambda) = \sum_{i=1}^{c} \sum_{k=1}^{N} (\mu_{i,k})^m D_{i,k}^2(z_k, \eta_i) + \sum_{k=1}^{N} \lambda_k \left( \sum_{i=1}^{c} \mu_{i,k} - 1 \right)$   (16)
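Solving the first-order conditions of (16) for μ_{i,k} yields the familiar alternating-optimization membership update, μ_{i,k} = 1 / Σ_j (D²_{i,k}/D²_{j,k})^{1/(m−1)}. A sketch (the toy squared distances below are invented for the example):

```python
import numpy as np

def update_memberships(D2, m=2.0):
    """AO membership update solving the stationarity conditions of Eq. (16).
    D2 is the (c, N) array of squared distances D^2_{i,k}."""
    ratio = D2[:, None, :] / D2[None, :, :]       # (c, c, N) pairwise ratios
    return 1.0 / np.sum(ratio ** (1.0 / (m - 1.0)), axis=1)

D2 = np.array([[1.0, 4.0],
               [4.0, 1.0]])                       # toy squared distances, c=2, N=2
mu = update_memberships(D2)
print(mu)  # → [[0.8 0.2]
           #    [0.2 0.8]]
```

Each point gains high membership in the cluster it is closest to, and the columns sum to one, satisfying the adjoined constraint.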
[Figure: time plots (0–120 h) of the normalized process variables: PE (t/h), C2in (kg/h), C6 (w%), C6in (t/h), H2 (mol%), H2in (kg/h), slurry density (g/cm³), IBin (t/h), T (°C), Kat.]
Fig. 3. Fuzzy-PCA-Q segmentation: (a) $1/D_{i,k}(v_i^x, x_k)$ distances, (b) fuzzy time-series segments $\beta_i(t_k)$
Fig. 4. Fuzzy-PCA-T² segmentation: (a) $1/D_{i,k}(v_i^x, x_k)$ distances, (b) fuzzy time-series segments $\beta_i(t_k)$
The distance between the data points and the cluster centers can be com-
puted by two methods corresponding to the two distance measures of PCA: the
Q reconstruction error and the Hotelling T 2 . These methods give different results
which can be used for different purposes.
The Fuzzy-PCA-Q method is sensitive to the number of principal components. As p increases, the reconstruction error decreases. In the case of p = n = 11 the reconstruction error becomes zero over the whole range of the data set. If p is too small, the reconstruction error will be large for the entire time-series. In these two extreme cases the segmentation is not based on the internal relationships among the variables, so equidistant segments are detected. When the number of latent variables is in the range p = 3, . . . , 8, reliable segments are detected, and the algorithm finds the grade transition around the 25th hour (Figure 3).
The Fuzzy-PCA-T² method is also sensitive to the number of principal components. Increasing the model dimension does not necessarily decrease the Hotelling T², because the volume of the fuzzy covariance matrix (the product of the eigenvalues) tends to zero (see (8)). Hence, it is practical to use a mixture of PCA models with similar performance capacity. This can be achieved by defining p based on the desired accuracy (retained variance) of the PCA models:

$\sum_{f=1}^{p-1} \lambda_{i,f} \Big/ \sum_{f=1}^{n} \lambda_{i,f} \; < \; q \; \le \; \sum_{f=1}^{p} \lambda_{i,f} \Big/ \sum_{f=1}^{n} \lambda_{i,f}$   (18)

As Figure 4 shows, with q = 0.99 the algorithm also finds a correct segmentation, one that clearly reflects the "drift" of the operating region (compare with Figure 3).
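The selection rule (18) amounts to keeping the smallest p whose leading eigenvalues reach the desired fraction q of the total variance. A minimal sketch, with an invented eigenvalue spectrum:

```python
import numpy as np

def choose_p(eigvals, q=0.99):
    """Smallest p whose leading eigenvalues capture at least the
    fraction q of the total variance, in the spirit of Eq. (18)."""
    lam = np.sort(eigvals)[::-1]              # decreasing order, as in Lambda_i
    frac = np.cumsum(lam) / lam.sum()         # cumulative retained variance
    return int(np.searchsorted(frac, q) + 1)

lam = np.array([4.0, 2.0, 1.0, 0.5, 0.25, 0.25])  # illustrative spectrum
print(choose_p(lam, 0.75))  # → 2  (6/8 of the total variance)
print(choose_p(lam, 0.95))  # → 5
```

Applying the same q to every cluster keeps the local PCA models at a comparable accuracy even when their individual spectra differ.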
4 Conclusions
This paper presented a new clustering algorithm for the fuzzy segmentation of
multivariate time-series. The algorithm is based on the simultaneous identifica-
tion of fuzzy sets which represent the segments in time and the hyperplanes of
local PCA models used to measure the homogeneity of the segments. Two homogeneity measures, corresponding to the two typical applications of PCA models, have been defined. The Q reconstruction error segments the time-series according to the change of the correlation among the variables, while the Hotelling T²
measure segments the time-series based on the drift of the center of the operating region. The algorithm described here was applied to the monitoring of the production of high-density polyethylene. When the Q reconstruction error was used during the clustering, good performance was achieved when the number of principal components was defined as a parameter of the clusters. In contrast, when the clustering was based on the T² distances from the cluster centers, it was advantageous to specify the desired reconstruction performance to obtain a good segmentation. The results suggest that the proposed tool can be applied to distinguish typical operational conditions and to analyze product grade transitions. We plan to use the results of the segmentations to design an intelligent query system for typical operating conditions in multivariate historical process databases.
References
1. K. Vasko, H. Toivonen, Estimating the number of segments in time series data
using permutation tests, IEEE International Conference on Data Mining (2002)
466–473.
2. M. Last, Y. Klein, A. Kandel, Knowledge discovery in time series databases, IEEE
Transactions on Systems, Man, and Cybernetics 31 (1) (2000) 160–169.
3. S. Kivikunnas, Overview of process trend analysis methods and applications, ERUDIT Workshop on Applications in Pulp and Paper Industry (1998) CD ROM.
4. M. E. Tipping, C. M. Bishop, Mixtures of probabilistic principal components anal-
ysis, Neural Computation 11 (1999) 443–482.
5. G. Stephanopoulos, C. Han, Intelligent systems in process engineering: A review,
Comput. Chem. Engng. 20 (1996) 743–791.
6. K. Chakrabarti, S. Mehrotra, Local dimensionality reduction: A new approach to
indexing high dimensional spaces, Proceedings of the 26th VLDB Conference Cairo
Egypt (2000) P089.