Time Series Mining by Fuzzy Natural Logic and F-Transform: March 2015
All content following this page was uploaded by Vilem Novak on 09 December 2015.
†) For a precise formulation of the interpretation of formulas (7) and (8), see the cited references. The evaluation is realized w.r.t. the standard context $\bar{w} = \langle 0, 0.5, 1 \rangle$.
‡) By abuse of language, we use the terms direct and inverse F-transform both for the procedures and for their respective results $\mathbf{F}[f] = (F_0[f], \ldots, F_n[f])$ and $\hat{f}$.

(b) The F-transform is linear, i.e., if $f = \alpha u + \beta v$ then $\hat{f} = \alpha \hat{u} + \beta \hat{v}$.

All the details and full proofs can be found in [17], [18].

3.3. Higher degree F-transform

The F-transform introduced above is the $F^0$-transform (i.e., the zero-degree F-transform). Its components are real numbers. If we replace them by polynomials of arbitrary degree $m \geq 0$, we arrive at the higher degree $F^m$-transform. This generalization is described in detail in [18]. Let us remark that the $F^1$-transform also makes it possible to estimate derivatives of the given function $f$ as weighted average values over a specified area.

The direct $F^1$-transform of $f$ with respect to $A_1, \ldots, A_{n-1}$ is a vector $\mathbf{F}^1[f] = (F_1^1[f], \ldots, F_{n-1}^1[f])$ where the components $F_k^1[f]$, $k = 1, \ldots, n-1$, are linear functions
\[ F_k^1[f](x) = \beta_k^0 + \beta_k^1 (x - c_k) \tag{14} \]
with the coefficients $\beta_k^0$, $\beta_k^1$ given by
\[ \beta_k^0 = \frac{\int_{c_{k-1}}^{c_{k+1}} f(x) A_k(x)\,dx}{\int_{c_{k-1}}^{c_{k+1}} A_k(x)\,dx}, \tag{15} \]
\[ \beta_k^1 = \frac{\int_{c_{k-1}}^{c_{k+1}} f(x)(x - c_k) A_k(x)\,dx}{\int_{c_{k-1}}^{c_{k+1}} (x - c_k)^2 A_k(x)\,dx}. \tag{16} \]
Note that $\beta_k^0 = F_k[f]$, i.e., the coefficients $\beta_k^0$ are just the components of the $F^0$-transform given in (13). The $F^1$-transform also has the properties stated in Theorem 1 (see [18]).

We will also use the $F^2$-transform. Its components are the functions
\[ F_k^2[f](x) = \beta_k^0 + \beta_k^1 (x - c_k) + \beta_k^2 \left( (x - c_k)^2 - \frac{h^2}{6} \right) \]
(provided that the basic functions are triangles).

Theorem 2
If $f$ is four-times continuously differentiable on $[a, b]$ then for each $k = 1, \ldots, n-1$,
\[ \beta_k^0 = f(c_k) + O(h^2), \tag{17} \]
\[ \beta_k^1 = f'(c_k) + O(h^2), \tag{18} \]
\[ \beta_k^2 = \frac{f''(c_k)}{2} + O(h^2). \tag{19} \]

Thus, the F-transform components provide a weighted average of values of the function $f$ in the area around the node $c_k$ (17), and also a weighted average of the slopes of $f$ (18) and of its second derivatives (19) in the same area.

Remark 1 (important)
It should be noted that only the nodes $c_1, \ldots, c_{n-1}$ should be considered when dealing with the F-transform, and the edge nodes $c_0$, $c_n$ should be omitted. The reason is that the areas $[c_0, c_1]$ and $[c_{n-1}, c_n]$ are covered only by halves of the basic functions $A_0$, $A_n$, respectively, and so the approximation of $f$ in these areas is subject to too large an error. Hence, we should consider the function $\hat{f}$ on the interval $[c_1, c_{n-1}]$ only.

4. Time series and F-transform

A time series is a stochastic process (see [21], [22]) $X : Q \times \Omega \to \mathbb{R}$ where $\Omega$ is a set of elementary random events and $Q = \{0, \ldots, p\} \subset \mathbb{N}$ is a finite set whose elements are interpreted as time moments. Our basic assumption is that the time series can be decomposed as follows:
\[ X(t, \omega) = TC(t) + \sum_{j=1}^{r} P_j e^{i(\lambda_j t + \varphi_j)} + R(t, \omega), \qquad t \in Q,\ \omega \in \Omega, \tag{20} \]
where $Q$ is a time domain (usually a finite set), $\Omega$ is a set of elementary random events, $TC(t)$ is a trend-cycle and $R(t, \omega)$ is a random noise†). The middle term is a mixture of complex periodic functions for some finite $r$, where $\lambda_j$ are frequencies, $\varphi_j$ are phase shifts and $P_j$ are amplitudes.
†) Each $R(t)$ for $t \in Q$ is a random variable with zero mean value and finite variance.

Let us now assume (without loss of generality) that the frequencies in (20) are ordered as $\lambda_1 \leq \cdots \leq \lambda_r$. These frequencies correspond to periodicities
\[ T_1 \geq \cdots \geq T_r \tag{21} \]
(via the equality $T = 2\pi/\lambda$). Now choose some $q < r$ and define a bounded time series
\[ \bar{X}(t) = TC(t) + \sum_{j=1}^{q-1} P_j e^{i(\lambda_j t + \varphi_j)}. \tag{22} \]

The following theorem demonstrates the power of the F-transform for time series analysis.

Theorem 3
Let $X(t)$ be a realization of the stochastic process in (20) considered over the interval $[a, b]$. If we construct a fuzzy partition over the set of equidistant nodes (11) with the distance $h = d\,T_q$, where $d \in \mathbb{N}$ and $T_q$ is a periodicity corresponding to $\lambda_q$, then the corresponding inverse F-transform $\hat{X}$ of $X(t)$ gives the following estimation of the bounded time series $\bar{X}(t)$:
\[ |\hat{X}(t) - \bar{X}(t)| \leq 2\omega(h, \bar{X}) + D \tag{23} \]
for $t \in [c_1, c_{n-1}]$, where $D$ for $d \geq 2$ is a certain small number and $\omega(h, \bar{X})$ is a modulus of continuity of $\bar{X}$ w.r.t. $h$.

The precise form of $D$ in (23) and a detailed proof of this theorem can be found in [19]. The theorem holds for the $F^0$-transform as well as for the $F^1$-transform.

The periodicity $T_q$ can be found using the well known periodogram (see [22]). Then, by setting a proper fuzzy partition, we first compute the F-transform of $X(t)$ (either zero or first degree)
\[ \mathbf{F}[X] = (F_1[X], \ldots, F_n[X]). \]
Then the estimation of the bounded time series $\bar{X}$ is obtained using the inverse F-transform: $\bar{X}(t) \approx \hat{X}(t)$. Thus, the F-transform makes it possible to filter out frequencies higher than a given threshold (including reduction of the noise).

Theorem 3 can be applied in several ways. First of all, by setting $q = 1$ (so that the sum in (22) is empty), we obtain $\hat{X} \approx TC$, which means that we can estimate the trend-cycle $TC$ with high precision. Further applications are described below.

5. Mining information

This section contains the main contribution of this paper. We will show that both mentioned techniques can be effectively used for mining information from time series.

5.1. Reduction of dimensionality

One of the first problems mentioned in the paper [1] is reduction of dimensionality of long time series. The paper cites several methods, mostly based on segmentation of the time series into intervals and taking some representative values (usually averages). In our opinion, a more effective tool can be based on Theorem 3 for the following reasons: (a) reduction of the dimension is accompanied by reduction of the noise and filtering out of some of the high frequencies contained in the series; (b) the reduced time series can be recovered at any time to a time series of the original dimension, but still with reduced noise and free of some high frequencies.

The reduction procedure is as follows. First we choose some boundary periodicity $T_q$ from (21). We will usually set either $q = r$ or take some $q$ close to $r$. Then we set the distance between nodes to $h = d\,T_q$ ($h \in \mathbb{N}$) and form a fuzzy partition (10). Finally, we compute the $F^1$-transform of $X$
\[ \mathbf{F}^1[X] = (F_1^1[X], \ldots, F_{n-1}^1[X]). \tag{24} \]
To find the optimal fuzzy partition we will choose a position of the nodes (11) in such a way that the sum $\sum_{k=1}^{n-1} \beta_k^1$ is minimal. This assures that we cover well the areas where the slope of the time series is (close to) zero. Finally, we replace the original time series $X$ by the reduced one
\[ X^{[h]} = \{\beta_1^0, \ldots, \beta_{n-1}^0\}. \tag{25} \]

This new time series has the following properties:
1) Its dimension $n - 1$ is significantly lower than the original dimension $p$. Of course, the reduction depends on the choice of the distance $h$ between the nodes determining the fuzzy partition.
2) By Theorem 1, it fits well the shape of the original time series $X$ and keeps all its salient points.
3) Due to Theorem 3, it is free of all frequencies higher than $\lambda_q$, and the variance of its noise is lower than the variance of the noise $R(t, \omega)$ of the original time series $X$.

A demonstration of dimensionality reduction of a time series is in Fig. 3. The original time series†) has length $p = 188$. The reduced time series has length $n - 1 = 62$. The distance between nodes is set to $h = 3$ on the basis of the lowest periodicity $T = 2.4$ found by the periodogram.
†) The data are monthly numbers of slaughtered pigs in Victoria in the years 1980-1995.

5.2. Perceptionally important points

An interesting problem also discussed in [1] is recognition of perceptionally important (salient) points. The paper mentions several techniques based on special criteria that can be summarized in the following definition: a perceptionally important point is a point where the time series essentially changes its course. Because of the complicated character following from the presence of various frequencies and noise, we cannot expect this to be just one isolated time point, but rather a certain area.

Since the F-transform provides estimation of the first and second derivative in a specified area, we propose to use it also for finding the above points. A possible algorithm is the following:
(i) Determine a fuzzy partition with the distance between nodes set to $h = d\,T_k$ for some $d \in \mathbb{N}$ and $k < q$. Find a position $P$ of the nodes (11) in such a way that the sum $\sum_{k=1}^{n-1} \beta_k^1$ is minimal. Compute the $F^2$-transform of $X$.
(ii) Find all nodes $c_k$ with the highest $|\beta_k^2|$. Mark such $k$s as potentially important points.
Note, however, that not all potentially important points may indeed be important. Their importance also depends on the width of the corresponding
Fig. 3. Demonstration of dimensionality reduction of a time series using F-transform with $h = 3$. Graph (a) is the original time series (consisting of 186 points) and its inverse $F^0$-transform. Graph (b) is the reduced time series consisting of only the components $\beta_1^0, \ldots, \beta_{62}^0$.
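The kind of reduction shown in Fig. 3 can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: integer time moments t = 0, 1, ..., a uniform triangular fuzzy partition with integer node distance h, nodes placed simply at c_k = k h (the optimization of node positions from Section 5.1 is not performed), and discrete sums in place of the integrals in (15). The function names are hypothetical and are not taken from LFL Forecaster.

```python
import numpy as np

def triangular_basis(t, c, h):
    """Triangular basic function centred at node c with support [c - h, c + h]."""
    return np.maximum(0.0, 1.0 - np.abs(t - c) / h)

def f0_components(x, h):
    """Discrete zero-degree F-transform of x sampled at t = 0, 1, ..., len(x) - 1.

    Returns the inner nodes c_1, ..., c_{n-1} and the components beta_k^0.
    The edge nodes c_0 and c_n are omitted, as Remark 1 recommends.
    """
    x = np.asarray(x, dtype=float)
    t = np.arange(len(x), dtype=float)
    n = (len(x) - 1) // h              # number of whole steps of width h
    nodes = h * np.arange(1, n)        # assumption: nodes at multiples of h
    beta0 = np.empty(len(nodes))
    for i, c in enumerate(nodes):
        a = triangular_basis(t, c, h)
        beta0[i] = np.sum(x * a) / np.sum(a)   # discrete analogue of (15)
    return nodes, beta0

def inverse_f0(nodes, beta0, t, h):
    """Inverse F-transform: sum of components weighted by the basic functions."""
    a = np.array([triangular_basis(t, c, h) for c in nodes])  # shape (n-1, len(t))
    return beta0 @ a
```

The reduced series is the returned `beta0`; calling `inverse_f0` on the original time grid recovers a smoothed series of the original length. In this sketch the inverse is meaningful only for t between the first and the last inner node, in line with Remark 1.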
basic function $A_k$. But this is relative to the length of $h$.
(iii) Check areas around the potentially important points with several (at least two) components $\beta_k^1, \beta_{k+1}^1, \ldots$ having the same sign, followed by one or more components $\beta_j^1$ that have the opposite sign and/or are sufficiently large.

An application of this algorithm to the Pigs time series is in Fig. 4.

5.3. Linguistic characterization of subsequent behavior

Let us consider a reduced time series $X^{[h]}$ given in (25). Using the learning method developed in FNL, we can learn a linguistic description characterizing the behavior of $X^{[h]}$ in various time moments on the basis of its values in previous time moments. In other words, we can learn how to forecast future components $\beta_k^0$ on the basis of their development in the past. As antecedent variables, we consider the $F^0$-transform components (17) as well as their first- and second-order differences:
\[ \Delta\beta_i^0 = \beta_i^0 - \beta_{i-1}^0, \qquad i = 1, \ldots, n-1, \]
\[ \Delta^2\beta_i^0 = \Delta\beta_i^0 - \Delta\beta_{i-1}^0, \qquad i = 2, \ldots, n-1, \]
respectively. To obtain the best fitting linguistic description, we may define a validation set of time points and test all possible combinations of linguistic descriptions w.r.t. the validation set so that the best combination can be chosen.

The outcome of this procedure is twofold:
(i) The automatically learned linguistic description gives information about the behavior of the time series in a given time period.
(ii) We can apply the learned linguistic description to forecast future components $(\beta_{n+1}^0, \ldots, \beta_{n+k}^0)$. This is especially interesting when we characterize the behavior of the trend-cycle $TC$ and try to forecast its future (unknown) development. This can be done using the perception-based logical deduction.

Recall that the trend-cycle $TC$ can be obtained from the time series on the basis of Theorem 3 when choosing $h = d\,T_1$, that is, the longest periodicity forming the seasonal component $S$ in (??). An example of separation of the trend is in Fig. 5. The figure also contains a forecast of $TC$ on the basis of the following learned linguistic description:

Rule   $\beta_k^0$   $\Delta\beta_k^0$   ⇒   $\beta_{k+1}^0$
1      ex bi         ra me                   qr bi
2      ro bi         -ml me                  qr bi
3      ro bi         -ex sm                  vr sm
4      ze            -ex bi                  vr sm
5      si sm         si sm                   ra me
6      ty me         ra me                   ra me
7      qr sm         -ml me                  ra me
8      ra me         qr sm                   ml me

(abbreviations used: sm-small, me-medium, bi-big, ex-extremely, ro-roughly, qr-quite roughly, vr-very roughly, ra-rather, ty-typically, si-significantly, ml-more or less). Note that the last 12 real values depicted in the figure were not used in the computation but only for comparison of the quality of prediction. The lower part of Fig. 5 contains a detail of the last 12 time points that shows the following: (a) the real trend-cycle $TC$ computed using F-transform from the real data is depicted together with its forecast obtained from
Fig. 4. Demonstration of perceptionally important points in the Pigs time series. The points with highest values of $|\beta_k^2|$ are marked by thick vertical lines. They evidently correspond to areas of more radical change.
Fig. 5. Demonstration of estimation of the trend-cycle $TC$ of a given time series using F-transform. The figure contains the used fuzzy partition and in the red (last) area both computed as well as forecasted $TC$, and also forecast of the whole time series for the next 12 time moments. Below is detail of the forecast.
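The node distance $h = d\,T_q$ behind a partition such as the one in Fig. 5 presupposes that the periodicities of the series are known; the text obtains them from a periodogram. A minimal Python sketch of that step follows. It assumes a unit sampling step and uses the squared magnitude of the discrete Fourier transform as the periodogram; the function name `dominant_periods` is hypothetical. Periods are reported in time steps, i.e. as 1/f for frequency f in cycles per step, which corresponds to $T = 2\pi/\lambda$ for the angular frequency $\lambda = 2\pi f$.

```python
import numpy as np

def dominant_periods(x, top=3):
    """Return the `top` strongest (period, power) pairs of a series.

    Assumes equidistant samples with unit time step.
    """
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                          # remove the mean so frequency 0 does not dominate
    power = np.abs(np.fft.rfft(x)) ** 2 / len(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0)    # frequencies in cycles per time step
    # Rank all non-zero frequencies by power, strongest first.
    order = np.argsort(power[1:])[::-1][:top] + 1
    return [(1.0 / freqs[i], power[i]) for i in order]
```

The longest of the detected periods would play the role of $T_1$ when the trend-cycle is to be extracted, and the shortest one the role of the boundary periodicity when only the noise and the highest frequencies are to be removed.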
the above linguistic description using the perception-based logical deduction, (b) the real time series and its forecast obtained by combination of the forecast of $TC$ and the forecast of the remaining seasonal component $S$‡) (cf. (??)).
‡) Its forecast was obtained using other means not discussed in this paper.
and a substantiated formalization was developed in a series of papers [8], [9], [10], [15].

5.4. Linguistic evaluation of local trend

Recall that the trend-cycle $TC$ of a time series $X$ is the component that represents variations of low frequency in a time series, the high frequency fluctuations having been filtered out. Thus, $TC$ has two subcomponents, namely trend (tendency) and cycle. Trend is a general direction of the time series, namely whether it is increasing, stagnating, or decreasing. The trend is, however, usually bound to a certain time interval and then changes. This means that besides trend there is also a certain cyclic characteristic that together with local trends forms the trend-cycle $TC$.

If a certain time interval is given, it may be interesting to learn what kind of trend can be recognized in it. Surprisingly, recognition of trend may not be a trivial task even when watching the graph (cf., e.g., the course of the time series in Fig. 5). Since the $F^1$-transform provides an estimation of the average slope (tangent), it is a convenient tool for estimation of the trend. To make it more informative for managers, it seems better to characterize it using natural language. For example, we can say “fairly large decrease (huge increase) of trend”, “the trend is stagnating (negligibly increasing)”, etc. These expressions characterize the trend (tendency) of the time series in an area specified by the user. In [23], a method for automatic generation of such linguistic evaluation of trend was described.

Algorithm:
(i) Specify what “extreme increase (decrease)” means. In practice, it can be determined as the largest acceptable difference of time series values with respect to a given (basic) time interval (for example 12 months, 31 days), that is, a minimal and maximal tangent. Usually, we set only the largest tangent $v_R$ while the smallest one is $v_L = 0$. The typical medium value $v_S$ is determined analogously to $v_R$. The result is the context $w_{tg} = \langle v_L, v_S, v_R \rangle$.
(ii) Specify an interesting time interval $I \subset Q$. Furthermore, compute a basic function $A$ whose base is $I$ (cf. Subsection 3.1) and compute the coefficient $\beta^1$ using formula (16).
(iii) Generate a linguistic evaluation of the trend of the time series $X$ in the area characterized by $A$ with respect to the context $w_{tg}$. The required evaluative expression $\mathcal{A}$ will be obtained using the function of local perception (21):
\[ \mathcal{A} = \mathrm{LPerc}(\beta^1, w_{tg}). \tag{26} \]

An example of linguistic evaluation of local trend in the context $w_{tg} = \langle 0, 1200/12, 3000/12 \rangle$ is in Fig. 6: negligible decrease in the time interval [17, 44], stagnating in [50, 97], clear decrease in [104, 114], somewhat decrease in [116, 126]. The trend of the whole time series is stagnating†).
†) The results were obtained using experimental software LFL Forecaster (see http://irafm.osu.cz/en/c110 lfl-forecaster/ ) which implements the described method.

5.5. Summarization

Summarization of knowledge about time series is among the interesting tasks that have been addressed by several authors (see, e.g., [24], [25]). In this paper, we suggest applying the sophisticated formal theory of intermediate quantifiers developed in fuzzy natural logic (see Subsection 2.5). We can summarize information within one time series or over a set of time series.

5.5.1. One time series. Examples of possible information mined from one time series:
• In most (many, few) cases, the time series was stagnating (slightly increasing, decreasing).
This can be formalized as $(Q^{\forall}_{Bi\,Ve}\, \bar{t}_i)(\mathrm{Stagn}(\bar{t}_i))$‡) where $\bar{t}_i$ is a certain time interval.
• In almost all (most, many, few) cases, if the time series is increasing (decreasing) then the increase (decrease) is very slight (sharp, clear).
This can be formalized as
\[ (Q^{\forall}_{Bi\,Ve}\, \bar{t}_i)(\mathrm{Increase}(\bar{t}_i),\ Sm\,Ve(\mathrm{Increase}(\bar{t}_i))) \]
(cf. (7)).
‡) This formula is a simplification of (7).

Algorithm:
(i) Divide a time series into intervals $\bar{t}_1, \ldots, \bar{t}_m \subset Q$. Their length may be set naturally, for example to 30 (month), 7 (week), etc. These intervals may (but need not) form a partition of $Q$.
(ii) Define basic functions $A_i$ over each $\bar{t}_i$ and compute the coefficients $\beta_i^1$ (16), $i = 1, \ldots, m$.
(iii) Using the function (26), generate linguistic expressions $\mathcal{A}_i$ evaluating the slope $\beta_i^1$ in the interval $\bar{t}_i$.
(iv) Compute the measure (9) by counting the number $s$ of intervals $\bar{t}_i$ in which the slope of the time series was evaluated in the sense of (iii), and evaluate the number $s/m$.

5.5.2. A set of time series. Let a set $\{X_i \mid i = 1, \ldots, s\}$ of time series (20) be given. Using the theory of intermediate quantifiers, we can model the meaning of sentences such as
• Most (many, few) analyzed time series stagnated recently but their future trend is slightly increasing.
• There is evidence of a huge (slight, clear) decrease of trend of almost all time series in the recent quarter of the year.

Another possibility is to mine interesting information from the given set of time series, summarize their properties and also summarize their possible future development. Namely, we start with analysis and forecasting of all the time series. Then we generate comments on interesting time slots, or we can also determine time slots in which the behavior of the time series is interesting for us, for example, “in which period was the time series sharply increasing”, “how long was the time series stagnating or decreasing before a sharp increase”, etc. Finally, we can summarize the results using intermediate quantifiers and derive further properties on the basis of valid syllogisms.

Moreover, we can also apply syllogistic reasoning with such expressions, for example

    In few cases the increase of time series is not small
    In many cases the increase of time series is clear
    ------------------------------------------------------------
    In few cases the clear increase of time series is not small

It is important to note that the latter is an example of a valid generalized Aristotle’s syllogism. Such a syllogism is true in all situations (models). Recall that in [9], [10], the validity of over 120 generalized syllogisms with intermediate quantifiers was proven.

6. Conclusion

In this paper, we focused on the problem of mining information from time series. There are many methods providing such information with various success and reliability (see [1] and the citations therein). Our main
Fig. 6. Local linguistic evaluation of the trend of time series in areas [17 − 44] (negligible decrease), [50 − 97]
(stagnating), [104 − 114] (clear decrease), [116 − 126] (somewhat decrease — forecast of the time series).
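The slope evaluation behind Fig. 6 and the counting step of the summarization algorithm in Subsection 5.5.1 can be illustrated by a short Python sketch. It is a sketch under explicit assumptions, not the implementation used in LFL Forecaster: time moments are integers, the basic function over an interval is a triangle whose base is that interval, the integrals of (16) are replaced by sums, and a plain user-supplied predicate on the slope stands in for the linguistic evaluation LPerc of (26), which is not reproduced here. The function names are hypothetical.

```python
import numpy as np

def interval_slope(x, start, end):
    """Discrete analogue of beta^1 in (16) for a triangular basic function on [start, end].

    Assumes integer time moments and an interval with at least three points.
    """
    x = np.asarray(x, dtype=float)
    t = np.arange(start, end + 1, dtype=float)
    c = (start + end) / 2.0                  # centre of the basic function
    half = (end - start) / 2.0               # half-width of its base
    a = np.maximum(0.0, 1.0 - np.abs(t - c) / half)
    return np.sum(x[start:end + 1] * (t - c) * a) / np.sum((t - c) ** 2 * a)

def share_of_intervals(x, intervals, predicate):
    """Fraction s/m of intervals whose slope satisfies `predicate`.

    `predicate` is a crisp stand-in (an assumption) for a linguistic evaluation.
    """
    slopes = [interval_slope(x, s, e) for s, e in intervals]
    hits = sum(1 for b in slopes if predicate(b))
    return hits / len(intervals)
```

For instance, `share_of_intervals(x, intervals, lambda b: abs(b) < 0.1)` gives the share of intervals with a nearly zero slope. The threshold 0.1 is an arbitrary illustrative value; in the described method the decision would depend on the context $w_{tg}$.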
contribution consists in the use of the coherent techniques of F-transform and fuzzy natural logic. With their help we are able to disclose various kinds of information and put them together using natural language. The strength and reliability of these techniques are based on their mathematically proved properties.

Our future work will focus on refinement and extension of the techniques presented above. Unfortunately, due to the space limit, we could not make a comparison with the other published methods overviewed in [1]. Of course, this is also one of our future tasks. We will also focus on comparison of time series using the dynamic time warping method [26] and on searching for characteristic sequences. Our goal is to develop algorithms for finding complete information about the behavior of time series and providing it in a concise and well understandable form, namely in natural language.

Acknowledgment

This paper was supported by the program MŠMT-KONTAKT II, project LH 12229. Additional support was given also by the European Regional Development Fund in the IT4Innovations Centre of Excellence project (CZ.1.05/1.1.00/02.0070).

References

[1] T.-C. Fu, “A review on time series data mining,” Engineering Applications of Artificial Intelligence, vol. 24, pp. 164–181, 2011.
[2] V. Novák, “A comprehensive theory of trichotomous evaluative linguistic expressions,” Fuzzy Sets and Systems, vol. 159, no. 22, pp. 2939–2969, 2008.
[3] V. Novák, “Mathematical fuzzy logic in modeling of natural language semantics,” in Fuzzy Logic – A Spectrum of Theoretical & Practical Issues, P. Wang, D. Ruan, and E. Kerre, Eds. Berlin: Elsevier, 2007, pp. 145–182.
[4] V. Novák, “Perception-based logical deduction,” in Computational Intelligence, Theory and Applications, B. Reusch, Ed. Berlin: Springer, 2005, pp. 237–250.
[5] V. Novák and S. Lehmke, “Logical structure of fuzzy IF-THEN rules,” Fuzzy Sets and Systems, vol. 157, pp. 2003–2029, 2006.
[6] V. Novák and I. Perfilieva, “On the semantics of perception-based fuzzy logic deduction,” International Journal of Intelligent Systems, vol. 19, pp. 1007–1031, 2004.
[7] A. Dvořák and M. Holčapek, “L-fuzzy quantifiers of the type ⟨1⟩ determined by measures,” Fuzzy Sets and Systems, vol. 160, pp. 3425–3452, 2009.
[8] V. Novák, “A formal theory of intermediate quantifiers,” Fuzzy Sets and Systems, vol. 159, no. 10, pp. 1229–1246, 2008.
[9] P. Murinová and V. Novák, “A formal theory of generalized intermediate syllogisms,” Fuzzy Sets and Systems, vol. 186, pp. 47–80, 2012.
[10] P. Murinová and V. Novák, “Structure of generalized intermediate syllogisms,” Fuzzy Sets and Systems, vol. 247, pp. 18–37, 2014.
[11] G. Klir and Y. Bo, Fuzzy Set Theory: Foundations and Applications. Upper Saddle River, NJ: Prentice Hall, 1995.
[12] V. Novák, M. Štěpnička, A. Dvořák, I. Perfilieva, V. Pavliska, and L. Vavříčková, “Analysis of seasonal time series using fuzzy approach,” Int. Journal of General Systems, vol. 39, pp. 305–328, 2010.
[13] V. Novák, “On modelling with words,” Int. J. of General Systems, vol. 42, pp. 21–40, 2013.
[14] P. Peterson, Intermediate Quantifiers. Logic, linguistics, and Aristotelian semantics. Aldershot: Ashgate, 2000.
[15] P. Murinová and V. Novák, “Analysis of generalized square of opposition with intermediate quantifiers,” Fuzzy Sets and Systems, vol. 242, pp. 89–113, 2014.
[16] M. Delgado, M. Ruiza, D. Sanchez, and M. Vila, “Fuzzy quantification: a state of the art,” Fuzzy Sets and Systems, vol. 242, pp. 1–30, 2014.
[17] I. Perfilieva, “Fuzzy transforms: theory and applications,” Fuzzy Sets and Systems, vol. 157, pp. 993–1023, 2006.
[18] I. Perfilieva, M. Daňková, and B. Bede, “Towards a higher degree F-transform,” Fuzzy Sets and Systems, vol. 180, pp. 3–19, 2011.
[19] V. Novák, I. Perfilieva, M. Holčapek, and V. Kreinovich, “Filtering out high frequencies in time series using F-transform,” Information Sciences, vol. 274, pp. 192–209, 2014.
[20] V. Kreinovich and I. Perfilieva, “Fuzzy transforms of higher order approximate derivatives: A theorem,” Fuzzy Sets and Systems, vol. 180, pp. 55–68, 2011.
[21] J. Anděl, Statistical Analysis of Time Series. Praha: SNTL, 1976 (in Czech).
[22] J. Hamilton, Time Series Analysis. Princeton University Press: Princeton, 1994.
[23] V. Novák, V. Pavliska, I. Perfilieva, and M. Štěpnička, “F-transform and fuzzy natural logic in time series analysis,” in Proc. Int. Conference EUSFLAT-LFA’2013, Milano, Italy, 2013.
[24] R. Castillo-Ortega, N. Marín, and D. Sánchez, “A fuzzy approach to the linguistic summarization of time series,” Multiple-Valued Logic and Soft Computing, vol. 17, no. 2-3, pp. 157–182, 2011.
[25] J. Kacprzyk, A. Wilbik, and Zadrożny, “Linguistic summarization of time series using a fuzzy quantifier driven aggregation,” Fuzzy Sets and Systems, vol. 159, pp. 1485–1499, 2008.
[26] D. Berndt and J. Clifford, “Finding patterns in time series: a dynamic programming approach,” Advances in Knowledge Discovery and Data Mining, pp. 229–248, 1996.