Time Series Mining by Fuzzy Natural Logic and F-Transform: March 2015


See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/283580939

Time Series Mining by Fuzzy Natural Logic and F-Transform

Article · March 2015


DOI: 10.1109/HICSS.2015.181


All content following this page was uploaded by Vilem Novak on 09 December 2015.



Time Series Mining by Fuzzy Natural Logic and F-Transform

Vilém Novák and Irina Perfilieva


University of Ostrava, Institute for Research and Applications of Fuzzy Modeling
NSC IT4Innovations, 30. dubna 22, 701 03 Ostrava 1, Czech Republic
Email: {Vilem.Novak,Irina.Perfilieva}@osu.cz

Abstract

In this paper, we discuss the application of special soft computing methods, namely fuzzy natural logic and the fuzzy transform, to the problem of mining information from time series. The mined information is formulated in natural language. We discuss the following applications: reduction of the size of a time series, extraction of its trend-cycle, linguistic characterization of its future course and forecast of future values, finding perceptionally important points, extraction and linguistic evaluation of a local trend in a given area, and summarization of characteristics of time series using generalized intermediate quantifiers.

1. Introduction

This paper is focused on mining information from time series. Our presentation follows the overview given in [1], where many specific problems and their solutions using various techniques are presented. In this paper, we propose to solve these problems using techniques based on the theory of the fuzzy (F-)transform and fuzzy natural logic (FNL). While the former makes it possible to analyze and process time series, the latter provides methods for automatic generation of comments in natural language.

More concretely, we apply the F-transform to the following tasks: reduction of the size of the time series, extraction of its trend-cycle, and finding perceptionally important points. Using methods of FNL, we can also automatically generate a linguistic characterization of the future course of the trend-cycle and forecast both the trend-cycle and the values of the time series. Furthermore, since the F-transform provides an estimate of the first derivative, it also gives us an estimate of a local trend in a specified area. Then, again using methods of FNL, we can generate its linguistic evaluation. Finally, we can also apply the theory of intermediate quantifiers (part of FNL) to summarize characteristics of time series. The related theory of generalized syllogisms enables us to learn more about the behavior of sets of time series.

2. Elements of fuzzy natural logic

Fuzzy natural logic is a formal mathematical theory that consists of:
(a) A formal theory of evaluative linguistic expressions, explained in detail in [2] (see also [3]).
(b) A formal theory of fuzzy IF-THEN rules and approximate reasoning, presented in [4], [5], [6].
(c) A formal theory of intermediate and generalized fuzzy quantifiers, presented in [7], [8], [9], [10].

2.1. Evaluative linguistic expressions

The central role in all these theories is played by the theory of evaluative linguistic expressions. These are expressions of the general form

    ⟨linguistic modifier⟩⟨TE-adjective⟩    (1)

where ⟨TE-adjective⟩∗) is one of the adjectives “small, medium, big” (and possibly other specific adjectives, especially the so-called gradable or evaluative ones), or “zero”, as well as an arbitrary symmetric fuzzy number. The ⟨linguistic modifier⟩ is a special expression that belongs to a wider linguistic phenomenon called hedging and that specifies more closely the topic of the utterance. In our case, the linguistic modifier makes the meaning of the ⟨TE-adjective⟩ more specific. Quite often it is represented by an intensifying adverb such as “very, roughly, approximately, significantly”, etc. Linguistic modifiers can have a narrowing effect (“extremely, significantly, very, typically”) or a widening effect (“more or less, roughly, quite roughly, very roughly”) on the meaning of the ⟨TE-adjective⟩.

∗) “TE” is short for “trichotomic evaluative”.

If the ⟨linguistic hedge⟩ is not present (expressions such as “weak, large”, etc.), then we take it as the presence of an empty linguistic hedge. Thus, all the simple evaluative expressions have the same form (1). Since they characterize values on an ordered scale, we may also consider scales divided into two parts that are usually interpreted as positive and negative. Hence, the evaluative expressions may also have a sign, namely “positive” or “negative”.

Simple evaluative expressions of the form (1) can also be combined using logical connectives (usually “and” and “or”) to obtain compound ones. A limited usage of the particle “not” is also possible. Let us emphasize, however, that syntactic and semantic limitations of natural language prevent the compound evaluative expressions from forming a Boolean algebra!

We distinguish abstract evaluative expressions from more specific evaluative predications. The latter are expressions of natural language of the form ‘X is A’, where A is an evaluative expression and X is a variable which stands for objects, for example “degrees of temperature, height, length, speed”, etc. Examples are “temperature is high”, “speed is extremely low”, “quality is very high”, etc. In general, the variable X represents certain features of objects such as “size, volume, force, strength,” etc., and so its values are often real numbers.

Fig. 1. Shapes of extensions of some evaluative expressions in the context ⟨0, 0.5, 1⟩. The hedges are {Extremely, Significantly, Very, empty hedge} for “small” and “big” and {More-or-Less, Roughly, Quite Roughly, Very Roughly} for “small”, “medium”, and “big”.
An important notion is that of linguistic context. In our theory, it is a triple of (real) numbers $w = \langle v_L, v_S, v_R \rangle$, where $v_L$ is the leftmost typically small value, $v_S$ is a typically medium value, and $v_R$ is the rightmost typically big value. For example, when speaking about the temperature of water, we may set $v_L = 15\,^{\circ}$C (the temperature of tap water), $v_S = 50\,^{\circ}$C and $v_R = 100\,^{\circ}$C. In the sequel, we will consider the set of all linguistic contexts

$$W = \{\, w = \langle v_L, v_S, v_R \rangle \mid v_L, v_S, v_R \in \mathbb{R},\ v_L < v_S < v_R \,\}. \tag{2}$$

An element $x$ belongs to a context $w \in W$ if $x \in [v_L, v_R]$. Then we write $x \in w$.

The meaning of an evaluative linguistic expression and predication is represented by its intension

$$\mathrm{Int}(X \text{ is } A) : W \longrightarrow \mathcal{F}(\mathbb{R}) \tag{3}$$

where $\mathcal{F}(\mathbb{R})$ is the set of all fuzzy sets on $\mathbb{R}$. For each context $w \in W$, the extension $\mathrm{Ext}_w(X \text{ is } A)$ is a specific fuzzy set on $\mathbb{R}$. Examples of extensions of several evaluative linguistic expressions are shown in Fig. 1. Let us emphasize that their shapes have been established on the basis of a logical analysis of the meaning of the corresponding evaluative expressions (for the details, see [2]).

2.2. Linguistic description

Evaluative linguistic predications are the basic constituents of fuzzy/linguistic IF-THEN rules, which are special conditional clauses of natural language. A set of such rules is called a linguistic description, that is, a finite set of fuzzy/linguistic IF-THEN rules

$$\begin{aligned} R_1 &= \text{IF } X \text{ is } A_1 \text{ THEN } Y \text{ is } B_1, \\ &\;\;\vdots \\ R_m &= \text{IF } X \text{ is } A_m \text{ THEN } Y \text{ is } B_m \end{aligned} \tag{4}$$

where “$X$ is $A_j$”, “$Y$ is $B_j$”, $j = 1, \dots, m$, are evaluative linguistic predications. The linguistic description can be understood as a specific kind of (structured) text that can be used for the description of various situations and processes.

2.3. Perception-based logical deduction

A linguistic description taken as a special text requires a special inference method, namely Perception-based Logical Deduction (PbLD). This inference method works with genuine evaluative linguistic expressions and is based on formal properties of mathematical fuzzy logic (see [3], [4], [6]). The method is based on local properties of the linguistic description, so that we distinguish the rules as such but at the same time deal with them as vague expressions of natural language. PbLD has nothing in common with the classical Mamdani inference (cf., e.g., [11]).

PbLD requires a defuzzification method called DEE (Defuzzification of Evaluative Expressions). Its variant realized using the F-transform is called smooth DEE (see [6]).

To demonstrate PbLD, let us consider the following linguistic description:

$R_1$ = IF $X$ is small THEN $Y$ is small,
$R_2$ = IF $X$ is medium THEN $Y$ is big,    (5)
$R_3$ = IF $X$ is big THEN $Y$ is small.

This description linguistically characterizes a function that has small functional values on the left and right sides of the graph and big ones in the middle. The result obtained using the PbLD method is depicted in part (a) of Fig. 2. Part (b) shows the extensions of the used evaluative expressions in the context ⟨0, 0.4, 1⟩. To see the difference from Mamdani’s method, we depict in Fig. 2(c), (d) the result obtained from (5) using Mamdani’s method on the basis of triangular fuzzy sets, which are often (incorrectly) considered in the literature as extensions of evaluative expressions. The reason why Mamdani’s method does not work in this case is that it provides a very good approximation of a function, but it is not a logical inference suitable for manipulation with linguistic expressions.

Fig. 2. (a) A function obtained from the simple linguistic description (5) using the PbLD method with smooth DEE defuzzification. (b) Extensions of the used evaluative expressions “small–medium–big” in the context ⟨0, 0.4, 1⟩. (c) A function obtained using the Mamdani-COG method from a linguistic description of the form (5) interpreted as a fuzzy relation constructed using the triangular membership functions depicted in (d).

2.4. Learning of linguistic description

In applications of the above methods in time series mining, we use a learning procedure developed in FNL (cf. [12], [13]), using which we can learn linguistic predications with respect to a given context $w$ and, consequently, also linguistic descriptions of the form (4). The learning procedure is realized by implementing a function of local perception

$$\mathrm{LPerc}(x, w) = A \tag{6}$$

where $w \in W$ is a given context and $x \in w$ is a given value. The linguistic expression $A$ is an evaluative one that characterizes the value $x$ in the given context $w$. For example, the value $x = 0.15$ in the context $w = \langle 0, 4, 10 \rangle$ is evaluated by the evaluative expression “very small”.

Thus, the data

$$\begin{pmatrix} u_{11} & u_{12} & \dots & u_{1c} & v_1 \\ u_{21} & u_{22} & \dots & u_{2c} & v_2 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ u_{m1} & u_{m2} & \dots & u_{mc} & v_m \end{pmatrix}$$

can be transformed into a linguistic description consisting of $m$ fuzzy/linguistic IF-THEN rules of the form

    IF $X_1$ is $A_1$ AND $\cdots$ AND $X_c$ is $A_c$ THEN $Y$ is $B$.

2.5. Intermediate quantifiers

Intermediate quantifiers are expressions of natural language such as most, many, almost all, a few, a large part of, etc. An analysis of their meaning reveals that they refine quantification and that their meaning lies between the limit cases for all (∀) and exists (∃). A detailed analysis of their meaning was presented in [14]. A mathematical model of their meaning was developed using means of higher-order fuzzy logic in a series of papers [8], [9], [10], [15]. Let us remark that these quantifiers belong among fuzzy generalized quantifiers (cf. [16]).

Without formal details, let us mention a few models of type ⟨1, 1⟩ intermediate quantifiers studied in the mentioned papers:

A: All B are A := $Q^{\forall}_{Bi\,\boldsymbol{\Delta}}(B, A)$ (the biggest possible part of B has A)
P: Almost all B are A := $Q^{\forall}_{Bi\,Ex}(B, A)$ (extremely big part of B has A)
B: Few B are A := $Q^{\forall}_{Bi\,Ex}(B, \neg A)$ (extremely big part of B has no A)
T: Most B are A := $Q^{\forall}_{Bi\,Ve}(B, A)$ (very big part of B has A)
K: Many B are A := $Q^{\forall}_{\neg(Sm\,\bar{\nu})}(B, A)$ (not small part of B has A)
I: Some B are A := $Q^{\exists}_{Bi\,\boldsymbol{\Delta}}(B, A)$
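The informal readings in the list above (for example, “extremely big part of B has A”) can be illustrated in the simplest crisp case. The following is a minimal sketch and not the formal definition from the cited papers: it assumes that B and A are crisp finite sets, so that the largest part of B having A is B ∩ A, that the size of this part is its relative cardinality with respect to B, and that the evaluative expression is replaced by a nondecreasing piecewise-linear stand-in. The function names and the breakpoints 0.6/0.9 and 0.8/1.0 are hypothetical and chosen only for illustration.

```python
def relative_size(z, b):
    """Relative cardinality of a crisp set z (assumed z <= b) with respect to b."""
    if z == b:
        return 1.0
    return len(z) / len(b)


def ramp(lo, hi):
    """Nondecreasing stand-in for an evaluative expression such as 'very big'.

    Hypothetical shape: 0 up to lo, linear between lo and hi, 1 from hi on.
    """
    def ev(r):
        if r <= lo:
            return 0.0
        if r >= hi:
            return 1.0
        return (r - lo) / (hi - lo)
    return ev


def quantifier_degree(b, a, ev):
    """Degree to which 'an Ev part of B has A' for crisp finite sets b and a."""
    z = b & a  # largest crisp part of b all of whose elements have property a
    return ev(relative_size(z, b))


very_big = ramp(0.6, 0.9)        # illustrative stand-in used for "most"
extremely_big = ramp(0.8, 1.0)   # illustrative stand-in used for "almost all"

B = set(range(20))               # 20 analysed objects
A = set(range(2, 30))            # objects with property A; 18 of them lie in B

most = quantifier_degree(B, A, very_big)
almost_all = quantifier_degree(B, A, extremely_big)
print(most, almost_all)
```

With these toy sets, 18 of the 20 elements of B have A, so “most B are A” holds fully under the stand-in, while “almost all B are A” holds only to a degree of about one half.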
The meaning of the above quoted quantifiers is characterized by the formulas

$$(Q^{\forall}_{Ev}\, x)(B, A) := (\exists z)\big((\boldsymbol{\Delta}(z \subseteq B) \mathbin{\&} (\forall x)(z\,x \Rightarrow A\,x)) \wedge Ev((\mu B)z)\big), \tag{7}$$

$$(Q^{\exists}_{Ev}\, x)(B, A) := (\exists z)\big((\boldsymbol{\Delta}(z \subseteq B) \mathbin{\&} (\exists x)(z\,x \wedge A\,x)) \wedge Ev((\mu B)z)\big). \tag{8}$$

The interpretation of these formulas is simple (informally): we take the largest fuzzy set $z \subseteq B$ such that each element $x$ having the property represented by $z$ (and so, having also the property $B$) has also the property $A$ and, at the same time, the size of $z$ w.r.t. $B$ is evaluated by the expression $Ev$ (for example, it can be very big, not small, etc.). The size of $z$ w.r.t. $B$ is mathematically characterized by a measure $(\mu B)z$ given, for example, by the formula

$$(\mu B)z = \begin{cases} 1 & \text{if } z = B, \\ \dfrac{|z|}{|B|} & \text{otherwise,} \end{cases} \tag{9}$$

where $|B| = \sum_x B\,x$ †).

†) For a precise formulation of the interpretation of formulas (7) and (8), see the cited references. The evaluation is realized w.r.t. the standard context $\bar{w} = \langle 0, 0.5, 1 \rangle$.

3. The principle of F-transform

The fuzzy (F-)transform is a universal technique introduced in [17], [18] that has many kinds of applications. Its fundamental idea is to map a bounded continuous function $f : [a, b] \longrightarrow \mathbb{R}$ to a finite vector of numbers and then to transform it back. The former is called a direct F-transform and the latter an inverse one. The result of the inverse F-transform is a function $\hat{f}$ that approximates the original function $f$. Parameters of the F-transform can be set in such a way that the approximating function $\hat{f}$ has desired properties.

The power of the F-transform stems from its proven approximation abilities, its ability to filter out high frequencies and reduce noise [19], and its ability to estimate average values of first and second derivatives in a given area [20].

3.1. Fuzzy partition

The first step of the F-transform procedure is to form a fuzzy partition of the domain $[a, b]$. It consists of a finite set of fuzzy sets

$$\mathcal{A} = \{A_0, \dots, A_n\}, \quad n \geq 2, \tag{10}$$

defined over nodes

$$a = c_0, \dots, c_n = b. \tag{11}$$

Properties of the fuzzy sets from $\mathcal{A}$ are specified by five axioms, namely: normality, locality, continuity, unimodality, and orthogonality, the last of which is formally defined by

$$\sum_{i=0}^{n} A_i(x) = 1, \quad x \in [a, b] \tag{12}$$

(equation (12) is sometimes called the Ruspini condition).

A fuzzy partition $\mathcal{A}$ is called h-uniform if the nodes $c_0, \dots, c_n$ are h-equidistant, i.e., for all $k = 0, \dots, n-1$, $c_{k+1} = c_k + h$, where $h = (b - a)/n$, and the fuzzy sets $A_1, \dots, A_{n-1}$ are shifted copies of a generating function $A : [-1, 1] \longrightarrow [0, 1]$ such that for all $k = 1, \dots, n-1$

$$A_k(x) = A\!\left(\frac{x - c_k}{h}\right), \quad x \in [c_{k-1}, c_{k+1}]$$

(for $k = 0$ and $k = n$ we consider only half of the function $A$, i.e., restricted to the interval $[0, 1]$ and $[-1, 0]$, respectively). The membership functions $A_0, \dots, A_n$ of the fuzzy sets forming the fuzzy partition $\mathcal{A}$ are usually called basic functions.

3.2. Zero degree F-transform

Once the fuzzy partition $A_0, \dots, A_n \in \mathcal{A}$ is determined, we define a direct F-transform of a continuous function $f$ as a vector $\mathbf{F}[f] = (F_0[f], \dots, F_n[f])$, where each $k$-th component $F_k[f]$ is equal to

$$F_k[f] = \frac{\int_a^b f(x) A_k(x)\,dx}{\int_a^b A_k(x)\,dx}, \quad k = 0, \dots, n. \tag{13}$$

Clearly, the component $F_k[f]$ is a weighted average of the functional values $f(x)$, where the weights are the membership degrees $A_k(x)$. The inverse F-transform of $f$ with respect to $\mathbf{F}[f]$ is a continuous function‡) $\hat{f} : [a, b] \longrightarrow \mathbb{R}$ such that

$$\hat{f}(x) = \sum_{k=0}^{n} F_k[f] \cdot A_k(x), \quad x \in [a, b].$$

‡) By abuse of language, we use the terms direct and inverse F-transform both for the procedure and for its respective results $\mathbf{F}[f] = (F_0[f], \dots, F_n[f])$ and $\hat{f}$.

Theorem 1

The inverse F-transform $\hat{f}$ has the following properties:
(a) The sequence of inverse F-transforms $\{\hat{f}_n\}$ determined by a sequence of uniform fuzzy partitions based on uniformly distributed nodes with $h = (b - a)/n$ uniformly converges to $f$ for $n \to \infty$.
(b) The F-transform is linear, i.e., if $f = \alpha u + \beta v$ then $\hat{f} = \alpha \hat{u} + \beta \hat{v}$.

All the details and full proofs can be found in [17], [18].

3.3. Higher degree F-transform

The F-transform introduced above is the $F^0$-transform (i.e., the zero-degree F-transform). Its components are real numbers. If we replace them by polynomials of arbitrary degree $m \geq 0$, we arrive at the higher degree $F^m$-transform. This generalization has been described in detail in [18]. Let us remark that the $F^1$-transform also makes it possible to estimate derivatives of the given function $f$ as weighted average values over a specified area.

The direct $F^1$-transform of $f$ with respect to $A_1, \dots, A_{n-1}$ is a vector $\mathbf{F}^1[f] = (F_1^1[f], \dots, F_{n-1}^1[f])$, where the components $F_k^1[f]$, $k = 1, \dots, n-1$, are linear functions

$$F_k^1[f](x) = \beta_k^0 + \beta_k^1 (x - c_k) \tag{14}$$

with the coefficients $\beta_k^0$, $\beta_k^1$ given by

$$\beta_k^0 = \frac{\int_{c_{k-1}}^{c_{k+1}} f(x) A_k(x)\,dx}{\int_{c_{k-1}}^{c_{k+1}} A_k(x)\,dx}, \tag{15}$$

$$\beta_k^1 = \frac{\int_{c_{k-1}}^{c_{k+1}} f(x)(x - c_k) A_k(x)\,dx}{\int_{c_{k-1}}^{c_{k+1}} (x - c_k)^2 A_k(x)\,dx}. \tag{16}$$

Note that $\beta_k^0 = F_k[f]$, i.e., the coefficients $\beta_k^0$ are just the components of the $F^0$-transform given in (13). The $F^1$-transform also has the properties stated in Theorem 1 (see [18]).

We will also use the $F^2$-transform. Its components are the functions

$$F_k^2[f](x) = \beta_k^0 + \beta_k^1 (x - c_k) + \beta_k^2 \left( (x - c_k)^2 - \frac{h^2}{6} \right)$$

(provided that the basic functions are triangles).

Theorem 2

If $f$ is four-times continuously differentiable on $[a, b]$, then for each $k = 1, \dots, n-1$,

$$\beta_k^0 = f(c_k) + O(h^2), \tag{17}$$
$$\beta_k^1 = f'(c_k) + O(h^2), \tag{18}$$
$$\beta_k^2 = \frac{f''(c_k)}{2} + O(h^2). \tag{19}$$

Thus, the F-transform components provide a weighted average of values of the function $f$ in the area around the node $c_k$ (17), and also a weighted average of slopes (18) of $f$ and that of its second derivatives (19) in the same area.

Remark 1 (important)

It should be noted that only the nodes $c_1, \dots, c_{n-1}$ should be considered when dealing with the F-transform, and the edge nodes $c_0$, $c_n$ should be omitted. The reason is that the areas $[c_0, c_1]$ and $[c_{n-1}, c_n]$ are covered by halves of the basic functions $A_0$, $A_n$, respectively, and so the approximation of $f$ in these areas is subject to too large an error. Hence, we should consider the function $\hat{f}$ on the interval $[c_1, c_{n-1}]$ only.

4. Time series and F-transform

A time series is a stochastic process (see [21], [22]) $X : Q \times \Omega \longrightarrow \mathbb{R}$, where $\Omega$ is a set of elementary random events and $Q = \{0, \dots, p\} \subset \mathbb{N}$ is a finite set whose elements are interpreted as time moments. Our basic assumption is that the time series can be decomposed as follows:

$$X(t, \omega) = TC(t) + \sum_{j=1}^{r} P_j e^{i(\lambda_j t + \varphi_j)} + R(t, \omega), \quad t \in Q,\ \omega \in \Omega, \tag{20}$$

where $Q$ is a time domain (usually a finite set), $\Omega$ is a set of elementary random events, $TC(t)$ is a trend-cycle and $R(t, \omega)$ is a random noise†). The middle term is a mixture of complex periodic functions for some finite $r$, where $\lambda_j$ are frequencies, $\varphi_j$ phase shifts and $P_j$ amplitudes.

†) Each $R(t)$ for $t \in Q$ is a random variable with zero mean value and finite variance.

Let us now assume (without loss of generality) that the frequencies in (20) are ordered as $\lambda_1 \leq \cdots \leq \lambda_r$. These frequencies correspond to periodicities

$$T_1 \geq \cdots \geq T_r \tag{21}$$

(via the equality $T = 2\pi/\lambda$). Now choose some $q < r$ and define a bounded time series

$$\bar{X}(t) = TC(t) + \sum_{j=1}^{q-1} P_j e^{i(\lambda_j t + \varphi_j)}. \tag{22}$$

The following theorem demonstrates the power of the F-transform for time series analysis.

Theorem 3

Let $X(t)$ be a realization of the stochastic process in (20) considered over the interval $[a, b]$. If we construct a fuzzy partition over the set of equidistant nodes (11) with the distance $h = d\,T_q$, where $d \in \mathbb{N}$ and $T_q$ is a periodicity corresponding to $\lambda_q$, then the corresponding inverse F-transform $\hat{X}$ of $X(t)$ gives the following estimation of the bounded time series $\bar{X}(t)$:

$$|\hat{X}(t) - \bar{X}(t)| \leq 2\omega(h, \bar{X}) + D \tag{23}$$
for $t \in [c_1, c_{n-1}]$, where $D$ for $d \geq 2$ is a certain small number and $\omega(h, \bar{X})$ is a modulus of continuity of $\bar{X}$ w.r.t. $h$.

The precise form of $D$ in (23) and a detailed proof of this theorem can be found in [19]. The theorem holds for both the $F^0$- and the $F^1$-transform.

The periodicity $T_q$ can be found using the well-known periodogram (see [22]). Then, by setting a proper fuzzy partition, we first compute the F-transform of $X(t)$ (either zero or first degree)

$$\mathbf{F}[X] = (F_1[X], \dots, F_n[X]).$$

Then the estimation of the bounded time series $\bar{X}$ is obtained using the inverse F-transform: $\bar{X}(t) \approx \hat{X}(t)$. Thus, the F-transform makes it possible to filter out frequencies higher than a given threshold (including reduction of the noise).

Theorem 3 can be applied in several ways. First of all, by setting $q = 1$, we obtain $\hat{X} \approx TC$, which means that we can estimate the trend-cycle $TC$ with high precision. Further applications are described below.

5. Mining information

This section contains the main contribution of this paper. We will show that both mentioned techniques can be effectively used for mining information from time series.

5.1. Reduction of dimensionality

One of the first problems mentioned in the paper [1] is the reduction of dimensionality of long time series. The paper cites several methods, mostly based on segmentation of the time series into intervals and taking some representative values (usually averages). In our opinion, a more effective tool can be based on Theorem 3 for the following reasons: (a) reduction of the dimension is accompanied by reduction of the noise and filtering out some of the high frequencies contained in it; (b) the reduced time series can be recovered at any time to a time series of the original dimension, but still with reduced noise and free of some high frequencies.

The reduction procedure is as follows. First we choose some boundary periodicity $T_q$ from (21). We will usually set either $q = r$ or take some $q$ close to $r$. Then we set the distance between nodes to $h = dT_q$ ($h \in \mathbb{N}$) and form a fuzzy partition (10). Finally, we compute the $F^1$-transform of $X$

$$\mathbf{F}^1[X] = (F_1^1[X], \dots, F_{n-1}^1[X]). \tag{24}$$

To find the optimal fuzzy partition, we choose the position of the nodes (11) in such a way that the sum $\sum_{k=1}^{n-1} \beta_k^1$ is minimal. This ensures that we cover well the areas where the slope of the time series is (close to) zero. Finally, we replace the original time series $X$ by the reduced one

$$X^{[h]} = \{\beta_1^0, \dots, \beta_{n-1}^0\}. \tag{25}$$

This new time series has the following properties:

1) Its dimension $n - 1$ is significantly lower than the original dimension $p$. Of course, the reduction depends on the choice of the distance $h$ between the nodes determining the fuzzy partition.
2) By Theorem 1, it fits well the shape of the original time series $X$ and keeps all its salient points.
3) Due to Theorem 3, it is free of all frequencies higher than $\lambda_q$, and the variance of its noise is lower than the variance of the noise $R(t, \omega)$ of the original time series $X$.

A demonstration of dimensionality reduction of a time series is shown in Fig. 3. The original time series†) has length $p = 188$. The reduced time series has length $n - 1 = 62$. The distance between nodes is set to $h = 3$ on the basis of the lowest periodicity $T = 2.4$ found by the periodogram.

†) The data are monthly numbers of slaughtered pigs in Victoria in the years 1980-1995.

5.2. Perceptionally important points

An interesting problem also discussed in [1] is the recognition of perceptionally important (salient) points. The paper mentions several techniques based on special criteria that can be summarized in the following definition: a perceptionally important point is a point where the time series essentially changes its course. Because of the complicated character following from the presence of various frequencies and noise, we cannot expect that this is just one isolated time point, but rather a certain area.

Since the F-transform provides an estimate of the first and second derivative in a specified area, we propose to use it also for finding the above points. A possible algorithm is the following:

(i) Determine a fuzzy partition with the distance between nodes set to $h = dT_k$ for some $d \in \mathbb{N}$ and $k < q$. Find the position of the nodes (11) in such a way that the sum $\sum_{k=1}^{n-1} \beta_k^1$ is minimal. Compute the $F^2$-transform of $X$.
(ii) Find all nodes $c_k$ with the highest $|\beta_k^2|$. Mark such $k$s as potentially important points.
     Note, however, that not all potentially important points may indeed be important. Their importance also depends on the width of the corresponding
     basic function $A_k$. But this is relative to the length of $h$.
(iii) Check areas around the potentially important points with several (at least two) components $\beta_k^1, \beta_{k+1}^1, \dots$ having the same sign followed by one or more components $\beta_j^1$ that have the opposite sign and/or are sufficiently large.

An application of this algorithm to the Pigs time series is shown in Fig. 4.

Fig. 3. Demonstration of dimensionality reduction of a time series using the F-transform with $h = 3$. Graph (a) is the original time series (consisting of 186 points) and its inverse $F^0$-transform. Graph (b) is the reduced time series consisting of only the components $\beta_1^0, \dots, \beta_{62}^0$.

5.3. Linguistic characterization of subsequent behavior

Let us consider a reduced time series $X^{[h]}$ given in (25). Using the learning method developed in FNL, we can learn a linguistic description characterizing the behavior of $X^{[h]}$ in various time moments on the basis of its values in previous time moments. In other words, we can learn how to forecast future components $\beta_k^0$ on the basis of their development in the past. As antecedent variables, we consider the $F^0$-transform components (17) as well as their first- and second-order differences:

$$\Delta\beta_i^0 = \beta_i^0 - \beta_{i-1}^0, \quad i = 1, \dots, n-1,$$
$$\Delta^2\beta_i^0 = \Delta\beta_i^0 - \Delta\beta_{i-1}^0, \quad i = 2, \dots, n-1,$$

respectively. To obtain the best fitting linguistic description, we may define a validation set of time points and test all possible combinations of linguistic descriptions w.r.t. the validation set, so that the best combination can be chosen.

The outcome of this procedure is twofold:

(i) The automatically learned linguistic description gives information about the behavior of the time series in a given time period.
(ii) We can apply the learned linguistic description to forecast future components $(\beta_{n+1}^0, \dots, \beta_{n+k}^0)$. This is especially interesting when we characterize the behavior of the trend-cycle $TC$ and try to forecast its future (unknown) development. This can be done using perception-based logical deduction.

Recall that the trend-cycle $TC$ can be obtained from the time series on the basis of Theorem 3 when choosing $h = dT_1$, that is, the longest periodicity forming the seasonal component $S$ in (??). An example of separation of the trend is shown in Fig. 5. The figure also contains a forecast of $TC$ on the basis of the following learned linguistic description:

Rule   $\beta_k^0$   $\Delta\beta_k^0$   ⇒   $\beta_{k+1}^0$
1      ex bi         ra me                   qr bi
2      ro bi         -ml me                  qr bi
3      ro bi         -ex sm                  vr sm
4      ze            -ex bi                  vr sm
5      si sm         si sm                   ra me
6      ty me         ra me                   ra me
7      qr sm         -ml me                  ra me
8      ra me         qr sm                   ml me

(abbreviations used: sm-small, me-medium, bi-big, ex-extremely, ro-roughly, qr-quite roughly, vr-very roughly, ra-rather, ty-typically, si-significantly, ml-more or less). Note that the last 12 real values depicted in the figure were not used in the computation but only for comparison of the quality of prediction. The lower part of Fig. 5 contains a detail of the last 12 time points that shows the following: (a) the real trend-cycle $TC$ computed using the F-transform from the real data is depicted together with its forecast obtained from
the above linguistic description using perception-based logical deduction, (b) the real time series and its forecast obtained by combination of the forecast of $TC$ and the forecast of the remaining seasonal component $S$‡) (cf. (??)).

‡) Its forecast was obtained using other means not discussed in this paper.

Fig. 4. Demonstration of perceptionally important points in the Pigs time series. The points with the highest values of $|\beta_k^2|$ are marked by thick vertical lines. They evidently correspond to areas of more radical change.

Fig. 5. Demonstration of estimation of the trend-cycle $TC$ of a given time series using the F-transform. The figure contains the used fuzzy partition and, in the red (last) area, both the computed and the forecasted $TC$, and also the forecast of the whole time series for the next 12 time moments. Below is a detail of the forecast.

5.4. Linguistic evaluation of local trend

Recall that the trend-cycle $TC$ of a time series $X$ is the component that represents variations of low frequency in a time series, the high frequency fluctuations having been filtered out. Thus, $TC$ has two subcomponents, namely trend (tendency) and cycle. Trend is a general direction of the time series, namely whether it is increasing, stagnating, or decreasing. The trend is, however, usually bound to a certain time interval and then changes. This means that besides the trend there is also a certain cyclic characteristic that together with local trends forms the trend-cycle $TC$.

If a certain time interval is given, it may be interesting to learn what kind of trend can be recognized in it. Surprisingly, recognition of the trend may not be a trivial task even when watching the graph (cf., e.g., the course of the time series in Fig. 5). Since the $F^1$-transform provides an estimate of the average slope (tangent), it is a convenient tool for estimation of the trend. To make it more informative for managers, it seems better to characterize it using natural language. For example, we can say “fairly large decrease (huge increase) of trend”, “the trend is stagnating (negligibly increasing)”, etc. These expressions characterize the trend (tendency) of the time series in an area specified by the user. In [23], a method for automatic generation of such a linguistic evaluation of the trend was described.

Algorithm:

(i) Specify what “extreme increase (decrease)” means. In practice, it can be determined as the largest acceptable difference of time series values with respect to a given (basic) time interval (for example 12 months, 31 days), that is, a minimal and maximal tangent. In practice, we set only the largest tangent $v_R$, while the smallest one is usually $v_L = 0$. The typical medium value $v_S$ is determined analogously to $v_R$. The result is the context $w_{tg} = \langle v_L, v_S, v_R \rangle$.
(ii) Specify an interesting time interval $I \subset Q$. Furthermore, compute a basic function $A$ whose base is $I$ (cf. Subsection 3.1) and compute the coefficient $\beta^1$ using formula (16).
(iii) Generate a linguistic evaluation of the trend of the time series $X$ in the area characterized by $A$ with respect to the context $w_{tg}$. The required evaluative expression $\mathcal{A}$ will be obtained using the function of local perception (6):

$$\mathcal{A} = \mathrm{LPerc}(\beta^1, w_{tg}). \tag{26}$$

An example of linguistic evaluation of the local trend in the context $w_{tg} = \langle 0, 1200/12, 3000/12 \rangle$ is shown in Fig. 6: negligible decrease in the time interval [17 − 44], stagnating in [50 − 97], clear decrease in [104 − 114], somewhat decrease in [116 − 126]. The trend of the whole time series is stagnating†).

5.5. Summarization

Summarization of knowledge about time series belongs among the interesting tasks that have been addressed by several authors (see, e.g., [24], [25]). In this paper, we suggest applying the sophisticated formal theory of intermediate quantifiers developed in fuzzy natural logic (see Subsection 2.5). We can summarize information within one time series or over a set of time series.

5.5.1. One time series. Examples of possible information mined from one time series:

• In most (many, few) cases, the time series was stagnating (slightly increasing, decreasing).
  This can be formalized as $(Q^{\forall}_{Bi\,Ve}\, \bar{t}_i)(\mathrm{Stagn}(\bar{t}_i))$‡), where $\bar{t}_i$ is a certain time interval.
• In almost all (most, many, few) cases, if the time series is increasing (decreasing) then the increase (decrease) is very slight (sharp, clear).
  This can be formalized as
  $$(Q^{\forall}_{Bi\,Ve}\, \bar{t}_i)(\mathrm{Increase}(\bar{t}_i), Sm\,Ve(\mathrm{Increase}(\bar{t}_i)))$$

‡) [...] and a substantiated formalization was developed in a series of papers [8], [9], [10], [15].

(ii) Define basic functions $A_i$ over each $\bar{t}_i$ and compute the coefficients $\beta_i^1$ (16), $i = 1, \dots, m$.
(iii) Using the function (26), generate linguistic expressions $\mathcal{A}_i$ evaluating the slope $\beta_i^1$ in the interval $\bar{t}_i$.
(iv) Compute the measure (9) by counting the number $s$ of intervals $\bar{t}_i$ in which the slope of the time series was evaluated in the sense of (iii), and evaluate the number $s/m$.

5.5.2. A set of time series. Let a set $\{X_i \mid i = 1, \dots, s\}$ of time series (20) be given. Using the theory of intermediate quantifiers, we can model the meaning of sentences such as

• Most (many, few) analyzed time series stagnated recently but their future trend is slightly increasing.
• There is evidence of a huge (slight, clear) decrease of the trend of almost all time series in the recent quarter of the year.

Another possibility is to mine interesting information from the given set of time series, summarize their properties and also summarize their possible future development. Namely, we start with analysis and forecasting of all the time series. Then we generate comments on interesting time slots, or we can also determine time slots in which the behavior of the time series is interesting for us, for example, “in which period was the time series sharply increasing”, “how long was the time series stagnating or decreasing before a sharp increase”, etc. Finally, we can summarize the results using intermediate quantifiers and derive further properties on the basis of valid syllogisms.

Moreover, we can also apply syllogistic reasoning with such expressions, for example

    In few cases the increase of time series is not small
    In many cases the increase of time series is clear
    ------------------------------------------------------------
    In few cases the clear increase of time series is not small
It is important to note that the latter is example of the
(cf. (7)). valid generalized Aristotle’s syllogism. Such syllogism
is true in all situations (models). Recall that in [9],
Algorithm: [10], validity of over 120 generalized syllogism with
(i) Divide a time series into intervals t̄1 , . . . , t̄m ⊂ Q. intermediate quantifiers was proven.
Their length may be naturally set, for example to
30 (month), 7 (week), etc. These intervals may 6. Conclusion
(but need not) form a partition of Q.
In this paper, we focused on the problem of mining
†). The results were obtained using experimental software LFL information from time series. There are many methods
Forecaster (see http://irafm.osu.cz/en/c110 lfl-forecaster/ ) which im-
plements the described method. providing such information with various success and
‡). This formula is a simplification of (7). reliability (see [1] and the citations therein). Our main
Fig. 6. Local linguistic evaluation of the trend of time series in areas [17 − 44] (negligible decrease), [50 − 97]
(stagnating), [104 − 114] (clear decrease), [116 − 126] (somewhat decrease — forecast of the time series).

contribution consists in the utilization of the coherent techniques of F-transform and fuzzy natural logic. With their help, we are able to disclose various kinds of information and put them together using natural language. The strength and reliability of these techniques are based on their mathematically proved properties.

Our future work will be focused on refinement and extension of the techniques presented above. Unfortunately, due to space limits, we could not make a comparison with the other published methods overviewed in [1]. Of course, this is also one of our future tasks. We will also focus on comparison of time series using the dynamic time warping method [26] and on searching for characteristic sequences. Our goal is to develop algorithms for finding complete information about the behavior of time series and providing it in a concise and well understandable form, namely in natural language.

Acknowledgment

This paper was supported by the program MŠMT-KONTAKT II, project LH 12229. Additional support was given also by the European Regional Development Fund in the IT4Innovations Centre of Excellence project (CZ.1.05/1.1.00/02.0070).

References

[1] T.-C. Fu, “A review on time series data mining,” Engineering Applications of Artificial Intelligence, vol. 24, pp. 164–181, 2011.
[2] V. Novák, “A comprehensive theory of trichotomous evaluative linguistic expressions,” Fuzzy Sets and Systems, vol. 159, no. 22, pp. 2939–2969, 2008.
[3] V. Novák, “Mathematical fuzzy logic in modeling of natural language semantics,” in Fuzzy Logic – A Spectrum of Theoretical & Practical Issues, P. Wang, D. Ruan, and E. Kerre, Eds. Berlin: Elsevier, 2007, pp. 145–182.
[4] V. Novák, “Perception-based logical deduction,” in Computational Intelligence, Theory and Applications, B. Reusch, Ed. Berlin: Springer, 2005, pp. 237–250.
[5] V. Novák and S. Lehmke, “Logical structure of fuzzy IF-THEN rules,” Fuzzy Sets and Systems, vol. 157, pp. 2003–2029, 2006.
[6] V. Novák and I. Perfilieva, “On the semantics of perception-based fuzzy logic deduction,” International Journal of Intelligent Systems, vol. 19, pp. 1007–1031, 2004.
[7] A. Dvořák and M. Holčapek, “L-fuzzy quantifiers of the type ⟨1⟩ determined by measures,” Fuzzy Sets and Systems, vol. 160, pp. 3425–3452, 2009.
[8] V. Novák, “A formal theory of intermediate quantifiers,” Fuzzy Sets and Systems, vol. 159, no. 10, pp. 1229–1246, 2008.
[9] P. Murinová and V. Novák, “A formal theory of generalized intermediate syllogisms,” Fuzzy Sets and Systems, vol. 186, pp. 47–80, 2012.
[10] P. Murinová and V. Novák, “Structure of generalized intermediate syllogisms,” Fuzzy Sets and Systems, vol. 247, pp. 18–37, 2014.
[11] G. Klir and Y. Bo, Fuzzy Set Theory: Foundations and Applications. Upper Saddle River, NJ: Prentice Hall, 1995.
[12] V. Novák, M. Štěpnička, A. Dvořák, I. Perfilieva, V. Pavliska, and L. Vavříčková, “Analysis of seasonal time series using fuzzy approach,” Int. Journal of General Systems, vol. 39, pp. 305–328, 2010.
[13] V. Novák, “On modelling with words,” Int. Journal of General Systems, vol. 42, pp. 21–40, 2013.
[14] P. Peterson, Intermediate Quantifiers. Logic, linguistics, and Aristotelian semantics. Aldershot: Ashgate, 2000.
[15] P. Murinová and V. Novák, “Analysis of generalized square of opposition with intermediate quantifiers,” Fuzzy Sets and Systems, vol. 242, pp. 89–113, 2014.
[16] M. Delgado, M. Ruiza, D. Sanchez, and M. Vila, “Fuzzy quantification: a state of the art,” Fuzzy Sets and Systems, vol. 242, pp. 1–30, 2014.
[17] I. Perfilieva, “Fuzzy transforms: theory and applications,” Fuzzy Sets and Systems, vol. 157, pp. 993–1023, 2006.
[18] I. Perfilieva, M. Daňková, and B. Bede, “Towards a higher degree F-transform,” Fuzzy Sets and Systems, vol. 180, pp. 3–19, 2011.
[19] V. Novák, I. Perfilieva, M. Holčapek, and V. Kreinovich, “Filtering out high frequencies in time series using F-transform,” Information Sciences, vol. 274, pp. 192–209, 2014.
[20] V. Kreinovich and I. Perfilieva, “Fuzzy transforms of higher order approximate derivatives: A theorem,” Fuzzy Sets and Systems, vol. 180, pp. 55–68, 2011.
[21] J. Anděl, Statistical Analysis of Time Series. Praha: SNTL, 1976 (in Czech).
[22] J. Hamilton, Time Series Analysis. Princeton: Princeton University Press, 1994.
[23] V. Novák, V. Pavliska, I. Perfilieva, and M. Štěpnička, “F-transform and fuzzy natural logic in time series analysis,” in Proc. Int. Conference EUSFLAT-LFA’2013, Milano, Italy, 2013.
[24] R. Castillo-Ortega, N. Marín, and D. Sánchez, “A fuzzy approach to the linguistic summarization of time series,” Multiple-Valued Logic and Soft Computing, vol. 17, no. 2-3, pp. 157–182, 2011.
[25] J. Kacprzyk, A. Wilbik, and S. Zadrożny, “Linguistic summarization of time series using a fuzzy quantifier driven aggregation,” Fuzzy Sets and Systems, vol. 159, pp. 1485–1499, 2008.
[26] D. Berndt and J. Clifford, “Finding patterns in time series: a dynamic programming approach,” Advances in Knowledge Discovery and Data Mining, pp. 229–248, 1996.
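
As an illustration only, the summarization algorithm of Subsection 5.5.1 (steps (i) to (iv)) can be sketched in Python. This is not the LFL Forecaster implementation, and every name in it (`weighted_slope`, `describe_slope`, `summarize`, `quantifier_word`) is hypothetical. The slope is computed as a weighted least-squares slope with a triangular weight, which is an assumed stand-in for formula (16). The crisp thresholds in `describe_slope` and `quantifier_word` are assumed replacements for the fuzzy function LPerc (26) and for the intermediate quantifiers, chosen only to make the counting step $s/m$ concrete.

```python
from typing import List, Sequence, Tuple

Context = Tuple[float, float, float]  # (v_L, v_S, v_R), as in the context w_tg


def weighted_slope(values: Sequence[float], start: int, end: int) -> float:
    """Weighted least-squares slope over values[start..end] (inclusive).

    Assumption: a triangular weight centred on the interval plays the role of
    the basic function; this is a stand-in for formula (16), not a copy of it.
    Intervals need at least four points, otherwise 0.0 is returned.
    """
    center = (start + end) / 2.0
    half = (end - start) / 2.0
    num = 0.0
    den = 0.0
    for t in range(start, end + 1):
        weight = 1.0 - abs(t - center) / half if half > 0 else 0.0
        num += values[t] * (t - center) * weight
        den += (t - center) ** 2 * weight
    return num / den if den > 0 else 0.0


def describe_slope(slope: float, context: Context) -> str:
    """Crisp, hypothetical replacement for the fuzzy evaluation LPerc (26)."""
    v_l, v_s, v_r = context
    size = abs(slope)
    if size <= v_l + 0.1 * (v_r - v_l):  # assumed cutoff for "stagnating"
        return "stagnating"
    direction = "increase" if slope > 0 else "decrease"
    if size < v_s:
        strength = "slight"
    elif size < v_r:
        strength = "clear"
    else:
        strength = "sharp"
    return f"{strength} {direction}"


def summarize(values: Sequence[float], length: int, context: Context,
              target: str) -> Tuple[int, int, List[str]]:
    """Steps (i)-(iv): split into intervals, label each slope, count matches."""
    m = len(values) // length  # (i) consecutive intervals forming a partition
    labels = [
        describe_slope(weighted_slope(values, i * length, (i + 1) * length - 1), context)
        for i in range(m)      # (ii) slope and (iii) label per interval
    ]
    s = sum(1 for label in labels if label == target)  # (iv) count matches
    return s, m, labels


def quantifier_word(ratio: float) -> str:
    """Assumed crisp cutoffs standing in for the intermediate quantifiers."""
    if ratio >= 0.9:
        return "almost all"
    if ratio >= 0.7:
        return "most"
    if ratio >= 0.4:
        return "many"
    return "few"


# Toy series: three flat intervals followed by one rising with slope 5.
series = [10.0] * 18 + [10.0 + 5.0 * k for k in range(6)]
s, m, labels = summarize(series, length=6, context=(0.0, 1.0, 3.0), target="stagnating")
print(f"In {quantifier_word(s / m)} cases, the time series was stagnating ({s}/{m}).")
```

In the method described in this paper, both the evaluation of the slope and the quantifier are fuzzy, so the result is a truth degree rather than a single word; the crisp cutoffs above only show where the ratio $s/m$ enters.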
