
Mathl. Comput. Modelling Vol. 24, No. 3, pp. 1-10, 1996
Copyright © 1996 Elsevier Science Ltd
Printed in Great Britain. All rights reserved
0895-7177/96 $15.00 + 0.00
PII: S0895-7177(96)00095-7

Dynamical Systems Identification
from Time-Series Data:
A Hankel Matrix Approach
D. LAI
Program in Biometry, School of Public Health
University of Texas at Houston, Houston, TX 77030, U.S.A.

G. CHEN
Department of Electrical and Computer Engineering
University of Houston, Houston, TX 77204, U.S.A.

(Received November 1995; revised and accepted March 1996)

Abstract: In this paper, we propose a simple but effective method for dynamical systems
identification using time-series data. The method works perfectly well for deterministic dynamical
systems and works reasonably well for a general class of stochastic dynamical systems. Both computer
simulation studies and theoretical analysis are provided to validate the proposed methods.

Keywords: Condition number, Dynamical system, Simulation study, Statistical test, Time series.

1. INTRODUCTION

Time-series data obtained from observations of a dynamical system, if measurements were per-
formed appropriately, contain essential information about the system. To distinguish different
structures of the underlying dynamical systems using only short-period time-series data is of
great importance in many applications. In this paper, a simple yet effective method is proposed
for the purpose of distinguishing a linear structure from a nonlinear structure of a dynamical
system, using only short-period time-series data obtained from the system.
A natural way to examine time-series data is to plot the data in an appropriate fashion. The
most commonly used plotting technique is the “delay-coordinates” plot, that is, a display of
$\{y_t\}$ versus $\{y_{t+\theta}\}$ for an integer $\theta$. For time-series data obtained by
sampling the input and output of a continuous finite-dimensional linear dynamical system, there
are many successful techniques for system identification [1], or even prediction, filtering, and
control [2]. A basic principle which supports this success is that the time-series data obtained
from a linear system can be characterized by a finite number of frequencies. However, these
methods are, in general, ineffective in solving similar problems for time-series data obtained from
nonlinear dynamical systems, particularly from chaotic systems, since the latter are characterized
by a continuous Fourier spectrum rather than a discrete set of frequencies.
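
As a small illustration of the delay-coordinates idea, the following sketch builds the pairs that such a plot would display. The function name `delay_pairs`, the lag value, and the test series (a logistic-map orbit with arbitrarily chosen parameter and start value) are assumptions made for this example only.

```python
import numpy as np

def delay_pairs(y, lag):
    """Return an array of (y_t, y_{t+lag}) pairs for a delay-coordinates plot."""
    y = np.asarray(y, dtype=float)
    if lag < 1 or lag >= len(y):
        raise ValueError("lag must satisfy 1 <= lag < len(y)")
    return np.column_stack((y[:-lag], y[lag:]))

# Illustrative series: a logistic-map orbit (parameter and start value are arbitrary).
y = [0.37]
for _ in range(99):
    y.append(4.0 * y[-1] * (1.0 - y[-1]))

pairs = delay_pairs(y, lag=1)
# For this particular map every pair (u, v) lies on the parabola v = 4 u (1 - u),
# which is the structure a delay-coordinates plot with lag 1 would reveal.
print(pairs[:3])
```

Plotting the first column against the second (with any plotting tool) gives the delay-coordinates picture; a purely random series would instead fill the plane without visible structure.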
Numerical methods for analyzing the time series produced by nonlinear dynamical systems
have been sought and developed over the last decade. The delay-coordinates method rests on a
solid mathematical foundation called embedology [3,4]. According to [5], it was
Packard et al. who first suggested the use of delay coordinates, and the corresponding mathematical
result for nonlinear dynamical systems was first published by Takens [6], based on a theory of
Whitney [7].

We would like to thank J. Wiorkowski and X. Wang for their helpful comments and suggestions.

In this paper, we study time series $\{y_t\}$ generated from a dynamical system

$$y_t = f(Y_{t-1}, \varepsilon_t), \quad t = 1, 2, \ldots,$$

where $Y_{t-1} = (y_{t-1}, \ldots, y_{t-d})$ and $d$ is an integer. This system is stochastic if the
sequence of inputs $\{\varepsilon_t\}$ is random, or is deterministic if $\varepsilon_t \equiv 0$. For
this time series model, there are many successful system-identification methods available in the
literature, including the correlation-test method [8], the squares of time series approach [9],
Tsay’s approach [9], Lagrange’s approach [9], the BDS method [10], Auestad and Tjøstheim’s
approach [11], Petruccelli’s approach [12], Savit and Green’s approach [13], Keenan’s approach [14],
the neural-network techniques [15,16], the bispectral method [17], and the R-S array method [18].
All these methods, called linearity and/or nonlinearity testing, are based on an essential
hypothesis: the time-series data were generated from linear dynamic systems [19].
It was shown by Hunt et al. [20] that under some mild conditions, a nonlinear system can
always be represented as a nonlinear auto-regressive (AR) model of infinite order with a Volterra
series expansion and a finite-order approximation to such an infinite-order AR model is easily
implementable. Inspired by this connection of linearity and nonlinearity, our proposed method
is to utilize the Volterra series characteristics via a Hankel-like matrix consisting of the observed
time series data. This method can be extended to different orders of Hankel and Hankel-like
matrices by taking advantage of the Volterra series structure. Indeed, the R-S array method [18]
has already used the Hankel matrix structure. More precisely, the R-S array method is based
on the Hankel matrix of the autocorrelation coefficients of the observed time series, which was
proposed exclusively for stochastic systems. The method proposed in this paper, however, is
developed first for deterministic systems and then, after a suitable modification, extended to
cover both deterministic and stochastic dynamical systems.
This paper is organized as follows. The new method for deterministic dynamical systems is
first introduced in Section 2. Then, in Section 3, the behavior of the proposed method when
applied to systems with random shocks is analyzed. The modified method that can be used for
both deterministic and stochastic models is presented in Section 4. Finally, Section 5 concludes
the paper.

2. LINEAR AND NONLINEAR DETERMINISTIC SYSTEMS


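Suppose that a scalar time series $\{y_t\}$ has been observed from a deterministic dynamical system

$$y_t = f(Y_{t-1}), \quad t = 1, 2, \ldots, \tag{1}$$

where, again, $Y_{t-1} = (y_{t-1}, \ldots, y_{t-d})$. Using this set of time series data, we first
form a Hankel matrix $A_{1k}$ as follows:

$$A_{1k} = \begin{pmatrix}
y_1 & y_2 & \cdots & y_{k-1} & y_k \\
y_2 & y_3 & \cdots & y_k & y_{k+1} \\
\vdots & \vdots & & \vdots & \vdots \\
y_k & y_{k+1} & \cdots & y_{2k-2} & y_{2k-1}
\end{pmatrix}, \tag{2}$$

where $k = 1, 2, \ldots, K$, with $K < \infty$ being the last index for which the Hankel matrix can
be formed from the data.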
Let $D_{1k} = |A_{1k}|$ denote the determinant of $A_{1k}$. If $f$ in model (1) is linear, then
there is a nonzero constant vector $C = (c_1, c_2, \ldots, c_d)$ such that

$$y_t = C\, Y_{t-1}^{T}.$$

Consequently, we have $D_{1k} = 0$ for $k > d$, because every column of $A_{1k}$ beyond the $d$th
is then a linear combination of the $d$ columns preceding it. This result is both necessary and
sufficient, and so can be used to test whether or not the time series $\{y_t\}$ was generated from
a linear dynamical system.
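
The determinant test can be tried numerically. The sketch below is a minimal illustration, not the authors' code: the helper names `hankel_matrix` and `hankel_det`, the recursion coefficients, and the start values are assumptions chosen for the example. It builds the $k \times k$ Hankel matrix from $y_1, \ldots, y_{2k-1}$ and prints its determinant for a noise-free linear recursion of order $d = 2$.

```python
import numpy as np

def hankel_matrix(y, k, start=0):
    """k x k Hankel matrix whose (i, j) entry is y[start + i + j] (0-based indices)."""
    y = np.asarray(y, dtype=float)
    if start + 2 * k - 1 > len(y):
        raise ValueError("series too short for a k x k Hankel matrix")
    return np.array([y[start + i : start + i + k] for i in range(k)])

def hankel_det(y, k):
    """D_1k: determinant of the Hankel matrix built from y_1, ..., y_{2k-1}."""
    return np.linalg.det(hankel_matrix(y, k))

# Noise-free linear recursion of order d = 2 (coefficients and start values are illustrative).
y = [1.0, 0.5]
for _ in range(18):
    y.append(1.4 * y[-1] - 0.6 * y[-2])

for k in range(1, 6):
    # Expect clearly nonzero values for k <= 2 and values at rounding level for k >= 3.
    print(k, hankel_det(y, k))
```

In floating-point arithmetic the determinant is never exactly zero, so in practice "zero" has to be read as "at rounding level", which is one reason the determinant is a fragile statistic for noisy data.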
It is clear, of course, that the determinant $D_{1k}$ is very sensitive to small perturbations
(noise) of the matrix elements of $A_{1k}$. This question will be further addressed in Section 4
below.
Model (1) is a very simple AR model. If $f$ in model (1) is nonlinear, which will be indicated
by $D_{1k} \neq 0$, we may furthermore determine “how nonlinear the system would be.” To do so, we
recall a result from [20], which states that under some mild conditions, a nonlinear system can
be represented as a nonlinear AR model of infinite order with a Volterra series expansion

$$\begin{aligned}
x_i(k+1) &= x_{i+1}(k), \quad i = 1, \ldots, m-1, \\
x_m(k+1) &= \sum_{|p|=1}^{\infty} \alpha_p\, x_1^{p_1}(k)\, x_2^{p_2}(k) \cdots x_m^{p_m}(k)\, u^{p_{m+1}}(k),
\quad |p| = \sum_{j=1}^{m+1} p_j, \\
y(k) &= x_1(k),
\end{aligned} \tag{3}$$

where both $m$ and $p$ are infinite if not truncated, $\{u(k)\}$ is the system input, $\{y(k)\}$ is
the system output, and $\{\alpha_p\}$ is a sequence of constant coefficients determined by the given
system.
Inspired by this result, we propose to modify the above Hankel matrix by considering its
extensions $A_{2k}, A_{3k}, \ldots$, so as to select a proper order of the Volterra series expansion
of $f$. More specifically, the Hankel-like matrix $A_{2k}$ that we will use is defined as

$$A_{2k} = \begin{pmatrix}
\vdots & \vdots & \vdots & \vdots & & \vdots & \vdots \\
y_{2k} & y_{2k}^2 & y_{2k+1} & y_{2k+1}^2 & \cdots & y_{4k-2} & y_{4k-2}^2
\end{pmatrix}, \tag{4}$$

where $k = 1, 2, \ldots, K$.
Then, let $D_{2k} = |A_{2k}|$ be the determinant of $A_{2k}$ as before. If $D_{2k} \approx 0$ for
some values of $k$, then this indicates that the second-order Volterra expansion of $f$ is good
enough to approximate $f$, since there is a nearly linear relation among the columns of $A_{2k}$.
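
The linear relation among columns can be seen in a small numerical sketch. The row layout used here is an assumption made for illustration: row $i$ of a $2k \times 2k$ matrix is taken to be $(y_i, y_i^2, y_{i+1}, y_{i+1}^2, \ldots, y_{i+k-1}, y_{i+k-1}^2)$, which may differ in indexing from the matrix in (4). The function name `hankel_like2` and the quadratic test recursion are likewise hypothetical choices.

```python
import numpy as np

def hankel_like2(y, k):
    """2k x 2k matrix with row i = (y_i, y_i^2, ..., y_{i+k-1}, y_{i+k-1}^2).

    NOTE: this layout is an assumption made for illustration only.
    """
    y = np.asarray(y, dtype=float)
    n_rows = 2 * k
    if n_rows + k - 1 > len(y):
        raise ValueError("series too short")
    rows = []
    for i in range(n_rows):
        window = y[i : i + k]
        # Interleave each value with its square: y_i, y_i^2, y_{i+1}, y_{i+1}^2, ...
        rows.append(np.column_stack((window, window ** 2)).ravel())
    return np.array(rows)

# A quadratic recursion y_t = a y_{t-1} (1 - y_{t-1}) used as a test signal (values illustrative).
a = 3.99
y = [0.37]
for _ in range(29):
    y.append(a * y[-1] * (1.0 - y[-1]))

for k in (1, 2, 3):
    # For k >= 2 the third column equals a * (first column) - a * (second column),
    # so the determinant should sit at rounding level.
    print(k, np.linalg.det(hankel_like2(y, k)))
```

Under this assumed layout the determinant is clearly nonzero for $k = 1$ and numerically zero for $k \geq 2$, because a second-order (quadratic) recursion makes the columns linearly dependent as soon as two consecutive value-square pairs appear in each row.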
As an example, consider the well-known logistic map

$$y_t = a\, y_{t-1}(1 - y_{t-1}), \quad t = 1, 2, \ldots, \tag{5}$$

with $a \geq 3.56994$. It is easy to verify, using the time series obtained from this model, that we
have $D_{1k} \neq 0$, but $D_{2k} = 0$ for $k \geq 2$. This indicates that the data are generated by
a second-order nonlinear dynamical system. Once the order is determined, statistical regression
methods can be used to reconstruct the dynamical system, by estimating the constant coefficients in
the system with that order. In this example, if the time series of length 100 with initial value
0.37 was observed from

$$y_t = 3.99\, y_{t-1}(1 - y_{t-1}),$$

then we have $D_{1k} \neq 0$ and $D_{2k} = 0$ for $k \geq 2$. By using the ordinary least-squares
method with the regression model

$$y_t = c_2 y_{t-1}^2 + c_1 y_{t-1} + c_0,$$

the dynamical system is reconstructed from these time series data, as

$$y_t = -3.99\, y_{t-1}^2 + 3.99\, y_{t-1},$$

which is precisely the original system. For the Hénon map, the result is similar. Indeed, many
examples can be easily given. This method works very well for different lengths and
initial values of the time series generated from the underlying deterministic dynamical systems.
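
The regression step can be reproduced with a few lines of NumPy. This is a sketch under the settings stated in the text (length 100, initial value 0.37, $a = 3.99$); the variable names and the use of `numpy.linalg.lstsq` as the least-squares solver are choices made here, not something the paper specifies.

```python
import numpy as np

# Generate the series y_t = 3.99 y_{t-1} (1 - y_{t-1}) with y_1 = 0.37 and length 100.
a_true = 3.99
y = np.empty(100)
y[0] = 0.37
for t in range(1, len(y)):
    y[t] = a_true * y[t - 1] * (1.0 - y[t - 1])

# Regress y_t on (y_{t-1}^2, y_{t-1}, 1) by ordinary least squares.
prev, curr = y[:-1], y[1:]
X = np.column_stack((prev ** 2, prev, np.ones_like(prev)))
coef, *_ = np.linalg.lstsq(X, curr, rcond=None)
c2, c1, c0 = coef

# The fit should return c2 close to -3.99, c1 close to 3.99, and c0 close to 0.
print(c2, c1, c0)
```

Because the data are noise-free and the model class contains the true map, the fit is exact up to rounding; with noisy data the same regression would only give estimates.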

3. LINEAR DETERMINISTIC SYSTEMS WITH RANDOM SHOCKS

The method described in Section 2 works perfectly well for identifying linear deterministic
dynamical systems, and very well for nonlinear deterministic dynamical systems that can be
approximated by a lower-order Volterra series expansion. It is well known that testing with a
long-period trajectory of a deterministic linear dynamical system is by nature not a
difficult task. In this section, we turn to a kind of stochastic dynamical system: linear
deterministic dynamical systems with occasional random shocks.
If the time series is observed from a linear dynamical system that is subject to random shocks,
the value of $D_{1k}$ may not be exactly zero for all $k > d$, where $d$ is the order of the
underlying linear dynamical system. We studied the behavior of the statistic $D_{1k}$ using the
Monte Carlo method on several dynamical systems with random shocks appearing at the rate of 5% or
10% of the time throughout each process; namely, the random input $\varepsilon_t$ in the dynamic
system

$$y_t = f(Y_{t-1}) + \varepsilon_t$$

is nonzero at 5% or 10% of the time steps, respectively, over the entire simulated process.

Figure 1. (Time-series plot; horizontal axis: Index, 0 to 100.)

We have simulated time series observations from the dynamical system

$$y_t = 1.4\, y_{t-1} - 0.6\, y_{t-2} + \varepsilon_t, \tag{6}$$

where the shocks $\varepsilon_t$ are i.i.d. standard normal, each series has length $n = 100$, and
the random shocks appear with either 5% or 10% intensity over the entire process. Then, we
calculated the value of $\log |D_{1k}|$ for $k = 10, 15, 20, 25$, sequentially, along the dynamical
time series. Typical realizations of the time series with 100 observations of the dynamical system,
for the two different intensities (5% and 10%) of random shocks, are shown in Figures 1 and 2,
respectively. The means of $\log |D_{1k}|$ for $k = 10$, $k = 15$, $k = 20$, and $k = 25$ for the
time series generated by the dynamical system (6) with random shocks of intensity 5% are -169.03,
-179.69, -190.59, and -198.35, respectively; and for the time series with random shocks of
intensity 10% are -73.70, -74.14, -85.22, and -77.04, respectively. The means of $\log |D_{1k}|$
for $k = 10$, $k = 15$, $k = 20$, and $k = 25$ for the time series generated by the logistic
map (5) with $a = 4$ and initial value $y_1 = 0.37$ are -1.14, 1.15, 2.83, and 5.67, respectively.

Figure 2. (Time-series plot; horizontal axis: Index, 0 to 100.)

These means indicate that the underlying randomly perturbed systems are not linear any more,
although the perturbation intensities are small. Therefore, the proposed method works reasonably
well in distinguishing the linear dynamical system with random shocks from a nonlinear dynamical
system.
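
A Monte Carlo experiment of this kind can be sketched as follows. Several details are assumptions made here because the text does not fix them: the start-up values of the linear system are drawn from a standard normal distribution, a shock occurs independently at each step with the stated probability, $\log |D_{1k}|$ is averaged over every $k \times k$ Hankel window slid along the series, and windows whose determinant is exactly zero in floating point are skipped. The numbers produced therefore need not match those reported above; only the qualitative gap between the two kinds of system is the point.

```python
import numpy as np

def hankel_matrix(y, k, start):
    """k x k Hankel matrix with (i, j) entry y[start + i + j] (0-based indices)."""
    return np.array([y[start + i : start + i + k] for i in range(k)])

def mean_log_abs_det(y, k):
    """Mean of log|D_1k| over every k x k Hankel window slid along y."""
    vals = []
    for start in range(len(y) - 2 * k + 2):
        _, logabs = np.linalg.slogdet(hankel_matrix(y, k, start))
        if np.isfinite(logabs):  # drop windows that are exactly singular in floating point
            vals.append(logabs)
    return float(np.mean(vals))

def shocked_ar2(n, rate, rng):
    """y_t = 1.4 y_{t-1} - 0.6 y_{t-2} + eps_t, with eps_t nonzero only at shock times."""
    shock = rng.random(n) < rate
    eps = np.where(shock, rng.standard_normal(n), 0.0)
    y = np.empty(n)
    y[:2] = rng.standard_normal(2)  # assumed start-up values
    for t in range(2, n):
        y[t] = 1.4 * y[t - 1] - 0.6 * y[t - 2] + eps[t]
    return y

def logistic(n, a=4.0, y0=0.37):
    """Orbit of the logistic map y_t = a y_{t-1} (1 - y_{t-1})."""
    y = np.empty(n)
    y[0] = y0
    for t in range(1, n):
        y[t] = a * y[t - 1] * (1.0 - y[t - 1])
    return y

rng = np.random.default_rng(0)
k = 10
for rate in (0.05, 0.10):
    runs = [mean_log_abs_det(shocked_ar2(100, rate, rng), k) for _ in range(20)]
    print(f"shock rate {rate:.2f}: mean log|D_1k| = {np.mean(runs):.2f}")
print(f"logistic map:    mean log|D_1k| = {mean_log_abs_det(logistic(100), k):.2f}")
```

Using `numpy.linalg.slogdet` rather than taking the logarithm of `det` avoids underflow when the determinant is extremely small, which is the typical situation for the shocked linear system.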

4. STOCHASTIC DYNAMICAL SYSTEMS


In this section, we discuss a more general type of stochastic dynamical system.
For the stochastic dynamical system

$$y_t = f(Y_{t-1}) + \varepsilon_t,$$

where $\{\varepsilon_t\}$ are i.i.d. normal random variables, we assume that the dynamical system is
driven constantly by the random shocks $\{\varepsilon_t\}$ at all steps $t = 1, 2, \ldots$. In this
case, the values of $D_{1k}$ appear to be unable to indicate the linearity or nonlinearity of the
underlying dynamical system. In order to overcome the difficulty, we propose below a modified
version of the Hankel or Hankel-like matrix method described above.
Recall that the determinant of a matrix equals the product of its eigenvalues $\{\lambda_i\}$. The
proposed modified method uses the ratio

$$\frac{\max_{1 \le i \le k} |\lambda_i|}{\min_{1 \le i \le k} |\lambda_i|},$$

which is a condition number of the matrix. The closer the minimum of the absolute values of the
eigenvalues is to zero, the larger the condition number will be. If we apply this condition number
to the Hankel matrix of the observed time series, which was defined in (2), then we have an
indicator of the linearity of the stochastic dynamical system that provides the time series data.
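
The eigenvalue ratio is straightforward to compute for a Hankel matrix, which is real and symmetric. The sketch below is illustrative: the function names are hypothetical, the natural logarithm is assumed (the text does not state the base), and `numpy.linalg.eigvalsh` is used because it is the NumPy routine for symmetric matrices.

```python
import numpy as np

def hankel_matrix(y, k, start=0):
    """k x k Hankel matrix with (i, j) entry y[start + i + j] (0-based indices)."""
    y = np.asarray(y, dtype=float)
    return np.array([y[start + i : start + i + k] for i in range(k)])

def log_condition_number(a):
    """log(max|lambda_i| / min|lambda_i|) for a real symmetric matrix a (natural log assumed)."""
    lam = np.abs(np.linalg.eigvalsh(a))
    return float(np.log(lam.max() / lam.min()))

rng = np.random.default_rng(0)
y = rng.standard_normal(100)   # an i.i.d. standard normal series, for illustration
A = hankel_matrix(y, 10)
print(log_condition_number(A))
# For a symmetric matrix this ratio coincides with the usual 2-norm condition number.
print(np.log(np.linalg.cond(A)))
```

Unlike the determinant, this quantity is invariant to rescaling the series by a constant, which makes it easier to compare across data sets of different magnitude.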
More precisely, suppose that from the observed time series of a stochastic dynamical system we
have formed the Hankel matrices $A_{1k}$, as in the deterministic case. We then compute
the condition numbers for all such Hankel matrices for $j = 1, 2, \ldots$. Denote the logarithm of
the condition number of the $j$th Hankel matrix as $\eta_j$. If the matrices were formed from i.i.d.
elements, the distribution of $\{\eta_j\}$ would already be known [21,22]. Here, however, the
elements in a Hankel matrix are not i.i.d., even when the time series consists of i.i.d.
observations. If the observed time series is stationary and satisfies the regularity conditions (or
mixing conditions) stated in [23], which are too tedious to repeat here in this short paper, then
the series consisting of the condition numbers of the Hankel matrices generated from the time series
is also stationary and satisfies the regularity conditions. The reason is simply that the
eigenvalues are continuous functions of the matrix elements. Hence, the central limit theorem holds
for this sequence of condition numbers of Hankel matrices. Note, furthermore, that all stationary
auto-regressive moving average (ARMA) processes belong to the class of models considered in [23].
Thus, the asymptotic result that we can conclude from this discussion is summarized as follows.

THEOREM 4.1. Let $n$ be the length of the observed time series and $k$ be the dimension of the
Hankel matrix generated from the time series. If the observed time series is stationary and
satisfies the mixing condition stated in [23], then we have

$$(n - 2k)^{1/2} (\bar{\eta} - \mu) \to N(0, \sigma^2), \tag{7}$$

where $\bar{\eta} = \sum_{j=1}^{n-2k} \eta_j / (n - 2k)$, and $\mu$ and $\sigma^2$ are the mean and
variance of $\bar{\eta}$, respectively.
Analytic expressions for the mean $\mu$ and the variance $\sigma^2$ are complicated and very
difficult to obtain, even for i.i.d. standard normal series, as is well known. However, they can be
estimated by using the Monte Carlo method. For instance, if we want to estimate $\mu$ and
$\sigma^2$ for $\bar{\eta}$ from the i.i.d. standard normal series, we can simulate many i.i.d.
series and then calculate $\bar{\eta}$ for each series. Based on that, the mean and the variance of
$\bar{\eta}$ can be estimated by using the sample mean and sample variance, respectively.
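
The Monte Carlo recipe just described can be sketched as below. The assumptions are labeled in the code: the $j$th Hankel matrix is taken to be the $k \times k$ window starting at $y_j$, the average runs over $n - 2k$ windows to match the divisor in (7), and the natural logarithm is used. Function names and the seed are hypothetical, and the printed values are not claimed to reproduce the paper's table.

```python
import numpy as np

def eta_bar(y, k):
    """Average log condition number over n - 2k sliding k x k Hankel windows.

    Assumptions: window j starts at y[j]; natural logarithm; eigenvalue-ratio condition number.
    """
    y = np.asarray(y, dtype=float)
    etas = []
    for j in range(len(y) - 2 * k):
        a = np.array([y[j + i : j + i + k] for i in range(k)])
        lam = np.abs(np.linalg.eigvalsh(a))      # Hankel matrices are symmetric
        etas.append(np.log(lam.max() / lam.min()))
    return float(np.mean(etas))

def monte_carlo_eta_bar(n_series=100, n=100, k=10, seed=0):
    """Sample mean and sample variance of eta_bar over simulated i.i.d. N(0, 1) series."""
    rng = np.random.default_rng(seed)
    samples = np.array([eta_bar(rng.standard_normal(n), k) for _ in range(n_series)])
    return float(samples.mean()), float(samples.var(ddof=1))

mu_hat, var_hat = monte_carlo_eta_bar()
print(f"estimated mean = {mu_hat:.3f}, estimated variance = {var_hat:.5f}")
```

The same two functions can be reused for any other simulated model by replacing the call to `rng.standard_normal(n)` with a generator for that model.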

Figure 3. (Time-series plot; horizontal axis: Index.)

In the simulation study, we generated one hundred independent series of i.i.d. standard normal
observations, one hundred series of output data from the logistic map with a fixed $a = 4$ but
varying initial values chosen uniformly from $(0, 1)$, and one hundred series from the linear AR
model

$$y_t = 1.4\, y_{t-1} - 0.6\, y_{t-2} + \varepsilon_t,$$

where $\varepsilon_t$ is i.i.d. standard normal. The lengths for all series are the same:
$n = 100$. Typical trajectories of these three models (i.i.d. standard normal, logistic, and the
linear AR) are plotted in Figures 3-5, respectively. The estimates of the mean and variance of
$\bar{\eta}$ for the Hankel matrices of dimension 10, 15, 20, and 25 are shown in Table 1, and the
histograms are plotted in Figures 6-8, respectively, for these three models.

Figure 4. (Time-series plot; horizontal axis: Index, 0 to 100.)

Figure 5. (Time-series plot; horizontal axis: Index, 0 to 100.)

From these Monte Carlo simulations and many other simulations that we have carried out, we are
convinced that the experimental results are consistent with the theoretical analysis that we
presented above. Hence, we conclude that for the Hankel matrix of dimension $k = 10$, the mean
log (condition number) of the observed time series in linear stochastic dynamical systems is
greater than 4; the mean log (condition number) of the observed time series in nonlinear
stochastic dynamical systems is between 3 and 4; and the mean log (condition number) of the
observed time series in i.i.d. random processes is less than 3. Conclusions for other cases of
different dimensions can be drawn from the results summarized in Table 1.

Figure 6. (Histogram panels; horizontal axis: Mean condition number.)

Figure 7. (Histogram panels; horizontal axis: Mean condition number.)

Figure 8. (Histogram panels; horizontal axis: Mean condition number.)

Table 1. The simulation results for the means (μ) and the standard deviations (σ) for the
mean-log condition numbers of the Hankel matrices of different lags, based on the
100 independent realizations of the time series from the models with length 100.

                    k = 10           k = 15           k = 20           k = 25
Model             μ       σ        μ       σ        μ       σ        μ       σ
i.i.d. N(0,1)   2.852   0.042    3.173   0.058    3.465   0.071    3.696   0.0767
Logistic(4)     3.892   0.078    4.270   0.077    4.690   0.072    4.885   0.069
AR(2)           4.469   0.045    4.955   0.101    5.283   0.088    5.433   0.154

5. CONCLUSIONS AND DISCUSSIONS

In this paper, we have proposed two methods based on the Hankel or Hankel-like matrices
generated from the observed time series for identification of linear or nonlinear (deterministic
or stochastic) dynamical systems. The first method using the determinants of Hankel matrices
works perfectly well in identifying linear deterministic dynamical systems, even when the
time-series data cover only a short period of observations. This deterministic method, if combined
with the Volterra series representation of nonlinear systems, can be extended to nonlinearity
identification, for which the familiar logistic map has been used as a successful example. We have
also applied this method to linear deterministic dynamical systems that are subject to external
random shocks. The proposed method works reasonably well in this case, although not as well as in
the purely linear case, owing to the complexity of nonlinearity in general.
We have also modified the method, using the condition numbers of the Hankel matrices gener-
ated from the observed time series of the underlying dynamical system, to identify some stochastic
dynamical systems. The central limit theorem on the average of the condition numbers of these
Hankel matrices has been applied to the time series that are stationary and satisfy the regularity
conditions imposed in [23]. The mean and the variance of the average of these condition numbers
have been estimated by the Monte Carlo method.
If the essential hypothesis of this study is that the observed series consists of i.i.d.
standard normal random variables, then we may instead use the following matrix to repeat the
same test:

$$\begin{pmatrix}
y_1 & y_{k+1} & \cdots & y_{(k-2)k+1} & y_{(k-1)k+1} \\
y_2 & y_{k+2} & \cdots & y_{(k-2)k+2} & y_{(k-1)k+2} \\
\vdots & \vdots & & \vdots & \vdots \\
y_k & y_{2k} & \cdots & y_{(k-1)k} & y_{k^2}
\end{pmatrix}.$$

Then the elements of this matrix are i.i.d. and so the limit distribution of $\{\eta_j\}$ can be
derived from the results of [21]. Note, however, that the limiting result in [21] requires that
$k \to \infty$, and hence it would not be satisfied here, since we usually choose a small $k$
(short-period time-series data) in order to efficiently use the methods proposed in this paper for
system identification.
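
The alternative matrix simply fills a $k \times k$ array column by column with the first $k^2$ observations, so that no observation is reused. A minimal sketch (the function name `iid_block_matrix` is hypothetical):

```python
import numpy as np

def iid_block_matrix(y, k):
    """Fill a k x k matrix column by column with y_1, ..., y_{k^2} (no entry is repeated)."""
    y = np.asarray(y, dtype=float)
    if len(y) < k * k:
        raise ValueError("need at least k*k observations")
    # order="F" fills the first column with y_1..y_k, the second with y_{k+1}..y_{2k}, etc.
    return y[: k * k].reshape(k, k, order="F")

print(iid_block_matrix(np.arange(1, 10), 3))
```

The cost of avoiding repeated entries is data: a series of length $n$ supports at most one such matrix of dimension $k = \lfloor \sqrt{n} \rfloor$, whereas the sliding Hankel construction yields many overlapping matrices.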

REFERENCES

1. P.J. Brockwell and R.A. Davis, Time Series: Theory and Methods, Springer-Verlag, New York, (1987).
2. G. Chen, G. Chen and S.H. Hsu, Linear Stochastic Control Systems, CRC Press, Boca Raton, FL, (1995).
3. T. Sauer, J.A. Yorke and M. Casdagli, Embedology, J. Stat. Phys. 65, 579-616 (1991).
4. E. Ott, T. Sauer and J.A. Yorke, Coping with chaos, In The Theory of Embedding, Chapter 5, Wiley, New
York, (1994).
5. T. Sauer, Time series prediction by using delay coordinate embedding, In Time Series Prediction:
Forecasting the Future and Understanding the Past (Edited by A.S. Weigend and N. Gershenfeld), Addison-Wesley,
(1993).
6. F. Takens, Detecting Strange Attractors in Turbulence, Springer-Verlag, New York, (1981).
7. H. Whitney, Differentiable manifolds, Annals of Math. 37, 645-680 (1936).
8. S.A. Billings and Q.M. Zhu, Nonlinear model validation using correlation tests, Int. J. Control 60, 1107-1120
(1994).
9. H. Tong, Non-linear Time Series: A Dynamical System Approach, Oxford University Press, New York,
(1990).
10. W.A. Brock, W.D. Dechert, J.A. Scheinkman and B. LeBaron, A test for independence based on the
correlation dimension, Department of Economics, University of Wisconsin at Madison, (1991).
11. B. Auestad and D. Tjøstheim, Identification of nonlinear time series: First order characterization and order
determination, Biometrika 77 (4), 669-687 (1990).
12. J.D. Petruccelli and N. Davis, A portmanteau test for self-exciting threshold autoregressive type nonlinearity
in time series, Biometrika 73 (3), 687-694 (1986).
13. R. Savit and M. Green, Time series and dependent variables, Physica D 50, 95-116 (1991).
14. D.M. Keenan, A Tukey nonadditivity type test for time series nonlinearity, Biometrika 72 (1), 39-44 (1985).
15. T.H. Lee, H. White and W.J. Granger, Testing for neglected nonlinearity in time series, J. of Econometrics
56, 269-290 (1993).
16. T. Terasvirta, C.F. Lin and W.J. Granger, Power of the neural network linearity test, J. Time Ser. Anal.
14 (2), 209-220 (1993).
17. T.S. Rao and M.M. Gabr, A test for linearity of stationary time series, J. Time Ser. Anal. 1, 145-158
(1980).
18. H.L. Gray, G.D. Kelley and D.D. McIntire, A new approach to ARMA modeling, Commun. Statis.-Simul.
Comput. B7, 1-77 (1978).
19. W.S. Chan and H. Tong, On tests for nonlinearity in time series, J. of Forecasting 5, 217-228 (1986).
20. L.R. Hunt, R.D. DeGroat and D.A. Linebarger, Nonlinear AR modeling, Circuits, Systems and Signal
Processing 14, 689-705 (1995).
21. A. Edelman, Eigenvalues and condition numbers of random matrices, SIAM J. Matrix Analy. Appl. 9,
543-560 (1988).
22. A. Edelman, On the distribution of a scaled condition number, Mathematics of Computation 58, 185-190
(1992).
23. I.A. Ibragimov, Some limit theorems for stationary processes, Theory of Prob. Appl. 7, 345-382 (1962).
24. D.R. Brillinger, An introduction to polyspectra, Ann. Math. Statist. 36, 1351-1374 (1965).
25. M. Hinich, Testing for Gaussianity and linearity of a stationary time series, J. Time Ser. Anal. 2, 169-176
(1982).
26. J.W. Van Ness, Asymptotic normality of bispectral estimates, Ann. Math. Statist. 37, 1257-1272 (1966).
