Forecasting The Behavior of Multivariate Time Series Using Neural Networks
K. Chakraborty et al.
Printed in the USA. All rights reserved. Copyright © 1992 Pergamon Press Ltd.
ORIGINAL CONTRIBUTION
Abstract--This paper presents a neural network approach to multivariate time-series analysis. Real world observations of flour prices in three cities have been used as a benchmark in our experiments. Feedforward connectionist networks have been designed to model flour prices over the period from August 1972 to November 1980 for the cities of Buffalo, Minneapolis, and Kansas City. Remarkable success has been achieved in training the networks to learn the price curve for each of these cities and in making accurate price predictions. Our results show that the neural network approach is a leading contender with the statistical modeling approaches.

Keywords--Neural networks, Back propagation, Multivariate time-series, Statistical models, Training, One-lag prediction, Multi-lag prediction, Combined modeling, Forecasting.
tion for chaotic time series by using a local approximation approach involving use of nearest k neighbors with respect to the magnitudes of the values rather than with respect to the time points to which the values belong. Despite considerable progress over the last decade, formulation of reasonable nonlinear models is an extremely difficult task because of simplifications made in the modeling stage, e.g., omitting parameters which are unknown or which do not seem to affect the observed data directly. Also, the relationships between known parameters and observed values can only be hypothesized, with no simple laws governing their mutual behavior. See, for example, Saikkonen and Luukkonen (1991). Hence, we resort to a "neural network" approach for nonlinear modeling of multivariate time-series; in earlier work, we have successfully used this approach in analyzing univariate time-series (see Li, Mehrotra, Mohan, & Ranka, 1990).

Neural networks belong to the class of data-driven approaches, as opposed to model-driven approaches. The analysis depends on available data, with little rationalization about possible interactions. Relationships between variables, models, laws, and predictions are constructed post-facto, after building a machine whose behavior simulates the data being studied. The process of constructing such a machine based on available data is addressed by certain general-purpose algorithms such as "back propagation" (see Rumelhart, Hinton, & Williams, 1986).

In this paper, we use neural networks to predict future values of possibly noisy multivariate time-series based on past histories. The particular data analyzed are monthly flour prices for Buffalo, Minneapolis, and Kansas City over a period of 100 months. For impartial evaluation of the prediction performance of the approach, data for different periods are used in the "training" (modeling) and "testing" (prediction) phases. The performance exceeded expectations, and the root mean squared errors (in long-term, or multi-lag, prediction) obtained using this approach are better than those obtained from the statistical model of Tiao and Tsay (1989) by at least one order of magnitude. We expect such results to be obtained in other applications of interdependent time-series as well.

Section 2 presents the architecture of the neural networks used for our analysis and the experiments performed.

2. METHODOLOGY

2.1. Neural Networks

Artificial neural networks are computing systems containing many simple nonlinear computing units or nodes interconnected by links. In a "feedforward" network, the units can be partitioned into layers, with links from each unit in the kth layer being directed (only) to each unit in the (k+1)st layer. Inputs from the environment enter the first layer, and outputs from the network are manifested at the last layer. A d-n-1 network, shown in Figure 1, refers to a network with d inputs, n units in the intermediate "hidden" layer, and one unit in the output layer (see Weigend et al., 1990). A weight or "connection strength" is associated with each link, and a network "learns" or is trained by modifying these weights, thereby modifying the network function which maps inputs to outputs.

We use such d-n-1 networks to learn and then predict the behavior of multivariate time-series. The hidden and output nodes realize nonlinear functions of the form (1 + exp(-Σ_{i=1..m} w_i x_i + θ))^(-1), where the w_i's denote real-valued weights of edges incident on a node, θ denotes the adjustable "threshold" for that node, and m denotes the number of inputs to the node from the previous layer. The training algorithm is detailed in Section 2.2.

2.2. Procedure for Training the Networks

We use the error back propagation algorithm of Rumelhart et al. (1986) to train the networks, using the mean squared error (MSE) over the training samples as the objective function. From the given p-variate time-series S = {(v_1(t), ..., v_p(t)) : 1 ≤ t ≤ N}, we obtain two sets: a training set S_train = {(v_1(t), ..., v_p(t)) : 1 ≤ t ≤ T}, and a test set S_test = {(v_1(t), ..., v_p(t)) : T < t ≤ N}. A set P_train of (d+1)-tuples (the first d
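The d-n-1 architecture and the back propagation training procedure of Sections 2.1 and 2.2 can be sketched as follows. This is an illustrative reconstruction, not the authors' original code; the toy series, learning rate, and weight initialization are assumptions:

```python
import numpy as np

def sigmoid(s):
    # Node function (1 + exp(-s))^(-1) from Section 2.1
    return 1.0 / (1.0 + np.exp(-s))

class DN1:
    """A d-n-1 feedforward network: d inputs, n sigmoid hidden units,
    one sigmoid output unit, trained by back propagation on the MSE."""

    def __init__(self, d, n, rng):
        self.W1 = rng.uniform(-0.5, 0.5, (n, d))  # input-to-hidden weights
        self.b1 = np.zeros(n)                     # hidden thresholds
        self.W2 = rng.uniform(-0.5, 0.5, n)       # hidden-to-output weights
        self.b2 = 0.0                             # output threshold

    def forward(self, x):
        self.x = np.asarray(x, dtype=float)
        self.h = sigmoid(self.W1 @ self.x + self.b1)
        self.y = sigmoid(self.W2 @ self.h + self.b2)
        return self.y

    def train_step(self, x, target, lr=0.5):
        err = self.forward(x) - target
        d_out = err * self.y * (1.0 - self.y)              # output delta
        d_hid = d_out * self.W2 * self.h * (1.0 - self.h)  # hidden deltas
        self.W2 -= lr * d_out * self.h
        self.b2 -= lr * d_out
        self.W1 -= lr * np.outer(d_hid, self.x)
        self.b1 -= lr * d_hid
        return err * err                                   # squared error

# Assumed toy series rescaled into (0, 1), standing in for a price curve.
series = 0.5 + 0.35 * np.sin(np.arange(102) / 6.0)
d = 2
net = DN1(d, 2, np.random.default_rng(0))
pairs = [(series[t:t + d], series[t + d]) for t in range(90 - d)]
for _ in range(2000):
    mse = float(np.mean([net.train_step(x, tgt) for x, tgt in pairs]))
```

A one-lag prediction of the 91st value is then `net.forward(series[88:90])`.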
FIGURE 4. 6-6-1 network modeling: Buffalo (training).
FIGURE 6. Tiao & Tsay's model: Buffalo (90 months).
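The training/test split of Section 2.2 (with N = 100 and T = 90, as used for the flour data) can be sketched as follows; the toy series and the helper name are assumptions:

```python
import numpy as np

# Assumed toy univariate series of N = 100 monthly values.
series = np.log(np.linspace(100.0, 150.0, 100))
T, d = 90, 2  # first T values for training, d past values per input

def make_tuples(v, lo, hi):
    """(d+1)-tuples: d consecutive values as inputs, the next as output."""
    return [(v[t - d:t], v[t]) for t in range(lo, hi)]

train_set = make_tuples(series, d, T)           # "training" (modeling) phase
test_set = make_tuples(series, T, len(series))  # "testing" (prediction) phase
```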
to judge the performance of the network in terms of MSE(test). Experiments were performed with 2-2-1, 4-4-1, 6-6-1, and 8-8-1 networks. These experiments were done in order to compare with the combined modeling approach, described below.

2. Combined Modeling: We obtained a considerably improved performance using (for each series) information from all three series, instead of treating each series individually. Two kinds of experiments were performed using this approach. The first kind is illustrated in Figure 3, in which x_{t+1} is shown as being learned/predicted using six preceding values (recorded at time points t and t-1 only) from all the three series, i.e., (x_t, y_t, z_t, x_{t-1}, y_{t-1}, z_{t-1}). Similar experiments were performed to predict y_{t+1} and z_{t+1} also. This method determines the value of a variable at any time t using strictly past data for all the variables in each training input: it does not utilize contemporaneous data for the other variables. We experimented with a variety of 6-h-1 networks of this kind and obtained the best results with 6-6-1 networks; the results are shown in Table 2. The reason for choosing 6 input nodes is mentioned in Section 3.

However, for the data studied, there was an implicit ordering among the three series: x_t values were available before y_t values, y_t values were available before z_t values, and (naturally) all these were available before x_{t+1} values. This observation led to our second kind of combined modeling approach, in which each training input pattern consists of "past" data items by the above criterion. For instance, in the d-n-1 feedforward network used to predict y_t, if d = 8, the chosen input values would be x_t, z_{t-1}, y_{t-1}, x_{t-1}, z_{t-2}, y_{t-2}, x_{t-2}, z_{t-3}. In both cases, the training set consisted of the first 90 items of trivariate data and the test set consisted of the remaining 10. The results shown in Figures 4 through 30 compare performances of both a 6-6-1 network (the first approach) and an 8-8-1 network (the second approach) with that of Tiao and Tsay's model. As before, our choice of an 8-8-1 network is explained in Section 3: the best performances were obtained with h = 8 among the 8-h-1 networks experimented with. In all the graphs shown, the y-axis is labeled "LFPI," an abbreviation for "Logarithms of monthly Flour Price Indices."

3. Statistical Model: Tiao and Tsay (1989) developed a statistical model of prediction which involves computation of the overall order of the temporal process with the help of normalized criterion and root tables, followed by estimation of unknown parameters. For this example, it was found that a trivariate ARMA(1,1) model or AR(2) model would be appropriate for the data. Subsequently, Tiao and Tsay (1989) modeled the data as described below.

First, each trivariate input vector z_t has to be transformed by premultiplication with a 3 × 3 matrix T, called the transformation matrix. The transformed data conforms to a trivariate equation of the form

(I - Φ₁B) y_t = c + (I - Θ₁B) a_t

FIGURE 5. 8-8-1 network modeling: Buffalo (training).
FIGURE 7. 6-6-1 network modeling: Minneapolis (training).
FIGURE 8. 8-8-1 network modeling: Minneapolis (training).
FIGURE 9. Tiao & Tsay's model: Minneapolis (90 months).
FIGURE 11. 8-8-1 network modeling: Kansas City (training).
FIGURE 12. Tiao & Tsay's model: Kansas City (90 months).
where the transformed series y_t = Tz_t is a 3 × 1 (trivariate) column vector, B represents the backshift operator (i.e., By_t = y_{t-1}), and the 3 × 1 column vectors a_t comprise the error components of the model. The matrix coefficients (I - Φ₁B) and (I - Θ₁B) represent the autoregressive and moving average components, respectively. The estimated values of the 3 × 3 matrices Φ₁, Θ₁, and the 3 × 1 vector c are given in Tiao and Tsay (1989).

In the above trivariate model, the mean squared errors are obtained from the a_t's, after premultiplying each such vector by the inverse of the transformation matrix T. The manner of computing the a_t vectors is as follows. We initialize a_1 to zero; and for t = 1, 2, ..., 89, compute y_{t+1} by the recipe of the model using known values of y_t and a_t. The error vector a_t is obtained at each step by taking the difference between the computed and the actual values of y_t, for t = 2, 3, ..., 90. One-lag prediction is merely a continuation of the above process for t = 91, 92, ..., and multi-lag prediction is performed in a similar fashion but using estimated values of y_t for t = 91, 92, ..., 99 instead of the actual values y_91, ..., y_99.

3. COMPARATIVE ANALYSIS OF EXPERIMENTAL RESULTS

The mean squared errors obtained for three different models of separate modeling networks are presented in Table 1. The values correspond to the mean squared errors observed for (a) the first 90 univariate data items, which correspond to the training data, (b) one-lag, and (c) multi-lag predictions over the remaining 10 univariate data items for each of the three time-series. It is interesting to observe that the training performance
FIGURE 10. 6-6-1 network modeling: Kansas City (training).
FIGURE 13. 6-6-1 network prediction, one-lag (Buffalo).
FIGURE 14. 8-8-1 network prediction, one-lag (Buffalo).
FIGURE 15. Tiao & Tsay's model, one-lag (Buffalo).
FIGURE 17. 8-8-1 network prediction, multi-lag (Buffalo).
FIGURE 18. Tiao & Tsay's model, multi-lag (Buffalo).
improves whereas one-lag and multi-lag performances deteriorate (in terms of MSE(training) and MSE(test), respectively) as the size of the networks is increased. This suggests that the 4-4-1 and 6-6-1 networks are oversized for the given univariate data and that a 2-2-1 network is more suitable for prediction. Thus, every data item in each time-series considered individually is found to be strongly dependent on the past two values only, a fact which agrees with Tiao and Tsay's AR(2) model. This observation also led us to experiment with 6-h-1 and 8-h-1 networks for the two combined modeling approaches, respectively. Since combined modeling uses trivariate data in our example, the choice of 6 for the number of inputs in the first case is obvious. For the second combined modeling approach, we uniformly chose an 8-h-1 network for modeling each city because 8 is the least number of network inputs which ensures that there are two data items belonging to strictly past time points for each city.

Considerably improved performance, as measured in terms of mean squared errors in training and testing, resulted from following the combined modeling approach for the given data. We believe that separate modeling gives poorer results than combined modeling for two reasons. First, since the three series have a high pairwise positive correlation, each series carries information valuable not only for prediction of its own future values but also for those of the other two series. Second, the combined modeling training set contains three times as many observations as are available for each single modeling training set.

The mean squared errors and coefficients of variation for three different sets of experiments are listed in Table 2. The values correspond, respectively, to the
FIGURE 16. 6-6-1 network prediction, multi-lag (Buffalo).
FIGURE 19. 6-6-1 network prediction, one-lag (Minneapolis).
FIGURE 20. 8-8-1 network prediction, one-lag (Minneapolis).
FIGURE 22. 6-6-1 network prediction, multi-lag (Minneapolis).
mean squared errors and coefficients of variation observed for (a) the first 90 trivariate data items, which correspond to the training data for the combined modeling networks, (b) one-lag, and (c) multi-lag predictions of the combined modeling networks and the statistical model. Tiao and Tsay's model is seen to produce greater mean squared errors in training and prediction than the combined modeling networks in all but one case (one-lag prediction for Buffalo, for which Tiao and Tsay's model outperforms the 6-6-1 model). A word of caution is in order at this point. The model of Tiao and Tsay was derived using all 100 trivariate observations; for comparison with our network errors, we obtained two sets of MSE values from their model. In this sense, the neural network is in a slightly disadvantageous position because the network modeling is based on the first 90 trivariate observations only. Yet this method is a simple and desirable way to perform comparisons between the two approaches.

We also notice that the 8-8-1 combined modeling network, which assumes an ordering among the three series, performs considerably better than the 6-6-1 network, which utilizes strictly past data in both modeling and prediction. This shows that, for the example studied, contemporaneous values of the two other variables have a great role to play in determining the behavior of a variable at any time instant, and it indicates that there is a strong pairwise correlation between the data for the three cities.

The performance of the neural networks did not vary much for different choices of input sizes in the training and prediction phases of our experiments, and the results shown in Table 2 are fairly representative. Since back propagation is a gradient-descent optimization technique, training a network may result in its sinking into a local minimum of the MSE that may be far removed from the global one (see Tesauro & Janssens, 1988). Hence, to ensure that the MSE(training) was not trapped in such a local minimum, several training runs were conducted for each network. The MSE values achieved in these runs were found to be extremely small quantities that differed only slightly from one another. A small value of MSE(training) does not necessarily imply a small value of MSE(test) as well. However, our experimental results generally indicate that if the neural network performs well on the training samples, it also has good generalization capabilities, i.e., MSE(test) is also quite small. This is to be expected if the training and test samples are drawn from the same distribution and the network is not overtrained. In most of our experiments, the MSE(training) and MSE(test) values for the network were highly correlated.

4. DISCUSSION

Most statistical models for learning and predicting time-series are based only on linear recurrences. Though computationally inexpensive, such functions do not often accurately represent temporal variations. Nonlinear functions, on the other hand, are more useful for tracing temporal sequences. This is probably the main reason for the significantly better performance of the neural
FIGURE 21. Tiao & Tsay's model, one-lag (Minneapolis).
FIGURE 23. 8-8-1 network prediction, multi-lag (Minneapolis).
FIGURE 24. Tiao & Tsay's model, multi-lag (Minneapolis).
FIGURE 26. 8-8-1 network prediction, one-lag (Kansas City).
network approach (with nonlinearities at each node) as compared to statistical modeling.

In any nontrivial time-series, new values depend not only on the immediately prior value but also on many preceding values. Using too few inputs can result in inadequate modeling, whereas too many inputs can excessively complicate the model. In the context of neural networks, too many inputs would imply slower training and slower convergence, and may in fact worsen the generalization capabilities (applicability to test cases) of the network. Weigend et al. (1990) give a rule of thumb for determining the number of weights in the network as a function of the number of training samples, but this rule was found to be too restrictive for the data set of 100 patterns we worked with and hence had to be disregarded.

Different types of connectionist models have been proposed for learning temporal variations of data. It has generally been held in the past that recurrent networks are more suitable for learning temporal data. There were two reasons why recurrent networks were not used for modeling the trivariate data on flour prices: we observed experimentally that unfolding them into simple feedforward networks would cause worse training and output predictions than single hidden layer feedforward nets, and the network would become inherently slower because of a much greater amount of computation involved. An unfolded version of a recurrent network is an approximation of it, and implementing an exact recurrent network is a computationally expensive task. The main reason for this is that the units in hidden layers must be made to iterate among themselves until their outputs converge, and there is no way of knowing a priori how many iterations it would take before all the hidden units have stable outputs. Simple feedforward nets are less computationally intensive and in many applications give good performance in less time.

A potential objection to the claim of improved performance using the neural network approach, in comparison to the statistical approach, is that neural networks are more complex and have many more parameters (weights and thresholds): Would a more complex statistical model perform equally well? The answer is essentially methodological. Often, the real-world phenomena being modeled are so complex that it is impossible to theorize and generate statistical models. A large investment of experts' domain-specific research must precede the formulation of an adequate model for each separate phenomenon. When a large number of parameters are involved, it is difficult to predict data even when the laws which govern their behavior are known, e.g., in the gravitational interactions between a large number of bodies. With neural networks, on the contrary, an essentially similar architecture can be quickly modified and trained for a variety of different phenomena. The procedure is data-driven rather than model-driven and gives good results in many cases despite the unavailability of a good theory/model underlying the observed phenomena.
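The two combined-modeling input layouts compared in Section 3 (the strictly-past six-input scheme and the ordering-aware eight-input scheme) can be sketched as follows. The toy series, and the exact eight-value lag pattern, are assumptions for illustration rather than the authors' specification:

```python
import numpy as np

# Assumed toy trivariate series standing in for the three cities' log-prices:
# x becomes known before y, and y before z, at each time point.
rng = np.random.default_rng(1)
base = 5.0 + np.cumsum(rng.normal(0.0, 0.01, 100))
x, y, z = base, base + 0.02, base + 0.05

def six_input(t):
    """First scheme: predict x[t+1] from values at times t and t-1 of all
    three series -- strictly past data only."""
    return [x[t], y[t], z[t], x[t - 1], y[t - 1], z[t - 1]], x[t + 1]

def eight_input(t):
    """Second scheme: predict y[t] from the eight most recently available
    values under the ordering x-before-y-before-z (lag pattern assumed)."""
    feats = [x[t], z[t - 1], y[t - 1], x[t - 1],
             z[t - 2], y[t - 2], x[t - 2], z[t - 3]]
    return feats, y[t]

train_six = [six_input(t) for t in range(1, 89)]      # targets in first 90
train_eight = [eight_input(t) for t in range(3, 90)]
```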
FIGURE 25. 6-6-1 network prediction, one-lag (Kansas City).
FIGURE 27. Tiao & Tsay's model, one-lag (Kansas City).
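For concreteness, the error recursion and the one-lag/multi-lag prediction recipes of the Tiao and Tsay model described earlier can be sketched as follows. The coefficients Φ₁, Θ₁, and c below are placeholders, not the published estimates, and the series is synthetic:

```python
import numpy as np

# Placeholder coefficients; the estimated Phi1, Theta1, and c for the
# flour data are given in Tiao and Tsay (1989) and are not reproduced here.
Phi1 = 0.6 * np.eye(3)
Theta1 = 0.2 * np.eye(3)
c = np.full(3, 0.05)

rng = np.random.default_rng(2)
yv = 5.0 + np.cumsum(rng.normal(0.0, 0.02, (100, 3)), axis=0)  # toy y_t

# Error recursion: a_1 = 0; the model (I - Phi1 B)y_t = c + (I - Theta1 B)a_t
# gives the computed value c + Phi1 y_t - Theta1 a_t for y_{t+1}, and the
# error a_{t+1} is the actual minus the computed value.
a = np.zeros_like(yv)
for t in range(99):
    a[t + 1] = yv[t + 1] - (c + Phi1 @ yv[t] - Theta1 @ a[t])

# One-lag: continue the recursion past t = 90 using actual y values.
one_lag = np.array([c + Phi1 @ yv[t] - Theta1 @ a[t] for t in range(89, 99)])

# Multi-lag: feed estimated y values back in; unknown future errors are 0.
multi_lag = [c + Phi1 @ yv[89] - Theta1 @ a[89]]
for _ in range(9):
    multi_lag.append(c + Phi1 @ multi_lag[-1])
multi_lag = np.array(multi_lag)
```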
We now evaluate the neural network approach with respect to the following criteria for a good model suggested in the literature (see Harvey, 1989):
1. Parsimony: The neural network does contain a large number of parameters and is, hence, not parsimonious. However, the method of training does not impose any external biases, and networks started with different random weights successfully converged to approximate the time-series very well.
2. Data coherence: The neural network model provides a very good fit with the data, as shown by the low MSE values for the training samples.
3. Consistency with prior knowledge: No explicit theory was constructed using the neural networks; hence, this criterion is largely irrelevant. In the best neural network model, the assumption that flour prices become known in a fixed order (x_t, y_t, z_t, x_{t+1}, ...) is consistent with the information that the data are available slightly earlier for some cities than for others. So the network modeling can easily accommodate contemporaneous data which become known in a fixed predetermined order.
4. Data admissibility: The values in a time-series predicted by the neural networks are always close to the immediately preceding values and do not violate any obvious definitional or reasonable constraints.
5. Structural stability: The neural networks seem to satisfy this criterion because they give a good fit for test data, which are outside the set of training samples. Multiple training runs started with changed random initial weights for the same network architecture produced remarkably consistent MSE values after training. In all our experiments, the MSE values after training were very close to zero.
6. Encompassing: The results obtained using the neural networks are better than those obtained using an alternative statistical model, indicating that the network exhibits the potential of a competitive alternative tool for analysis and, especially, prediction of multivariate time-series. However, no theory is directly suggested by the neural networks developed so far. The design of forecasting models utilizing the parameters of the trained neural networks is currently under way.

TABLE 1
Mean-Squared Errors for Separate Modeling & Prediction (× 10³)

                    Buffalo    Minneapolis    Kansas City
6-6-1  Training       2.774          2.835          1.633
       One-lag       12.102          9.655          8.872
       Multi-lag     17.646         14.909         13.776

FIGURE 29. 8-8-1 network prediction, multi-lag (Kansas City).

TABLE 2
Mean-Squared Errors and Coefficients of Variation for Combined vs. Tiao/Tsay's Modeling/Prediction (× 10³)

4.1. Conclusions

We have presented a neural network approach to multivariate time-series analysis. In our experiments, real world observations of flour prices in three cities have been used to train and test the predictive power of feedforward neural networks. Remarkable success has been achieved in training the networks to learn the price curve for each of these cities, and thereby to make accurate price predictions. Our results show that the neural network approach leads to better predictions
than a well-known autoregressive moving average model given by Tiao and Tsay (1989). We obtained a very close fit during the training phase, and the networks we developed consistently outperformed statistical models during the prediction phase.

We are currently exploring the combination of statistical and neural approaches for time-series analyses. We expect that model-based statistical preprocessing can further improve the performance or help in obtaining faster convergence of neural networks in the task of time-series predictions.

REFERENCES

Box, G. E. P., & Jenkins, G. M. (1970). Time series analysis: Forecasting and control. San Francisco, CA: Holden-Day.
Farmer, J. D., & Sidorowich, J. J. (1987). Predicting chaotic time series. Physical Review Letters, 59, 845-848.
Granger, C. W. J., & Newbold, P. (1986). Forecasting economic time series (2nd ed.). Orlando, FL: Academic Press.
Harvey, A. C. (1989). Forecasting, structural time series models and the Kalman filter. Cambridge, U.K.: Cambridge University Press.
Li, M., Mehrotra, K., Mohan, C. K., & Ranka, S. (1990). Sunspot numbers forecasting using neural networks. Proceedings of the IEEE Symposium on Intelligent Control, 1, 524-529.
Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning internal representations by error propagation. In D. E. Rumelhart & J. L. McClelland (Eds.), Parallel distributed processing (pp. 318-362). Cambridge, MA: MIT Press.
Saikkonen, P., & Luukkonen, R. (1991). Power properties of a time series linearity test against some simple bilinear alternatives. Statistica Sinica, 1, 453-464.
Tesauro, G., & Janssens, B. (1988). Scaling relationships in back-propagation learning. Complex Systems, 2, 39-44.
Tiao, G. C., & Tsay, R. S. (1989). Model specification in multivariate time series. Journal of the Royal Statistical Society, Series B, 51, 157-213.
Tong, H. (1983). Threshold models in non-linear time series analysis. Lecture Notes in Statistics, 21. New York: Springer-Verlag.
Tong, H. (1990). Non-linear time series: A dynamical system approach. Oxford: Oxford University Press.
Weigend, A. S., Huberman, B. A., & Rumelhart, D. E. (1990). Predicting the future: A connectionist approach. International Journal of Neural Systems, 1, 193-209.