
Regional Management of Water Resources (Proceedings of a symposium held during the Sixth IAHS Scientific Assembly at Maastricht, The Netherlands, July 2001). IAHS Publ. no. 268, 2001.

River flow prediction: an artificial neural network approach

A. W. JAYAWARDENA & T. M. K. G. FERNANDO
Department of Civil Engineering, The University of Hong Kong, Pokfulam Road, Hong Kong
e-mail: hrecjaw@hkucc.hku.hk

Abstract A multi-layer perceptron neural network trained with the back-propagation algorithm is adopted to make daily river flow forecasts at the Nakhon Sawan gauging station on the Chao Phraya River in Thailand. Lead times from 1 day to 28 days have been used. The predicted flow rates, when compared with observed ones, agree within acceptable limits for both training and testing data for most of the considered lead times. The study also demonstrates that different network structures for the longer lead times can enhance the prediction.

Key words artificial neural networks; river flow prediction; cross-validation; Chao Phraya River

INTRODUCTION

River flow prediction is an important topic of research for hydrologists for a number of reasons. For example, it is useful and necessary for flood disaster mitigation, and it helps in the planning, management and operation of water resources development projects. The topic is, however, a difficult one because of the complexity of the physical processes involved in the generation of river flows. An approach based on physics is still far from being realized, and researchers have therefore focused attention on the use of "data driven" techniques in recent years.
Data driven approaches do not lead to a better understanding of the process of river flow generation, but they have their advantages. They are very practical for "data rich" but "theory weak" systems. Model formulations are often quite simple, and nonlinearities in the system pose no problems. One disadvantage is that extrapolation of a model beyond the data range used for calibration is not reliable if the system is nonlinear.
In this study, one such data driven technique, the artificial neural network (ANN) approach, is pursued with the objective of predicting daily river flows using lagged variables. Artificial neural networks are capable of performing nonlinear modelling without a priori knowledge about the relationship between input and output variables. They learn from given data and capture the functional relationships among the data even if the underlying relationships are unknown. After learning from sample data, they can be used to approximate a continuous function to any desired accuracy (Hornik et al., 1989; Hornik, 1991) using unseen input data. These distinguishing features have made ANNs a general and flexible alternative modelling tool for hydrological time series prediction in recent years (Dawson & Wilby, 1998; Coulibaly et al., 2000; Jayawardena & Fernando, 2000; Jayawardena et al., 2000).

DATA SET

The data set used in this study consists of daily discharges at the Nakhon Sawan gauging station on the Chao Phraya River in Thailand for April 1978-March 1994 (5844 samples). The Chao Phraya River basin is one of the most intensively monitored basins in Asia under the auspices of the Global Energy and Water Cycle Experiment (GEWEX) and the GEWEX Asian Monsoon Experiment (GAME). It has a basin area of 110 569 km² at Nakhon Sawan (latitude 15.67°N; longitude 100.12°E) and can be considered a macroscale basin. Several studies on the hydrology of this basin have been carried out in recent years (Oki et al., 1995; Raghunath et al., 2000). The data set was obtained from the Global Runoff Data Centre (GRDC # 2964100) in Germany.

METHODOLOGY

Artificial neural networks

Artificial neural networks are capable of learning the underlying behaviour of a system
from available data sets. The learning process, also known as training, is carried out
according to a learning rule. The synaptic weights are adjusted during the training so
that the error surface converges to a minimum. Neural network architectures can
basically be categorized into single-layer feedforward networks, multilayer
feedforward networks and recurrent networks (Haykin, 1999). The five basic learning rules available for training neural networks are error correction learning, memory-based learning, Hebbian learning, competitive learning and Boltzmann learning (Haykin, 1999).
A multilayer perceptron (MLP) type neural network trained with the back-propagation algorithm (Rumelhart et al., 1986), which belongs to the category of error correction learning rules, is the most widely used network paradigm in prediction (Zhang et al., 1998). This is particularly so in water resources modelling (Maier & Dandy, 2000). MLP networks have gained popularity over other types of network paradigms mainly due to their inherent capability of arbitrary input-output mapping. In the present study too, a fully connected feedforward MLP network trained using the back-propagation algorithm with a momentum term has been adopted. Despite its popularity, the back-propagation algorithm suffers from limitations associated with its derivation. Many of these, such as local minima, paralysis and over-fitting, can be overcome by taking necessary precautions during model formulation.
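As an illustrative sketch (not part of the original study), the code below shows a forward pass through the 7-4-1 MLP architecture adopted later in this paper, with logistic activations in both layers. The uniform weight initialization here is a simplification for brevity; the study itself initializes weights with normally distributed values in the range -1 to 1 (see Training and testing).

import numpy as np

def logistic(z):
    # logistic (sigmoid) activation, used for both hidden and output layers
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W1, b1, W2, b2):
    # x: seven lagged (scaled) discharge values; returns the scalar output
    h = logistic(W1 @ x + b1)        # hidden layer: 4 nodes for the 7-4-1 net
    return logistic(W2 @ h + b2)[0]  # output layer: 1 node

rng = np.random.default_rng(1)
W1, b1 = rng.uniform(-1, 1, (4, 7)), rng.uniform(-1, 1, 4)
W2, b2 = rng.uniform(-1, 1, (1, 4)), rng.uniform(-1, 1, 1)
print(forward(rng.uniform(0.1, 0.9, 7), W1, b1, W2, b2))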

Data partition

The general approach to selecting a good training set from an available data series is to include all extreme events in the training set. The inclusion of biased samples in the training set is not recommended, as this requires a longer training time without improving the testing results. A preliminary analysis of the data indicated that the most significant events fall within the first quarter of the series, while the latter part of the series appears to contain many biased samples. Therefore the first 1750 samples of the data series were selected as the training set and the next 400 samples were taken as the validation set, despite the suggestion to use equal-sized training and validation sets (Reed & Marks, 1999). The choice of the length of the validation set is based on the recommendation to use 10-20% of the training samples as the validation set (Kasabov, 1996). The remaining samples were used for testing the trained network.
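A minimal sketch of this partition, assuming the discharge series is held in a NumPy array (the array below is a placeholder, not the GRDC data):

import numpy as np

q = np.arange(5844, dtype=float)  # placeholder for the 5844 daily discharges

train = q[:1750]       # first 1750 samples, containing the significant events
valid = q[1750:2150]   # next 400 samples, used for cross-validation
test  = q[2150:]       # remaining 3694 samples, used for testing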

Model formulation

The use of lagged input variables leads to better predictions in a dynamic system. One way of finding out how many lagged variables to include is by way of the autocorrelogram, which in this study revealed a high correlation even up to 50 days. However, the inclusion of such a large number of inputs from the same variable is not reliable and, furthermore, would give rise to a very large network. Therefore the simulations were first carried out by assuming that the predicted discharge depended only on the discharge values in the previous 7 days. Thus the input layer consists of seven nodes representing the daily discharge values at time levels t, t-1, ..., t-6. In a sensitivity analysis, it was found that these seven variables carried equal importance for 1-day lead-time predictions. The same input variables were therefore used for the other lead-time predictions in the first instance, as it has been pointed out (Coulibaly et al., 2000) that short-term predictions are not significantly affected by the selection of a particular network structure. However, the results were seen to deteriorate rapidly with increasing lead time. Therefore different lagged input variables were used for longer lead times.
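The construction of such lagged input-output patterns might look as follows (a sketch under the assumption of a one-dimensional discharge array; function and variable names are illustrative, not from the paper):

import numpy as np

def make_patterns(series, n_lags=7, lead=1):
    # Each input row holds q(t), q(t-1), ..., q(t-n_lags+1);
    # the corresponding target is q(t + lead).
    X, y = [], []
    for t in range(n_lags - 1, len(series) - lead):
        X.append(series[t - n_lags + 1 : t + 1][::-1])
        y.append(series[t + lead])
    return np.array(X), np.array(y)

# e.g. seven lagged discharges predicting 7 days ahead:
# X, y = make_patterns(train, n_lags=7, lead=7)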

Number of nodes in the hidden layer

The number of nodes in the hidden layer affects the performance of the trained
network and therefore should be optimally determined. In general, networks with
fewer hidden nodes are preferable as they usually have better generalization
capabilities and fewer over-fitting problems. The computational time required is also less
with a smaller number of nodes. However, if the number of nodes is not large enough
to capture the underlying behaviour of the data, then the performance of the network is
impaired. In most cases, the selection of the optimal number of nodes in the hidden
layer is a trial-and-error procedure with the help of some guidelines. A rule of thumb is
that the number of samples in the training set should at least be greater than the
number of synaptic weights (Rogers & Dowla, 1994; Tarassenko, 1998). This gives
the upper limit of the number of nodes for the network. In this study, a trial-and-error
procedure for hidden node selection was carried out by starting with two nodes and
then gradually increasing the number until there is no further reduction in error, or up
to the maximum number of nodes dictated by the above rule of thumb. This procedure
led to four hidden nodes for 1-day lead time with seven inputs, which was adopted
initially for all other lead times as well when the input layer had seven inputs.
Subsequently, the effects of the numbers of nodes for networks with longer lead times
but with the same inputs were examined. The numbers of hidden nodes for networks
with other inputs and lead times longer than 14 days were selected on a case-by-case basis.
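A sketch of the rule-of-thumb upper bound follows. The weight count here excludes bias terms; this counting convention is an assumption, chosen because it reproduces the ratio of about 55 quoted later for the 7-4-1 network (1750/32 ≈ 55), and the paper does not state its convention explicitly.

def n_weights(n_in, n_hidden, n_out=1):
    # synaptic weights of a fully connected n_in - n_hidden - n_out MLP
    return n_in * n_hidden + n_hidden * n_out

def max_hidden_nodes(n_train, n_in, n_out=1):
    # largest h with h * (n_in + n_out) weights not exceeding n_train
    return n_train // (n_in + n_out)

print(n_weights(7, 4))            # 32 weights for the 7-4-1 network
print(max_hidden_nodes(1750, 7))  # upper limit of 218 hidden nodes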

Learning rate and momentum term

The choice of the learning rate and the momentum term can have a significant effect
on the training process. Learning rates that are too small result in longer training times, while values that are too high lead to rapid learning but cause oscillations in the error surface.
Addition of the momentum term to the back-propagation algorithm allows synaptic
weights to be updated in the general direction of error reduction without getting
trapped in a local minimum. The momentum term can take values between 0 and 1.
Higher values would produce fluctuations in the error surface while smaller values
would not have much effect on convergence speed. In this study, the optimal values for
the learning rate and the momentum term, determined by trial and error, were 0.1 and
0.3, respectively.
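In symbols, the momentum-augmented update is Δw(k) = -η ∂E/∂w + α Δw(k-1), with η = 0.1 and α = 0.3 as reported above. A minimal sketch of the per-weight update:

ETA, ALPHA = 0.1, 0.3  # learning rate and momentum term from the trial-and-error search

def update_weight(w, grad, prev_delta, eta=ETA, alpha=ALPHA):
    # grad is the error gradient dE/dw at the current training pattern
    delta = -eta * grad + alpha * prev_delta
    return w + delta, delta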

Training and testing

The logistic function was selected as the activation function for both hidden and output
layers. Hence the input variables as well as the target values were normalized linearly
in the range of 0.1 to 0.9. The synaptic weights of the networks were initialized with
normally distributed random numbers in the range of -1 to 1. The order of presenting
the training samples to the network was also randomized from iteration to iteration.
Such measures were taken to improve the performance of the back-propagation
algorithm.
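The linear scaling to the range 0.1 to 0.9 and its inverse can be sketched as follows; taking lo and hi from the training data is an assumption, since the paper does not say which extremes were used.

def scale(x, lo, hi):
    # map values linearly from [lo, hi] onto [0.1, 0.9]
    return 0.1 + 0.8 * (x - lo) / (hi - lo)

def unscale(s, lo, hi):
    # inverse map from network output back to discharge units
    return lo + (s - 0.1) * (hi - lo) / 0.8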
Training was carried out in a pattern mode; the weights were updated after
presenting each sample in the training set to the network. After each iteration, the
mean square errors (MSE) corresponding to the training set as well as the validation
set were computed. The commonly used stopping criteria include a fixed number of iterations, the rate of change of MSE of the training set reaching a pre-determined value, and cross-validation. When the ratio of the number of unbiased samples in the training set to the number of synaptic weights in the network is less than 30, the cross-validation criterion enhances the generalization of the trained network (Amari et al., 1997). Two stopping criteria, cross-validation and a fixed number of iterations, were adopted in this study. The maximum number of iterations was set at 1000.
When cross-validation is used as the stopping criterion, the common practice is to stop training when the MSE of the validation set just begins to increase. However, in real situations, the error reduction curve of the validation data is not smooth, and terminating training at the first local minimum may lead to incomplete training. In
the present study, the program was developed in such a way that training can further be
carried out for a pre-determined number of iterations after reaching a local minimum.
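A sketch of this stopping logic, written around caller-supplied helpers (run_epoch, val_mse) since the paper's program is not reproduced here; the patience value is a placeholder for the pre-determined number of extra iterations.

def train_with_cross_validation(weights, run_epoch, val_mse,
                                max_iter=1000, patience=50):
    # run_epoch: one pattern-mode pass over the training set, returns weights
    # val_mse:   mean square error on the validation set for given weights
    best_mse, best_weights, since_best = float("inf"), weights, 0
    for _ in range(max_iter):
        weights = run_epoch(weights)
        v = val_mse(weights)
        if v < best_mse:
            best_mse, best_weights, since_best = v, weights, 0
        else:
            since_best += 1
            if since_best > patience:  # keep going past a local minimum
                break
    return best_weights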

Model performance criteria

The performances of the models were evaluated using the root mean square error (RMSE), the coefficient of determination (R²) and the coefficient of efficiency (E) (Nash & Sutcliffe, 1970) as error indicators. The coefficient of efficiency varies between -∞ and +1, with higher positive values indicating better agreement. However, E cannot be interpreted easily as it has no lower bound (Legates & McCabe, 1999). Scatter plots, double mass curves and time series plots were also used for visual comparison of observed and predicted values.
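The three indicators can be computed as below. The R² here is taken as the squared correlation between observed and predicted series, and E follows the definition given under Table 1; both are standard, although the paper does not spell out its R² formula.

import numpy as np

def rmse(obs, pred):
    return np.sqrt(np.mean((obs - pred) ** 2))

def r_squared(obs, pred):
    # squared correlation coefficient between observed and predicted values
    return np.corrcoef(obs, pred)[0, 1] ** 2

def nash_sutcliffe(obs, pred):
    # coefficient of efficiency E (Nash & Sutcliffe, 1970)
    return 1.0 - np.sum((obs - pred) ** 2) / np.sum((obs - np.mean(obs)) ** 2)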

RESULTS AND DISCUSSION

The initial results with the same network structure for all lead-time forecasts are
promising for lead times up to 7 days. The error indicators computed for training and
testing sets are summarized in Table 1. They are computed for the whole training and
testing sets. The illustrations (Figs 1-3) are for the period from day 2601 to day 2800,
which represents one of several peaks in the testing set. The scatter plots for the 1-day
and 7-day lead times illustrate the higher accuracy with shorter lead time (Fig. 1).
Although the mass curves for the same data show a marginal variation with increasing
lead time (Fig. 2), the time series plots (Fig. 3) clearly demonstrate the increase in
error with increasing lead time.

Table 1 Error indicators for training and testing data: network structure 7-4-1.

Lead time   RMSE (m³ s⁻¹)   RMSE (m³ s⁻¹)   R²           R²          E            E
(days)      (Training)      (Testing)       (Training)   (Testing)   (Training)   (Testing)
1           108             113             0.990        0.972       0.972        0.875
2           126             127             0.981        0.945       0.962        0.842
7           213             196             0.901        0.727       0.891        0.623
14          346             252             0.723        0.541       0.711        0.375
21          425             290             0.575        0.405       0.564        0.174
28          469             322             0.477        0.313       0.468        -0.023

E = 1 - \frac{\sum_{i=1}^{N} (O_i - P_i)^2}{\sum_{i=1}^{N} (O_i - \bar{O})^2}

where N is the length of the data set, and O and P are the observed and predicted values respectively. The overbar denotes the mean for the considered data set.
Network structure 7-4-1 refers to seven nodes in the input layer, four nodes in the hidden layer and one node in the output layer.

The ratio of the number of samples in the training set to the number of synaptic weights in the network was about 55 for these simulations. As a result, the stopping criterion should be based on the maximum number of iterations (Amari et al., 1997). However, training was terminated using the cross-validation criterion, except for forecasts up to 7-day lead times. In an attempt to obtain the optimal networks for higher lead times, it was assumed that the predicted discharge at lead times greater than or equal to 7 days depended on the discharge values at times t, t-1, ..., t-14 and t, t-1, ..., t-21. The accuracy of the results for the 7-day lead time does not vary significantly with the inputs and the hidden nodes. However, the computed values seem to be sensitive to the network structure for the remaining lead times considered. The optimal network structures obtained for different lead times and the corresponding error indicators are given in Table 2.

[Figure: scatter plots of predicted against observed discharge (m³/s), both axes 0-2500 m³/s, for (a) lead time 1 day and (b) lead time 7 days; period covered: day 2601 to day 2800.]
Fig. 1 Scatter plot for part of the testing data.

[Figure: double mass curves of cumulative observed and predicted discharge against time (days), for (a) lead time 1 day and (b) lead time 7 days; period covered: day 2601 to day 2800.]
Fig. 2 Double mass curve for part of the testing data.

[Figure: time series of observed and predicted discharge (m³/s) over 200 days, for (a) lead time 1 day and (b) lead time 7 days; period covered: day 2601 to day 2800.]
Fig. 3 Observed and predicted discharge values for part of the testing data.

Table 2 Error indicators for higher lead times with optimal network structure.

Lead time   Network     R²           R²          E            E
(days)      structure   (Training)   (Testing)   (Training)   (Testing)
14          14-30-1     0.763        0.534       0.759        0.438
21          21-15-1     0.619        0.403       0.613        0.246
28          21-30-1     0.523        0.315       0.517        0.043

The results have improved from the initial study. However, it is difficult to obtain a good model for lead times greater than two weeks. This study clearly demonstrates the potential of ANNs in making reasonably reliable daily discharge predictions for short lead times using lagged discharges as inputs.

REFERENCES

Amari, S., Murata, N., Müller, K., Finke, M. & Yang, H. H. (1997) Asymptotic statistical theory of overtraining and cross-validation. IEEE Trans. on Neural Networks 8(5), 985-996.
Coulibaly, P., Anctil, F. & Bobée, B. (2000) Daily reservoir inflow forecasting using artificial neural networks with stopped training approach. J. Hydrol. 230, 244-257.
Dawson, C. W. & Wilby, R. (1998) Artificial neural network approach to rainfall-runoff modelling. Hydrol. Sci. J. 43(1), 47-66.
Haykin, S. (1999) Neural Networks, a Comprehensive Foundation, second edn. Prentice-Hall, New Jersey.
Hornik, K. (1991) Approximation capabilities of multilayer feedforward networks. Neural Networks 4, 251-257.
Hornik, K., Stinchcombe, M. & White, H. (1989) Multilayer feedforward networks are universal approximators. Neural Networks 2, 359-366.
Jayawardena, A. W. & Fernando, T. M. K. G. (2000) Prediction of algal blooms using artificial neural networks. In: Engineering Problems—Neural Network Solutions (ed. by D. Tsaptsinos) (Proc. Int. Conf. on Engineering Applications of Neural Networks, July 2000), 115-122.
Jayawardena, A. W., Fernando, T. M. K. G., Chan, C. W. & Chan, W. C. (2000) Comparison of ANN, dynamical systems and support vector approaches for river discharge prediction. In: The 19th Chinese Control Conference (Proc. CCC 2000, December 2000), vol. 2, 504-508. Hong Kong.
Kasabov, N. K. (1996) Foundations of Neural Networks, Fuzzy Systems, and Knowledge Engineering. MIT Press, Cambridge, Massachusetts, USA.
Legates, D. R. & McCabe, G. J. (1999) Evaluating the use of "goodness-of-fit" measures in hydrologic and hydroclimatic model validation. Wat. Resour. Res. 35(1), 233-241.
Maier, H. R. & Dandy, G. C. (2000) Neural networks for the prediction and forecasting of water resources variables: a review of modelling issues and applications. Environ. Modelling & Software 15, 101-124.
Nash, J. E. & Sutcliffe, J. V. (1970) River flow forecasting through conceptual models. Part I-A: Discussion of principles. J. Hydrol. 10, 282-290.
Oki, T., Musiake, K., Matsuyama, H. & Matsuda, K. (1995) Global atmospheric water balance and runoff from large river basins. Hydrol. Processes 9(5/6), 655-678.
Raghunath, J., Herath, S. & Musiake, K. (2000) River network solution for a distributed hydrological model and applications. Hydrol. Processes 14(3), 575-592.
Reed, R. D. & Marks, R. J. II (1999) Neural Smithing. MIT Press, Cambridge, Massachusetts, USA.
Rogers, L. L. & Dowla, F. U. (1994) Optimization of groundwater remediation using artificial neural networks with parallel solute transport modelling. Wat. Resour. Res. 30(2), 457-481.
Rumelhart, D. E., Hinton, G. E. & Williams, R. J. (1986) Learning representations by back-propagating errors. Nature 323, 533-536.
Tarassenko, L. (1998) A Guide to Neural Computing Applications. Wiley, New York.
Zhang, G. B., Patuwo, B. E. & Hu, M. Y. (1998) Forecasting with artificial neural networks: the state of the art. Int. J. Forecasting 14, 35-62.
