
ARTICLE IN PRESS

Neurocomputing 63 (2005) 381–396


www.elsevier.com/locate/neucom

Neural networks and M5 model trees in modelling water level–discharge relationship

B. Bhattacharya, D.P. Solomatine

Department of Hydroinformatics and Knowledge Management, UNESCO-IHE Institute for Water Education, P.O. Box 3015, 2601 DA Delft, The Netherlands

Received 22 October 2003; received in revised form 23 March 2004; accepted 23 April 2004
Available online 6 August 2004

Abstract

Reliable estimation of discharge in a river is a crucial component of efficient flood management and surface water planning. Hydrologists use historical data to establish a relationship between water level and discharge, which is known as a rating curve. Once a relationship is established it can be used for predicting discharge from future measurements of water level only. Successful applications of machine learning in water management inspired the exploration of the applicability of these approaches to modelling this complex relationship. In the present paper, models of the water level–discharge relationship are built with an artificial neural network (ANN) and an M5 model tree. The relevant inputs are selected by computing average mutual information. The predictive accuracy of these models is compared with a traditional rating curve built with the same data. It is concluded that the ANN- and M5 model tree-based models are superior in accuracy to the traditional model.
© 2004 Elsevier B.V. All rights reserved.

Keywords: Artificial neural networks; M5 model tree; Water level; Discharge; Rating curve

Abbreviations: RMSE, Root mean squared error; NRMSE, Normalized root mean squared error.
Corresponding author. Tel: +31-15-215-1815; fax: +31-15-212-2921.
E-mail addresses: bha@ihe.nl (B. Bhattacharya), sol@ihe.nl (D.P. Solomatine).

0925-2312/$ - see front matter © 2004 Elsevier B.V. All rights reserved.
doi:10.1016/j.neucom.2004.04.016

1. Introduction

In flood management it is important to reliably estimate the discharge in a river. Discharge measurement is time consuming, hazardous and costly. A cheaper
alternative is the so-called rating curve that embodies a functional relationship
between the water level (called stage when measured from a datum) and discharge
with the help of field measurements. Once a reliable rating curve is available
discharge can be estimated from the rating curve using the observed water level
(stage).
Normally a rating curve can be constructed with the help of polynomial regression or auto-correlation-based statistical methods such as ARIMA. The functional
relationship between stage and discharge is complex and cannot always be captured
by these traditional modelling techniques. Often this limits the practical usefulness of
rating curves.
In a situation when a considerable amount of data about the studied process is available, as is the case with stage–discharge measurements, the use of simplified traditional techniques based on conceptual models, such as a rating curve, may hardly be justified. In order to achieve the maximum benefit it is important to exploit the available data, and an obvious choice here is machine learning methods such as artificial neural networks (ANN).
Among machine learning techniques, ANN is the one most widely used in various areas of water-related research, particularly in hydrology [13]. Our experience has also shown that the use of a committee of models, each specialized for a particular range of the input–output space, is often beneficial. One such model, belonging to the class of hierarchical modular models (dynamic committee machines), is the M5 model tree (MT) [24]. It is not as popular as ANN but has proved to be very efficient and robust.
In the present paper ANN and MT models of the stage–discharge relationship at
one discharge measuring station have been compared with a conventional rating
curve.

2. Rating curve characteristics

Ideally, with a view to establishing a relationship between stage and discharge, a definitive flow condition at the measuring site should be established. This is achieved
through proper control of flow that restricts the transmission of the effect of changes
in flow condition either in an upstream or in a downstream direction. In ideal
conditions, a stage–discharge relationship theoretically becomes independent of
channel roughness and other uncontrolled circumstances. However, discharge
measuring sites in natural rivers bear considerable deviations from the desired
conditions due to practical reasons such as hydrological requirements, presence of
approach roads, availability of land, etc.
The parameters, stage and discharge, describe processes that develop in time and
are characterized by random fluctuations. The plot of discrete or continuous

recording of stage against time is called a stage hydrograph. With the help of a rating
curve a discharge hydrograph (variation of discharge with time) can be developed
using a stage hydrograph.
A rating curve is a useful tool for a hydrologist to predict discharges from gauge observations [18,35]. It alleviates the need for costly and time consuming discharge measurements. The quality of discharge prediction is crucial in flood management, water yield computation and hydrologic design [7,12,26].
The most commonly used form of stage–discharge relationship is expressed as
follows:
Q = a (h − h_0)^b,    (1)

where h_0 stands for the minimum stage below which discharge is not feasible, h is the stage and Q is the discharge. A first estimate of h_0 is usually chosen by a hydrologist after examining the characteristics of the historical stage data; the final value of h_0 is then chosen by trial and error to give the best fit. Values of the regression coefficients a and b are likewise chosen to maximize the fit to the training data.
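For illustration, the trial-and-error selection of h_0 together with a least-squares fit of a and b can be sketched as follows. This is a minimal example with synthetic data; the function and variable names are ours, not from the paper.

```python
import numpy as np

def fit_rating_curve(h, Q, h0_candidates):
    """Fit Q = a * (h - h0)^b by log-linear least squares,
    trying each candidate h0 and keeping the best fit."""
    best = None
    for h0 in h0_candidates:
        eff = h - h0
        if np.any(eff <= 0):          # (h - h0) must stay positive
            continue
        # log Q = log a + b * log(h - h0): ordinary least squares
        b, log_a = np.polyfit(np.log(eff), np.log(Q), 1)
        resid = Q - np.exp(log_a) * eff ** b
        sse = float(np.sum(resid ** 2))
        if best is None or sse < best[0]:
            best = (sse, np.exp(log_a), b, h0)
    return best[1], best[2], best[3]   # a, b, h0

# Synthetic check: data generated from a known curve is recovered
h = np.linspace(4.0, 9.0, 50)
Q = 35.0 * h ** 2.0
a, b, h0 = fit_rating_curve(h, Q, h0_candidates=[0.0, 0.5, 1.0])
print(round(a, 1), round(b, 2), h0)   # close to 35.0, 2.0, 0.0
```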
During unsteady flows (∂u/∂t ≠ 0, where u is flow velocity and t is time) the relationship between stage and discharge is not unique. During a rising flood, the flood wave meets less hindrance in propagation than during a falling flood. For the same stage this causes a higher discharge during a rising flood than during a falling flood. This effect is known as hysteresis; it results in a loop-rating curve and justifies the premise that the relationship between stage and discharge is not a one-to-one mapping, but includes the dependency of discharge on past stage and discharge values.
To take into account the hysteresis effect sometimes the historical data is divided
into two sets: one set of stage and discharge data with rising stages and another set of
stage and discharge data with falling stages. Then separate regression models of the
form (1) are developed for each set. This approach is not without limitations as data
separation is often subjective and the subsequent use of rating curves needs expertise
and is prone to errors. Jones’ formula provides an alternative where a correction
procedure is adopted to take into account the hysteresis effect due to the dynamic
nature of flood waves [16,31].
When available stage–discharge data at a location is so limited that a rating curve
cannot be developed then a mathematical model of the flow in that region of the
river can be built. Flow of water in natural rivers is described by Saint Venant
equations (continuity and momentum) [2]. Under assumptions such as that acceleration is negligible and the water surface slope is nearly parallel to the bed slope, the flow of water can be approximated by the following kinematic wave equation [2]:
 
∂h/∂t + (1/b_s) (dQ/dh) ∂h/∂x = 0,    (2)

where h is the water depth, t the time, Q the discharge, x the coordinate axis in the horizontal plane along the river, b_s the storage width defined by ∂A/∂t = b_s (∂h/∂t),
and A the cross-sectional area of the river. The kinematic wave equation is another
form of the advection equation describing the variations of a flood wave h(x, t) that

is advected with velocity

u = (1/b_s) (dQ/dh).    (3)
Adopting a finite difference numerical scheme, Eq. (2) can be solved but it requires
initial and two-point (upstream and downstream) boundary conditions (either stage
or discharge). If the stage or discharge data is available at some upstream as well as
downstream locations of the point of interest then a mathematical model can be built
and can be calibrated with the sporadic stage–discharge data available at the
location of interest. However, numerical modelling also brings in errors and this
approach is not preferred by hydrologists unless data at the point of interest is
unavailable.
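As a sketch of such a numerical scheme (an assumption on our part; the paper does not specify one), Eq. (2) can be advanced in time with an explicit upwind finite difference, which is stable for c·Δt/Δx ≤ 1:

```python
import numpy as np

def kinematic_wave_step(h, dt, dx, wave_speed):
    """One explicit upwind step of dh/dt + c * dh/dx = 0,
    with c = (1/b_s) dQ/dh evaluated from the current depths.
    Assumes flow in the +x direction and an upstream boundary at h[0]."""
    c = wave_speed(h)                       # advection celerity per node
    h_new = h.copy()
    # backward (upwind) difference for positive celerity
    h_new[1:] = h[1:] - dt / dx * c[1:] * (h[1:] - h[:-1])
    return h_new

# Constant celerity: a step "flood wave" is advected downstream
h = np.where(np.arange(20) < 5, 2.0, 1.0)
for _ in range(10):
    h = kinematic_wave_step(h, dt=0.5, dx=1.0,
                            wave_speed=lambda d: np.ones_like(d))
print(h.round(2))
```

After ten steps the wave front has moved about five cells downstream, smeared by the scheme's numerical diffusion, which is precisely the kind of error the paragraph above refers to.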

3. Machine learning approaches

In the machine learning approach we use a non-linear parametric function approximator, in which the coefficients of the function decomposition are obtained from input–output data pairs, a chosen model structure and systematic learning rules. Once trained, the machine learning model
becomes a parametric description of the function. Learning a general principle from
a set of specific training examples is achieved by trying out different model structures
and the related parameters. Out of several possible methods we considered ANN, which is the most widely used method in the water sector, and M5 model trees, which are almost unknown in the water sector.

3.1. Artificial neural network

An ANN is the most widely used method in ML and is, in fact, a broad term
covering a large variety of network architectures, the most common of which is a
multi-layer perceptron (MLP). Such a network is trained by the so-called error-
backpropagation method, which is a specialized version of gradient-based optimization.
Each target vector z is an unknown function f of the input vector x:

z = f(x).    (4)
The task of the network is to learn the function f. The network includes a set of parameters (a weight vector) whose values are varied so that the function f′ computed by the network is as close as possible to f. The
weight parameters are determined by training (calibrating) the ANN based on the
training data set. More details about ANNs can be found in [15].
In solving water-related problems, ANNs have been successfully used in rainfall–runoff modelling [8,21], prediction of discharge [22], and in modelling stage behaviour (without considering discharge) [32]. An MLP ANN was found to be very
efficient in modelling stage–discharge relationships [3–5,18,31]. The effectiveness of

an ANN with a radial basis function was explored in [30] while a fuzzy-neural
network was used in [9].

3.2. Modular approach and the M5 model trees

A complex modelling problem can be solved by dividing it into a number of simpler tasks and combining their solutions. The input space can be divided into a number of subspaces, or regions, for each of which a separate specialized model is built. In machine learning such models are often called experts, or modules, and a combination of experts a committee machine. Haykin [15] classifies such machines into two major categories: (1) static (represented by ensemble averaging and boosting), where the responses of experts are combined by a mechanism that does not involve the input signal, e.g., using fixed weights; and (2) dynamic, where experts are combined using weighting schemes depending on the input vector.
The category of dynamic committee machines can be split further into two groups:
(2a) statistically-driven approaches with ‘‘soft’’ splits of input space represented by
mixtures of experts [17,19], and (2b) methods which do not combine the outputs of
different experts or modules but explicitly use only one of them, the most
appropriate one (a particular case when the weights of other experts are zero).
Contrary to the mixture models, methods of this group use "hard" (i.e. yes–no) splits of the input space, progressively narrowing the regions of the input space. Each expert is trained individually on the subset of instances contained in its region, and finally the output of only one specialized expert is
taken into consideration. The result is a hierarchy, a tree (often a binary one) with
splitting rules in non-terminal nodes and the expert models in leaves (Fig. 1). Such
models can be called hierarchical (or tree-like) modular models (HiMM). Their
optimisation is considered in [28].
Models in HiMMs could be of any type, for example linear regression or ANNs. For solving the numerical prediction (regression) problem, there are a number of splitting methods based on the idea of a decision tree:

• If a leaf is associated with the average output value of the instances sorted down to it, the overall approach is called a regression tree, introduced by Breiman et al. [6], resulting in numerical constants (zero-order models) in the leaves.
• If it is desirable to have in the leaves regression functions of the input variables, then two approaches are typically used: Friedman's MARS (multivariate adaptive regression splines) algorithm [11], and the M5 model tree algorithm by Quinlan [24].

The M5 algorithm uses the following idea: split the parameter space into areas (subspaces) and build in each of them a local specialized linear regression model.
The splitting in MT follows the idea used in building a decision tree, but instead of
the class labels it has linear regression functions at the leaves, which can predict
continuous numeric attributes. Model trees generalize the concepts of regression

[Figure: a tree routing a new instance through split nodes a1–a4 to one of the models M1–M5.]

Fig. 1. Hierarchical mixture of experts (models). a_i are the split nodes; M_j are the models.

[Figure: partition of the input space X1 × X2 into six regions, each with its own linear model.]

Fig. 2. Splitting the input space X1 × X2 by the M5 model tree algorithm; each model is a linear regression model y = a_0 + a_1 x_1 + a_2 x_2.

trees [6], which have constant values at their leaves, and are thus analogous to piecewise linear functions (and hence non-linear). Model trees learn efficiently and can tackle tasks of very high dimensionality, up to hundreds of attributes. The major advantage of model trees over regression trees is that model trees are much smaller, the decision strength is clear, and the regression functions do not normally involve many variables.
The M5 algorithm is used for inducing a model tree [24], which works as follows
(Fig. 2). Suppose that a collection T of training examples is available. Each example
is characterized by the values of a fixed set of (input) attributes and has an associated

target (output) value. The aim is to construct a model that relates a target value of
the training cases to the values of their input attributes. The quality of the model will
generally be measured by the accuracy with which it predicts the target values of the
unseen cases.
Tree-based models are constructed by a divide-and-conquer method. The set T is
either associated with a leaf, or some test is chosen that splits T into subsets
corresponding to the test outcomes and the same process is applied recursively to the
subsets. The splitting criterion for the M5 model tree algorithm is based on treating
the standard deviation of the class values that reach a node as a measure of the error
at that node, and calculating the expected reduction in this error as a result of testing
each attribute at that node. The formula to compute the standard deviation
reduction (SDR) is

SDR = sd(T) − Σ_i (|T_i| / |T|) sd(T_i),    (5)

where T represents the set of examples that reaches the node, T_i represents the subset of examples having the ith outcome of the potential test, and sd represents the standard deviation.
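A direct transcription of Eq. (5) can make the criterion concrete (our illustrative code, not part of the original M5 implementation):

```python
import numpy as np

def sdr(parent, subsets):
    """Standard deviation reduction of Eq. (5): sd(T) minus the
    size-weighted standard deviations of the candidate subsets."""
    n = sum(len(s) for s in subsets)
    return np.std(parent) - sum(len(s) / n * np.std(s) for s in subsets)

# A split that separates low and high target values reduces sd a lot;
# an arbitrary split that mixes them barely reduces it
T = np.array([1.0, 1.1, 0.9, 9.0, 9.2, 8.8])
good = sdr(T, [T[:3], T[3:]])
bad = sdr(T, [T[::2], T[1::2]])
print(good > bad)  # True
```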
After examining all possible splits (that is, the attributes and the possible split
values), M5 chooses the one that maximizes the expected error reduction. Splitting in
M5 ceases when the class values of all the instances that reach a node vary just
slightly, or only a few instances remain. The relentless division often produces over-
elaborate structures that must be pruned back, for instance by replacing a subtree
with a leaf. In the final stage, a smoothing process is performed to compensate for
the sharp discontinuities that will inevitably occur between adjacent linear models at
the leaves of the pruned tree, particularly for some models constructed from a
smaller number of training examples. In smoothing, the adjacent linear equations are
updated in such a way that the predicted outputs for the neighbouring input vectors
corresponding to the different equations are becoming close in value. Details of this
process can be found in [24,34,36].
MT is not yet as popular as ANN, and, for example in the water sector its use
started only recently [20,27,29].

4. Experimental set up

4.1. Study area

Data from a discharge measuring station at Swarupgunj on the river Bhagirathi in India has been considered. The river is unidirectional and near the measurement site
has a width of about 320 m and a maximum depth of about 8 m. Minor changes in
the bank-lines upstream of the measuring station during the data collection period
(1990–1998) were observed but in the proximity of the measuring station, the bank-
lines were more or less stable. River bed material consisted of silty fine sand.

4.2. Selection of input–output variables

The construction of an adequate input space is often even more important than the choice of a learning algorithm. For selecting the right input and output variables of the model, average mutual information (AMI) was used to investigate the dependency between variables and the related lag effect. AMI is based on Shannon's entropy theory and is a measure of the information available from one set of data given
the knowledge of another set of data [25,33]. The AMI between two measurements a_i and b_j drawn from sets A and B is defined by

I_AB = Σ_{a_i, b_j} P_AB(a_i, b_j) log_2 [ P_AB(a_i, b_j) / (P_A(a_i) P_B(b_j)) ],    (6)

where P_AB(a_i, b_j) is the joint probability density for measurements from A and B resulting in the values a_i and b_j, and P_A(a_i) and P_B(b_j) are the individual probability densities for the measurements of A and B. If a measurement of a value from A resulting in a_i is completely independent of a measurement of a value from B resulting in b_j, then the average mutual information I_AB is zero.
As a measure of information, the advantage of the AMI measure compared to
other approaches such as cross correlation is that it is independent of any pre-defined
function. For discrete measurements the actual AMI-values depend on the number
of class intervals used to calculate the probability densities [1].
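Eq. (6) can be estimated from discrete measurements with a two-dimensional histogram; the sketch below is our code, with the bin count chosen arbitrarily (as noted above, the AMI values depend on it). Shifting one series by k samples before calling the function would give the lagged AMI of the kind plotted in Fig. 3.

```python
import numpy as np

def average_mutual_info(a, b, bins=16):
    """Histogram estimate of Eq. (6): I_AB, in bits, between series a and b."""
    pab, _, _ = np.histogram2d(a, b, bins=bins)
    pab /= pab.sum()                       # joint probabilities
    pa = pab.sum(axis=1, keepdims=True)    # marginal of a
    pb = pab.sum(axis=0, keepdims=True)    # marginal of b
    nz = pab > 0                           # avoid log(0)
    return float(np.sum(pab[nz] * np.log2(pab[nz] / (pa @ pb)[nz])))

# A series shares much more information with itself than with noise
rng = np.random.default_rng(0)
x = rng.normal(size=5000)
noise = rng.normal(size=5000)
print(average_mutual_info(x, x) > average_mutual_info(x, noise))  # True
```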
The AMI measures of discharge at the selected site with itself and with stage at the
same location were computed at varying lag times (Fig. 3). AMI helps to find out
how much information about the future discharge is available from the past
discharge and stage data. From Fig. 3 we see that stage with a zero lag corresponds
to the highest AMI (2.5). However, stage data with a lag of 1 and 2 h also contain
substantial information about the present discharge. Thereafter, the information
content remains almost the same for increasing lag time. The variation of AMI of

[Figure: AMI vs. time lag (0–15 h) for stage & discharge and for discharge & discharge.]

Fig. 3. Average mutual information (AMI) between stage and discharge.



discharge with itself shows that discharge with a lag of 1 h has a high information content about the current discharge. Thereafter, the AMI values remain almost
unchanged. Based on these AMI values the following input parameters were chosen
for the model:
h_t: stage at time step t,
h_{t−1}: stage at time step t−1,
h_{t−2}: stage at time step t−2,
Q_{t−1}: discharge at time step t−1,
Output: Q_t (discharge at time step t).
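For illustration, assembling these input–output pairs from hourly stage and discharge series might look as follows (a hypothetical helper of our own; the actual preprocessing used in the paper is not described):

```python
import numpy as np

def make_features(h, Q):
    """Build inputs [h_t, h_{t-1}, h_{t-2}, Q_{t-1}] and output Q_t
    from stage series h and discharge series Q (t runs from 2 to n-1)."""
    X = np.column_stack([h[2:], h[1:-1], h[:-2], Q[1:-1]])
    y = Q[2:]
    return X, y

h = np.arange(10.0)        # dummy stage series
Q = np.arange(10.0) * 2.0  # dummy discharge series
X, y = make_features(h, Q)
print(X.shape, y.shape)    # (8, 4) (8,)
```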

4.3. Training

Stage and discharge data for the period 1990–1998 was considered for this application. The first 2/3 of the data was selected for training and the rest was used for verification. The total numbers of training and verification data points were 1364 and 621, respectively.
For building the MT the Weka software was used [36]. The ANN model was built with NeuralMachine [23]. We used an MLP ANN with backpropagation training with an adaptive learning rate, one hidden layer and logistic transfer functions; the number of hidden nodes was 4 (found by optimization). The momentum and learning rate were set to 0.7 and 0.02, respectively. We used a PC with a Pentium III at 600 MHz.
Training of ANN took 10 min and of MT only 4 s. Execution time on verification
data set was negligible (less than 0.5 s for both models). It can be mentioned that the
use of more advanced training algorithms such as quickprop [10] may decrease the
training time. Exploring the possibility of rule extraction from the trained ANN model (e.g. by using the GRLVQ algorithm [14]) and comparing these rules with the linear models of the MT might be interesting and is planned for future work.
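The reported ANN configuration (one hidden layer of 4 logistic units, learning rate 0.02, momentum 0.7) can be sketched as a minimal NumPy MLP. This is our illustrative reconstruction, not the NeuralMachine implementation: the linear output unit, fixed learning rate and epoch count are assumptions.

```python
import numpy as np

def train_mlp(X, y, hidden=4, lr=0.02, momentum=0.7, epochs=5000, seed=0):
    """One-hidden-layer MLP (logistic units, linear output) trained by
    plain full-batch backpropagation with momentum."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(scale=0.5, size=(X.shape[1], hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(scale=0.5, size=(hidden, 1)); b2 = np.zeros(1)
    vW1 = np.zeros_like(W1); vb1 = np.zeros_like(b1)
    vW2 = np.zeros_like(W2); vb2 = np.zeros_like(b2)
    for _ in range(epochs):
        H = 1.0 / (1.0 + np.exp(-(X @ W1 + b1)))  # logistic hidden layer
        out = H @ W2 + b2                          # linear output
        err = out - y                              # dE/d(out) for 0.5*MSE
        gW2 = H.T @ err / len(X); gb2 = err.mean(0)
        dH = (err @ W2.T) * H * (1 - H)            # backprop through logistic
        gW1 = X.T @ dH / len(X); gb1 = dH.mean(0)
        # momentum updates
        vW1 = momentum * vW1 - lr * gW1; W1 += vW1
        vb1 = momentum * vb1 - lr * gb1; b1 += vb1
        vW2 = momentum * vW2 - lr * gW2; W2 += vW2
        vb2 = momentum * vb2 - lr * gb2; b2 += vb2
    return lambda Xn: 1.0 / (1.0 + np.exp(-(Xn @ W1 + b1))) @ W2 + b2

# Sanity check: fit a smooth 1-D curve
X = np.linspace(0, 1, 50).reshape(-1, 1)
y = X ** 2
f = train_mlp(X, y)
print(round(float(np.sqrt(np.mean((f(X) - y) ** 2))), 3))
```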
For building the rating curve two approaches were followed:
(1) a rating curve was developed using Eq. (1). The coefficients a and b were found by least squares as a = 35.317, b = 2.0378. The value of h_0 was chosen as zero, as that gave the best results on the training data.
(2) the available data were divided into two groups: one with the rising flood and the other with the falling flood cases. Separate rating curves were built for each of them using Eq. (1). However, the rating curves developed following the second approach did not deviate considerably from the one built following the first approach, and therefore the rating curve built without data separation was chosen as the final one (Fig. 4).

5. Results and discussions

The first MT generated was very complex, with 94 linear models at the leaf nodes. It was very accurate in training but overfitted, and had to be pruned in order to ensure good generalization capacity. Pruning was continued as long as the predictive accuracy did not

[Figure: rating curve, stage (metre) vs. discharge (scaled).]

Fig. 4. Rating curve representing the relationship between stage and discharge at Swarupgunj.

Table 1
Comparison of errors in model trees of different complexities

Pruning factor   Linear models   Training RMSE   Training NRMSE   Verification RMSE   Verification NRMSE
0                94              79.3            0.132            76.0                0.111
1                4               89.8            0.150            69.1                0.101
2                2               92.0            0.153            69.7                0.101

drop substantially. Table 1 shows the performance of the three model versions. The
model with 4 leaves (linear models) is given below:

if Q_{t−1} ≤ 37.5 then
    if Q_{t−1} ≤ 28.25 then Q_t = 243 − 187 h_{t−1} + 299 h_t + 0.667 Q_{t−1}
    if Q_{t−1} > 28.25 then Q_t = 214 − 387 h_{t−1} + 448 h_t + 0.885 Q_{t−1}
if Q_{t−1} > 37.5 then
    if h_t ≤ 7.85 then Q_t = 455 − 491 h_{t−1} + 628 h_t + 0.727 Q_{t−1}
    if h_t > 7.85 then Q_t = 1720 − 605 h_{t−1} + 924 h_t + 0.66 Q_{t−1}



Table 2
Comparison of errors and training time of different models

Model                       Training RMSE   Training NRMSE   Verification RMSE   Verification NRMSE   Training time (s)
Model tree                  92.0            0.153            69.7                0.101                4
ANN                         90.5            0.151            70.5                0.103                600
Conventional rating curve   143.3           0.239            111.2               0.162                —

Table 3
Comparison of prediction errors of the different models

                            Percentage of verification data with prediction error
Model                       >5%    >10%   >15%   >20%
Model tree                  20.3   1.6    0.2    0.2
ANN                         21.4   3.1    0.6    0.3
Conventional rating curve   42.4   11.8   5.3    1.9

From Table 1 it can be seen that without losing too much accuracy a model with only 2 linear models can be adopted; its equations are as follows:

if Q_{t−1} ≤ 37.5 then Q_t = 204 − 301 h_{t−1} + 383 h_t + 0.788 Q_{t−1}
if Q_{t−1} > 37.5 then Q_t = 728 − 550 h_{t−1} + 721 h_t + 0.745 Q_{t−1}
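These two leaves translate directly into code, with the coefficients as printed above (the quantities are in the units of the scaled data):

```python
def model_tree_2(h_t, h_t1, q_t1):
    """The pruned two-leaf M5 model tree given above: one linear model
    per region of the input space, selected by a hard split on Q_{t-1}."""
    if q_t1 <= 37.5:
        return 204 - 301 * h_t1 + 383 * h_t + 0.788 * q_t1
    return 728 - 550 * h_t1 + 721 * h_t + 0.745 * q_t1

# Example call with illustrative values of stage and lagged discharge
print(model_tree_2(5.0, 5.0, 30.0))  # about 637.6
```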


It is interesting to note that in this pruned model the term h_{t−2} has disappeared, though it was present in the more complex models. The training and testing errors of the MT and ANN models are very close to each other (Tables 2 and 3) and therefore the results of the ANN model are not presented separately. Both machine learning models have outperformed the conventional rating curve. A substantial improvement in the predictive accuracy of the MT (Fig. 5) during low flow and high flood conditions can be observed in comparison to the discharges estimated using the rating curve (Fig. 6). The final discharge hydrograph predicted by the MT is shown in Fig. 7.
From the practical point of view of a hydrologist, the accurate predictions are
especially important for the low and high flow periods. In order to estimate bias of
the employed models for different output ranges the testing data set was divided into
two sets: one containing the data points for which models generally underestimated
discharge, and the other one –– with the points for which discharge was
overestimated. The average underestimation and overestimation errors were
computed and plotted for the following eight discharge ranges: 10–20, 21–30,
31–40, 41–50, 51–60, 61–70, 71–80, 81–90 (Fig. 8). It is seen that inside each range
the underestimation and overestimation errors of ANN and MT are more or less the

[Figure: computed vs. measured discharge (scaled) for the model tree, with ±10% bands.]

Fig. 5. A comparison of discharge predicted by the model tree with the measured discharge.

[Figure: computed vs. measured discharge (scaled) for the rating curve, with ±10% bands.]

Fig. 6. A comparison of discharge predicted by the rating curve with the measured discharge.

same, which indicates the absence of model bias in all ranges of discharge. The underestimation errors of all the models, including the rating curve, are close to each other for all ranges of discharge. However, the overestimation errors of the rating

[Figure: discharge hydrograph (scaled) over the validation events: known discharge and MT-computed discharge.]

Fig. 7. Discharge predicted by the MT on the testing set along with the known discharge (the ANN-generated plot is very similar).

[Figure: % error in prediction vs. discharge (scaled) for the rating curve, ANN and MT, with underestimation and overestimation plotted separately.]

Fig. 8. Variation of prediction errors with discharge for different models.

curve, particularly during low flow and high flow situations, are higher than the ones
of ANN and MT. It is also observed that MT gave very accurate prediction during
high flow situations.

6. Conclusions

Since rating curve development is associated with the collection of a considerable amount of data, the use of machine learning methods appeared to be justified. The

predictive accuracy of the simplest MT model was observed to be very high and on a par with that of an ANN model built with the same data. This is explained by the fact that this method, being in fact a dynamic committee machine, induces models that are specialized in particular areas of the input space. The optimized non-greedy version of the M5 algorithm proposed in [28] allows for even higher accuracy. The domain experts especially praised the MT for being a transparent, simple and verifiable model that, once induced, can easily be implemented and used even in spreadsheet software. (The versions of the M5 algorithm in [29] make it possible to include the domain expert directly in the process of tree generation.) Both the ANN and MT models were found to be considerably better than the conventional rating curve method.

Acknowledgment

Part of this work was performed in the framework of the project ‘‘Data mining,
knowledge discovery and data-driven modelling’’ of the Delft Cluster research
programme supported by the Dutch government.

References

[1] H. Abarbanel, Analysis of Observed Chaotic Data, Springer, New York, 1996.
[2] M.B. Abbott, A.W. Minns, Computational Hydraulics, Ashgate, Brookfield, 1998.
[3] B. Bhattacharya, D.P. Solomatine, Application of artificial neural network in stage discharge
relationship, Proceedings of the fourth International Conference on Hydroinformatics, Iowa, USA (on
CD-ROM, 2000).
[4] B. Bhattacharya, D.P. Solomatine, Application of artificial neural networks and M5 model trees to
modelling stage-discharge relationship, in: B.S. Wu, Z.Y. Wang, G.Q. Wang, G.H. Huang, H.W.
Fang, J.C. Huang (Eds.), Proceedings of the Second International Symposium on Flood Defence,
Beijing, China, Science Press, New York Ltd., New York, 2002, pp. 1029–1036.
[5] B. Bhattacharya, D.P. Solomatine, Neural networks and M5 model trees in modelling water level-
discharge relationship for an Indian river, in: M. Verleysen (Ed.), Proceedings of the 11th European
Symposium on Artificial Neural Network, Bruges, Belgium, d-side, Evere Belgium, 2003,
pp. 407–412.
[6] L. Breiman, J.H. Friedman, R.A. Olshen, C.J. Stone, Classification and regression trees, Wadsworth,
Belmont, CA, 1984.
[7] S.E. Darby, C.R. Thorne, Predicting stage-discharge curves in channels with bank vegetation, ASCE, J. Hydraulic Eng. 122 (10) (1996) 583–586.
[8] C.W. Dawson, R. Wilby, An artificial neural network approach to rainfall-runoff modelling,
Hydrological Sci. J. 43 (1) (1998) 47–66.
[9] P. Deka, V. Chandramouli, A fuzzy neural network model for deriving the river stage-discharge
relationship, Hydrological Sci. J. 48 (2) (2003) 197–210.
[10] S.E. Fahlman, Faster-learning variations on back-propagation: an empirical study, in: D. Touretzky,
G. Hinton, T. Sejnowski (Eds.), Proceedings of the 1988 Connectionist Models Summer School,
Morgan Kaufmann, Los Altos, CA, 1989, pp. 38–51.
[11] J.H. Friedman, Multivariate adaptive regression splines, Ann. Statist. 19 (1991) 1–141.
[12] D. Gessler, J. Gessler, C.C. Watson, Prediction of discontinuity in stage-discharge rating curves,
ASCE, J. Hydraulic Eng. 124 (3) (1998) 243–252.

[13] R.S. Govindaraju, Chairman, ASCE Task Committee on Application of Artificial Neural
Networks in Hydrology, Artificial neural network in hydrology, ASCE J. Hydrologic Eng. 5 (2)
(2000) 115–137.
[14] B. Hammer, T. Villmann, Batch-RLVQ, in: M. Verleysen (Ed.), Proceedings of the 10th European
Symposium on Artificial Neural Network, Bruges, Belgium, d-side, Evere, Belgium, 2002,
pp. 295–300.
[15] S. Haykin, Neural Networks: A Comprehensive Foundation, Prentice-Hall, Engelwoods Cliffs, NJ,
1999.
[16] F.M. Henderson, Open Channel Flow, McMillan Inc, Greenwich, 1960.
[17] R.A. Jacobs, M.I. Jordan, S.J. Nowlan, G.E. Hinton, Adaptive mixtures of local experts, Neural
Comput. 3 (1991) 79–87.
[18] S.K. Jain, D. Chalisgaonkar, Setting up stage-discharge relations using ANN, ASCE, J. Hydrologic
Eng. 5 (4) (2000) 428–433.
[19] M.I. Jordan, R.A. Jacobs, Hierarchical mixtures of experts and the EM algorithm, Neural Comput 6
(1994) 181–214.
[20] B. Kompare, F. Steinman, U. Cerar, S. Dzeroski, Prediction of rainfall runoff from catchment by
intelligent data analysis with machine learning tools within the artificial intelligence tools, Acta
Hydrotech. (in Slovene language) 16 (16) (1997).
[21] A.W. Minns, M.J. Hall, Artificial neural networks as rainfall-runoff models, Hydrological Sci. J. 41
(3) (1996) 399–417.
[22] R.S. Muttiah, R. Srinivasan, P.M. Allen, Prediction of two-year peak stream discharges using neural
networks, J. Am Water Resources Assoc. 33 (3) (1997) 625–630.
[23] NeuralMachine, www.data-machine.com (15/3/2003).
[24] J.R. Quinlan, Learning with continuous classes, Proceedings of the Australian Joint Conference on
Artificial Intelligence, World Scientific, Singapore, 1992 pp. 343–348.
[25] C.E. Shannon, A mathematical theory of communication, Bell System Tech. J. 27 (1948) 379–423 and 623–656.
[26] K. Shiono, J.S. Al-Romaih, D.W. Knight, Stage-discharge assessment in compound meandering
channels, ASCE, J. Hydraulic Eng. 125 (1) (1999) 66–77.
[27] D.P. Solomatine, K.N. Dulal, Model tree as an alternative to neural network in rainfall-runoff
modelling, Hydrological Sci. J. 48 (3) (2003) 399–412.
[28] D.P. Solomatine, M.B. Siek, Optimization of Hierarchical Modular Models and M5 Trees,
Proceedings of International Joint Conference on Neural Networks, Budapest, Hungary, July 2004,
Omni Press, 2004.
[29] D.P. Solomatine, M.B. Siek, Flexible and optimal M5 model trees with applications to flow
predictions, Proceedings of the Sixth International Conference on Hydroinformatics, June 2004,
World Scientific, Singapore, 2004.
[30] K.P. Sudheer, S.K. Jain, Radial basis function neural network for modeling rating curves, ASCE,
J. Hydrologic Eng. 8 (3) (2003) 161–164.
[31] M. Tawfik, A. Ibrahim, H. Fahmy, Hysteresis sensitive neural network for modeling rating curves,
ASCE, J. Computing Civil Eng. 11 (3) (1997) 206–211.
[32] K. Thirumalaiah, M.C. Deo, River stage forecasting using artificial neural networks, ASCE,
J. Hydrologic Eng. 3 (1) (1998) 26–32.
[33] S. Verdú, Fifty years of Shannon theory, IEEE Trans. Inf. Theory 44 (6) (1998)
2057–2078.
[34] Y. Wang, I.H. Witten, Induction of model trees for predicting continuous classes, Proceedings of the
poster papers of the European Conference on Machine Learning, University of Economics, Faculty
of Informatics and Statistics, Prague, 1997.
[35] J.A. Westphal, D.B. Thompson, G.T. Stevens, C.N. Strauser, Stage-discharge relations on
the middle Mississippi river, ASCE, J. Water Resources Planning Manage. 125 (1) (1999)
48–53.
[36] I.H. Witten, E. Frank, Data Mining: Practical Machine Learning Tools and Techniques with Java
Implementations, Morgan Kaufmann, Los Altos, CA, 2000.

Biswa Bhattacharya is a research fellow in the Hydroinformatics and Knowledge Management Department of the UNESCO-IHE Institute for Water Education.
He received his Masters degree in hydroinformatics from the UNESCO-IHE
Institute for Water Education in 2000. His research interests include
modelling water and soil related problems using computational intelligence
techniques.

Dimitri P. Solomatine is an Associate Professor in the Hydroinformatics and Knowledge Management Department of the UNESCO-IHE Institute for Water Education.
Education. He received his Masters degree in Systems Engineering from the
Moscow Aviation Institute in 1979 and Ph.D. degree in Systems and Management
Sciences from the Russian Academy of Sciences in 1984. His research interests
include global and evolutionary optimization, machine learning, data-driven
modeling and their applications in engineering problems.
