Solomatine 2005
Received 22 October 2003; received in revised form 23 March 2004; accepted 23 April 2004
Available online 6 August 2004
Abstract
Keywords: Artificial neural networks; M5 model tree; Water level; Discharge; Rating curve
Abbreviations: RMSE, Root mean squared error; NRMSE, Normalized root mean squared error.
Corresponding author. Tel: +31-15-215-1815; fax: +31-15-212-2921.
E-mail addresses: bha@ihe.nl (B. Bhattacharya), sol@ihe.nl (D.P. Solomatine).
0925-2312/$ - see front matter © 2004 Elsevier B.V. All rights reserved.
doi:10.1016/j.neucom.2004.04.016
ARTICLE IN PRESS
382 B. Bhattacharya, D.P. Solomatine / Neurocomputing 63 (2005) 381–396
1. Introduction
recording of stage against time is called a stage hydrograph. With the help of a rating
curve a discharge hydrograph (variation of discharge with time) can be developed
using a stage hydrograph.
A rating curve is a useful tool for a hydrologist to predict discharges from gauge
observations [18,35]. It alleviates the need for costly and time-consuming discharge
measurements. The quality of discharge prediction is crucial in flood management,
water yield computation and hydrologic design [7,12,26].
The most commonly used form of stage–discharge relationship is expressed as
follows:
Q = a (h − h0)^b,  (1)
where h0 stands for the minimum stage below which a discharge is not feasible, h is
stage and Q is discharge. A first estimate of h0 is usually chosen by a hydrologist after
examining the characteristics of the historical stage data and then by trial and error
the final value of h0 is chosen which gives the best fit. Values for the regression
coefficients a and b are also chosen to maximize the fit to the training data.
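The fitting procedure described above can be sketched in code: for each candidate h0 on a grid, Eq. (1) becomes linear in log space, so a and b follow from ordinary least squares, and the h0 giving the best fit is kept. This is a minimal illustration with synthetic data and hypothetical helper names, not the procedure used in the paper.

```python
import numpy as np

def fit_rating_curve(h, Q, h0_grid):
    """Fit Q = a * (h - h0)**b: for each candidate h0, regress
    log Q on log(h - h0) and keep the best-fitting combination."""
    best = None
    for h0 in h0_grid:
        x = h - h0
        if np.any(x <= 0):            # discharge undefined at or below h0
            continue
        # log Q = log a + b * log(h - h0): a simple linear regression
        b, log_a = np.polyfit(np.log(x), np.log(Q), 1)
        a = np.exp(log_a)
        rmse = np.sqrt(np.mean((a * x ** b - Q) ** 2))
        if best is None or rmse < best[0]:
            best = (rmse, a, b, h0)
    return best  # (rmse, a, b, h0)

# synthetic check: data generated from a known curve is recovered
h = np.linspace(1.0, 9.0, 50)
Q = 35.0 * (h - 0.5) ** 2.0
rmse, a, b, h0 = fit_rating_curve(h, Q, h0_grid=np.arange(0.0, 1.0, 0.1))
```

The grid search over h0 mirrors the trial-and-error step a hydrologist performs by hand; any optimizer over h0 could replace it.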
During unsteady flows (∂u/∂t ≠ 0, where u is flow velocity and t is time) the
relationship between stage and discharge is not unique. During a rising flood the
flood wave meets less resistance to propagation than during a falling flood, so for
the same stage the discharge is higher on the rising limb than on the falling limb.
This effect is known as hysteresis; it results in a loop-rating curve and shows that
the relationship between stage and discharge is not a one-to-one mapping, but one in
which discharge also depends on past stage and discharge values.
To take the hysteresis effect into account, the historical data is sometimes divided
into two sets: one set of stage and discharge data with rising stages and another set of
stage and discharge data with falling stages. Then separate regression models of the
form (1) are developed for each set. This approach is not without limitations as data
separation is often subjective and the subsequent use of rating curves needs expertise
and is prone to errors. Jones’ formula provides an alternative where a correction
procedure is adopted to take into account the hysteresis effect due to the dynamic
nature of flood waves [16,31].
When the available stage–discharge data at a location are so limited that a rating
curve cannot be developed, a mathematical model of the flow in that region of the
river can be built. Flow of water in natural rivers is described by Saint Venant
equations (continuity and momentum) [2]. Under assumptions such as negligible
acceleration and a water-surface slope nearly parallel to the bed slope, the flow of
water can be approximated by the following kinematic wave equation [2]:
∂h/∂t + (1/bs) (dQ/dh) ∂h/∂x = 0,  (2)
where h is the water depth, t the time, Q the discharge, x the coordinate axis in the
horizontal plane along the river, bs the storage width defined by ∂A/∂t = bs (∂h/∂t)
and A the cross-sectional area of the river. The kinematic wave equation is another
form of the advection equation describing the variations of a flood wave h(x, t) that
propagates downstream with celerity (1/bs) dQ/dh.
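Eq. (2) can be illustrated with a minimal explicit upwind finite-difference scheme. The channel parameters and the single-valued rating Q(h) = a·h^b used to obtain dQ/dh below are assumptions made for this sketch, not values from the paper.

```python
import numpy as np

# Explicit upwind scheme for the kinematic wave equation (2):
#   dh/dt + (1/bs) * (dQ/dh) * dh/dx = 0
# A single-valued rating Q(h) = a*h**b is assumed, so dQ/dh = a*b*h**(b-1).
a, b, bs = 2.0, 1.5, 10.0            # rating coefficients, storage width (m)
dx, dt, nx = 100.0, 5.0, 200         # grid spacing (m), time step (s), cells

x = np.arange(nx) * dx
h = 1.0 + 0.5 * np.exp(-((x - 2000.0) / 500.0) ** 2)  # initial wave h(x, 0)

for _ in range(300):
    c = a * b * h ** (b - 1.0) / bs              # local celerity (1/bs) dQ/dh
    assert np.all(c * dt / dx < 1.0)             # CFL stability condition
    h[1:] -= c[1:] * dt / dx * (h[1:] - h[:-1])  # upwind difference (flow in +x)
```

Because the celerity grows with depth, the wave front steepens as it advects downstream, which is the behaviour the kinematic approximation is meant to capture.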
An ANN is the most widely used method in ML and is, in fact, a broad term
covering a large variety of network architectures, the most common of which is a
multi-layer perceptron (MLP). Such a network is trained by the so-called
error-backpropagation method, a specialized form of gradient-based optimization.
Each target vector z is an unknown function f of the input vector x:

z = f(x).  (4)
The task of the network is to learn the function f. The network includes a set of
parameters (a weight vector) whose values are varied so that the function f′
computed by the network is as close as possible to f. The weights are determined by
training (calibrating) the ANN on the training data set. More details about ANNs
can be found in [15].
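The training process described above can be sketched for a one-hidden-layer MLP with logistic transfer functions. This is a generic illustration of error backpropagation on a synthetic target; the network size, learning rate and data are arbitrary assumptions, not the configuration used in the paper.

```python
import numpy as np

# Minimal one-hidden-layer MLP trained by error backpropagation
# (plain full-batch gradient descent on the mean squared error).
rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

X = rng.uniform(-1, 1, size=(200, 2))       # input vectors x
z = np.sin(X[:, 0]) + 0.5 * X[:, 1]         # "unknown" function f(x)
z = (z - z.min()) / (z.max() - z.min())     # scale targets into (0, 1)

W1 = rng.normal(0, 0.5, size=(2, 4)); b1 = np.zeros(4)  # weight vectors
W2 = rng.normal(0, 0.5, size=(4, 1)); b2 = np.zeros(1)
lr = 1.0
for _ in range(5000):
    H = sigmoid(X @ W1 + b1)                # hidden-layer activations
    out = sigmoid(H @ W2 + b2)[:, 0]        # network output f'(x)
    err = out - z
    # backpropagate the error through the logistic transfer functions
    d_out = (err * out * (1 - out))[:, None]
    d_hid = (d_out @ W2.T) * H * (1 - H)
    W2 -= lr * (H.T @ d_out) / len(X); b2 -= lr * d_out.mean(0)
    W1 -= lr * (X.T @ d_hid) / len(X); b1 -= lr * d_hid.mean(0)

rmse = np.sqrt(np.mean((out - z) ** 2))     # training error of f' against f
```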
In solving water-related problems, ANNs have been successfully used in
rainfall–runoff modelling [8,21], in prediction of discharge [22], and in modelling
stage behaviour (without considering discharge) [32]. An MLP ANN was found to be very
efficient in modelling stage–discharge relationships [3–5,18,31]. The effectiveness of
an ANN with a radial basis function was explored in [30] while a fuzzy-neural
network was used in [9].
If a leaf is associated with the average output value of the instances sorted down
to it (a zero-order model), then the overall approach is the regression tree
introduced by Breiman et al. [6], which has numerical constants (zero-order
models) in its leaves.
If it is desirable to have in the leaves regression functions of the input variables,
then two approaches are typically used: the MARS (multivariate adaptive
regression splines) algorithm by Friedman [11], and the M5 model tree
algorithm by Quinlan [24].
The M5 algorithm uses the following idea: split the parameter space into areas
(subspaces) and build in each of them a local specialized linear regression model.
The splitting in MT follows the idea used in building a decision tree, but instead of
the class labels it has linear regression functions at the leaves, which can predict
continuous numeric attributes. Model trees generalize the concepts of regression
Fig. 1. Hierarchical mixture of experts (models); ai are the split nodes, Mj are the models.
Fig. 2. Splitting the input space X1 × X2 by the M5 model tree algorithm; each model is a linear regression model y = a0 + a1 x1 + a2 x2.
trees [6], which have constant values at their leaves. They are thus analogous to
piecewise linear functions (and hence non-linear). Model trees learn efficiently and
can tackle tasks with very high dimensionality, up to hundreds of attributes. The
major advantage of model trees over regression trees is that model trees are much
smaller than regression trees, the decision structure is clear, and the regression
functions do not normally involve many variables.
The M5 algorithm is used for inducing a model tree [24], which works as follows
(Fig. 2). Suppose that a collection T of training examples is available. Each example
is characterized by the values of a fixed set of (input) attributes and has an associated
target (output) value. The aim is to construct a model that relates a target value of
the training cases to the values of their input attributes. The quality of the model will
generally be measured by the accuracy with which it predicts the target values of the
unseen cases.
Tree-based models are constructed by a divide-and-conquer method. The set T is
either associated with a leaf, or some test is chosen that splits T into subsets
corresponding to the test outcomes and the same process is applied recursively to the
subsets. The splitting criterion for the M5 model tree algorithm is based on treating
the standard deviation of the class values that reach a node as a measure of the error
at that node, and calculating the expected reduction in this error as a result of testing
each attribute at that node. The formula to compute the standard deviation
reduction (SDR) is:
SDR = sd(T) − Σi ( |Ti| / |T| ) sd(Ti),  (5)
where T represents the set of examples that reaches the node; Ti represents the subset
of examples that have the ith outcome of the potential test; and sd represents the
standard deviation.
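Eq. (5) is straightforward to compute. The sketch below evaluates the SDR of two candidate splits on a small synthetic set, showing that a split separating the homogeneous groups scores much higher than an uninformative one.

```python
import numpy as np

def sdr(parent, subsets):
    """Standard deviation reduction (Eq. (5)) of a candidate split:
    sd(T) minus the size-weighted sd of each subset Ti."""
    n = sum(len(s) for s in subsets)
    return np.std(parent) - sum(len(s) / n * np.std(s) for s in subsets)

T = np.array([1.0, 1.1, 0.9, 5.0, 5.2, 4.8])
# separating the two homogeneous groups removes almost all of the spread
good = sdr(T, [T[:3], T[3:]])
# an uninformative split leaves the spread largely intact
bad = sdr(T, [T[::2], T[1::2]])
```

M5 evaluates this quantity for every attribute and split value at a node and chooses the split with the largest SDR.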
After examining all possible splits (that is, the attributes and the possible split
values), M5 chooses the one that maximizes the expected error reduction. Splitting in
M5 ceases when the class values of all the instances that reach a node vary just
slightly, or only a few instances remain. The relentless division often produces over-
elaborate structures that must be pruned back, for instance by replacing a subtree
with a leaf. In the final stage, a smoothing process is performed to compensate for
the sharp discontinuities that will inevitably occur between adjacent linear models at
the leaves of the pruned tree, particularly for some models constructed from a
smaller number of training examples. In smoothing, the adjacent linear equations are
updated so that the predicted outputs for neighbouring input vectors corresponding
to different equations become close in value. Details of this
process can be found in [24,34,36].
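The smoothing rule described in [24,34] blends the prediction passed up from a subtree with the prediction of the model at the current node. A sketch follows; the default constant k = 15 is the value commonly quoted for M5-style implementations and is an assumption here.

```python
def smooth(p, q, n, k=15.0):
    """M5-style smoothing: blend the prediction p passed up from the
    subtree with the prediction q of the linear model at the current
    node, weighted by the n training examples reaching the subtree
    and a smoothing constant k."""
    return (n * p + k * q) / (n + k)

# a leaf prediction of 10.0 backed by only 5 examples is pulled
# towards the parent model's prediction of 12.0
smooth(10.0, 12.0, n=5)   # → 11.5
```

As n grows, the blended value approaches the leaf's own prediction, so smoothing mainly affects leaves built from few examples, exactly the case the text highlights.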
MT is not yet as popular as ANN; in the water sector, for example, its use
started only recently [20,27,29].
4. Experimental set up
The construction of an adequate input space is often even more important than the
choice of a learning algorithm. To select the right input and output variables for
the model, average mutual information (AMI) was used to investigate the
dependency between variables and the related lag effect. AMI is based on Shannon's
entropy theory and is a measure of the information available about one set of data
given knowledge of another set [25,33]. The AMI between two measurements ai
and bj drawn from sets A and B is defined by
IAB = Σai,bj PAB(ai, bj) log2 [ PAB(ai, bj) / (PA(ai) PB(bj)) ],  (6)
where PAB(ai, bj) is the joint probability density for measurements from A and B
resulting in values ai and bj, and PA(ai) and PB(bj) are the individual probability
densities for the measurements of A and B. If a measurement from A resulting in ai
is completely independent of a measurement from B resulting in bj, then the average
mutual information IAB is zero.
As a measure of information, the advantage of the AMI measure compared to
other approaches such as cross correlation is that it is independent of any pre-defined
function. For discrete measurements the actual AMI-values depend on the number
of class intervals used to calculate the probability densities [1].
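A histogram-based estimate of Eq. (6) makes the dependence on the number of class intervals explicit. The sketch below uses synthetic data and an assumed bin count, and returns AMI in bits.

```python
import numpy as np

def ami(a, b, bins=16):
    """Histogram-based estimate of the average mutual information
    (Eq. (6)) in bits between two series; the value depends on the
    number of class intervals (bins), as noted in the text."""
    p_ab, _, _ = np.histogram2d(a, b, bins=bins)
    p_ab = p_ab / p_ab.sum()                     # joint probabilities
    p_a = p_ab.sum(axis=1, keepdims=True)        # marginal of a
    p_b = p_ab.sum(axis=0, keepdims=True)        # marginal of b
    mask = p_ab > 0                              # avoid log(0)
    return float(np.sum(p_ab[mask] * np.log2(p_ab[mask] / (p_a @ p_b)[mask])))

rng = np.random.default_rng(1)
x = rng.normal(size=5000)
dependent = ami(x, x + 0.1 * rng.normal(size=5000))   # strong dependence
independent = ami(x, rng.normal(size=5000))           # near zero
```

Applied to lagged copies of a discharge or stage series, the same function reproduces the lag analysis of Fig. 3.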
The AMI measures of discharge at the selected site with itself and with stage at the
same location were computed at varying lag times (Fig. 3). AMI helps to find out
how much information about the future discharge is available from the past
discharge and stage data. From Fig. 3 we see that stage with a zero lag corresponds
to the highest AMI (2.5). However, stage data with a lag of 1 and 2 h also contain
substantial information about the present discharge. Thereafter, the information
content remains almost the same for increasing lag time.

Fig. 3. AMI of the current discharge with stage ("stage & discharge") and with past discharge ("discharge & discharge") at time lags of 0–15 h.

The variation of the AMI of discharge with itself shows that discharge with a lag of 1 h has a high information
content about the current discharge. Thereafter, the AMI values remain almost
unchanged. Based on these AMI values the following input parameters were chosen
for the model:
ht: stage at time step t,
ht−1: stage at time step t−1,
ht−2: stage at time step t−2,
Qt−1: discharge at time step t−1,
Output: Qt (discharge at time step t).
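Assembling these input-output pairs from the stage and discharge time series is a simple windowing operation; a sketch with made-up hourly values follows.

```python
import numpy as np

def make_dataset(h, Q):
    """Build the model's input-output pairs from hourly stage h and
    discharge Q: inputs [h_t, h_{t-1}, h_{t-2}, Q_{t-1}], output Q_t."""
    X = np.column_stack([h[2:], h[1:-1], h[:-2], Q[1:-1]])
    y = Q[2:]
    return X, y

h = np.array([4.0, 4.2, 4.5, 4.9, 5.1])       # hypothetical stages (m)
Q = np.array([30.0, 33.0, 37.0, 43.0, 46.0])  # hypothetical discharges
X, y = make_dataset(h, Q)
# the first row corresponds to t = 2: [h_2, h_1, h_0, Q_1] -> Q_2
```

The first two time steps are dropped because their lagged inputs are unavailable, which is why the usable sample is slightly shorter than the raw series.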
4.3. Training
Stage and discharge data for the period 1990–1998 were considered for this
application. The first 2/3 of the data was selected for training and the rest was
used for verification. The total numbers of training and verification data points
were 1364 and 621, respectively.
For building the MT the Weka software was used [36]. The ANN model was built
with NeuralMachine [23]. We used an MLP ANN with backpropagation training with
an adaptive learning rate, one hidden layer and logistic transfer functions; the
number of hidden nodes was 4 (found by optimization). The momentum and learning
rate were set to 0.7 and 0.02, respectively. We used a PC with a Pentium III at 600 MHz.
Training of the ANN took 10 min and of the MT only 4 s. Execution time on the
verification data set was negligible (less than 0.5 s for both models). The use of
more advanced training algorithms such as quickprop [10] may decrease the
training time. Exploring the possibility of rule extraction from the trained ANN
model (e.g. by using the GRLVQ algorithm [14]) and comparing these rules with the
linear models of the MT might be interesting and is planned for the future.
For building the rating curve two approaches were followed:
(1) a rating curve was developed using (1). The coefficients a and b were found by
least squares as a=35.317, b=2.0378. The value of h0 was chosen as zero as that
gave the best results on the training data.
(2) the available data were divided into two groups, one with the rising-flood and
the other with the falling-flood cases, and separate rating curves were built for
each using (1). However, the rating curves developed with this second approach
did not deviate considerably from the one built with the first approach, so the
rating curve built without data separation was chosen as the final one (Fig. 4).
The first MT generated was very complex, with 94 linear models at the leaf nodes.
It was very accurate in training but overfitted, and had to be pruned in order to
ensure good generalization capacity. Pruning was done until the predictive accuracy did not
Fig. 4. Rating curve representing the relationship between stage (metres) and scaled discharge at Swarupgunj.
drop substantially.

Table 1
Comparison of errors in model trees of different complexities

Table 1 shows the performance of the three model versions. The model with 4 leaves (linear models) is given below:
if Qt−1 ≤ 37.5 then
Table 2
Comparison of errors and training time of different models
Table 3
Comparison of prediction errors of the different models
From Table 1 it can be seen that, without losing too much accuracy, a model with
only 2 linear models can be adopted; its equations are as follows:
if Qt−1 ≤ 37.5 then Qt = 204 − 301 ht−1 + 383 ht + 0.788 Qt−1
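A pruned two-leaf MT therefore reduces to an if/else over two linear formulas, which is what makes it easy to verify and to implement even in a spreadsheet. The sketch below uses placeholder coefficients (the second leaf and the exact printed values are not recoverable here), not the paper's fitted model.

```python
def mt_predict(h_t, h_prev, Q_prev,
               split=37.5,
               low=(204.0, -301.0, 383.0, 0.788),    # placeholder values
               high=(150.0, -250.0, 320.0, 0.850)):  # hypothetical second leaf
    """A two-leaf model tree is an if/else over two linear models,
    selected here by the split attribute Q_{t-1}; the coefficients
    are illustrative placeholders, not the paper's fitted values."""
    c0, c1, c2, c3 = low if Q_prev <= split else high
    return c0 + c1 * h_prev + c2 * h_t + c3 * Q_prev
```

Each leaf is an ordinary linear regression over the same four inputs, so the whole model fits in two spreadsheet rows plus one IF condition.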
Figs. 5 and 6. Computed versus measured discharge (scaled), with lines marking 10% deviation above and below the measured discharge.
same; this indicates the absence of model bias in all ranges of discharge. The
underestimation errors of all the models, including the rating curve, are close to
each other for all ranges of discharge. However, the overestimation errors of the rating
Fig. 7. Discharge predicted by the MT on the testing set along with the known discharge, over the validation events (the ANN-generated plot is very similar).
Fig. 8. Percentage error in prediction (underestimation and overestimation) versus scaled discharge for the rating curve, ANN and MT models.
curve, particularly during low flow and high flow situations, are higher than the ones
of the ANN and MT. It is also observed that the MT gave very accurate predictions
during high flow situations.
6. Conclusions
predictive accuracy of the simplest MT model was observed to be very high and on
par with that of an ANN model built with the same data. This is explained by the
fact that this method, being in fact a dynamic committee machine, induces models
that are specialized in particular areas of the input space. The optimized non-greedy
version of the M5 algorithm proposed in [28] allows for even higher accuracy. The
domain experts especially praised the MT for being a transparent, simple, verifiable
and demonstrable model that, once induced, can easily be implemented and used even
in spreadsheet software. (The versions of the M5 algorithm in [29] make it possible
to include the domain expert directly in the process of tree generation.) Both ANN
and MT models were found to be considerably better than the conventional rating
curve method.
Acknowledgment
Part of this work was performed in the framework of the project ‘‘Data mining,
knowledge discovery and data-driven modelling’’ of the Delft Cluster research
programme supported by the Dutch government.
References
[1] H. Abarbanel, Analysis of Observed Chaotic Data, Springer, New York, 1996.
[2] M.B. Abbott, A.W. Minns, Computational Hydraulics, Ashgate, Brookfield, 1998.
[3] B. Bhattacharya, D.P. Solomatine, Application of artificial neural network in stage discharge
relationship, Proceedings of the fourth International Conference on Hydroinformatics, Iowa, USA (on
CD-ROM, 2000).
[4] B. Bhattacharya, D.P. Solomatine, Application of artificial neural networks and M5 model trees to
modelling stage-discharge relationship, in: B.S. Wu, Z.Y. Wang, G.Q. Wang, G.H. Huang, H.W.
Fang, J.C. Huang (Eds.), Proceedings of the Second International Symposium on Flood Defence,
Beijing, China, Science Press, New York Ltd., New York, 2002, pp. 1029–1036.
[5] B. Bhattacharya, D.P. Solomatine, Neural networks and M5 model trees in modelling water level-
discharge relationship for an Indian river, in: M. Verleysen (Ed.), Proceedings of the 11th European
Symposium on Artificial Neural Network, Bruges, Belgium, d-side, Evere Belgium, 2003,
pp. 407–412.
[6] L. Breiman, J.H. Friedman, R.A. Olshen, C.J. Stone, Classification and Regression Trees, Wadsworth,
Belmont, CA, 1984.
[7] S.E. Darby, C.R. Thorne, Predicting stage-discharge curves in channels with bank vegetation, ASCE,
J. Hydraulic Eng. 122 (10) (1996) 583–586.
[8] C.W. Dawson, R. Wilby, An artificial neural network approach to rainfall-runoff modelling,
Hydrological Sci. J. 43 (1) (1998) 47–66.
[9] P. Deka, V. Chandramouli, A fuzzy neural network model for deriving the river stage-discharge
relationship, Hydrological Sci. J. 48 (2) (2003) 197–210.
[10] S.E. Fahlman, Faster-learning variations on back-propagation: an empirical study, in: D. Touretzky,
G. Hinton, T. Sejnowski (Eds.), Proceedings of the 1988 Connectionist Models Summer School,
Morgan Kaufmann, Los Altos, CA, 1989, pp. 38–51.
[11] J.H. Friedman, Multivariate adaptive regression splines, Ann. Statist. 19 (1991) 1–141.
[12] D. Gessler, J. Gessler, C.C. Watson, Prediction of discontinuity in stage-discharge rating curves,
ASCE, J. Hydraulic Eng. 124 (3) (1998) 243–252.
[13] R.S. Govindaraju, Chairman, ASCE Task Committee on Application of Artificial Neural
Networks in Hydrology, Artificial neural network in hydrology, ASCE J. Hydrologic Eng. 5 (2)
(2000) 115–137.
[14] B. Hammer, T. Villmann, Batch-RLVQ, in: M. Verleysen (Ed.), Proceedings of the 10th European
Symposium on Artificial Neural Network, Bruges, Belgium, d-side, Evere, Belgium, 2002,
pp. 295–300.
[15] S. Haykin, Neural Networks: A Comprehensive Foundation, Prentice-Hall, Englewood Cliffs, NJ,
1999.
[16] F.M. Henderson, Open Channel Flow, Macmillan, New York, 1966.
[17] R.A. Jacobs, M.I. Jordan, S.J. Nowlan, G.E. Hinton, Adaptive mixtures of local experts, Neural
Comput. 3 (1991) 79–87.
[18] S.K. Jain, D. Chalisgaonkar, Setting up stage-discharge relations using ANN, ASCE, J. Hydrologic
Eng. 5 (4) (2000) 428–433.
[19] M.I. Jordan, R.A. Jacobs, Hierarchical mixtures of experts and the EM algorithm, Neural Comput 6
(1994) 181–214.
[20] B. Kompare, F. Steinman, U. Cerar, S. Dzeroski, Prediction of rainfall runoff from catchment by
intelligent data analysis with machine learning tools within the artificial intelligence tools, Acta
Hydrotech. (in Slovene language) 16 (16) (1997).
[21] A.W. Minns, M.J. Hall, Artificial neural networks as rainfall-runoff models, Hydrological Sci. J. 41
(3) (1996) 399–417.
[22] R.S. Muttiah, R. Srinivasan, P.M. Allen, Prediction of two-year peak stream discharges using neural
networks, J. Am Water Resources Assoc. 33 (3) (1997) 625–630.
[23] NeuralMachine, www.data-machine.com (15/3/2003).
[24] J.R. Quinlan, Learning with continuous classes, Proceedings of the Australian Joint Conference on
Artificial Intelligence, World Scientific, Singapore, 1992 pp. 343–348.
[25] C.E. Shannon, A mathematical theory of communication, Bell System Tech. J. 27 (1948) 379–423 and
623–656.
[26] K. Shiono, J.S. Al-Romaih, D.W. Knight, Stage-discharge assessment in compound meandering
channels, ASCE, J. Hydraulic Eng. 125 (1) (1999) 66–77.
[27] D.P. Solomatine, K.N. Dulal, Model tree as an alternative to neural network in rainfall-runoff
modelling, Hydrological Sci. J. 48 (3) (2003) 399–412.
[28] D.P. Solomatine, M.B. Siek, Optimization of Hierarchical Modular Models and M5 Trees,
Proceedings of International Joint Conference on Neural Networks, Budapest, Hungary, July 2004,
Omni Press, 2004.
[29] D.P. Solomatine, M.B. Siek, Flexible and optimal M5 model trees with applications to flow
predictions, Proceedings of the Sixth International Conference on Hydroinformatics, June 2004,
World Scientific, Singapore, 2004.
[30] K.P. Sudheer, S.K. Jain, Radial basis function neural network for modeling rating curves, ASCE,
J. Hydrologic Eng. 8 (3) (2003) 161–164.
[31] M. Tawfik, A. Ibrahim, H. Fahmy, Hysteresis sensitive neural network for modeling rating curves,
ASCE, J. Computing Civil Eng. 11 (3) (1997) 206–211.
[32] K. Thirumalaiah, M.C. Deo, River stage forecasting using artificial neural networks, ASCE,
J. Hydrologic Eng. 3 (1) (1998) 26–32.
[33] S. Verdú, Fifty years of Shannon theory, IEEE Trans. Inf. Theory 44 (6) (1998)
2057–2078.
[34] Y. Wang, I.H. Witten, Induction of model trees for predicting continuous classes, Proceedings of the
poster papers of the European Conference on Machine Learning, University of Economics, Faculty
of Informatics and Statistics, Prague, 1997.
[35] J.A. Westphal, D.B. Thompson, G.T. Stevens, C.N. Strauser, Stage-discharge relations on
the middle Mississippi river, ASCE, J. Water Resources Planning Manage. 125 (1) (1999)
48–53.
[36] I.H. Witten, E. Frank, Data Mining: Practical Machine Learning Tools and Techniques with Java
Implementations, Morgan Kaufmann, Los Altos, CA, 2000.