Multi-Step-Ahead Prediction of River Flow Using NARX Neural Networks and Deep Learning LSTM

© 2022 The Authors H2Open Journal Vol 5 No 1, 43 doi: 10.2166/h2oj.2022.134
Gasim Hayder a,*, Mahmud Iwan Solihin b and M. R. N. Najwa c

a Department of Civil Engineering, College of Engineering, Universiti Tenaga Nasional (UNITEN), 43000 Kajang, Selangor, Malaysia
b Faculty of Engineering, Technology and Built Environment, UCSI University, Jalan Puncak Menara Gading, Taman Connaught, Kuala Lumpur 56000, Malaysia
c College of Graduate Studies, Universiti Tenaga Nasional (UNITEN), 43000 Kajang, Selangor, Malaysia
*Corresponding author. E-mail: gasim@uniten.edu.my
GH, 0000-0002-2677-0367; MIS, 0000-0002-5293-7466; MRNN, 0000-0002-1596-0616
ABSTRACT
The Kelantan River (Sungai Kelantan, Malaysia) basin is an important catchment with a history of flood events. Numerous studies have been conducted in river basin modelling for the prediction of flow and mitigation of flooding events as well as water resource management. Multi-step-ahead forecasting of river flow (RF) is therefore of considerable research interest. This study presents four different approaches for multi-step-ahead forecasting of the Kelantan RF, using NARX (nonlinear autoregressive with exogenous inputs) neural networks and deep learning recurrent neural networks called LSTM (long short-term memory). The dataset consists of monthly records for 29 years between January 1988 and December 2016. The results show that the two recursive methods using NARX and LSTM are able to perform multi-step-ahead forecasting on 52 series of test datasets with NSE (Nash–Sutcliffe efficiency coefficient) values of 0.44 and 0.59 for NARX and LSTM, respectively. For few-step-ahead forecasting, LSTM with direct sequence-to-sequence produces promising results with a good NSE value of 0.75 (in the case of two-step-ahead forecasting). However, it needs a larger dataset to perform better in longer-step-ahead forecasting. Compared with other studies, the dataset used in this study is much smaller.
Key words: deep learning, LSTM model, multi-step-ahead prediction, NARX model, neural networks, river flow prediction
HIGHLIGHTS
• Four different approaches for multi-step-ahead forecasting for river flow.
• Two forecasting approaches are performed in a multivariate setting using NARXNN (nonlinear autoregressive with exogenous inputs - neural networks), in recursive and direct autoregressive modes, respectively.
• Two forecasting approaches are performed using LSTM (long short-term memory), with recursive univariate and sequence-to-sequence multivariate approaches, respectively.
• LSTM with a direct sequence-to-sequence multivariate approach performs the best for few-step-ahead forecasting with the limited data size. It has promising application for longer-step-ahead forecasting provided that the data size is sufficiently large.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY 4.0), which permits copying,
adaptation and redistribution, provided the original work is properly cited (http://creativecommons.org/licenses/by/4.0/).
1. INTRODUCTION
In time-series forecasting, prediction can be classified into two categories: single-step ahead and multi-step ahead (Saroha & Aggarwal 2014). Predicting multiple time steps into the future is called multi-step time-series forecasting. It involves forecasting one or more variables over several future time steps, given a significant time span of past data (Dabrowski et al. 2020). The extent of future forecasting is known as the forecasting horizon. Highly chaotic time series and those with missing data pose important issues in multi-step-ahead prediction, which have been addressed using non-linear filters and neural networks (Chandra et al. 2021). According to Guo et al. (2020), among the five strategies suggested in the literature for multi-step-ahead prediction, the recursive strategy is the one normally applied across several fields. Besides the recursive strategy, sequence-to-value and sequence-to-sequence forecasting approaches have also been used in multi-step-ahead forecasting (Wunsch et al. 2021).
Numerous studies have been conducted on multi-step-ahead predictive modelling. Hernandez-Ambato et al. (2017) predicted the future reservoir level of a hydroelectric dam at a hydroelectric power station in Ecuador using Open-Loop Prediction (OLP) and Closed-Loop Prediction (CLP) techniques. Based on 22 days of testing in a real environment, with both models compared over the same horizon and season, the CLP models performed better than the OLP models. Yu et al. (2011) applied eight different training algorithms to forecast the water level 1 to 5 days ahead in the Heshui catchment in China. All of the models gave satisfactory results in 1-day-ahead forecasting, with R² values ranging from 0.85 to 0.932 and RMSE values ranging from 0.094 to 0.115. However, the performance of all models declined as the number of time steps increased (from 1 to 5 days ahead). Kisi et al. (2015) applied multi-step-ahead modelling to forecast daily lake water levels for three different horizons, from 1 to 7 days ahead, in Urmia Lake in Iran. The models were established using support vector machine (SVM), together with a firefly algorithm (FA). The authors observed that the 1-day-ahead forecasting model, which achieved an R² value of 0.9999, was more accurate than the 7-days-ahead prediction. Other studies using a similar approach include Chang et al. (2014), who established a real-time multi-step-ahead water level prediction using RNN for urban flood control in Yu-Cheng Pumping Station, Taiwan, and Saroha & Aggarwal (2014), who applied multi-step-ahead prediction of wind power using genetic algorithm (GA)-based neural networks. RNN was also used by Granata & Di Nunno (2021), who built RNN-based models using LSTM for the prediction of short-term-ahead actual evapotranspiration. With reference to the subtropical climatic conditions of South Florida, LSTM models proved to be more accurate than NARX models, while some exogenous variables such as sensible heat flux and relative humidity did not affect the results significantly.

River flow (RF) is a significant parameter in the hydrological cycle and one of the most readily accessible at the local scale. Because it underlies many of the vital relationships between social, physical, environmental and financial processes, this water parameter holds an essential position in hydrology (Bhagwat & Maity 2012). RF prediction has attracted the interest of researchers for over half a century. At present, there are two techniques for water flow prediction. The first is physically based modelling, in which mathematical models simulate the hydrodynamic system of the flow of water. The second comprises data-driven models that are constructed on a statistical relationship between the input and the output variables (Le et al. 2019). RF prediction can help to prevent flood damage and water shortages as well as assist water management in the agricultural field, thus providing large financial benefits (Atiya et al. 1999). However, river flow has been found to be difficult to forecast due to the nonlinear, time-varying and indeterminate nature of river flow data (Zaini et al. 2018).
Zaini et al. (2018) used the SVM-based model, together with particle swarm optimization (PSO), to predict
short-term daily RF at the Upper Bertam Catchment located in Cameron Highland, Malaysia. The authors
made four SVM models where SVM1 and SVM-PSO1 contained only historical data, while SVM2 and SVM-
PSO2 contained historical data and meteorological variables as input. The outcomes showed that SVM2 and
SVM-PSO2 outperformed SVM1 and SVM-PSO1. Moreover, the performance of the hybrid models was much
better than that of the basic SVM models. Sahoo et al. (2019) applied LSTM, together with RNN (LSTM-
RNN) and the AI method, to predict low-flow series using daily discharge data that were obtained from the Basan-
tapur gauging station, India. The results showed that the LSTM-RNN model outperformed the RNN model and
other naïve methods that the authors used for comparison. The values of R (correlation coefficient), Nash–Sutcliffe efficiency (E_NS) and RMSE obtained for LSTM-RNN were 0.943, 0.878 and 0.487, respectively. Moreover, Noor et al. (2017) applied the NARX model to predict rainfall-based RF in the Pelarit and Jarum rivers, Perlis, Malaysia, and successfully established a model of the river flow 1 day (24 h) in advance according to the current rainfall rates. The authors achieved values of 0.045, 0.013 and 0.9985 for RMSE, MAPE and R², respectively, for the Jarum River, while for the Pelarit River the values of RMSE, MAPE and R² were 0.0113, 0.0038 and 0.999, respectively. These values indicate that the NARX model reproduced the RF closely at both rivers. Furthermore, Zhang et al. (2019) applied a time-series
analysis model using autoregressive integrated moving average (ARIMA) as well as a multilayer perceptron
neural network (MLPNN) to forecast wastewater inflow in Barrie Wastewater Treatment Facility in Canada.
With regard to recurrent neural network (RNN) and its proven history of performances particularly for an intri-
cate nonlinear system, a dynamic neural network that was established on nonlinear auto-regressive models with
exogenous input (NARX) models was extensively chosen. The NARX model, which has a limited number of par-
ameters, is known to be as strong as a connected RNN and is computationally efficient (Tijani et al. 2014).
Furthermore, it is considered as a part of RNN, having feedback connections surrounding the hidden layers of
the network (Marcjasz et al. 2019). Furthermore, it can be utilized in multi-time-series input and output appli-
cation since it is a different category of artificial neural network (ANN) that is fit to model time-series and
nonlinear systems (Ardalani-Farsa & Zolfaghari 2010).
Di Piazza et al. (2016) established the NARX model to execute hourly wind speed and solar irradiation prediction based on a multi-step-ahead technique, with temperature as the chosen exogenous variable. The NARX model was optimized by a GA along with an optimal brain surgeon (OBS) strategy. The time horizons used in the research ranged from 8 to 24 h ahead. The best outcomes were obtained at 8 and 24 h for wind speed and at 8 and 10 h for solar irradiation. Di Nunno & Granata (2020), on the other hand, established the NARX model to forecast daily groundwater level fluctuations for 76 monitored springs located in the Apulian territory, Italy. Daily groundwater level data and training algorithms such as Levenberg–Marquardt (LM), Bayesian Regularization (BR) and Scaled Conjugate Gradient (SCG) were used in the research. The outcomes showed that the BR training algorithm produced greater accuracy in groundwater level prediction. In another study (Di Nunno et al. 2021), an
unprecedented application of nonlinear AutoRegressive with eXogenous inputs (NARX) neural networks to
the prediction of spring flows was shown. Discharge prediction models were developed for nine monitored
springs located in the Umbria region, along the carbonate ridge of the Umbria-Marche Apennines. In the model-
ling, precipitation was also considered as an exogenous input parameter. Good performances were achieved for
all springs and for both short-term and long-term predictions, passing from a lag time equal to 1–12 months. Fur-
thermore, Wunsch et al. (2018) also applied NARX to forecast short- and mid-term groundwater levels for several
groups of aquifers in the states of Baden-Württemberg, Bavaria and Hesse, Germany. Precipitation and temperature were chosen as the inputs of the model. The NARX model was successfully used to conduct groundwater predictions for uninfluenced observation wells in all aquifer types, but it gave contrasting results for influenced observation wells. In addition, RNN was applied to simulate the operation of three multi-purpose reservoirs located in the upper Chao Phraya River basin (Yang et al. 2019). Three RNNs for reservoir operation, namely nonlinear autoregressive models with exogenous input (NARX), LSTM and genetic algorithm-based NARX (GA-NARX), were built on historical data. The results showed that GA-NARX had the highest accuracy among the three RNNs and, by optimizing the initial conditions, was more stable than the original NARX, although it took a longer training time than NARX and LSTM.
Apart from NARX, the LSTM deep learning network, an extension of RNN, also has great capacity in time-series data prediction. The major difference between a standard RNN and LSTM is that LSTM can suitably map between input and output data while retaining long-range temporal dependencies (Abbasimehr et al. 2020). According to Zhou et al. (2018), the LSTM network is highly suitable for sequential data such as water quality data since it has a dedicated memory function. The LSTM network contains several gates, namely forget gates, input gates and output gates, as well as a memory cell in every neuron (Vu et al. 2020; Jia & Zhou 2020).
A number of studies have been conducted using the LSTM model. Zhang & Jin (2020) used an autoencoder together with LSTM to forecast the concentrations of total phosphorus (TP) and total nitrogen (TN) using 13 different variables as inputs. The authors found that the TP model outperformed the TN model, with R² and RMSE values of 0.924 and 0.0002, respectively, while the TN model achieved R² and RMSE values of 0.909 and 0.024, respectively. Liu et al. (2019) also applied LSTM to forecast dissolved oxygen (DO) in the next 10 days (m = 10) and the next 6 months (m = 181). The authors achieved MSE values of 0.0020 and 0.0017 for m = 181 and m = 10, respectively. Furthermore, water quality forecasting
based on the LSTM learning network was used by Hu et al. (2019) to forecast water temperatures and pH
values. The outcomes showed that the accuracy levels from the forecasting were 98.97% and 98.56% for water
temperature and pH values, respectively. As for long-term forecasting, accuracy levels of up to 96.88% and
95.76% for water temperature and pH values, respectively, were achieved. Le et al. (2019) also established
some LSTM models to predict floods at the Hoa Binh Station on Da River, Vietnam.
Based on the literature review presented, it can be observed that LSTM and NARX forecasting models are com-
monly used in multi-step-ahead prediction involving nonlinear multivariate data. The challenge in multi-step-
ahead modelling is that normally the model achieves high accuracy for one or a few steps ahead. In other
words, most of the studies with high accuracy conducted using NARX and LSTM are applied for one or a few
steps ahead of forecasting. This study aims to build a model for multi-step-ahead forecasting of RF based on
NARX Neural Networks and LSTM in univariate and multivariate approaches. In NARX, we will use direct auto-
regression and a recursive model. Unlike the recursive NARX approach, the direct autoregression NARX
approach tries to train a multi-step-ahead prediction model directly without requiring future exogenous inputs
to make multi-step prediction. In LSTM, we will use a direct sequence-to-sequence multivariate model and
also a recursive univariate model, both of which will be discussed in Section 2.
2. METHODOLOGY
2.1. Study area
The Kelantan river basin is one of the important catchments as it has a history of flood events (Pradhan &
Youssef 2011; Nashwan et al. 2018). The catchment represents most of the land area of Kelantan State, as
shown in Figure 1. Several stations such as rainfall, water level, evaporation, water quality and meteorological
stations operate in the area.
Figure 1 | The study area (using QGIS©).
2.2. RF data
Machine learning and deep learning algorithms, such as ANN and LSTM, have been successfully proven for data-driven predictive modelling applications. The main ingredient for the success of predictive modelling, in addition to the training algorithm developed, is the data itself. In this study, the RF data were collected together with some meteorological parameters. The time-series RF data were collected from the north of Kuala Krai city downstream (merger of two main tributaries before discharging into the sea). The original data, as recorded and compiled, consisted of 348 monthly samples of the Sungai Kelantan RF (m³/s) spanning from January 1988 to December 2016 (29 years). Rainfall and evaporation are usually measured at a designated station, and only the computed area weighted rainfall (WR) is used to evaluate the whole area rainfall quantity (Faisal & Gaffar 2012). The RF was mainly from one main station (Guillemard), while the WR and evaporation (secondary data) were over the whole river catchment. Table 1 shows the attributes and basic statistical properties of the data collected and used in this study. The time-series graph is shown in Figure 2. As we can see, the RF pattern is close to the time-series pattern of the WR. Meanwhile, the relationship between the average evaporation (AE) pattern and RF cannot be visibly observed, as the AE amplitude is relatively small.
2.3. Preliminary data analysis

The stage of preliminary data analysis and pre-processing is crucial in the initial stage of the machine learning
model building. This process can significantly affect the prediction accuracy in any type of data (Hayder et al.
2020). The purpose of this study is to build a multi-step-ahead predictive model for RF using a machine learn-
ing/deep learning-based approach.
There are basically two ways of performing multi-step-ahead forecasts. It can be produced recursively by iter-
ating a one-step-ahead model, or directly using a specific model for each span (Ben Taieb et al. 2012). In addition,
the recorded RF data, together with other climate variables, are basically time-series data. Thus, the multi-step-
ahead time-series prediction model can be developed based on two strategies, namely, univariate and multivariate
time-series prediction. In the univariate time-series forecasting, it is assumed that RF has its seasonality and it
depends on its past values only. On the other hand, in multivariate time-series forecasting, the RF will also
depend on the other variables and their past values.
Table 1 | Variables and their attributes
Name of variable    River flow    Weighted rainfall    Averaged evaporation
Notation (unit)     RF (m³/s)     WR (mm)              AE (mm)
Max value           2,853.2       725.8                173.8
Min value           70.5          3.0                  57.6
Mean value          471.8         207.7                114.4

Figure 2 | The compiled monthly time-series data (January 1988–December 2016).
In this study, the multi-step-ahead prediction is performed using a nonlinear machine learning-based model
with multivariate and univariate approaches. The two algorithms used are NARX (nonlinear autoregressive
with exogenous inputs) Neural Networks and recurrent Neural Networks called LSTM, which fall under the
category of deep learning.
Some inherent preliminary data analyses in time-series forecasting are autocorrelation and cross-correlation
analysis. Autocorrelation is the correlation of a variable with itself at different time lags. It is used mainly in univariate forecasting mode. For example, $r_1$ represents the correlation between RF at the present time ($RF_t$) and its lag-1 value ($RF_{t-1}$), and so on. The autocorrelation coefficients of a time-series variable y can be expressed as

$$r_k = \frac{\sum_{t=k+1}^{n} (y_t - \bar{y})(y_{t-k} - \bar{y})}{\sum_{t=1}^{n} (y_t - \bar{y})^2} \tag{1}$$

where $\bar{y}$ is the mean of the series and n is the number of samples. This coefficient is plotted against the lag value (k), resulting in the autocorrelation function plot called a correlogram.
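As a minimal illustration of Equation (1), the following Python sketch computes the sample autocorrelation at a given lag and the values needed for a correlogram. The function names and the synthetic 12-month periodic series are assumptions made for illustration only; they are not taken from the study or its data.

```python
def autocorrelation(y, k):
    """Sample autocorrelation r_k of the series y at lag k, as in Equation (1)."""
    n = len(y)
    if not 0 <= k < n:
        raise ValueError("lag k must satisfy 0 <= k < len(y)")
    mean = sum(y) / n
    # Numerator: t = k+1..n in the paper's 1-based notation, i.e. t = k..n-1 here.
    num = sum((y[t] - mean) * (y[t - k] - mean) for t in range(k, n))
    # Denominator: total sum of squared deviations over the whole series.
    den = sum((v - mean) ** 2 for v in y)
    return num / den


def correlogram(y, max_lag):
    """Autocorrelation coefficients for lags 0..max_lag."""
    return [autocorrelation(y, k) for k in range(max_lag + 1)]


# Hypothetical monthly series with an exact 12-step period (10 "years").
series = [float(t % 12) for t in range(120)]
acf = correlogram(series, 12)
```

For a seasonal series such as this one, the coefficient at lag 12 is high while the coefficient at lag 6 is negative, which is the kind of pattern used to choose the autoregressive order.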
On the other hand, the cross-correlation function (CCF) seeks a relationship between two time series. For
example, it measures the correlation between the series of WR and shifted (lagged) series of RF as a function
of the lag. As another instance, the time-series RF may depend on the past lags of WR and AE. The sample
CCF will be helpful for identifying lags of the input variable that might be useful predictors of the target variable,
i.e. RF. The cross-correlation between $x_t$ and $y_{t+k}$ is called $r_k$ and is expressed as

$$r_k = \frac{\sum_{t=1}^{n-k} (x_t - \bar{x})(y_{t+k} - \bar{y})}{\sqrt{\sum_{t=1}^{n} (x_t - \bar{x})^2 \sum_{t=1}^{n} (y_{t+k} - \bar{y})^2}} \tag{2}$$
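The cross-correlation of Equation (2) can be sketched in Python as follows. This is an illustrative sketch only: the function name is hypothetical, a single input series is used, and the denominator is taken as the product of the full-series sums of squares of x and y, which is a common sample-CCF convention and an assumption about how Equation (2) is normalized.

```python
import math


def cross_correlation(x, y, k):
    """Sample cross-correlation between x_t and y_{t+k} for a lag k >= 0."""
    n = len(x)
    if len(y) != n:
        raise ValueError("x and y must have the same length")
    if not 0 <= k < n:
        raise ValueError("lag k must satisfy 0 <= k < len(x)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Numerator: t = 1..n-k in the paper's notation, i.e. t = 0..n-k-1 here.
    num = sum((x[t] - mean_x) * (y[t + k] - mean_y) for t in range(n - k))
    # Denominator (assumption): full-series sums of squares of both variables.
    den = math.sqrt(sum((v - mean_x) ** 2 for v in x)
                    * sum((v - mean_y) ** 2 for v in y))
    return num / den


# Hypothetical example: y repeats x with a delay of two steps.
x = [float(t % 12) for t in range(120)]
y = [float((t - 2) % 12) for t in range(120)]
ccf = [cross_correlation(x, y, k) for k in range(4)]
```

In this synthetic case the coefficient peaks at lag 2, which mirrors how the sample CCF is used to identify useful lags of WR and AE as predictors of RF.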
2.4. Data preparation

Prior to feeding the data into the machine learning/deep learning models, some data preparation needs to be carried out. First, the data are partitioned into training and test sets. The training dataset is used to build the predictive model, which is otherwise called model training. The test dataset is used to evaluate the performance of the model after the model training has been completed. The first 280 samples, i.e. ≈80% (sequence from January 1988 to April 2011), are used as the training dataset, while the test set is the sequence from April 2011 to December 2016 (81 samples). Notice that there is some sequence overlap in the data partition and we do not have large samples to split. In the overlapped sequence, only the target data are used during training, and these targets are predicted using feature/input variables from the previous sequence. Thus, the input/feature variables in the overlapped sequence can still be used as test data. The train-test data partition is illustrated in Figure 3. Furthermore, as the data amplitudes span a wide range, scaling is also performed to normalize the amplitudes of the feature inputs to the range of 0 to 1.
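The partition and scaling steps can be sketched in Python as below. This is a simplified sketch under stated assumptions: the sequence overlap described in the text is omitted, the min-max parameters are fitted on the training portion only (the text does not state which samples were used to fit the scaler), and the series is a synthetic stand-in rather than the recorded RF data. All function names are hypothetical.

```python
def sequential_split(series, n_train):
    """Chronological split: the first n_train samples are used for training."""
    if not 0 < n_train < len(series):
        raise ValueError("n_train must be between 1 and len(series) - 1")
    return series[:n_train], series[n_train:]


def fit_min_max(train):
    """Return the (min, max) of the training data for later scaling."""
    lo, hi = min(train), max(train)
    if hi == lo:
        raise ValueError("a constant series cannot be min-max scaled")
    return lo, hi


def scale(values, lo, hi):
    """Map values to [0, 1] with respect to the fitted range."""
    return [(v - lo) / (hi - lo) for v in values]


def unscale(values, lo, hi):
    """Invert scale() to recover the original units."""
    return [v * (hi - lo) + lo for v in values]


# Synthetic stand-in for 348 monthly samples (not the real RF record).
rf = [70.5 + 8.0 * i for i in range(348)]
train, test = sequential_split(rf, 280)
lo, hi = fit_min_max(train)
train_scaled = scale(train, lo, hi)
test_scaled = scale(test, lo, hi)  # may fall outside [0, 1]
```

Fitting the range on the training data alone avoids leaking information from the test period, at the cost of test values that can exceed 1 when a new extreme occurs.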
2.5. NARX- and LSTM-based approaches for multi-step-ahead prediction

There are two machine learning models used in this study as mentioned, namely NARX and LSTM. The architecture of NARX can be seen in Figure 4, where two types of connection are common (Thapa et al. 2020). NARX is used in multivariate mode that is applied to predict some step-ahead values of RF based on current and past values (lags) of WR and AE.
Figure 3 | Train and test data partition.
Figure 4 | The architecture of NARXNN: (a) parallel and (b) series parallel.
The NARX model is basically a modified NAR (nonlinear autoregressive) model by including another relevant
time-series variable as extra input to the forecasting model, which can be expressed as the following two
equations for parallel (a.k.a closed loop) and series parallel (a.k.a open loop), respectively (Pena et al. 2020):
$$\hat{y}(t+1) = f\big(x(t-d), x(t-d-1), \ldots, x(t-d-q+1), \hat{y}(t), \hat{y}(t-1), \ldots, \hat{y}(t-p+1)\big) \tag{3}$$

$$\hat{y}(t+1) = f\big(x(t-d), x(t-d-1), \ldots, x(t-d-q+1), y(t), y(t-1), \ldots, y(t-p+1)\big) \tag{4}$$

where f(·) is the mapping function performed by the neural network, x(t) is the external input variable(s), y(t) is the output variable, p is the autoregression order, q is the exogenous input order and d is the exogenous delay number.
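To make the arguments of Equations (3) and (4) concrete, the sketch below assembles the input vector passed to f(·) at a time index t. It assumes a single exogenous series for brevity; the function name and the toy numbers are hypothetical. Passing observed outputs as the feedback series corresponds to the series-parallel (open-loop) form, while passing previous predictions corresponds to the parallel (closed-loop) form.

```python
def narx_regressor(x, y_feedback, t, p, q, d):
    """Input vector of Equations (3)/(4) at time index t (0-based).

    x          : exogenous input series
    y_feedback : observed y (series-parallel) or predicted y-hat (parallel)
    p, q, d    : autoregression order, exogenous order, exogenous delay
    """
    if t >= len(x) or t >= len(y_feedback):
        raise IndexError("t is beyond the end of the series")
    if t - d - q + 1 < 0 or t - p + 1 < 0:
        raise IndexError("not enough past samples for the requested lags")
    exo = [x[t - d - i] for i in range(q)]         # x(t-d), ..., x(t-d-q+1)
    auto = [y_feedback[t - j] for j in range(p)]   # y(t), ..., y(t-p+1)
    return exo + auto


# Toy series used only to show the ordering of the lags.
x = list(range(10))                 # 0, 1, ..., 9
y = [100 + v for v in range(10)]    # 100, 101, ..., 109
vector = narx_regressor(x, y, t=5, p=2, q=3, d=1)
```

With these toy values the vector is `[4, 3, 2, 105, 104]`: three delayed exogenous values followed by the two most recent outputs.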
Here, we use two different approaches for NARX, namely the recursive (or parallel, as in Equation (3)) approach and the direct autoregression approach, where the model is trained as a multi-step-ahead prediction model directly. We refer to these models as NARX1 (following Equation (3)) and NARX2 (following Equation (5)) throughout the manuscript. Given the output time series to predict y(t) and exogenous inputs x(t), the direct model will generate the output/target as

$$\hat{y}(t+k) = f\big(x(t-d), x(t-d-1), \ldots, x(t-d-q+1), y(t), y(t-1), \ldots, y(t-p+1)\big) \tag{5}$$

with k being the prediction step.
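One way to prepare training data for the direct approach of Equation (5) is sketched below: each input vector built at time t is paired with the target k steps ahead. The function name is hypothetical, a single exogenous series is assumed, and the toy series are for illustration only.

```python
def direct_training_pairs(x, y, k, p, q, d):
    """Build (input, target) pairs for the direct k-step-ahead form of Equation (5)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    # First time index at which all required lags are available.
    start = max(d + q - 1, p - 1)
    inputs, targets = [], []
    for t in range(start, len(y) - k):
        exo = [x[t - d - i] for i in range(q)]   # x(t-d), ..., x(t-d-q+1)
        auto = [y[t - j] for j in range(p)]      # y(t), ..., y(t-p+1)
        inputs.append(exo + auto)
        targets.append(y[t + k])                 # target is y(t+k)
    return inputs, targets


# Toy series: x counts upward, y is ten times x.
x = list(range(10))
y = [10 * v for v in range(10)]
inputs, targets = direct_training_pairs(x, y, k=2, p=2, q=2, d=0)
```

Because the target is shifted by k, a separate model is fitted for each horizon, and the number of usable training pairs shrinks as k grows.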

Furthermore, LSTM is a type of RNN that was proposed to address the difficulty of retaining long-term dependencies in conventional RNNs by explicitly introducing a memory unit called the cell into the network. A single memory unit makes a decision by considering the current input, previous output and previous memory, and generates a new output and alters its memory. In addition to NARX, LSTM is also used here for multi-step-ahead prediction with two different approaches, namely, the sequence-to-sequence multivariate approach (which we call LSTM2) and the univariate approach (which we call LSTM1). The sequence-to-sequence multivariate approach (LSTM2) is illustrated in Figure 5, which indicates how many sequences of past values of exogenous inputs (q) will be used and how many step-ahead predictions will be performed (k). The training data will be arranged as per Figure 5, by arranging q sequences of input variables with the next k target variables (including the current step).
Figure 5 | Illustration of sequence-to-sequence prediction model used in LSTM2.
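The arrangement of training data for the sequence-to-sequence approach might look like the following sketch. It reflects one reading of the description above, namely that each input window holds the q most recent rows of input variables up to time t and that the k targets start at the current step t. That alignment, the function name and the toy data are assumptions and may differ from the authors' implementation.

```python
def seq2seq_windows(features, target, q, k):
    """Pair each window of q feature rows with the next k targets (current step included)."""
    if len(features) != len(target):
        raise ValueError("features and target must have the same length")
    X, Y = [], []
    for t in range(q - 1, len(target) - k + 1):
        X.append(features[t - q + 1 : t + 1])   # rows t-q+1 .. t
        Y.append(target[t : t + k])             # y(t), ..., y(t+k-1)
    return X, Y


# Toy data: two input variables per time step (standing in for WR and AE).
features = [[i, 10 * i] for i in range(6)]
target = [100 + i for i in range(6)]
X, Y = seq2seq_windows(features, target, q=3, k=2)
```

Each element of `X` has shape q × (number of input variables) and each element of `Y` has length k, which is the layout expected by a sequence model with a k-unit output layer.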
For LSTM1, univariate approach is implemented to predict some step-ahead values of RF based on its past
values. This univariate model is basically an NAR (nonlinear autoregressive) model, which can be expressed
as (Sarkar et al. 2019):
$$\hat{y}(t+1) = f\big(y(t), y(t-1), \ldots, y(t-p+1)\big) \tag{6}$$
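The recursive use of a one-step model such as Equation (6) can be sketched as a simple loop in which each prediction is fed back as the newest lag. The one-step model here is a hypothetical stand-in (a linear extrapolation), not a trained LSTM, and the function name is an assumption made for illustration.

```python
def recursive_forecast(one_step_model, history, p, steps):
    """Iterate y-hat(t+1) = f(y(t), ..., y(t-p+1)) for a number of steps ahead."""
    if len(history) < p:
        raise ValueError("history must contain at least p samples")
    window = list(history[-p:])          # oldest value first
    forecasts = []
    for _ in range(steps):
        # The model receives the most recent value first, as in Equation (6).
        y_next = one_step_model(window[::-1])
        forecasts.append(y_next)
        # Drop the oldest lag and append the prediction as the newest lag.
        window = window[1:] + [y_next]
    return forecasts


def linear_extrapolation(lags):
    """Stand-in one-step model: continue the last observed increment."""
    return 2 * lags[0] - lags[1]


forecasts = recursive_forecast(linear_extrapolation, [1, 2, 3], p=2, steps=3)
```

With the toy history `[1, 2, 3]` the loop returns `[4, 5, 6]`. Since every step after the first depends on earlier predictions, errors can accumulate as the horizon lengthens.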
2.6. Metric of model evaluation

Once the model training or learning process is done, i.e. when the model fits the training data, the predictive
model developed needs to be evaluated using both training dataset and test dataset. The common metric for
regression model accuracy can be found such as in (Pena et al. 2020). In this study, the obtained model accuracy
is evaluated by calculating RMSE (root mean squared error) value and Nash–Sutcliffe efficiency coefficient
(NSE), which are defined, respectively, as follows:
$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{n} (\hat{y}_i - y_i)^2}{n}} \tag{7}$$

$$\mathrm{NSE} = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} \tag{8}$$
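Here, $y_i$ is the observed value, $\hat{y}_i$ is the predicted value, $\bar{y}$ is the mean of the observed values and n is the number of samples. NSE is widely used to evaluate the performance of hydrological models. NSE is even regarded as better than other metrics, such as the coefficient of determination, a.k.a. regression coefficient. An NSE value above 0.75 is considered a good model fit, while a value of less than 0.5 indicates unsatisfactory model performance (Pena et al. 2020).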
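Both metrics are straightforward to compute. The following Python sketch implements Equations (7) and (8); the function names and the four-sample example are hypothetical and used only for illustration.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error, Equation (7)."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(observed)
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(observed, predicted)) / n)


def nse(observed, predicted):
    """Nash-Sutcliffe efficiency coefficient, Equation (8)."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    residual = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    variance = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - residual / variance


observed = [1.0, 2.0, 3.0, 4.0]
perfect = nse(observed, observed)       # a perfect forecast
baseline = nse(observed, [2.5] * 4)     # always predicting the observed mean
```

A perfect forecast gives NSE = 1, while a forecast that always returns the mean of the observations gives NSE = 0, so negative values indicate a model that performs worse than that mean baseline.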
3. RESULTS AND DISCUSSION

The experimentation is performed for the predictive model with four different approaches using NARX neural networks and LSTM, namely NARX1, NARX2, LSTM1 and LSTM2. These model approaches are listed in Table 2, together with their information. Although the experimentation in this study is performed for a local study area, the modelling approach presented in this paper can be extended and applied to other international studies. More generally, the predictive modelling approach can be used not only for RF but also for other hydrology-related variables such as sediment load, water quality, groundwater level, etc. This generic data-driven approach can be summarized as a flowchart in Figure 6 that begins with data collection and ends with a decision on possible model deployment upon satisfactory model evaluation.
Table 2 | Summary of the four predictive models for RF
Multi-step-ahead predictive model    Approach
NARX1                                Recursive (closed loop) approach; refer to Equation (3)
NARX2                                Direct auto-regressive approach; refer to Equation (5)
LSTM1                                Univariate autoregressive approach; refer to Equation (6)
LSTM2                                Sequence-to-sequence multivariate approach; refer to Figure 5
3.1. Multi-step-ahead prediction using NARX neural networks

The first experimentation is to obtain an optimum NARX model involving lag/delay number (d) and order (q) for
the input and the auto regressive order (p). Here, autocorrelation and cross-correlation analysis are observed.
Figure 7 shows the sample autocorrelation function for the RF data, which indicates strong correlation of the
present RF to its past value at a lag of 1 and around a lag of 12, i.e. for the past 1 month and for the past 12
months.
Furthermore, cross-correlation between RF and WR as well as between RF and AE are also observed. The
result is shown in Figure 8, which indicates the following. RF has a strong positive correlation with the current value (lag 0) of WR and a strong negative correlation with the current value of AE. A very similar pattern appears at lag 11 and at lag 12, particularly for AE. Based on this observation, some experimentation to obtain the optimum NARX model is performed. The NARX model notation follows the values of p, q, d set for the model building; for example, NARX_2-12-0 means the model is set with p = 2, q = 12 and d = 0.
Table 3 shows the ANN setup used in the NARX model. Table 4 shows the experimental results to obtain the
optimum model for the order and lags involved. We start the experiment with a two-step-ahead prediction model
and the performance metrics are shown. As we can see, NARX_13-13-0 gives the optimum model and, therefore,
this model is used for the rest of the experimentations with different steps ahead. The models are evaluated using the test data as shown in Table 4, where NARX_13-13-0 produces NSE = 0.41 and RMSE = 215.37. Throughout the rest of the discussion, this model is called NARX2, which is the direct autoregression approach. We perform the evaluation up to k = 10. Above this step, the model performs poorly and is, therefore, not shown. Other
poor performances with different delays and orders are also not shown. Later, we will compare the result with
NARX1, which is a recursive approach as well as LSTM1 and LSTM2.
The next result shows the experimentation on NARX1. With the same setup of delay and order as in NARX2,
Figures 9 and 10 show the forecasting result on training data. The result indicates a good fit of the model with NSE = 0.85 and RMSE = 152.29. Higher errors occur particularly for some extreme values in certain months. In Figure 10, the blue diagonal line represents the regression line for the observed vs forecasted values.
Figure 6 | The generic predictive model development approach.

Figure 7 | Sample auto correlation function for the time-series RF.
Figure 8 | Sample cross-correlation function for RF-WR and RF-AE.
Furthermore, Figures 11 and 12 show sample graphs of the forecasting result on test data, where the model is used to perform 52-step-ahead prediction recursively in closed-loop mode. The result indicates the moderate capability of the model with NSE = 0.44 and RMSE = 208.85. The result of NARX1 on test data is slightly better than that of NARX2. However, NARX1 has a disadvantage. It needs the information of immediate past exogenous inputs (x(t)) to make a prediction for y(t + 1). This means this approach is not truly multi-step-ahead as compared with the direct approach (NARX2).
3.2. Multi-step-ahead prediction using univariate and multivariate LSTM

In this section, the result using LSTM is presented. LSTM is a specialized deep learning recurrrent neural net-
work able to learn and preserve long-term dependencies that were proposed in the late 90’s (Hochreiter &

by guest
Table 3 | ANN hyperparameter setup used in NARX
Items Value Remarks
Number of neurons [5, 5] Using two hidden layers

Learning rate 0.01 Using a constant learning rate
Max epoch 1,000 Interaction using early stopping
Solver ‘Adam’ Reference (Kingma & Ba 2014)
Activation function Relu Rectified linear unit
Table 4 | Results for the NARX model evaluated on test data
Step-ahead prediction NARX model NSE RMSE
k¼2 NARX_2-12-0 0.36 224.19

NARX_2-13-0 0.40 217.49
NARX_13-12-0 0.35 226.87
NARX_13-13-0 0.41 215.37
NARX_13-3-10 0.34 228.99
NARX_13-4-10 0.37 224.89
k¼3 NARX_13-13-0 0.31 235.46
k¼4 NARX_13-13-0 0.41 218.23
k¼5 NARX_13-13-0 0.32 233.98
k ¼ 10 NARX_13-13-0 0.35 223.38
Figure 9 | Forecasting result and error graph of NARX1 on training data set.

Figure 10 | Observed vs forecasted value of NARX1 on training data set.
Figure 11 | Forecasting result and error graph of NARX1 on test data set.

Figure 12 | Observed vs forecasted value of NARX1 on test data set.
Table 5 | LSTM hyperparameter setup during training

| Items | Value | Remarks |
| --- | --- | --- |
| Number of neurons | 20 | Using 1 LSTM layer |
| Learning rate | 0.01 | Using a constant learning rate |
| Max epoch | 100 | Iteration using early stopping |
| Batch size | 32 | – |
| Solver | 'Adam' | Reference (Kingma & Ba 2014) |
| Activation functions | ReLU and linear | Rectified linear unit (ReLU) at LSTM and linear at output (dense) layer |
| Dropout | 0.2 | Applied to LSTM and dense layers |
Schmidhuber 1997). As mentioned earlier, the LSTM model used in this experiment is basically a univariate NAR
model, as given in Equation (5). Table 5 shows the LSTM setup used during training. This setup was chosen
after a series of experiments. More systematic ways to choose the hyperparameters exist, such as a
grid search; however, this is beyond the scope of the present research.
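For reference, a grid search simply evaluates every combination of candidate hyperparameter values and keeps the best one. The sketch below shows the mechanics only. The candidate values are hypothetical choices around the Table 5 settings, and `dummy_score` is a placeholder for "train the model and return its validation RMSE"; neither comes from the experiments reported here.

```python
from itertools import product

def grid_search(score_fn, grid):
    """Evaluate every combination in `grid`; return the one with the lowest score."""
    names = list(grid)
    best_params, best_score = None, float("inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)           # e.g. validation RMSE of a trained model
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Hypothetical candidate values (assumption), centred on the Table 5 setup.
grid = {"units": [10, 20, 40], "learning_rate": [0.001, 0.01], "batch_size": [16, 32]}

# Placeholder scorer (assumption): a real run would train and validate a model here.
def dummy_score(params):
    return (abs(params["units"] - 20)
            + abs(params["learning_rate"] - 0.01)
            + abs(params["batch_size"] - 32) / 32)

best_params, best_score = grid_search(dummy_score, grid)
print(best_params, best_score)
```

The cost grows multiplicatively with the number of candidate values (12 training runs for this small grid), which is one practical reason such a search is often deferred.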
As mentioned earlier, we use two LSTM approaches, namely, univariate recursive (LSTM1) and direct
sequence-to-sequence multivariate (LSTM2). While the recursive approach is able to make longer-term predictions,
the direct sequence-to-sequence approach may not be, as it needs
long sequences to train the model. This may not be practical when the dataset is not sufficiently large, as in
our case, so LSTM2 is expected to suffer from a lack of data. The results are discussed below.
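To make the data requirement concrete, the following sketch shows one common way to cut a multivariate series into training pairs for a direct sequence-to-sequence model: an input window of length q followed by a target window of length k. The function name `make_windows`, the placeholder random data and the choice of the first column as the target are assumptions for illustration; the train/test partition used in this study is not reproduced.

```python
import numpy as np

def make_windows(series, q, k):
    """Split a (n_samples, n_features) array into input windows of length q
    and target windows holding the next k values of the first column."""
    series = np.asarray(series, dtype=float)
    n = len(series) - q - k + 1            # number of usable windows
    if n <= 0:
        raise ValueError("series too short for the requested q and k")
    X = np.stack([series[i:i + q] for i in range(n)])
    Y = np.stack([series[i + q:i + q + k, 0] for i in range(n)])
    return X, Y

# 348 monthly samples of three variables; random placeholder values (assumption).
data = np.random.default_rng(0).random((348, 3))
for k in (2, 10):
    X, Y = make_windows(data, q=13, k=k)
    print(k, X.shape, Y.shape)
```

With 348 samples and q = 13, only 334 windows exist for k = 2 and 326 for k = 10 even before any train/test split, and each additional step of input or output length removes one more.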
Figure 13 shows the recursive forecasting result of LSTM1 on the test data, where RMSE = 191.42 is achieved,
which is slightly lower than that achieved with NARX1. Figure 14 shows the observed vs predicted values, for which
NSE = 0.59 is achieved. This result indicates that LSTM is capable of multi-step-ahead prediction when the
univariate (NAR) approach is applied.
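For readers who wish to reproduce such scores, the two metrics can be computed as in the sketch below, which assumes their standard definitions. The sample values are made up for illustration and are not taken from the Kelantan dataset.

```python
import numpy as np

def rmse(observed, predicted):
    """Root mean square error, in the units of the observed variable."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((observed - predicted) ** 2)))

def nse(observed, predicted):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 is no better than
    predicting the observed mean, and negative values are worse than that."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    residual = np.sum((observed - predicted) ** 2)
    variance = np.sum((observed - observed.mean()) ** 2)
    return float(1.0 - residual / variance)

# Made-up flows (assumption), each prediction off by 50 units.
obs = [100.0, 300.0, 500.0, 700.0]
pred = [150.0, 250.0, 550.0, 650.0]
print(rmse(obs, pred), nse(obs, pred))
```

Note that NSE is relative to the variance of the observations in the evaluated window, so scores computed on different portions of the test data are not directly comparable through RMSE alone.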
Next, we show the experimental results with LSTM2. This approach uses the multivariate direct
sequence-to-sequence approach, so we need to specify the step size of the input (q) and the step size of

Figure 13 | Forecasting result and error graph on test data set (LSTM1).
Figure 14 | Observed vs forecasted value on test data set (LSTM1).
the output (for k-step-ahead prediction). Here, we set q = 13 so that the model order is the same as that of
NARX1 and NARX2. The experimental results are shown in Table 6 for different values of k. The results are
evaluated by rolling the k-step-ahead prediction over some portions of the test data. We also give the results of

Table 6 | Experimentation results with different k-step ahead in LSTM2

| Step-ahead prediction | LSTM2 NSE | LSTM2 RMSE | NARX2 NSE | NARX2 RMSE |
| --- | --- | --- | --- | --- |
| k = 2 | 0.78 | 119.44 | 0.35 | 226.87 |
| k = 3 | 0.54 | 168.45 | 0.31 | 235.46 |
| k = 4 | 0.42 | 196.49 | 0.41 | 218.23 |
| k = 5 | 0.39 | 201.36 | 0.32 | 233.98 |
| k = 6 | 0.42 | 188.86 | 0.33 | 231.97 |
| k = 7 | 0.23 | 220.52 | 0.28 | 243.54 |
| k = 8 | 0.13 | 239.36 | 0.31 | 230.07 |
| k = 9 | 0.13 | 231.20 | 0.41 | 216.68 |
| k = 10 | ≈ 0 | 259.75 | 0.35 | 223.38 |
NARX2 in the last two columns for comparison. It can be seen that LSTM2 performs better up to about
seven-step-ahead prediction (lower RMSE up to k = 7, higher NSE up to k = 6). The model performance degrades
as k increases beyond 7. This is most likely because the LSTM2
model is trained only with q = 13; LSTM probably needs a longer input sequence to make a longer
step-ahead prediction. However, this would require more data. It is in the nature of deep learning that it
needs large amounts of data to learn more complex processes. How much training data is needed remains an open
question in research (Wunsch et al. 2021). In addition, the model may also need a different number of neurons and
batch size. Finding these is an optimization task, for example using the grid search method, which is beyond our
scope at the moment. Figure 15 shows a sample time-series graph for two-step-ahead forecasting using
LSTM2 rolling over some portions of the test data.
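The rolling evaluation can be pictured with the sketch below. It assumes non-overlapping blocks of k steps, each forecast from the q observations that precede it; the exact rolling scheme used in the experiments is not specified here, so this is one plausible reading. The predictor `persistence` (repeat the last observed value) is a placeholder for a trained direct model, and all names are hypothetical.

```python
import numpy as np

def rolling_forecast(predict_k, series, q, k):
    """Slide over `series` in non-overlapping blocks of k steps.

    predict_k: callable mapping the previous q observed values to k forecasts;
               stands in for a trained direct (sequence-to-sequence) model.
    Returns the observed and predicted values aligned element by element.
    """
    series = np.asarray(series, dtype=float)
    observed, predicted = [], []
    for start in range(q, len(series) - k + 1, k):
        predicted.extend(predict_k(series[start - q:start]))
        observed.extend(series[start:start + k])
    return np.asarray(observed), np.asarray(predicted)

# Placeholder predictor (assumption): repeat the last observed value k times.
k = 2
persistence = lambda window: [window[-1]] * k

obs_roll, pred_roll = rolling_forecast(persistence, np.arange(10.0), q=3, k=k)
print(obs_roll, pred_roll)
```

Unlike the recursive scheme, every block here starts from observed values, so errors do not carry over from one block to the next.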
On the other hand, NARX2 shows more stable performance, at least up to k = 10, despite its
inferior performance for shorter steps ahead. This suggests that for shorter-step-ahead forecasting, a multivariate
LSTM approach is preferable to NARX. For longer-step-ahead forecasting, the NARX approach is preferable,
especially when the dataset is small, as in our case. However, this suggestion is not final, as
the LSTM model can be further optimized and the predictors' sequence can be lengthened whenever more data are
available. With more data and further fine-tuning of the LSTM hyperparameters, a different
conclusion may be reached, i.e. LSTM may outperform NARX for longer-step-ahead forecasting.
The last part of this section compares this paper with other cited studies on multi-step-ahead
prediction in hydrological systems. The comparison considers model algorithms,
data size and model performance, and is summarized in Table 7. In terms of data
size, our study has far fewer samples than the other studies. This is certainly our major
concern for future improvement. In terms of algorithms, NARX neural networks are commonly used,
while LSTM is still rarely explored for multi-step-ahead forecasting specifically, i.e. not counting studies on
single-step-ahead forecasting. One interesting recent study by Guo et al. (2021) points out that the gradient
boosting machine, a new type of machine learning algorithm, performs favourably for some step-ahead
Figure 15 | Sample of two-step-ahead forecasting using LSTM2 rolling on some portions of test data.

Table 7 | Comparative analysis between this study and other studies in multi-step-ahead prediction

| Research study | Output variable | Methods | Data size | Highlights of results |
| --- | --- | --- | --- | --- |
| Bhagwat & Maity (2012) | River flow; Narmada river (India) | LS-SVR and ANN | 2,556 samples for training | The best NSE = 0.49 for two-step-ahead prediction using LS-SVR; reasonably good up to 5-day-ahead predictions (NSE = 0.3). |
| Chang et al. (2014) | Inundation level of flood; Yu–Cheng Pumping Station (Taipei City) | ANN, Elman networks and NARX neural networks | 1,985 samples for training and testing | NARX networks perform the best, producing coefficients of efficiency within 0.9–0.7 (scenario I) and 0.7–0.5 (scenario II) in the testing stages for 10–60-min-ahead forecasts. |
| Guo et al. (2021) | River stage; Lan-Yang river basin (Lanyan, Simon and Kavalan stations at Taiwan) | Optimized four ML techniques, namely, SVR, RFR, ANN and LGBM | ≈7,500 samples for Simon and Lanyan; ≈4,000 samples for Kavalan | All models demonstrate favourable performance in terms of R² of about 0.72 for 1–6 step-ahead forecasting at all stations. The LGBM model achieves more favourable prediction than SVR, RFR and ANN. |
| Yu et al. (2011) | Water level; Heshui catchment, China | Using eight different types of ANN training algorithms | 4,749 samples for training and testing | BFGS- and LM-trained ANN models gave the best performance among all of the prediction scenarios. Obtained a coefficient of determination of around R² ≈ 0.7 for two-step-ahead forecasting and R² ≈ 0.3 for five-step-ahead forecasting. |
| This paper | River flow; Kelantan river (Malaysia) | NARX neural networks and LSTM | 348 samples for training and testing | LSTM with a direct sequence-to-sequence produces NSE = 0.75 and NSE = 0.39 for two-step-ahead and five-step-ahead forecasting, respectively. |

LS-SVR, least-square-support vector regression; BFGS, Broyden–Fletcher–Goldfarb–Shanno; LM, Levenberg–Marquardt; LGBM, light gradient boosting machine; RFR, random forest regressor.
prediction. This application could be explored in the future. In terms of accuracy, our proposed method for
few-step-ahead forecasting, particularly using LSTM, is generally on a par with or slightly better than some results from
other studies.
4. CONCLUSIONS
Four approaches for multi-step-ahead forecasting of the Kelantan RF in Malaysia using NARX neural networks
and deep learning LSTM have been discussed. These approaches use different strategies to perform multi-step-ahead
forecasting, involving univariate and multivariate methods. The first approach is NARX neural networks
using the recursive approach, which gives acceptable performance with RMSE = 208.85 and NSE = 0.44 on the test
dataset. The second approach is NARX neural networks with direct multi-step-ahead prediction, which produces
its highest NSE = 0.41 in four-step-ahead and nine-step-ahead forecasting in our experiment. The third approach
is LSTM with direct sequence-to-sequence prediction, which performs better for few-step-ahead forecasting, i.e.
NSE = 0.78 for two-step-ahead forecasting. The performance, however, degrades quite significantly for
longer-step-ahead forecasting. The fourth approach is the LSTM univariate recursive method, which performs slightly
better than the first approach. The third approach is promising for few-step-ahead prediction, but it needs more
data to build a model that is able to perform longer-step-ahead forecasting. Future work will involve collecting
more data and investigating the optimization of the hyperparameters of the machine learning/deep learning-based
models, for example using grid search or meta-heuristic optimization, as done earlier in another study
(Hayder et al. 2020). Applications of ensemble and boosting machine learning, such as random forest and gradient
boosting algorithms, can be explored as well. Studies of this kind are important for water resource
management and flood mitigation planning, as the Kelantan River has recorded some flooding events in the recent
past.
ACKNOWLEDGEMENTS
This work is supported by the Ministry of Higher Education of Malaysia under Fundamental Research Grant
Scheme (FRGS) with reference number FRGS/1/2019/TK01/UNITEN/02/5 (20190105FRGS).
DATA AVAILABILITY STATEMENT

All relevant data are included in the paper or its Supplementary Information.
REFERENCES
Abbasimehr, H., Shabani, M. & Yousefi, M. 2020 An optimized model using LSTM network for demand forecasting. Computers
& Industrial Engineering 143, 106435.
Ardalani-Farsa, M. & Zolfaghari, S. 2010 Chaotic time series prediction with residual analysis method using hybrid Elman–
NARX neural networks. Neurocomputing 73 (13–15), 2540–2553.
Atiya, A. F., El-Shoura, S. M., Shaheen, S. I. & El-Sherif, M. S. 1999 A comparison between neural-network forecasting
techniques-case study: river flow forecasting. IEEE Transactions on Neural Networks 10 (2), 402–409.
Ben Taieb, S., Bontempi, G., Atiya, A. F. & Sorjamaa, A. 2012 A review and comparison of strategies for multi-step ahead time
series forecasting based on the NN5 forecasting competition. Expert Systems with Applications 39 (8), 7067–7083.
https://doi.org/10.1016/j.eswa.2012.01.039.
Bhagwat, P. P. & Maity, R. 2012 Multistep-ahead river flow prediction using LS-SVR at daily scale. Journal of Water Resource
and Protection 4 (07), 528.
Chandra, R., Goyal, S. & Gupta, R. 2021 Evaluation of deep learning models for multi-step ahead time series prediction. arXiv
preprint arXiv:2103.14250.
Chang, F. J., Chen, P. A., Lu, Y. R., Huang, E. & Chang, K. Y. 2014 Real-time multi-step-ahead water level forecasting by
recurrent neural networks for urban flood control. Journal of Hydrology 517, 836–846.
Dabrowski, J. J., Zhang, Y. & Rahman, A. 2020 ForecastNet: a time-variant deep feed-forward neural network architecture for
multi-step-ahead time-series forecasting. In: International Conference on Neural Information Processing. Springer, Cham,
pp. 579–591.
Di Nunno, F. & Granata, F. 2020 Groundwater level prediction in Apulia region (Southern Italy) using NARX neural network.
Environmental Research 190, 110062.
Di Nunno, F., Granata, F., Gargano, R. & de Marinis, G. 2021 Prediction of spring flows using nonlinear autoregressive
exogenous (NARX) neural network models. Environmental Monitoring and Assessment 193 (6), 1–17.
https://doi.org/10.1007/s10661-021-09135-6.
Di Piazza, A., Di Piazza, M. C. & Vitale, G. 2016 Solar and wind forecasting by NARX neural networks. Renewable Energy and
Environmental Sustainability 1, 39.
Faisal, N. & Gaffar, A. 2012 Development of Pakistan’s new area weighted rainfall using Thiessen polygon method. Pakistan
Journal of Meteorology 9 (17), 107–116.
Granata, F. & Di Nunno, F. 2021 Forecasting evapotranspiration in different climates using ensembles of recurrent neural
networks. Agricultural Water Management 255, 107040. https://doi.org/10.1016/J.AGWAT.2021.107040.
Guo, Y., Xu, Y. P., Sun, M. & Xie, J. 2021 Multi-step-ahead forecast of reservoir water availability with improved quantum-based
GWO coupled with the AI-based LSSVM model. Journal of Hydrology 597, 125769.
https://doi.org/10.1016/j.jhydrol.2020.125769.
Guo, W. D., Chen, W. B., Yeh, S. H., Chang, C. H. & Chen, H. 2021 Prediction of river stage using multistep-ahead machine
learning techniques for a tidal river of Taiwan. Water (Switzerland) 13 (7). https://doi.org/10.3390/w13070920.
Hayder, G., Solihin, M. I. & Mustafa, H. M. 2020 Modelling of river flow using particle swarm optimized cascade-forward
neural networks: a case study of Kelantan River in Malaysia. Applied Sciences 10 (23), 8670.
https://doi.org/10.3390/app10238670.
Hernandez-Ambato, J., Asqui-Santillan, G., Arellano, A. & Cunalata, C. 2017 Multistep-ahead streamflow and reservoir level
prediction using ANNs for production planning in hydroelectric stations. In: 2017 16th IEEE International Conference on
Machine Learning and Applications (ICMLA). IEEE, pp. 479–484.
Hochreiter, S. & Schmidhuber, J. 1997 Long short-term memory. Neural Computation 9 (8), 1735–1780.
https://doi.org/10.1162/neco.1997.9.8.1735.
Hu, Z., Zhang, Y., Zhao, Y., Xie, M., Zhong, J., Tu, Z. & Liu, J. 2019 A water quality prediction method based on the deep LSTM
network considering correlation in smart mariculture. Sensors 19 (6), 1420.
Jia, H. & Zhou, X. 2020 Water quality prediction method based on LSTM-BP. In: 2020 12th International Conference on
Intelligent Human-Machine Systems and Cybernetics (IHMSC). IEEE, Vol. 1, pp. 27–30.
Kingma, D. P. & Ba, J. 2014 Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.

Kisi, O., Shiri, J., Karimi, S., Shamshirband, S., Motamedi, S., Petković, D. & Hashim, R. 2015 A survey of water level
fluctuation predicting in Urmia Lake using support vector machine with firefly algorithm. Applied Mathematics and
Computation 270, 731–743.
Le, X. H., Ho, H. V., Lee, G. & Jung, S. 2019 Application of long short-term memory (LSTM) neural network for flood
forecasting. Water 11 (7), 1387.
Liu, P., Wang, J., Sangaiah, A. K., Xie, Y. & Yin, X. 2019 Analysis and prediction of water quality using LSTM deep neural
networks in IoT environment. Sustainability 11 (7), 2058.
Marcjasz, G., Uniejewski, B. & Weron, R. 2019 On the importance of the long-term seasonal component in day-ahead
electricity price forecasting with NARX neural networks. International Journal of Forecasting 35 (4), 1520–1532.
Nashwan, M. S., Ismail, T. & Ahmed, K. 2018 Flood susceptibility assessment in Kelantan river basin using copula.
International Journal of Engineering and Technology (UAE) 7 (2), 584–590. https://doi.org/10.14419/ijet.v7i2.8876.
Noor, H. M., Ndzi, D., Yang, G. & Safar, N. Z. M. 2017 Rainfall-based river flow prediction using NARX in Malaysia. In: 2017
IEEE 13th International Colloquium on Signal Processing & its Applications (CSPA). IEEE, pp. 67–72.
Pena, M., Vazquez-Patino, A., Zhina, D., Montenegro, M. & Aviles, A. 2020 Improved rainfall prediction through nonlinear
autoregressive network with exogenous variables: a case study in Andes high mountain region. Advances in Meteorology
2020. https://doi.org/10.1155/2020/1828319.
Pradhan, B. & Youssef, A. M. 2011 A 100-year maximum flood susceptibility mapping using integrated hydrological and
hydrodynamic models: Kelantan River Corridor, Malaysia. Journal of Flood Risk Management 4 (3), 189–202.
https://doi.org/10.1111/j.1753-318X.2011.01103.x.
Sahoo, B. B., Jha, R., Singh, A. & Kumar, D. 2019 Long short-term memory (LSTM) recurrent neural network for low-flow
hydrological time series forecasting. Acta Geophysica 67 (5), 1471–1481.
Sarkar, R., Julai, S., Hossain, S., Chong, W. T. & Rahman, M. 2019 A comparative study of activation functions of NAR and
NARX neural network for long-term wind speed forecasting in Malaysia. Mathematical Problems in Engineering 2019.
https://doi.org/10.1155/2019/6403081.
Saroha, S. & Aggarwal, S. K. 2014 Multi step ahead forecasting of wind power by genetic algorithm based neural networks. In:
2014 6th IEEE Power India International Conference (PIICON). IEEE, pp. 1–6.
Thapa, S., Zhao, Z., Li, B., Lu, L., Fu, D., Shi, X., Tang, B. & Qi, H. 2020 Snowmelt-driven streamflow prediction using machine
learning techniques (LSTM, NARX, GPR, and SVR). Water (Switzerland) 12 (6). https://doi.org/10.3390/w12061734.
Tijani, I. B., Akmeliawati, R., Legowo, A. & Budiyono, A. 2014 Nonlinear identification of a small scale unmanned helicopter
using optimized NARX network with multiobjective differential evolution. Engineering Applications of Artificial
Intelligence 33, 99–115.
Vu, M. T., Jardani, A., Massei, N. & Fournier, M. 2021 Reconstruction of missing groundwater level data by using Long Short-
Term Memory (LSTM) deep neural network. Journal of Hydrology 597, 125776.
https://doi.org/10.1016/j.jhydrol.2020.125776.
Wunsch, A., Liesch, T. & Broda, S. 2018 Forecasting groundwater levels using nonlinear autoregressive networks with
exogenous input (NARX). Journal of Hydrology 567, 743–758.
Wunsch, A., Liesch, T. & Broda, S. 2021 Groundwater level forecasting with artificial neural networks: a comparison of long
short-term memory (LSTM), convolutional neural networks (CNNs), and non-linear autoregressive networks with
exogenous input (NARX). Hydrology and Earth System Sciences 25 (3), 1671–1687.
Yang, S., Yang, D., Chen, J. & Zhao, B. 2019 Real-time reservoir operation using recurrent neural networks and inflow forecast
from a distributed hydrological model. Journal of Hydrology 579, 124229.
https://doi.org/10.1016/J.JHYDROL.2019.124229.
Yu, J., Qin, X. & Larsen, O. 2011 Multistep ahead water level forecasting using different artificial neural network training
algorithms. In: Proceedings of the IASTED International Conference Environmental Management and Engineering (EME
2011), Calgary, AB, Canada. doi:10.2316/P.2011.736-016.
Zaini, N., Malek, M. A., Yusoff, M., Mardi, N. H. & Norhisham, S. 2018 Daily river flow forecasting with hybrid support vector
machine–particle swarm optimization. In: IOP Conference Series: Earth and Environmental Science. IOP Publishing, Vol.
140, No. 1, p. 012035.
Zhang, H. & Jin, K. 2020 Research on water quality prediction method based on AE-LSTM. In: 2020 5th International
Conference on Automation, Control and Robotics Engineering (CACRE). IEEE, pp. 602–606.
Zhang, Q., Li, Z., Snowling, S., Siam, A. & El-Dakhakhni, W. 2019 Predictive models for wastewater flow forecasting based on
time series analysis and artificial neural network. Water Science and Technology 80 (2), 243–253.
Zhou, J., Wang, Y., Xiao, F., Wang, Y. & Sun, L. 2018 Water quality prediction method based on IGRA and LSTM. Water 10 (9),
1148.
First received 10 October 2021; accepted in revised form 8 January 2022. Available online 25 January 2022

Downloaded from http://iwaponline.com/h2open/article-pdf/5/1/43/1031807/h2oj0050043.pdf
