Application of Linear Stochastic Models
2, 2017 197
Abstract: Time series analysis and forecasting has become a major tool in
different applications in hydrology and environmental management fields.
Linear stochastic models known as multiplicative seasonal autoregressive
integrated moving average (SARIMA) models were used to simulate and
forecast monthly streamflow of the Rahad River, Sudan. For the analysis, monthly
streamflow data for the years 1972 to 2009 were used. A visual inspection of
the time plot gives the expected impression of a generally horizontal trend and
12-month seasonal periodicity. The seasonality observed in auto correlation
function (ACF) and partial auto correlation function (PACF) plots of monthly
streamflow data was removed using first order seasonal differencing prior to
the development of the SARIMA model. Interestingly, the SARIMA (2, 0, 0) ×
(0, 1, 1)12 model developed was found to be most suitable for simulating
monthly streamflow for Rahad River. The model was found appropriate to
forecast three years of monthly streamflow and assist decision makers to
establish priorities for water demand.
1 Introduction
The Rahad River, whose catchment lies in the Ethiopian uplands, is entirely seasonal. It
rises to the west of Lake Tana, Ethiopia, and flows westwards across the Sudanese border
joining the Blue Nile below Wad Madani, Sudan. The basin is characterised by highly
rugged topography and considerable variation of altitude ranging from about 410 metres
above sea level (masl) at Wad Madani to over 4,250 (masl) in the Ethiopian highlands
(Melesse, 2011). The flow in the river starts in July; the flood reaches its peak in the last
week of September and dries out by the end of November. Rahad River has been
measured at Abu Haraz, Sudan, near its mouth from 1908 to 1951, with a record at
El Hawata from 1972. The gap in the record between 1951 and 1972 was filled by means
of a statistical model. The average annual flow for the Rahad River is 1.076 km3 (1972 to
2009). The range of annual flows is wide: the maximum recorded in the early years was
1.96 km3 in 1909, compared with a low of 0.53 km3 in 1941. That low was
undercut in 1984 by a flow of 0.29 km3 (Sutcliffe et al., 1999).
The Rahad agricultural project, located in a semi-arid region, lies along the east bank of
the Rahad River about 160 km southeast of Khartoum in the central part of Sudan.
ELFau town, the headquarters of the project, is about 280 km from Khartoum
along the Khartoum – Port Sudan highway. The project area of the scheme is about 25 km
wide and 160 km long. It is situated in a vast clay plain at an elevation of
400 to 430 metres above sea level (Benedict et al., 1982). The annual rainfall ranges from
350 mm in the northern part of the project to about 600 mm in the south. The rainy
season lasts about five months, from June to October, with rainfall peaking in
August. Temperatures are highest in April and May, and lowest in January.
The water supply resources for the Rahad project are the Blue Nile River and the Rahad
seasonal river. During a normal year, the Rahad could supply the full requirements of the
project during August and September, but not during the peak month of October
(Document of International Bank for Reconstruction and Development – International
Development Association, 1973). Therefore, the monthly flow forecasting for
Rahad River plays an important role in the planning and management of Rahad
agricultural scheme.
During the last decades, several studies have developed methods of analysing
stochastic characteristics of streamflow time series (Yurekli et al., 2005; Modarres, 2007;
Can and Selim, 2009). The most widely used model is the ARIMA model. For instance,
Can and Selim (2009) fitted an ARIMA (0, 1, 1) model to mean monthly streamflows at
Asagıkagdaric gauging station on Karasu River, Turkey. Yurekli et al. (2005) examined
monthly streamflow data in Cekerek stream watershed, Turkey, and fitted a SARIMA
(1, 0, 0) × (0, 1, 1)12 to it. Abudu et al. (2010) observed that ARIMA and SARIMA can
be used in the one-month-ahead streamflow forecasting of Kizil River, China. The work
of Papamichail and Georgiou (2001) was also a demonstration of the ability of SARIMA
models to forecast monthly inflows of the Almopeos River in Northern Greece with
19-year long monthly inflow series. Bazrafshan et al. (2015) found that the application of
SARIMA modelling was suitable for the forecasting of hydrological drought in the
Karkheh Basin. Karavitis et al. (2015) likewise applied SARIMA modelling to the
short-term forecasting of hydrological data, and Soltani et al. (2007) applied
multiplicative SARIMA modelling to Iranian rainfall data, to mention just a few.
The objective of this study is to model the monthly flow of the Rahad River, Sudan,
using linear stochastic models known as multiplicative seasonal autoregressive
integrated moving average (SARIMA) models.
2.1 Data
In this study, streamflow data for the Rahad River at El Hawata gauging station were
obtained from the Ministry of Water Resources and Electricity, covering the period
1972–2009. The record spans 38 years, that is, 456 monthly observations.
where
φ(B) and θ(B) are polynomials of order p and q, respectively:
$$\phi(B) = 1 - \phi_1 B - \phi_2 B^2 - \cdots - \phi_p B^p \tag{2}$$
and
$$\theta(B) = 1 - \theta_1 B - \theta_2 B^2 - \cdots - \theta_q B^q \tag{3}$$
200 T.M. Mohamed and E.H. Etuk
Often time series possess a seasonal component that repeats every s observations. For
monthly observations s = 12 (12 in 1 year), for quarterly observations s = 4 (4 in 1 year).
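Box et al. (1994) generalised the ARIMA model to deal with seasonality and defined
a general multiplicative seasonal ARIMA model, commonly known as the
SARIMA model. In short notation, the SARIMA model is written as ARIMA (p, d, q) ×
(P, D, Q)s and is given by:
$$\phi_p(B)\,\Phi_P(B^s)\,\nabla^d\,\nabla_s^D X_t = \theta_q(B)\,\Theta_Q(B^s)\,\varepsilon_t \tag{4}$$
Here B is the backshift operator, $\nabla^d = (1 - B)^d$ and $\nabla_s^D = (1 - B^s)^D$ are the
non-seasonal and seasonal differencing operators, and $\Phi_P$ and $\Theta_Q$ are the seasonal
autoregressive and moving average polynomials of orders P and Q.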
$$\mathrm{TIC} = \frac{\sqrt{\frac{1}{n}\sum_{i=1}^{n}(Y_i - F_i)^2}}{\sqrt{\frac{1}{n}\sum_{i=1}^{n}Y_i^2} + \sqrt{\frac{1}{n}\sum_{i=1}^{n}F_i^2}} \tag{7}$$
4 Coefficient of determination:
$$R^2 = \left[\frac{\sum_{i=1}^{n}(Y_i - \bar{Y})(F_i - \bar{F})}{\sqrt{\sum_{i=1}^{n}(Y_i - \bar{Y})^2 \, \sum_{i=1}^{n}(F_i - \bar{F})^2}}\right]^2 \tag{8}$$
5 Coefficient of efficiency:
$$E = 1 - \frac{\sum_{i=1}^{n}(Y_i - F_i)^2}{\sum_{i=1}^{n}(Y_i - \bar{Y})^2} \tag{9}$$
where $Y_i$ are the n observed flows, $F_i$ are the n modelled flows, $\bar{Y}$ is the mean of the
observed flows and $\bar{F}$ is the mean of the modelled flows.
The coefficient of efficiency, E, introduced by Nash and Sutcliffe (1970), is still one of the
most widely used criteria for the assessment of model performance. A model efficiency of
90% and above indicates very satisfactory performance. A value in the range of 80% to
90% indicates fairly good performance. A value below 80% is considered unsatisfactory
(Shamseldin et al., 1997).
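The three criteria above are straightforward to compute from paired observed and modelled flows. The following is a minimal sketch, assuming the usual Theil form for the inequality coefficient in equation (7); the function names and the four-value sample series are illustrative and are not taken from the paper.

```python
import math


def theil_inequality_coefficient(observed, modelled):
    """TIC: root mean square error divided by the sum of the two root mean squares."""
    n = len(observed)
    rmse = math.sqrt(sum((y - f) ** 2 for y, f in zip(observed, modelled)) / n)
    rms_obs = math.sqrt(sum(y ** 2 for y in observed) / n)
    rms_mod = math.sqrt(sum(f ** 2 for f in modelled) / n)
    return rmse / (rms_obs + rms_mod)


def coefficient_of_determination(observed, modelled):
    """R^2: squared correlation between observed and modelled flows."""
    n = len(observed)
    mean_obs = sum(observed) / n
    mean_mod = sum(modelled) / n
    cross = sum((y - mean_obs) * (f - mean_mod) for y, f in zip(observed, modelled))
    ss_obs = sum((y - mean_obs) ** 2 for y in observed)
    ss_mod = sum((f - mean_mod) ** 2 for f in modelled)
    return cross ** 2 / (ss_obs * ss_mod)


def coefficient_of_efficiency(observed, modelled):
    """Nash-Sutcliffe E: one minus error sum of squares over observed variance sum."""
    mean_obs = sum(observed) / len(observed)
    ss_err = sum((y - f) ** 2 for y, f in zip(observed, modelled))
    ss_obs = sum((y - mean_obs) ** 2 for y in observed)
    return 1.0 - ss_err / ss_obs


# Illustrative values only (not data from the study).
observed = [0.0, 100.0, 300.0, 200.0]
modelled = [10.0, 90.0, 310.0, 190.0]

print("TIC:", theil_inequality_coefficient(observed, modelled))
print("R2 :", coefficient_of_determination(observed, modelled))
print("E  :", coefficient_of_efficiency(observed, modelled))
```

A perfect simulation gives TIC = 0 and R² = E = 1, so values of E can be read directly against the 80% and 90% thresholds quoted above.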
The time series model development consists of three stages: identification, estimation and
diagnostic check (Box et al., 1994). In the identification stage, data transformation is
often needed to make the time series stationary. During the estimation stage, the model
parameters are calculated. Finally, diagnostic test of the model is performed to reveal
possible model inadequacies to assist in the best model selection.
Station    Variable      ADF test   Level of confidence  Critical value  Probability  Result
El Hawata  Monthly flow  –1.04065   1%                   –2.57019        0.2687       Non-stationary
                                    5%                   –1.94154
                                    10%                  –1.61621
From the plots of the autocorrelation function (ACF) and partial autocorrelation function
(PACF) of the monthly data (Figure 2), the data were found to be seasonal with a
period of 12 months and must therefore be differenced once seasonally to achieve
stationarity (D = 1, s = 12). Non-seasonal differencing was not applied because the
dataset shows no trend. Figure 3 confirms that the ACF and PACF plots for the
seasonally differenced data were stable, so a SARIMA (p, 0, q) × (P, 1, Q)12 model
could be identified for further analysis.
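First-order seasonal differencing with s = 12 replaces each monthly value by its change from the same month one year earlier, X_t − X_{t−12}. A minimal sketch follows, using the 1972 to 1974 monthly flows listed in the paper's data table; the function name `seasonal_difference` is illustrative.

```python
def seasonal_difference(series, period=12):
    """Return X_t - X_(t-period) for every t that has a value one period earlier."""
    return [series[t] - series[t - period] for t in range(period, len(series))]


# Monthly flows, January to December, for 1972, 1973 and 1974 (from the data table).
flows = [
    0, 0, 0, 0, 0, 0, 67.54746, 263.0977, 269.9125, 76.388, 0, 0,
    0, 0, 0, 0, 0, 0, 76.2796, 232.2014, 321.1126, 149.6875, 0, 0,
    0, 0, 0, 0, 0, 0, 286.5569, 390.352, 443.3076, 289.7372, 12.96796, 0,
]

differenced = seasonal_difference(flows, period=12)

# The first 12 values are lost, so 36 observations yield 24 differences.
print(len(differenced))
# Change in July flow from 1972 to 1973.
print(differenced[6])
```

The dry-season months difference to zero, which is why the differenced series no longer carries the 12-month cycle seen in the raw data.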
Once the time series was adjusted for stationarity, the order of ARMA was estimated
using the autocorrelation and partial autocorrelation function plots, Figure 3. The
autocorrelation structure suggests many multiplicative SARIMA models.
The candidate models and their Akaike information criterion (AIC) and Schwarz
criterion (SC) values are shown in Table 2. The model that gives the minimum AIC and
SC is selected as the best-fit model. SARIMA (2, 0, 0) × (0, 1, 1)12 has the
smallest values of AIC and SC and was therefore tentatively adopted.
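For reference, the selected model can be written out by setting p = 2, d = 0, q = 0, P = 0, D = 1, Q = 1 and s = 12 in the general multiplicative form. This is a worked expansion rather than a result from the paper: it assumes the seasonal moving average polynomial follows the same sign convention as equation (3), and the symbols φ1, φ2 and Θ1 stand for the coefficients to be estimated.

```latex
% SARIMA (2, 0, 0) x (0, 1, 1)_{12} in operator form
(1 - \phi_1 B - \phi_2 B^2)(1 - B^{12}) X_t = (1 - \Theta_1 B^{12}) \varepsilon_t

% Expanded as a difference equation
X_t = X_{t-12} + \phi_1 (X_{t-1} - X_{t-13}) + \phi_2 (X_{t-2} - X_{t-14})
      + \varepsilon_t - \Theta_1 \varepsilon_{t-12}
```

In words, each monthly flow is modelled as the flow of the same month in the previous year, corrected by the year-on-year changes observed one and two months earlier and by the shock recorded twelve months ago.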
Figure 2 ACF and PACF plots for Rahad River monthly flow (see online version for colours)
Figure 3 ACF and PACF plots after one seasonal difference (see online version for colours)
requirements of a white noise process. Several tests were carried out on the residual
series. The tests are summarised briefly in the following paragraphs.
Table 3 Estimation of the SARIMA (2, 0, 0) × (0, 1, 1)12 model
The ACF and PACF of the residuals of the SARIMA (2, 0, 0) × (0, 1, 1)12 model are shown
in Figure 4. Most of the values of the RACF and RPACF lie within the confidence limits,
except for a very few individual correlations that appear large compared with the limits.
The figure indicates no significant correlation between the residuals.
The goodness-of-fit of the selected model was tested using the Ljung-Box statistic,
which checks the independence of the residuals. From Figure 4, the probability values
of the statistic for the residual autocorrelations up to lag 24 were all ≥ 0.05.
The null hypothesis of model adequacy is therefore not rejected at the 5%
significance level, and the residual autocorrelations can be considered white noise.
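The Ljung-Box statistic pools the squared residual autocorrelations up to a chosen lag m as Q = n(n + 2) Σ r_k² / (n − k), k = 1, ..., m. The sketch below computes Q only; the function names and the alternating sample residuals are illustrative, and the p-value step (a chi-square tail probability) is left out because it needs a statistics library.

```python
def autocorrelation(residuals, lag):
    """Sample autocorrelation of the residual series at the given lag."""
    n = len(residuals)
    mean = sum(residuals) / n
    denominator = sum((e - mean) ** 2 for e in residuals)
    numerator = sum(
        (residuals[t] - mean) * (residuals[t - lag] - mean) for t in range(lag, n)
    )
    return numerator / denominator


def ljung_box_q(residuals, max_lag):
    """Ljung-Box Q statistic over lags 1..max_lag."""
    n = len(residuals)
    weighted = sum(
        autocorrelation(residuals, k) ** 2 / (n - k) for k in range(1, max_lag + 1)
    )
    return n * (n + 2) * weighted


# Illustrative residuals that alternate in sign, i.e. strongly autocorrelated.
alternating = [1.0, -1.0] * 10
print(ljung_box_q(alternating, max_lag=1))
```

Under the null hypothesis Q is approximately chi-square distributed with m minus the number of estimated ARMA parameters as degrees of freedom. With 24 lags and three parameters that is 21 degrees of freedom, for which the 5% critical value is about 32.67; a Q below that value corresponds to a probability above 0.05.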
Figure 4 ACF and PACF plots for SARIMA (2, 0, 0) × (0, 1, 1)12 residuals (see online version
for colours)
The graph showing the observed and fitted values is presented in Figure 5. The figure
shows a very close agreement between the fitted model and the actual data. Since the
model diagnostic tests show that all the parameter estimates are significant and the
residual series is white noise, the estimation and diagnostic checking stages of the
modelling process are complete.
Figure 5 Comparison of observed data and SARIMA model flow (1972–2006) (see online
version for colours)
The SARIMA model can also be used for forecasting future values based on the historical
data. The SARIMA (2, 0, 0) × (0, 1, 1)12 model was tested for its validity by forecasting
the 36 observations for the years 2007 to 2009 for the Rahad River. The observed
streamflow was found to be closely aligned with the forecasted values (Figure 6).
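A multi-step forecast from a SARIMA (2, 0, 0) × (0, 1, 1)12 model can be produced by applying its difference equation recursively, with future shocks set to their expected value of zero. The sketch below illustrates that recursion under stated assumptions: the function name, the coefficient values (0.5, 0.1, 0.8) and the all-zero residual history are placeholders, not the estimates or residuals obtained in this study, and the seasonal moving average term uses the sign convention 1 − Θ1 B^12.

```python
def sarima_forecast(history, residuals, phi1, phi2, theta_seasonal, steps):
    """Recursive forecast for SARIMA (2,0,0) x (0,1,1) with a 12-month season.

    history   -- observed series, at least 14 values
    residuals -- in-sample one-step residuals, same length as history
    """
    x = list(history)
    e = list(residuals)
    for _ in range(steps):
        t = len(x)
        value = (
            x[t - 12]
            + phi1 * (x[t - 1] - x[t - 13])
            + phi2 * (x[t - 2] - x[t - 14])
            - theta_seasonal * e[t - 12]
        )
        x.append(value)
        e.append(0.0)  # future shocks are replaced by their expectation, zero
    return x[len(history):]


# One year of monthly flows (1973 row of the data table), repeated twice so
# that the series is perfectly periodic.
year = [0, 0, 0, 0, 0, 0, 76.2796, 232.2014, 321.1126, 149.6875, 0, 0]
history = year * 2
residuals = [0.0] * len(history)  # placeholder residuals

# Placeholder coefficients, chosen only to exercise the recursion.
forecast = sarima_forecast(history, residuals, 0.5, 0.1, 0.8, steps=12)
print(forecast)
```

For a perfectly periodic history with zero residuals the year-on-year changes vanish, so the forecast simply repeats the previous year whatever the coefficients are. The recursion does not constrain forecasts to be non-negative, so a practical application to an ephemeral river would likely need to truncate negative values at zero.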
Table 6 The original data
Year  January  February  March  April  May  June     July      August    September  October   November  December
1972  0        0         0      0      0    0        67.54746  263.0977  269.9125   76.388    0         0
1973  0        0         0      0      0    0        76.2796   232.2014  321.1126   149.6875  0         0
1974  0        0         0      0      0    0        286.5569  390.352   443.3076   289.7372  12.96796  0
Table 6 The original data (continued)
Year  January  February  March  April  May  June     July      August    September  October   November  December
1992  0        0         0      0      0    0        166.1332  408.5929  449.9915   283.7271  44.40112  2.344
1993  0        0         0      0      0    0        90.7908   361.5201  384.7963   221.7296  22.55335  0
1994  0        0         0      0      0    40.9293  262.935   431.7561  479.954    160.7236  43.39751  13.23
1995  0        0         0      0      0    0        162.8153  385.5757  383.6273   102.3799  2.277783  0
1996  0        0         0      0      0    38.7007  77.23345  296.566   331.6576   137.514   11.98973  1.229
1997  0        0         0      0      0    0        216.7059  352.115   330.0443   158.9165  76.26846  2.105
5 Conclusions
In this paper, a linear stochastic model known as the multiplicative SARIMA model was used
to simulate and forecast monthly streamflow for the Rahad River, Sudan. The tentative
model that best fits the selection criteria and meets the diagnostic requirements is
SARIMA (2, 0, 0) × (0, 1, 1)12. Analysis of the forecasted values shows that the SARIMA
model forecasts monthly streamflow well. Fitting stochastic ARIMA
models to streamflow time series could therefore provide a useful tool for water
resource planning. The SARIMA approach may also be able to predict future
monthly streamflow at other streamflow gauging stations in Sudan. Further research is,
however, necessary to find out whether better models could be found. For instance,
Hadizadeh et al. (2013) have shown that, for the streamflow data of the Polkohne and
Heydarabad hydrometric stations of the Gamasiab River at Kermanshah, daily series exhibit
long-memory tendencies, calling for the application of seasonal autoregressive fractionally
integrated moving average (SARFIMA) modelling. This tendency was weaker for
monthly series.
References
Abudu, S., Cui, C., King, J.P. and Abudukadeen, K. (2010) ‘Comparison of performance of
statistical models in forecasting monthly streamflow of Kizil River, China’, Water Science and
Engineering, Vol. 3, No. 3, pp.269–281.
Bazrafshan, O., Salajegheh, A., Bazrafshan, J., Mahdavi, M. and Marj, A.F. (2015) ‘Hydrological
drought forecasting using ARIMA models (a case study: Karkheh Basin)’, Ecopersia, Vol. 3,
No. 3, pp.1099–1117.
Benedict, P. et al. (1982) Sudan: The Rahad Irrigation Project, US Agency for International
Development (AID) [online] http://www.pdf.usaid.gov/pdf_docs/PNAAJ610.pdf.
Box, G.E.P., Jenkins, G.M. and Reinsel, G.C. (1994) Time Series Analysis: Forecasting and
Control, 3rd ed., Prentice Hall, Englewood Cliffs, NJ.
Can, I. and Selim, S. (2009) ‘Stochastic modeling of mean monthly flows of Karasu River, in
Turkey’, Water and Environment Journal, Vol. 25, No. 1, pp.31–39,
DOI: 10.1111/j.1747-6593.2009.00186.x.
Chen, H. and Rao, A.R. (2002) ‘Testing hydrologic time series for stationarity’, Journal of
Hydrologic Engineering, Vol. 7, No. 2, pp.129–136.
Document of International Bank for Reconstruction and Development – International Development
Association (1973) Appraisal of the Rahad Irrigation Project, Agriculture Projects
Department Eastern Africa Regional Office, Sudan.
Hadizadeh, R., Eslamian, S. and Chinipardaz, R. (2013) ‘Investigation of long-memory properties
in streamflow time series in Gamasiab River, Iran’, International Journal of Hydrology
Science and Technology, Vol. 3, No. 4, pp.319–350.
Karavitis, C.A., Vasilakou, C.G., Tsesmelis, D.E., Oikonomou, P.D., Skondras, N.A.,
Stamatakos, D., Fassouli, V. and Alexandris, S. (2015) ‘Short-term drought forecasting
combining stochastic and geo-statistical approaches’, European Water, Vol. 49, pp.43–63.
Melesse, A.M. (2011) Nile River Basin: Hydrology, Climate and Water Use, Springer, Dordrecht,
Heidelberg, London.
Modarres, R. (2007) ‘Streamflow drought time series forecasting’, Stoch. Environ. Res. Risk
Assess., Vol. 21, No. 3, pp.223–233.
Nash, J.E. and Sutcliffe, J.V. (1970) ‘River flow forecasting through conceptual models:
1. A discussion of principles’, Journal of Hydrology, Vol. 10, No. 3, pp.282–290.
Papamichail, D.M. and Georgiou, P.E. (2001) ‘Seasonal ARIMA inflow models for reservoir
sizing’, Journal of the American Water Resources Association, Vol. 37, No. 4, pp.877–885.
Shamseldin, A.Y., O’Connor, K.M. and Liang, G.C. (1997) ‘Methods for combining the output of
different rainfall-runoff models’, Journal of Hydrology, Vol. 197, Nos. 1–4, pp.203–229.
Soltani, S., Modaress, R. and Eslamian, S.S. (2007) ‘The use of time series modelling for the
determination of rainfall climates of Iran’, International Journal of Climatology, Vol. 27,
No. 6, pp.819–829.
Sutcliffe, J.V. et al. (1999) The Hydrology of the Nile, IAHS Special Publication No. 5,
IAHS Press, Institute of Hydrology, Wallingford, Oxfordshire OX10 8BB, UK.
Yurekli, K., Kurunc, K. and Ozturk, F. (2005) ‘Application of linear stochastic models to monthly
flow data of Kelkit stream’, Ecological Modeling, Vol. 183, No. 1, pp.67–75.