Mode Decomposition Based Large Margin Distribution Machines For Sediment Load Prediction
ARTICLE INFO

Keywords:
Sediment load prediction
Mode decomposition
Large margin distribution machine
Regression
Noise

ABSTRACT

Precise suspended sediment load (SSL) prediction is essential for irrigation, hydropower and river management practices. However, due to external factors such as high altitude, heavy monsoon and tropical climate conditions, the collected data may contain noisy samples, which makes it challenging to predict the SSL in rivers accurately. To diminish the influence of noise, empirical mode decomposition (EMD)-based techniques can be adopted. Moreover, the large margin distribution machine-based regression (LDMR) can deal efficiently with noisy datasets as it simultaneously minimizes the insensitive loss as well as the quadratic loss. The significance of this work lies in its novel integration of EMD-based techniques and LDMR-based models, which address the challenges posed by noisy and non-stationary sediment load data, with practical implications for applications such as irrigation, hydropower and river management. The main advantage of the least squares LDMR (LSLDMR) approach is that it solves a system of linear equations rather than a large quadratic programming problem (QPP), unlike SVR, TSVR and LDMR, which makes it computationally efficient. It is well known that sediment load data are complex and non-stationary in nature. Therefore, for daily SSL prediction, the LDMR and LSLDMR models are embedded with two different decomposition techniques, EMD and ensemble EMD (EEMD), to handle the nonlinear and non-stationary characteristics of sediment load data and to increase prediction ability. The results of the proposed EMD-LDMR, EEMD-LDMR, EMD-LSLDMR and EEMD-LSLDMR are compared with the conventional support vector regression (SVR), twin SVR (TSVR), LDMR and LSLDMR. The performance is evaluated using the mean absolute error (MAE), root mean square error (RMSE), symmetric mean absolute percentage error (SMAPE), Willmott's index (WI), correlation coefficient (CC) and R2. Better or comparable results indicate the efficiency of the suggested models. For graphical visualization, scatterplots, prediction error plots and violin plots are shown. Numerical results show that these hybrid models deliver excellent prediction performance, and the overall analysis of the results suggests using EEMD-LSLDMR for SSL estimation.
1. Introduction

Estimating the river suspended sediment load (SSL) is one of the major problems in river engineering practices as well as hydrology. Suspended sediment in rivers can be defined as sediment that is transported by the fluid and is fine enough that turbulent whirlpools can outweigh the subsiding of the sediment particles within the river, triggering them to become suspended. Sediment deposition in rivers is an eminent problem that has a negative effect on water quality and causes pollution in rivers. Usually, the SSL data forms a nonlinear and complex structure due to different external factors such as rainfall, streamflow and temperature (Moeeni & Bonakdari, 2018). The importance of accurate estimation of SSL is increasing day by day, especially in flood-affected zones. Therefore, many researchers have been working on predicting SSL accurately using different techniques. In the past decades, researchers used a few mathematical models for SSL estimation.
* Corresponding author.
E-mail address: deepakg@mnnit.ac.in (D. Gupta).
https://doi.org/10.1016/j.eswa.2023.120844
Received 18 May 2022; Received in revised form 9 June 2023; Accepted 12 June 2023
Available online 21 June 2023
0957-4174/© 2023 Elsevier Ltd. All rights reserved.
B.B. Hazarika and D. Gupta Expert Systems With Applications 232 (2023) 120844
However, these models had a long response time and are complex. Because of this, a few classic models like the multiple linear regression (MLR) and sediment rating curve (SRC) were applied for SSL prediction (Kisi, 2005). An SRC is a mathematical formula that relates the water discharge of a river to the sediment load that is being transported. By measuring the water discharge at a particular point in the river and using the rating curve, scientists can estimate the sediment load being transported at that point. However, these models had limited capability to deal with the non-linearity and non-stationarity in SSL datasets. Therefore, researchers shifted their attention towards artificial intelligence (AI) and machine learning (ML) based models for SSL prediction. A detailed discussion can be found in Gupta et al. (2021). Among the various AI/ML-based models, the artificial neural network (ANN) (Hazarika et al., 2020a; Rezaei et al., 2021), support vector machine (SVM) and its variants (Essam et al., 2022; Doroudi et al., 2021), wavelet analysis (Hazarika et al., 2020b; Shiri et al., 2022), neuro-fuzzy (Mohanta et al., 2021; Hamaamin et al., 2019), adaptive neuro-fuzzy inference system (ANFIS) (Babanezhad et al., 2021; Darabi et al., 2021) and extreme learning machine (ELM) (Hazarika et al., 2020b; Gupta et al., 2020) based regressors have been widely explored for SSL prediction. Banadkooki et al. (2020) predicted the river SSL in the Gooranrood Basin, Iran using an ANN and the ant lion optimization algorithm. Salih et al. (2020) predicted the river SSL for the Delaware River, USA using a few novel machine learning models. Nhu et al. (2020) suggested a new model called random space for monthly SSL estimation. Hazarika et al. (2020b) predicted the SSL for the Tawang Chu River, India using two wavelet-based hybrid models. Ehteram et al. (2021) suggested a hybrid multi-objective whale algorithm-based ANN model for SSL prediction in the Gooranrood Basin, Iran. Meshram et al. (2021) suggested novel iterative classifier optimizer-based random forest hybrid models for SSL estimation in the Seonath river basin located in India. Panahi et al. (2021) hybridized the black widow optimizer with soft computing models for SSL forecasting; the dataset was collected from the Telar river, Iran. Rezaei et al. (2021) predicted the SSL using a few artificial intelligence-based methods. Hazarika et al. (2021) suggested two coiflet wavelet-based hybrid models for river SSL prediction in the Tawang Chu river, India. Very recently, Esmaeili-Gisavandani et al. (2022) applied three different types of discrete wavelet transform for predicting the daily SSL in the Navrood watershed, Iran. Essam et al. (2022) estimated the SSL in Peninsular Malaysia using the powerful SVM model and deep learning (DL) models. Latif et al. (2023) explored the ability of a few DL as well as ML models for sediment load estimation. Cheng et al. (2023) tested how vegetation and climate affect monthly sediment load using partial least squares-structural equation modelling. An extensive survey of AI/ML models for river SSL prediction can be found in Gupta et al. (2021) and Tao et al. (2021).

There are numerous factors that may affect sediment load, including the size and shape of the river channel, the slope of the river bed, the volume and velocity of water flow, and the type and amount of sediment present in the river. A monthly streamflow dataset is thus made up of trend, episodic, and noise components (Zhang et al., 2015). In some previous studies, streamflow series were predicted directly without data pre-processing (Bittelli et al., 2010), which may have resulted in the loss of significant information present in the original time series (T-S). According to Chou and Wang (2004), it is hard to imitate the alteration mechanisms of streamflow data using a prediction model with only a single mixed-frequency component. To improve prediction accuracy, monthly streamflow time series must be preprocessed. For that purpose, Wiener (1949) created the Fourier analysis with the associated spectrum analysis for decomposing stationary T-S. However, the major flaw of this method is that it analyses the time-series data in the frequency domain, which may lead to the loss of a few major pieces of information from the original data. Therefore, the concept of the wavelet was suggested by Morlet et al. (1982), which analyses both low-frequency content and noise. However, the empirical mode decomposition (EMD), which was introduced by Huang et al. (1998), unlike wavelets does not need prearranged basis functions. It converts the data into a series of intrinsic mode functions (IMFs). Hence, it can be considered an ideal method compared to the wavelet for analyzing non-stationary as well as non-linear information. Very recent studies on EMD can be found in Yang and Chen (2019), Gupta and Singh (2023), Xie et al. (2023), Yang et al. (2023) and others. However, the conventional EMD has an issue of mode-mixing (MM). Hence, to overcome this, an upgraded version of EMD called ensemble EMD (EEMD) was suggested by Wu and Huang (2009). EEMD highly reduces the mode-mixing problem as it is a multiple-trial process. EEMD has also been successfully applied with several ML-based models for T-S prediction (Yang & Yang, 2020; Ali et al., 2020; Chen et al., 2021). Ren et al. (2014) hybridized the EMD and its variants with SVR for T-S forecasting. Yaslan and Bican (2017) hybridized the EMD with support vector regression (SVR) for electrical load forecasting. Fan et al. (2013) hybridized EMD with SVR for electric load prediction. Khan et al. (2021) used the ELM embedded with EEMD for electricity price forecasting. Díez-García et al. (2022) explored the potential of EMD to mitigate radio frequency interference in microwave radiometry. Recently, a novel decomposition technique popularly known as feature mode decomposition (FMD) was suggested by Yonghao, Zhang, Li, Lin, & Zhang (2022).

Recently, a novel regressor termed large margin distribution machine-based regression (LDMR) was proposed by Rastogi et al. (2020). It is well known that SVR minimizes the ε-insensitive loss and ignores the samples that fall inside the ε-insensitive tube. Moreover, the least squares SVR (LSSVR) minimizes the quadratic loss (QL) but fails to show good generalization performance on noisy datasets. The LDMR, however, concurrently minimizes the ε-insensitive loss and the QL function. It takes advantage of both SVR and LSSVR by using the full information of the training samples while avoiding overfitting at the same time. However, it is computationally less efficient as it has to solve quadratic programming problems (QPPs) to find the optimum results. To address this issue, Gupta and Gupta (2021) suggested a computationally efficient least squares LDMR (LSLDMR) model. LSLDMR solves a system of linear equations rather than solving the QPP, which makes it more time efficient than LDMR. These LDMR-based models exploit both the margin mean and the margin variance, and therefore they can efficiently handle noisy datasets. Since SSL datasets may contain noise, these LDMR-based models can show excellent generalization performance for SSL data. Reisenbüchler et al. (2021) developed an ANN model for river sediment management. Sharafati et al. (2020) predicted the river SSL using a few ensemble-based ML models. AlDahoul et al. (2022) explored a few ML models for sediment load prediction. Karami et al. (2022) suggested a new approach using ANFIS and ant colony optimization for predicting SSL in reservoir dams. Zhao et al. (2021) predicted the SSL using a decomposition and multi-objective evolutionary optimization technique.

The previous literature shows that the EMD and EEMD-based pre-processed models can be highly efficient for SSL estimation. Moreover, the LDMR can efficiently handle feature noise in a dataset. Additionally, the recently proposed LSLDMR can show high prediction performance with low computational cost compared to LDMR. Moreover, to the best of our knowledge, LSLDMR has never been tested for time-series analysis. The nature of sediment load data is complex and non-stationary. Therefore, to address the nonlinear and non-stationary characteristics of sediment load data on the prediction results and to get the maximum benefit from these decomposition techniques and LDMR-based models, the LDMR and least squares LDMR (LSLDMR) models are embedded with two alternative decomposition approaches, EMD and ensemble EMD (EEMD). We have proposed 4 different hybrid models for SSL prediction. They are: EMD-LDMR, EEMD-LDMR, EMD-LSLDMR and EEMD-LSLDMR.

The originality and importance of this work lie in its novel integration of empirical mode decomposition (EMD)-based techniques and LDMR models to address the challenges posed by noisy and non-stationary suspended sediment load (SSL) data in river systems. The
significance of accurate SSL prediction is emphasized, as it is crucial for various applications such as irrigation, hydropower, and river management. Furthermore, we employ large margin distribution machine-based regression (LDMR) models to efficiently deal with noisy datasets. LDMR simultaneously minimizes the insensitive loss and quadratic loss, making it robust to noise. Additionally, the least squares LDMR (LSLDMR) approach is highlighted for its computational efficiency, as it solves a system of linear equations rather than solving a large quadratic programming problem (QPP) like support vector regression (SVR), twin SVR (TSVR) and LDMR. The prediction results of the proposed EMD-LDMR, EEMD-LDMR, EMD-LSLDMR and EEMD-LSLDMR models are compared with conventional regression models, including SVR, TSVR, LDMR and LSLDMR. Evaluation metrics such as mean absolute error (MAE), root mean square error (RMSE), symmetric mean absolute percentage error (SMAPE), Willmott's index (WI), correlation coefficient (CC) and R2 are used to assess the prediction accuracy. The MAE is important as it provides a simple and robust measure of prediction accuracy in regression models. RMSE measures the average magnitude of errors while considering the squared differences between predicted and actual values, making it suitable for penalizing larger errors. SMAPE provides a balanced and interpretable measure of prediction accuracy, accounting for both the magnitude and percentage of errors in a symmetric manner. WI quantifies the agreement between observed and predicted values, providing a comprehensive assessment of model performance that considers both bias and variability. The CC measures the strength and direction of the linear relationship between two variables, allowing for the assessment of their association. R2 quantifies the proportion of variance in the dependent variable that is explained by the independent variables, indicating the goodness of fit of the regression model. Graphical visualization through scatterplots, prediction error plots and violin plots is also provided. Overall, the originality of this work lies in the integration of EMD-based techniques and LDMR models to address the challenges posed by noisy and non-stationary SSL data. Its importance stems from its practical implications for various applications, including irrigation, hydropower, and river management, where accurate SSL prediction is crucial for effective resource management.

The prime contributions of this work are:

a) The EMD based LDMR (EMD-LDMR) and EMD based LSLDMR (EMD-LSLDMR) are proposed.
b) The EEMD based LDMR (EEMD-LDMR) and EEMD based LSLDMR (EEMD-LSLDMR) are proposed.
c) The LSLDMR model is freshly explored for SSL prediction.
d) The SSL is predicted for the Tawang Chu river of Arunachal Pradesh, India.

The remainder of this paper is organized as follows: Section 2 demonstrates the related works. In Section 3, the proposed models are discussed. The experimental setup and dataset description are presented in Section 4. Numerical outcomes are discussed in Section 5. Lastly, we conclude the paper in Section 6.

2. Related works

In this section, we briefly present the mathematical expressions of the base models: LDMR, LSLDMR, EMD and EEMD. Let the training data points be $\{x_i\}_{i=1}^{m} \in \mathbb{R}^n$, where $m$ stands for the total number of data points. $x_i \in \mathbb{R}^n$ and $y_i \in \mathbb{R}$ indicate the input training data point and the output respectively. Also, assume $A \in \mathbb{R}^{m \times n}$ is the training data matrix whose $i$th row vector is $x_i^t$, and $y = (y_1, ..., y_m)$ indicates the original values. Moreover, $e$ is a vector of ones.

2.1. The LDMR

The LDMR is a fresh approach suggested by Rastogi et al. (2020) for regression problems. The key advantage of LDMR is that it shows high efficiency while dealing with datasets that contain noise and outliers. This is because it simultaneously minimizes the quadratic loss as well as the ε-insensitive loss function. The primal statement of LDMR may be expressed as:

$$\min\; \frac{c}{2}\|w\|^2 + \frac{\eta}{2}\left\|y - (K(U,U^t)w + eb)\right\|^2 + Ce^t(\gamma + \delta),$$

s.t.

$$y - (K(U,U^t)w + eb) \le e\varepsilon + \gamma,$$
$$(K(U,U^t)w + eb) - y \le e\varepsilon + \delta,$$

and

$$\gamma, \delta \ge 0. \quad (1)$$

where $K(U,U^t)$ of order $m$ is the kernel matrix whose $(i,j)$th element is $K(U,U^t)_{ij} = k(x_i, x_j) \in \mathbb{R}$, and $K(x,U^t) = (k(x,x_1), ..., k(x,x_m)) \in \mathbb{R}^m$ for a vector $x \in \mathbb{R}^n$. $\varepsilon$ and $\eta$ are user-specified parameters, $C$ is the model parameter, $w$ indicates the unknown and $b$ represents the bias. $\gamma$ and $\delta$ are slack variables. Assuming $z = [w;\, b]$, $\|w\|^2$ can be rewritten as $\|w\|^2 = z^t I_0 z$, where $I_0$ is the identity matrix with its last diagonal entry set to zero.

Finally, the unknowns $z = [w;\, b]$ can be calculated as:

$$z = \begin{bmatrix} w \\ b \end{bmatrix} = \left(cI_0 + \eta M^t M\right)^{-1} M^t(\ell_1 - \ell_2 + y), \quad (2)$$

where $\ell_1$ and $\ell_2$ are Lagrangian multipliers and $M = [K(U,U^t)\;\; e]$.

For any new example $x \in \mathbb{R}^n$, the regressor for LDMR can be expressed as:

$$f(x) = K(x^t, U^t)w + b. \quad (3)$$

Despite showing high generalization ability, LDMR takes high computation time to perform its operations.

2.2. The LSLDMR

To reduce the computational cost of LDMR, a novel LSLDMR was very recently proposed by Gupta and Gupta (2021). LSLDMR deals with a system of linear equations rather than solving a QPP, which makes it time effectual compared to LDMR. The primal expression of LSLDMR may be presented as:

$$\min\; \frac{c}{2}\|w\|^2 + \frac{\eta}{2}\left\|y - (K(U,U^t)w + eb)\right\|^2 + Ce^t\delta^2,$$

s.t.

$$(K(U,U^t)w + eb) - y = e\varepsilon + \delta.$$

where $C$ represents the model parameter, $w$ is the unknown and $b$ is the bias. $\delta$ represents the slack variable; $\varepsilon$ and $\eta$ are user-defined parameters.

Finally, the unknowns $z = [w;\, b]$ can be determined as:

$$z = \begin{bmatrix} w \\ b \end{bmatrix} = \left(cI_0 + (\eta + 2C)M^t M\right)^{-1} M^t\left((\eta + 2C)y + 2Ce\varepsilon\right), \quad (4)$$

where $M = [K(U,U^t)\;\; e]$.

For any unknown input $x \in \mathbb{R}^n$, the regressor for LSLDMR may be obtained as:

$$f(x) = K(x^t, U^t)w + b.$$

Equations (7) and (8) are reiterated and finally, the mean of the corresponding IMFs is taken.

3. Proposed models

A divide-and-conquer technique breaks down a problem into a few sub-problems of the similar or related type until they are simple enough to tackle on their own. After that, the sub-problems' solutions are merged to solve the main problem. This strategy is followed by the EMD and EEMD techniques. The procedure of the proposed EMD based LDMR models is as follows:

4. Experimental setup and dataset description

We have performed the simulations on a computer with 32 GB RAM and a 3.20 GHz Intel i7 processor, on the Windows 7 OS, using the MATLAB 2019a software. The solutions to the QPPs of SVR, TSVR, LDMR and the proposed EMD-LDMR, as well as the EEMD-LDMR, are determined by the "quadprog" function in MATLAB.
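The procedure outlined above (decompose the input series, train one regressor per sub-series, and merge the sub-predictions) can be sketched in a few lines. The following is a minimal, hypothetical NumPy illustration, not the experimental code: the `decompose` helper is only a stand-in that splits the series into two additive components (a practical implementation would obtain the IMFs by EMD sifting, e.g. via the PyEMD package), the `lsldmr_fit` routine follows the closed form of Eq. (4) as derived from its primal, and all parameter values are illustrative defaults rather than the cross-validated ones.

```python
import numpy as np

def gaussian_kernel(A, B, mu):
    # G(k_i, l_j) = exp(-mu * ||k_i - l_j||^2)
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-mu * d2)

def lsldmr_fit(X, y, c=1e-3, C=1.0, eta=1.0, eps=0.01, mu=1.0):
    # Closed-form LSLDMR solution (Eq. (4), with the 2*C*eps*e term
    # following from the stated primal):
    # z = (c*I0 + (eta + 2C) M^t M)^{-1} M^t ((eta + 2C) y + 2C*eps*e)
    m = X.shape[0]
    M = np.hstack([gaussian_kernel(X, X, mu), np.ones((m, 1))])  # M = [K  e]
    I0 = np.eye(m + 1)
    I0[-1, -1] = 0.0                      # the bias b is not regularised
    rhs = M.T @ ((eta + 2 * C) * y + 2 * C * eps * np.ones(m))
    return np.linalg.solve(c * I0 + (eta + 2 * C) * M.T @ M, rhs)  # [w; b]

def lsldmr_predict(X_train, z, X_new, mu=1.0):
    # f(x) = K(x^t, U^t) w + b
    return gaussian_kernel(X_new, X_train, mu) @ z[:-1] + z[-1]

def decompose(series, win=5):
    # Stand-in for EMD: a smooth component plus a detail component that
    # sum exactly back to the original series (real IMFs come from sifting).
    pad = np.pad(series, (win // 2, win - 1 - win // 2), mode="edge")
    smooth = np.convolve(pad, np.ones(win) / win, mode="valid")
    return [series - smooth, smooth]

def hybrid_predict(series, lag=5, mu=1.0):
    # Divide and conquer: one LSLDMR per component, then sum the forecasts.
    total = np.zeros(len(series) - lag)
    for comp in decompose(series):
        X = np.stack([comp[i:i + lag] for i in range(len(comp) - lag)])
        y = comp[lag:]
        z = lsldmr_fit(X, y, mu=mu)
        total += lsldmr_predict(X, z, X, mu=mu)
    return total
```

Here each component model is evaluated on its own training inputs purely to illustrate the aggregation step; in the experiments the fitted models would instead be applied to held-out test lags.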
4.1. Parameters and kernel selection

The optimum parameters for these models are chosen using 10-fold cross-validation. The C parameter of the SVR and TSVR models is selected from $\{10^{-5}, 10^{-3}, ..., 10^{3}, 10^{5}\}$. However, in the case of LDMR, LSLDMR and the proposed EMD-LDMR, EEMD-LDMR, EMD-LSLDMR and EEMD-LSLDMR models, the optimal parameters C and C1 = C2 are opted from $\{10^{-5}, 10^{-3}, ..., 10^{3}, 10^{5}\}$. For the reported models, the parameter ε is fixed to 0.01. The k parameter is considered as 1 for the LDMR and LSLDMR based models. During kernel selection, we have selected the Gaussian kernel. The kernel parameter μ of SVR, TSVR, LDMR, LSLDMR and the proposed LDMR and LSLDMR based models is chosen from the range $\{2^{-5}, 2^{-3}, ..., 2^{3}, 2^{5}\}$. The non-linear Gaussian kernel can be presented as:

$$G(k_i, l_j) = \exp\left(-\mu \left\|k_i - l_j\right\|^2\right)$$

where $G(k_i, l_j)$ indicates the Gaussian kernel and $i, j = 1, 2, ..., n$.

For validating the performance of our models, we used 6 different performance indicators. They are:

a) $RMSE = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left[(S_o)_i - (S_p)_i\right]^2}$

b) $MAE = \frac{1}{N}\sum_{i=1}^{N}\left|(S_o)_i - (S_p)_i\right|$

c) $SMAPE = \frac{1}{N}\sum_{i=1}^{N}\frac{\left|(S_o)_i - (S_p)_i\right|}{(S_o)_i + (S_p)_i}$

d) $WI = 1 - \frac{\sum_{i=1}^{N}\left((S_o)_i - (S_p)_i\right)^2}{\sum_{i=1}^{N}\left(\left|(S_p)_i - \bar{S}_o\right| + \left|(S_o)_i - \bar{S}_o\right|\right)^2}$

e) $CC = \frac{\sum_{i=1}^{N}\left((S_o)_i - \bar{S}_o\right)\left((S_p)_i - \bar{S}_p\right)}{\sqrt{\sum_{i=1}^{N}\left((S_o)_i - \bar{S}_o\right)^2}\sqrt{\sum_{i=1}^{N}\left((S_p)_i - \bar{S}_p\right)^2}}$
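These indicators (together with R2, whose squared-correlation form is given later in this section) translate directly into code. A minimal sketch, assuming `So` and `Sp` are hypothetical arrays of observed and predicted loads, and keeping SMAPE in the exact form printed above (no absolute values in the denominator, which assumes strictly positive loads):

```python
import numpy as np

def indicators(So, Sp):
    # Performance indicators; So = observed values, Sp = predicted values.
    So, Sp = np.asarray(So, float), np.asarray(Sp, float)
    err = So - Sp
    rmse = np.sqrt(np.mean(err ** 2))                       # a) RMSE
    mae = np.mean(np.abs(err))                              # b) MAE
    smape = np.mean(np.abs(err) / (So + Sp))                # c) SMAPE
    wi = 1 - np.sum(err ** 2) / np.sum(                     # d) Willmott's index
        (np.abs(Sp - So.mean()) + np.abs(So - So.mean())) ** 2)
    cc = (np.sum((So - So.mean()) * (Sp - Sp.mean()))       # e) correlation
          / np.sqrt(np.sum((So - So.mean()) ** 2)
                    * np.sum((Sp - Sp.mean()) ** 2)))
    return dict(RMSE=rmse, MAE=mae, SMAPE=smape, WI=wi, CC=cc, R2=cc ** 2)
```

A perfect prediction gives RMSE = MAE = SMAPE = 0 and WI = CC = R2 = 1.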
Table 1
Prediction performance of the single models for SSL prediction.

Models | Parameters | Input Combinations | RMSE | MAE | SMAPE | WI | R2 | CC | Time (in secs)
SVR | (10^3, 2^-5) | 5-days lag | 0.0541 | 0.0235 | 0.2952 | 0.865 | 0.6902 | 0.8308 | 4.8822
SVR | (10^3, 2^-4) | 4-days lag | 0.0534 | 0.0032 | 0.0824 | 0.8992 | 0.6886 | 0.8298 | 4.7855
SVR | (10^3, 2^-5) | 3-days lag | 0.0543 | 0.0239 | 0.2985 | 0.863 | 0.6901 | 0.8307 | 5.1771
SVR | (10^2, 2^-1) | 2-days lag | 0.0533 | 0.0232 | 0.2826 | 0.8702 | 0.7019 | 0.8378 | 5.2858
SVR | (10^2, 2^2) | 1-day lag | 0.0537 | 0.0234 | 0.2903 | 0.868 | 0.6958 | 0.8341 | 5.3062
SVR | | Average | 0.0537 | 0.0194 | 0.2498 | 0.8731 | 0.6933 | 0.8326 | 5.0874
TSVR | (10^-1, 2^-1) | 5-days lag | 0.0512 | 0.0245 | 0.2863 | 0.8964 | 0.7021 | 0.8379 | 0.3712
TSVR | (10^-2, 2^-1) | 4-days lag | 0.0512 | 0.0246 | 0.2875 | 0.8927 | 0.6933 | 0.8327 | 0.3357
TSVR | (10^-1, 2^-3) | 3-days lag | 0.0513 | 0.0247 | 0.288 | 0.892 | 0.6928 | 0.8324 | 0.3515
TSVR | (10^-3, 2^-3) | 2-days lag | 0.0507 | 0.0243 | 0.2875 | 0.8952 | 0.6998 | 0.8366 | 0.3507
TSVR | (10^-1, 2^5) | 1-day lag | 0.0499 | 0.0252 | 0.3064 | 0.8921 | 0.6883 | 0.8297 | 0.4006
TSVR | | Average | 0.0509 | 0.0247 | 0.2911 | 0.8937 | 0.6953 | 0.8339 | 0.3619
LDMR | (10^-3, 10^0, 2^0) | 5-days lag | 0.0523 | 0.0254 | 0.2858 | 0.8887 | 0.6788 | 0.8239 | 0.7093
LDMR | (10^-2, 10^0, 2^0) | 4-days lag | 0.0519 | 0.0251 | 0.2862 | 0.8899 | 0.6825 | 0.8261 | 0.6533
LDMR | (10^-2, 10^-5, 2^-2) | 3-days lag | 0.0505 | 0.0248 | 0.2909 | 0.8979 | 0.7001 | 0.8367 | 0.3031
LDMR | (10^-3, 10^-3, 2^-5) | 2-days lag | 0.0508 | 0.0247 | 0.2937 | 0.8956 | 0.6962 | 0.8344 | 0.3015
LDMR | (10^-2, 10^-5, 2^4) | 1-day lag | 0.0498 | 0.0249 | 0.2929 | 0.9044 | 0.7063 | 0.8404 | 0.3046
LDMR | | Average | 0.0511 | 0.0249 | 0.2899 | 0.8953 | 0.6928 | 0.8323 | 0.4544
LSLDMR | (10^-5, 10^-2, 2^5) | 5-days lag | 0.0248 | 0.0177 | 0.2604 | 0.9658 | 0.8825 | 0.9394 | 0.1024
LSLDMR | (10^-4, 10^-2, 2^5) | 4-days lag | 0.0276 | 0.0203 | 0.2759 | 0.9552 | 0.849 | 0.9214 | 0.1046
LSLDMR | (10^-4, 10^-2, 2^5) | 3-days lag | 0.0333 | 0.0227 | 0.2775 | 0.9216 | 0.754 | 0.8683 | 0.1014
LSLDMR | (10^-5, 10^-2, 2^4) | 2-days lag | 0.0305 | 0.0243 | 0.2973 | 0.9065 | 0.7148 | 0.8454 | 0.0989
LSLDMR | (10^-4, 10^-1, 2^4) | 1-day lag | 0.0344 | 0.0252 | 0.3415 | 0.9015 | 0.7001 | 0.8367 | 0.1519
LSLDMR | | Average | 0.0301 | 0.022 | 0.2905 | 0.9301 | 0.7801 | 0.8822 | 0.1118
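The input combinations listed in the tables (1-day to 5-day lags of the daily load series) can be built with a simple sliding window. A minimal sketch, with `q` a hypothetical daily series:

```python
import numpy as np

def make_lag_dataset(q, lag):
    # Predict Q_t from the previous `lag` values Q_{t-lag}, ..., Q_{t-1}.
    q = np.asarray(q, float)
    X = np.stack([q[i:i + lag] for i in range(len(q) - lag)])
    y = q[lag:]
    return X, y

# The five variants used in the experiments: 5-, 4-, 3-, 2- and 1-day lags.
# datasets = {k: make_lag_dataset(q, k) for k in (5, 4, 3, 2, 1)}
```

Each additional lag shortens the usable sample by one day, since the first `lag` targets have incomplete histories.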
Table 2
Prediction performance of the decomposition-based hybrid models.

Models | Parameters | Input Combinations | RMSE | MAE | SMAPE | WI | R2 | CC | Time (in secs)
EMD-LDMR | (10^-4, 10^-4, 2^0) | 5-days lag | 0.0253 | 0.014 | 0.2182 | 0.9798 | 0.9239 | 0.9612 | 0.4931
EMD-LDMR | (10^-2, 10^-5, 2^0) | 4-days lag | 0.0244 | 0.0133 | 0.2047 | 0.9813 | 0.9293 | 0.964 | 0.5221
EMD-LDMR | (10^4, 10^-4, 2^-5) | 3-days lag | 0.0337 | 0.0166 | 0.2478 | 0.9607 | 0.868 | 0.9317 | 0.6956
EMD-LDMR | (10^-2, 10^-3, 2^5) | 2-days lag | 0.0189 | 0.0099 | 0.158 | 0.9889 | 0.9573 | 0.9784 | 0.5177
EMD-LDMR | (10^-5, 10^-2, 2^1) | 1-day lag | 0.046 | 0.0219 | 0.2587 | 0.924 | 0.7486 | 0.8652 | 0.6295
EMD-LDMR | | Average | 0.0297 | 0.01514 | 0.2175 | 0.9669 | 0.8854 | 0.9401 | 0.5716
EEMD-LDMR | (10^0, 10^-4, 2^-2) | 5-days lag | 0.0259 | 0.0125 | 0.1805 | 0.9781 | 0.9225 | 0.9605 | 0.6394
EEMD-LDMR | (10^0, 10^-5, 2^-3) | 4-days lag | 0.0289 | 0.0129 | 0.17 | 0.9724 | 0.9028 | 0.9502 | 0.6874
EEMD-LDMR | (10^2, 10^-5, 2^-4) | 3-days lag | 0.0291 | 0.0138 | 0.2044 | 0.9723 | 0.8999 | 0.9486 | 0.7371
EEMD-LDMR | (10^-1, 10^-4, 2^-1) | 2-days lag | 0.0333 | 0.0147 | 0.1807 | 0.9635 | 0.8681 | 0.9317 | 0.7192
EEMD-LDMR | (10^-2, 10^-3, 2^0) | 1-day lag | 0.0451 | 0.0207 | 0.02105 | 0.927 | 0.7575 | 0.8703 | 0.6568
EEMD-LDMR | | Average | 0.0325 | 0.0149 | 0.1513 | 0.9627 | 0.8702 | 0.9323 | 0.6879
EMD-LSLDMR | (10^-3, 10^-4, 2^5) | 5-days lag | 0.0082 | 0.0041 | 0.1169 | 0.9984 | 0.9939 | 0.9969 | 0.1116
EMD-LSLDMR | (10^-3, 10^-4, 2^5) | 4-days lag | 0.0089 | 0.0058 | 0.13 | 0.9969 | 0.988 | 0.994 | 0.1213
EMD-LSLDMR | (10^-2, 10^-5, 2^5) | 3-days lag | 0.0126 | 0.0096 | 0.1605 | 0.991 | 0.9653 | 0.9825 | 0.1094
EMD-LSLDMR | (10^-2, 10^-5, 2^4) | 2-days lag | 0.014 | 0.0121 | 0.1831 | 0.9838 | 0.9393 | 0.9692 | 0.1068
EMD-LSLDMR | (10^-2, 10^-5, 2^4) | 1-day lag | 0.0242 | 0.0184 | 0.2229 | 0.9515 | 0.8339 | 0.9132 | 0.1023
EMD-LSLDMR | | Average | 0.0136 | 0.01 | 0.1627 | 0.9843 | 0.9441 | 0.971 | 0.1103
EEMD-LSLDMR | (10^-5, 10^-4, 2^5) | 5-days lag | 0.0087 | 0.0032 | 0.0824 | 0.9992 | 0.9969 | 0.9984 | 0.1137
EEMD-LSLDMR | (10^-5, 10^-4, 2^5) | 4-days lag | 0.0096 | 0.0037 | 0.0929 | 0.9989 | 0.9955 | 0.9977 | 0.1077
EEMD-LSLDMR | (10^-2, 10^-5, 2^5) | 3-days lag | 0.0113 | 0.0083 | 0.1425 | 0.9934 | 0.9745 | 0.9872 | 0.1083
EEMD-LSLDMR | (10^-2, 10^-5, 2^4) | 2-days lag | 0.0122 | 0.01 | 0.1468 | 0.9879 | 0.9549 | 0.9772 | 0.1098
EEMD-LSLDMR | (10^-2, 10^-5, 2^4) | 1-day lag | 0.024 | 0.0183 | 0.2039 | 0.9522 | 0.8367 | 0.9147 | 0.1033
EEMD-LSLDMR | | Average | 0.0132 | 0.0087 | 0.1337 | 0.9863 | 0.9517 | 0.975 | 0.1086
f) $R^2 = \frac{\left[\sum_{i=1}^{N}\left((S_o)_i - \bar{S}_o\right)\left((S_p)_i - \bar{S}_p\right)\right]^2}{\sum_{i=1}^{N}\left((S_o)_i - \bar{S}_o\right)^2 \sum_{i=1}^{N}\left((S_p)_i - \bar{S}_p\right)^2}$

where $S_o$ is the original value and $S_p$ indicates the predicted value.

4.2. Dataset description

To test the field applicability of EMD-LDMR, EEMD-LDMR, EMD-LSLDMR and EEMD-LSLDMR, we have tested their prediction performance on an SSL dataset. The dataset is accumulated from NHPC Limited, Tawang Basin Project, and the data were collected from the Tawang Chu River, Arunachal Pradesh, India. The study area is shown in Fig. 2 (Panda et al., 2014). The gauge station was located at Jang, Arunachal Pradesh, India. The dataset is a collection of daily SSL data for 3 consecutive years, from 2013 to 2015. The catchment area of the river is 2737 sq. km, with latitude 27°30′00″ to 28°24′00″ and longitude 91°47′00″ to 92°28′00″. The dataset has a minimum value of 0.004 (gm/litre), a maximum value of 0.0647 (gm/litre), a variation of 0.0076 (gm/litre), a standard deviation of 0.0873 (gm/litre), and skewness and kurtosis of 2.3284 and 7.3785 respectively. The mean, mode and median values are 0.0649 (gm/litre), 0.0078 (gm/litre) and 0.0197 (gm/litre) respectively.

We have prepared 5 different variants of the datasets. They are:

a) Qt-5, Qt-4, Qt-3, Qt-2, Qt-1, Qt: 5 days lag
b) Qt-4, Qt-3, Qt-2, Qt-1, Qt: 4 days lag
c) Qt-3, Qt-2, Qt-1, Qt: 3 days lag
d) Qt-2, Qt-1, Qt: 2 days lag
e) Qt-1, Qt: 1 day lag

5. Results and analysis

The experimental outcomes for the single conventional models are shown in Table 1. Moreover, the optimum parameters for each model are also shown in Table 1. Five different input combinations are used in this work. It can be observed that the solitary machine learning algorithms, i.e., SVR, TSVR, LDMR and LSLDMR, show low prediction performance, with the maximum R2 value of 0.7801 for LSLDMR involving the full input data (5-days lag). On further observation of the single models, it is noticeable that the LSLDMR shows the lowest average RMSE (0.0301). Further, it can be noticed that among the single models, SVR shows the lowest average MAE (0.0194) and SMAPE (0.2498) values, while the LSLDMR model has the highest WI (0.9301) and CC (0.8822) values. The computational time of these models reveals that the LSLDMR model is more computationally efficient than SVR, TSVR and LDMR.

Table 2 shows the performance indicator values and computational time (in seconds) for the EMD and EEMD based hybrid LDMR models. One can notice from the table that the proposed EMD-LDMR, EEMD-LDMR, EMD-LSLDMR and EEMD-LSLDMR show excellent prediction performance, with best mean R2 values of 0.8854, 0.8702, 0.9441 and 0.9517 respectively. The maximum R2 value of 0.9969 for the EEMD-LSLDMR model is obtained for the full input data (5-days lag). The best average R2 value, for the EEMD-LSLDMR (0.9517), is followed by the EMD-LSLDMR model (0.9441). Overall, based on mean R2 values, 36.11% and 35.906% increases can be observed for the EMD-LDMR and EEMD-LDMR respectively compared to LDMR for Qt-5, Qt-4, Qt-3, Qt-2, Qt-1, Qt. Moreover, 12.62% and 12.96% increases in R2 values can be noticed for the EMD-LSLDMR and EEMD-LSLDMR respectively compared to LSLDMR for Qt-5, Qt-4, Qt-3, Qt-2, Qt-1, Qt. Further, it is observed that the proposed EEMD-LSLDMR shows the lowest average RMSE (0.0132) value, which indicates the applicability of the proposed model. Among the decomposition-based hybrid models, EEMD-LSLDMR has the lowest mean MAE (0.0087) and SMAPE (0.1337) values as well as the highest mean WI (0.9863) and CC (0.975) values.

Additionally, the following points can be observed:

a) The lowest average RMSE of EEMD-LSLDMR is reduced by 75.419%, 74.0668%, 74.168% and 41.096% compared to SVR, TSVR, LDMR and LSLDMR respectively.
b) Moreover, the lowest average MAE of EEMD-LSLDMR is reduced by 55.154%, 64.773%, 65.06% and 65.455% compared to SVR, TSVR, LDMR and LSLDMR respectively.
c) The lowest average SMAPE of EEMD-LSLDMR is reduced by 46.477%, 54.071%, 53.881% and 53.976% compared to SVR, TSVR, LDMR and LSLDMR respectively.
d) The lowest average WI of EEMD-LSLDMR is increased by 12.96%, 10.361%, 10.164% and 6.042% compared to SVR, TSVR, LDMR and LSLDMR respectively.
Table 3
Average ranks based on different performance indicators for different input combinations (best average rank is in boldface).

Input Combinations | Indicators | SVR | TSVR | LDMR | LSLDMR | EMD-LDMR | EEMD-LDMR | EMD-LSLDMR | EEMD-LSLDMR

e) The lowest average CC of EEMD-LSLDMR is increased by 17.103%, 16.921%, 17.145% and 10.519% compared to SVR, TSVR, LDMR and LSLDMR respectively.
f) Additionally, the computational time of these models reveals that the proposed EEMD-LSLDMR model takes the overall lowest mean time for computation compared to SVR, TSVR, LDMR, EMD-LDMR, EEMD-LDMR and EMD-LSLDMR.

Therefore, it is evident from the outcomes that the proposed EEMD-LSLDMR is the best proposed model based on prediction performance as well as computational cost.

Table 3 illustrates the ranks based on different performance indicators for each input combination. It can be noted that the proposed EEMD-LSLDMR model shows the best results in 21 out of 30 cases. EMD-LSLDMR, EEMD-LDMR and EMD-LDMR reveal the best results in 2, 1 and 4 cases respectively, while SVR shows the best results in 2 cases. The average rank is shown in the last row of Table 3. It can be noticed that the proposed EEMD-LSLDMR reveals the lowest average rank (1.33), which clearly shows the dominance of the model. However, measuring the performance of the models based on their mean rank alone is not adequate. Therefore, we perform the Friedman test with Nemenyi statistics (Demšar, 2006).

Based on the average ranks portrayed in Table 3, we further performed the non-parametric Friedman test (Demšar, 2006). Under the null
Fig. 5. Violin plot representation of the reported models on the SSL datasets.
Fig. 6. Prediction over the original values for the best single models.
Fig. 7. Parameter insensitivity of proposed (a) EMD-LDMR, (b) EEMD-LDMR, (c) EMD-LSLDMR and (d) EEMD-LSLDMR on 5 days lag dataset.
hypothesis that all the models perform equivalently, we formulate the Friedman statistic as:

χ²_F = (12 × 30)/(8(8 + 1)) [(6.6² + 6.766² + 6.866² + 5.266² + 3.4² + 3.566² + 2.233² + 1.33²) − 8(8 + 1)²/4] = 165.978,

F_F = ((30 − 1) × 165.978)/(30(8 − 1) − 165.978) = 109.339.

F_F is distributed with ((8 − 1), (8 − 1) × (30 − 1)) = (7, 203) degrees of freedom. The critical value of F(7, 203) is 2.055 for α = 0.05, which is lower than F_F. Therefore, the null hypothesis is rejected. Further, the critical distance (CD) is calculated for q_α = 0.05 (q_0.05 = 3.031 for eight methods) as:

CD = 3.031 √(8(8 + 1)/(6 × 30)) = 1.917.

The following conclusions are established from the Nemenyi test for q_α = 0.05:

a) The difference between the average ranks of SVR and EEMD-LSLDMR is more than the CD. Therefore, EEMD-LSLDMR performs significantly better than the SVR model.
b) The difference between the average ranks of TSVR and EEMD-LSLDMR is higher than the CD. Therefore, EEMD-LSLDMR performs significantly better than the TSVR model.
c) Similarly, the difference between the average ranks of LDMR and EEMD-LSLDMR is more than the CD. Therefore, EEMD-LSLDMR achieves significantly better generalization performance than the LDMR model.
d) Further, the difference between the average ranks of LSLDMR and EEMD-LSLDMR is more than the CD. Therefore, EEMD-LSLDMR shows better generalization performance than the LSLDMR model.

Fig. 3 shows the relative results of the Nemenyi test among all the learning methods based on their average ranks. The methods with higher rankings are on the right, while those with lower rankings are on the left. Approaches connected by a horizontal line of length less than or equal to the critical distance perform statistically identically. It is noticeable that EEMD-LSLDMR, EMD-LSLDMR, EEMD-LDMR and EMD-LDMR are on the right side of the graph, which shows the efficiency of these models. It can further be noted that EEMD-LSLDMR is significantly different from the SVR, TSVR, LDMR, LSLDMR, EMD-LDMR and EEMD-LDMR models, but no significant difference can be observed between the EEMD-LSLDMR and EMD-LSLDMR models. Fig. 4 indicates the prediction performance of the regressors for the various input combinations. It is evident that the decomposition-based hybrid LDMR models follow the original data more closely than the single models.

Fig. 5 depicts the best decomposition-based hybrid models on violin diagrams (Hoffmann, 2022). The figure shows that EEMD-LSLDMR has a distribution closer to the original values than the other models for all the input combinations.

In addition, scatter plots are shown for each model based on the best input combinations in Fig. 6. It is evident that the proposed EMD-LDMR, EEMD-LDMR, EEMD-LSLDMR and EMD-LSLDMR models are highly correlated with the original values.

5.1. Parameter insensitivity

Fig. 7 shows the parameter insensitivity plots for the (a) EMD-LDMR, (b) EEMD-LDMR, (c) EMD-LSLDMR and (d) EEMD-LSLDMR models on the 5-days lag dataset for each C1 and μ combination based on RMSE. Here, C2 is fixed to its optimum. The x-axis and y-axis show the different values of the parameters C1 and μ respectively. The colour bar on the right side indicates the intensity of the RMSE values; for example, yellow indicates the highest RMSE values and deep blue indicates the lowest RMSE values. It can be observed from the figure that the suggested regressors are not highly sensitive to the user-defined parameters.

6. Conclusion

In this paper, we have combined the EMD and EEMD based techniques with LDMR models for suspended sediment load prediction. The proposed models, viz., EMD-LDMR, EEMD-LDMR, EMD-LSLDMR and EEMD-LSLDMR, have been tested on SSL datasets gathered from the Tawang Chu River, India. The performance of these suggested models is compared with the traditional SVR, TSVR, LDMR and LSLDMR. The performance of the regressors is assessed using 6 different performance indicators. Furthermore, statistical analysis has been undertaken to show the efficacy of our best-proposed technique. The following implications can be derived:

1. EMD based hybrid models outperform the single models for SSL prediction.
2. EEMD based hybrid models outperform the EMD based hybrid models and the single models for SSL prediction.
3. The proposed EEMD-LSLDMR shows the best prediction performance compared to the other models.

In future, it would be fascinating to combine the LDMR and LSLDMR with the empirical wavelet transform as well as with the recently proposed feature mode decomposition. The major limitation of this study is that we have not explored the proposed algorithms on SSL datasets collected from other rivers. Therefore, the applicability of the models on SSL datasets from other rivers can be explored in future work.

CRediT authorship contribution statement

Barenya Bikash Hazarika: Formal analysis, Validation, Visualization, Conceptualization, Methodology. Deepak Gupta: Conceptualization, Investigation, Writing – original draft.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

The dataset is accumulated from NHPC Limited, Tawang Basin Project, and the data is collected from the Tawang Chu River, Arunachal Pradesh, India.

Acknowledgements

We would like to thank the National Hydroelectric Power Corporation (NHPC) Limited, Tawang basin project, India for providing us with the dataset.
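For reproducibility, the Friedman statistic, its Iman-Davenport F correction, and the Nemenyi critical distance used in the statistical comparison above can be checked with a few lines of Python. The average ranks are taken as printed (rounded) in Table 3, so χ²_F and F_F come out slightly different from the 165.978 and 109.339 computed from the unrounded ranks; q_0.05 = 3.031 is the Nemenyi critical value for eight methods (Demšar, 2006):

```python
import math

# Average ranks over N = 30 cases for the k = 8 regressors, as printed in Table 3:
# SVR, TSVR, LDMR, LSLDMR, EMD-LDMR, EEMD-LDMR, EMD-LSLDMR, EEMD-LSLDMR
ranks = [6.6, 6.766, 6.866, 5.266, 3.4, 3.566, 2.233, 1.33]
k, N = len(ranks), 30

# Friedman statistic, chi-squared with k - 1 degrees of freedom
chi2_f = 12 * N / (k * (k + 1)) * (sum(r * r for r in ranks) - k * (k + 1) ** 2 / 4)

# Iman-Davenport correction, F-distributed with (k - 1, (k - 1)(N - 1)) d.o.f.
f_f = (N - 1) * chi2_f / (N * (k - 1) - chi2_f)

# Nemenyi critical distance at alpha = 0.05 (q_0.05 = 3.031 for k = 8)
cd = 3.031 * math.sqrt(k * (k + 1) / (6 * N))

print(round(chi2_f, 3), round(f_f, 3), round(cd, 3))  # -> 166.215 110.089 1.917

# Pairwise check of EEMD-LSLDMR (last entry) against every other model
significant = [abs(r - ranks[-1]) > cd for r in ranks[:-1]]
print(significant)  # only EMD-LSLDMR (rank 2.233) is not significantly different
```

The pairwise check reproduces the Nemenyi conclusions: every rank gap to EEMD-LSLDMR exceeds the CD of 1.917 except the gap to EMD-LSLDMR (0.903).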
Reisenbüchler, M., Bui, M. D., & Rutschmann, P. (2021). Reservoir sediment management using artificial neural networks: A case study of the lower section of the Alpine Saalach River. Water, 13(6), 818. https://doi.org/10.3390/w13060818

Ren, Y., Suganthan, P. N., & Srikanth, N. (2014). A comparative study of empirical mode decomposition-based short-term wind speed forecasting methods. IEEE Transactions on Sustainable Energy, 6(1), 236–244. https://doi.org/10.1109/tste.2014.2365580

Rezaei, K., Pradhan, B., Vadiati, M., & Nadiri, A. A. (2021). Suspended sediment load prediction using artificial intelligence techniques: Comparison between four state-of-the-art artificial neural network techniques. Arabian Journal of Geosciences, 14(3), 1–13. https://doi.org/10.1007/s12517-020-06408-1

Salih, S. Q., Sharafati, A., Khosravi, K., Faris, H., Kisi, O., Tao, H., … Yaseen, Z. M. (2020). River suspended sediment load prediction based on river discharge information: Application of newly developed data mining models. Hydrological Sciences Journal, 65(4), 624–637. https://doi.org/10.1080/02626667.2019.1703186

Santhosh, M., Venkaiah, C., & Kumar, D. V. (2018). Ensemble empirical mode decomposition based adaptive wavelet neural network method for wind speed prediction. Energy Conversion and Management, 168, 482–493. https://doi.org/10.1016/j.enconman.2018.04.099

Shao, X., Sun, S., Li, J., Kong, W., Zhu, J., Li, X., & Hu, B. (2021). Analysis of functional brain network in MDD based on improved empirical mode decomposition with resting state EEG data. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 29, 1546–1556. https://doi.org/10.1109/tnsre.2021.3092140

Sharafati, A., Haji Seyed Asadollah, S. B., Motta, D., & Yaseen, Z. M. (2020). Application of newly developed ensemble machine learning models for daily suspended sediment load prediction and related uncertainty analysis. Hydrological Sciences Journal, 65(12), 2022–2042. https://doi.org/10.1080/02626667.2020.1786571

Shiri, N., Shiri, J., Nourani, V., & Karimi, S. (2022). Coupling wavelet transform with multivariate adaptive regression spline for simulating suspended sediment load: Independent testing approach. ISH Journal of Hydraulic Engineering, 28(sup1), 356–365. https://doi.org/10.1080/09715010.2020.1801528

Tao, H., Al-Khafaji, Z. S., Qi, C., Zounemat-Kermani, M., Kisi, O., Tiyasha, T., … Yaseen, Z. M. (2021). Artificial intelligence models for suspended river sediment prediction: State-of-the art, modeling framework appraisal, and proposed future research directions. Engineering Applications of Computational Fluid Mechanics, 15(1), 1585–1612. https://doi.org/10.1080/19942060.2021.1984992

Wang, W. C., Xu, D. M., Chau, K. W., & Chen, S. (2013). Improved annual rainfall-runoff forecasting using PSO–SVM model based on EEMD. Journal of Hydroinformatics, 15(4), 1377–1390. https://doi.org/10.2166/hydro.2013.134

Wu, Z., & Huang, N. E. (2009). Ensemble empirical mode decomposition: A noise-assisted data analysis method. Advances in Adaptive Data Analysis, 1(01), 1–41. https://doi.org/10.1142/s1793536909000047

Xie, Q., Hu, J., Wang, X., Du, Y., & Qin, H. (2023). Novel optimization-based bidimensional empirical mode decomposition. Digital Signal Processing, 133, Article 103891. https://doi.org/10.1016/j.dsp.2022.103891

Yang, H. F., & Chen, Y. P. P. (2019). Hybrid deep learning and empirical mode decomposition model for time series applications. Expert Systems with Applications, 120, 128–138. https://doi.org/10.1016/j.eswa.2018.11.019

Yang, J., Fu, Z., Zou, Y., He, X., Wei, X., & Wang, T. (2023). A response reconstruction method based on empirical mode decomposition and modal synthesis method. Mechanical Systems and Signal Processing, 184, Article 109716. https://doi.org/10.1016/j.ymssp.2022.109716

Yang, Y., & Yang, Y. (2020). Hybrid prediction method for wind speed combining ensemble empirical mode decomposition and Bayesian ridge regression. IEEE Access, 8, 71206–71218. https://doi.org/10.1109/access.2020.2984020

Yaslan, Y., & Bican, B. (2017). Empirical mode decomposition based denoising method with support vector regression for time series prediction: A case study for electricity load forecasting. Measurement, 103, 52–61. https://doi.org/10.1016/j.measurement.2017.02.007

Yonghao, M., Zhang, B., Li, C., Lin, J., & Zhang, D. (2022). Feature mode decomposition: New decomposition theory for rotating machinery fault diagnosis. IEEE Transactions on Industrial Electronics. https://doi.org/10.1109/tie.2022.3156156

Zhang, H., Wang, B., Lan, T., & Chen, K. (2015). A modified method for non-stationary hydrological time series forecasting based on empirical mode decomposition. Shuili Fadian Xuebao, 34(12), 42–53. https://doi.org/10.11660/slfdxb.20151205

Zhao, N., Ghaemi, A., Wu, C., Band, S. S., Chau, K. W., Zaguia, A., … Mosavi, A. H. (2021). A decomposition and multi-objective evolutionary optimization model for suspended sediment load prediction in rivers. Engineering Applications of Computational Fluid Mechanics, 15(1), 1811–1829. https://doi.org/10.1080/19942060.2021.1990133