Energy
journal homepage: www.elsevier.com/locate/energy
Crude oil price prediction model with long short term memory deep
learning based on prior knowledge data transfer
Zhongpei Cen*, Jun Wang
Institute of Financial Mathematics and Financial Engineering, School of Science, Beijing Jiaotong University, Beijing 100044, PR China
Article info

Article history:
Received 8 March 2018
Received in revised form 25 October 2018
Accepted 3 December 2018
Available online 7 December 2018

Keywords:
Deep learning
Crude oil energy market
Long short term memory predicting model
Data transfer
Empirical predictive effect analysis
Ensemble empirical mode decomposition

Abstract

Energy resources have acquired a strategic significance for the economic growth and social welfare of any country throughout history. Therefore, the prediction of crude oil price fluctuations is a significant issue. In recent years, with the development of artificial intelligence, deep learning has attracted wide attention in various industrial fields, and scientific research on using deep learning models to fit and predict time series has developed. In an attempt to increase the accuracy of oil market price prediction, Long Short Term Memory (LSTM), a representative model of deep learning, is applied to fit crude oil prices in this paper. In the traditional application fields of LSTM, such as natural language processing, it is a consensus that a large amount of data improves training accuracy. In order to improve prediction accuracy by extending the size of the training set, transfer learning provides a heuristic data extension approach. Moreover, since training the LSTM with every historical data point weighted equally can hardly reflect the changeable behaviors of crude oil markets, a new algorithm named data transfer with prior knowledge, which provides a more flexible data extension approach (three data types), is proposed. To compare the predicting performance on the initial data and the transferred data more deeply, ensemble empirical mode decomposition is applied to decompose the time series into several intrinsic mode functions, and these intrinsic mode functions are utilized to train the models. Further, empirical research is performed to test the prediction effect on West Texas Intermediate and Brent crude oil by evaluating the predicting ability of the proposed model, and the corresponding superiority is demonstrated.

© 2018 Elsevier Ltd. All rights reserved.
1. Introduction

As the most important non-renewable energy in the world, crude oil plays a significant and irreplaceable role in economic society. Crude oil is an important raw material for many chemical industrial products, including fertilizers, solvents, plastics and pesticides. The prices of international crude oil rest with the world's major oil-producing areas. For example, on the New York Stock Exchange, the crude oil futures of the United States take West Texas "intermediate base oil (WTI)" as the base oil. The strength of the USA's super-crude oil buyers, coupled with the influence of the New York Exchange itself, has made WTI benchmark oil futures the leader among global commodity futures varieties. Oil prices are closely related to the global macroeconomic situation. Many economists propose that high oil prices have a negative impact on global economic growth, while others think that high oil prices are caused by economic growth; generally speaking, the relationship between oil prices and the global economy is very unstable.

There is a long history of research focused on prediction in crude oil markets and financial markets. Alvarez-Ramirez et al. [1] analyzed the auto-correlations of international crude oil. Chiroma et al. [2] proposed an alternative approach for the prediction of West Texas Intermediate (WTI) crude oil prices. Niu and Wang [3] investigated the statistical behaviors of long-range dependence phenomena and volatility clustering of crude oil prices. Niu and Wang [4] constructed simulative data from a financial model that accords with the real markets to a certain extent. Yu and Wang [5] modeled a discrete time series of the stock price process to compare with the real financial markets. Mordjaoui et al. [6] used a dynamic neural network for the prediction of daily power consumption. Wang and Wang [7] introduced an artificial neural network to train and forecast the fluctuations of return intervals for real data and simulative data. Liao and Wang [8] introduced an improved neural network with a stochastic time effective function.

* Corresponding author. E-mail address: zhongpeicen@bjtu.edu.cn (Z. Cen).
https://doi.org/10.1016/j.energy.2018.12.016
0360-5442/© 2018 Elsevier Ltd. All rights reserved.
Z. Cen, J. Wang / Energy 169 (2019) 160–171
Various gates control whether the data is remembered by the current state in the cell. The LSTM uses these gates to store the intermediate state; in other words, the feature of the LSTM is that nodes with valves are added at each layer outside the RNN structure. All elements of one LSTM cell are enumerated and represented in the diagram of Fig. 1, and there are three types of valves: the forget gate, the input gate and the output gate. These three gates can be turned off or on, and determine whether the output of the memory state of the model network (the state of the previous network) reaches the threshold in the layer, thereby adding it to the calculation of the current layer. The LSTM also has a chain-like structure, but the repeating module has a different structure.

The following equations give the complete algorithm for an LSTM model, performed at each time step:

Input node g: This unit is a node that takes the activation signal from the input layer $x(t)$ at the current time $t$ and the hidden layer $h(t-1)$ at the previous time step $t-1$. The input signal is aggregated by the weights and run through an activation function tanh, a typical nonlinear function. $W_{gx}$ denotes the weights between the input node $g$ and the input layer $x$, $W_{gh}$ denotes the weights between the input node and the hidden layer $h$, and $b_g$ is the bias of the input node. The following weights and biases are named in the same way as in this formula:

$$g(t) = \phi\big(W_{gx}\,x(t) + W_{gh}\,h(t-1) + b_g\big). \quad (1)$$

Input gate i: The gate is a distinctive feature of the LSTM compared to other RNNs; it is a node that contains an activation function $\sigma$, a nonlinear mapping of the input layer $x(t)$ at the current time $t$ and the value of the hidden layer $h(t-1)$ at the previous time point into $[0, 1]$. The corresponding weights are denoted by $W_{ix}$ and $W_{ih}$, and $b_i$ is the bias of the input gate. The value of the gate is multiplied with another node: if the input gate is close to zero, the flow from the other node is cut off; if the value of the input gate is one, all flow from the other node is passed through. Figuratively speaking, the input gate is a switch for the input node. Thus, we let

$$i(t) = \sigma\big(W_{ix}\,x(t) + W_{ih}\,h(t-1) + b_i\big). \quad (2)$$

Forget gate f: The forget gate provides a possibility for the LSTM to learn to forget the contents of the internal state $s$, and it plays an especially useful role in continuously running neural networks. With this kind of design, the calculation of the forget gate is

$$f(t) = \sigma\big(W_{fx}\,x(t) + W_{fh}\,h(t-1) + b_f\big). \quad (3)$$

Output gate o: One memory cell produces the final output value by multiplying the internal state $s$ and the value of the output gate $o$. The signal of the internal state customarily first runs through an activation function tanh, which compresses the output of each memory cell into the same dynamic range. The calculation is

$$o(t) = \sigma\big(W_{ox}\,x(t) + W_{oh}\,h(t-1) + b_o\big). \quad (4)$$

Internal state s: The most important part of one memory cell is the node with nonlinear activation, called the "internal state" $s$. Every internal state $s$ has a self-recurrent structure. The update for the internal state in vector notation is

$$s(t) = g(t) \odot i(t) + s(t-1) \odot f(t) \quad (5)$$

where $\odot$ is point-wise multiplication.

Hidden layer h: The hidden layer is defined by the internal state and the output gate at the current time step $t$:

$$h(t) = \phi\big(s(t)\big) \odot o(t). \quad (6)$$

There are two kinds of activation functions in the above formulas. $\phi$ is the tanh function

$$\phi(x) = \tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} \quad (7)$$

and $\sigma$ is the sigmoid function

$$\sigma(x) = \mathrm{sigmoid}(x) = \frac{1}{1 + e^{-x}}. \quad (8)$$

To understand the LSTM model, it is necessary to explain the model construction of the RNN first, since the LSTM is a deformation of the RNN. Fig. 2(a) describes the structure of the RNN model: the left side of the diagram is the basic model of the RNN, and the right side is its appearance after the model is unrolled. The unrolling matches the input sample: if a batch of time series is input and the maximum length of each batch time series is less than ten, then the model is unrolled ten times. The LSTM is the same as the RNN in that the data is passed to the next layer and processed by the next node at the same level at the same time. The difference between them is that the LSTM has more hidden layers (which is called deeper), and the data processing nodes of the LSTM are cells, while those of the RNN are neurons. Fig. 2(b) describes the model structure between the LSTM layers. Here there are two hidden layers; one batch of data is input from the bottom of the LSTM model to the top, and the data of a batch are input into the cells one by one, where every data point is input into an LSTM cell, and the input gate accepts the input data and processes it within the LSTM cell as mentioned above. An LSTM cell returns its output data as the input data of the next layer, and passes the output data to the next cell as short-term memory in the same layer. The final model returns a string of data, which is the output of the LSTM; in this paper, the output is the last element of this string. A more detailed description of the LSTM model can be found in Ref. [25] (see Fig. 3).
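As a concrete illustration, one forward step of a single memory cell implementing Eqs. (1)–(8) can be sketched in NumPy. The weight shapes, the toy sizes (3 input features, 4 hidden units) and the random initialization are illustrative assumptions, not values taken from the paper:

```python
import numpy as np

def lstm_cell_step(x_t, h_prev, s_prev, W, b):
    """One LSTM time step, Eqs. (1)-(6).

    W maps each unit name ('g', 'i', 'f', 'o') to a (W_ux, W_uh) weight
    pair; b maps each name to its bias vector.
    """
    phi = np.tanh                                   # Eq. (7)
    sigma = lambda z: 1.0 / (1.0 + np.exp(-z))      # Eq. (8)

    g = phi(W['g'][0] @ x_t + W['g'][1] @ h_prev + b['g'])    # input node,  Eq. (1)
    i = sigma(W['i'][0] @ x_t + W['i'][1] @ h_prev + b['i'])  # input gate,  Eq. (2)
    f = sigma(W['f'][0] @ x_t + W['f'][1] @ h_prev + b['f'])  # forget gate, Eq. (3)
    o = sigma(W['o'][0] @ x_t + W['o'][1] @ h_prev + b['o'])  # output gate, Eq. (4)
    s = g * i + s_prev * f                                    # internal state, Eq. (5)
    h = phi(s) * o                                            # hidden output,  Eq. (6)
    return h, s

# Toy usage with 3 input features and 4 hidden units (illustrative sizes).
rng = np.random.default_rng(0)
W = {k: (rng.standard_normal((4, 3)) * 0.1, rng.standard_normal((4, 4)) * 0.1)
     for k in 'gifo'}
b = {k: np.zeros(4) for k in 'gifo'}
h, s = lstm_cell_step(rng.standard_normal(3), np.zeros(4), np.zeros(4), W, b)
```

Since the gates lie in $(0,1)$ and tanh in $(-1,1)$, the returned hidden output $h$ is always bounded in $(-1,1)$, regardless of the internal state magnitude.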
Fig. 1. Diagram of one cell of LSTM.

3. Data transfer

In this part, crude oil time series will be transferred into three longer and more complex time series by using three kinds of data transfer modes.
Table 1
Data selection and data transfer of daily crude oil prices for West Texas Intermediate and Brent. (Columns: Mode, data sets, total number, training set size, testing set size.)
As the model is a nonlinear mapping, improper selection of the hyper parameters will lead to good data-fitting performance on the training set but poor performance on the test set; this phenomenon is called over fitting. To avoid over fitting, cross validation is used to determine the other hyper parameters: the data is divided into three sets in order to choose appropriate parameters. In this way, the most representative set of hyper parameters, with the best generalization ability on the validation set, can be found. The batch size n is 10. Data transfer is performed only on the training set part, so the training set sizes of Type II and Type III are both n times that of Type I, and the three data processing modes share the same test set. Table 2 lists four kinds of information for crude oil prices, namely opening prices, closing prices, highest prices and lowest prices, and it also displays the corresponding three kinds of data processing methods. For these methods, the highest, lowest and opening prices are the input features, and the closing prices serve as labels.

The curves of actual closing prices and predicted data exhibited in Fig. 4(a)(c)(e) and Fig. 5(a)(c)(e) show that the distinctions between the predicted values and the actual data are almost negligible in the training set, as are the distances between the predicted series and the actual data in the test set. This leads to the conclusion that the LSTM model has a powerful generalization ability and trains on the crude oil time series well. Furthermore, facing a sharp fall in the test set, Type III has a stronger adaptability than Type I and Type II. Fig. 4(b)(d)(f) are the linear regressions of WTI crude oil prices against the predicted closing prices of the Type I, Type II and Type III data by the LSTM model, respectively; Fig. 5(b)(d)(f) are the corresponding linear regressions for Brent crude oil prices. From these plots, the slopes show that the predicted data and the real closing prices are close.
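The feature/label layout just described (highest, lowest and opening prices as inputs, closing price as label) and a chronological three-way split for cross-validation can be sketched as follows. The 70/15/15 proportions and all function names are illustrative assumptions; the paper does not state the exact split sizes:

```python
import numpy as np

def make_supervised(opens, highs, lows, closes):
    """Stack (open, high, low) as input features; the closing price is
    the label, as described for Table 2."""
    X = np.column_stack([opens, highs, lows])
    y = np.asarray(closes, dtype=float)
    return X, y

def three_way_split(X, y, train=0.7, val=0.15):
    """Chronological train/validation/test split (assumed 70/15/15);
    no shuffling, so the temporal order of the series is preserved."""
    n = len(y)
    i = round(n * train)
    j = round(n * (train + val))
    return (X[:i], y[:i]), (X[i:j], y[i:j]), (X[j:], y[j:])

# Toy prices for illustration only.
t = np.arange(100, dtype=float)
X, y = make_supervised(t, t + 1.0, t - 1.0, t + 0.5)
train, val, test = three_way_split(X, y)
```

Keeping the split chronological matters here: shuffling daily prices before splitting would leak future information into the training set.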
4. Evaluation by multiple statistical measures
Fig. 4. (a)(c)(e) Training and testing results of WTI daily closing prices of LSTM model for Type I, Type II and Type III, respectively. (b)(d)(f) Linear regressions of WTI predicted
closing prices of Type I, Type II and Type III by LSTM, respectively.
Fig. 5. (a)(c)(e) Training and testing results of Brent daily closing prices of LSTM model for Type I, Type II and Type III, respectively. (b)(d)(f) Linear regressions of Brent predicted
closing prices of Type I, Type II and Type III by LSTM model, respectively.
Theil inequality coefficient (TIC).

$$\mathrm{TIC} = \frac{\sqrt{\frac{1}{N}\sum_{t=1}^{N}\left(d_t - y_t\right)^2}}{\sqrt{\frac{1}{N}\sum_{t=1}^{N}d_t^2} + \sqrt{\frac{1}{N}\sum_{t=1}^{N}y_t^2}}. \quad (13)$$

Correlation coefficient (CC).

$$\mathrm{CC} = \frac{\sum_{t=1}^{N}\left(y_t - \bar{y}\right)\left(d_t - \bar{d}\right)}{\sqrt{\sum_{t=1}^{N}\left(y_t - \bar{y}\right)^2 \sum_{t=1}^{N}\left(d_t - \bar{d}\right)^2}}. \quad (14)$$

In the above notations, $d_t$ is the real value and $y_t$ is the predicted value at time $t$ of WTI. $N$ denotes the number of evaluated data points, $\bar{d}$ is the average real value and $\bar{y}$ is the average predicted value.
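The two measures in Eqs. (13) and (14) are straightforward to compute; a sketch in NumPy (the function names are ours, not from the paper):

```python
import numpy as np

def tic(d, y):
    """Theil inequality coefficient, Eq. (13); 0 indicates a perfect fit."""
    d, y = np.asarray(d, float), np.asarray(y, float)
    rmse = np.sqrt(np.mean((d - y) ** 2))
    return rmse / (np.sqrt(np.mean(d ** 2)) + np.sqrt(np.mean(y ** 2)))

def cc(d, y):
    """Correlation coefficient, Eq. (14); values near 1 indicate that the
    predicted series tracks the real series closely."""
    d, y = np.asarray(d, float), np.asarray(y, float)
    num = np.sum((y - y.mean()) * (d - d.mean()))
    den = np.sqrt(np.sum((y - y.mean()) ** 2) * np.sum((d - d.mean()) ** 2))
    return num / den

# A perfect prediction gives TIC = 0 and CC = 1.
d = np.array([1.0, 2.0, 3.0, 4.0])
perfect_tic, perfect_cc = tic(d, d), cc(d, d)
```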
Here the test set is set to the same size for all data modes, in order to compare the test-set errors more reasonably. Smaller values of MAE, RMSE, MAPE and SMAPE indicate less deviation of the predicted results from the actual values and a closer distance between the time series. A TIC value closer to 0 means higher accuracy; on the contrary, a TIC value closer to 1 means more errors and lower accuracy, while a CC value closer to 1 indicates a stronger correlation between the predicted and real series.

Through cross-validation [36], the most representative set of hyper parameters, with the best generalization ability on the validation set, can be found. The learning rate of the LSTM is chosen as 0.01, which is small enough to fit the unceasingly changing data; the number of neurons for each layer is 150, and the number of hidden layers is 2. In Tables 3 and 4, empirical research has been made for the error evaluation at different levels of training times K. By comparing the statistical indicators between the different data preprocessing methods and parameters, it is obvious that Type III has smaller values of MAE, RMSE, SMAPE and TIC, and a CC closer to 1, than Type I and Type II. It is clear that Type III performs better in predicting crude oil markets. In Fig. 6, we comparatively study the relative errors of the predicted data, given by $\tilde{e} = (y(t) - d(t))/d(t)$, so that the errors can be compared at each time point of the test set. Fig. 6(a)(b)(c) are the relative errors of crude oil prices for Type I, Type II and Type III predicted by the LSTM model, respectively; the relative errors of the Type III mode are lower than those of Type I and Type II, which demonstrates that the Type III mode has better predicting accuracy.

Table 4
Error evaluations for Brent data Type I, Type II and Type III of long short term memory model.

Training Times  Mode  MAE     RMSE    MAPE    SMAPE    TIC     CC
1000            I     0.1819  0.4427  3.4635  3.8618   0.0137  0.987
1000            II    0.1787  0.2567  2.6389  2.0833   0.0112  0.992
1000            III   0.1122  0.1223  1.0072  1.0651   0.0089  0.995
2000            I     0.1361  0.4853  2.1224  5.1398   0.0393  0.981
2000            II    0.1401  0.3228  2.4574  3.0056   0.0255  0.923
2000            III   0.1147  0.1148  1.3279  1.1212   0.0124  0.997
5000            I     0.1712  0.6234  4.6821  13.3412  0.0645  0.952
5000            II    0.1031  0.3565  1.4532  8.6756   0.0606  0.920
5000            III   0.0718  0.2876  0.7824  0.8457   0.0156  0.994
10000           I     0.3996  0.5318  6.3018  559.85   0.0723  0.926
10000           II    0.1086  0.4230  5.2342  594.58   0.0772  0.981
10000           III   0.0521  0.0539  1.2108  0.8254   0.0411  0.997
20000           I     0.1774  0.7185  9.6424  7.6572   0.2984  0.917
20000           II    0.1499  0.6512  5.1840  6.2693   0.1483  0.978
20000           III   0.1113  0.2023  1.7721  3.5887   0.1161  0.989

5. Forecasting model based on ensemble empirical mode decomposition

Empirical mode decomposition (EMD) is a method for analyzing nonlinear and non-stationary data. Without requiring stationarity or linearity of the data, the EMD can extract all the oscillatory modes [37–39]. The EMD locally decomposes any time series into a high frequency part, composed of a series of intrinsic mode functions (IMFs), and correspondingly a low frequency part named the residual. Different IMFs represent different scales and adaptive physical bases of the original time series. In the high frequency part, each IMF must satisfy two requirements: (i) over the whole time series, the number of local extrema and of zero-crossings must either be equal or differ at most by one, that is, between each pair of zero-crossings there is only one extremum; (ii) at any time point, the mean value of the envelopes defined by the local maxima and by the local minima is zero. However, aimed at the insufficiency of this method (mode mixing appears frequently in the EMD), the ensemble empirical mode decomposition (EEMD), a noise-assisted data analysis method, is introduced in Refs. [40,41]. The biggest difference between the intrinsic mode functions of EEMD and EMD is that the EEMD components are obtained as the mean of an ensemble of trials, where each IMF consists of the signal plus a finite amplitude white noise.
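A minimal sketch of this ensemble idea, decomposing many noise-perturbed copies of the signal and averaging the corresponding components, is given below. The sifting step itself is replaced by a deliberately trivial stand-in (`toy_emd`, a moving-average split), since a full EMD implementation is beyond this illustration; libraries such as PyEMD provide one. All names and parameter values here are illustrative assumptions:

```python
import numpy as np

def eemd(x, emd, n_trials=100, noise_amp=0.2, seed=0):
    """Ensemble EMD: add white noise over many trials and average the
    corresponding components (the paper's Steps one-four).

    `emd` must map a 1-D signal to a list of components whose sum
    reconstructs the signal; only the ensemble structure is shown here.
    """
    rng = np.random.default_rng(seed)
    trials = []
    for _ in range(n_trials):
        x_m = x + noise_amp * rng.standard_normal(len(x))  # noise-added signal
        trials.append(emd(x_m))                            # decompose each trial
    # Ensemble mean of corresponding components across all trials.
    return [np.mean([t[j] for t in trials], axis=0)
            for j in range(len(trials[0]))]

def toy_emd(x, w=5):
    """Trivial stand-in for sifting: fast part = signal minus moving
    average, slow part (residue) = moving average."""
    trend = np.convolve(x, np.ones(w) / w, mode='same')
    return [x - trend, trend]

x = np.sin(np.linspace(0, 6 * np.pi, 200)) + np.linspace(0, 1, 200)
fast, slow = eemd(x, toy_emd, n_trials=50)
```

Because each trial's components sum back to its noise-added signal, the averaged components sum approximately to the original signal: the added noises cancel in the ensemble mean, which is exactly how the EEMD suppresses the noise it injects.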
Table 3
Error evaluations for West Texas Intermediate data Type I, Type II and Type III of long short term memory model.
Fig. 6. (a) Relative error of crude oil prices for Type I. (b) Relative error of crude oil prices for Type II. (c) Relative error of crude oil prices for Type III.
The EEMD dispels the negative influence of mode mixing and meanwhile preserves the physical uniqueness of the decomposition by adding the white noise. The specific algorithm of EEMD is as follows.

Step one: To obtain the noise-added signal $x_m$, sum the time signal $x$ and a white noise time series $n_m$ in the $m$th trial:

$$x_m(t) = x(t) + n_m(t). \quad (15)$$

Step two: As in the traditional EMD, decompose the signal with noise $x_m$ into IMFs $c_{j,m}$ using the same sifting algorithm:

$$x_m(t) \rightarrow \sum_{j=1}^{p} c_{j,m}(t) + r(t), \quad (16)$$

where $r$ is named the residue, the part of $x_m$ left after the $p$ components $c_{j,m}$ have been extracted.

Step three: Return to Step one, and repeat Steps one and two for a predefined number $M$ of trials, using a different white noise series of the same amplitude every time.

Step four: Calculate the ensemble mean of the corresponding $c_{j,m}$ as the final IMFs:

$$\mathrm{IMF}_j(t) = \frac{1}{M}\sum_{m=1}^{M} c_{j,m}(t), \quad j = 1, 2, \ldots, p. \quad (17)$$

So it is obvious that each IMF is peeled off from the previous residual; the earlier IMFs fluctuate more acutely, while the later IMFs tend toward slower fluctuation.

Fig. 7 shows the IMFs and the residual of the WTI index obtained by the EEMD. It can be found that the fluctuation frequency gradually decreases from IMF1 to IMF6. This means that IMF1, the first series decomposed from the WTI, retains the main structures and characteristics, and apparently has a higher frequency than the other IMFs. Each time the data is processed by sifting, its frequency becomes lower than before. As shown above, different data processing modes give the LSTM different training and generalization abilities; the EEMD is introduced in this paper in order to analyze the predicting performance of the three data processing modes more deeply. We choose the best-performing parameters of the LSTM to predict the IMFs; as input data, the IMFs are processed by the Type I, Type II and Type III modes in the same way as the real WTI data. The prediction performance on the different IMFs is useful for understanding the predictive ability of the proposed model. Fig. 8(a)(d)(g)(j)(m)(p) are the IMFs of Type I, Fig. 8(b)(e)(h)(k)(n)(q) are the IMFs of Type II and Fig. 8(c)(f)(i)(l)(o)(r) are the IMFs of Type III. To demonstrate the performance of the IMFs in the different modes more clearly, we apply MAE, RMSE, MAPE, SMAPE, TIC and CC to measure the specific errors in Table 5. The results show that Type III has a better fitting ability than Type I and Type II for every IMF. Type III has a notably stronger generalization ability on IMF1 and IMF2, but no great advantage on IMF3, IMF4, IMF5 and IMF6, because white noises of equal amplitude have a greater influence on IMF3 to IMF6 than on IMF1 and IMF2.

Fig. 7. Plots of IMFs and the residual of WTI index by EEMD model.

6. Conclusion

Crude oil plays a significant and irreplaceable role in economic society; West Texas Intermediate crude oil and Brent crude oil are the most important crude oil market indices in the world.
Fig. 8. Real IMFs of WTI and IMFs of long short term memory model by Type I, Type II and Type III.
In the present paper, the Long Short Term Memory deep learning algorithm is applied to predict the volatility behaviors of crude oil prices; this is a new approach to predicting energy markets using deep learning networks. Considering the characteristics of energy price time series, new data transfer modes, the Type I, Type II and Type III data preprocessing methods, are established, aimed at improving the ability to predict the fluctuations of global crude oil prices. Comparing linear regressions of the predicted values against the real data, the experimental results show that the predicted values approach the real data well under the LSTM model. We then adopt several evaluation methods to compare the predictive ability of the Type I, Type II and Type III data. The comparison results for MAE, RMSE, MAPE, SMAPE, TIC and CC show that the Type III predicting performance is better than that of Type I and Type II. Furthermore, the nonlinear analysis method EEMD is applied to decompose the crude oil price series into different fluctuation frequency levels, namely the intrinsic mode functions (IMFs). The corresponding predicting performance on the IMFs shows that the proposed LSTM model can catch the main fluctuation characteristics of crude oil prices at different fluctuation frequency levels.
Table 5
Error evaluations of IMFs for Type I, Type II and Type III of long short term memory model.
Acknowledgment

The authors were supported by National Natural Science Foundation of China Grant No. 71271026.

References

[1] Alvarez-Ramirez J, Alvarez J, Rodriguez E. Short-term predictability of crude oil markets: a detrended fluctuation analysis approach. Energy Econ 2008;30:2645–56.
[2] Chiroma H, Abdulkareem S, Herawan T. Evolutionary Neural Network model for West Texas Intermediate crude oil price prediction. Appl Energy 2015;142:266–73.
[3] Niu H, Wang J. Volatility clustering and long memory of financial time series and financial price model. Digit Signal Process 2013;23:489–98.
[4] Niu H, Wang J. Return volatility duration analysis of NYMEX energy futures and spot. Energy 2017;140:837–49.
[5] Yu Y, Wang J. Lattice-oriented percolation system applied to volatility behavior of stock market. J Appl Stat 2012;39:785–97.
[6] Mordjaoui M, Haddad S, Medoued A, Laouafi A. Electric load forecasting by using dynamic neural network. Energy 2017;42:17655–63.
[7] Wang F, Wang J. Statistical analysis and forecasting of return interval for SSE and model by lattice percolation system and neural network. Comput Ind Eng 2012;62:198–205.
[8] Liao Z, Wang J. Forecasting model of global stock index by stochastic time effective neural network. Expert Syst Appl 2010;37:834–41.
[9] Taghavifar H, Mardani A. Applying a supervised ANN (artificial neural network) approach to the prognostication of driven wheel energy efficiency indices. Energy 2014;68:651–7.
[10] Rumelhart DE, Hinton GE, Williams RJ. Learning representations by back propagating errors. Nature 1986;323:533–6.
[11] Katijani Y, Hipel KW, Mcleod AI. Forecasting nonlinear time series with feedforward neural networks: a case study of Canadian lynx data. J Forecast 2005;24:105–17.
[12] Takahama T, Sakai S, Hara A, Iwane N. Predicting stock price using neural networks optimized by differential evolution with degeneration. Int J Innov Comput Inf Contr 2009;5:5021–32.
[13] Tripathy M. Power transformer differential protection using neural network principal component analysis and radial basis function neural network. Simulat Model Pract Theor 2010;18:600–11.
[14] Devaraj D, Yegnanarayana B, Ramar K. Radial basis function networks for fast contingency ranking. Electr Power Energy Syst 2002;24:387–95.
[15] Krizhevsky A, Sutskever I, Hinton GE. ImageNet classification with deep convolutional neural networks. Adv Neural Inf Process Syst 2012;60:1097–105.
[16] Kusumo F, Silitonga AS, Masjuki HH, Ong HC, Siswantoro J, Mahlia TMI. Optimization of transesterification process for Ceiba pentandra oil: a comparative study between kernel-based extreme learning machine and artificial neural networks. Energy 2017;134:24–34.
[17] Schmidhuber J. Deep learning in neural networks: an overview. Neural Networks 2015;61:85–117.
[18] Dahl GE, Yu D, Deng L, Acero A. Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition. IEEE Trans Audio Speech Lang Process 2012;20:33–42.
[19] LeCun Y, Bottou L, Bengio Y, Haffner P. Gradient based learning applied to document recognition. Proc IEEE 1998;86:2278–324.
[20] Safari A, Davallou M. Oil price forecasting using a hybrid model. Energy 2018;148:49–58.
[21] Movagharnejad K, Mehdizadeh B, Banihashemi M, Kordkheili MS. Forecasting the differences between various commercial oil prices in the Persian Gulf region by neural network. Energy 2011;36:3979–84.
[22] Wang J, Wang J. Forecasting energy market indices with recurrent neural networks: case study of crude oil price fluctuations. Energy 2016;102:365–74.
[23] Li J, Wang R, Wang J, Li Y. Analysis and forecasting of the oil consumption in China based on combination models optimized by artificial intelligence algorithms. Energy 2018;144:243–64.
[24] Rahman A, Srikumar V, Smith AD. Predicting electricity consumption for commercial and residential buildings using deep recurrent neural networks. Energy 2018;212:372–85.
[25] Hochreiter S, Schmidhuber J. Long short-term memory. Neural Comput 1997;9:1735–80.
[26] Sutskever I, Martens J, Hinton GE. Generating text with recurrent neural networks. Int Conf Mach Learn 2011;336:1310–8.
[27] Bahdanau D, Cho K, Bengio Y. Neural machine translation by jointly learning to align and translate. In: International Conference on Learning Representations; 2015. p. 3111–229.
[28] Graves A, Mohamed A, Hinton GE. Speech recognition with deep recurrent neural networks. Acoust Speech Signal Process (ICASSP) 2013;38:6645–9.
[29] Sak H, Senior A, Beaufays F. Long short-term memory recurrent neural network architectures for large scale acoustic modeling. In: Proceedings of the 15th annual conference of the international speech communication association, INTERSPEECH 2014; September 2014. p. 338–42.
[30] Sutskever I, Vinyals O, Le QV. Sequence to sequence learning with neural networks. Adv Neural Inf Process Syst 2014;27:3104–22.
[31] Liu H, Mi XW, Li YF. Wind speed forecasting method based on deep learning strategy using empirical wavelet transform, long short term memory neural network and Elman neural network. Energy 2018;156:498–514.
[32] Doucoure B, Agbossou K, Cardenas A. Time series prediction using artificial wavelet neural network and multi-resolution analysis: application to wind speed data. Renew Energy 2016;92:202–11.
[33] Faruk DÖ. A neural network and ARIMA model for water quality time series prediction, vol. 23. Pergamon Press; 2010. p. 586–94.
[34] Olson D, Mossman C. Neural network forecasts of Canadian stock returns using accounting ratios. Int J Forecast 2003;19:453–65.
[35] Plumb AP, Rowe RC, York P, Brown M. Optimisation of the predictive ability of artificial neural network (ANN) models: a comparison of three ANN programs and four classes of training algorithm. Eur J Pharm Sci 2005;25:395–405.
[36] Golub GH, Heath M, Wahba G. Generalized cross-validation as a method for choosing a good ridge parameter. Technometrics 1979;21:215–23.
[37] Flandrin P, Goncalves P. Empirical mode decompositions as data-driven wavelet-like expansions. Int J Wavelets Multiresol Inf Process 2004;2:477–96.
[38] Huang NE, Shen Z, Long SR, Wu MC, Zheng Q, Yen NC, Tung CC, Liu HH. The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis. Proc Math Phys Eng Sci 1998;454:903–95.
[39] Huang YX, Schmitt FG, Hermand JP, Gagne Y, Lu ZM, Liu YL. Arbitrary-order Hilbert spectral analysis for time series possessing scaling statistics: comparison study with detrended fluctuation analysis and wavelet leaders. Phys Rev E 2011;84:016208.
[40] Wu ZH, Huang NE. Ensemble empirical mode decomposition: a noise-assisted data analysis method. Adv Adapt Data Anal 2009;1:1–41.
[41] Yu L, Wang ZS, Tang L. A decomposition-ensemble model with data-characteristic-driven reconstruction for crude oil price forecasting. Appl Energy 2015;156:251–67.