
Back propagation neural network with adaptive differential evolution algorithm for time series forecasting

Lin Wang a, Yi Zeng a, Tao Chen b,*

a School of Management, Huazhong University of Science and Technology, Wuhan 430074, China
b College of Public Administration, Huazhong University of Science and Technology, Wuhan 430074, China

* Corresponding author. E-mail addresses: wanglin982@gmail.com (L. Wang), zengy200810@126.com (Y. Zeng), chentao15@163.com (T. Chen).

Expert Systems with Applications (2014), http://dx.doi.org/10.1016/j.eswa.2014.08.018

Article history: Available online xxxx

Keywords: Time series forecasting; Back propagation neural network; Differential evolution algorithm

Abstract: The back propagation neural network (BPNN) can easily fall into the local minimum point in time series forecasting. A hybrid approach that combines the adaptive differential evolution (ADE) algorithm with BPNN, called ADE–BPNN, is designed to improve the forecasting accuracy of BPNN. ADE is first applied to search for the global initial connection weights and thresholds of BPNN. Then, BPNN is employed to thoroughly search for the optimal weights and thresholds. Two comparative real-life series data sets are used to verify the feasibility and effectiveness of the hybrid method. The proposed ADE–BPNN can effectively improve forecasting accuracy relative to basic BPNN, the autoregressive integrated moving average model (ARIMA), and other hybrid models.

© 2014 Elsevier Ltd. All rights reserved.
1. Introduction

Time series forecasting is an important area of forecasting. One of the most widely employed time series analysis models is the autoregressive integrated moving average (ARIMA), which has been used as a forecasting technique in several fields, including traffic (Kumar & Jain, 1999), energy (Ediger & Akar, 2007), economics (Khashei, Rafiei, & Bijari, 2013), tourism (Chu, 2008), and health (Yu, Kim, & Kim, 2013). ARIMA assumes that a given time series is linear (Box & Jenkins, 1976). However, time series data in real-world settings commonly have nonlinear features (Lee & Tong, 2012; Liu & Wang, 2014a, 2014b; Matias & Reboredo, 2012). Consequently, ARIMA may be unsuitable for most nonlinear real-world problems (Khashei, Bijari, & Ardali, 2009; Zhang, Patuwo, & Hu, 1998). Artificial neural networks (ANNs) have therefore been extensively studied and used in time series forecasting (Adebiyi, Adewumi, & Ayo, 2014; Bennett, Stewart, & Beal, 2013; Geem & Roper, 2009; Zhang, Patuwo, & Hu, 2001; Zhang & Qi, 2005); Zhang et al. (1998) presented a review of ANNs for forecasting. The advantages of ANNs are their flexible nonlinear modeling capability, strong adaptability, and their learning and massively parallel computing abilities (Ticknor, 2013). Specifying a particular model form is unnecessary for ANNs; the model is instead formed adaptively based on the features presented by the data. This data-driven approach suits many empirical data sets for which no theoretical guidance is available to suggest an appropriate data-generating process. The feedforward neural network is the most widely used type of ANN, and the back propagation neural network (BPNN) is one of the most widely used feedforward networks (Wang, Zeng, Zhang, Huang, & Bao, 2006). BPNN, also known as the error back propagation network, is a multilayer mapping network that propagates error backward while information is transmitted forward. A BPNN with a single hidden layer can generally approximate any nonlinear function with arbitrary precision (Aslanargun, Mammadov, Yazici, & Yolacan, 2007). This feature makes BPNN popular for predicting complex nonlinear systems.

BPNN is well known for its back propagation learning algorithm, a supervised learning algorithm of gradient descent or a variant of it (Zhang et al., 1998). The connection weights and thresholds of the network are randomly initialized first. Then, using the training sample, they are adjusted through gradient descent so as to minimize the mean square error (MSE) between the network output values and the actual values. When the MSE reaches the goal setting, the connection weights and thresholds are fixed and the training process is finished. One flaw of this learning algorithm, however, is that the final training result depends to a large extent on the initial connection weights and thresholds. Hence, training easily falls into a local minimum rather than the global optimum, and the network cannot forecast precisely. To overcome this shortcoming, many researchers have proposed different methods to optimize the initial connection weights and thresholds of traditional BPNN.
Yam and Chow (2000) proposed a linear algebraic method to select the initial connection weights and thresholds of BPNN. Intelligent evolution algorithms, such as the genetic algorithm (GA) (Irani & Nasimi, 2011) and particle swarm optimization (PSO) (Zhang, Zhang, Lok, & Lyu, 2007), have also been used for this purpose. The resulting models are superior to traditional BPNN models in terms of convergence speed or prediction accuracy.

As a novel evolutionary computational technique, the differential evolution (DE) algorithm performs better than other popular intelligent algorithms, such as GA and PSO, on 34 widely used benchmark functions (Vesterstrom & Thomsen, 2004). Compared with those algorithms, DE has less complex genetic operations because of its simple mutation operation and one-on-one competition survival strategy. DE can also use individual local information and population global information to search for the optimal solution (Wang, Fu, & Zeng, 2012; Wang, Qu, Chen, & Yan, 2013; Zeng, Wang, Xu, & Fu, 2014). DEs and improved DEs are among the best evolutionary algorithms in a variety of fields because of their easy implementation, quick convergence, and robustness (Onwubolu & Davendra, 2006; Qu, Wang, & Zeng, 2013; Wang, He, & Zeng, 2012). However, only a few researchers have used DE to select suitable initial connection weights and thresholds for BPNN in time series forecasting. Therefore, this study uses an adaptive DE (ADE) to select appropriate initial connection weights and thresholds for BPNN to improve its forecasting accuracy. Two real-life time series data sets with nonlinear and seasonally changing tendency features are employed to compare the forecasting performance of the proposed model with those of other forecasting models.

The remainder of this paper is organized as follows. Section 2 discusses the ADE–BPNN model, including the theory of BPNN in time series forecasting and the ADE process. Section 3 presents two numerical examples. Section 4 concludes the study.

2. BPNN with DE

2.1. BPNN for time series forecasting

A single hidden layer BPNN consists of an input layer, a hidden layer, and an output layer, as shown in Fig. 1. Adjacent layers are connected by weights, which are always distributed between −1 and 1. A systematic theory to determine the number of input nodes and hidden layer nodes is unavailable, although some heuristic approaches have been proposed by a number of researchers (Zhang & Subbarayan, 2002; Zhang et al., 1998). None of the choices, however, works efficiently for all problems. The most common means of determining the appropriate numbers of input and hidden nodes is experimentation or trial and error based on the minimum mean square error of the test data (Hosseini, Luo, & Reynolds, 2006).

Fig. 1. Single hidden layer BPNN structure.

In the current study, a single BP network is used for one-step-ahead forecasting. Several past observations are used to forecast the present value. That is, the input is y_{t-n}, y_{t-n+1}, ..., y_{t-2}, y_{t-1}, and y_t is the target output. The input and output values of the hidden layer are represented as Eqs. (1) and (2), respectively; the input and output values of the output layer are represented as Eqs. (3) and (4), respectively:

I_j = \sum_{i=1}^{n} w_{ji} y_{t-i} + \beta_j \quad (j = 1, \ldots, h),   (1)

y_j = f_h(I_j) \quad (j = 1, \ldots, h),   (2)

I_o = \sum_{j=1}^{h} w_{oj} y_j + \alpha_o \quad (o = 1),   (3)

\hat{y}_t = f_o(I_o) \quad (o = 1),   (4)

where I denotes an input; y denotes an output; \hat{y}_t is the forecasted value of point t; n and h denote the numbers of input layer nodes and hidden layer nodes, respectively; w_{ji} denotes the connection weights between the input and hidden layers; and w_{oj} denotes the connection weights between the hidden and output layers. \beta_j and \alpha_o are the threshold values of the hidden and output layers, respectively, which are always distributed between −1 and 1. f_h and f_o are the activation functions of the hidden and output layers, respectively. Generally, the activation function of each node in the same layer is the same. The most widely used activation function for the output layer is the linear function, because a nonlinear activation function may introduce distortion into the predicted output. The logistic and hyperbolic tangent functions are frequently used as hidden layer transfer functions (Zhang et al., 1998).
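The forward pass of Eqs. (1)–(4) is compact enough to state directly in code. The following is a minimal NumPy sketch (an illustration, not the authors' implementation); it assumes the logistic function for f_h and the identity for f_o, the common choices named above:

import numpy as np

def bpnn_forward(y_lags, w_ji, beta, w_oj, alpha):
    """One-step-ahead forecast from the last n observations, Eqs. (1)-(4).

    y_lags : shape (n,), where y_lags[i-1] corresponds to y_{t-i}
    w_ji   : shape (h, n), input-to-hidden connection weights
    beta   : shape (h,), hidden layer thresholds
    w_oj   : shape (h,), hidden-to-output connection weights
    alpha  : scalar, output layer threshold
    """
    I_hidden = w_ji @ y_lags + beta              # Eq. (1)
    y_hidden = 1.0 / (1.0 + np.exp(-I_hidden))   # Eq. (2), logistic f_h
    I_out = w_oj @ y_hidden + alpha              # Eq. (3)
    return I_out                                 # Eq. (4), linear f_o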

2.2. DE and ADE

2.2.1. Standard DE

The standard DE consists of four main operations: initialization, mutation, crossover, and selection. Details are as follows:

(1) Initialization: Real number coding is used for the DE. In this operation, several parameters, including the population size N, the length of the chromosome D, the scaling factor F, the crossover rate CR, and the range of gene values [U_min, U_max], are initialized. The population is randomly initialized with Eq. (5):

x_{ij} = U_{\min} + \text{rand} \cdot (U_{\max} - U_{\min}),   (5)

where i = 1, 2, ..., N, j = 1, 2, ..., D, and rand is a random number with a uniform probability distribution.

(2) Mutation: For each objective individual x_i^G, i = 1, 2, ..., N, the standard DE algorithm generates a corresponding mutated individual, which is expressed by Eq. (6):

v_i^{G+1} = x_{r1}^{G} + F \cdot (x_{r2}^{G} - x_{r3}^{G}),   (6)

where the individual serial numbers r1, r2, and r3 are distinct and randomly generated, and none of them is identical to the objective individual serial number i. Therefore, the population size N ≥ 4. The scaling factor F, which controls the degree of mutation, is within the range [0, 2], as mentioned in the literature (Cui, Wang, & Deng, 2014; Storn & Price, 1997; Wang, Fu, & Zeng, 2012).
(3) Crossover: The crossover operation, shown in Eq. (7), generates an experimental individual:

u_{ij}^{G+1} = \begin{cases} v_{ij}^{G+1}, & \text{if } r(j) \le CR \text{ or } j = rn(i), \\ x_{ij}^{G}, & \text{otherwise}, \end{cases}   (7)

where r(j) is a random number from the uniform distribution on [0, 1], and j denotes the jth gene of an individual. The crossover rate CR is within the range [0, 1] and has to be determined by the user. The randomly generated number rn(i) ∈ {1, 2, ..., D} is a gene index; it ensures that at least one dimension of the experimental individual comes from the mutated individual. Eq. (7) shows that the smaller CR is, the better the global search effect.

(4) Selection: A greedy search strategy is adopted by the DE. Each objective individual x_i^G has to compete with its corresponding experimental individual u_i^{G+1}, which is generated after the mutation and crossover operations. When the fitness value of the experimental individual u_i^{G+1} is better than that of the objective individual x_i^G, u_i^{G+1} is chosen as the offspring; otherwise, x_i^G directly becomes the offspring. Taking a minimization problem as an example, the selection method is shown in Eq. (8), where f is the fitness function, such as a cost or forecasting error function:

x_i^{G+1} = \begin{cases} u_i^{G+1}, & \text{if } f(u_i^{G+1}) < f(x_i^{G}), \\ x_i^{G}, & \text{otherwise}. \end{cases}   (8)
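Taken together, operations (1)–(4) define one DE generation. The following Python sketch illustrates a standard DE/rand/1/bin minimization loop under the conventions of Eqs. (5)–(8); it is an illustration of the textbook algorithm, not the authors' code, and the default parameter values are placeholders:

import numpy as np

rng = np.random.default_rng(0)

def de_minimize(f, D, N=50, F=0.5, CR=0.1, U_min=-1.0, U_max=1.0, gen_max=50):
    # Initialization, Eq. (5)
    pop = U_min + rng.random((N, D)) * (U_max - U_min)
    fit = np.array([f(x) for x in pop])
    for _ in range(gen_max):
        for i in range(N):
            # Mutation, Eq. (6): three distinct indices, all different from i
            r1, r2, r3 = rng.choice([k for k in range(N) if k != i], 3, replace=False)
            v = pop[r1] + F * (pop[r2] - pop[r3])
            # Crossover, Eq. (7): at least one gene comes from the mutant
            cross = rng.random(D) <= CR
            cross[rng.integers(D)] = True
            u = np.where(cross, v, pop[i])
            # Selection, Eq. (8): greedy one-to-one replacement
            fu = f(u)
            if fu < fit[i]:
                pop[i], fit[i] = u, fu
    return pop[np.argmin(fit)], float(fit.min())

# Example: minimize the sphere function in 5 dimensions
best_x, best_f = de_minimize(lambda x: float(np.sum(x**2)), D=5)

With the fixed F replaced by a per-generation schedule, this same loop becomes the ADE variant described next.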
2.2.2. ADE

The mutation factor F determines the scaling ratio of the differential vector. If F is too large, the efficiency of the DE is low; that is, the global optimal solution acquired by the DE exhibits low accuracy. By contrast, if F is too small, the diversity of the population is not ensured, and the algorithm matures early. Consequently, we propose the adaptive mutation factor shown in Eq. (9). F changes as the algorithm iterates: it is large during the initial stage, which guarantees the diversity of the population, while the smaller mutation factor during the later stage retains the excellent individuals:

F = F_{\min} + (F_{\max} - F_{\min}) \cdot e^{\,1 - \frac{GenM}{GenM + 1 - G}},   (9)

where F_min denotes the minimum value of the mutation factor, F_max denotes the maximum value, GenM is the maximum iteration number, and G is the present iteration number.
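For illustration, Eq. (9) in code (a sketch): F equals F_max at the first generation, since the exponent is zero, and decays toward F_min as G approaches GenM:

import math

def adaptive_F(G, gen_max, F_min=0.2, F_max=0.9):
    """Adaptive mutation factor of Eq. (9)."""
    return F_min + (F_max - F_min) * math.exp(1.0 - gen_max / (gen_max + 1.0 - G))

# adaptive_F(1, 50)  -> 0.9   (exponent is 0 at G = 1)
# adaptive_F(50, 50) -> ~0.2  (exponent is 1 - gen_max, strongly negative)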
2.3. ADE for selecting the initial weights and thresholds of BPNN

2.3.1. Rationale for using ADE

Unlike other evolutionary algorithms such as GA and PSO, DE generates offspring by perturbing the solutions with a scaled difference of two randomly selected population vectors, instead of recombining the solutions under conditions imposed by a probabilistic scheme (Neri & Tirronen, 2010). In addition, DE employs a one-to-one spawning logic that allows the replacement of an individual only if the offspring outperforms its corresponding parent. DE became very popular almost immediately after its original definition because of its simplicity, reliability, and high performance, whereas GA and PSO are often only capable of identifying the high-performance region in affordable time and display inherent difficulties in performing local search for complex optimization problems (Wang, He, & Zeng, 2012).

Many decision variables must be optimized in training the BPNN, which can easily fall into a local minimum in the time series forecasting problem. As discussed above, the ADE with an adaptive mutation factor achieves a good balance between global search and local search and has the apparent merits of global convergence ability and ease of implementation. A BPNN supported by ADE can thus achieve satisfactory performance by avoiding local minima and slow convergence, which makes the BPNN optimized by ADE a good choice for time series forecasting.

2.3.2. ADE–BPNN

The initial connection weights and thresholds of BPNN are selected by combining ADE with BPNN. The ADE is used to search preliminarily for the globally optimal connection weights and thresholds of BPNN. The optimal results of this step are then assigned to the initial connection weights and thresholds of BPNN. Therefore, each individual in the ADE corresponds to the initial connection weights and thresholds of BPNN, as shown in Fig. 2.

Fig. 2. Structure of an individual: the input–hidden weights (w_11, w_12, ..., w_hn), the hidden–output weights (w_11, ..., w_1h), the hidden layer thresholds (β_1, ..., β_h), and the output layer threshold (α_1); D = h * n + 1 * h + h + 1.

The dimension number D is identical to the sum of the numbers of weights and thresholds, that is, h * n + o * h + h + o, where n, h, and o denote the numbers of input layer nodes, hidden layer nodes, and output layer nodes, respectively. In the one-step-ahead forecasting problem, o = 1. For the BPNN, the search space for connection weights and thresholds is within the range [−1, 1]. The BPNN uses the Levenberg–Marquardt (LM) method to search locally for the optimal connection weights and thresholds. The forecasting model is thereby determined.

A group of weights and thresholds is obtained from each ADE iteration, and an output value \hat{y}_t (t = 1, 2, ..., k, where k is the number of predictions) is generated based on this group of weights and thresholds. The difference between the output value \hat{y}_t and the actual value y_t is used as the fitness. In general, the mean square error (MSE) or the mean absolute percentage error (MAPE), given by Eqs. (10) and (11), respectively, is chosen as the fitness function:

MSE = \frac{\sum_{t=1}^{k} (y_t - \hat{y}_t)^2}{k},   (10)

MAPE = \frac{\sum_{t=1}^{k} |\hat{y}_t - y_t| / y_t}{k},   (11)

The flowchart of the proposed ADE–BPNN is shown in Fig. 3, and the procedure is as follows.

Fig. 3. The flowchart of the ADE–BPNN algorithm.

Step 1: Initialization. The parameters, namely, the population size, maximum iteration number, minimum and maximum mutation factors, crossover factor, and gene range, are set. The initial population is generated by using Eq. (5).
Step 2: The iteration is assessed to determine whether it is completed. If the present smallest fitness value reaches the accuracy requirement μ or G equals the maximum iteration number, the ADE iteration is stopped and the optimum individual is acquired; otherwise, the procedure proceeds to the next step.
Step 3: The offspring individual x_i^{G+1} is generated according to the adaptive mutation, crossover, and selection methods.
Step 4: Step 3 is repeated and the offspring population is generated.
Step 5: The fitness values of the offspring population are evaluated. The smallest fitness value is the present optimal value, and the corresponding individual is the present global best individual.
Step 6: Set G = G + 1. Return to Step 2.
Step 7: The optimum individual from ADE is assigned as the initial connection weights and thresholds of BPNN. The network is trained with the training sample, and thus the best-fitting network is created.
Step 8: The network is applied to forecast the test sample.
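The two-stage procedure of Steps 1–8 can be outlined end to end in code. The sketch below is illustrative rather than the authors' MATLAB implementation: it reuses bpnn_forward and de_minimize from the earlier sketches, assumes the individual is decoded in the Fig. 2 layout, and leaves the Levenberg–Marquardt refinement of Step 7 to a hypothetical train_bpnn routine:

import numpy as np

def decode(theta, n, h):
    # Fig. 2 layout (assumed ordering): D = h*n + 1*h + h + 1 genes
    w_ji = theta[:h * n].reshape(h, n)         # input-hidden weights
    w_oj = theta[h * n:h * n + h]              # hidden-output weights
    beta = theta[h * n + h:h * n + 2 * h]      # hidden layer thresholds
    alpha = theta[-1]                          # output layer threshold
    return w_ji, beta, w_oj, alpha

def fitness(theta, X_train, y_train, n, h):
    # MSE of one-step-ahead forecasts over the training sample, Eq. (10)
    w_ji, beta, w_oj, alpha = decode(theta, n, h)
    preds = np.array([bpnn_forward(x, w_ji, beta, w_oj, alpha) for x in X_train])
    return float(np.mean((y_train - preds) ** 2))

# Steps 1-6: ADE searches the [-1, 1] box for good initial weights/thresholds.
# theta0, _ = de_minimize(lambda th: fitness(th, X_train, y_train, n, h),
#                         D=h * n + 2 * h + 1, U_min=-1.0, U_max=1.0)
# Steps 7-8: Levenberg-Marquardt training starts from theta0, then the trained
# network forecasts the test sample (train_bpnn is a hypothetical stand-in).
# net = train_bpnn(theta0, X_train, y_train)
# y_hat = net.predict(X_test)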

3. Numerical examples and results analysis

Two real-life cases are considered to verify the feasibility and effectiveness of the proposed ADE–BPNN model. The first case is the historical monthly electric load data of Northeast China from January 2004 to April 2009 (Wang, Zhu, Zhang, & Sun, 2009). The second case is a well-known data set, namely, the Canadian lynx data set, which consists of 114 annual observations from 1821 to 1934 (Campbell & Walker, 1977). The two data sets have nonlinear features such as fluctuation and cyclic tendency and have thus been analyzed extensively in several time series studies that focus on nonlinear modeling (Ju & Hong, 2013; Khashei & Bijari, 2012; Zhang, 2003; Zhang et al., 2012). BPNN has the advantages of flexible nonlinear modeling capability, strong adaptability, and learning and massively parallel computing abilities, so the two cases are suitable for verifying the feasibility and effectiveness of BPNN and ADE–BPNN. One-step-ahead forecasting is considered in both cases.

3.1. Comparative example 1: electric load data forecasts in Northeast China

3.1.1. Data preprocessing

The electric load data consist of 64 monthly observations. For a fair comparison with the study of Zhang et al. (2012), the current research uses only 53 load data, as shown in Fig. 4.


Fig. 4. The monthly electricity demand in Northeast China, in hundred million kWh (from Dec. 2004 to Apr. 2009).

The data set is divided into two subsets: the training set (the former 87%, from December 2004 to September 2008) and the test set (the latter 13%, from October 2008 to April 2009). The data are preprocessed by logarithms (to the base 10) for a good fit. To be commensurate with the limits of the activation function logsig, the data sets are linearly scaled to the range [0, 1] through the linear transformation of Eq. (12):

x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}},   (12)

where x_min and x_max are the minimum and maximum values of the series, respectively. The main advantage of scaling is to prevent attributes in wider numeric ranges from dominating those in smaller numeric ranges.
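A brief sketch of this preprocessing chain (base-10 logarithm followed by the min–max scaling of Eq. (12)), together with the inverse transform needed to report forecasts in the original units:

import numpy as np

def preprocess(series):
    """log10 then min-max scale to [0, 1], Eq. (12)."""
    logged = np.log10(series)
    x_min, x_max = logged.min(), logged.max()
    scaled = (logged - x_min) / (x_max - x_min)
    return scaled, (x_min, x_max)

def postprocess(scaled, bounds):
    """Invert Eq. (12) and the log10 transform."""
    x_min, x_max = bounds
    return 10 ** (scaled * (x_max - x_min) + x_min)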

3.1.2. Performance assessment

Several methods can be employed to measure the accuracy of a time series forecasting model. Here, the forecasting accuracy is examined by calculating three frequently used evaluation metrics: the root mean square error (RMSE), the mean absolute percentage error (MAPE), and the mean absolute error (MAE). The expression for MAPE is given in Eq. (11). The RMSE and MAE are expressed as follows:

RMSE = \sqrt{\frac{\sum_{t=1}^{k} (y_t - \hat{y}_t)^2}{k}},   (13)

MAE = \frac{\sum_{t=1}^{k} |\hat{y}_t - y_t|}{k},   (14)

where \hat{y}_t and y_t are the predicted value and actual value, respectively, and k is the number of predictions.

3.1.3. Parameter setting

The proposed ADE–BPNN is programmed in MATLAB 2012. To demonstrate the contribution of ADE, the basic BPNN is also used to forecast the load series. According to a series of experiments performed on the BPNN structure, satisfactory BPNN performance is achieved when the 12 load data y_{t-12}, y_{t-11}, ..., y_{t-2}, y_{t-1} are entered into the BPNN model with the default parameters to forecast the current load y_t and the number of hidden layer nodes is 4. Therefore, the best-fitting network is composed of 12 input, 4 hidden, and 1 output neurons (N12-4-1).

Following Zhang et al. (1998), the most appropriate BPNN parameters are determined after several trials using the same network structure. The maximum training number is 2000; the goal error of training, for which MSE is used in the aforementioned LM learning method, is set to 0.005; the learning rate is 0.01; and the activation functions of the hidden and output layers are the logsig and purelin functions, respectively. The training function is trainlm, which implements the LM learning method. The range of connection weights and thresholds is specified within [−1, 1].

Based on the literature (Onwubolu & Davendra, 2006; Qu et al., 2013) and several trials, the ADE parameters are specified as follows: the population size N is set to 50; the maximum iteration number GenM is set to 50; the accuracy requirement μ is set to 0.19; and the crossover rate CR is 0.1. F_min and F_max of ADE are set according to the recommendations of Neri and Tirronen (2010), Qu et al. (2013), and Wang, Qu, Liu, and Chen (2014), i.e., F_min = 0.2 and F_max = 0.9. MAPE is set as the fitness function based on the study of Zhang et al. (2012). To demonstrate the effectiveness of ADE, the standard DE and GA are also selected to optimize BPNN. The parameters of the standard DE are the same as those of ADE, except that the scaling factor is set to 0.2. For a fair comparison, the population size, stopping criterion, and fitness function of GA are set the same as those of ADE. According to the experience of Chen et al. (2014) and Wang (2013) and the results of ten rounds of tuning, the most satisfactory GA–BPNN forecasting model is obtained when the crossover rate is 0.8 and the mutation rate is 0.05.

3.1.4. Results and discussion

The iteration processes of ADE–BPNN, DE–BPNN, and GA–BPNN, each of which includes two parts, are shown in Fig. 5. Fig. 5(a) shows that the ADE iteration stops when the iteration number reaches GenM (50). The procedure then passes to the BPNN part. After 4 epochs, the training error (MSE) of the training sample decreases to 0.0037, which is smaller than the goal error of training (0.005). As a result, the training process of BPNN ends and the best forecasting model is obtained. Similarly, the best forecasting models of DE–BPNN and GA–BPNN are obtained, and their iteration processes are shown in Fig. 5(b) and (c), respectively.

The test sample is forecasted with the best forecasting model. The actual values and the out-of-sample forecasting loads obtained by the different forecasting models, including the BPNN, DE–BPNN, GA–BPNN, ADE–BPNN, ARIMA (1, 1, 1), TF-ε-SVR-SA (with C = 100 and ε = 0.2), and SSVRCGASA (with C = 5045.1, d = 51.208, and ε = 21.623) models, are reported in Table 1. The forecasting values of the ARIMA (1, 1, 1), TF-ε-SVR-SA, and SSVRCGASA models were obtained from previous studies (Wang et al., 2009; Zhang et al., 2012). In the TF-ε-SVR-SA model, a trend-fixed and seasonal adjustment mechanism is combined with support vector regression (SVR) to improve the forecasting accuracy. In the SSVRCGASA model, the chaotic genetic algorithm–simulated annealing algorithm (CGASA) is first used to determine the SVR model's three parameters (penalty parameter C, RBF kernel parameter d, and width of loss function ε), and a seasonal mechanism is then used to adjust the cyclic load tendency. More details on the application of these methods can be found in the aforementioned studies.
Fig. 5. The iteration processes of (a) ADE–BPNN, (b) DE–BPNN, and (c) GA–BPNN. Each panel pairs the evolutionary iteration process (fitness function MAPE versus generation) with the subsequent BPNN training process (training error MSE versus epoch, against the goal error).

Table 1 shows that the proposed ADE–BPNN model provides significantly better forecasts than its basis models and some other forecasting models. For example, the ADE–BPNN model exhibits a 71.46% and a 48.37% decrease in MAPE relative to the basis models ARIMA and BPNN, respectively; the ADE optimization clearly contributes significantly to this result. Moreover, the performance of ADE–BPNN is better than that of the other hybrid models, including TF-ε-SVR-SA, SSVRCGASA, GA–BPNN, and DE–BPNN.

The Wilcoxon signed-rank test (Diebold & Mariano, 2002) is implemented at the 0.025 and 0.05 significance levels in one-tailed tests to further validate the performance of the proposed ADE–BPNN. The Wilcoxon signed-rank test is one of the most commonly adopted tests for evaluating the predictive capabilities of two different models and for determining whether statistically significant differences exist among the results (Ju & Hong, 2013; Lu, Lee, & Chiu, 2009; Pai & Hong, 2005; Zhang et al., 2012). The performance metric MAPE is selected for this non-parametric comparison because the population distribution of MAPE is unknown. The comparison results among the models are shown in Table 2. The proposed ADE–BPNN achieves statistical significance compared with the other alternative models, particularly ARIMA, TF-ε-SVR-SA, BPNN, GA–BPNN, and DE–BPNN, at both the 0.025 and the 0.05 levels. Moreover, its performance is comparable to that of SSVRCGASA at the 0.05 level.
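The test itself is available off the shelf. A sketch with SciPy (here mape_a and mape_b are hypothetical arrays holding the paired per-point absolute percentage errors of two competing models):

from scipy.stats import wilcoxon

# One-tailed paired test: does the first model err less than the second?
# stat, p_value = wilcoxon(mape_a, mape_b, alternative='less')
# Significance at a given level (0.025 or 0.05) corresponds to p_value below it.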


Table 1
Forecasting results of the ARIMA, TF-ε-SVR-SA, SSVRCGASA, BPNN, GA–BPNN, DE–BPNN, and ADE–BPNN models (unit: hundred million kWh).

Time point (month) | Actual | ARIMA (1, 1, 1) | TF-ε-SVR-SA | SSVRCGASA | BPNN | GA–BPNN | DE–BPNN | ADE–BPNN
Oct. 2008 | 181.07 | 192.9316 | 184.5035 | 175.6385 | 190.7868 | 184.1345 | 185.2647 | 182.0502
Nov. 2008 | 180.56 | 191.127 | 190.3608 | 185.2100 | 190.5290 | 185.7915 | 183.7597 | 185.9895
Dec. 2008 | 189.03 | 189.9155 | 202.9795 | 189.9070 | 190.9704 | 187.3362 | 187.4669 | 188.5578
Jan. 2009 | 182.07 | 191.9947 | 195.7532 | 181.9693 | 191.0253 | 189.1063 | 185.7001 | 188.6273
Feb. 2009 | 167.35 | 189.9398 | 167.5795 | 163.2805 | 174.6066 | 181.4011 | 182.4213 | 173.3375
Mar. 2009 | 189.30 | 183.9876 | 185.9358 | 182.1747 | 186.0562 | 188.4219 | 186.1745 | 188.1399
Apr. 2009 | 175.84 | 189.3480 | 180.1648 | 177.6289 | 176.7288 | 182.8174 | 182.8586 | 176.6891
RMSE | | 12.3787 | 8.6167 | 4.1822 | 6.9870 | 6.9285 | 6.8622 | 3.9925
MAPE (%) | | 6.044 | 3.799 | 1.901 | 3.341 | 3.168 | 3.080 | 1.725
MAE | | 10.6641 | 6.9694 | 3.4347 | 5.9958 | 5.5618 | 5.4004 | 3.0623

Table 2
Wilcoxon signed-rank test results.

Compared models | α = 0.025, W = 2 | α = 0.05, W = 3
ADE–BPNN vs. ARIMA (1, 1, 1) | 0 (a) | 0 (a)
ADE–BPNN vs. TF-ε-SVR-SA | 1 (a) | 1 (a)
ADE–BPNN vs. SSVRCGASA | 3 | 3 (a)
ADE–BPNN vs. BPNN | 0 (a) | 0 (a)
ADE–BPNN vs. GA–BPNN | 2 (a) | 2 (a)
ADE–BPNN vs. DE–BPNN | 2 (a) | 2 (a)

(a) Denotes that the ADE–BPNN model significantly outperforms the other alternative model.

3.2. Comparative example 2: Canadian lynx series forecasts

The lynx series records the number of lynx trapped per year in the Mackenzie River district of northern Canada. The data are plotted in Fig. 6. Following other studies (Khashei & Bijari, 2012; Zhang, 2003), the logarithms (to the base 10) of the data are used in this study. The former 100 data (87.7%, 1821–1920) are designated as training data and are used for ADE optimization and BPNN training. The latter 14 data (12.3%, 1921–1934) are assigned as test data and are used to verify the effectiveness of the hybrid model. As in the studies of Zhang (2003) and Khashei and Bijari (2012), MSE and MAE are selected as the accuracy metrics for assessing the forecasting performance of the model.

Fig. 6. Canadian lynx series: number of lynx trapped per year (1821–1934).

Similar to the case in Section 3.1, the BPNN is also optimized by ADE, DE, and GA. The parameters for ADE are set as follows: N = 55, GenM = 50, μ = 0.005, F_min = 0.2, F_max = 0.9, and CR = 0.1. The fitness function is MSE. The parameters of the standard DE are the same as those of ADE, except that the scaling factor F is 0.2. Similarly, the parameters of GA are the same as those of ADE except for the crossover rate and mutation rate; to obtain the most satisfactory forecasting performance, these are set to 0.8 and 0.05, respectively, after ten tuning experiments.

According to previous studies (Khashei & Bijari, 2012; Zhang, 2003), the best-fitting network for the BPNN structure consists of seven input, five hidden, and one output neurons (N7-5-1). The most appropriate parameters for BPNN are obtained after several trials using the same network structure: the maximum training number is 2000; the goal error of training is 0.03; the learning rate is 0.001; and the activation functions of the hidden and output layers are the tansig and purelin functions, respectively. The training function is trainlm.

The forecasting results obtained from the proposed method and the actual values of the test logarithmic Canadian lynx data are plotted in Fig. 7. A comparison of the forecasting performance of the proposed model on the test logarithmic Canadian lynx data with that of other models, such as ARIMA, ANN, ARIMA–ANN hybrids (Khashei & Bijari, 2012; Zhang, 2003), DE–BPNN, and GA–BPNN, is shown in Table 3. The percentage improvements of the hybrid ADE–BPNN over the other forecasting models are summarized in Table 4.

Fig. 7. Results of ADE–BPNN, DE–BPNN, and GA–BPNN on the logarithmic test sample data set.

Table 3
Performance comparison of the ADE–BPNN with other forecasting models.

Model | MSE | MAE
ARIMA | 0.020486 | 0.112255
FFNN | 0.020466 | 0.112109
Zhang's hybrid ARIMA/ANNs model | 0.017233 | 0.103972
Khashei's ANN/PNN model | 0.014872 | 0.079628
Khashei's ARIMA/PNN model | 0.011461 | 0.084381
GA–BPNN | 0.013599 | 0.081477
DE–BPNN | 0.012899 | 0.080542
ADE–BPNN | 0.010392 | 0.070623

Table 4
Improvement of the hybrid ADE–BPNN in comparison with other forecasting models.

Model | MSE (%) | MAE (%)
ARIMA | 49.27 | 37.09
FFNNs | 49.22 | 37.01
Zhang's hybrid ARIMA/ANNs model | 39.70 | 32.07
Khashei's ANN/PNN model | 30.12 | 11.31
Khashei's ARIMA/PNN model | 9.33 | 16.30
GA–BPNN | 23.58 | 13.32
DE–BPNN | 19.44 | 12.32

The proposed ADE–BPNN for lynx data forecasting outperforms the basic methods (ANN and ARIMA), and its performance is comparable with those of the other hybrid models in terms of MAE and MSE, as shown in Tables 3 and 4.

4. Conclusions and future research

A hybrid forecasting model, called ADE–BPNN, which uses ADE to determine the initial weights and thresholds of the BPNN model, is proposed to improve the accuracy of BPNN in time series forecasting. ADE is adopted to explore the search space and detect potential regions. Two real-life cases are used to compare the forecasting performance of ADE–BPNN with those of other popular models and to verify the feasibility and effectiveness of the ADE optimization. The following conclusions are drawn.

(1) In comparative example 1, the historical monthly electric load data of Northeast China, comprising 64 observations, are employed. The data exhibit a strong growth trend and an obvious monthly cyclic tendency. The computational results show that the proposed ADE can effectively improve the forecasting accuracy of BPNN compared with the basic BPNN model. Meanwhile, the proposed ADE–BPNN outperforms three other models, namely, ARIMA (1, 1, 1), TF-ε-SVR-SA, and SSVRCGASA, in terms of MAPE and MAE. One reason for the superior performance of ADE–BPNN is that intelligent forecasting models based on BPNN exhibit good nonlinear fitting capacity; another is that ADE determines appropriate initial parameters for the BPNN model, which effectively improves forecasting accuracy.

(2) In comparative example 2, the Canadian lynx data, which consist of 114 annual observations with a yearly cyclic tendency, are used. The proposed ADE–BPNN model is superior to the existing basic models (ANN and ARIMA) and to some hybrid algorithms in the literature in terms of MSE and MAE.

Other intelligent algorithms, such as the quantum evolution algorithm and the genetic–simulated annealing algorithm (Li, Guo, Wang, & Fu, 2013), also show good performance in solving complex optimization problems. In the future, other advanced optimization algorithms can be used to select the most appropriate structure and parameters for the BPNN to handle complex forecasting problems for enterprises in the network economy era.

Acknowledgments

The authors are very grateful for the constructive comments of Professor Binshan Liu and the referees. This research is partially supported by the National Natural Science Foundation of China (71371080; 71373093), the Humanities and Social Sciences Foundation of the Chinese Ministry of Education (No. 11YJC630275), and the Fundamental Research Funds for the Central Universities (HUST: 2014QN201).

References

Adebiyi, A. A., Adewumi, A. O., & Ayo, C. K. (2014). Comparison of ARIMA and artificial neural networks models for stock price prediction. Journal of Applied Mathematics, 2(1), 1–7.
Aslanargun, A., Mammadov, M., Yazici, B., & Yolacan, S. (2007). Comparison of ARIMA, neural networks and hybrid models in time series: Tourist arrival forecasting. Journal of Statistical Computation and Simulation, 77(1), 29–53.
Bennett, C., Stewart, R. A., & Beal, C. D. (2013). ANN-based residential water end-use demand forecasting model. Expert Systems with Applications, 40(4), 1014–1023.
Box, G. E. P., & Jenkins, G. M. (1976). Time series analysis: Forecasting and control. San Francisco: Holden-Day.
Campbell, M. J., & Walker, A. M. (1977). A survey of statistical work on the Mackenzie River series of annual Canadian lynx trappings for the years 1821–1934 and a new analysis. Journal of the Royal Statistical Society (Series A), 140(4), 411–431.
Chen, G. Y., Fu, K. Y., Liang, Z. W., Sema, T., Li, C., Tontiwachwuthikul, P., et al. (2014). The genetic algorithm based back propagation neural network for MMP prediction in CO2-EOR process. Fuel, 126, 202–212.
Chu, F. L. (2008). A fractionally integrated autoregressive moving average approach to forecasting tourism demand. Tourism Management, 29(1), 79–88.
Cui, L. G., Wang, L., & Deng, J. (2014). RFID technology investment evaluation model for the stochastic joint replenishment and delivery problem. Expert Systems with Applications, 41(4), 1792–1805.
Diebold, F. X., & Mariano, R. S. (2002). Comparing predictive accuracy. Journal of Business & Economic Statistics, 20(1), 134–144.
Ediger, V. S., & Akar, S. (2007). ARIMA forecasting of primary energy demand by fuel in Turkey. Energy Policy, 35(3), 1701–1708.
Geem, Z. W., & Roper, W. E. (2009). Energy demand estimation of South Korea using artificial neural network. Energy Policy, 37(10), 4049–4054.
Hosseini, H. G., Luo, D., & Reynolds, K. J. (2006). The comparison of different feed forward neural network architectures for ECG signal diagnosis. Medical Engineering and Physics, 28(4), 372–378.
Irani, R., & Nasimi, R. (2011). Evolving neural network using real coded genetic algorithm for permeability estimation of the reservoir. Expert Systems with Applications, 38(8), 9862–9866.
Ju, F. Y., & Hong, W. C. (2013). Application of seasonal SVR with chaotic gravitational search algorithm in electricity forecasting. Applied Mathematical Modelling, 37(23), 9643–9651.
Khashei, M., & Bijari, M. (2012). A new class of hybrid models for time series forecasting. Expert Systems with Applications, 39(4), 4344–4357.
Khashei, M., Bijari, M., & Ardali, G. A. R. (2009). Improvement of auto-regressive integrated moving average models using fuzzy logic and artificial neural networks (ANNs). Neurocomputing, 72(4), 956–967.
Khashei, M., Rafiei, F. M., & Bijari, M. (2013). Hybrid fuzzy auto-regressive integrated moving average (FARIMAH) model for forecasting the foreign exchange markets. International Journal of Computational Intelligence Systems, 6(5), 954–968.
Kumar, K., & Jain, V. K. (1999). Autoregressive integrated moving averages (ARIMA) modeling of a traffic noise time series. Applied Acoustics, 58(3), 283–294.
Lee, Y. S., & Tong, L. I. (2012). Forecasting nonlinear time series of energy consumption using a hybrid dynamic model. Applied Energy, 94, 251–256.
Li, Y. H., Guo, H., Wang, L., & Fu, J. (2013). A hybrid genetic-simulated annealing algorithm for the location-inventory-routing problem considering returns under E-supply chain environment. The Scientific World Journal, 2013, Article ID 125893, 10 pages. http://dx.doi.org/10.1155/2013/125893.
Liu, S., & Wang, L. (2014a). Understanding the impact of risks on performance in internal and outsourced information technology projects: The role of strategic importance. International Journal of Project Management. http://dx.doi.org/10.1016/j.ijproman.2014.01.012.


Liu, S., & Wang, L. (2014b). User liaisons' perspective on behavior and outcome control in IT projects: Role of IT experience, behavior observability, and outcome measurability. Management Decision, 52(6), 1148–1173.
Lu, C. J., Lee, T. S., & Chiu, C. C. (2009). Financial time series forecasting using independent component analysis and support vector regression. Decision Support Systems, 47(2), 115–125.
Matias, J. M., & Reboredo, J. C. (2012). Forecasting performance of nonlinear models for intraday stock returns. Journal of Forecasting, 31(2), 172–188.
Neri, F., & Tirronen, V. (2010). Recent advances in differential evolution: A survey and experimental analysis. Artificial Intelligence Review, 33(1–2), 61–106.
Onwubolu, G., & Davendra, D. (2006). Scheduling flow shops using differential evolution algorithm. European Journal of Operational Research, 171(2), 674–692.
Pai, P. F., & Hong, W. C. (2005). Support vector machines with simulated annealing algorithms in electricity load forecasting. Energy Conversion and Management, 46(17), 2669–2688.
Qu, H., Wang, L., & Zeng, Y. R. (2013). Modeling and optimization for the joint replenishment and delivery problem with heterogeneous items. Knowledge-Based Systems, 54, 207–215.
Storn, R., & Price, K. (1997). Differential evolution – A simple and efficient heuristic for global optimization over continuous spaces. Journal of Global Optimization, 11(4), 341–359.
Ticknor, J. L. (2013). A Bayesian regularized artificial neural network for stock market forecasting. Expert Systems with Applications, 40(14), 5501–5506.
Vesterstrom, J., & Thomsen, R. (2004). A comparative study of differential evolution, particle swarm optimization, and evolutionary algorithms on numerical benchmark problems. Proceedings of IEEE Congress on Evolutionary Computation, 2, 1980–1987.
Wang, S. T. (2013). Optimized light guide plate optical brightness parameter: Integrating back-propagation neural network (BPN) and revised genetic algorithm (GA). Materials and Manufacturing Processes, 29(1), 1–8.
Wang, L., Fu, Q. L., & Zeng, Y. R. (2012). Continuous review inventory models with a mixture of backorders and lost sales under fuzzy demand and different decision situations. Expert Systems with Applications, 39(4), 4181–4189.
Wang, L., He, J., & Zeng, Y. R. (2012). A differential evolution algorithm for joint replenishment problem using direct grouping and its application. Expert Systems, 29(5), 429–441.
Wang, L., Qu, H., Chen, T. S., & Yan, F. P. (2013). An effective hybrid self-adapting differential evolution algorithm for the joint replenishment and location-inventory problem in a three-level supply chain. The Scientific World Journal, 2013, Article ID 270249, 11 pages. http://dx.doi.org/10.1155/2013/270249.
Wang, L., Qu, H., Liu, S., & Chen, C. (2014). Optimizing the joint replenishment and channel coordination problem under supply chain environment using a simple and effective differential evolution algorithm. Discrete Dynamics in Nature and Society, 2014, Article ID 709856, 12 pages. http://dx.doi.org/10.1155/2014/709856.
Wang, L., Zeng, Y. R., Zhang, J. L., Huang, W., & Bao, Y. K. (2006). The criticality of spare parts evaluating model using an artificial neural network approach. Lecture Notes in Computer Science, 3991, 728–735.
Wang, J., Zhu, W., Zhang, W., & Sun, D. H. (2009). A trend fixed on firstly and seasonal adjustment model combined with the ε-SVR for short-term forecasting of electricity demand. Energy Policy, 37(11), 4901–4909.
Yam, J. Y. F., & Chow, T. W. S. (2000). A weight initialization method for improving training speed in feedforward neural network. Neurocomputing, 30(1), 219–232.
Yu, H. K., Kim, N. Y., & Kim, S. S. (2013). Forecasting the number of human immunodeficiency virus infections in the Korean population using the autoregressive integrated moving average model. Osong Public Health and Research Perspectives, 4(6), 358–362.
Zeng, Y. R., Wang, L., Xu, X. H., & Fu, Q. L. (2014). Optimizing the joint replenishment and delivery scheduling problem under fuzzy environment using inverse weight fuzzy nonlinear programming method. Abstract and Applied Analysis, 2014, Article ID 904240, 13 pages. http://dx.doi.org/10.1155/2014/904240.
Zhang, G. P. (2003). Time series forecasting using a hybrid ARIMA and neural network model. Neurocomputing, 50, 159–175.
Zhang, W. Y., Hong, W. C., Dong, Y., Tsai, G., Sung, J. T., & Fan, G. F. (2012). Application of SVR with chaotic GASA algorithm in cyclic electric load forecasting. Energy, 45(1), 850–858.
Zhang, G. P., Patuwo, B. E., & Hu, M. Y. (1998). Forecasting with artificial neural networks: The state of the art. International Journal of Forecasting, 14(1), 35–62.
Zhang, G. P., Patuwo, B. E., & Hu, M. Y. (2001). A simulation study of artificial neural networks for nonlinear time-series forecasting. Computers & Operations Research, 28(4), 381–396.
Zhang, G. P., & Qi, M. (2005). Neural network forecasting for seasonal and trend time series. European Journal of Operational Research, 160(2), 501–514.
Zhang, L., & Subbarayan, G. (2002). An evaluation of back-propagation neural networks for the optimal design of structural systems: Part I. Training procedures. Computer Methods in Applied Mechanics and Engineering, 191(25), 2873–2886.
Zhang, J. R., Zhang, J., Lok, T. M., & Lyu, M. R. (2007). A hybrid particle swarm optimization–back-propagation algorithm for feedforward neural network training. Applied Mathematics and Computation, 185(2), 1026–1037.
