A NN Model Based On The Multi Stage Optimization Approach
Expert Systems with Applications 33 (2007) 347-356
www.elsevier.com/locate/eswa
a,* School of Economics and Management, Beihang University, No. 37, Xue Yuan Road, HaiDian District, Beijing 100083, China
b Beijing Simulation Center, Beijing 100854, China
c Production Technical Institute of the General Logistics Department, Beijing 100010, China
Abstract
Many studies have demonstrated that back-propagation neural networks can be effectively used to uncover the nonlinearity in financial markets. Unfortunately, the back-propagation algorithm suffers from slow convergence, inefficiency, and lack of robustness. This paper introduces a multi-stage optimization approach (MSOA) used in the back-propagation algorithm for training a neural network to forecast the Chinese food grain price. We divide the training sample of the neural network into two parts, considering that the recent observations are more important than the older ones. First, we use the first training sample to train the neural network and obtain the network structure. Second, we continue with the second training sample to further optimize the structure of the neural network obtained in the previous step. Empirical results show that MSOA overcomes the weaknesses of the conventional BP algorithm to some extent. Furthermore, the neural network based on MSOA can improve the forecasting performance significantly in terms of the error and directional evaluation measurements. The paper also shows that accurate price estimation may not be a good predictor of the direction of change in price levels in the food market. The neural network based on MSOA can be used as an alternative method for future Chinese food price forecasting.
© 2006 Elsevier Ltd. All rights reserved.
Keywords: Artificial neural network; Back-propagation; Time series forecasting; Multi-stage optimization approach; Food price forecasting
1. Introduction
The autoregressive integrated moving average (ARIMA) model has been highly popularized, widely used, and successfully applied not only in economic time series forecasting, but also as a promising tool for modeling the empirical dependencies between successive times and failures (Ho & Xie, 1998). It also yields satisfactory performance. However, a linear correlation structure is assumed among the time series values; therefore, the ARIMA model cannot capture nonlinear patterns. The approximation of linear models to complex real-world problems is not always satisfactory.
2. Methodology
2.1. ARIMA time series model
Introduced by Box and Jenkins (1970), the ARIMA model has been one of the most popular approaches to forecasting. In an ARIMA model, the future value of a variable is assumed to be a linear combination of past values and past errors. Generally, a nonseasonal time series can be modeled as a combination of past values and errors, denoted ARIMA(p, d, q) and expressed in the following form:
X_t = θ0 + φ1 X_{t-1} + φ2 X_{t-2} + ... + φp X_{t-p} + e_t - θ1 e_{t-1} - θ2 e_{t-2} - ... - θq e_{t-q},   (2.1)
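As an illustrative sketch (not the authors' code), note that an ARIMA(1, 1, 0) model reduces to an AR(1) on the first-differenced series, so θ0 and φ1 can be estimated by ordinary least squares; the price series and function names below are hypothetical:

```python
# Illustrative sketch: ARIMA(1, 1, 0) is an AR(1) fitted to first differences.
# We estimate theta0 and phi1 by OLS on (dX_{t-1}, dX_t) pairs, then make a
# one-step forecast. Data and names are made up for illustration.

def fit_arima_110(series):
    """Estimate (theta0, phi1) of dX_t = theta0 + phi1 * dX_{t-1} + e_t."""
    dx = [series[i] - series[i - 1] for i in range(1, len(series))]
    x, y = dx[:-1], dx[1:]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    phi1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
           sum((xi - mx) ** 2 for xi in x)
    theta0 = my - phi1 * mx
    return theta0, phi1

def forecast_next(series, theta0, phi1):
    """One-step forecast: X_{t+1} = X_t + theta0 + phi1 * (X_t - X_{t-1})."""
    return series[-1] + theta0 + phi1 * (series[-1] - series[-2])

prices = [100.0, 102.0, 101.0, 103.0, 104.5, 104.0, 106.0, 107.5]
theta0, phi1 = fit_arima_110(prices)
print(round(forecast_next(prices, theta0, phi1), 3))
```

In practice a library estimator (e.g. maximum likelihood) would be used rather than this simple OLS fit, but the structure of the model is the same.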
absolute percentage error (MAPE). While the above criteria are good measures of the deviation of the predicted values from the actual values, they cannot reflect a model's ability to predict turning points. Direction-of-change forecasts are often used in financial and economic decision-making. How to evaluate the power of the models in predicting turning points is another important task. For traders and analysts, market direction and turning points are as important as the value forecast itself.
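The level-error criteria named above can be computed directly; a minimal stdlib sketch (the example values are illustrative, not from the paper):

```python
# Minimal sketch of the level-error criteria discussed above: MAE, MSE, MAPE.

def mae(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mse(actual, predicted):
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def mape(actual, predicted):
    # Assumes actual values are nonzero (true for a price series).
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

actual = [100.0, 102.0, 101.0, 105.0]
predicted = [101.0, 101.0, 103.0, 104.0]
print(mae(actual, predicted), mse(actual, predicted), round(mape(actual, predicted), 3))
```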
The ability of a model to forecast turning points can be measured by an evaluation method advanced by Cumby and Modest (1987), which is a version of Merton's test (Merton, 1981). Merton's test is as follows: define a forecast variable F_t and an actual direction A_t such that:
A_t = 1 if ΔA_t > 0,  A_t = 0 if ΔA_t ≤ 0,   (3.1)

and

F_t = 1 if ΔP_t > 0,  F_t = 0 if ΔP_t ≤ 0,   (3.2)
where ΔA_t is the amount of change in the actual variable between time t-1 and t, and ΔP_t is the amount of change in the forecast variable over the same period.
The conditional probabilities of the forecast direction F_t given the direction of change in the actual value are
p_1 = Prob(F_t = 0 | A_t = 0),        (3.3)
1 - p_1 = Prob(F_t = 1 | A_t = 0),    (3.4)
p_2 = Prob(F_t = 1 | A_t = 1),        (3.5)
1 - p_2 = Prob(F_t = 0 | A_t = 1).    (3.6)

The null and alternative hypotheses of the test are

H_0: p_1 + p_2 - 1 ≤ 0   and          (3.7)
H_1: p_1 + p_2 - 1 > 0.               (3.8)

The test can be carried out by estimating the linear regression

A_t = a_0 + a_1 F_t + e_t,            (3.9)

where F_t is the predicted price direction binary variable defined in (3.2), A_t is the actual price direction binary variable defined in (3.1), a_1 is the slope of this linear equation, and e_t is the error term, with

a_1 = p_1 + p_2 - 1.                  (3.10)
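A hedged sketch of the test's building blocks: the direction variables of (3.1)-(3.2) and the conditional probabilities p_1, p_2, whose sum exceeding one indicates directional forecasting power. The toy change series are illustrative:

```python
# Sketch of Merton's direction variables and conditional probabilities.
# Assumes both up and down moves occur in the actual series.

def direction(changes):
    return [1 if d > 0 else 0 for d in changes]

def merton_statistics(actual_changes, forecast_changes):
    A = direction(actual_changes)
    F = direction(forecast_changes)
    # p1 = Prob(F_t = 0 | A_t = 0): correctly called down moves.
    # p2 = Prob(F_t = 1 | A_t = 1): correctly called up moves.
    downs = [f for a, f in zip(A, F) if a == 0]
    ups = [f for a, f in zip(A, F) if a == 1]
    p1 = downs.count(0) / len(downs)
    p2 = ups.count(1) / len(ups)
    return p1, p2, p1 + p2 - 1  # third value is a1 of the regression

p1, p2, a1 = merton_statistics([1, -2, 3, -1, 2, 2], [2, -1, 1, 1, 3, -1])
print(p1, p2, a1)
```

In the full Cumby-Modest procedure, a1 is estimated by regressing A_t on F_t and its t statistic is used for the one-sided test of (3.7)-(3.8).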
D_stat = (1/N) Σ_{t=1}^{N} a_t × 100%,   (3.11)

where a_t = 1 if the direction of change is correctly forecast at time t and a_t = 0 otherwise.
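The statistic in (3.11) can be sketched as follows; since the exact definition of a_t is cut off in this extract, the code assumes the common convention of comparing the predicted and actual changes from the same base value:

```python
# Sketch of the directional statistic D_stat of (3.11): the percentage of
# periods in which the forecast moves in the same direction as the actual
# series. a_t = 1 when the predicted and actual changes agree in sign
# (an assumed, commonly used convention).

def dstat(actual, predicted):
    hits = 0
    n = len(actual) - 1
    for t in range(1, len(actual)):
        if (actual[t] - actual[t - 1]) * (predicted[t] - actual[t - 1]) >= 0:
            hits += 1
    return 100.0 * hits / n

actual = [100.0, 102.0, 101.0, 103.0, 104.0]
predicted = [100.5, 101.5, 103.0, 102.5, 104.5]
print(dstat(actual, predicted))
```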
Table 1
Comparison of ARIMA models' statistical results

ARIMA model    AIC         BIC         SEE
(1, 1, 0)      10.26141    10.28554    40.74608
(1, 1, 1)      10.27664    10.32492    40.87885
(0, 1, 1)      10.27303    10.29703    40.98514
(0, 1, 2)      10.27586    10.32386    40.86597
(2, 1, 2)      10.2943     10.39139    40.88171
(2, 1, 3)      10.24858    10.36994    39.78719

The criteria for judging the best model are: (1) a relatively small BIC; (2) a relatively small SEE. Therefore, ARIMA(1, 1, 0) is the relatively best model.
Fig. 3. The MSE in the training process (the first step) using the MSOA approach (the neural network structure is 3-6-1, for example).

The neural networks converge quickly (see Fig. 4). The excellent convergence characteristics of MSOA can be easily observed. The training times for both algorithms are presented in Fig. 5 (the neural network structure is 3-6-1; implementing most other neural networks, we obtain the same result), and Table 2 gives more details. Under the training parameters set in the previous section, the training time of every training run using MSOA is much lower than that using the conventional BP algorithm. Using MSOA and the conventional BP algorithm, the average training time is 4.25 s and 7.83 s, respectively. The results show that the global convergence ability and convergence speed of most networks are improved greatly by MSOA.

Fig. 4. The MSE in the training process (the second step) using the MSOA approach (the neural network structure is 3-6-1, for example).

Fig. 5. Training time for every training run (the neural network structure is 3-6-1, for example).
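The two-stage idea described above, training on the older subsample and then continuing from the resulting weights on the recent subsample, can be sketched in pure Python with a tiny 3-6-1 network. The learning rate, epoch counts, and toy sine series are illustrative assumptions, not the paper's settings:

```python
import math
import random

# Hedged sketch of MSOA's two-stage training: stage 1 trains on the older
# part of the sample; stage 2 warm-starts from those weights on the recent
# part, so later observations refine an already-structured network.
# The 3-6-1 architecture matches the paper's figures; all other choices
# (data, learning rate, epochs) are illustrative.

def make_net(n_in=3, n_hid=6, rng=random.Random(0)):
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hid)]
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hid + 1)]
    return w1, w2

def forward(net, x):
    w1, w2 = net
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x + [1.0]))) for row in w1]
    return sum(w * hi for w, hi in zip(w2, h + [1.0])), h

def train(net, data, epochs=300, lr=0.05):
    """Plain gradient-descent BP; called once per MSOA stage (warm start)."""
    w1, w2 = net
    for _ in range(epochs):
        for x, y in data:
            out, h = forward(net, x)
            err = out - y
            for j, hj in enumerate(h):
                grad_h = err * w2[j] * (1 - hj * hj)
                for i, xi in enumerate(x + [1.0]):
                    w1[j][i] -= lr * grad_h * xi
            for j, hj in enumerate(h + [1.0]):
                w2[j] -= lr * err * hj
    return net

def mse_of(net, data):
    return sum((forward(net, x)[0] - y) ** 2 for x, y in data) / len(data)

# Toy series: predict x_t from the previous three values.
series = [math.sin(0.4 * t) for t in range(40)]
samples = [(series[t - 3:t], series[t]) for t in range(3, 40)]
older, recent = samples[:25], samples[25:]

net = make_net()
net = train(net, older)    # stage 1: older observations
net = train(net, recent)   # stage 2: continue on recent observations
print(round(mse_of(net, recent), 4))
```

The key design point is that stage 2 does not reinitialize the weights: it continues from the stage-1 solution, which is what lets the recent observations dominate the final fit.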
Table 2
The comparison of both algorithms after 10 training runs

       MSOA                                 BP
Run    Time (s)  Performance  Convergence   Time (s)  Performance  Convergence
1      4.2       0.00056      Y             7.7       0.00211      N
2      3.2       0.00049      Y             8.0       0.00206      N
3      4.3       0.00098      Y             7.8       0.00175      N
4      4.8       0.001        Y             6.9       0.001        Y
5      4.3       0.001        Y             8.0       0.001762     N
6      4.2       0.00099      Y             7.9       0.00175      N
7      3.9       0.00042      Y             7.6       0.00170      N
8      5.4       0.00098      Y             8.4       0.00225      N
9      4.4       0.00097      Y             8.1       0.00206      N
10     3.8       0.001        Y             7.1       0.00213      N

Y: the performance function drops below the goal; N: the performance function remains above the goal.
Cumby-Modest test results (table caption lost in extraction)

Model   a0 (t ratio)     a1 (t ratio)     R2        Adj-R2
ARIMA   0.333 (1.551)    0.238 (0.813)    0.0572    0.029
BP      0.143 (1.29)     0.857 (5.0)**    …         …
MSOA    0.143 (1.291)    0.657 (2.76)*    …         …

*, ** denote statistical significance (levels lost in extraction).
Table 3
Wheat price forecast results

             Model   MAE      MSE        MAPE (%)   Dstat (%)
In-sample    ARIMA   26.721   1807.116   1.427      66.000
             BP      23.637   1234.830   1.838      80.000
             MSOA    17.475   918.564    1.138      87.000
Out-sample   ARIMA   14.896   285.709    0.955      58.333
             BP      8.423    167.583    0.612      91.667
             MSOA    5.362    143.045    0.345      83.333
[Displaced table entries, not reliably assignable from the extraction: 0.714, 0.686, 0.432, 0.375; 0.661, 25**, 7.6*; Dstat (%): 58.333, 91.667, 83.333.]
Comparison of MMRR (table caption lost in extraction)

Model   MMRR (%)
ARIMA   0.2549
BP      0.9627
MSOA    1.1502