
Expert Systems with Applications

Expert Systems with Applications 33 (2007) 347–356
www.elsevier.com/locate/eswa

A neural network model based on the multi-stage optimization approach for short-term food price forecasting in China

Zou Haofei a,*, Xia Guoping a, Yang Fangting b, Yang Han c

a School of Economics and Management, Beihang University, No. 37, Xue Yuan Road, HaiDian District, Beijing 100083, China
b Beijing Simulation Center, Beijing 100854, China
c Production Technical Institute of the General Logistics Department, Beijing 100010, China

Abstract

Many studies have demonstrated that the back-propagation neural network can be effectively used to uncover the nonlinearity in financial markets. Unfortunately, the back-propagation algorithm suffers from slow convergence, inefficiency, and lack of robustness. This paper introduces a multi-stage optimization approach (MSOA) used in the back-propagation algorithm for training a neural network to forecast Chinese food grain prices. We divide the training sample of the neural network into two parts, in view of the fact that recent observations are more important than older ones. First, we use the first training sample to train the neural network and obtain the network structure. Second, we continue with the second training sample to further optimize the structure of the neural network based on the previous step. Empirical results show that MSOA overcomes the weaknesses of the conventional BP algorithm to some extent. Furthermore, the neural network based on MSOA improves the forecasting performance significantly in terms of both error and directional evaluation measurements. The paper also shows that accurate price estimation may not be a good predictor of the direction of change in price levels in the food market. The neural network based on MSOA can be used as an alternative method for future Chinese food price forecasting.
© 2006 Elsevier Ltd. All rights reserved.

Keywords: Artificial neural network; Back-propagation; Time series forecasting; Multi-stage optimization approach; Food price forecasting

1. Introduction

The autoregressive integrated moving average (ARIMA) model has been highly popular, widely used and successfully applied not only in economic time series forecasting, but also as a promising tool for modeling the empirical dependencies between successive times and failures (Ho & Xie, 1998), with satisfactory performance. However, a linear correlation structure is assumed among the time series values; therefore, the ARIMA model cannot capture nonlinear patterns. The approximation of linear models to complex real-world problems is not always satisfactory.
* Corresponding author. Tel.: +86 10 82328477; fax: +86 10 82327808.
E-mail addresses: zouhaofei@sohu.com (Z. Haofei), gxia@buaa.edu.cn (X. Guoping).
0957-4174/$ - see front matter © 2006 Elsevier Ltd. All rights reserved.
doi:10.1016/j.eswa.2006.05.021

Recently, artificial neural networks (ANNs) have been used more frequently in the analysis of financial time series as they move from simple pattern recognition to a diverse range of application areas (Patterson, 1996). It is known that the ANN mapping process can cover a greater range of problem complexity and is superior in its generality and practical ease of implementation owing to its powerful and flexible capability (Trippi & Turban, 1996). Many empirical studies, including several large-scale forecasting competitions, suggest that using ANNs for time series is an effective method.
ANNs need to be trained or taught. Training involves the determination of arc weights such that the output values are as close as possible to the desired values for a set of input patterns. This is a nonlinear minimization problem; indeed, it is a nonlinear network flow problem. Currently the most popularly used training method is the


back-propagation (BP) algorithm, which is essentially a gradient steepest descent method. It is well known that steepest descent suffers from slow convergence, inefficiency, and lack of robustness. Furthermore, it can be very sensitive to the choice of the learning rate: smaller learning rates tend to slow down the learning process, while larger learning rates may cause network oscillation in the weight space. In light of the weaknesses of the conventional BP algorithm, a number of variations or modifications of BP have been proposed; Zhang, Patuwo, and Hu (1998) presented a recent review of this area. In this paper, a multi-stage optimization approach (MSOA) is introduced in order to overcome the weaknesses of the conventional BP algorithm. The empirical results show that MSOA is an effective way to make the neural network converge quickly, with better efficiency and robustness.
The primary interest of this paper is out-sample forecasting of wheat prices in the Chinese food market. The forecasting accuracy achieved beyond the training data is the ultimate and most important measure of performance. Quantitative evaluation is frequently used as the accuracy measure in most of the literature, for example the mean squared error (MSE), the mean absolute error (MAE), the mean absolute percentage error (MAPE), and the root mean squared error (RMSE). However, the measurement of turning points is often neglected. In fact, directional predictions play a fundamental role in the identification and timing of buy and sell actions in trading and investment decision support systems. Recent years have witnessed considerable research into the directional accuracy of macroeconomic forecasts. Examples include Cumby and Modest (1987), Mills and Pepper (1999), Öller and Bharat (2000), Joutz and Stekler (2000), Pons (2000), Greer (2003) and Ash, Smyth, and Heravi (1998). In order to measure the ability to forecast movement direction or turning points, Merton (1981) presented a method, Merton's test, to evaluate this kind of ability. Cumby and Modest (1987) advanced an evaluation method, a version of Merton's test, to assess this competence. Yao and Tan (2000) introduced a statistical method to appraise the power of forecasting directional change. Kholodilin and Yao (2005) developed a dynamic bi-factor model with Markov switching to measure and predict turning points. Andrew, Macaulay, Thomson, and Önkal (2005) proposed a procedure for examining different aspects of performance for judgmental directional probability predictions of exchange rate movements. Some previously published forecast studies compared the directional and point accuracy of different models. Kohzadi, Boyd, Kermanshahi, and Kaastra (1996) and Tseng, Yu, and Tzeng (2002) proved that the neural network model could capture a significant number of turning points using Merton's test. Wang, Yu, and Lai (2005) and Yu, Wang, and Lai (2005) implemented the statistical method to prove the better ability of the neural network to forecast turning points. However, Ho, Xie, and Goh (2002) concluded from their simulation study

that the feed-forward neural network did not perform well and was generally inferior to the ARIMA and recurrent models: its predictive performance was poor, with lower turning-point detection capability. Therefore, there are not enough experimental or theoretical studies to validate the efficiency of nonlinear methods (for example, the ANN model) in improving the performance of forecasting turning points. On the other hand, we wonder whether the results of forecasting are the same when implementing different forecasting accuracy measurements (quantitative evaluation and turning-point evaluation) to appraise the performance of linear and nonlinear methods.
Our objective is twofold. In the first place, we propose the multi-stage optimization approach and assess the forecasting performance of three methods, the ARIMA model, the conventional back-propagation neural network (BP) model and the MSOA model, for forecasting food prices in the Chinese food market. Food grain is the most important sector of food security in China. Food grain marketing data, especially price data, are vital for any future agriculture development project because they can influence potential supply and demand, as well as distribution channels of food grain and the economics of agriculture. Price forecasting is thus expected to reduce the uncertainty and risk in the food grain market; it can be used to determine the quantity of food grain and food products consumed, which in turn helps the government identify and make appropriate and sustainable food grain policy. This study adds more evidence as to the usefulness of neural networks for price forecasting by performing a rigorous out-sample forecasting experiment over numerous observations. Specifically, we use the three models to forecast wheat prices in the China Zhengzhou grain wholesale market and compare the results, with the ARIMA model as a benchmark.
Secondly, we aim at investigating the turning-point forecasting performance of the three forecasting methods mentioned above. Price forecasting is an integral part of commodity trading and price analysis. When evaluating forecasting models, turning-point forecasting power is as important as quantitative accuracy with small errors.
The rest of the paper is organized as follows. In Section 2, the ARIMA, ANN and multi-stage optimization approaches are described. The data and the evaluation methods for comparing the three forecasting approaches are introduced in Section 3. In Section 4, the forecast procedures are discussed, including the selection of the ARIMA model and the design of the neural network architecture; a comparison between the conventional BP and MSOA algorithms is also demonstrated in this section. Section 5 shows the experimental results obtained from the ARIMA, BP and MSOA models, and an overall evaluation and comparison of the three techniques is discussed. Finally, Section 6 provides the concluding remarks.


2. Methodology
2.1. ARIMA time series model
Introduced by Box and Jenkins (1970), the ARIMA model has been one of the most popular approaches to forecasting. In an ARIMA model, the future value of a variable is supposed to be a linear combination of past values and past errors. Generally, a nonseasonal time series can be modeled as a combination of past values and errors, which can be denoted as ARIMA(p, d, q) or expressed in the following form:

$X_t = \theta_0 + \phi_1 X_{t-1} + \phi_2 X_{t-2} + \cdots + \phi_p X_{t-p} + e_t - \theta_1 e_{t-1} - \theta_2 e_{t-2} - \cdots - \theta_q e_{t-q},$  (2.1)

where $X_t$ and $e_t$ are the actual value and the random error at time t, respectively; $\phi_i$ (i = 1, 2, ..., p) and $\theta_j$ (j = 1, 2, ..., q) are model parameters; p and q are integers, often referred to as the orders of the autoregressive and moving average polynomials. The random errors $e_t$ are assumed to be independently and identically distributed with a mean of zero and a constant variance $\sigma^2$. Similarly, a seasonal model can be represented as ARIMA(p, d, q)(P, D, Q). Basically, this method has three phases: model identification, parameter estimation and diagnostic checking.
The ARIMA model is basically a data-oriented approach that is adapted from the structure of the data themselves.
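As a concrete illustration of Eq. (2.1), the following sketch computes a one-step-ahead prediction from given coefficients; the function name and all coefficient values are invented for illustration, not taken from the paper:

```python
def arima_one_step(theta0, phi, theta, past_x, past_e):
    """One-step forecast per Eq. (2.1): theta0 + sum(phi_i * X_{t-i}) - sum(theta_j * e_{t-j}).

    past_x[0] is X_{t-1}, past_e[0] is e_{t-1} (most recent first)."""
    ar = sum(p * x for p, x in zip(phi, past_x))   # autoregressive part
    ma = sum(h * e for h, e in zip(theta, past_e)) # moving-average part
    return theta0 + ar - ma

# Hypothetical ARMA(2, 1): X_t = 0.5 + 0.6 X_{t-1} + 0.2 X_{t-2} + e_t - 0.3 e_{t-1}
print(arima_one_step(0.5, [0.6, 0.2], [0.3], past_x=[10.0, 9.0], past_e=[0.4]))
# 0.5 + 6.0 + 1.8 - 0.12 = 8.18
```

In practice the coefficients would come from the estimation phase; this sketch only shows how a fitted model produces the next value.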
2.2. Artificial neural network model

The major advantage of neural networks is their flexible nonlinear modeling capability. With ANNs, there is no need to specify a particular model form; rather, the model is adaptively formed based on the features presented by the data. This data-driven approach is suitable for many empirical data sets where no theoretical guidance is available to suggest an appropriate data-generating process. The ANN model performs a nonlinear functional mapping from the past observations $(X_{t-1}, X_{t-2}, \ldots, X_{t-m})$ to the future value $X_t$, i.e.,

$X_t = f(X_{t-1}, X_{t-2}, \ldots, X_{t-m}, w) + e_t,$  (2.2)

where w is a vector of all parameters and f is a function determined by the network structure and connection weights. Thus, the neural network is equivalent to a nonlinear autoregressive model.
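The lagged-input mapping of Eq. (2.2) amounts to building training pairs from the series; a small helper along these lines (the function name and sample values are illustrative):

```python
def make_lagged_pairs(series, m):
    """Build (inputs, target) pairs per Eq. (2.2):
    inputs are (X_{t-1}, ..., X_{t-m}), target is X_t."""
    pairs = []
    for t in range(m, len(series)):
        inputs = [series[t - i] for i in range(1, m + 1)]  # most recent lag first
        pairs.append((inputs, series[t]))
    return pairs

pairs = make_lagged_pairs([10, 11, 13, 12, 14], m=2)
print(pairs[0])  # ([11, 10], 13): predict X_t from X_{t-1}, X_{t-2}
```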
Training a network is an essential factor in the success of neural networks. Among the several learning algorithms available, back-propagation has been the most popular and most widely implemented learning algorithm of all neural network paradigms. In this paper, the back-propagation algorithm is used in the following experiments.
An important task of ANN modeling for a time series is to choose an appropriate number of hidden nodes, n, as well as the dimension of the input vector (the lagged observations), m. However, in practice, the choice of m and n is difficult and there is no theory that can be used to guide the selection. Hence, experiments are often conducted to select an appropriate m as well as n.
2.3. A multi-stage optimization approach


One of the major issues in neural network forecasting is how much data are necessary for neural networks to capture the dynamic nature of the underlying process in a time series. Although a larger sample size, in the form of a longer time series, is usually recommended in model development, empirical results suggest that longer time series do not always yield models that provide the best forecasting performance. For example, Walczak (2001) investigated the issue of data requirements for neural networks in financial time series forecasting. He found that using a smaller sample of the time series, or data close in time to the out-sample, can produce more accurate neural networks.
Determining an appropriate sample size for model building is not an easy task, especially in time series modeling, where a larger sample inevitably means using older data. Theoretically speaking, if the underlying data-generating process for a time series is stationary, more data should be helpful in reducing the effect of noise inherent in the data. However, if the process is not stationary, or its structure or parameters change over time, a longer time series does not help and in fact can hurt the model's forecasting performance. In this situation, more recent observations should be more important in indicating possible structural or parameter change in the data, while older observations are not useful and may even be harmful to forecasting model building (Morantz, Whalen, & Zhang, 2004). However, the older observations are thought to be useful for building the neural network architecture and cannot be discarded lightly. In fact, the older observations are helpful for confirming the initial weight vectors of the interconnections between nodes in the layers, especially when weights are initialized arbitrarily.
In the conventional BP algorithm, we use one training sample to train the neural network, neglecting the fact that recent observations should be more important than older ones. Therefore, we divide the training sample into two parts, d1 and d2. First, we use the first training sample, d1, to train the neural network and obtain the network structure. Second, we continue with the second training sample, d2, to further optimize the structure of the neural network based on the previous step. This process is named the multi-stage optimization approach (MSOA).
MSOA is based on the following reasons:
(1) Recent observations should be more important than older observations in forecasting.
(2) We always choose the initial weights of neural networks arbitrarily. The randomness of this selection influences the generalization ability of neural networks to a great extent, because the BP algorithm easily suffers from the problem of local optima. Different initial


weights may lead to different results. How to initialize the weights is one of the critical factors in improving the generalization ability of networks.
(3) Older observations are useful for building the architecture. We can use the older observations to train the network structure and obtain the weight vectors between the nodes. The first step of MSOA can be regarded as a process of weight initialization. Therefore, the method avoids, to some extent, the blindness and randomness in the selection of the network's weight vectors before training.
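A minimal sketch of the two-stage training idea, using a tiny one-hidden-layer network trained by plain gradient descent; the network size, learning rate, epoch counts and synthetic data are illustrative assumptions, not the paper's exact settings:

```python
import numpy as np

def train(w, X, y, lr=0.01, epochs=500):
    """Gradient-descent training of a one-hidden-layer net (tanh hidden, linear output)."""
    W1, b1, W2, b2 = w
    for _ in range(epochs):
        h = np.tanh(X @ W1 + b1)           # hidden activations
        err = h @ W2 + b2 - y              # output error, shape (N, 1)
        gW2 = h.T @ err / len(X); gb2 = err.mean(0)
        dh = (err @ W2.T) * (1 - h ** 2)   # backpropagated hidden gradient
        gW1 = X.T @ dh / len(X); gb1 = dh.mean(0)
        W1 -= lr * gW1; b1 -= lr * gb1; W2 -= lr * gW2; b2 -= lr * gb2
    return (W1, b1, W2, b2)

def mse(w, X, y):
    W1, b1, W2, b2 = w
    return float(np.mean((np.tanh(X @ W1 + b1) @ W2 + b2 - y) ** 2))

rng = np.random.default_rng(0)
# lagged inputs (m = 2) and one-step-ahead targets for a synthetic smooth series
series = np.sin(np.linspace(0, 6, 103))
X = np.column_stack([series[1:-1], series[:-2]])
y = series[2:, None]
# MSOA idea: stage 1 trains on the older part (d1), stage 2 refines on the recent part (d2)
split = int(0.7 * len(X))
w = (rng.normal(0, 0.5, (2, 4)), np.zeros(4), rng.normal(0, 0.5, (4, 1)), np.zeros(1))
err0 = mse(w, X[split:], y[split:])
w = train(w, X[:split], y[:split])   # stage 1: weights initialised from older data
w = train(w, X[split:], y[split:])   # stage 2: further optimisation on recent data
err1 = mse(w, X[split:], y[split:])
print(round(err0, 4), round(err1, 4))  # error on the recent segment before and after
```

Stage 1 plays the role of a data-driven weight initialisation; stage 2 then adapts the same weights toward the most recent observations.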

3. Data description and forecast evaluation criteria


The wheat price (white wheat, third class) data used in this study are monthly spot prices from the China Zhengzhou Grain Wholesale Market, covering the period from January 1996 to July 2005, a total of n = 115 observations, as illustrated in Fig. 1. Although there is no consensus on how to split the data for neural network applications, the general practice is to allocate more data for model building and selection. Most studies in the literature use convenient in-/out-sample splitting ratios such as 70%:30%, 80%:20%, or 90%:10%; this paper selects the last one. We take the monthly data from January 1996 to July 2004 as the in-sample data set (including 20 validation data points), with 103 observations for training and validation purposes, and the remainder as the out-sample data set, with 12 observations for testing purposes. For space reasons, the original data are not listed here; detailed data can be obtained from the website www.cngrain.com.
In order to evaluate and compare prediction performance, it is necessary to introduce forecasting evaluation criteria. In this study, two kinds of evaluation criteria, quantitative evaluation and turning-point evaluation, are introduced. The quantitative evaluation includes three overall error measures: the mean squared error (MSE), the mean absolute error (MAE), and the mean absolute percentage error (MAPE). While these criteria are good measures of the deviations of the predicted values from the actual values, they cannot reflect a model's ability to predict turning points. Direction-of-change forecasts are often used in financial and economic decision-making, so how to evaluate the power of the models in predicting turning points is another important task. For traders and analysts, market direction and turning points are as important as the value forecast itself.
The ability of a model to forecast turning points can be measured by an evaluation method advanced by Cumby and Modest (1987), which is a version of Merton's test (Merton, 1981). Merton's test is as follows: define a forecast variable Ft and an actual direction At such that:
$A_t = 1$ if $\Delta A_t > 0$ and $A_t = 0$ if $\Delta A_t \le 0$,  (3.1)

and

$F_t = 1$ if $\Delta P_t > 0$ and $F_t = 0$ if $\Delta P_t \le 0$,  (3.2)

where $\Delta A_t$ is the amount of change in the actual variable between time t − 1 and t, and $\Delta P_t$ is the amount of change in the forecasting variable for the same period.
The probability matrix for the direction of changes in the actual value conditional upon the direction of changes in the forecasting variable $F_t$ is

$P_1 = \mathrm{Prob}(F_t = 0 \mid A_t = 0)$  (3.3)
$1 - P_1 = \mathrm{Prob}(F_t = 1 \mid A_t = 0)$  (3.4)
$P_2 = \mathrm{Prob}(F_t = 1 \mid A_t = 1)$  (3.5)
$1 - P_2 = \mathrm{Prob}(F_t = 0 \mid A_t = 1).$  (3.6)

In other words, (3.3) and (3.5) are the probabilities that the forecasted direction actually occurred, and (3.4) and (3.6) are the probabilities of a wrong forecast.
By assuming that the magnitudes of changes in $F_t$ and $A_t$ are independent, Merton (1981) showed that a necessary and sufficient condition for market timing ability is that:

$P_1(t) + P_2(t) > 1.$  (3.7)

The null hypothesis to be tested is:

$H_0: P_1 + P_2 - 1 \le 0, \quad H_1: P_1 + P_2 - 1 > 0.$  (3.8)

Cumby and Modest (1987) showed that the above hypothesis can be tested through the regression equation:

$F_t = a_0 + a_1 A_t + e_t,$  (3.9)

where $F_t$ is the predicted price direction binary variable defined in (3.2), $A_t$ is the actual price direction binary variable defined in (3.1), $e_t$ is the error term, and $a_1$ is the slope of this linear equation:

$a_1 = P_1 + P_2 - 1.$  (3.10)

Fig. 1. Wheat price data: monthly prices of the China Zhengzhou Grain Wholesale Market from January 1996 to July 2005.

Here, $a_1$ should be positive and significantly different from 0 in order to demonstrate that $F_t$ and $A_t$ have a linear relationship. This reflects the ability of a forecasting model to capture the turning points of a time series.
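The conditional probabilities (3.3)–(3.6) and the market-timing statistic P1 + P2 of (3.7) can be estimated directly from the two binary direction series. A minimal sketch (the function name and example series are invented):

```python
def merton_timing(A, F):
    """Estimate P1 = Prob(F=0 | A=0) and P2 = Prob(F=1 | A=1) from binary
    direction series, and return P1 + P2 (Eqs. (3.3), (3.5), (3.7))."""
    n0 = sum(1 for a in A if a == 0)
    n1 = len(A) - n0
    p1 = sum(1 for a, f in zip(A, F) if a == 0 and f == 0) / n0
    p2 = sum(1 for a, f in zip(A, F) if a == 1 and f == 1) / n1
    return p1 + p2

A = [1, 0, 1, 1, 0, 0, 1, 0]   # actual directions per Eq. (3.1)
F = [1, 0, 1, 0, 0, 1, 1, 0]   # forecast directions per Eq. (3.2)
print(merton_timing(A, F))      # 1.5; values > 1 indicate market timing ability
```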
In addition, in order to assess a model's ability to forecast changes in trend directly, especially when we cannot prove that $a_1$ is significantly different from zero, we introduce the directional change statistic ($D_{stat}$) (Yao & Tan, 2000). Its computational equation can be expressed as:

$D_{stat} = \frac{1}{N}\sum_{t=1}^{N} a_t,$  (3.11)

where $a_t = 1$ if $(A_{t+1} - A_t)(P_{t+1} - P_t) \ge 0$ and $a_t = 0$ otherwise.
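Dstat of Eq. (3.11) takes only a few lines to compute; here the actual and predicted series are invented sample values:

```python
def dstat(actual, predicted):
    """Directional change statistic, Eq. (3.11): the fraction of steps where the
    actual and predicted changes have the same sign (or both are non-negative zero)."""
    hits = sum(
        1
        for t in range(len(actual) - 1)
        if (actual[t + 1] - actual[t]) * (predicted[t + 1] - predicted[t]) >= 0
    )
    return hits / (len(actual) - 1)

actual = [100, 102, 101, 103, 104]
predicted = [99, 101, 102, 104, 103]
print(dstat(actual, predicted))  # 0.5: 2 of the 4 direction changes matched
```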
However, one major strategy employed by many futures
traders is the use of the trend as an aid in making trading
decisions. This behavior is based on the assumption that,
once a trend starts, it will continue. Traders want to follow
trends so they can take positions early in the trend and
maintain that position as long as the trend continues. Traders may, however, change their position when they predict
a change in the trend or market direction. Therefore, the
real aim of forecasting is to obtain profits based on prediction results. Here the average monthly return rate is introduced as another evaluation criterion, the profit criterion (PC). Without considering other costs, the return rate is calculated according to the compound interest principle:

$\mathrm{MMRR} = \frac{1}{T}\sum_{t=1}^{T}\left|\frac{V_{et} - V_{ft}}{V_{ft}}\right|,$

where MMRR is the mean monthly return rate, $V_{et} = \frac{y_{t+1} - y_t}{y_t} \times 100$ is the expected return rate in month t of the testing period, $V_{ft} = \frac{\hat{y}_{t+1} - y_t}{y_t} \times 100$ is the forecast return rate per unit of wheat for trading in month t of the testing period, and T is the number of months in the testing period.
The computation of MMRR is based on the following trading strategy: if $(\hat{y}_{t+1} - y_t) > 0$, then buy, else sell, where $y_t$ is the actual value at time t and $\hat{y}_{t+1}$ is the prediction for time t + 1. That is, we use the difference between the predicted value and the actual value to guide trading.
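The MMRR computation defined above can be sketched as follows; the function name and the price/forecast series are invented, and the absolute-deviation form follows the formula as given here:

```python
def mmrr(actual, forecast):
    """Mean monthly return-rate deviation: average of |(V_e - V_f) / V_f| over test months.

    actual[t] is y_t; forecast[t] is the prediction made for month t + 1."""
    total = 0.0
    T = len(actual) - 1
    for t in range(T):
        v_e = (actual[t + 1] - actual[t]) / actual[t] * 100   # realized return rate
        v_f = (forecast[t] - actual[t]) / actual[t] * 100     # forecast return rate
        total += abs((v_e - v_f) / v_f)
    return total / T

actual = [1500.0, 1520.0, 1510.0, 1535.0]
forecast = [1515.0, 1512.0, 1532.0]   # predictions for months 2..4
print(round(mmrr(actual, forecast), 4))  # 0.2399
```

Note that the formula divides by the forecast return, so months with a zero forecast change would need special handling in a real implementation.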
4. Forecast procedure
4.1. The selection of ARIMA model
Using Eviews 5.0, we can see that the time series appears to follow a random walk, wandering up and down in the line graph (Fig. 1). Also, in the correlogram, the ACF (autocorrelation function) declines slowly, and there is only one significant spike in the PACF (partial autocorrelation function). Therefore, the time series is first differenced in order to remove the trend. We find that the first-differenced series becomes stationary and is white noise, as there is no significant pattern in its correlogram; the unit root test also confirms that the first difference is stationary. This strong evidence supports the ARIMA(0, 1, 0) specification for the time series. Then, we can construct the ARIMA model. Candidate ARIMA models are identified and their statistical results are compared in Table 1.

Table 1
Comparison of ARIMA models' statistical results

ARIMA model    AIC         BIC         SEE
(1, 1, 0)      10.26141    10.28554    40.74608
(1, 1, 1)      10.27664    10.32492    40.87885
(0, 1, 1)      10.27303    10.29703    40.98514
(0, 1, 2)      10.27586    10.32386    40.86597
(2, 1, 2)      10.2943     10.39139    40.88171
(2, 1, 3)      10.24858    10.36994    39.78719

The criteria for judging the best model are: (1) relatively small BIC; (2) relatively small SEE. Therefore, ARIMA(1, 1, 0) is the relatively best model.

4.2. Fitting neural network models to the data


In this investigation, we only consider one-step-ahead forecasting, so one output node for the ANNs can be specified. In this study, we only handle ANNs with a single hidden layer (the usual situation), so the choice of architecture is primarily about choosing the number of neurons in the hidden layer.

4.2.1. Criteria for selection of the models
Before the training process begins, data normalization is often performed. In this paper, the following formula (a linear transformation to [0, 1]) is used (Lapedes & Farber, 1998):

$x_n = (x_0 - x_{\min}) / (x_{\max} - x_{\min}).$
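In code, the linear transformation to [0, 1] is a one-liner per element (sample values invented):

```python
def normalize(xs):
    """Linear transformation of a series to [0, 1]: x_n = (x - min) / (max - min)."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]

print(normalize([1500, 1520, 1510]))  # [0.0, 1.0, 0.5]
```

The same min/max from the training data would be reused to transform the forecasts back to the original price domain.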
In a time series forecasting problem, the number of input nodes corresponds to the number of lagged observations used to discover the underlying pattern in a time series and to forecast future values. The most common way of determining the number of hidden nodes is via experiments or trial-and-error. Therefore, the experimental design determines the number of input and hidden nodes by implementing some criteria to select the best models. The criteria include the root mean squared error (RMSE), the mean absolute error (MAE), AIC and BIC. The following general formulas for AIC and BIC are used in our study:

$\mathrm{AIC} = \log(\mathrm{ASSE}) + \frac{2p}{T},$  (4.1)
$\mathrm{BIC} = \log(\mathrm{ASSE}) + \frac{p \log(T)}{T},$  (4.2)

where $\mathrm{ASSE} = \sum_{t=1}^{T}(y_t - \hat{y}_t)^2 / T$, p is the number of parameters in the model, and T is the number of observations. The first part in (4.1) and (4.2) measures the goodness-of-fit of the model to the data, while the second part sets a penalty for model overfitting. The total number of parameters (p) in a neural network with m inputs and n hidden nodes is p = m(n + 2) + 1. Too few or too many input nodes can affect either the learning or the prediction capability of the network (Zhang et al., 1998); the same applies to the selection of the number of hidden nodes. Therefore, we set an upper limit of 6 on the lag period, and the number of hidden nodes varies from 1 to 6. There are thus a total of 36 different models in the ANN model-building process. These 36 neural networks are indicated simply by the numbers 1 to 36 in Section 4.2.3.
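The selection criteria (4.1) and (4.2) can be sketched as follows; the parameter count uses the formula p = m(n + 2) + 1 quoted above, and the series values are invented:

```python
import math

def aic_bic(y, y_hat, m, n):
    """AIC/BIC per Eqs. (4.1)-(4.2), with p = m(n + 2) + 1 parameters
    for a network with m inputs and n hidden nodes."""
    T = len(y)
    asse = sum((a - b) ** 2 for a, b in zip(y, y_hat)) / T
    p = m * (n + 2) + 1
    aic = math.log(asse) + 2 * p / T
    bic = math.log(asse) + p * math.log(T) / T
    return aic, bic

y = [1500, 1520, 1510, 1535, 1540, 1525]
y_hat = [1505, 1515, 1512, 1530, 1545, 1520]
aic, bic = aic_bic(y, y_hat, m=2, n=4)
print(aic, bic)  # BIC penalizes the p parameters more heavily once log(T) > 2
```

In the model-building loop, these values would be averaged over repeated training runs for each of the 36 candidate (m, n) structures.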
4.2.2. Experiment conditions
Using MSOA, the training dataset is randomly split into the first training set (d1), comprising 70% of the examples, and the second training set (d2), comprising 30% of the examples. The convergence criteria used for training are a mean squared error less than or equal to 0.001, or a maximum of 1000 and 500 iterations for BP and MSOA, respectively. The transfer functions of the hidden layer and the output layer are tan-sigmoid and linear, respectively. Bias terms are employed in both the hidden and output layers. The performance function is the mean squared error (MSE). The ANN model is built with MatLab 6.5 on a PC with a Pentium IV at 1.5 GHz. All neural network models are trained with the back-propagation algorithm over the training set and then tested over the test set. The experiment is repeated 10 times, and afterwards the averages of MAE, MSE, MAPE, AIC and BIC are computed.
In terms of the criteria for selecting the best model mentioned above, a neural network structure of 2 × 4 × 1 is selected to model the wheat price series.
4.2.3. Comparison between MSOA and the conventional BP algorithm
4.2.3.1. Training time and convergence. From the experimental results, using the conventional BP algorithm we find that most of the neural networks suffer from slow convergence, and almost all of them cannot reach the training target (0.001) after 1000 epochs; see Fig. 2. On the other hand, using MSOA we find that most neural networks also suffer from slow convergence in the first training step (see Fig. 3). However, in the second training step most neural networks converge quickly (see Fig. 4); the excellent convergence characteristics of MSOA can be easily observed. The training times for both algorithms are presented in Fig. 5 (for the neural network structure 3 × 6 × 1; implementing most other neural networks, we obtain the same result), and Table 2 gives more details. Under the training parameters set out in the previous section, the training time of every training run using MSOA is much lower than that using the conventional BP algorithm: the average training time is 4.25 s for MSOA versus 7.83 s for conventional BP. The results show that the global convergence ability and convergence speed of most networks are improved greatly using MSOA.

Fig. 2. The MSE in the training process using the conventional BP algorithm (neural network structure 3 × 6 × 1, for example).

Fig. 3. The MSE in the training process (the first step) using the MSOA approach (neural network structure 3 × 6 × 1, for example).

Fig. 4. The MSE in the training process (the second step) using the MSOA approach (neural network structure 3 × 6 × 1, for example).

Fig. 5. Training time for every training run (neural network structure 3 × 6 × 1, for example).

4.2.3.2. Forecasting accuracy. The forecasting results of every neural network (with different structures) using both algorithms are shown in Fig. 6, in which MSE is the average over the out-sample forecasts (the prices of 12 months). As we can see from Fig. 6, MSOA is clearly much more accurate than the BP algorithm: the average MSE is 2.11 × 10⁻⁴ for MSOA versus 6.22 × 10⁻⁴ for BP. The standard deviation of the MSE is much lower for MSOA, as illustrated in Fig. 6, and the maximum MSE of MSOA is also much lower than that of BP (1.24 × 10⁻³ versus 2.71 × 10⁻³). These results imply that MSOA brings better generalization and robustness than the conventional BP algorithm.
5. Results

The corresponding forecasting performance and comparative evaluation results are summarized in Fig. 7 and Table 3.
As can be seen in Fig. 7, the MSOA model performs very well in wheat price forecasting; in contrast with the results of ARIMA, the forecasting performance of BP is also better. A comparison of the three models is performed and the results are shown in Table 3.

Fig. 6. The forecasting accuracy comparison of the 36 neural networks using both algorithms. MSE values are plotted before transforming back to the original domain.

5.1. Quantitative evaluation

Table 3 lists four overall summary statistics for in-sample and out-sample wheat price forecasting with the three methods. As to quantitative forecasts, Table 3 indicates that the MSOA model provides the best forecasting results, judged by all four performance measures. The improvement of MSOA over either the ARIMA or the BP model can be considerable.
As to the in-sample error comparison for the wheat price time series, the MSE of the ARIMA model and the BP model are 1807.116 and 1234.830, respectively. In contrast to ARIMA and BP, the MSE of the MSOA model is considerably reduced, to 918.564. During the in-sample period, the MAE for the ARIMA and BP models are 26.721 and 23.637, respectively, while this same error measure is significantly reduced to 17.475 for the MSOA model. The MSOA model thus yields a marked decrease in MSE compared with the ARIMA and BP models (49.1% and 25.6%, respectively). The MAPE of the MSOA model is the lowest of the three models (only 1.138%).
During the out-sample period, applying the MSOA method, we find 64.0% and 16.5% decreases in MSE relative to the

Table 2
The comparison of both algorithms after 10 training runs

Run   MSOA time (s)   MSOA performance   Converged   BP time (s)   BP performance   Converged
1     4.2             0.00056            Y           7.7           0.00211          N
2     3.2             0.00049            Y           8             0.00206          N
3     4.3             0.00098            Y           7.8           0.00175          N
4     4.8             0.001              Y           6.9           0.001            Y
5     4.3             0.001              Y           8             0.001762         N
6     4.2             0.00099            Y           7.9           0.00175          N
7     3.9             0.00042            Y           7.6           0.00170          N
8     5.4             0.00098            Y           8.4           0.00225          N
9     4.4             0.00097            Y           8.1           0.00206          N
10    3.8             0.001              Y           7.1           0.00213          N

Y: the performance function drops below the goal; N: the performance function remains above the goal.



Table 4
Results of Merton's test of turning point forecasting power for ARIMA and neural network models

Model    a0 (t ratio)     a1 (t ratio)     R²       Adj-R²   F        Dstat (%)
ARIMA    0.333 (1.551)    0.238 (0.813)    0.0572   0.029    0.661    58.333
BP       0.143 (1.29)     0.857 (5.0)**    0.714    0.686    25**     91.667
MSOA     0.143 (1.291)    0.657 (2.76)*    0.432    0.375    7.6*     83.333

* Significant at 5% (p < 0.05). ** Significant at 1% (p < 0.01).

Fig. 7. The out-sample forecasting performance of wheat price.

Table 3
Wheat price forecast results

Model    MAE      MSE        MAPE (%)   Dstat (%)
In-sample
ARIMA    26.721   1807.116   1.427      66.000
BP       23.637   1234.830   1.838      80.000
MSOA     17.475    918.564   1.138      87.000
Out-sample
ARIMA    14.896    285.709   0.955      58.333
BP        8.423    167.583   0.612      91.667
MSOA      5.362    143.045   0.345      83.333

In-sample: January 1996–July 2004; out-sample: August 2004–July 2005. MAE: mean absolute error; MSE: mean squared error; MAPE: mean absolute percentage error; Dstat: direction statistics.

ARIMA model and BP model (285.709, 167.583 and


143.045, respectively). As to MAPE, there are considerable
reductions of 63.9% and 16.3% using MSOA model. The
MAE of MSOA model is also better than those of the other
models. In fact, during the out-sample period the BP model
provides a half percentage point MAPE error improvement
(0.955% versus 0.412%) than the ARIMA model. With
MAD and MSE, the great improvements of the BP model
over the ARIMA model are 56.9% and 76.3%, respectively.
As far as MAPE is concerned, the greater accuracy of BP
over ARIMA is also evident in wheat price forecasting as
indicated by 0.955% versus 0.612%.
5.2. Turning point evaluation

The formal statistical test of turning points for the models is performed by estimating Eq. (3.9) above, and the results, after adjusting for autocorrelation, are shown in Table 4. The t ratio of the slope coefficient, a1, for the ARIMA wheat model shows that it is not statistically different from zero. This implies that for the out-sample period the ARIMA wheat model had extremely limited turning point forecasting power. The MSOA model, in contrast, is able to capture a statistically significant number of turning points, as indicated by a t ratio of 3.166. For the BP neural network predictions, a1 is likewise highly significant and different from zero. The analysis supports the greater turning point forecasting power of the neural networks in addition to their more accurate price level forecasts.
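Merton's (1981) market-timing test regresses a realized-direction indicator on a forecast-direction indicator and checks whether the slope a1 exceeds zero. The sketch below uses plain ordinary least squares with numpy only; the autocorrelation adjustment mentioned above is omitted, and the indicator construction is an assumption about how Eq. (3.9) is operationalized, not taken from the paper:

```python
import numpy as np

def merton_test(actual, pred):
    """Market-timing regression y_t = a0 + a1 * x_t + e_t, where
    y_t = 1 if the actual price rose in period t and x_t = 1 if the
    forecast predicted a rise.  A slope a1 significantly greater than
    zero indicates turning point (timing) forecasting power."""
    actual = np.asarray(actual, dtype=float)
    pred = np.asarray(pred, dtype=float)
    y = (np.diff(actual) > 0).astype(float)
    x = (pred[1:] - actual[:-1] > 0).astype(float)
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    # Classical OLS standard errors and t ratios.
    resid = y - X @ coef
    s2 = resid @ resid / (len(y) - 2)
    cov = s2 * np.linalg.inv(X.T @ X)
    t_ratios = coef / np.sqrt(np.diag(cov))
    return coef, t_ratios
```

A forecast with no timing ability yields a1 near zero (as for the ARIMA model in Table 4), while a useful directional forecast yields a significantly positive a1.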
In terms of Dstat, we also find that the MSOA model outperforms the ARIMA model, although the BP model is better than MSOA. Notably, the values of Dstat for the MSOA and BP models exceed 80%, indicating that the nonlinear models are good predictors of the wheat price, while the value of Dstat for the ARIMA model is only 58.333%.

From the practical application point of view, the indicators Dstat and Merton's test are more important than MAE, MSE and MAPE, because the former reflect the movement trend of the wheat price and can help traders to hedge their risk and make good trading decisions in advance. From the view of Dstat and Merton's test, the MSOA and BP models are much better than the ARIMA model, indicating that the nonlinear forecasting approach has especially strong prediction ability in the volatile wheat market.
In addition, we observe in Tables 3 and 4 that a smaller MAE, MSE and MAPE do not necessarily mean a higher Dstat or a significant Merton's test. This may imply that Dstat and the significance of Merton's test behave differently from MAE, MSE and MAPE across different time series forecasts. When using different forecasting performance criteria (point and directional accuracy measurements), we may obtain different results. This means that accurate price estimation, as determined by its deviation from the actual observation, may not be a good predictor of the direction of change in the price levels of financial instruments.

As to the profit criterion (MMRR), the empirical results show that the MSOA model could be applied to future forecasting. Compared with the other models presented in this paper, the MSOA model performs best, which is similar to the result of the point-accuracy evaluation. As shown in Table 5, the best mean monthly return rate, 1.1502%, is achieved by the MSOA model; the rate for ARIMA is 0.2549%, and the rate for the BP model is 0.9627%.
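The paper does not spell out the exact trading strategy behind MMRR, so the sketch below assumes a simple rule: take a long position for the month when the model forecasts a price rise and a short position otherwise, earning the actual percentage price change with the sign of the position. Any resemblance to the authors' rule is an assumption:

```python
import numpy as np

def mmrr(actual, pred):
    """Mean monthly return rate (%) of a simple directional trading rule
    (an illustrative assumption; the paper does not define its rule):
    long when the forecast exceeds the previous actual price, short
    otherwise."""
    actual = np.asarray(actual, dtype=float)
    pred = np.asarray(pred, dtype=float)
    position = np.where(pred[1:] > actual[:-1], 1.0, -1.0)
    monthly_return = position * (actual[1:] - actual[:-1]) / actual[:-1]
    return 100.0 * monthly_return.mean()
```

Under such a rule, return depends only on directional correctness, which is why MMRR ranks models similarly to Dstat and Merton's test rather than to the point-error measures.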



Table 5
A comparison of mean monthly return rate of different methods

Models     ARIMA    BP       MSOA
MMRR (%)   0.2549   0.9627   1.1502

MMRR: mean monthly return rate.

We can, therefore, conclude from the results of Tables 3–5 and the above analysis that the MSOA model performs better than the ARIMA and BP models in terms of MAE, MSE and MAPE. On the other hand, we cannot deny that the BP model is an effective way to improve forecasting accuracy; the nonlinear model is an effective method for modeling the Chinese food price of the wholesale market. Furthermore, the overall prediction performance of the MSOA model is satisfactory because (1) its MAE, MSE and MAPE are the smallest, (2) its Dstat exceeds 80%, (3) Merton's test is significant at the 1% level (p < 0.01) and (4) its MMRR is the highest.
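The multi-stage idea underlying these results, as described earlier in the paper, is to split the training sample into an older and a more recent part, train first on the older part, and then continue training the same network on the recent part so that recent observations carry more weight. A minimal sketch of this two-stage scheme follows; for brevity it trains a linear model by gradient descent as a stand-in for one backpropagation stage (the paper's actual model is a multilayer network, and the learning rate and epoch counts here are illustrative):

```python
import numpy as np

def train_epochs(w, X, y, lr, epochs):
    """Gradient-descent epochs on mean squared error; a stand-in for
    one stage of backpropagation training.  lr must suit the data
    scale or the iteration diverges."""
    for _ in range(epochs):
        grad = 2.0 * X.T @ (X @ w - y) / len(y)
        w = w - lr * grad
    return w

def msoa_fit(X_old, y_old, X_recent, y_recent, lr=0.1, epochs=200):
    """Stage 1: fit on the older subsample.  Stage 2: continue from the
    stage-1 weights on the recent subsample, so that the recent
    observations dominate the final parameters."""
    w = np.zeros(X_old.shape[1])
    w = train_epochs(w, X_old, y_old, lr, epochs)        # stage 1
    w = train_epochs(w, X_recent, y_recent, lr, epochs)  # stage 2
    return w
```

Stage 1 supplies a good starting point, and stage 2 fine-tunes it toward the most recent regime, which is the intuition behind the faster, more reliable convergence reported in Table 2.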
6. Conclusion

This study proposes a multi-stage optimization approach (MSOA) to overcome the weaknesses of the conventional BP algorithm. From the experimental results comparing the performance of three models, MSOA, BP and ARIMA, we conclude that MSOA is an effective method for modeling Chinese food price forecasting. Previous studies on time series forecasting with ANN models have shown that nonlinear models work well even when the signal-to-noise ratio is small.

This study compares the MSOA, ARIMA and BP models in forecasting the wheat price of the China Zhengzhou Grain Wholesale Market. The results show that the MSOA model's forecasts are considerably more accurate than those of either the traditional ARIMA model or the conventional BP model, which are used as benchmarks, in terms of error measures such as MAE, MSE and MAPE. As far as turning point evaluation (Merton's test and Dstat) and the profit criterion (MMRR) are concerned, the MSOA and BP models are clearly better than ARIMA. The results of the turning point and profit criteria demonstrate the prediction ability of nonlinear models in the volatile wheat market. Meanwhile, the MSOA and BP models can capture a statistically significant number of turning points for the wheat price, while the ARIMA model cannot. The paper also shows that accurate price estimation may not be a good predictor of turning points in price levels in the financial market.

Moreover, we conclude from the experimental results that the global convergence ability and convergence speed of most networks are greatly improved by MSOA. The forecasting results imply that MSOA brings better generalization and robustness than the conventional BP algorithm. In summary, our research findings demonstrate that an artificial neural network based on the multi-stage optimization approach with only one hidden layer can precisely and satisfactorily approximate a continuous function and can be used as an alternative method to forecast the wheat price for financial managers and business practitioners.

Acknowledgements

Funding for this research is supported by the National Science Foundation of China under grant No. 70371004 and the Ph.D. Program Foundation of the Education Ministry of China under contract No. 20040006023.
References

Andrew, C. P., Macaulay, A., Thomson, M. E., & Önkal, D. (2005). Performance evaluation of judgmental directional exchange rate predictions. International Journal of Forecasting, 21, 473–489.
Ash, J. C. K., Smyth, D. J., & Heravi, S. M. (1998). Are OECD forecasts rational and useful? A directional analysis. International Journal of Forecasting, 14, 381–391.
Box, G. E. P., & Jenkins, G. (1970). Time series analysis, forecasting and control. San Francisco, CA: Holden-Day.
Cumby, R. E., & Modest, D. M. (1987). Testing for market timing ability: A framework for forecast evaluation. Journal of Financial Economics, 19, 169–189.
Greer, M. (2003). Directional accuracy tests of long-term interest rate forecasts. International Journal of Forecasting, 19, 291–298.
Ho, S. L., & Xie, M. (1998). The use of ARIMA models for reliability forecasting and analysis. Computers and Industrial Engineering, 35, 213–216.
Ho, S. L., Xie, M., & Goh, T. N. (2002). A comparative study of neural network and Box–Jenkins ARIMA modeling in time series prediction. Computers and Industrial Engineering, 42, 371–375.
Joutz, F., & Stekler, H. O. (2000). An evaluation of the predictions of the Federal Reserve. International Journal of Forecasting, 16, 17–38.
Kholodilin, K. A., & Yao, V. W. (2005). Measuring and predicting turning points using a dynamic bi-factor model. International Journal of Forecasting, 21, 525–537.
Kohzadi, N., Boyd, M. S., Kermanshahi, B., & Kaastra, I. (1996). A comparison of artificial neural network and time series models for forecasting commodity prices. Neurocomputing, 10, 169–181.
Lapedes, A., & Farber, R. (1988). How neural nets work. In D. Z. Anderson (Ed.), Neural information processing systems (pp. 442–456). New York: American Institute of Physics.
Merton, R. C. (1981). On market timing and investment performance: An equilibrium theory of value for market forecasts. Journal of Business, 54, 363–406.
Mills, T. C., & Pepper, G. T. (1999). Assessing the forecasters: An analysis of the forecasting records of the Treasury, the London Business School and the National Institute. International Journal of Forecasting, 15, 247–257.
Morantz, B. H., Whalen, T., & Zhang, G. P. (2004). A weighted window approach to neural network time series forecasting. In G. P. Zhang (Ed.), Neural networks in business forecasting (pp. 251–265). Idea Group, Inc.
Öller, L., & Bharat, B. (2000). The accuracy of European growth and inflation forecasts. International Journal of Forecasting, 16, 293–315.
Patterson, D. W. (1996). Artificial neural networks. Englewood Cliffs, NJ: Prentice-Hall.
Pons, J. (2000). The accuracy of IMF and OECD forecasts for G7 countries. Journal of Forecasting, 19, 53–63.
Trippi, R. R., & Turban, E. (1996). Neural networks in finance and investing. Chicago: Irwin.
Tseng, F. M., Yu, H. C., & Tzeng, G. H. (2002). Combining neural network model with seasonal time series ARIMA model. Technological Forecasting and Social Change, 69, 71–87.
Walczak, S. (2001). An empirical analysis of data requirements for financial forecasting with neural networks. Journal of Management Information Systems, 17, 203–222.


Wang, S. Y., Yu, L., & Lai, K. K. (2005). Crude oil price forecasting with TEI@I methodology. Journal of Systems Science and Complexity, 18, 145–166.
Yao, J. T., & Tan, C. L. (2000). A case study on using neural networks to perform technical forecasting of forex. Neurocomputing, 34, 79–98.
Yu, L., Wang, S. Y., & Lai, K. K. (2005). A novel nonlinear ensemble forecasting model incorporating GLAR and ANN for foreign exchange rates. Computers and Operations Research, 32, 2523–2541.
Zhang, G., Patuwo, E. P., & Hu, M. Y. (1998). Forecasting with artificial neural networks: The state of the art. International Journal of Forecasting, 14, 35–62.
