
Neural Comput & Applic (1993) 1:59-66
© 1993 Springer-Verlag London Limited

Neural Computing & Applications

The Principles and Practice of Time Series Forecasting and Business Modelling Using Neural Nets

R.G. Hoptroff
Right Information Systems Limited, 14 St Christopher's Place, London W1M 5HB, UK

This paper is intended as a 'hands-on' practical discussion of how and why neural networks are used in forecasting and business modelling. The need for forecasting is briefly examined. The theory of the multilayer perceptron neural network is then covered both qualitatively and in mathematical detail, including the methods of back-propagation of error and independent validation. The advantages of the neural net approach to forecasting, namely nonlinear modelling capability, plausible interpolations and extrapolations, robustness to noise, ill-conditioning and insufficient data, and ease of use, are discussed. Finally, some working notes are offered for the practical implementation of neural nets in forecasting, and four real-life examples are given from the pursuits of econometrics, sales forecasting, market modelling, and risk evaluation.

Keywords: Sales forecasting; Market modelling; Risk evaluation

Original manuscript received 19 February 1992
Correspondence and offprint requests to: R.G. Hoptroff, Right Information Systems Ltd., 14 St Christopher's Place, London W1M 5HB, UK.

1. Introduction

Forecasting is a difficult and relevant problem in business. The increasing availability of computers at work can make the forecasting job easier and the results more accurate. With the new freedom of extensive computing facilities, new approaches to forecasting and modelling can be considered. One of the most attractive approaches is the neural network, because it combines accuracy with common-sense performance and ease of use. The neural network approach will be examined here, and the practical benefits assessed.

2. Motivations for Forecasting and Modelling in Business

Forecasting is the rational prediction of future events on the basis of information about past and current events. The process is very similar to modelling, where the outcome of an unknown variable is predicted from known or controllable variables. The relationship between known and unknown variables, or between past and future events, may be derived either through rational deduction or statistical analysis of historical relationships, or through a combination of the two. If relevant exemplary information about past relationships is available, the second approach is invariably more reliable than the first, because it simply reflects patterns in data in an unbiased way. This second approach is termed 'technical analysis', and is the approach that will be examined here.

There are several motivations for forecasting and modelling in business and economics. Short-, medium- and long-term forecasts serve a variety of purposes; modelling may be useful in a business as a management tool for decision making, or it may be a central function of the business itself:

1. Short-term forecasts: typically, a short-term forecast is for the week or the month ahead, and is used for stock control, monitoring cash flow, etc. Short-term forecasts are usually based on models of current trends and seasonalities in demand.

2. Medium-term forecasts: medium-term forecasts typically look at the position over the next few months or years to help manage long-term cash utilisation and budgeting. Medium-term forecasts incorporate independent influential variables into the trend/season forecast, to take into account such factors as the cyclical nature of the economy and the effects of different marketing strategies.

3. Long-term forecasts: forecasting many years ahead aids long-term strategic decision making and capital investment programming, and is used both in business and in government. These forecasts are the most difficult of all because of the need to quantify the effects of changes in the fundamental structure of the system. As there is rarely any relevant data upon which to construct econometric analyses in these situations, it is common to use traditional economic modelling for such forecasts.

4. Modelling as a management tool: by far the greatest interest in modelling for business is market modelling. For example, if the effects of price, promotional activity and advertising spend on demand can be modelled, then the cost effectiveness of the different marketing strategies can be quantified. This has an obvious advantage for both marketing companies/departments and their clients. There are a large number of other applications for such quantifiable cause-and-effect analyses, for example in project costing, risk estimation and shortlisting (shortlisting anything from personnel to oil drilling sites).

5. Modelling as a central function of a business: modelling is the central profit-generating activity in a number of businesses. Invariably, the model is some form of price model, whether the business be in insurance, bookmaking, valuation or speculative trading in financial markets.

Conventional forecasting methods are almost all based on linear or linearised models such as the auto-regressive moving average method. The practical success of these approaches is limited by their linearity, their ravenous data requirements, and the degree of skill needed to obtain a good forecast. The interested reader is directed towards Makridakis et al. [1] for a detailed exposition of traditional forecasting and modelling methods.

3. Principles of the Multilayer Perceptron (MLP) Neural Network

In this section the properties of the MLP neural network are presented, first qualitatively and then in mathematical detail. The MLP originated in biological neural network theory, the mathematical modelling of how the human brain works [2]. Since then it has found a wide scope of applications beyond its original field. The MLP has three properties relevant to forecasting and modelling:

1. MLP transfer function: the MLP is a complex 'mathematical function box' which translates (or maps) an input vector into an output vector. (A vector is an ordered list of numbers, e.g. coordinates.) The mapping from input to output is smooth and nonlinear (i.e. a graph relating input to output is a continuous arbitrary curve). This mapping is wholly dictated by a series of parameters called weights.

2. Training algorithm: training algorithms exist - in particular, back-propagation of error - for tuning the weights so that the MLP mapping is the 'best fit', according to some measure of error, to a set of training data, i.e. a series of example input/output data pairs.
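A minimal sketch of these first two properties in present-day Python/NumPy may make them concrete. It is illustrative only: the tanh hidden layer, the squared-error measure, the learning rate and the function names are assumptions made here, not details prescribed by the text.

```python
import numpy as np

# Illustrative sketch: layer sizes, tanh hidden units and learning rate are assumptions.
rng = np.random.default_rng(0)

def init_mlp(n_in, n_hidden, n_out, scale=0.1):
    """Small random weights give a near-impartial starting mapping."""
    return {"W1": rng.normal(0.0, scale, (n_in, n_hidden)), "b1": np.zeros(n_hidden),
            "W2": rng.normal(0.0, scale, (n_hidden, n_out)), "b2": np.zeros(n_out)}

def forward(net, x):
    """The MLP transfer function: input vector -> output vector, shaped by the weights."""
    h = np.tanh(x @ net["W1"] + net["b1"])   # smooth, nonlinear hidden layer
    y = h @ net["W2"] + net["b2"]            # linear output for regression-style forecasting
    return y, h

def backprop_step(net, x, target, lr=0.01):
    """One back-propagation-of-error update: nudge the weights downhill on squared error."""
    y, h = forward(net, x)
    err = y - target                             # output error
    dW2, db2 = np.outer(h, err), err
    dh = (net["W2"] @ err) * (1.0 - h ** 2)      # error passed back through the tanh layer
    dW1, db1 = np.outer(x, dh), dh
    for name, grad in zip(("W1", "b1", "W2", "b2"), (dW1, db1, dW2, db2)):
        net[name] -= lr * grad
    return float(0.5 * np.sum(err ** 2))
```

Training the net on a set of example input/output pairs is then just repeated application of `backprop_step`; the independent validation procedure introduced below decides when to stop.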
These properties are covered in detail in, for example, Rumelhart et al. [2] or Wasserman [3]. Observe that the MLP's function is in essence a nonlinear extension of the linear mapping function of multiple regression. There is one further aspect of the MLP machine whose relevance is a little more subtle:

3. Independent validation: a method, which we will term independent validation, exists by which an independent test set may be used to verify the quality of the training data. The method allows the training algorithm to extract what information it can from the data before identifying the point where, if it tried to extract further information, it would begin to be misled by noise, ill-conditioning or simply insufficient data to draw further conclusions. Indeed, the method can even be used to markedly improve on the performance of traditional linear regressions constructed with small quantities of noisy or ill-conditioned data. (Ill-conditioning usually arises when similar, or linearly dependent, input vectors are associated with very different output vectors.)

The detailed argument for independent validation is as follows. The back-propagation of error training process may be initialised so that, before training starts, the MLP mapping is completely impartial (or, perhaps, it reflects a priori knowledge derived from a source other than the training set). Usually, this state is such that the output is constantly equal to the training set's mean. Back-propagation training, which is an iterative process, then proceeds by making small changes in the weights so as to minimise the error as quickly as possible. In time, the MLP reaches a point which is the 'best fit' to the training data. The training data, however, may be noisy, ill-conditioned, or simply too small to convey the full story. If this is the case, the 'best fit' is of little value; indeed, such a 'best fit' is usually wildly inaccurate when applied to independent test data. This problem is frequently encountered with traditional methods, particularly in business and econometrics where the amount of available data is strictly limited.

Consider a slightly different approach where some of the available data is partitioned off and used only for independent validation - that is, it is not used for training, but only for independently assessing the quality of the mapping being obtained from the remainder of the training data. If the remaining data set is good, the error in predicting the outputs of the independent validation set will continue to fall during training. If the MLP sees patterns in the training set but these patterns are misleading due to noise, etc., the fit to the validation set will not improve during training (indeed, it often gets far worse). In practice, most data contains an element of useful information and an element of noise. In this case, the error associated with the validation set will first fall, as the MLP fits to structure in the data, and then rise as the MLP starts fitting to noise, etc., in the data. The fall happens before the rise because there is a strong tendency for the best fit to the underlying model to be located far closer to the unbiased starting point than any best fit to a noisy, ill-conditioned or under-determined data set. The back-propagation method tries to minimise fitting error in as short a distance from the starting point as possible, and hence approaches the best fit to the underlying model before passing on to find the best fit to the actual data. The best fit to the underlying model is located at the point where the validation set error is minimised.

The advantage of independent validation is demonstrated clearly in Fig. 1. The original data - 50% sine wave and 50% white noise - is used to train an independently validated linear MLP and its standard regression equivalent. Both the MLP and the regression use eight previous points to try to forecast the next point ahead. The MLP is linear, so the only difference between the two approaches is the use of independent validation. The regression solution, identical to the MLP's 'best fit' to the actual data, is somewhat misled because there is insufficient data to average out the noise. The independent validation solution performs much better, however, because after training for a while the MLP reaches a point where it rejects the data for being too noisy to draw further conclusions.

Fig. 1. Comparison of independent validation and standard regression forecasts of the noisy sine wave.
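In modern terms, the procedure just described is early stopping on a held-out validation set. Building on the sketch in the previous section (again illustrative: the epoch limit, learning rate and 'patience' rule below are assumptions, not part of the method as stated), it might be monitored as follows:

```python
import copy
import numpy as np

def train_with_validation(net, X_train, Y_train, X_val, Y_val,
                          max_epochs=2000, lr=0.01, patience=50):
    """Back-propagation with independent validation: keep the weights at the
    point where the error on the held-out validation set is minimised.
    (Illustrative sketch; stopping rule and parameters are assumptions.)"""
    best_err, best_net, stale = float("inf"), copy.deepcopy(net), 0
    for _ in range(max_epochs):
        for x, t in zip(X_train, Y_train):        # one back-propagation pass over the training set
            backprop_step(net, x, t, lr=lr)
        val_err = np.mean([np.sum((forward(net, x)[0] - t) ** 2)
                           for x, t in zip(X_val, Y_val)])
        if val_err < best_err:                    # still fitting to structure in the data
            best_err, best_net, stale = val_err, copy.deepcopy(net), 0
        else:                                     # starting to fit to noise, ill-conditioning, etc.
            stale += 1
            if stale >= patience:
                break
    return best_net, best_err
```

The weights returned are those at the validation-error minimum, which, by the argument above, lie close to the best fit to the underlying model rather than to the noisy data.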
4. Principles and Motivations for Using the MLP for Forecasting and Business Modelling

The MLP may be used in forecasting and in business modelling by 'unplugging' a conventional method and 'plugging in' an MLP. Several examples in the next section show how this may be done. In essence, all independent variables are collected together as an input vector, and the required dependent variables are collected together as an output vector. The MLP is then trained on historical examples of these input/output pairs. The MLP can then be used for modelling or forecasting by presenting an input vector to the MLP and interpreting its corresponding output. In this way, the MLP approach is a natural extension of traditional forecasting, and all the established tricks of the trade, such as dummy variables to represent changes, seasonality and time trends, can be readily incorporated.

There are four core advantages with the MLP approach to forecasting and modelling: nonlinearity, plausible generalisations, robustness to poor-quality data, and ease of use. There is one key limitation: the MLP can do no better than the data it is trained on. Specifically:

1. Nonlinearity: the MLP has the advantage over other approaches of arbitrary nonlinear mapping capability.

2. Plausibility: the nonlinear map is a smooth function which gives plausible generalisations (interpolations and extrapolations) in contrast with, say, polynomial models which can rarely be described as realistic.

3. Robustness: the independent validation method automatically gauges how much relevant information exists in a given data set, and treats it accordingly. This makes it robust to the problems of ill-conditioning, noise and data shortages, which are common in business and econometrics.

4. Ease of use: the whole process may be automated in computer software so that people with little knowledge of either forecasting or neural nets can prepare reasonable forecasts in a short space of time. The MLP's robustness to poor data helps it survive conditions of 'mis-use', where it is fed garbage data, without earning a bad reputation. For example, the approach is unlikely to conclude that the UK birth rate is driven by the Swedish stork population level even if, by chance, there is some correlation between the two variables in the modelling data. This is because, if the variables are unrelated, it is unlikely that the same degree of correlation will arise in the independent validation data. Conventional software is liable to draw such a conclusion in non-expert hands.

5. Limitations: if there is no information in the data, the MLP will offer a non-committal response. As with all scientific forecasting methods, the MLP can never predict the unpredictable. It can only generate useful forecasts if it has had access to data from which it can construct a relevant model.

5. Practice of Using the MLP for Forecasting and Business Modelling

Using an MLP for forecasting or modelling is a six-stage process:

1. Inputs: choose variables as inputs to the network which are thought to explain the variations in the quantity being modelled:

'Micro' factors specific to the case in question, e.g. marketing spend, selling price, etc.
'Macro' factors that reflect many aspects of business, e.g. GDP, CBI optimism index, etc.
'Maybe' factors which may or may not be relevant, e.g. weather, day of week, etc.

Dummy variables may be used just as in any other forecasting approach. If there are trends associated with time or seasonality, these should be represented in the input vector. The training process is speeded up if all variables in the input are normalised to the same magnitude. The independently validated net is robust to insufficient quantities of data, so there is much less pressure than with traditional approaches to use as few variables as possible.

2. Architecture: choose a network architecture (i.e. one or two layers? How many processing elements in each layer?). As a guide, forecasting and modelling problems in business usually require just one hidden layer of no more than 10 processing elements. Using the independent validation technique, no harm is done by having an over-sized MLP; it will merely take longer to converge. Having too small a net will lead to an over-simplified model. Trial and error will reveal the optimum size if size is critical for any reason.

3. Training: train the net using independently validated gradient descent. Partition off 10%-25% of the data (at least 10 points) to be used for independent validation only. When the error on this validation set is minimised, the network is in its optimum configuration for modelling and forecasting, and is ready for use.

4. Error estimation: how good are the resulting forecasts? The forecasting error on the validation set can be used as an estimate of the likely forecasting error. Better still, a second neural net can be used to model the mean square error of the first. This is particularly useful if the system being modelled is more predictable in some cases than in others (see Example 4).

5. Key variables: which inputs are most relevant to the model? This may be determined by sensitivity analysis: perturb each input variable in turn, and observe the magnitude of variation in the output variable (or determine the MLP input/output derivatives); a short sketch of this procedure follows the list. Irrelevant inputs can then be identified, removed from the model, and a more detailed model can then be built from the remaining 'key factors'. This process is similar to step-wise regression, but here less relevant variables are removed rather than added in one by one.

6. Analysis: analysing the cross-sections of the MLP transfer function allows strategic analysis. For example, a cross-section of the variation of sales against marketing spend gives hard evidence of the effects of marketing. Quantifying such benefits is every marketing manager's dream (see Example 3).
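The sensitivity analysis of step 5 might be sketched as follows (illustrative only: the perturbation size, the averaging over the data set, and the re-use of the `forward` function from the earlier sketch are assumptions rather than prescriptions of the method):

```python
import numpy as np

def sensitivity(net, X, delta=0.1):
    """Perturb each (normalised) input in turn and record the average absolute
    change in the MLP output; small scores flag candidate inputs for removal.
    The perturbation size delta is an illustrative choice."""
    scores = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        for x in X:
            base, _ = forward(net, x)
            x_pert = x.copy()
            x_pert[j] += delta                    # nudge input j only
            scores[j] += np.abs(forward(net, x_pert)[0] - base).sum()
    return scores / len(X)
```

Ranking the inputs by these scores identifies the 'key factors'; the least influential inputs can be dropped and the model retrained, in the spirit of step-wise regression run in reverse.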
6. Four Examples

Four example applications are now given. Taken from a variety of different backgrounds, they demonstrate how neural nets have been applied in a variety of real business situations by Right Information Systems Ltd. In the second and third examples, the data have been disguised to protect clients' interests. They do, however, genuinely reflect the quality of the actual results obtained.

6.1. Forecasting Turning-Points in the UK Economy

The problem here is to forecast the cyclical nature of the UK economy up to two years in advance and, in particular, turning points in the cycle. This is a notoriously difficult problem.

Eighty inputs to the one-layer, two-hidden-unit MLP were derived from 16 detrended indices available from the UK Central Statistical Office: stock levels, factory spare capacity, business optimism, company turnover, new orders, expected stock levels, housing starts, retail sales, capital investment, new car registrations, interest rates, unemployment, production levels, consumer credit, FT-A-500, and GDP(0). Five inputs were derived from each index: the value now, for the last three quarters, and for two years ago. The MLP was trained to predict the output of the CSO coincident indicator (essentially a smoothed GDP indicator) two years in advance, using monthly data from 1963 to the present; the period 1975-1980 was used for independent validation.
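A sketch of how such an input matrix might be assembled is given below; the array layout and the monthly lag offsets used for 'the last three quarters' and 'two years ago' are assumptions made for illustration, not details taken from the example.

```python
import numpy as np

# Lag offsets in months: an illustrative reading of "now, the last three quarters,
# and two years ago".
def build_inputs(indices, lags=(0, 3, 6, 9, 24)):
    """indices: array of shape (n_months, 16) holding the detrended CSO series.
    For each month, take every index at the given lags, giving 16 x 5 = 80 inputs."""
    n_months, n_series = indices.shape
    rows = []
    for t in range(max(lags), n_months):
        rows.append([indices[t - lag, j] for j in range(n_series) for lag in lags])
    return np.asarray(rows)
```

Each row would then be paired with the CSO coincident indicator 24 months later, so that the trained MLP maps today's indices onto the state of the cycle two years ahead.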
The resulting index is shown in Fig. 2, horizontally shifted so that it may be directly compared with the coincident indicator and the most advanced indicator available, the CSO longer leading indicator, which predicts the data one year in advance. Note that the underlying cycle is clearly forecastable by the MLP two years in advance with better than one quarter-year's accuracy; that performance is as good as the leading indicator, yet smoother and available one year earlier; that the coal strike trough is not forecast by either index (it was an unforeseen event); that the MLP forecast sustained growth in the 1987-1990 period while in reality deregulation and debt-financing led to a boom-bust effect; and that a growth period similar to that of 1982-1983 is forecast for 1992-1993.

Fig. 2. Comparison of neural net and longer leading indicator forecasts of the coincident indicator. Note the neural net forecasts two years ahead while the longer leading indicator forecasts one year ahead.

6.2. Sales Forecasting for Cash Flow and Stock Control

A well-known free-distribution magazine needs to forecast, some months in advance, each issue's total advertising space and revenue to regulate printing stock levels and predict the cash position in the future. Only a fraction of advertisements have been booked when the forecasts are required.

This forecasting problem is unusual because discrete events (individual issues) are modelled rather than the progression of a time series. The number of advertisements booked gradually increases as the publication date approaches. Consequently, the forecast is expected to improve as the publication date nears. The inputs chosen were the number of days to publication, the current number of pages of advertisements booked and the average for the previous 20 days (this gives the net a measure of current sales and current growth in sales), one-year-ago total advertising revenue (to give the net an idea of seasonal factors), dummy variables to identify when political activity might increase advertising, and the CBI change in business optimism index. The network was trained to forecast how much more advertising space would be booked between now and publication, rather than the total bookings. For every issue, the forecast then approaches the same value (zero) as the publication date draws near, simplifying the forecasting process.

A two-layer net (six elements in the first layer, three in the second) was trained on a data set of 50 issues and tested on eight. This is a tough forecasting problem. The established method typically forecasts with an accuracy of around +/- 20% two months ahead. The neural net achieves +/- 10%, which is a significant improvement. Given the small amount of training data (indeed, no sensible technical forecast is possible at all without independent validation), a sensitivity analysis is probably worthwhile for this problem to weed out some of the less relevant variables. Figure 3 shows how the forecast varies for one issue during the run-up to publication. Ideally, the 'forecast revenue' line is horizontal and intersects the cumulative revenue line at five days to publication (the last day on which adverts are accepted).

Fig. 3. Forecasts of total advertising revenue against the cumulative total in the months leading up to a magazine's publication.
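The trick of forecasting the bookings still to come, rather than the final total, amounts to a simple change of target. A sketch with hypothetical figures (the variable names and values below are not taken from the example):

```python
import numpy as np

# Historical issues: advertising space booked at forecast time and the total
# eventually achieved at publication (hypothetical figures for illustration).
booked_so_far = np.array([12.0, 20.0, 31.0])
final_total = np.array([48.0, 47.0, 52.0])

# The net is trained on the remaining bookings, which shrink towards zero
# as the publication date approaches.
target_remaining = final_total - booked_so_far

# At forecast time, the revenue forecast is recovered by adding the
# predicted remainder back onto what has already been booked.
def forecast_total(predicted_remaining, booked_now):
    return booked_now + predicted_remaining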
6.3. Sales Forecasting for Demand and Market Modelling

A household product has seasonal sales for which monthly data is available since 1983 (Fig. 4a). During that time, various marketing tools have been mixed, including advertising, price positioning and promotions. The production team needs to forecast demand for the year ahead given the marketing strategy planned for next year. The marketing department needs to quantify how each marketing ingredient influences sales so that the optimum marketing mix can be prescribed in the future.

Figure 4 also shows the basic promotions (Fig. 4b), pricing (Fig. 4c) and advertising (Fig. 4d) input data fed to the MLP. Inputs were also provided to represent seasonality and time trend. Figure 4a shows the corresponding product sales.

Fig. 4. Product data 1983-1989 for market modelling. a Sales. b Advertising activity. c Promotional activity. d Price strategy.

The 4-element single-layer MLP was trained to forecast sales using January 1984-June 1989 data (1983 being used for independent validation), and was tested by forecasting July 1989-December 1990 sales. The test demonstrates excellent results (Fig. 5a). Most interesting, however, are the cross-sections of the model, which are obtained by varying each variable in turn over its relevant range while keeping all other variables fixed. Figure 5b shows that the effect of advertising on sales is roughly linear; the slope of the line determines whether or not advertising is worthwhile. Figure 5c shows that, up to the 300 level, promotions have a positive effect on sales. Beyond 300, however, the benefit of promotions on sales begins to level off. Finally, Fig. 5d shows how sales vary with product price. The product is a premium brand, and consequently higher priced than average. The cross-section reveals how, above a 20% premium, sales fall off significantly, while below a 15% premium, sales do not increase drastically.

Fig. 5. Modelling results from the data in Fig. 4. a Sales forecast (--: actuals; ---: forecast). b Effect of advertising. c Effect of promotions. d Effect of price point.
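The cross-sections of Figs 5b-5d can be generated by sweeping one input across its observed range while the remaining inputs are held fixed. A sketch re-using the earlier `forward` function (holding the other inputs at their median values is an assumption, not a detail given in the example):

```python
import numpy as np

def cross_section(net, X, var_index, n_points=50):
    """Vary one input over its observed range, hold the others at their median
    values (an illustrative choice), and return the sweep with the MLP outputs."""
    base = np.median(X, axis=0)
    sweep = np.linspace(X[:, var_index].min(), X[:, var_index].max(), n_points)
    outputs = []
    for v in sweep:
        x = base.copy()
        x[var_index] = v                      # move only the variable of interest
        outputs.append(forward(net, x)[0])
    return sweep, np.array(outputs)
```

Plotting the outputs against the sweep for the advertising, promotions and price inputs reproduces curves of the kind shown in Figs 5b-5d.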
6.4. Modelling Company Performance for Risk and Return Evaluation

The performance of the top 100 UK construction companies was modelled for the industry's annual review, UK Construction 1991. Looking at the scattergram in Fig. 6, it appears that the smaller companies operate at higher profit margins, and one might conclude that these companies are more efficient. However, it is not immediately clear whether the conclusion is statistically justifiable or not, nor what the exact numerical relationship is. Independent validation is used here as a test of statistical significance. A 4-hidden-unit MLP was trained using independent validation to model company profitability on the basis of company size alone. The MLP cross-section (the curve in Fig. 6) shows that there is sufficient information in the data to conclude that companies of below £ million annual turnover tend to be up to twice as profitable as their larger competitors.

Fig. 6. UK construction company long-term performance by turnover, both actual data (squares) and independently validated model (curve).

In a similar test, company size was used to model recent (1990) company performance. Not only was a model of the relationship built, but a second model was generated to estimate the mean squared error of the first as this varied with company size. Figure 7 shows that in 1990, £ million construction companies were twice as volatile as their £ billion turnover competitors. This information can be used to support investment decisions: a share in a £ million construction company offers twice the potential return of a £ billion company, but at twice the total risk. However, Sharpe's work on capital asset pricing [4] shows that a greater proportion of risk can be diversified away for small companies than for large companies. Hence this work indicates that investing in a portfolio of small construction companies may be better than investing in a portfolio of large construction companies.

Fig. 7. UK construction company recent performance by turnover, showing actual data (squares), independently validated model (central curve), and variance model (exterior curves).
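The two-model procedure used here (one net for expected performance, a second for its mean squared error) can be sketched by re-using `init_mlp`, `forward` and `train_with_validation` from the earlier sections. The data below is synthetic and the variable names are assumptions; only the structure of the procedure is intended to match the text.

```python
import numpy as np

# Synthetic stand-in data (for illustration only): normalised log-turnover as the
# single input, profit margin as the output, split as in step 3 of Section 5.
rng = np.random.default_rng(1)
size = rng.uniform(-1.0, 1.0, (120, 1))
margin = 8.0 - 4.0 * size + rng.normal(0.0, 2.0, (120, 1))
X_train, X_val = size[:100], size[100:]
Y_train, Y_val = margin[:100], margin[100:]

# First model: expected profitability as a function of company size.
perf_net = init_mlp(n_in=1, n_hidden=4, n_out=1)
perf_net, _ = train_with_validation(perf_net, X_train, Y_train, X_val, Y_val)

def squared_residuals(net, X, Y):
    """Squared error of the first model on each example."""
    return np.array([(forward(net, x)[0] - y) ** 2 for x, y in zip(X, Y)])

# Second model: the squared error of the first as a function of size, giving the
# size-dependent variance bands drawn as the exterior curves in Fig. 7.
var_net = init_mlp(n_in=1, n_hidden=4, n_out=1)
var_net, _ = train_with_validation(var_net,
                                   X_train, squared_residuals(perf_net, X_train, Y_train),
                                   X_val, squared_residuals(perf_net, X_val, Y_val))
```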
7. Summary

The potential for neural net approaches to forecasting and modelling in practical situations has been discussed. The four examples demonstrate how the MLP approach can be used successfully in econometrics, sales forecasting, market modelling, and risk evaluation. The advantages of the neural net approach are nonlinear modelling capability, plausible interpolations and extrapolations, robustness to noise, ill-conditioning and insufficient data, and ease of use.

References

1. Makridakis S, Wheelwright SC, McGee VE. Forecasting: Methods and applications. New York: Wiley, 1983
2. Rumelhart DE, Hinton GE, Williams RJ. Learning internal representations by error propagation. In: McClelland JL, Rumelhart DE (eds). Parallel Distributed Processing, Vol 1. Cambridge (MA): MIT Press, 1986, 318-360
3. Wasserman PD. Neural computing - Theory and practice. New York: Van Nostrand Reinhold, 1989
4. Sharpe WF. Portfolio theory and capital markets. New York: McGraw-Hill, 1971
