
Short-Term Load Forecasting Methods in Comparison: Kohonen Learning, Backpropagation Learning, Multiple Regression Analysis and Kalman Filters

T. Baumann, H. Strasser (SIEMENS AG Austria, Vienna, Austria)
H. Landrichter (Wiener Elektrizitätswerke, Vienna, Austria)

Keywords: Short-term load forecasting, Artificial neural networks, Regression Techniques

1 ABSTRACT

In this paper we present a comparison between four different short-term load forecasting methods. Two of them are based on artificial neural networks and two on classical statistical methods. The first method is based on the well-known backpropagation learning algorithm. The second is based on a combination of the self-organizing Kohonen algorithm and the supervised delta rule. The third method, multiple regression analysis, uses the technique of weighted least-squares estimation, whereas the last method is based on Kalman filter theory.

2 INTRODUCTION

As short-term load forecasting has become more and more important over the last few years, new load forecasting techniques have been created or adapted. The artificial neural network (ANN) approach is only the latest step in this evolution, which began in the forties with the discovery of the strong correlation between the load and meteorological data.
To date, approximately 25 papers have been published on load forecasting with ANNs, but only a few of them compare their method to classical forecasting methods [1], [2], [3] and [4]. In this paper, the most popular ANN approach, the Multilayer Perceptron with backpropagation learning, two statistical methods [5], multiple regression analysis and the Kalman filter, and a new ANN method are compared.

The basic idea of this new approach has been presented in [6]. It consists of a combination of a self-organizing ANN, the Kohonen Feature Map, and a single-layered linear network with delta rule learning. This approach is explained in detail, while the other three methods are only briefly discussed, with references to existing work provided.
ral networks need following inputs: the load and weather
A neural network is a collection of simple interconnected processing units or neurons. The processing units operate independently from each other, thus realizing an architecture that is parallel and distributed, resulting in high-speed information processing and a potential fault tolerance. The connection weights that determine the way in which the neurons interact can be modified during the network's operation, providing a high degree of adaptability. This adaptation process is the so-called learning.

In contrast to classical software, which is programmed by a succession of instructions, ANNs work with a training process. In other words, an algorithm is not programmed anymore; instead, the ANN is trained by a succession of examples (input/output pairs). Therefore, this approach is useful for tasks where the basic laws are not known or only poorly known, which is e.g. the case for load forecasting.

In the last decade, interest in neural networks has revived as advances in computer technology have made it easier to simulate large networks in software. At present there exist numerous different ANN architectures. Among them, the best adapted one must be found for a particular task.

3 FOUR FORECASTING METHODS

3.1 ANN with Backpropagation Learning

The Multilayer Perceptron (MLP) with backpropagation learning is the most popular neural algorithm. The method is based on a supervised learning algorithm. During the learning phase, the neural network learns a nonlinear mapping from an n-dimensional input vector space to the m-dimensional output vector space, R^n -> R^m, by minimizing an energy function calculated from the mean square error between the actual ANN outputs and the desired outputs. A general structure of an MLP is shown in Fig. 1. For more detail see also [7] and [8].

The short-term load forecasting method with MLP presented in this paper is similar to the methods explained in [8] and [9]. Two main changes are proposed: the introduction of weather data and the use of seven MLPs, one MLP for each day of the week. The resulting seven neural networks need the following inputs: the load and weather data of the present day and the forecasted weather data of the

day for which the load is to be forecasted. For the latter day, the network then outputs the loads.

Fig. 1 Structure of an MLP with one hidden layer.

Therefore, the input vector contains 144 components:

48 half-hourly loads (day)
24 temperatures (day)
24 light intensities (day)
24 temperatures (day+1)
24 light intensities (day+1)

and the output vector contains 48 values:

48 half-hourly loads (day+1)

All inputs and outputs were normalized to [0.1, 0.9]. The seven MLPs were composed of an input layer with 144 neurons, two hidden layers with 60 and 48 neurons, and an output layer with 48 neurons. The following parameters were used for learning: learning rate = 0.6, momentum term = 0.3. The learning was performed until the system error fell below 0.003.

There are many other possibilities to design an MLP architecture. Alternatively, non-fully-connected MLPs and a different organization of the inputs and outputs could be tested with the same data set.
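To make this input/output layout concrete, the following minimal Python/NumPy sketch assembles one 144-component input vector and applies the [0.1, 0.9] normalization. The helper name, the scaling formula and the physical ranges (e.g. a 1500 MW maximum load) are illustrative assumptions; the paper only states that all inputs and outputs are normalized to [0.1, 0.9].

```python
import numpy as np

def scale(x, lo, hi):
    """Normalize x from an assumed physical range [lo, hi] to [0.1, 0.9]."""
    return 0.1 + 0.8 * (np.asarray(x, dtype=float) - lo) / (hi - lo)

def build_input_vector(loads_day, temp_day, light_day, temp_next, light_next,
                       load_range=(0.0, 1500.0), temp_range=(-20.0, 40.0),
                       light_range=(0.0, 1.0)):
    """Assemble the 144-component MLP input: 48 half-hourly loads plus
    24 temperatures and 24 light values of the present day, and the
    24 forecasted temperatures and 24 light values of the next day."""
    return np.concatenate([
        scale(loads_day,  *load_range),    # 48 values
        scale(temp_day,   *temp_range),    # 24 values
        scale(light_day,  *light_range),   # 24 values
        scale(temp_next,  *temp_range),    # 24 values
        scale(light_next, *light_range),   # 24 values
    ])                                      # -> shape (144,)
```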

3.2 Kohonen/delta method

This method consists of two phases: the basic load forecasting and the weather dependent load correction (Fig. 6). In Phase 1, the load is calculated without any weather data influence, using the self-organized Kohonen network (section 3.2.1). In Phase 2, a one-layered ANN is trained to produce a weather dependent correction of the loads obtained in Phase 1 (section 3.2.5).
3.2.1 The Kohonen network

In 1982, Teuvo Kohonen proposed a new neural network architecture that implements a self-organizing algorithm [10], [11].

As noted above, an ANN can be characterized as a collection of processing elements (PE), also called neurons, which can all perform the same task in parallel. The information to be processed is distributed via single-directional connections (weights or synaptic weights) to all PEs.

Fig. 2 The processing element or artificial neuron (Xj: input j, Wj: weight j, X: input vector, W: weight vector, Y: output)

Fig. 2 shows the schema of a PE where each input Xj is connected to the PE via a synaptic weight Wj. X = (X1, X2, ..., Xn) is called the input vector and W = (W1, W2, ..., Wn) is called the weight vector.

For most ANNs, the basic operation using X and W to calculate the output Y is the scalar product. For the Kohonen network, the basic operation is the Euclidean distance:

    Y = D_Eucl = sum_{j=1}^{N} (X_j - W_j)^2

where N is the number of components of the input vector.

The basic architecture of the Kohonen network is composed of an input layer with N inputs followed by a layer of M * M neurons arranged in a two-dimensional lattice, also called a map (Fig. 3). Each input is connected through a weight Wij to all neurons of the second layer.

Fig. 3 Basic architecture of the Kohonen network. (Connections are only shown partially)

The training of an ANN is done by a continuous modification of the weights Wij, following an internal law which differs from network to network. This requires a repeated presentation of input vectors until the ANN is organized. A single learning step can be summarized by the two following phases:

1. Find the neuron c whose weight vector is nearest to the input vector (i.e. find the best-matching neuron c).

An input vector X is presented to the Kohonen network and the Euclidean distance between this input vector and all the weight vectors Wi (i = 1 ... M^2) is calculated. A distance is therefore obtained for each neuron. The neuron with the smallest distance is the winner (also called the active neuron). This neuron gets the index c. The weight vector Wc is consequently the nearest to the input vector X. This phase can be summarized by:

    D_Eucl(Wc, X) = min( D_Eucl(Wi, X) )   for i = 1 ... M^2

2. Update the unit c and its neighbors.

Once the winner has been located, unsupervised learning proceeds by updating the weights of the neurons within a neighborhood Nc defined around the winner neuron c (Fig. 4).

Fig. 4 Neighbourhood around the winner neuron

All other weight vectors are not modified. In the beginning, the radius of the neighborhood Nc is large (approximately half of the network size), but it then decreases with the learning step until it reaches zero. If the radius is zero, only the neuron c is updated. Updating is done as follows:

    Wi(t+1) = Wi(t) + alpha(t) * (X(t) - Wi(t))   for i in Nc
    Wi(t+1) = Wi(t)                               for i not in Nc

where alpha(t) is the gain term (0 < alpha(t) < 1), which decreases as a function of the training iteration step, as does the neighbourhood radius.
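For illustration, one complete learning step (winner search plus neighborhood update) can be sketched as follows. The linear decay schedules for alpha(t) and the radius, and the square-shaped neighborhood, are assumptions; the paper only states that both quantities decrease to zero during training.

```python
import numpy as np

def kohonen_learning_step(W, x, t, T, M):
    """One Kohonen learning step on an M*M map.
    W: (M*M, N) weight matrix, one row per neuron; x: (N,) input vector;
    t: current iteration; T: total number of iterations."""
    # Phase 1: find the best-matching neuron c (smallest Euclidean distance).
    d = np.sum((W - x) ** 2, axis=1)       # squared distance to every neuron
    c = int(np.argmin(d))
    # Phase 2: update c and its neighbors on the 2-D lattice.
    alpha = 0.9 * (1.0 - t / T)            # gain term, decreasing to 0 (assumed schedule)
    radius = (M / 2.0) * (1.0 - t / T)     # radius ~M/2, shrinking to 0 (assumed schedule)
    ci, cj = divmod(c, M)                  # lattice coordinates of the winner
    for i in range(M * M):
        ii, jj = divmod(i, M)
        if max(abs(ii - ci), abs(jj - cj)) <= radius:  # inside neighborhood Nc
            W[i] += alpha * (x - W[i])
    return c
```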
ates (= self-organizes) a fixed number of reference vec-
The key point is that the algorithm performs a reduction of dimensionality: in our case the dimension of the input vector space is reduced to two, arranging the vectors on a self-organizing feature map.

Kohonen maps partition the input vector space into M^2 sub-spaces, also called "receptive fields", which are determined by the set of inputs selecting the same neuron. This mapping conserves the topology of the input vectors. In other words, if two partitions of the input vector space are "neighbours", then the two corresponding units of the Kohonen map are topological neighbours, too.

3.2.2 Application of the Kohonen network to auto-associative memory

Once the training is done, it is possible to apply a vector with missing components and the Kohonen network can recover the complete vector. This is called auto-associative memory. For example, when the Kohonen network has been trained with binary images and only a part of an image is presented to it, it will reconstruct the whole binary image.

The method consists of ignoring the missing components of the input vector and calculating the Euclidean distance with the known components only. As in the normal case, the nearest neuron to the (incomplete) input vector is computed. The weights between this selected neuron and all the ignored input components then provide the values of the missing components of the input vector. In Fig. 5, the Kohonen network is trained with four consecutive load values, L(t-2) to L(t+1), and the recall is done with three load values, the load L(t+1) being missing.

Fig. 5 Principle of auto-associative memory
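A sketch of this recall with missing components, using the same conventions as the learning-step sketch above (W holds one weight vector per neuron, row-wise):

```python
import numpy as np

def autoassociative_recall(W, x_partial, known):
    """Recall a complete vector from a trained Kohonen map.
    W: (M*M, N) trained weights; x_partial: (N,) input with missing entries;
    known: (N,) boolean mask marking the components that are present."""
    # The distance is computed over the known components only.
    d = np.sum((W[:, known] - x_partial[known]) ** 2, axis=1)
    c = int(np.argmin(d))            # best-matching neuron for the partial input
    x_full = x_partial.copy()
    x_full[~known] = W[c, ~known]    # missing components taken from the winner's weights
    return x_full, c
```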
3.2.3 Application of auto-associative memory to load forecasting (Phase 1)

An application of auto-associative memory with the Kohonen network to single-day load forecasting is now described. This type of ANN is well suited to the task of load forecasting due to its characteristic of extracting features from the input vectors.

During this feature extraction, the Kohonen network creates (= self-organizes) a fixed number of reference vectors, each corresponding to a day of the year (a reference day). Each neuron stores a reference day in its weights. Therefore, the number of reference days is fixed by the number of neurons in a Kohonen map. We used networks of 100 neurons for all our simulations.

Due to the different load shapes of the seven days in a week, we used seven Kohonen maps as well. The type of the day could also be introduced as an input to the neural network, but in our experience better results were obtained by choosing one Kohonen map for each day type. The inputs, the desired outputs after learning and the seven Kohonen maps are shown as Phase 1 in Fig. 6.

Fig. 6 The data flow of the forecasting method.

(In Fig. 6: Mo_Tu = Monday to Tuesday, etc.; the diagram shows the seven day-type Kohonen maps of Phase 1 followed by the weather dependent correction of Phase 2.)

The seven ANNs are trained with one to three years of load data, depending on availability. During training, the input vector to the Kohonen maps is composed of 96 inputs (the 48 half-hourly loads of day j and the 48 half-hourly loads of day j+1). When forecasting, only the first 48 inputs are presented and the Kohonen network finds the missing 48 loads of the next day by auto-associative recall.
Phase I to weather data. For this purpose an ANN is
3.2.4 Trend correction

If the seven Kohonen networks are trained with several years of data, the reference days stored in the neurons of the Kohonen maps will statistically contain load values which are too small, due to the load growth over the years. The larger the data period used to train the ANNs, the more pronounced this statistical effect will be. We propose a simple trend correction method to overcome this problem.

The difference between the load input to the ANN and the 48 corresponding weights of the best-matching neuron c is added to the intermediate output Yj (calculated with auto-association) of the ANN. This can be summarized by:

    Yj_phase1 = Yj + delta * (X_{j-48} - W_{c,j-48})   for j = 49 ... 96

where delta is a correction factor between 0 and 1. With delta = 0.75 we obtained the best results. In Fig. 6 the calculation of the trend corrected loads is included in Phase 1.

This method is also useful if rapid load profile changes occur (e.g. summer vacation). After one day, the method adapts the output to the new circumstances.
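The correction formula maps directly to a few lines of code; a minimal sketch, continuing the conventions above:

```python
def trend_correct(y, x_today, w_c_today, delta=0.75):
    """Trend correction of the 48 recalled loads (Phase 1).
    y: 48 recalled loads for day j+1 (components 49..96 of the map output);
    x_today: 48 known loads of day j; w_c_today: the first 48 weights of
    the best-matching neuron c; delta: correction factor in (0, 1)."""
    # Each forecasted half hour is shifted by the gap between today's actual
    # load and the reference day stored in the winning neuron.
    return y + delta * (x_today - w_c_today)
```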

3.2.5 Introduction of weather data (Phase 2)

Simulations show that Phase 1 with trend correction already performs well, but the forecast of days with important temperature changes between two days gives very bad results. A second neural network is therefore introduced to correct the forecasted load as a function of weather data. The basic idea is to relate the errors produced by Phase 1 to weather data. For this purpose an ANN is trained to predict the errors of Phase 1 as a function of weather data; its output is then added to the output of Phase 1 (Fig. 6, Phase 2). The input to this ANN is thus the weather data and the desired outputs are the load errors produced by Phase 1. The weather data is composed of the temperature of the present day and the predicted temperature of the day to be forecasted. Intuitively, the ANN should learn a zero correction if the temperature of the following day does not change in comparison with the present day.

For that purpose we used a one-layered linear ANN, trained with the delta rule [7]:

    o_i^net    = sum_{j=1}^{N} x_j * w_ij                (recall)
    Delta w_ij = eta * (o_i^desired - o_i^net) * x_j     (learning rule)

where o_i^net is the network output of unit i, x_j the j-th input, Delta w_ij the change of the weight between two learning steps, o_i^desired the desired output and eta the learning rate. Note that the weights w_ij are different from the weights in the Kohonen maps.
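A minimal sketch of such a one-layered linear network with delta rule learning; the learning rate value is an assumption, as the paper does not state eta:

```python
import numpy as np

class LinearDeltaNet:
    """One-layered linear ANN trained with the delta rule (Phase 2)."""
    def __init__(self, n_in, n_out, eta=0.01):
        self.W = np.zeros((n_out, n_in))   # weights w_ij, distinct from the Kohonen maps
        self.eta = eta                      # learning rate (value assumed)

    def recall(self, x):
        return self.W @ x                   # o_i^net = sum_j x_j * w_ij

    def train_step(self, x, o_desired):
        o = self.recall(x)
        # Delta rule: w_ij += eta * (o_i^desired - o_i^net) * x_j
        self.W += self.eta * np.outer(o_desired - o, x)
        return o
```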

At present, only one ANN is used for all seven days; whether using more ANNs improves the performance will be tested in the future. Simulations showed that the training data should be composed of the last one or two months of the current and the previous year.

3.3 Multiple regression analysis

By means of multiple regression analysis, the statistical relations between the total load and the weather conditions as well as day-type influences are calculated. In other words, the method attempts to reconstruct analytically the causal functional relationship between the load and the influencing factors.

In this paper, the regression coefficients are computed by an exponentially weighted least-squares estimation based on a linear regression model for a given point of time (e.g. the time interval 8.00-8.30 a.m.). The analysis period was fixed to five weeks.

The result of this process is a set of polynomial coefficients, the so-called regression coefficients a_i. By means of dummy variables, day-type specific load influences are taken into account. Different day types are commonly treated together in the data analysis in order to take advantage of the increased amount of statistical data.

The result of the analysis is a decomposition of the time dependent total load Y(t) into:

* a time-sensitive but weather-insensitive base load B(t)
* weather-sensitive load components: the temperature dependent load aT * T(t) and the light intensity dependent load aL * L(t)
* a day-type specific load component D(t)

Y(t) = B(t) + aT * T(t) + aL * L(t) + D(t) + model_error(t)

In the model we used the derived indoor temperature instead of the measured outdoor temperature. The indoor temperature is calculated by solving the heat flow equation. The light intensity is transformed via the Weber-Fechner law [5].
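A sketch of an exponentially weighted least-squares fit for one point of time. The forgetting factor lam and the regressor layout are assumptions; the paper specifies only the exponential weighting and the five-week analysis period.

```python
import numpy as np

def ewls_coefficients(X, y, lam=0.98):
    """Exponentially weighted least-squares fit for one time interval
    (e.g. 8.00-8.30 a.m.) over the analysis period.
    X: (n_days, n_regressors) regressor matrix (temperature, light,
    day-type dummies, constant); y: (n_days,) observed loads;
    lam: forgetting factor in (0, 1] (value assumed)."""
    n = len(y)
    w = lam ** np.arange(n - 1, -1, -1)    # newest observation weighted highest
    sw = np.sqrt(w)
    # Weighted least squares via ordinary lstsq on the row-scaled system.
    a, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return a                                # regression coefficients a_i
```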
3.4 Kalman filter

Multiple regression analysis and the Kalman filter are both part of linear filter theory: one considers a linear combination of certain model functions with unknown coefficients. In the case of multiple regression analysis these coefficients are time independent, whereas the Kalman filter uses time-varying ones (dynamic state equation). Of course, both methods can be applied with arbitrary model functions, but the simulations in this paper were based on linear functions.

The decomposition of the time dependent load into components, as well as the transformations of temperature and light intensity, are as described in the multiple regression analysis section.

In comparison with the multiple regression analysis, the Kalman filter adapts itself more rapidly because it uses the last predicted state vector, the last forecast error, and the current and predicted weather data to calculate the next forecast.

A critical point with Kalman filters is the optimal weighting of the different influencing parameters. For example, a too strong influence of the light intensity may increase the forecasting errors. Furthermore, the optimal weighting normally changes over time, depending on the season.

The multiple regression analysis as well as the Kalman filter approach are explained in detail in [5].
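The paper gives no filter equations, but the description (coefficients evolving through a dynamic state equation, correction by the last forecast error) is consistent with a standard Kalman update for random-walk regression coefficients. A generic sketch, with the dynamics model and all noise parameters assumed:

```python
import numpy as np

def kalman_step(a, P, h, y, Q, r):
    """One Kalman update where the state is the coefficient vector a.
    a: (k,) coefficient estimate; P: (k, k) its covariance;
    h: (k,) regressors for this step (temperature, light, dummies, ...);
    y: observed load; Q: process noise covariance; r: measurement noise."""
    P = P + Q                        # prediction: coefficients drift (random walk assumed)
    e = y - h @ a                    # forecast error
    s = h @ P @ h + r                # innovation variance
    g = P @ h / s                    # Kalman gain
    a = a + g * e                    # correct the coefficient estimate
    P = P - np.outer(g, h @ P)       # covariance update
    return a, P
```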
4 RESULTS

The comparison of the results is demonstrated using the half-hourly loads and the corresponding weather data of the electric utility of Vienna (WEW) between 22 October 91 and 20 October 92. The maximal load is approximately 1500 MW.

The following error definitions were used:

    e_i^rel = (load_i^desired - load_i) / load_i^desired     (relative error)

    MAPE = (1/K) * sum_i Abs( e_i^rel )

where MAPE means Mean Absolute Percent Error and K is the number of forecasted values.
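These definitions translate directly into code; a short sketch (the factor 100 assumes the tabulated values are percentages):

```python
import numpy as np

def relative_errors(load_desired, load_forecast):
    """Relative error e_rel for each half-hourly value."""
    return (load_desired - load_forecast) / load_desired

def mape(load_desired, load_forecast):
    """Mean Absolute Percent Error over the K forecasted values."""
    return 100.0 * np.mean(np.abs(relative_errors(load_desired, load_forecast)))
```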
We present results based on a forecasting period of two weeks, from Monday 21 September to Sunday 4 October 1992. For all methods, the forecast is made for a single day at midnight.

This test period has been chosen in order to evaluate the methods in two different ways: first, the forecast performance after a long sunny period, and second, the adaptation ability after the change from summer time (ST) to winter time (WT) on 26/27 September.

This change introduces a change in the load shape. Therefore, within this testing period, the first six days test the forecasting ability and the last eight days the ability to adapt to a new situation. This is done for all methods.

For the two ANN methods, the training was performed with recorded data from 22 October 91 until one day before the testing period. The forecasting of the two weeks is then performed with the previously trained ANNs. For a longer testing period, a retraining of the neural networks would have been necessary.

For the two classical methods, an adaptation of the method coefficients was performed with the five weeks up to one day before the forecasting day. The two ANN methods are therefore slightly handicapped compared to the classical methods: due to the high calculation time, only one training is done for the whole test period.

Table 1 below shows the MAPE calculated over a single day for all four methods over the whole testing period of two weeks.

Date        MLP    Koh    MRegr  Kalman  ST/WT
21.9.92     1.94   2.04   1.4    2.50    ST
22.9.92     1.32   1.58   1.0    0.79    ST
23.9.92     1.86   0.80   1.0    1.16    ST
24.9.92     1.60   0.73   0.8    0.85    ST
25.9.92     1.58   0.77   1.4    1.11    ST
26.9.92     1.42   1.78   1.8    1.58    ST
27.9.92     4.94   3.98   4.9    3.96    WT
28.9.92     1.81   2.02   1.5    2.30    WT
29.9.92     2.62   0.85   2.0    2.04    WT
30.9.92     3.43   1.07   1.5    1.07    WT
1.10.92     2.71   0.96   2.2    1.49    WT
2.10.92     2.81   1.17   2.1    1.06    WT
3.10.92     3.47   1.78   3.9    2.72    WT
4.10.92     6.8    2.21   5.0    5.53    WT
MAPE ST     1.62   1.28   1.23   1.33    ST
MAPE WT     3.57   1.75   2.89   2.52    WT
MAPE total  2.74   1.55   2.18   2.01    all

Table 1: Daily MAPEs of the four methods

4.1 ST test period

This period is characterized by load shapes similar to those of the preceding five weeks. In this period, the two statistical methods and the Kohonen/Delta method performed well. The best performance was achieved by the multiple regression, which could take advantage of the unchanged five-week analysis period. The MLP method performed the worst.

4.2 WT test period

From Saturday 26 September to Sunday 27 September the time changes from ST to WT, so the four methods have to cope with a new situation. The Sunday was forecasted badly by all the methods, but the four methods then reacted differently to the ST/WT change. While the Kohonen/Delta method reacted fast, the multiple regression and especially the MLP method reacted slowly or not at all. The MLP method should know this situation, because the WT load shapes of the past year are included in its data set.

The Kalman filter adapted rapidly to the new situation. After three days it performed better than the multiple regression method.

The two classical methods modeled the last day very badly. This day was the first cloudy day after a clear period, so these methods could not correctly model the influence of the light intensities. The Kohonen/Delta rule method predicted this day best, due to its longer learning period. The MLP method does not react to the light change at all.

Due to the trend correction, the Kohonen/Delta rule method performed by far the best during the whole WT period. It would be interesting to include a trend correction algorithm in the other methods as well.

Fig. 7 - Fig. 10 show the effective and forecasted half-hourly loads in MW of the four methods on 30 September 92. They show, for one example, how fast the four different methods could adapt to the new situation.
Fig. 7 Effective and forecasted loads of the MLP method.
Fig. 8 Effective and forecasted loads of the Kohonen/Delta rule method.
Fig. 9 Effective and forecasted loads of the multiple regression method.
Fig. 10 Effective and forecasted loads of the Kalman filter method.

The change from ST to WT induces a strong change in the load shape around sunset. For 30 September, the Kalman filter and the Kohonen/Delta method have the same daily MAPE. While the Kalman filter performs better after midnight, when the loads are small, the Kohonen/Delta method provides a better forecast around sunset. The maximum MAPE for this day is 4.34% for the Kalman filter method and 3.17% for the Kohonen/Delta method.

4.3 Two weeks test period

Over the whole test period, the Kohonen/Delta rule method was found to be the most appropriate load forecasting technique, especially when the load shapes were changing. This method has also been tested with data from a different European electric utility and two testing periods of one month, one in summer and one in winter. The two-month MAPE was 1.6% [12].
5 CONCLUSIONS 1992.
7) Rumelhart, D.E., Hinton, G.E., Williams, R.J., "Le-
In this paper four different load forecasting techniques aming Internal Representations by Error Propagati-
have been compared. Two ANN and two statistical ap-
on", In Parallel Distributed Processing, chap 8, pp.
proaches. The Kohonen/Delta rule approach gave the
best results during a two weeks testing period. This meth- 318-362, MIT Press, Cambridge, 1986.
od adapted very fast to a fairly new situation (changing 8) Lee, K.Y., Cha, Y.T.,Park, J.H., "Short-Term Load
from summer-time to winter-time). Another ANN meth- Forecasting using an Artificial Neural Network",
od, the Multilayer Perceptron with backpropagation IEEE Transactions on Power Systems, Vol 7, No. 1,
learning gave the worst results. The MLP method, as pro- pp. 124-130, February, 1992.
posed here, is probably not well adapted, because the data
set is very small (one year). It is known, that the MLP 9) Lee, K.Y., Cha, Y.T., Ku, C.C., "A Study on Neural
with backpropagation learning is very sensitive to learn- Networks for Short-Term Load Forecasting", IEEE,
ing and network parameters Oeaming rate, hidden layers, Proc. First International Forum on Applications of
...). Improvements should be possible by changing the pa- Neural Networks to Power Systems, Seatle, Was-
rameters and the structure of the ANN. Furthermore, it hington, pp. 26-30, July 23-26, 1991.
would be interesting to add a trend correction algorithm 10) Kohonen, T., "Self-Organization and Associative
to the MLP method. The two statistical methods per- Memory", 3rd edition, Springer Verlag, Berlin,
formed well during the first six days, but both reacted
1989.
slowly after the change from summer-time to winter-time
(especially the multiple regression method). 11) Kohonen, T., "Self-Organized Formation of Topo-
6 REFERENCES

1) Dillon, T.S., Morsztyn, K., Phua, K., "Short Term Load Forecasting Using Adaptive Pattern Recognition and Self-Organizing Techniques", Proc. of the Fifth Power Systems Computation Conference, Cambridge, Paper 2.4/3, pp. 1-16, 1975.

2) Brace, M.C., Schmidt, J., Haldin, M., "Comparison of the Forecasting Accuracy of Neural Networks with other Established Techniques", IEEE, Proc. First International Forum on Applications of Neural Networks to Power Systems, Seattle, Washington, pp. 31-35, July 23-26, 1991.

3) Chen, S.-T., Yu, D.C., Moghaddamjo, A.R., "Weather Sensitive Short-Term Load Forecasting Using Nonfully Connected Artificial Neural Network", IEEE Transactions on Power Systems, Vol. 7, No. 3, pp. 1098-1105, August 1992.

4) Srinivasan, D., Liew, A.C., Chen, John S.P., "Short Term Forecasting Using Neural Network Approach", IEEE, Proc. First International Forum on Applications of Neural Networks to Power Systems, Seattle, Washington, pp. 22-25, July 23-26, 1991.

5) Strasser, H., Friemelt, N., Schellstede, G., "Short term load forecast using multiple regression analysis or adaptive regression analysis", Proceedings of the PSCC, Graz, 1990.

6) Germond, A.J., Macabrey, N., Baumann, T., "Application of Artificial Neural Networks to Load Forecasting", INNS Workshop on Neural Network Computing for the Electric Power Industry, Stanford University, Stanford, California, August 17-19, 1992.

7) Rumelhart, D.E., Hinton, G.E., Williams, R.J., "Learning Internal Representations by Error Propagation", in Parallel Distributed Processing, chap. 8, pp. 318-362, MIT Press, Cambridge, 1986.

8) Lee, K.Y., Cha, Y.T., Park, J.H., "Short-Term Load Forecasting using an Artificial Neural Network", IEEE Transactions on Power Systems, Vol. 7, No. 1, pp. 124-130, February 1992.

9) Lee, K.Y., Cha, Y.T., Ku, C.C., "A Study on Neural Networks for Short-Term Load Forecasting", IEEE, Proc. First International Forum on Applications of Neural Networks to Power Systems, Seattle, Washington, pp. 26-30, July 23-26, 1991.

10) Kohonen, T., "Self-Organization and Associative Memory", 3rd edition, Springer Verlag, Berlin, 1989.

11) Kohonen, T., "Self-Organized Formation of Topologically Correct Feature Maps", Biological Cybernetics, vol. 43, pp. 59-69, 1982.

12) Baumann, T., Germond, A.J., "Application of the Kohonen Network to Short-Term Load Forecasting", submitted to the Second International Forum on Applications of Neural Networks to Power Systems, Yokohama, Japan, April 19-22, 1993.
