Thesis Paper STELF Md. Sabuj Hossain 66B EEE

WUB
Short-Term Electrical Load Forecasting using

Constructive Feed-Forward Neural Network
A thesis report submitted to the Department of Electrical and Electronic Engineering,

World University of Bangladesh, in partial fulfillment of the requirements for award of
the degree of Bachelor of Science in Electrical and Electronic Engineering.
Submitted by
Md. Sabuj Hossain

Reg. No.: WUB/09/18/66/4196
Rounok Jahan Mitul

Reg. No.: WUB/09/18/66/4205
Supervised by
Dr. Md. Riyad Tanshen

Sr. Asst. Professor & Head
Department of Electrical & Electronics Engineering
World University of Bangladesh
October, 2021.
LETTER OF TRANSMITTAL
Dated: October, 2021.
Dr. Md. Riyad Tanshen

Department of Electrical & Electronic Engineering
Plot: 5-8, Avenue 6 & Lake Drive Road
Sector: 17/H, Uttara,
Dhaka-1230.
Subject : Submission of Thesis Report.
Dear Sir,
We are pleased to submit herewith the thesis report on “Short-Term Electrical Load Forecasting
using Constructive Feed-Forward Neural Network”. It was a great pleasure for us to work on such
an important topic. This thesis report was prepared according to the requirements of the World University
of Bangladesh (WUB).
We believe that this report will help you to evaluate our thesis work.
We would be happy to provide any assistance in interpreting any part of the report whenever
necessary.
Thanking you,
Yours faithfully,
Md. Sabuj Hossain

Reg. No.: WUB/09/18/66/4196
Rounok Jahan Mitul

Reg. No.: WUB/09/18/66/4205
DECLARATION
We do hereby solemnly declare that the work presented in this project has been
carried out by us and has not been previously submitted to any University/Organization for
any academic degree. We hereby ensure that the work presented does not
breach any existing copyright.
We further undertake to indemnify the University against any loss or damage arising from
breach of the foregoing obligation.
Md. Sabuj Hossain

Reg. No.: WUB/09/18/66/4196
Rounok Jahan Mitul

Reg. No.: WUB/09/18/66/4205
CERTIFICATE
This is to certify that the thesis report on “Short-Term Electrical Load Forecasting using
Constructive Feed-Forward Neural Network” has been carried out by Md. Sabuj Hossain (Reg. No.
WUB/09/18/66/4196) and Rounok Jahan Mitul (Reg. No. WUB/09/18/66/4205) under my supervision in the
Department of Electrical and Electronic Engineering at World University of Bangladesh.
I wish them success in the future.
Faculty Guide
Supervised by
(Dr. Md. Riyad Tanshen)

Department of Electrical & Electronic Engineering
Plot: 5-8, Avenue 6 & Lake Drive Road
Sector: 17/H, Uttara,
Dhaka-1230.
ACKNOWLEDGEMENT
First of all, we are grateful to almighty Allah for giving us the opportunity to bring this
report to light successfully.
This project was accomplished under the supervision of Dr. Md. Riyad Tanshen, Sr. Asst.
Professor & Head, Department of Electrical and Electronic Engineering, World University of
Bangladesh. It is a great pleasure to acknowledge our profound gratitude and respect to our
supervisor for his consistent guidance, encouragement, helpful suggestions, constructive
criticism and endless patience throughout the progress of this work. The successful
completion of this project would not have been possible without his persistent motivation and
continuous guidance.
We are also grateful to the honorable Vice Chancellor, Professor Dr. Abdul Mannan Choudhury,
the Pro-Vice Chancellor, Professor Dr. M. Nurul Islam, Sr. Asst. Prof. Dr. Md. Riyad
Tanshen, Head of the Department of EEE, and all respected teachers of the Electrical and
Electronic Engineering department for their co-operation and significant help in completing
the project work successfully. We would like to thank our family members and others for
their support and encouragement during the preparation of this report and throughout our lives.
Thank you all.
Authors
Md. Sabuj Hossain

Reg. No.: WUB/09/18/66/4196
Rounok Jahan Mitul

Reg. No.: WUB/09/18/66/4205
EXECUTIVE SUMMARY
The economics of power system operation and control are sensitive to system demand, so
large savings can be obtained by increasing the accuracy of the demand forecast. A
large forecast error results in either overly conservative or overly risky operation.
Overestimation leads to the start-up of too many units or to excessive energy
purchases; it therefore provides an unnecessary level of reserve, which
raises the operating cost. Underestimation, on the other hand, leads to insufficient
spinning reserve, leaves the system vulnerable to
disturbances, and ultimately increases the operating cost because expensive
peaking units must be used. Thus, improving load forecasting accuracy yields cost savings and
increases system security. Since, in power systems, each next day's power
generation must be scheduled for dispatch, day-ahead
short-term electrical load forecasting (STELF) is a necessary task.
A number of approaches in the literature address the short-term
electrical load forecasting (ELF) problem, i.e. the STELF problem, using neural networks (NNs).
NNs have been reported to outperform human-based
computational analysis in STELF in terms of accuracy and ease of maintenance for users, because an NN has a
good capability for mapping between inputs and outputs even though the load (i.e., the output)
increases day by day. The feed-forward NN (FFNN) has been used to solve the ELF problem for
different regions at a reasonable computational cost. FFNNs are well
suited to mapping static relationships between inputs and outputs and ultimately provide
good results in ELF. However, FFNNs need a large amount of historical data and have a limited
capability to predict holiday loads and fast load changes. To overcome these shortcomings,
a number of alternatives have been tried recently, including the echo state NN, radial
basis function NN, recurrent NN, and nonlinear autoregressive NN. The
performance of the aforementioned NN models is satisfactory in
ABBREVIATIONS
AI Artificial Intelligence
ANN Artificial Neural Network
ARIMA Autoregressive Integrated Moving Average
ARMA Autoregressive–Moving-Average
BPN Back Propagation Network
CFFNN Constructive Feed Forward Neural Network
CAELF Constructive Approaches for Electrical Load Forecasting
CDD Cooling Degree Days
ELF Electrical Load Forecasting
DSM Demand Side Management
EPRI Electric Power Research Institute
FL Fuzzy Logic
HN Hidden Neuron
HDD Heating Degree Days
MAPE Mean Absolute Percentage Error
MSE Mean Squared Error
NARx Nonlinear Autoregressive Neural Network with Exogenous Inputs
PG&E Pacific Gas And Electric Company
RBFN Radial Basis Function Neural Network
SELF Standard Electrical Load Forecasting
SVM Support Vector Machine
STELF Short Term Electrical Load Forecasting
SCADA Supervisory Control And Data Acquisition
THI Temperature-Humidity Index
WCI Wind Chill Index
TABLE OF CONTENTS
LETTER OF TRANSMITTAL
DECLARATION
CERTIFICATE
ACKNOWLEDGEMENT
EXECUTIVE SUMMARY
ABBREVIATIONS
1 Introduction
1.1 Business Needs of Load Forecasts
1.2 Characteristics of the Power System Load
1.2.1 Weather
1.2.2 Time
1.2.3 Economy
1.2.4 Random Disturbance
1.3 Classification of Developed ELF Methods
1.4 Short-Term Electrical Load Forecast
1.5 Application of Short-Term Electrical Load Forecast
1.6 Specific Aims of the Thesis
2 Methodology
2.1 Overview
2.2 Statistical Approaches
2.2.1 Regression Analysis
2.2.2 Time Series Analysis
2.3 Neural Network based approaches
3 Artificial Neural Network
3.1 Introduction
3.2 Fundamentals of Neural network
3.2.1 Processing Unit
3.2.2 Activation Function
3.2.3 Network Topologies
3.2.4 Network Learning
3.2.5 Objective Function
3.3 Feed Forward Neural Network
3.3.1 Basic structure
3.3.2 Representation Capability
3.3.3 Network structure design
3.3.3.1 Determination of Hidden Layers
3.3.3.2 Determination of Optimal Number of Hidden Units
3.3.3.3 Training Algorithm of Neural Network
3.3.3.4 Update of Output layer weights
3.3.3.5 Update of Hidden layer weights
4 Proposed Model for Electrical Load Forecasting
4.1 Electrical Load forecasting
4.2 Constructive Approaches for Electrical Load Forecasting
4.2.1 Performance criterion of NN training
4.2.2 Termination criterion of NN training
4.2.3 Hidden Neuron Addition
4.3 Experimental Studies
4.3.1 Description of data
4.3.2 Experimental setup
4.3.3 Experimental results
4.4 Results of CAELF for prototype data
5 Analysis and Comparisons
5.1 Computational Complexity
5.2 T-Test
5.3 Mackey Glass Time Series
5.3.1 Forecasting Results
5.4 Lorenz Time Series
5.5 Rossler Time Series
5.6 Comparison with other works
6 Conclusion and Future works
6.1 Conclusion
6.2 Future works
References
List of Figures
Figure 2.1: A typical STELF process
Figure 3.1: Processing unit
Figure 3.2: Identity function
Figure 3.3: Binary step function
Figure 3.4: Sigmoid function
Figure 3.5: Bipolar sigmoid function
Figure 3.6: Recurrent Neural Network
Figure 3.7: Supervised Learning model
Figure 3.8: Feed forward Neural network
Figure 4.1: Flowchart of CAELF
Figure 4.2: Model of feed forward NN for forecasting the electrical load
Figure 4.3: Comparison between actual load and predicted load for 120 days obtained from CAELF and their corresponding error in percentage
Figure 4.4: Comparison between actual load and predicted load for 120 days obtained from SELF and their corresponding error in percentage
Figure 4.5: … CAELF
Figure 4.6: … CAELF
Figure 4.7: Comparison between actual load and predicted load for Holidays of January 99 obtained from CAELF
Figure 5.1: Sample data of the Mackey-Glass time series
Figure 5.2: Comparison between the actual data and the predicted data for Mackey-Glass data
Figure 5.3: Sample data of the Lorenz time series
Figure 5.4: Comparison between the actual data and the predicted data for Lorenz data
Figure 5.5: Sample data of the Rossler time series
Figure 5.6: Comparison between the actual data and the predicted data for Rossler data
List of Tables
Table 1.1: Needs of forecasts in utilities
Table 4.1: A sample of data showing the log(load), HDD, CDD, and Dummy Variables
Table 4.2: Comparison between actual load and predicted load for Holidays of January 99
Table 4.3: User-designed electrical load forecasting prototype data for samples 1 to 5
Table 4.4: Result for Sample prototype data
Table 5.1: The value of MAPE of CAELF on Mackey-Glass data
Table 5.2: The value of MAPE of CAELF on Lorenz data
Table 5.3: The value of MAPE of CAELF on Rossler data
Table 5.4: Comparisons with other models for the ELF problem in terms of the next 120 days
Chapter 1
Introduction
With the growth of power system networks and the increase in their complexity, many
factors have become influential in electric power generation and in demand or load management.
Load forecasting is one of the critical factors for the economic operation of power systems.
Forecasting of future loads also plays a significant role in network planning, infrastructure
development and so on. Power system load forecasting is a two-dimensional
concept, comprising consumer-based forecasting and utility-based forecasting, and each forecast can be
handled coherently. Consumer-based forecasts provide guidelines to
optimize network planning and investments, manage risk better and reduce operational
costs. Both utility companies and consumers are challenged to predict their respective loads
accurately. This challenge has existed for decades, so a variety
of load forecasting techniques, ranging from classical to intelligent systems, have been
developed to date and highlighted in a number of studies. The ultimate distinction between these
methods can be drawn on the basis of forecast accuracy.
In the design stages, utilities need to plan ahead for anticipated future load growth under
different scenarios. Their decisions and designs can affect the gain or loss of huge revenues
for their companies/utilities as well as customer satisfaction and future economic growth in
their area.
Decisions on the sale, purchase and banking of power, electric power generation, load
switching, and infrastructure development can be made with the help of load
forecasting. In the market environment, precise forecasting is the basis of electrical
energy trading and spot price establishment, allowing the system to achieve the minimum electricity
purchasing cost.
Because of its high complexity, research in the area of load forecasting remains a
challenge for electrical engineering scholars. Load forecasting from historical data,
especially for holidays and days with extreme weather, has remained difficult up to
now. To improve the forecasting results, new mathematical, data mining and artificial
intelligence tools are being applied.
1.1 Business Needs of Load Forecasts
In today's world, load forecasting is an important process in most utilities, with
applications spread across several departments, such as the planning, operations
and trading departments. The business needs of the utilities include, but are not
limited to, the following:
1) Energy Purchasing: Whether a utility purchases its own energy supplies from the
marketplace or outsources this function to other parties, load forecasts are essential
for purchasing energy. The utilities can perform bilateral purchases and asset
commitment in the long term, e.g., 10 years ahead. They can also do hedging and
block purchases one month to 3 years ahead, and adjust (buy or sell) the energy
purchase in the day-ahead market.
2) Transmission and Distribution (T&D) Planning: The utilities need to properly maintain
and upgrade the
system to satisfy the growth of demand in the service territory and improve
reliability. Sometimes the utilities also need to hedge the real estate on which to place
substations in the future. The planning decisions rely heavily on forecasts, known
as spatial load forecasts, that indicate when, where, and by how much the load as well as
the number of customers will grow.
3) Operations and Maintenance: In daily operations, load patterns obtained during the
load forecasting process guide the system operators to make switching and loading
decisions, and schedule maintenance outages.
4) Demand Side Management (DSM): Although many DSM activities belong to
daily operations, it is worthwhile to separate DSM from the operations category due to
its importance in this smart-grid world. A load forecast can support decisions on
load control and voltage reduction. On the other hand, through the studies performed during
load forecasting, utilities can perform long-term planning according to the
characteristics of the end-use behavior of certain customers.
5) Financial Planning: The load forecasts can also help the executives of the utilities
project medium and long term revenues, make decisions during acquisitions, approve
or disapprove project budgets, plan human resources and technologies, etc.
According to the lead time range of each business need described above, the minimum
updating cycle and maximum horizon of the forecasts are summarized in Table 1.1.
TABLE 1.1. Needs of forecasts in utilities.
Business need        Minimum updating cycle   Maximum horizon
Energy purchasing    1 hour                   10 years and above
T&D planning         1 day                    30 years
Operations           15 minutes               2 weeks
DSM                  15 minutes               10 years and above
Financial planning   1 month                  10 years and above
1.2 Characteristics of the Power System Load
The system load is the sum of all the consumers' loads at a given time. A good
understanding of the system characteristics helps to design reasonable forecasting
models and to select appropriate models for different situations. The various
factors that influence the system load behaviour can be classified into the following
major categories:
● Weather
● Time
● Economy
● Random disturbance
The effects of all these factors are introduced in the remaining part of this section to provide a
basic understanding of the load characteristics.
1.2.1 Weather
Weather factors include temperature, humidity, rainfall, wind speed, cloud cover, light
intensity, etc. A change in the weather changes consumers' sense of comfort and,
in turn, the usage of appliances such as space heaters, water heaters and air
conditioners. Weather-sensitive load also includes agricultural irrigation equipment, owing to
the irrigation needs of cultivated plants. In areas where summer and winter differ
greatly in meteorological conditions, the load patterns differ greatly as well.
Normally the intraday temperatures are the most important weather variables in terms of their
effect on the load; hence they are often selected as the independent variables in Electrical
Load Forecasting (ELF). Temperatures of the previous days also affect the load profile. For
example, consecutive high-temperature days might lead to heat build-up and, in turn, a new
demand peak. Humidity is also an important factor, because it greatly affects human
comfort. People feel hotter in an environment of 35 °C with 70% relative
humidity than in an environment of 37 °C with 50% relative humidity. That is why the
temperature-humidity index (THI) is sometimes employed as an input factor in load
forecasting. Furthermore, the wind chill index (WCI) is another factor, which measures the
sensation of cold. Selecting the appropriate weather variables as the inputs of
ELF is therefore a meaningful topic.
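The humidity comparison in this subsection can be checked numerically. The sketch below is illustrative only: the thesis does not state which THI definition it uses, so the formula THI = T - 0.55 (1 - RH) (T - 14.5), with T in °C and RH as a fraction, is assumed here as one commonly quoted form, and the function name is hypothetical.

```python
def temperature_humidity_index(temp_c: float, rel_humidity_pct: float) -> float:
    """Return a temperature-humidity index in degrees Celsius.

    Assumed formulation (not taken from the thesis):
        THI = T - 0.55 * (1 - RH) * (T - 14.5)
    where T is the air temperature in Celsius and RH is the relative
    humidity expressed as a fraction between 0 and 1.
    """
    if not 0.0 <= rel_humidity_pct <= 100.0:
        raise ValueError("relative humidity must be between 0 and 100 percent")
    rh = rel_humidity_pct / 100.0
    return temp_c - 0.55 * (1.0 - rh) * (temp_c - 14.5)


if __name__ == "__main__":
    # The two conditions compared in the text.
    humid = temperature_humidity_index(35.0, 70.0)
    dry = temperature_humidity_index(37.0, 50.0)
    print(f"35 C at 70% RH -> THI {humid:.2f}")
    print(f"37 C at 50% RH -> THI {dry:.2f}")
```

Under this assumed formula the cooler but more humid condition receives the higher index, which agrees with the statement that it feels hotter. A different THI definition would give different numbers, so the values should not be read as results of this thesis.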
1.2.2 Time
Time factors influencing the load include the time of day, holidays, the weekday/weekend
distinction and the season. The weekend or holiday load curve is lower than the weekday
curve, due to the decrease in working load. Shifts to and from daylight saving time and the start
of the school year also contribute to significant changes in the previous load profiles.
Periodicity is another property of the load curve. There is very strong daily, weekly, seasonal
and yearly periodicity in the load data. Making good use of this property can benefit the load
forecasting result.
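One way to make use of these time factors is to encode each calendar day as numeric model inputs. The following is a minimal sketch, not the encoding used in this thesis: the function name, the feature names, the default Saturday/Sunday weekend and the sine/cosine encoding of the yearly cycle are all assumptions made for illustration.

```python
import math
from datetime import date


def calendar_features(day, holidays=frozenset(), weekend_days=(5, 6)):
    """Encode one calendar day as numeric inputs for a load model.

    weekend_days uses date.weekday() numbering (Monday = 0). The default
    Saturday/Sunday weekend is an assumption and differs between countries.
    """
    weekday = day.weekday()
    # One dummy variable per day of the week captures the weekly cycle.
    features = {f"dow_{i}": 1.0 if i == weekday else 0.0 for i in range(7)}
    features["is_weekend"] = 1.0 if weekday in weekend_days else 0.0
    features["is_holiday"] = 1.0 if day in holidays else 0.0
    # A sine/cosine pair places the day on a circle, so that the end of
    # December and the start of January are close together (yearly cycle).
    angle = 2.0 * math.pi * (day.timetuple().tm_yday - 1) / 365.25
    features["year_sin"] = math.sin(angle)
    features["year_cos"] = math.cos(angle)
    return features


if __name__ == "__main__":
    # Hypothetical holiday calendar, for illustration only.
    example_holidays = {date(2021, 1, 1)}
    for d in (date(2021, 1, 1), date(2021, 10, 2), date(2021, 10, 4)):
        print(d, calendar_features(d, example_holidays))
```

The dummy variables let a model learn a separate load level for each day type, while the circular encoding avoids an artificial jump between the last and first day of the year.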
1.2.3 Economy
Electricity is a kind of commodity, and the economic situation influences the utilization of
this commodity. Economic factors, such as the degree of industrialization, the price of electricity
and load management policy, have significant impacts on the system load growth/decline
trend. With the development of modern electricity markets, the relationship between
electricity price and load profile is even stronger. Although time-of-use pricing and demand-side
management had arrived before deregulation, the volatility of spot markets and incentives
for consumers to adjust loads are potentially of a much greater magnitude. At low prices,
elasticity is still negligible, but at times of extreme conditions, price-induced rationing is a
much more likely scenario in a deregulated market compared to that under central planning.
1.2.4 Random Disturbance
The modern power system is composed of numerous electricity users. Although it is not
possible to predict how each individual user consumes energy, the total
load of all the small users follows good statistical regularities and, in turn, leads to smooth load
curves. This is the foundation of load forecasting. However, the start-up and shutdown of
large loads, such as steel mills, synchrotrons and wind tunnels, always lead to an obvious
impulse in the load curve. This is a random disturbance because, for the dispatchers, the start-up
and shutdown times of these users are quite random, i.e. there is no obvious rule for when
and how they draw power from the grid. When the data from such a load curve are used in load
forecasting training, the impulse component of the load adds to the difficulty of load
forecasting. Special events, which are known in advance but whose effect on load is not
quite certain, are another source of random disturbance. A typical special event is, for
example, a World Cup football match, which the dispatchers know for sure will cause
increased television usage, but whose magnitude they cannot determine precisely. Other typical
events include strikes and the government's compulsory demand-side management due to
a forecasted electricity shortage.
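The smoothing effect of aggregating many small users can be illustrated with a small simulation. This is a sketch under strong assumptions that are not taken from the thesis: each user's demand is drawn independently from a uniform distribution on 0 to 2 kW at every time step, and the function name is hypothetical. Real users are correlated through weather and time of day, so the effect in practice is weaker than shown here.

```python
import random
import statistics


def relative_variability(num_users, num_steps=200, seed=0):
    """Coefficient of variation of the total load of num_users random users.

    Each user's demand at each time step is drawn independently from a
    uniform distribution on [0, 2] kW; this is purely an illustrative model.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    totals = [
        sum(rng.uniform(0.0, 2.0) for _ in range(num_users))
        for _ in range(num_steps)
    ]
    # Standard deviation of the total relative to its mean level.
    return statistics.pstdev(totals) / statistics.mean(totals)


if __name__ == "__main__":
    for n in (10, 100, 1000):
        cv = relative_variability(n)
        print(f"{n:5d} users: relative variability {cv:.4f}")
```

For independent users the relative variability of the total falls roughly with the square root of the number of users, which is why the aggregate curve is smooth even though individual behaviour is unpredictable. A single large load switching on or off does not average out in this way and appears as an impulse.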
1.3 Classification of Developed ELF Methods
In terms of lead time, load forecasting is divided into four categories:
● Long-term forecasting, with a lead time of more than one year
● Mid-term forecasting, with a lead time of six months to one year
● Short-term load forecasting, with a lead time of one week to six months
● Very short-term load forecasting, with a lead time shorter than one day
Different categories of forecasting serve different purposes. This thesis focuses on short-term
load forecasting, which serves the next day(s) unit commitment and reliability analysis.
1.4 Short-Term Electrical Load Forecast
The term “short” implies prediction times of the order of days. The basic quantity of
interest in short-term load forecasting is, typically, the daily integrated total system load. In
addition to the prediction of the hourly values of the system load, short-term electrical
load forecasting (STELF) is also concerned with forecasting
● the daily peak system load
● the values of system load at certain times of the day
● the daily, weekly and monthly system energy
This dissertation develops a novel constructive feed-forward neural network model for
the short-term electrical load forecasting process. The term “short-term” here refers to
forecasts covering a period of one week to six months.
1.5 Application of Short-Term Electrical Load Forecast
STELF plays a key role in the formulation of economic, reliable, and secure operating
strategies for the power system [5]. The principal objective of the STELF function is to
provide the load predictions for
● the basic generation scheduling function
● assessing the security of the power system at any time
● timely dispatcher information
The primary application of the STELF function is to drive the scheduling functions that
determine the most economic commitment of generation sources consistent with
reliability requirements, operational constraints and policies, and physical, environmental,
and equipment limitations. For purely hydro systems, the load forecasts are required for the
hydro scheduling function to determine the optimal releases from the reservoirs and
generation levels in the power houses. For purely thermal systems, the load forecasts are
needed by the unit commitment function to determine the minimal cost hourly strategies for
the start-up and shutdown of units to supply the forecast load. For mixed hydro and thermal
systems, the load forecasts are required by the hydro-thermal coordination function to
schedule the hourly operation of the various resources so as to minimize production costs.
The hydro scheduling, unit commitment and hydro-thermal coordination functions require system
load forecasts for the next day or the next week to determine the least-cost operating plans
subject to the various constraints imposed on system operation.
A second application of STELF is for predictive assessment of the power system security.
The system load forecast is an essential data requirement of the off-line network analysis
function for the detection of future conditions under which the power system may be
vulnerable. This information permits the dispatchers to prepare the necessary corrective
actions (e.g. bringing peaking units on line, load shedding, power purchases, switching
operations) to operate the power systems securely.
The third application of STELF is to provide system dispatchers with timely information, i.e.
the most recent load forecast, with the latest weather prediction and random behaviour
taken into account. The dispatchers need this information to operate the system
economically and reliably.
1.6 Specific Aims of the Thesis
The objectives of this thesis are as follows:

1) To develop a new straightforward model for electrical load forecasting (CAELF) using
constructive feed-forward NN that might reduce the current difficulties of NN for the
STELF problem.
2) To predict loads of holidays and fast load changes effectively using CAELF.
3) To compare the performance of CAELF with other conventional NN-based ELF models.
Chapter 2
Methodology
Thousands of papers and reports have been published in the load forecasting field in the past
50 years. The literature review presented in this chapter concentrates on the Short-Term
Electrical Load Forecasting (STELF) literature published in reputed journals. The papers
are reviewed from three aspects: the techniques developed or applied, the
variables deployed, and the representative work done by several major research groups. The
review focuses on the major developments in the field rather than covering every
aspect of the matter, and the comments on the papers are made at the
conceptual level. This chapter is organized as follows: Section 2.1 gives an overview, Section
2.2 reviews the statistical techniques, including regression analysis and time series analysis,
applied to STELF, and Section 2.3 presents the neural network techniques.
2.1 Overview
Fig. 2.1 shows a typical STELF process conducted in the utilities that rely on the weather
information. Weather and load history are taken as the inputs to the modeling process. After
the parameters are estimated, the model and weather forecast are extrapolated to generate
the final forecast. A time series, including the load series, can be decomposed to systematic
variation and noise. The modeling process in Fig. 2.1 tends to capture the systematic
variation, which, as an input to the extrapolating process, is crucial to the forecast accuracy.
As a consequence, a large body of pioneering research and practice in the field of STELF
has been devoted to the modeling process. Most of the model development work can be
summarized from two aspects: techniques and variables.
On the one hand, researchers have been adopting or developing various techniques for STELF,
treating it as a time series forecasting problem. Many of these techniques can be roughly
categorized into two groups: statistical approaches, such as regression analysis [6] and time
series analysis [7], and Artificial Intelligence (AI) based approaches, such as ANN [8]. Various
combinations of these techniques have also been studied and applied to STELF problems.
Fig. 2.1 A typical STELF process, adapted from [9]
On the other hand, researchers have been seeking the most suitable variables for each particular
problem and trying to generalize the conclusions to interpret the causality of electric
load consumption. Most of these efforts were embedded in the
development of the techniques. For instance, temperature and relative humidity were
considered in [10], while the effects of humidity and wind speed were considered through a
linear transformation of temperature in the improved version [11]. In general, the electric
load is mainly driven by nature and human activities. The effects of nature are normally
reflected by weather variables, e.g., temperature, while the effects of human activities are
normally reflected by calendar variables, e.g., business hours.
The combined effects of both elements exist as well but are nontrivial. With the progress of
deregulation, more and more parties have joined the energy markets, where electricity is
traded as a commodity. In some situations, consumers tend to shift electricity
consumption from the expensive hours to other times when possible. Price information
would affect the load profiles in such a price-sensitive environment. Following this
thought, a price-sensitive load forecaster was proposed, whose results are reported to
be superior to those of an existing STELF program [12]. Since the price-sensitive environment is not
generic in the current Bangladesh utility industry, price information is not included in
Fig. 2.1 or in the scope of this report.
Although the majority of the literature in STELF is on the modelling process, there is some
research concerning other aspects to improve the forecast. Weather forecast, as an input to
the extrapolating process, is also very important to the accuracy of STELF. Consequently,
another branch of research work focuses on developing, improving or incorporating the
weather forecast [13]. A temperature forecaster was proposed for STELF in [14].
2.2 Statistical Approaches
2.2.1 Regression Analysis
A regression-based approach to STELF was proposed by Papalexopoulos and Hesterberg [6].
The proposed approach was reported to be tested using Pacific Gas and Electric Company's
(PG&E) data for the peak and hourly load forecasts of the next 24 hours. This is one of the
few papers fully focused on regression analysis for STELF in the past 20 years [15]-[17].
Several modelling concepts for using multiple linear regression in STELF were applied:
the weighted least squares technique, temperature modelling using heating and cooling
degree functions, holiday modelling using binary variables, and a robust parameter
estimation method. After thorough testing, the new model was concluded to
be superior to the existing one used at PG&E.
While the paper clearly introduced the proposed approach, two issues appeared in the
details. Firstly, the method of weighted least squares was applied to “minimize the effect of
outliers”. However, there is neither an analysis of the data to show the existence of outliers
nor supporting data to show the cause of outliers. Furthermore, the new model using
weighted least squares was only compared to the existing model, but not to the
same model without the weighted least squares technique.
Therefore, the advantage of the proposed weighted least squares method was not
convincing. Secondly, adding noise to the temperature history to obtain a more robust
forecast was not well justified. This issue was also pointed out by Larson in the
appended discussion of the paper, and the authors' response did not provide strong
evidence to support their statement. Nevertheless, Papalexopoulos and Hesterberg's paper offers a
comprehensive groundwork for applying regression analysis to STELF.
A nonparametric regression based approach was applied to STELF in [18]. The load model
was constructed to reflect the probability density function of the load and the factors that
affected the load. The corresponding load forecast was the conditional expectation of the
load given the explanatory variables, including time, weather conditions, etc. The proposed
method did not require a weather forecast to produce the load forecast, which was different
from the regression based approaches discussed above. A three-week period of load and weather
history was used to generate a one-week load forecast. The results were shown to be
competitive compared with those of an ANN based approach. Since the proposed approach
was only tested using one set of data in the summer, it was not quite convincing as to
whether the method would work well throughout the year. Further thorough tests are
necessary to show the credibility of this approach.
2.2.2 Autoregressive Integrated Moving Average
Regression techniques were combined with ARIMA models for STELF in [19]. Regression
techniques were used to model and forecast the peak and low load, as well as to weather
normalize the load history, or “remove the weather-sensitive trend” from the load series.
Then ARIMA was applied to the weather-normalized load to produce the forecast. Finally,
the forecasted normalized load was adjusted based on the forecasted peak and low load.
ARIMA models, together with other Box and Jenkins time series models, were applied to
STELF and shown to be “well suited to this application” in Hagan and Behr's paper [7]. A
nonlinear transformation, more precisely a 3rd-order polynomial of the temperature, was
proposed to reflect the nonlinear relationship between the load and temperature. Three time
series methods (ARIMA models, standard transfer function models, and transfer function
models with nonlinear transformation) were compared with a conventional procedure
deployed in the utilities, which relied on input from the dispatchers, for three 20-day
periods (winter, spring and summer) in 1984. The results showed that all three types of
time series models performed better than the conventional forecasting approach. Among the
time series models, the nonlinear extension of the transfer function model provided the best
results.
The modified ARIMA method produces the STELF with better accuracy than the other three
approaches. Some other time series modelling approaches were also applied to STELF.
Threshold autoregressive models with the stratification rule were discussed in [20]. A
modified ARMA approach was proposed to include non-Gaussian process
considerations [21]. An adaptive ARMA approach was tested and compared with the
conventional Box-Jenkins approach and showed better accuracy [22]. A method using
periodic autoregressive models was reported in [23]. An ARMAX model, with particle swarm
optimization as the technique to identify the parameters, was proposed for STELF [24].
Other than Box and Jenkins models, a nonlinear system identification technique was
applied to STELF as well [25]. All these techniques and the associated engineering solutions
provided good insights into certain aspects of the field of STELF.
2.3 Neural Network based Approaches
The history of applying NN to STELF can be traced back to the early 1990s [27], when
ANN was proposed as an algorithm to combine both time series and regression approaches.
In addition, the NN was expected to perform nonlinear modeling of the relationship
between the load and weather variables and to be adaptable to new data. The algorithm was
tested using Puget Sound Power and Light Company's data, which included hourly
temperature and load for the Seattle/Tacoma area from Nov. 1, 1988 to Jan. 30, 1989. Three test
cases were constructed for the peak, total, and hourly load of the day, respectively. Normal
weekdays were the focus of the test cases. The proposed algorithm was compared with an
existing algorithm deployed in the utility.
A number of approaches in the literature (e.g., [27]-[37]) try to solve the
STELF problem using neural networks (NNs). NN-based STELF has been reported to
outperform human-based computational analysis in terms of
accuracy and ease of maintenance for users, because an NN has a good capability for mapping
between input and output even though the load (i.e., the output) increases day by day [38].
Feed-forward NN (FFNN) has been used in [33]-[37] to solve the ELF problem for different
regions at a reasonable computational cost. FFNNs are well suited to
mapping static relationships between inputs and outputs, and ultimately provide good
results in ELF. However, FFNNs need a large amount of historical data and have a limited capability to
predict loads of holidays and fast load changes [39]. To overcome these shortcomings of
FFNN, a number of efforts have recently been made in [28], [30]-[32], in which echo
state NN, radial basis function NN, recurrent NN, and nonlinear autoregressive NN are used,
respectively. The performance of the aforementioned NN models is
satisfactory in predicting the electrical load compared with the FFNN, but they are computationally
expensive. As a result, they demand substantial hardware resources and
experts for maintenance.
Chapter 3
Artificial Neural Network
3.1 Introduction
Neural networks, more accurately called Artificial Neural Networks (ANN), are
computational models that consist of a number of simple processing units that communicate
by sending signals to each other over a large number of weighted connections. They were
originally inspired by the human brain. In the human brain, a biological
neuron collects signals from other neurons through a host of fine structures called dendrites.
The neuron sends out spikes of electrical activity through a long, thin strand known as an
axon, which splits into thousands of branches. At the end of each branch, a structure called a
synapse converts the activity from the axon into electrical effects that inhibit or excite
activity in the connected neurons. When a neuron receives excitatory input that is
sufficiently large compared with its inhibitory input, it sends a spike of electrical activity
down its axon.
Learning occurs by changing the effectiveness of the synapses so that the influence of one
neuron on another changes. Like human brains, neural networks consist of processing
units (artificial neurons) and connections (weights) between them. The processing units
pass incoming information along their outgoing connections to other units. The "electrical"
information is simulated with specific values stored in the weights, which give these
networks the capacity to learn, memorize, and create relationships amongst data. A
very important feature of these networks is their adaptive nature, where "learning by
example" replaces "programming" in solving problems. This feature renders these
computational models very appealing in application domains where one has little or
incomplete understanding of the problems to be solved, but where training data are available.
There are many different types of neural networks, and they are being used in many fields.
New uses for neural networks are regularly devised by researchers. Some of the most
traditional applications include [41], [42]:
1) Classification: to determine military operations from satellite photographs; to distinguish
among different types of radar returns (weather, birds, or aircraft); to identify diseases of
the heart from electrocardiograms.
2) Noise reduction: to recognize a number of patterns (voice, images, etc.) corrupted by noise.
3) Prediction: to predict the value of a variable given historic values. Examples include
forecasting of various types of loads, market and stock forecasting, and weather forecasting.
The model built in this thesis falls into this category.
3.2 Fundamentals of Neural Networks
Neural networks, sometimes referred to as connectionist models, are parallel-distributed
models that have several distinguishing features [43]:
1) A set of processing units;
2) An activation state for each unit, which is equivalent to the output of the unit;
3) Connections between the units. Generally each connection is defined by a weight wjk
that determines the effect that the signal of unit j has on unit k;
4) A propagation rule, which determines the effective input of the unit from its external
inputs;
5) An activation function, which determines the new level of activation based on the
effective input and the current activation;
6) An external input (bias, offset) for each unit;
7) A method for information gathering (learning rule);
8) An environment within which the system can operate, provide input signals and, if
necessary, error signals.
3.2.1 Processing Unit
A processing unit (Fig. 3.1), also called a neuron or node, performs a relatively simple job; it
receives inputs from neighbors or external sources and uses them to compute an output
signal that is propagated to other units.
$$a_j = \sum_{i=1}^{n} w_{ji} x_i + \theta_j, \qquad y_j = f(a_j)$$
Fig. 3.1 Processing unit
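The behaviour of a single processing unit can be sketched in a few lines of Python. This is only an illustration of the weighted-sum-plus-bias computation followed by an activation function; the function name `unit_output` and the numerical values are assumptions made for the example and do not come from the thesis.

```python
import math


def sigmoid(a):
    """Logistic squashing function, used here only as an example activation."""
    return 1.0 / (1.0 + math.exp(-a))


def unit_output(inputs, weights, bias, activation):
    """Return y_j = f(a_j), where a_j is the weighted sum of the inputs plus the bias."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    a_j = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(a_j)


# Illustrative (made-up) inputs, weights and bias for one unit.
y = unit_output([0.5, -1.0, 2.0], [0.4, 0.3, 0.1], bias=0.1, activation=sigmoid)
print(y)
```

Passing the activation as an argument keeps the unit generic, so the same helper could serve input, hidden and output units with different activation functions.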
Within the neural systems there are three types of units:
1) Input units, which receive data from outside of the network;

2) Output units, which send data out of the network;
3) Hidden units, whose input and output signals remain within the network.
Each unit j can have one or more inputs x0, x1, x2, … xn, but only one output yj. An input to a
unit is either the data from outside of the network, or the output of another unit, or its own
output.
3.2.2 Activation Function
Most units in a neural network transform their net inputs by using a scalar-to-scalar function
called an activation function, yielding a value called the unit's activation. Except possibly for
output units, the activation value is fed to one or more other units. Activation functions with
a bounded range are often called squashing functions. Some of the most commonly used
activation functions are [44]:
1) Identity function (Fig. 3.2):
$$f(x) = x \qquad (3.1)$$
The input units use the identity function. Sometimes the net input is multiplied by a constant
to form a linear function.
Fig. 3.2 Identity function
2) Binary step function (Fig. 3.3):
Also known as the threshold function or Heaviside function. The output of this function is
limited to one of two values:
$$f(x) = \begin{cases} 1 & \text{if } x \ge \theta \\ 0 & \text{if } x < \theta \end{cases} \qquad (3.2)$$
This kind of function is often used in single-layer networks.
Fig. 3.3 Binary step function
3) Sigmoid function (Fig. 3.4):
$$f(x) = \frac{1}{1 + e^{-x}} \qquad (3.3)$$
This function is especially advantageous for use in neural networks trained by back-propagation,
because it is easy to differentiate and can thus dramatically reduce the
computational burden of training. It suits applications whose desired output values lie
between 0 and 1.
Fig. 3.4 Sigmoid function
4) Bipolar sigmoid function (Fig. 3.5):
$$f(x) = \frac{1 - e^{-x}}{1 + e^{-x}} \qquad (3.4)$$
This function has properties similar to those of the sigmoid function. It works well for applications
that yield output values in the range [-1, 1].
Fig. 3.5 Bipolar sigmoid function
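The four activation functions listed above can be written directly in Python. The sketch below is illustrative rather than a reference implementation: the binary step assumes that the output is 1 when x is greater than or equal to the threshold, and `sigmoid_derivative` is an added helper showing why the sigmoid is "easy to differentiate" (its derivative is f(x)(1 - f(x))). The code is not hardened against overflow for very large negative inputs.

```python
import math


def identity(x):
    """Eq. 3.1: f(x) = x."""
    return x


def binary_step(x, theta=0.0):
    """Eq. 3.2: 1 at or above the threshold theta, otherwise 0 (assumed convention)."""
    return 1.0 if x >= theta else 0.0


def sigmoid(x):
    """Eq. 3.3: output in the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def bipolar_sigmoid(x):
    """Eq. 3.4: output in the open interval (-1, 1)."""
    return (1.0 - math.exp(-x)) / (1.0 + math.exp(-x))


def sigmoid_derivative(x):
    """Derivative of the sigmoid, expressed through the sigmoid itself."""
    s = sigmoid(x)
    return s * (1.0 - s)


for x in (-2.0, 0.0, 2.0):
    print(x, identity(x), binary_step(x), sigmoid(x), bipolar_sigmoid(x))
```

The bipolar sigmoid is a rescaled sigmoid: it equals 2 * sigmoid(x) - 1, which is also tanh(x / 2).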
Activation functions for the hidden units are needed to introduce non-linearity into the
networks. The reason is that a composition of linear functions is again a linear function.
However, it is the non-linearity (i.e., the capability to represent nonlinear functions) that
makes multi-layer networks so powerful. Almost any nonlinear function does the job,
although for back-propagation learning it must be differentiable and it helps if the function is
bounded. The sigmoid functions are the most common choices [45].
For the output units, activation functions should be chosen to suit the distribution of
the target values. As noted above, for binary {0, 1} outputs the sigmoid function is
an excellent choice. For continuous-valued targets with a bounded range, the sigmoid
functions are again useful, provided that either the outputs or the targets are scaled to the
range of the output activation function.
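One common way to scale targets to the range of a sigmoid output is min-max scaling, sketched below. The helper name `fit_min_max` and the load values are hypothetical and serve only to illustrate the idea; the thesis does not state which scaling it uses.

```python
def fit_min_max(values, low=0.0, high=1.0):
    """Return (scale, unscale) functions mapping [min, max] of values onto [low, high]."""
    v_min, v_max = min(values), max(values)
    if v_max == v_min:
        raise ValueError("values must not all be equal")
    span = v_max - v_min

    def scale(v):
        return low + (v - v_min) * (high - low) / span

    def unscale(s):
        return v_min + (s - low) * span / (high - low)

    return scale, unscale


# Made-up hourly loads in MW, used only to demonstrate the scaling.
loads_mw = [820.0, 1010.0, 1250.0, 960.0]
scale, unscale = fit_min_max(loads_mw)
scaled = [scale(v) for v in loads_mw]
print(scaled)
```

The `unscale` function is needed after forecasting, to convert the network output back into physical units. For a bipolar sigmoid output, `low=-1.0` and `high=1.0` would be the natural choice.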
3.2.3 Network Topologies
The topology of a network is defined by the number of layers, the number of units per layer,
and the interconnection patterns between layers. They are generally divided into two
categories based on the pattern of connections:
1) Feed-forward Neural Networks: where the data flow from input units to output units is
strictly feed-forward. The data processing can extend over multiple layers of units, but no
feedback connections are present. That is, connections extending from outputs of units to
inputs of units in the same layer or previous layers are not permitted. Feed-forward
networks are the main focus of this thesis. Details are given in Section 3.3.
2) Recurrent Neural Networks: these contain feedback connections. Contrary to feed-forward
networks, the dynamical properties of the network are important. In some cases, the
activation values of the units undergo a relaxation process such that the network
evolves to a stable state in which the activations no longer change. In other applications,
in which the dynamical behaviour constitutes the output of the network, the changes of
the activation values of the output units are significant. A schematic model of an RNN is shown
in Fig. 3.6.
Fig. 3.6 Recurrent neural networks
3.2.4 Network Learning
The functionality of a neural network is determined by the combination of the topology
(number of layers, number of units per layer, and the interconnection pattern between the
layers) and the weights of the connections within the network. The topology is usually held
fixed, and the weights are determined by a certain training algorithm. The process of
adjusting the weights to make the network learn the relationship between the inputs and
targets is called learning, or training. Many learning algorithms have been invented to help
find an optimum set of weights that results in the solution of the problem. They can roughly
be divided into two main groups, supervised and unsupervised, which are
described below:
1) Supervised Learning: The network is trained on pairs of inputs and desired
outputs (i.e., target values). These input-output pairs are provided by an external teacher,
or by the system containing the network. The difference between the actual outputs and the
desired outputs is used by the algorithm to adapt the weights in the network (Fig. 3.7). It
is often referred to as a function approximation problem: given training data consisting of
pairs of input patterns x and corresponding targets t, the goal is to find a function f(x) that
matches the desired response for each training input.
Fig. 3.7 Supervised learning model, adapted from [46]
2) Unsupervised Learning: In unsupervised learning, there is no feedback from the
environment to indicate whether the outputs of the network are correct. The network must
discover features, regularities, correlations, or categories in the input data automatically.
In fact, for most varieties of unsupervised learning, the targets are the same as the inputs. In
other words, unsupervised learning usually performs the same task as an auto-associative
network, compressing the information from the inputs.
3.2.5 Objective Function
To train a network and measure how well it performs, an objective function (or cost function)
must be defined to provide an unambiguous numerical rating of system performance.
Selection of an objective function is very important because the function represents the
design goals and determines which training algorithms can be used. Developing an objective
function that measures exactly what we want is not an easy task. A few basic functions are
very commonly used. One of them is the sum-of-squares error function,
$$E = \frac{1}{NP} \sum_{p=1}^{P} \sum_{i=1}^{N} (t_{pi} - y_{pi})^2 \qquad (3.5)$$
where N and P are the total number of output nodes and the number of patterns in the training set,
respectively, p indexes the patterns, i indexes the output nodes, and $t_{pi}$ and $y_{pi}$ are the target
and actual network output of the ith output unit for the pth pattern.
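As a minimal sketch, the error function of Eq. 3.5 can be computed as follows, assuming the targets and outputs are given as lists of P patterns, each holding N values. The function name and the sample numbers are illustrative assumptions.

```python
def sum_of_squares_error(targets, outputs):
    """Eq. 3.5: squared errors summed over patterns and output nodes, divided by N * P."""
    if not targets or len(targets) != len(outputs):
        raise ValueError("targets and outputs must be non-empty and hold the same number of patterns")
    num_patterns = len(targets)          # P
    num_outputs = len(targets[0])        # N
    total = 0.0
    for t_p, y_p in zip(targets, outputs):
        if len(t_p) != num_outputs or len(y_p) != num_outputs:
            raise ValueError("every pattern must have N target and N output values")
        total += sum((t - y) ** 2 for t, y in zip(t_p, y_p))
    return total / (num_outputs * num_patterns)


# Two patterns (P = 2) and two output nodes (N = 2); values are made up.
targets = [[1.0, 0.0], [0.0, 1.0]]
outputs = [[0.8, 0.1], [0.3, 0.6]]
E = sum_of_squares_error(targets, outputs)
print(E)
```

Dividing by N * P makes the value comparable across training sets of different sizes, which is convenient when comparing networks trained on different amounts of load history.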
3.3 Feed-Forward Neural Networks
3.3.1 Basic Architecture
A layered feed-forward network consists of a certain number of layers, and each layer
contains a certain number of units. There is an input layer, an output layer, and one or more
hidden layers between the input and the output layer. Each unit receives its inputs directly
from the previous layer (except for input units) and sends its output directly to units in the
next layer (except for output units). Unlike the recurrent network, which contains feedback
information, there are no connections from any of the units to the inputs of the previous
layers nor to other units in the same layer, nor to units more than one layer ahead. Every unit
only acts as an input to the immediate next layer. Obviously, this class of networks is easier
to analyze theoretically than other general topologies because their outputs can be
represented with explicit functions of the inputs and the weights.
Fig. 3.8 Feed-forward neural network
An example of a layered network with one hidden layer is shown in Fig. 3.8. In this network
there are l inputs, m hidden units, and n output units. The output of the jth hidden unit is
obtained by first forming a weighted linear combination of the l input values, then adding a
bias,
$$a_j = \sum_{i=1}^{l} w_{ji}^{(1)} x_i + w_{j0}^{(1)} \qquad (3.6)$$
where $w_{ji}^{(1)}$ is the weight from input i to hidden unit j in the first layer and $w_{j0}^{(1)}$ is the bias
for hidden unit j. If we treat the bias term as a weight from an extra input
$x_0 = 1$, Eq. 3.6 can be rewritten in the form
$$a_j = \sum_{i=0}^{l} w_{ji}^{(1)} x_i \qquad (3.7)$$
The activation of hidden unit j can then be obtained by transforming the linear sum using an
activation function f(x):
$$h_j = f(a_j) \qquad (3.8)$$
The outputs of the network can be obtained by transforming the activations of the hidden
units using a second layer of processing units. For each output unit k, we first form the linear
combination of the outputs of the hidden units,
$$a_k = \sum_{j=1}^{m} w_{kj}^{(2)} h_j + w_{k0}^{(2)} \qquad (3.9)$$
Again we can absorb the bias, by defining an extra hidden output $h_0 = 1$, and rewrite the above equation as
$$a_k = \sum_{j=0}^{m} w_{kj}^{(2)} h_j \qquad (3.10)$$
Then, applying the activation function $f_2(x)$ to Eq. 3.10, we get the kth output
$$y_k = f_2(a_k) \qquad (3.11)$$
Combining Eq. 3.7, Eq. 3.8, Eq. 3.10 and Eq. 3.11, we get the complete representation of the
network as
$$y_k = f_2\left( \sum_{j=0}^{m} w_{kj}^{(2)} f\left( \sum_{i=0}^{l} w_{ji}^{(1)} x_i \right) \right) \qquad (3.12)$$
where the j = 0 term of the outer sum is understood as $w_{k0}^{(2)} h_0$ with $h_0 = 1$.
The network of Fig. 3.8 is a network with one hidden layer. We can easily extend it to two or
more hidden layers by repeating the above transformation for each additional layer.
One thing we need to note is that the input units are very special units. They are hypothetical
units that produce outputs equal to their supposed inputs. No processing is done by these
input units.
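The forward computation of a network with one hidden layer can be sketched as below. The weight layout is an assumption made for the example: `w1[j][i]` is the weight from input i to hidden unit j and `w2[k][j]` the weight from hidden unit j to output k, with index 0 in each row holding the bias, so that a constant 1 is prepended to the inputs and to the hidden outputs. The default choice of a sigmoid hidden layer and a linear output layer is also only illustrative.

```python
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def linear(a):
    return a


def forward(x, w1, w2, f=sigmoid, f2=linear):
    """Compute the outputs of a one-hidden-layer feed-forward network."""
    x_ext = [1.0] + list(x)                      # x_0 = 1 absorbs the hidden biases
    if any(len(row) != len(x_ext) for row in w1):
        raise ValueError("each row of w1 needs l + 1 weights")
    h = [f(sum(w * xi for w, xi in zip(row, x_ext))) for row in w1]
    h_ext = [1.0] + h                            # h_0 = 1 absorbs the output biases
    if any(len(row) != len(h_ext) for row in w2):
        raise ValueError("each row of w2 needs m + 1 weights")
    return [f2(sum(w * hj for w, hj in zip(row, h_ext))) for row in w2]


# l = 2 inputs, m = 2 hidden units, n = 1 output; weights are made up.
w1 = [[0.1, 0.5, -0.3],
      [-0.2, 0.4, 0.8]]
w2 = [[0.05, 1.2, -0.7]]
print(forward([0.6, 0.9], w1, w2))
```

Note that the input units do no processing here: the input vector is used as given, apart from the prepended constant.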
3.3.2 Representation Capability
Feed-forward NNs provide a general framework for representing non-linear
functional mappings between a set of input variables and a set of output variables. The
representation capability of a network can be defined as the range of mappings that can be
implemented when the weights are varied. Theoretical results [45], [47]-[48] show that:
1) Single-layer networks are capable of representing only linearly separable functions or
linearly separable decision domains.
2) Networks with two hidden layers can represent an arbitrary decision boundary to arbitrary
accuracy with threshold activation functions and can approximate any smooth mapping
to any accuracy with sigmoid activation functions.
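The limitation of single-layer networks is often illustrated with the XOR function, which is not linearly separable. The sketch below is an illustration added here, not an example from the thesis: it builds XOR from threshold units using a single hidden layer (which is already enough for this small case), and runs a coarse grid search over the weights of one threshold unit. The grid search is a demonstration rather than a proof.

```python
from itertools import product


def step(a, theta=0.0):
    """Threshold unit: 1 when the net input reaches theta, otherwise 0."""
    return 1 if a >= theta else 0


def xor_net(x1, x2):
    """XOR built from three threshold units: (x1 OR x2) AND NOT (x1 AND x2)."""
    h_or = step(x1 + x2, theta=0.5)
    h_and = step(x1 + x2, theta=1.5)
    return step(h_or - h_and, theta=0.5)


XOR_TABLE = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}


def single_unit_solves_xor(grid):
    """Try every (w1, w2, theta) on the grid for one threshold unit."""
    for w1, w2, theta in product(grid, repeat=3):
        if all(step(w1 * a + w2 * b, theta) == t for (a, b), t in XOR_TABLE.items()):
            return True
    return False


grid = [g / 2 for g in range(-4, 5)]             # -2.0, -1.5, ..., 2.0
print([xor_net(a, b) for a, b in XOR_TABLE])     # outputs for the four input pairs
print(single_unit_solves_xor(grid))
```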
3.3.3 Network Structure Design
Determining the optimal architecture of an NN for a given problem is not an easy matter. To reduce
this complexity, various methodologies have been introduced. Some of these methods
are discussed below.
3.3.3.1 Determination of Number of Hidden Layers
Because networks with two hidden layers can represent functions of any shape,
there is no theoretical reason to use networks with more than two hidden layers. It has also
been determined that for the vast majority of practical problems, there is no reason to use
more than one hidden layer. Problems that require two hidden layers are only rarely
encountered in practice. Even for problems that theoretically require more than one hidden layer,
using one hidden layer performs, most of the time, much better than using two
hidden layers in practice [41]. Training often slows dramatically when more hidden layers
are used. There are several reasons why we should use as few layers as possible in practice:
1) Most training algorithms for feed-forward networks are gradient-based. Each additional
layer through which errors must be back-propagated makes the gradient more unstable.
The success of any gradient-directed optimization algorithm depends on the degree
to which the gradient remains unchanged as the parameters vary.
2) The number of local minima increases dramatically with more hidden layers. Most
gradient-based optimization algorithms can only find local minima, and thus may miss the
global minimum. Even if the training algorithm can find the global minimum, there is a
higher probability that, after many time-consuming iterations, we will find ourselves stuck
in a local minimum and have to escape or start over.
Of course, it is possible that for a certain problem, using more hidden layers of just a few
units is better than using fewer hidden layers requiring too many units, especially for
networks that need to learn a function with discontinuities. In general, it is strongly
recommended that one hidden layer be the first choice for any practical feed-forward
network design. If using a single hidden layer with a large number of hidden units does not
perform well, then it may be worth trying a second hidden layer with fewer processing units.
3.3.3.2 Determination of Optimal Number of Hidden Units
Another important issue in designing a network is how many units to place in each layer.
Using too few units can fail to detect the signals fully in a complicated data set, leading to
underfitting. Using too many units will increase the training time, perhaps so much that it
becomes impossible to train it adequately in a reasonable period of time. A large number of
hidden units might cause overfitting, in which case the network has so much information
processing capacity, that the limited amount of information contained in the training set is
not enough to train the network.
The best number of hidden units depends on many factors – the numbers of input and output
units, the number of training cases, the amount of noise in the targets, the complexity of the
error function, the network architecture, and the training algorithm [45].
In most situations, there is no easy way to determine the optimal number of hidden units
without training using different numbers of hidden units and estimating the generalization
error of each. The best approach to find the optimal number of hidden units is trial and
error. In practice, we can use either the forward selection or backward selection to determine
the hidden layer size. Forward selection starts with choosing an appropriate criterion for
evaluating the performance of the network. Then we select a small number of hidden units
(for example two, if it is difficult to guess a suitable starting size), train and test the network,
and record its performance. Next we slightly increase the number of hidden units, then train and
test again, repeating until the error is acceptably small or no significant improvement is noted,
whichever comes first. Backward selection, in contrast with forward selection, starts with a large number
of hidden units and then decreases the number gradually [41], [49]. This process is
time-consuming, but it works well.
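The forward selection procedure can be sketched as a simple loop. The function `train_and_evaluate` is a hypothetical callback, assumed to train a network with the given number of hidden units and return its test error; `toy_error` below is a made-up stand-in used only so that the sketch runs, and the parameter names and default values are likewise assumptions.

```python
def forward_selection(train_and_evaluate, start=2, step=1, max_units=50,
                      tolerance=1e-3, target_error=0.0):
    """Grow the hidden layer until the error is small enough or stops improving."""
    best_units = start
    best_error = train_and_evaluate(start)
    units = start + step
    while units <= max_units and best_error > target_error:
        error = train_and_evaluate(units)
        if best_error - error < tolerance:
            break                                # no significant improvement
        best_units, best_error = units, error
        units += step
    return best_units, best_error


def toy_error(hidden_units):
    """Stand-in for real training: error is lowest at six hidden units."""
    return (hidden_units - 6) ** 2 / 100 + 0.05


print(forward_selection(toy_error))
```

Backward selection would follow the same pattern with a large starting size and a negative step. In a real study the callback would typically average the error over several training runs, since a single run depends on the random initial weights.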
3.3.3.3 Training Algorithm of Neural Network
Back-propagation is the most commonly used method for training multi-layer feed-forward
networks. It can be applied to any feed-forward network with differentiable activation
functions. This technique was popularized by Rumelhart, Hinton and Williams [50].
For most networks, the learning process is based on a suitable error function, which is then
minimized with respect to the weights and biases. If a network has differentiable activation
functions, then the activations of the output units become differentiable functions of the input
variables, the weights and the biases. If we also define a differentiable error function of the
network outputs, such as the sum-of-squares error function, then the error function itself is a
differentiable function of the weights. Therefore, we can evaluate the derivatives of the error
with respect to the weights, and these derivatives can then be used to find the weights that
minimize the error function, using either the popular gradient descent or other
optimization methods. The algorithm for evaluating the derivatives of the error function is
known as back-propagation, because it propagates the errors backward through the network.
The back-propagation learning algorithm is intuitively appealing because it is based on a relatively
simple concept: if the network gives the wrong answer, then the weights are corrected so that
the error is lessened and, as a result, future responses of the network are more likely to be correct.
The back-propagation learning algorithm involves two phases. During the first phase the
input is presented and propagated forward through the network to compute the output value
$O_{pk}$ for each unit. This output is then compared with the targets, resulting in an error signal
$\delta_{pk}$ for each output unit. The second phase involves a backward pass through the network
(analogous to the initial forward pass) during which the error signal is passed to each unit in
the network and the appropriate weight changes are made. This backward pass
allows the recursive computation of δ as indicated above. The first step is to compute δ for
each of the output units. This is simply the difference between the actual and desired output
values times the derivative of the squashing function. Then the weight changes for all
connections that feed into the final layer can be computed. After this is done, the
δ's for all units in the penultimate layer are computed. This propagates the errors back one layer, and the same
process can be repeated for every layer.
The significance of the process is that, as the network trains, the nodes in the intermediate
layers organize themselves such that different nodes learn to recognize different features of
the total input space. After training, when presented with an arbitrary input pattern that is
noisy or incomplete, the units in the hidden layers of the network will respond with an active
output if the new input contains a pattern that resembles the feature the individual unit learns
to recognize during training. Conversely, hidden-layer units have a tendency to inhibit their
outputs if the input pattern does not contain the feature that they were trained to recognize.
The Back propagation network shown is a layered feed-forward network that is fully
interconnected by layers. Thus there are no feedback connections and no connections that
bypass one layer to go directly to a later layer. Although only three layers are used in
discussion, more than one hidden layer is permissible.
Suppose we have a set of P vector pairs, $(x_1, y_1), (x_2, y_2), \ldots, (x_P, y_P)$, which are
examples of a functional mapping $y = \phi(x)$, $x \in R^N$, $y \in R^M$ [35]. The network will
learn an approximation $o = y' = \phi'(x)$ from this training set. The training method derived
here usually works, provided the training vector pairs have been chosen properly and there is a
sufficient number of them. Learning of a neural network means finding an appropriate set of
weights. The learning technique described here resembles the problem of finding the equation of
a line that best fits a number of known points.
Let us consider an input vector, $x_p = (x_{p1}, x_{p2}, \ldots, x_{pN})^t$, applied to the input
layer of the network. The “p” subscript refers to the pth training vector. The input units
distribute the values to the hidden-layer units. The net input to the jth hidden unit is
$$\mathrm{net}^h_{pj} = \sum_{i=1}^{N} w^h_{ji} x_{pi} + \theta^h_j \tag{3.13}$$
where $w^h_{ji}$ is the weight of the connection from the ith input unit to the jth hidden unit,
and $\theta^h_j$ is the bias term. The “h” superscript refers to quantities on the hidden layer.
Assuming that the activation of this node is equal to the net input, the output of this node is
$$i_{pj} = f^h_j(\mathrm{net}^h_{pj}) \tag{3.14}$$
where the function $f^h_j$ is referred to as an activation function. Its domain is the set of
activation values, net, of the neuron model.
The equations for the output nodes are
$$\mathrm{net}^o_{pk} = \sum_{j=1}^{L} w^o_{kj} i_{pj} + \theta^o_k \tag{3.15}$$
$$o_{pk} = f^o_k(\mathrm{net}^o_{pk}) \tag{3.16}$$
where the “o” superscript refers to quantities on the output layer.
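The forward pass of Eqs. 3.13 to 3.16 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the logistic sigmoid is assumed for both $f^h_j$ and $f^o_k$, and the function and argument names (`forward`, `w_h`, `theta_h`, `w_o`, `theta_o`) are hypothetical, not taken from the thesis implementation.

```python
import math

def sigmoid(net):
    """Logistic squashing function (assumed choice for f)."""
    return 1.0 / (1.0 + math.exp(-net))

def forward(x, w_h, theta_h, w_o, theta_o):
    """Forward pass for one training pattern.

    x         : list of N input values x_pi
    w_h[j][i] : weight from input unit i to hidden unit j
    theta_h[j]: bias of hidden unit j
    w_o[k][j] : weight from hidden unit j to output unit k
    theta_o[k]: bias of output unit k
    Returns (i_p, o_p), the hidden-layer and output-layer outputs.
    """
    # Eq. 3.13 (net input) followed by Eq. 3.14 (activation) for each hidden unit
    i_p = [sigmoid(sum(w * xi for w, xi in zip(w_h[j], x)) + theta_h[j])
           for j in range(len(w_h))]
    # Eq. 3.15 (net input) followed by Eq. 3.16 (activation) for each output unit
    o_p = [sigmoid(sum(w * ij for w, ij in zip(w_o[k], i_p)) + theta_o[k])
           for k in range(len(w_o))]
    return i_p, o_p
```

With all weights and biases set to zero, every net input is zero, so each unit outputs 0.5 regardless of the input pattern.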
3.3.3.4 Update of Output-layer Weights
The error at a single output unit is defined as $\delta_{pk} = (y_{pk} - o_{pk})$, where the
subscript “p” refers to the pth training vector and “k” refers to the kth output unit. In this
case $y_{pk}$ is the desired output and $o_{pk}$ is the actual output from the kth unit. The
error to be minimized is the sum of the squares of the errors for all output units:
$$E_p = \frac{1}{2}\sum_{k=1}^{M} \delta_{pk}^2 \tag{3.17}$$
To determine the direction in which to change the weights, the negative of the gradient of
$E_p$, $\nabla E_p$, with respect to the weights $w_{kj}$ is calculated. The values of the
weights can then be adjusted such that the total error is reduced. It is often useful to think
of $E_p$ as a surface in weight space. From Eq. (3.17) and the definition of $\delta_{pk}$,
$$E_p = \frac{1}{2}\sum_{k} (y_{pk} - o_{pk})^2 \tag{3.18}$$
$$\frac{\partial E_p}{\partial w^o_{kj}} = -(y_{pk} - o_{pk}) \frac{\partial f^o_k}{\partial(\mathrm{net}^o_{pk})} \frac{\partial(\mathrm{net}^o_{pk})}{\partial w^o_{kj}} \tag{3.19}$$
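where Eq. (3.16) is used for the output value $o_{pk}$, together with the chain rule for partial
derivatives. The last factor of Eq. 3.19 is
$$\frac{\partial(\mathrm{net}^o_{pk})}{\partial w^o_{kj}} = \frac{\partial}{\partial w^o_{kj}}\left(\sum_{j=1}^{L} w^o_{kj} i_{pj} + \theta^o_k\right) = i_{pj} \tag{3.20}$$
Combining Eq. 3.19 and Eq. 3.20, the negative gradient is
$$-\frac{\partial E_p}{\partial w^o_{kj}} = (y_{pk} - o_{pk}) f^{o\prime}_k(\mathrm{net}^o_{pk})\, i_{pj} \tag{3.21}$$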
As far as the magnitude of the weight change is concerned, it is taken to be proportional to the
negative gradient. Thus the weights on the output layer are updated according to
$$w^o_{kj}(t+1) = w^o_{kj}(t) + \Delta_p w^o_{kj}(t) \tag{3.22}$$
where
$$\Delta_p w^o_{kj} = \eta (y_{pk} - o_{pk}) f^{o\prime}_k(\mathrm{net}^o_{pk})\, i_{pj} \tag{3.23}$$
The factor η is called the learning rate parameter. If the sigmoid function is used, then the
weight update equation for an output unit is
$$w^o_{kj}(t+1) = w^o_{kj}(t) + \eta (y_{pk} - o_{pk})\, o_{pk} (1 - o_{pk})\, i_{pj} \tag{3.24}$$
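The step from the general update (Eq. 3.23) to the sigmoid-specific update (Eq. 3.24) relies on the derivative of the sigmoid. A short derivation follows, assuming the logistic form of the sigmoid (the text names the function but does not write it out):

```latex
f(\mathrm{net}) = \frac{1}{1 + e^{-\mathrm{net}}}
\quad\Longrightarrow\quad
f'(\mathrm{net}) = \frac{e^{-\mathrm{net}}}{\left(1 + e^{-\mathrm{net}}\right)^{2}}
                 = f(\mathrm{net})\,\bigl(1 - f(\mathrm{net})\bigr)
```

Since $o_{pk} = f^o_k(\mathrm{net}^o_{pk})$, the derivative factor becomes $o_{pk}(1 - o_{pk})$, which is the term that appears in Eq. 3.24.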
By defining the output-layer error term
$$\delta^o_{pk} = (y_{pk} - o_{pk}) f^{o\prime}_k(\mathrm{net}^o_{pk}) = \delta_{pk} f^{o\prime}_k(\mathrm{net}^o_{pk}) \tag{3.25}$$
and combining Eq. 3.24 and Eq. 3.25, the weight update equation becomes
$$w^o_{kj}(t+1) = w^o_{kj}(t) + \eta\, \delta^o_{pk}\, i_{pj} \tag{3.26}$$
3.3.3.5 Update of Hidden Layer Weights
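For the hidden layer, the error is written in terms of the hidden-layer outputs:
$$E_p = \frac{1}{2}\sum_k (y_{pk} - o_{pk})^2 = \frac{1}{2}\sum_k \left(y_{pk} - f^o_k(\mathrm{net}^o_{pk})\right)^2 = \frac{1}{2}\sum_k \left(y_{pk} - f^o_k\Big(\sum_j w^o_{kj} i_{pj} + \theta^o_k\Big)\right)^2 \tag{3.27}$$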
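The gradient of $E_p$ with respect to the hidden-layer weights is
$$\frac{\partial E_p}{\partial w^h_{ji}} = \frac{1}{2}\sum_k \frac{\partial}{\partial w^h_{ji}}(y_{pk} - o_{pk})^2 = -\sum_k (y_{pk} - o_{pk}) \frac{\partial o_{pk}}{\partial(\mathrm{net}^o_{pk})} \frac{\partial(\mathrm{net}^o_{pk})}{\partial i_{pj}} \frac{\partial i_{pj}}{\partial(\mathrm{net}^h_{pj})} \frac{\partial(\mathrm{net}^h_{pj})}{\partial w^h_{ji}} \tag{3.28}$$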
Each of the factors in Eq. 3.28 can be calculated explicitly from the previous equations. The
result is
$$\frac{\partial E_p}{\partial w^h_{ji}} = -\sum_k (y_{pk} - o_{pk}) f^{o\prime}_k(\mathrm{net}^o_{pk})\, w^o_{kj}\, f^{h\prime}_j(\mathrm{net}^h_{pj})\, x_{pi} \tag{3.29}$$
The hidden-layer weights are updated in proportion to the negative of Eq. 3.29:
$$\Delta_p w^h_{ji} = \eta f^{h\prime}_j(\mathrm{net}^h_{pj})\, x_{pi} \sum_k (y_{pk} - o_{pk}) f^{o\prime}_k(\mathrm{net}^o_{pk})\, w^o_{kj}$$
By using Eq. 3.25,
$$\Delta_p w^h_{ji} = \eta f^{h\prime}_j(\mathrm{net}^h_{pj})\, x_{pi} \sum_k \delta^o_{pk} w^o_{kj} \tag{3.30}$$
Every weight update on the hidden layer depends on all the error terms, $\delta^o_{pk}$, on the
output layer. The known errors on the output layer are propagated back to the hidden layer to
determine the appropriate weight changes on that layer. By defining the hidden-layer error term
$$\delta^h_{pj} = f^{h\prime}_j(\mathrm{net}^h_{pj}) \sum_k \delta^o_{pk} w^o_{kj} \tag{3.31}$$
the weight update equation becomes analogous to that for the output layer:
$$w^h_{ji}(t+1) = w^h_{ji}(t) + \eta\, \delta^h_{pj}\, x_{pi} \tag{3.32}$$
The amount of weight adjustment depends on three factors: δ, η and x. The size of the weight
adjustment is proportional to δ, the error value of the unit. Thus a larger error value for that
unit results in a larger adjustment to its incoming weights.
The weight adjustment is also proportional to x, the output value of the originating unit. If
this output value is small, then the weight adjustments are small. If this output value is large,
then the weight adjustment is large. Thus a higher activation value for the incoming unit results
in a larger adjustment to its outgoing weight.
The variable η in the weight adjustment equation is the learning rate. Its value, commonly
between 0.25 and 0.75, is chosen by the neural network user and usually reflects the rate of
learning of the network [50].
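The complete update for one training pattern (forward pass, error terms, then weight changes) can be sketched as follows. This is a minimal illustration, not the thesis implementation: it assumes a single hidden layer, sigmoid units throughout (so that $f' = o(1-o)$), and treats each bias as a weight on a constant +1 input, which the derivation above does not spell out. All names (`train_pattern`, `w_h`, `th_h`, `w_o`, `th_o`) are hypothetical.

```python
import math
import random

def sigmoid(net):
    return 1.0 / (1.0 + math.exp(-net))

def train_pattern(x, y, w_h, th_h, w_o, th_o, eta=0.5):
    """One back-propagation update for pattern (x, y); returns E_p before the update."""
    # Forward pass (Eqs. 3.13 to 3.16)
    i_p = [sigmoid(sum(w * xi for w, xi in zip(w_h[j], x)) + th_h[j])
           for j in range(len(w_h))]
    o_p = [sigmoid(sum(w * ij for w, ij in zip(w_o[k], i_p)) + th_o[k])
           for k in range(len(w_o))]
    # Output-layer error terms (Eq. 3.25) with the sigmoid derivative o(1 - o)
    d_o = [(y[k] - o_p[k]) * o_p[k] * (1.0 - o_p[k]) for k in range(len(o_p))]
    # Hidden-layer error terms (Eq. 3.31), computed before the output weights change
    d_h = [i_p[j] * (1.0 - i_p[j]) * sum(d_o[k] * w_o[k][j] for k in range(len(d_o)))
           for j in range(len(i_p))]
    # Output-layer weight update (Eq. 3.26); bias update is an assumption (input fixed at +1)
    for k in range(len(w_o)):
        for j in range(len(i_p)):
            w_o[k][j] += eta * d_o[k] * i_p[j]
        th_o[k] += eta * d_o[k]
    # Hidden-layer weight update (Eq. 3.32)
    for j in range(len(w_h)):
        for i in range(len(x)):
            w_h[j][i] += eta * d_h[j] * x[i]
        th_h[j] += eta * d_h[j]
    return 0.5 * sum((y[k] - o_p[k]) ** 2 for k in range(len(y)))

# Usage: repeatedly present one pattern to a 2-3-1 network with random initial weights
random.seed(0)
w_h = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(3)]
th_h = [0.0] * 3
w_o = [[random.uniform(-1, 1) for _ in range(3)]]
th_o = [0.0]
errors = [train_pattern([0.2, 0.9], [0.8], w_h, th_h, w_o, th_o) for _ in range(200)]
```

The list `errors` records $E_p$ at each presentation; for a single pattern and a moderate learning rate it is expected to shrink as the weights are corrected.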
Chapter 4
Proposed Model for Electrical Load Forecasting
This chapter describes the proposed model (i.e., CAELF) for efficiently forecasting the
electrical load demand. In order to evaluate CAELF, extensive experimental results are
reported in this chapter.
4.1 Electrical Load
If an electric circuit has a well-defined output terminal, the circuit connected to this terminal
is the load. In other words, the term “load” may also refer to the power consumed by a circuit
or consumer. Estimating the electrical load demand in a locality or a large area is very
difficult, because consumers use electricity according to different criteria, such as weather
conditions, the nature of the day (i.e., working day or weekend day), and so on.
Thus, estimating or forecasting the electrical load demand is vitally important for the electric
industry in the deregulated economy as well as essential to the operation and planning of a
utility company. Load forecasting helps an electric utility to make important decisions
including decisions on purchasing and generating electric power, load switching, and
infrastructure development. It is also important for energy suppliers, financial institutions,
and other participants in electric energy generation, transmission, distribution, and markets.
4.2 Constructive Approach for Electrical Load Forecasting (CAELF)
This thesis describes a new single-stage electrical load forecasting model using a constructive
approach, called CAELF. This model differs from previous works in that CAELF determines the
appropriate NN architecture, using constructive NN training, before the ELF starts. In contrast,
previous approaches (e.g., [33]-[37]) generally use a fixed NN architecture in which the number
of hidden neurons in the hidden layer is selected randomly before the ELF starts. It is well
known that the random selection of hidden neurons affects the generalization performance of NNs,
because the performance of any NN is greatly dependent on its architecture [51], [52]. Thus,
determining the number of hidden neurons automatically provides a novel approach to building
learning models using NNs for electrical load forecasting.
In addition, the proposed CAELF efficiently overcomes an existing problem of NN-based ELF
approaches, namely the limited capability of FFNNs to predict the loads of holidays and fast
load changes [39]. Although a number of recent efforts have been made (e.g., [28]-[32]) to
overcome this shortcoming, they suffer from a huge computational cost. In this regard, CAELF
uses a very simple NN architecture with a constructive technique that does not require
expensive computation during training. Furthermore, CAELF ultimately enhances the prediction
performance for holidays and fast-changing loads.
[Figure: feed-forward NN with an input layer (inputs HDD, CDD, Wd, Mt, ...), a hidden layer and an output layer (Load), with bias units.]
Fig. 4.1 Model of the feed-forward NN for forecasting the electrical load. Here, HDD and CDD refer
to the exogenous degree-day variables, calculated as heating degree days and cooling degree
days, respectively. Wd and Mt are the dummy variables that represent the weekly and monthly
seasonalities, respectively. More information about these input variables can be found in [53].
CAELF uses incremental training to find a minimum number of hidden neurons for NN models.
Hidden neurons (HNs) are added one at a time in a constructive fashion during the training
process of the NN. If the addition of an HN does not improve the NN's accuracy, it is removed.
The major steps of CAELF are summarized in Fig. 4.2 and are explained further as follows:
Step 1) At first, choose a feed-forward NN with minimal size. Precisely, size of the input
layer and output layer are decided by the total number of input variables and the output load
of the given ELF dataset, respectively, whereas, size of the hidden layer is initialized using
one hidden neuron.
Step 2) Start the partial training of the NN on the training set for τ epochs using the
back-propagation (BP) algorithm [54]. The number of training epochs, τ, is specified by the user.
Partial training, which was first used in conjunction with an evolutionary algorithm [55],
means that the NN is trained for a fixed number of epochs regardless of whether it has
converged or not.
[Flowchart: Initialize NN → Partial training for τ epochs → Training stop? (YES: final testing of load forecast; NO: Further training? (YES: return to partial training; NO: add hidden neuron, then return to partial training))]
Fig. 4.2 Flowchart of CAELF. Here, NN and HN refer to neural network and hidden neuron,
respectively.
Step 3) Check the termination criterion of NN training. If it is satisfied, the current NN
architecture is the outcome of CAELF for the given dataset. Otherwise, follow the next step. In
this work, the average training error [56], $E_a$, is calculated on the validation set and is
taken to be the mean squared error (MSE). The average error is a modified form of the error
estimate given in Eq. 3.5. Thus, the error $E_a$ is calculated as
$$E_a = \frac{1}{2P}\sum_{p=1}^{P}\sum_{c=1}^{C}\left(t_c(p) - y_c(p)\right)^2 \tag{4.1}$$
where $t_c(p)$ and $y_c(p)$ are, respectively, the actual and predicted responses of the c-th
output neuron for the validation pattern p. The symbols P and C represent the total number of
validation patterns and of output neurons, respectively.
Step 4) Check the performance criterion of the network training. If the criterion is satisfied,
the network is trained further, so go to Step 2. Otherwise, follow the next step.
Step 5) Add a hidden neuron to the network and go to Step 2 to repeat the partial training.
Step 6) The NN is then tested with the unseen testing patterns. Finally, obtain the electrical
load forecast from the current NN.
CAELF uses only one cost function, namely the training error on the validation set. CAELF
ultimately tries to design a better load forecaster using an NN. Details about some basic steps
of CAELF are given in the following sections.
4.2.1 Performance Criterion of NN Training
If the average training error on the validation set reduces by a predefined amount ε after τ
training epochs, it is assumed that the training process is progressing well; thus further
training is necessary and the procedure returns to Step 2. The reduction of the training error
can be described as
$$E_a(t) - E_a(t+\tau) > \varepsilon, \quad t = \tau, 2\tau, 3\tau, \ldots \tag{4.2}$$
where τ and t are positive integers specified by the user.
4.2.2 Termination Criterion of NN Training
Since CAELF adds hidden neurons one by one during the training process of the NN, the
training error reduces as the training process progresses. However, the objective of
CAELF is to improve the generalization ability of the NN. This means the training error may not
be the right choice for terminating the training process of the NN. Generally, a
separate dataset, called the validation set, is widely used for termination. It is assumed that
the validation error gives an unbiased estimate because the validation data are not used for
modifying the weights of the NN.
In order to achieve good generalization ability, CAELF uses the average training error on the
validation set in its termination criterion. It measures the validation error after every τ
epochs of training, called a strip. It terminates training when the average training error
increases by a predefined amount (λ) for T successive times, measured at the end of each of T
successive strips [57]. Since the average training error on the validation set increases not
just once but T successive times, it can be assumed that such increases indicate the beginning
of the final overfitting and not just an intermittent one. The termination criterion can be
expressed as
$$E_a(\tau + i) - E_a(\tau) > \lambda, \quad i = 1, 2, 3, \ldots, T \tag{4.3}$$
where τ and T are positive integers specified by the user. Our model, CAELF, tests
the termination criterion after every τ epochs of training and stops training when the
condition described by Eq. 4.3 is satisfied. In this work, the value of T is chosen as 3.
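The stopping rule can be illustrated with a small helper that inspects the validation errors recorded at the end of each strip. This is one possible reading of Eq. 4.3, not the thesis code: the reference value is taken to be the measurement immediately before the last T strips, and the name `should_terminate` is hypothetical.

```python
def should_terminate(strip_errors, lam, T=3):
    """Check the termination criterion of Eq. 4.3.

    strip_errors: validation errors E_a measured at the end of each strip
                  (every tau epochs), oldest first.
    lam         : predefined amount lambda.
    Returns True when each of the last T measurements exceeds the measurement
    taken just before them by more than lam.
    """
    if len(strip_errors) < T + 1:
        return False  # not enough strips observed yet
    reference = strip_errors[-(T + 1)]
    return all(e - reference > lam for e in strip_errors[-T:])
```

For example, with T = 3 the sequence 0.50, 0.40, 0.45, 0.46, 0.47 triggers termination for λ = 0.01, because the last three values all lie more than 0.01 above 0.40.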
4.2.3 Hidden Neuron Addition
CAELF adds a hidden neuron to the existing network architecture when the condition of Eq. 4.4
is not satisfied, i.e., when partial training no longer reduces the validation error by ε. The
reason is that the existing network architecture is not capable of acquiring all the
information of the dataset, so increasing the size of the network is necessary. The modified
architecture is then trained for τ epochs.
$$E_a(t) - E_a(t+\tau) > \varepsilon, \quad t = \tau, 2\tau, 3\tau, \ldots \tag{4.4}$$
where ε is the predefined amount specified by the user.
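The interplay of Steps 2 to 5 can be sketched as a control loop. This is a skeleton under stated assumptions, not the authors' implementation: the three callables `train_strip`, `validation_error` and `add_hidden_neuron` are hypothetical placeholders for the actual NN operations, the removal of an unhelpful hidden neuron is not modelled, and `max_strips` is an added safety bound.

```python
def caelf_loop(train_strip, validation_error, add_hidden_neuron,
               epsilon, lam, T=3, max_strips=1000):
    """Constructive training loop; returns the number of hidden neurons added.

    train_strip()       : trains the current network for tau epochs (Step 2)
    validation_error()  : returns E_a on the validation set
    add_hidden_neuron() : grows the hidden layer by one neuron (Step 5)
    """
    history = [validation_error()]
    added = 0
    for _ in range(max_strips):
        train_strip()                          # Step 2: partial training
        history.append(validation_error())
        # Step 3: stop when the last T errors all exceed the earlier one by lam
        if len(history) > T:
            reference = history[-(T + 1)]
            if all(e - reference > lam for e in history[-T:]):
                break
        # Step 4: error fell by more than epsilon, so keep training this network
        if history[-2] - history[-1] > epsilon:
            continue
        add_hidden_neuron()                    # Step 5: otherwise grow the network
        added += 1
    return added
```

Passing the operations in as callables keeps the control flow separate from any particular network representation, so the same loop can be exercised with a scripted error sequence before it is attached to a real network.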
4.3 Experimental Studies
In this section, the performance of CAELF in predicting the electrical load in the near future
is presented using a daily load data set. The data used in this study are the daily electrical
demand in megawatt-hours (MWh) in Spain [32], [53]. CAELF's performance was evaluated in
terms of prediction error; precisely, the prediction error refers to the error of the trained NN
on the testing set. For clarity, the performance evaluation of CAELF is organized into the
following subsections.
4.3.1 Description of Data
The sample used in this study comprises the daily electricity demand, given in
megawatt-hours (MWh), in Spain from January 1, 1993 to June 1998 for a total of 2007 days.
All sectors (industrial, commercial, and residential) are included, as sectorial disaggregated
data was not available for this time frequency. The sample has been transformed by taking
natural logarithms of the electricity demand data to reduce the impact of the
heteroskedasticity that could be present due to the large.
The exogenous degree-day variables are calculated as heating degree days, HDD
= max(0, Tref – Tave), and cooling degree days, CDD = max(0, Tave – Tref), where
Tave is the population-weighted daily mean temperature and Tref = 18 °C. The mean daily
temperature is collected from four different weather stations representing different climatic
subregions of Spain, and a population-weighted temperature is assessed in [53]. To clarify
the data, Table 4.1 shows a partial sample of the data sheet. Using these data, the
feed-forward NN architecture presented in Fig. 4.1 has been trained.
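The two degree-day variables follow directly from the definitions above and can be computed as in the following sketch. The function name `degree_days` is hypothetical; the reference temperature of 18 °C is the value stated in the text.

```python
T_REF = 18.0  # reference temperature in degrees Celsius, as stated in the text

def degree_days(t_ave, t_ref=T_REF):
    """Return (HDD, CDD) for a population-weighted daily mean temperature t_ave."""
    hdd = max(0.0, t_ref - t_ave)  # heating degree days: nonzero on cold days
    cdd = max(0.0, t_ave - t_ref)  # cooling degree days: nonzero on warm days
    return hdd, cdd
```

At most one of the two values is nonzero for any given day, so a day with a mean temperature of 10 °C contributes 8 heating degree days and no cooling degree days.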
To capture the significant daily seasonal components in the electricity load series, a
qualitative variable 'day of the week' has been introduced into the model through the
specification of six dummy variables ($W_{it}$) representing all days of the week except the
base day of Monday. The index i represents the day of the week (Tuesday, Wednesday, Thursday,
Friday, Saturday and Sunday), and $W_{it}$ equals 1 if observation t falls on day i, and 0
otherwise.
TABLE 4.1 A Sample of Data Showing the Log (Load), Actual Load (MWh), HDD, CDD,
and Dummy Variables
To improve the forecast of electricity consumption, anomalous events related to holidays, or
days near a holiday, have been considered. Electricity consumption, mainly in the industrial
sector, decreases appreciably during holidays. For the model to take this into account, three
additional dummy variables have been introduced. First, a variable $H_t$ has been defined,
which equals 1 if t is a holiday and 0 otherwise. Secondly, another dummy variable $H_{t-1}$ has
been defined that equals 1 if observation t corresponds to a day following a holiday,
and 0 otherwise. This variable is introduced to check the impact on electricity consumption
produced by the proximity of a holiday. Finally, a third variable has been included to analyze
the influence of Easter. This holiday has been treated separately because it has a variable
time location each year within the sample. Thus, a new variable $G_t$ has been defined which
equals 1 if observation t corresponds to Easter Thursday or Good Friday, and 0
otherwise. To account for the monthly seasonality in the model, eleven dummy variables
($M_{jt}$) were introduced, each representing one of the months of the year and taking January
as the base month. Thus, j refers to February, March, April, May, June, July, August,
September, October, November and December, and $M_{jt}$ equals 1 if observation t falls in
month j, and 0 otherwise.
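The weekday and month dummy variables can be generated from a calendar date as sketched below. The function name `calendar_dummies` is hypothetical, and the holiday and Easter dummies are left out because they require a holiday calendar that is not reproduced in the text.

```python
from datetime import date

def calendar_dummies(day):
    """Return (W, M) for a datetime.date.

    W: six weekday dummies for Tuesday..Sunday (Monday is the base day).
    M: eleven month dummies for February..December (January is the base month).
    """
    # date.weekday() returns 0 for Monday through 6 for Sunday
    w = [1 if day.weekday() == i else 0 for i in range(1, 7)]
    m = [1 if day.month == j else 0 for j in range(2, 13)]
    return w, m
```

A Monday in January therefore maps to all-zero vectors, which is what taking Monday and January as the base categories means.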
4.3.2 Experimental Setup
The data used for training the NN model in CAELF were from January 1, 1993 to
December 31, 1997, a total of 1826 days, whereas the NN was validated during training
using the data from July 1, 1998 to December 31, 1998, a total of 184 samples. These
samples are called “in-sample” data, as they were used in training the NN model. On the other
hand, the 120 days from January 1, 1999 to April 30, 1999 were used to test the forecasting
performance by comparing the model output (i.e., predicted load) with the actual load. These
data samples are called “out-of-sample”, as they were not used during the training of the NN.
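The sizes of the three periods can be checked from the calendar dates alone. The short sketch below counts the days in each inclusive date range; the helper name `n_days` is hypothetical.

```python
from datetime import date

def n_days(start, end):
    """Number of calendar days from start to end, both inclusive."""
    return (end - start).days + 1

train_days = n_days(date(1993, 1, 1), date(1997, 12, 31))   # training period
valid_days = n_days(date(1998, 7, 1), date(1998, 12, 31))   # validation period
test_days = n_days(date(1999, 1, 1), date(1999, 4, 30))     # out-of-sample period
```

The counts agree with the figures quoted above: 1826, 184 and 120 days, respectively.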
In all experiments, one bias unit with a fixed input +1 was connected to the hidden layer and
output layer. The learning rate and momentum term for training of NN were chosen as 0.05–
0.1 and 0.4–0.7, respectively. The initial connection weights for an NN were randomly
chosen in the range between -1.0 and 1.0. A sigmoid function was used as an activation
function.
4.3.3 Experimental Results
The performance of CAELF in forecasting the out-of-sample data was measured by comparing
the actual values and the model outputs over the same period. Furthermore, we used the mean
absolute percentage error (MAPE), regarded as the best relative accuracy measure among the
various forecasting accuracy criteria [58]. MAPE was calculated here as
$$\mathrm{MAPE} = \frac{1}{N}\sum_{n=1}^{N}\frac{|P_n - \hat{P}_n|}{P_n} \times 100$$
where $P_n$ and $\hat{P}_n$ represent the actual and predicted electrical load, respectively,
and N is the total number of samples available.
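The MAPE computation is short enough to show in full. The following sketch assumes strictly positive actual loads, as is the case for electricity demand; the function name `mape` is hypothetical.

```python
def mape(actual, predicted):
    """Mean absolute percentage error between actual and predicted loads, in percent."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and of equal length")
    total = sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)
```

As a small worked case, actual loads of 100 and 200 with predictions of 110 and 190 give relative errors of 10% and 5%, so the MAPE is 7.5.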
In this context, Fig. 4.3 and Fig. 4.4 present the forecasting analysis for the period of 120
days (including errors in percentage) for CAELF and the standard model (SELF), respectively.
In particular, Fig. 4.3 compares the actual load and the predicted load (i.e., forecast load)
obtained using CAELF. We calculated the MAPE between these two loads and found that it was
0.2132. In addition, further forecasting results of CAELF can be found in Figs. 4.5 and 4.6,
where the electrical load was forecast over 30 days and 7 days, respectively.
For holiday load demand, predicting the demand perfectly is a difficult task for any kind of
model. The reason is that consumers spend their time in different places in various ways, which
is why the electrical load fluctuates nonlinearly. To examine this issue, one experiment was
conducted with CAELF for 30 days, in which the forecasting results for the holidays are marked
on the actual and predicted load curves in Fig. 4.7. The load variations are tracked
satisfactorily except in some cases. Table 4.2 also shows that the difference between the
predicted load and the actual load is very low.
Fig. 4.3 (a) Comparison between the actual load and the predicted load for 120 days obtained
from CAELF using the constructive feed-forward neural network, and (b) the corresponding
errors in percentage.
Fig. 4.4 (a) Comparison between the actual load and the predicted load for 120 days obtained
from the standard model (SELF) using a feed-forward neural network, and (b) the percentage of
error between the actual load and the predicted load for the same 120 days.
Fig. 4.5 Comparison between the actual load and the predicted load for 30 days obtained from
CAELF using the constructive feed-forward neural network.
Fig. 4.6 Comparison between the actual load and the predicted load for 7 days obtained from
CAELF using the constructive feed-forward neural network.
Fig. 4.7 Comparison between the actual load and the predicted load for the holidays of January
1999 obtained from CAELF using the constructive feed-forward neural network.
TABLE 4.2 Comparison between the actual and predicted load for the holidays of January 1999
Holiday     Predicted   Actual     Difference
Jan 1st     13.1888     13.1482     0.0406
Jan 2nd     13.187      13.1809     0.0061
Jan 6th     13.2189     13.2833    -0.0644
Jan 9th     13.1974     13.2569    -0.0595
Jan 15th    13.2077     13.2582    -0.0505
Jan 16th    13.1921     13.2542    -0.0621
Jan 23rd    13.1645     13.1842    -0.0197
Jan 24th    13.1078     13.0741     0.0337
Jan 30th    13.1744     13.2209    -0.0465
Jan 31st    13.0745     13.0928    -0.0183
4.4 Results of CAELF in Prototype Data

To justify our model, CAELF, we applied it to some prototype data to check whether it works
well. For this purpose, we designed the data presented in Table 4.3 following the methodology
of the Spanish electrical load data. In these prototype data, the actual load demand is
unknown. We then applied these prototype data to the trained NN in CAELF and obtained the
corresponding forecasting results, which are presented in Table 4.4.
TABLE 4.3 User-designed electrical load forecasting prototype data for Samples 1 to 5. Here, the
output load is unknown in each sample.
Sample-1, Sample-2, Sample-3, Sample-4, Sample-5
TABLE 4.4 Results for Sample Prototype Data
Observation   Sample      Predicted Output (MWh)
01            Sample-1    534346.62
02            Sample-2    492779.08
03            Sample-3    544541.81
04            Sample-4    476632.01
05            Sample-5    204771.00
It is observed from Table 4.4 that the predicted outputs are reasonable compared with the
results in Table 4.1. This indicates that our model works well on the prototype data. Hence,
the proposed model can be used to forecast the electrical load of the near future.
Chapter 5
Analysis and Comparisons
In this chapter, the experimental performance of CAELF is analyzed from different aspects in
order to measure its complexity, significance, and generalization ability. In particular, a
computational complexity analysis is carried out to measure the complexity of CAELF. To
determine whether CAELF is statistically significant, a t-test analysis is made. In addition,
to measure the generalization ability, three synthetic time series data samples are used.
Finally, a comparison between the forecasting performance of CAELF and two other related
models is presented in this chapter.
5.1 Computational Complexity
Computational complexity is a measure of how much computation the computational process of a
model requires. More generally, computational complexity theory is a branch of the theory of
computation in theoretical computer science and mathematics that focuses on classifying
computational problems according to their inherent difficulty, and relating those classes to
each other. A computational problem is understood to be a task that is in principle amenable to
being solved by a computer, which is equivalent to stating that the problem may be solved by
mechanical application of mathematical steps.
The analysis of computational complexity helps to understand the actual computational cost
of an algorithm. Following Kudo and Sklansky [59], who presented such an analysis in big-O
notation, we compute the computational cost of CAELF. The following paragraphs present the
computational complexity of CAELF to show that the inclusion of different techniques does not
increase the computational complexity of training NNs.
(i) Partial Training: In our thesis, we use the standard back-propagation (BP) algorithm [54]
for training. Each epoch of BP takes $O(W)$ computations for training one example, where W
is the number of weights in the current NN. Thus, training all examples in the training
set for τ epochs needs $O(\tau \times P_t \times W)$ computations, where $P_t$ denotes the
number of examples in the training set.
(ii) Termination Criterion: The termination criterion employed in CAELF for stopping the
training of the NN uses both training and validation errors. Since the training error is
computed as a part of the training process, the termination criterion takes
$O(P_v \times W)$ computations, where $P_v$ denotes the number of examples in the validation
set. Since $P_v < P_t$, $O(P_v \times W) < O(\tau \times P_t \times W)$.
(iii) Further Training: CAELF uses Eq. (4.4) to check whether further training for
the added HN is necessary. The evaluation of Eq. (4.4) takes constant computation, $O(1)$,
since the error values used in Eq. (4.4) have already been evaluated during training.
(iv) Adding a Hidden Neuron: The computational cost of adding a hidden neuron is
$O(N_1 \times C)$ for initializing its connection weights, where $N_1$ is the number of added
input features and C is the number of neurons in the output layer. It is also noted that
$O(N_1 \times C) < O(\tau \times P_t \times W)$.
All the above-mentioned computation is done for one partial training consisting of τ epochs. In
general, CAELF needs several, say M, such partial trainings. Thus the total computational
cost of CAELF for training a total of T epochs ($T = \tau \times M$) is
$O(N^2 \times P_t) + O(\tau \times M \times P_t \times W)$. However, in practice, the first
term, i.e., $N^2 \times P_t$, is much less than the second one. Hence the total computational
cost of CAELF is $O(\tau \times M \times P_t \times W)$, which is the same as that of training
a fixed network architecture using BP [54]. It is clear that the incorporation of several
techniques in CAELF does not increase its computational cost.
5.2 T-TEST
The t-test is a statistical significance test that is usually performed to determine whether
the result of a model is statistically significant for a particular task. The test is used to
compare responses from two groups of data, where the two groups can come from different
experimental treatments. A t-test is any statistical hypothesis test in which the test
statistic follows a Student's t distribution if the null hypothesis is supported. It can be
used to determine if two sets of data are significantly different from each other, and is
most commonly applied when the test statistic would follow a normal distribution if the
value of a scaling term in the test statistic were known. When the scaling term is unknown
and is replaced by an estimate based on the data, the test statistic (under certain conditions)
follows a Student's t distribution.
The null hypothesis is that the two data means are equal to each other. To test the null
hypothesis, we have to calculate the following values: $\bar{x}_1$ and $\bar{x}_2$ are the means
of the two samples; $s_1^2$ and $s_2^2$ are the variances of the two samples; $n_1$ and $n_2$
are the sample sizes of the two samples; and k is the degrees of freedom.
$$\bar{x} = \frac{1}{n}(x_1 + x_2 + x_3 + \cdots + x_n) = \frac{1}{n}\sum_i x_i$$
$$s^2 = \frac{1}{n-1}\left[(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2\right]$$
Simplifying the above equation gives
$$s^2 = \frac{1}{n-1}\left[\sum_i x_i^2 - \frac{1}{n}\Big(\sum_i x_i\Big)^2\right]$$
We use the following equation to calculate the t-statistic:
$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}$$
Here, we compared the calculated t-value, with k degrees of freedom, to the critical t-value
from the t-distribution table at the chosen confidence level, and then decided whether
to accept or reject the null hypothesis. After computation, the t-value was found to be 7.65,
and the critical t-value from the t-table is 2.81. Since the calculated t-value is greater than
the critical value, the null hypothesis is rejected and the predicted load forecasting result
of CAELF is statistically significant.
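The statistic can be computed directly from the formulas for the mean, the variance and t given in this section. The sketch below is illustrative only: the function names are hypothetical, the example data are made up, and the degrees of freedom and table lookup are not included because the text does not give the formula used for k.

```python
import math

def sample_stats(xs):
    """Return (mean, variance, n), using the shortcut form of the sample variance."""
    n = len(xs)
    mean = sum(xs) / n
    var = (sum(x * x for x in xs) - sum(xs) ** 2 / n) / (n - 1)
    return mean, var, n

def t_statistic(sample_1, sample_2):
    """Two-sample t-statistic with separate variance estimates for each sample."""
    m1, v1, n1 = sample_stats(sample_1)
    m2, v2, n2 = sample_stats(sample_2)
    return (m1 - m2) / math.sqrt(v1 / n1 + v2 / n2)

# Made-up example data, for illustration only
t_value = t_statistic([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
```

For this example the means are 3 and 6 and the variances are 2.5 and 10, so t = -3 / sqrt(2.5), roughly -1.897; its absolute value would then be compared with the critical value from the t-table.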
5.3 Mackey-Glass Time Series

The Mackey-Glass series, based on the Mackey-Glass differential equation, is widely
regarded as a benchmark for comparing the generalization ability of different methods. This
series is a chaotic time series generated from the following time-delay differential equation:
$$\frac{dx(t)}{dt} = \frac{a\,x(t-\tau)}{1 + x(t-\tau)^{10}} - b\,x(t) \tag{5.1}$$
It can be solved numerically using, for example, the 4th-order Runge-Kutta (RK4) method at
discrete, equally spaced time steps:
$$x(t + \Delta t) = \mathrm{mackeyglass\_rk4}(x(t), x(t-\tau), \Delta t, a, b)$$
where the function mackeyglass_rk4 numerically solves the Mackey-Glass delay differential
equation using the RK4 method:
$$k_1 = \Delta t \cdot \mathrm{mackeyglass\_eq}(x(t), x(t-\tau), a, b)$$
$$k_2 = \Delta t \cdot \mathrm{mackeyglass\_eq}(x(t) + \tfrac{1}{2}k_1, x(t-\tau), a, b)$$
$$k_3 = \Delta t \cdot \mathrm{mackeyglass\_eq}(x(t) + \tfrac{1}{2}k_2, x(t-\tau), a, b)$$
$$k_4 = \Delta t \cdot \mathrm{mackeyglass\_eq}(x(t) + k_3, x(t-\tau), a, b)$$
$$x(t + \Delta t) = x(t) + \frac{k_1}{6} + \frac{k_2}{3} + \frac{k_3}{3} + \frac{k_4}{6}$$
where mackeyglass_eq is the function that returns the value of the right-hand side of the
Mackey-Glass delay differential equation (5.1) once its inputs and its parameters (a, b) are
provided. The generated data samples are presented in Fig. 5.1, where τ = 17, a = 0.2 and
b = 0.1.
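A Python sketch of this generator is given below. The functions `mackeyglass_eq` and `mackeyglass_rk4` mirror the names used in the text; the wrapper `generate`, the time step `dt = 0.1`, the initial value `x0 = 1.2` and the constant initial history are assumptions, since the text does not state them.

```python
def mackeyglass_eq(x_t, x_t_minus_tau, a, b):
    """Right-hand side of the Mackey-Glass equation (5.1)."""
    return a * x_t_minus_tau / (1.0 + x_t_minus_tau ** 10) - b * x_t

def mackeyglass_rk4(x_t, x_t_minus_tau, dt, a, b):
    """One RK4 step; the delayed value is held fixed within the step."""
    k1 = dt * mackeyglass_eq(x_t, x_t_minus_tau, a, b)
    k2 = dt * mackeyglass_eq(x_t + 0.5 * k1, x_t_minus_tau, a, b)
    k3 = dt * mackeyglass_eq(x_t + 0.5 * k2, x_t_minus_tau, a, b)
    k4 = dt * mackeyglass_eq(x_t + k3, x_t_minus_tau, a, b)
    return x_t + k1 / 6 + k2 / 3 + k3 / 3 + k4 / 6

def generate(n_steps, tau=17.0, a=0.2, b=0.1, dt=0.1, x0=1.2):
    """Generate n_steps samples; dt, x0 and the constant history are assumed values."""
    delay = int(round(tau / dt))          # number of steps spanning the delay tau
    history = [x0] * delay                # assumed: x(t) = x0 for all t <= 0
    series = []
    x = x0
    for step in range(n_steps):
        slot = step % delay               # circular buffer holding the last tau time units
        x_delayed = history[slot]         # value from exactly tau time units ago
        history[slot] = x                 # store the current value for later reuse
        x = mackeyglass_rk4(x, x_delayed, dt, a, b)
        series.append(x)
    return series
```

The circular buffer avoids storing the whole trajectory: only the last τ/Δt values are ever needed to evaluate the delayed term.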
[Figure: magnitude of samples (0 to 1.4) versus number of samples (0 to 6000); series: Sample Data]
Fig. 5.1 Sample data of the Mackey-Glass time series
5.3.1 Forecasting Results
Using the Mackey-Glass time series model, we generated data samples that were used as
training and testing data. After training CAELF with the training samples, the testing samples
were applied to the trained NN model to obtain the forecasting result. The results on the
Mackey-Glass data samples are promising and are shown in Fig. 5.2 and Table 5.1.
Fig. 5.2 Comparison between the actual data and the predicted data for Mackey-Glass data samples
obtained from CAELF (x-axis: No. of Samples; y-axis: Magnitude of Samples; curves: Actual Data,
Predicted Data).
TABLE 5.1 The value of MAPE of CAELF on Mackey-Glass data. Here, SD refers to standard deviation.

        Mean       SD         Max        Min
MAPE    0.108234   0.014332   0.127487   0.087631
As Fig. 5.2 shows, the predicted and actual data curves closely overlap each other. This is
because the MAPE in this case is very low, with a mean of 0.108234, and the SD is also quite
low (0.014332), as presented in Table 5.1. Thus, we can say that our CAELF model is robust and
performs well in predicting the Mackey-Glass time series.
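The statistics reported in Table 5.1 (mean, SD, maximum, and minimum of the MAPE over several runs) could be computed along the following lines. This is only a sketch: the standard MAPE definition in percent, the use of the sample standard deviation, and the function names mape and summarize_runs are assumptions, and the numbers in the usage lines are illustrative, not thesis data.

```python
import statistics


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (assumed definition)."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


def summarize_runs(mape_per_run):
    """Mean, sample SD, maximum and minimum of the MAPE over independent runs."""
    return {
        "mean": statistics.mean(mape_per_run),
        "sd": statistics.stdev(mape_per_run),  # sample SD (assumed)
        "max": max(mape_per_run),
        "min": min(mape_per_run),
    }


# Illustrative values only (not the thesis data): one test set, three runs.
actual = [1.00, 0.80, 1.20, 0.50]
predictions_per_run = [
    [1.01, 0.79, 1.19, 0.50],
    [0.99, 0.81, 1.21, 0.51],
    [1.00, 0.80, 1.22, 0.49],
]
print(summarize_runs([mape(actual, p) for p in predictions_per_run]))
```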
5.4 Lorenz Time Series
The Lorenz system is a system of ordinary differential equations (i.e., the Lorenz equations)
first studied by Edward Lorenz. It is notable for having chaotic solutions for certain
parameter values and initial conditions. In particular, the Lorenz attractor is a set of chaotic
solutions of the Lorenz system which, when plotted, resemble a butterfly or figure eight.
In 1963, Edward Lorenz developed a simplified mathematical model for atmospheric
convection. The model is a system of three ordinary differential equations now known as the
Lorenz equations:
$$
\begin{aligned}
\frac{dx}{dt} &= \sigma (y - x), \\
\frac{dy}{dt} &= x(\rho - z) - y, \\
\frac{dz}{dt} &= xy - \beta z.
\end{aligned}
$$
Here, x, y and z make up the system state, t is time, and σ, ρ, β are the system parameters.
From a technical point of view, the Lorenz system is nonlinear, three-dimensional and
deterministic.
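A Lorenz series can be generated by integrating these equations numerically, for example with the same 4th-order Runge-Kutta scheme used in Section 5.3. The sketch below is illustrative only. The parameter values σ = 10, ρ = 28, β = 8/3 are the classical choice and are an assumption here, since the values used in the thesis are not stated in this section; the step size, the initial state, the use of the x-component as the series, and the function names are also assumptions.

```python
def lorenz_rhs(state, sigma, rho, beta):
    """Right-hand side of the Lorenz equations."""
    x, y, z = state
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)


def rk4_step(rhs, state, dt, *params):
    """One classical RK4 step for a system of ODEs given as tuples."""
    def shifted(base, slope, scale):
        return tuple(b + scale * s for b, s in zip(base, slope))

    k1 = rhs(state, *params)
    k2 = rhs(shifted(state, k1, 0.5 * dt), *params)
    k3 = rhs(shifted(state, k2, 0.5 * dt), *params)
    k4 = rhs(shifted(state, k3, dt), *params)
    return tuple(
        s + dt * (a + 2.0 * b + 2.0 * c + d) / 6.0
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def generate_lorenz(n_samples, dt=0.01, state=(1.0, 1.0, 1.0),
                    sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Return the x-component of the trajectory (all defaults are assumed)."""
    xs = []
    for _ in range(n_samples):
        state = rk4_step(lorenz_rhs, state, dt, sigma, rho, beta)
        xs.append(state[0])
    return xs


series = generate_lorenz(2000)
print(len(series), min(series), max(series))
```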
Fig. 5.3 Sample data of the Lorenz time series (x-axis: No. of Samples; y-axis: Magnitude of Samples)

Using the Lorenz time series model, we generated data samples that were used for training
and testing. After training CAELF on the training samples, the testing samples were applied
to the trained NN model to obtain the forecasting result. The results on the Lorenz data are
promising and are exhibited in Fig. 5.4 and Table 5.2.
Fig. 5.4 Comparison between the actual data and the predicted data for Lorenz data obtained from
CAELF (x-axis: No. of Samples; y-axis: Magnitude of Samples; curves: Actual data, Predicted data).
TABLE 5.2 The value of MAPE of CAELF on Lorenz data.

        Mean       SD         Max       Min
MAPE    0.916056   0.015022   0.93433   0.89491
Thus, we can say that our CAELF model is robust and performs well in predicting the Lorenz
time series.
5.5 Rossler Time Series
The Rössler attractor is the attractor for the Rössler system, a system of three non-linear
ordinary differential equations originally studied by Otto Rössler. These differential
equations define a continuous-time dynamical system that exhibits chaotic dynamics
associated with the fractal properties of the attractor.
The defining equations of the Rössler system are:
$$
\begin{aligned}
\frac{dx}{dt} &= -y - z, \\
\frac{dy}{dt} &= x + ay, \\
\frac{dz}{dt} &= b + z(x - c).
\end{aligned}
$$
Otto E. Rössler studied the chaotic attractor with a = 0.2, b = 0.2, and c = 5.7, though the
values a = 0.1, b = 0.1, and c = 14 have been more commonly used since. Another line of the
parameter space has been investigated using topological analysis; it corresponds to b = 2 and
c = 4, with a chosen as the bifurcation parameter.
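As with the other two benchmark systems, a Rössler series can be obtained by numerical integration. The sketch below uses the parameter set a = 0.2, b = 0.2, c = 5.7 mentioned above; which parameter set, step size, and initial state were used for the thesis experiments is not stated here, so the defaults, the choice of the x-component as the series, and the function names are assumptions.

```python
def rossler_rhs(state, a, b, c):
    """Right-hand side of the Rössler equations."""
    x, y, z = state
    return (-y - z, x + a * y, b + z * (x - c))


def rk4_step(rhs, state, dt, *params):
    """One classical RK4 step for a system of ODEs given as tuples."""
    def shifted(base, slope, scale):
        return tuple(v + scale * s for v, s in zip(base, slope))

    k1 = rhs(state, *params)
    k2 = rhs(shifted(state, k1, 0.5 * dt), *params)
    k3 = rhs(shifted(state, k2, 0.5 * dt), *params)
    k4 = rhs(shifted(state, k3, dt), *params)
    return tuple(
        s + dt * (p + 2.0 * q + 2.0 * r + w) / 6.0
        for s, p, q, r, w in zip(state, k1, k2, k3, k4)
    )


def generate_rossler(n_samples, dt=0.01, state=(1.0, 1.0, 1.0),
                     a=0.2, b=0.2, c=5.7):
    """Return the x-component of the trajectory (dt and state are assumed)."""
    xs = []
    for _ in range(n_samples):
        state = rk4_step(rossler_rhs, state, dt, a, b, c)
        xs.append(state[0])
    return xs


series = generate_rossler(5000)
print(len(series), min(series), max(series))
```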
Fig. 5.5 Sample data of the Rossler time series (x-axis: No. of Samples; y-axis: Magnitude of Samples)

Using the Rossler time series model, we generated data samples that were used for training
and testing. After training CAELF on the training samples, the testing samples were applied
to the trained NN model to obtain the forecasting result. The results on the Rossler data are
promising and are exhibited in Fig. 5.6 and Table 5.3.
Fig. 5.6 Comparison between the actual data and the predicted data for Rossler data obtained from
CAELF (x-axis: No. of Samples; y-axis: Magnitude of Samples; curves: Actual data, Predicted data).
TABLE 5.3 The value of MAPE of CAELF on Rossler data.

        Mean       SD         Max       Min
MAPE    0.823876   0.033428   0.84915   0.78577
Thus, we can say that our CAELF model is robust and performs well in predicting the Rossler
time series.
5.6 Comparison with Other Works
The forecasting result of CAELF on the Spanish daily electrical load data has been
compared with the results of three electrical load forecasting models: (i) standard
electrical load forecasting (SELF), (ii) NNELF-1 [32], and (iii) NARx-2 [32]. The first two
models use a standard feed-forward NN for electrical load forecasting, with a fixed number
of hidden neurons in the hidden layer and a fixed number of iterations for the NN
training. The third is a nonlinear autoregressive model characterized by two features:
(i) the true available output is fed as an input to train the NN; (ii) the resulting network
has a purely feed-forward architecture, and the BP algorithm is used for training.
A detailed explanation of NNELF and NARx is given in Chapter 3. We used a single
measure for the comparisons here, namely MAPE.
TABLE 5.4 Comparisons with other models for the ELF problem in terms of the next 120
days. Here, the comparisons are made according to the MAPE.

Models          Mean     SD       Max      Min
NNELF-1 [32]    0.6426   0.0551   0.7686   0.4946
NARx-2 [32]     0.3852   0.1132   0.6455   0.2367
SELF            0.3151   0.0009   0.3168   0.3142
CAELF           0.2132   0.0008   0.2136   0.2110
In SELF, the whole setup of CAELF was used, except for the constructive approach and partial
training. In this case, 5 hidden neurons were used in the hidden layer of the NN, with 200
iterations for the NN training. For the comparisons, we ran SELF 10 times and averaged
the forecasting results. On the other hand, the models NNELF-1 [32] and NARx-2 [32]
used 10 hidden neurons in the hidden layer, and their forecasting results were averaged over
20 individual runs.
Table 5.4 shows the comparison results among the four models, including CAELF. From the
table we see that the mean MAPE of CAELF (0.2132) is lower than that of NNELF-1 (0.6426),
NARx-2 (0.3852), and SELF (0.3151), and its standard deviation (0.0008) is also the lowest.
The maximum and minimum MAPE of CAELF are likewise lower than those of the existing models.
We can therefore conclude that, on this data set, our model outperforms the existing models.
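To put the comparison in Table 5.4 in relative terms, the reduction in mean MAPE achieved by CAELF over each baseline can be computed directly from the tabulated values. The short sketch below does this; the helper name relative_reduction is hypothetical, and the figures are taken from Table 5.4. It gives reductions of roughly 67%, 45%, and 32% with respect to NNELF-1, NARx-2, and SELF, respectively.

```python
# Mean MAPE values copied from Table 5.4.
mean_mape = {
    "NNELF-1": 0.6426,
    "NARx-2": 0.3852,
    "SELF": 0.3151,
    "CAELF": 0.2132,
}


def relative_reduction(baseline, proposed):
    """Percentage by which `proposed` is lower than `baseline`."""
    return 100.0 * (baseline - proposed) / baseline


reductions = {
    name: relative_reduction(value, mean_mape["CAELF"])
    for name, value in mean_mape.items()
    if name != "CAELF"
}

for name, pct in reductions.items():
    print(f"CAELF vs {name}: mean MAPE lower by {pct:.1f}%")
```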
Chapter 6
Conclusion and Future Works
6.1 Conclusion
The thesis proposes a new short-term electrical load forecasting model, CAELF, which is
built on a feed-forward NN training scheme. The size of the hidden layer is determined
automatically by the constructive approach during the training process. Thereby, the
strength of the standard feed-forward NN is enhanced for the ELF problem, as exhibited in
Fig. 4.3 and Fig. 4.4 for the 120-day prediction analysis. To make this clearer, we have also
presented the forecasting results of CAELF for 30 days and 7 days in Fig. 4.5 and Fig. 4.6,
respectively. From these figures we conclude that the proposed CAELF has a remarkable
capability of forecasting the electrical load on a short-term basis. For holiday load demand,
the forecasting of CAELF is satisfactory, as the predicted load curve nearly overlaps the
actual load curve except at some points, as exhibited in Fig. 4.7.
Table 5.4 compares CAELF, in terms of MAPE, with three different models, namely SELF,
NNELF-1 [32], and NARx-2 [32], on the Spanish daily load demand data. It is found that the
forecasting result of CAELF has the lowest MAPE among these models. In addition, the SD of
the MAPE for CAELF is lower than that of the other models.
In order to justify the significance of our proposed CAELF, we conducted additional
experiments on the Mackey-Glass, Lorenz, and Rossler time series, which are described in
detail in Section 5.3, Section 5.4, and Section 5.5. The data samples for these three
prototype problems were generated artificially. Considering the SD results for all data
samples, including the real Spanish data and the artificial time series data exhibited in
Table 5.1 to Table 5.3, it has been found that the value of SD is very low. Table 5.4 also
shows that the SD of CAELF is lower than that of the other existing models. Hence, we can
claim that our CAELF model is robust and significantly better for the electrical load
forecasting problem.
6.2 Future Works
Although the forecasting performance of CAELF is satisfactory, there are some areas where it
did not perform well. As can be seen from Fig. 4.3, the forecasting errors in the initial part
and the last part are somewhat higher than in the middle part. The likely reason behind this
weaker performance is the nonlinearity of the electrical load. To overcome such difficulties,
incorporating some heuristic techniques into CAELF is recommended as future work.
References
[1] J.P. Rothe, A.K. Wadhwani and S. Wadhwani, “Hybrid and integrated approach to short
term load forecasting.” IEEE Transactions on Power Systems, vol. 2, issue 12, pp. 7127-
7132, 2010.
[2] Shahram Javadi, “Spatial Load Forecasting Using Fuzzy Logic.” 4th WSEAS International
Conference on Power Engineering Systems (ICOPES), Brazil, pp. 51-56, 2005.
[3] M. Amina and V. S. Kodogiannis, “Load forecasting using fuzzy wavelet Neural networks.”
IEEE International Conference on Fuzzy Systems, pp. 1033 – 1040, 2011.
[4] H. L. Willis, “Power Distribution Planning Reference Book.” Second Edition, Revised and
Expanded. New York: Marcel Dekker, 2002.
[5] G. Gross and F. D. Galiana, “Short-term load forecasting.” Proceedings of the IEEE,
vol. 75, no. 12, pp. 1558-1573, December 1987.
[6] A. D. Papalexopoulos and T. C. Hesterberg, “A regression-based approach to short-term
system load forecasting.” IEEE Transactions on Power Systems, vol. 5, pp. 1535-1547, 1990.
[7] M. T. Hagan and S. M. Behr, “The Time Series Approach to Short Term Load Forecasting,”
IEEE Transactions on Power Systems, vol. 2, pp. 785-791, 1987.
[8] H. S. Hippert, C. E. Pedreira, and R. C. Souza, “Neural networks for short-term load
forecasting: a review and evaluation,” IEEE Transactions on Power Systems, vol. 16, pp. 44-
55, 2001.
[9] Zhanshou Yu, “Feed-Forward Neural Networks and Their Applications in Forecasting”, M.Sc
thesis, December, 2000.
[10] A. Khotanzad, R. Afkhami-Rohani, L. Tsun-Liang, A. Abaye, M. Davis and D. J.
Maratukulam, “ANNSTLF-a neural-network-based electric load forecasting system,” IEEE
Transactions on Neural Networks, vol. 8, pp. 835-846, 1997.
[11] A. Khotanzad, R. Afkhami-Rohani, and D. Maratukulam, “ANNSTLF-Artificial Neural
Network Short-Term Load Forecaster generation three,” IEEE Transactions on Power Systems,
vol. 13, pp. 1413-1422, 1998.
[12] A. Khotanzad, Z. Enwang, and H. Elragal, “A neuro-fuzzy approach to short-term load
forecasting in a price-sensitive environment.” IEEE Transactions on Power Systems, vol. 17,
pp. 1273-1282, 2002.
[13] H. S. Hippert and C. E. Pedreira, “Estimating temperature profiles for short-term load
forecasting: neural networks compared to linear models,” IEE Proceedings - Generation,
Transmission and Distribution, vol. 151, pp. 543-547, 2004.
[14] A. Khotanzad, M. H. Davis, A. Abaye and D. J. Maratukulam, “An artificial neural network
hourly temperature forecaster with applications in load forecasting,” IEEE Transactions on
Power Systems, vol. 11, pp. 870-876, 1996.
[15] T. Haida and S. Muto, “Regression based peak load forecasting using a transformation
technique,” IEEE Transactions on Power Systems, vol. 9, pp.1788-1794, 1994.
[16] O. Hyde and P. F. Hodnett, “An adaptable automated procedure for short-term electricity load
forecasting,” IEEE Transactions on Power Systems, vol. 12,pp. 84-94, 1997.
[17] S. Ruzic, A. Vuckovic, and N. Nikolic, “Weather sensitive method for short term load
forecasting in Electric Power Utility of Serbia.” IEEE Transactions on Power Systems, vol. 18,
pp. 1581-1586, 2003.
[18] W. Charytoniuk, M. S. Chen and P. Van Olinda, “Nonparametric regression based short-term
load forecasting,” IEEE Transactions on Power Systems, vol.13, pp. 725-730, 1998.
[19] B. Krogh, E. S. de Llinas and D. Lesser, “Design and Implementation of An on-Line Load
Forecasting Algorithm,” IEEE Transactions on Power Apparatus and Systems, vol. PAS-101,
pp. 3284-3289, 1982.
[20] S. R. Huang, “Short-term load forecasting using threshold autoregressive models,” IEE
Proceedings - Generation, Transmission and Distribution, vol. 144, pp. 477-481, 1997.
[21] S.-J. Huang and K.-R. Shih, “Short-term load forecasting via ARMA model identification
including non-Gaussian process considerations.” IEEE Transactions on Power Systems, vol.
18, pp. 673-679, 2003.
[22] J. F. Chen, W. M. Wang and C.-M. Huang, “Analysis of an adaptive time-series autoregressive
moving-average (ARMA) model for short-term load forecasting.” Electric Power Systems
Research, vol. 34, pp. 187-196, 1995.
[23] M. Espinoza, C. Joye, R. Belmans and B. De Moor, “Short-Term Load Forecasting, Profile
Identification, and Customer Segmentation: A Methodology Based on Periodic Time Series.”
IEEE Transactions on Power Systems, vol. 20, pp. 1622-1630, 2005.
[24] C. M. Huang, C. J. Huang and M. L. Wang, “A particle swarm optimization to identifying the
ARMAX model for short-term load forecasting.” IEEE Transactions on Power Systems, vol.
20, pp. 1126-1133, 2005.
[25] M. Espinoza, J. A. K. Suykens, R. Belmans and B. De Moor, “Electric Load Forecasting.”
IEEE Control Systems Magazine, vol. 27, pp. 43-57, 2007.
[26] D. C. Park, M. A. El-Sharkawi, R. J. Marks II, L. E. Atlas and M. J. Damborg, “Electric load
forecasting using an artificial neural network.” IEEE Transactions on Power Systems, vol. 6, pp.
442-449, 1991.
[27] Methaprayoon K, Lee WJ, Rasmiddatta S, Liao JR, and Ross RJ, “Multistage artificial neural
network short-term load forecasting engine with front-end weather forecast.” IEEE
Transactions on Industry Applications, vol. 43, no. 6, pp. 1410-1416, 2007.
[28] Deihimi A and Showkati H, “Application of echo state networks in short-term electric load
forecasting”, Energy, vol. 39, pp. 327-340, 2012.
[29] Ferreira VH and Alves da Silva AP, “Toward estimating autonomous neural network-based
electric load forecasters”, IEEE Transactions on Power Systems, vol. 22, no. 4, pp. 1554-1562,
2007.
[30] Xia C, Wang J and McMenemy K, “Short, medium and long term load forecasting model and
virtual load forecaster based on radial basis function neural networks.” International Journal
of Electrical Power & Energy Systems, Vol. 32, No. 7 pp.743-750, 2010.
[31] Vermaak J and Botha EC, “Recurrent neural networks for short-term load forecasting.” IEEE
Transactions on Power Systems, Vol.13, pp.126-132, 1998.
[32] Elias R S, Fang L and Wahab M I M, “Electrical load forecasting based on weather variables
and seasonalities: A neural network approach.” 8th International conference on service systems
and service management, Canada, 2011.
[33] H. A. Malki, N. B. Karayiannis and M. Balasubramanian, “Short-term electric power load
forecasting using feedforward neural network.” Expert Systems, vol. 21, no. 3, pp. 157-167,
2004.
[34] D. O. Arroyo, M. K. Skov and Q. Huynh, “Accurate electricity load forecasting with artificial
neural networks.” International Conference on Computational Intelligence for Modeling,
Control, and Automation - International Conference on Intelligent Agents, Web Technologies, and
Internet Commerce (CIMCA-IAWTIC’05), 2005.
[35] A. G. Bakirtzis, V. Petridis, S. J. Kiartzis and M. C. Alexiadis, “A neural network short term
load forecasting model for the Greek power system.” IEEE Transactions on Power Systems,
vol. 11, no. 2, pp. 858-863, 1996.
[36] D. Srinivasan, A. C. Liew and C. S. Chang, “A neural network short-term load forecaster.”
Electric Power Systems Research, vol. 28, pp. 227-234, 1994.
[37] C. C. Hsu and C. Y. Chen, “Regional load forecasting in Taiwan-applications of artificial
neural networks.” Energy Conversion and Management, vol. 44, pp. 1941-1949, 2003.
[38] Hippert H S, Pedreira C E and Souza R C, “Neural Networks for Short-Term Load
Forecasting: A Review and Evaluation.” IEEE Transactions on Power Systems, vol. 16, no. 1,
pp. 44-55, 2001.
[39] Y. Chen, P.B. Luh, C. Guan, Y. Zhao, L.D. Michel and M.A. Coolbeth, et al., “Short-term load
forecasting: similar day-based wavelet neural networks.” IEEE Transactions on Power
Systems, vol. 25, no. 1, pp. 322-330, 2010.
[40] https://en.wikipedia.org/wiki/Artificial_Neural_Network
[41] T. Masters, “Practical Neural Network Recipes in C++.” Academic Press, Inc., 1993
[42] S. T. Welstead, “Neural Network and Fuzzy Logic Applications in C++.” John Wiley & Sons,
Inc., 1994
[43] Ben Kröse and Patrick V.D. Smagt, “An Introduction to Neural Network.” The University of
Amsterdam, 1996
[44] L. Fausett, “Fundamentals of Neural Networks: Architectures, Algorithms, and Applications.”
Prentice-Hall, Inc., 1994
[45] W.S. Sarles, “Neural Network FAQ.” periodic posting to the Usenet newsgroup
comp.ai.neural-net, URL: ftp://ftp.sas.com/pub/neural/FAQ.html, 1997.
[46] Zhanshou Yu, “Feed-Forward Neural Networks and Their Applications in Forecasting”, M.Sc
thesis, December, 2000.