Ucalcary 2017 Chitsaz Hamed
2017
Chitsaz, Hamed
University of Calgary graduate students retain copyright ownership and moral rights for their
thesis. You may use this material in any way that is permitted by the Copyright Act or through
licensing that has been assigned to the document. For uses that are not allowable under
copyright legislation or licensing, you are required to seek permission.
Downloaded from PRISM: https://prism.ucalgary.ca
UNIVERSITY OF CALGARY
by
Hamed Chitsaz
A THESIS
CALGARY, ALBERTA
NOVEMBER, 2017
© Hamed Chitsaz 2017
Abstract
In recent years, distributed energy resources and microgrids have attracted a great deal of interest
in the power industry. The development of microgrids has required engineers and operators to
enhance the efficiency and the energy management of such small-scale power systems. To do so,
energy forecasting plays a key role in their optimal operation. In particular, Short-term Wind Fore-
casting (STWF), Short-term Load Forecasting (STLF) and Short-term Price Forecasting (STPF)
are important tools for reliable operation scheduling of grid-connected microgrids with renewable
energy sources (e.g., wind). The generated energy forecasts are used in an optimization platform
to ensure the most economical operation for microgrids as the end goal of this thesis.
The main focus of this thesis is the development of forecasting models that are tailored for the
gence and an evolutionary algorithm to provide wind forecasts. This model can be applied to gen-
erate wind predictions at the power system, wind farm and/or wind turbine levels. By statistically
ogy is developed based on neural networks to provide satisfactory forecasts for volatile electricity
microgrids by taking advantage of energy arbitrage opportunities with the grid. It is noted that the
Numerical results in Chapters 2, 3, 4 and 5 of this thesis are provided based on Alberta, Ontario,
British Columbia, California, Texas and New York power systems, and two campuses. The
simulations show the effectiveness of the proposed neural networks for STWF and STLF. The sta-
tistical and economic evaluations show the satisfactory performance of the developed STPF model
tion platforms are developed for the optimal operation of microgrids, which can help the operator
apply the most effective approach under different scenarios of generation and market conditions.
Acknowledgments
First and foremost, I would like to express my sincere gratitude to my supervisor, Dr. Hamidreza
Zareipour, and my co-supervisor, Dr. David Wood, for their continuous support in my Ph.D. studies.
Their motivation and passion inspired me to proceed with my research and achieve my goals not
only in academia but also in personal life. I truly appreciate their fantastic mentorship and their
patience in this broad training process. The success and outcome of this thesis required a lot of
guidance and assistance from my supervisors, colleagues, friends and family, and I am extremely
Also, I would like to thank the members of my committee: Dr. Andrew Knight, Dr. Svetlana
Yanushkevich, Dr. Geoffrey Messier, and Dr. Sherif Faried for their insightful comments.
A very special gratitude goes out to my friend and colleague Dr. Hamid Shaker for the great
collaboration during my PhD studies. I would also like to extend my gratitude to my colleagues
and friends Dr. Payam Zamani-Dehkordi, Dr. Soroush Shafiee, Dr. Ehsan Nasrolahpour, Mr.
Hamidreza Rafieenia, Mr. Saeed Masoumi, Mr. Shahab Esmaeilnejad, Mr. Babatunde Odetayo,
Mr. Juan Arteaga, Mr. Shubhrajit Bhattacharjee, and Dr. Mostafa Kazemi who encouraged,
education and supporting my PhD application by his strong recommendation. Also, I would like to
thank Mr. David Adair and Mr. Ben Thomas for their help in designing a website for visualization
of price forecasts, and Mr. Gregor Hähner and Mr. Shane Fast for their help in preparing different
Last but not least, I owe my deep gratitude to my parents and to my brother and sister
throughout my life.
Dedication
Table of Contents
Abstract
Acknowledgments
Dedication
Table of Contents
List of Tables
List of Figures
List of Symbols and Abbreviations
1 Introduction
1.1 Research Motivation
1.2 Research Objectives and Scope
1.3 Research Contributions
1.4 Thesis Organization
2 Wind Power Forecast Using Wavelet Neural Network Trained by Improved Clonal Selection Algorithm
2.1 Introduction
2.2 Literature review
2.3 Methodology
2.3.1 The Developed Wavelet Neural Network
2.3.2 The Proposed Training Strategy
2.4 Numerical results
2.4.1 Numerical Results with 6-hour Updates
2.4.2 Numerical Results with Hourly Updates
2.5 Conclusion
3 Short-term Electricity Load Forecasting of Buildings in Microgrids
3.1 Introduction
3.2 Data analysis
3.3 The forecasting model
3.3.1 Self-Recurrent Wavelet Neural Network
3.3.2 The training algorithm
3.4 Numerical results
3.5 Conclusions
4 Electricity Price Forecasting for Operational Scheduling of Behind-the-meter Storage Systems
4.1 Introduction
4.2 Methodology
4.2.1 Operation of a BESS
4.2.2 Forecasting Strategy
4.3 Case Study
4.3.1 Ontario’s Electricity Market
4.3.2 Statistical Analysis
4.3.3 Economic Analysis
4.4 Conclusions
5 Impact of Uncertainty Modeling on Economic Performance of Microgrids
5.1 Nomenclature
5.2 Introduction
5.3 Forecasting Methodology
5.3.1 Deterministic Forecasting
5.3.1.1 Electricity Load Forecasting
5.3.1.2 Electricity Price Forecasting
5.3.1.3 Wind Power Forecasting
5.3.2 Probabilistic Forecasting
5.4 Optimization Platform
5.4.1 Microgrid Overview
5.4.2 Deterministic Approach
5.4.3 Probabilistic Approach
5.5 Numerical Results
5.5.1 Statistical Analysis
5.5.2 Economic Analysis
5.6 Conclusions
6 Conclusions
6.1 Concluding remarks
6.2 Future Work
Bibliography
A Wind power forecasting at wind farm levels
B Benchmark models
C Formulation of the training algorithm
D Mutual-Information feature selection
E Copyright permission letters
List of Tables
2.1 nRMSE (%) and nMAE (%) results for the four test weeks of year 2012
2.2 Comparison of the proposed method and a WNN with Mexican hat wavelet
2.3 Comparison of the proposed training strategy, i.e. ICSA, with SA, PSO, DE and CSA
2.4 Wind power prediction error results of the proposed method for 10 months
List of Figures and Illustrations
3.1 One-year hourly load data of BC’s power system and the building in BCIT
3.2 Distribution of 1-hour and 2-hour ramps
3.3 Architecture of the SRWNN
3.4 Mean absolute error (kW) of different hours of the day in different months
3.5 10-month mean absolute error for different hours of the day
3.6 Samples for bad (a) and good (b) forecasting days
3.7 Forecasting errors for different days of the week
5.1 Costs for Scenario #1: low price and low wind
5.2 Costs for Scenario #2: low price and high wind
5.3 Costs for Scenario #3: high price and low wind
List of Symbols and Abbreviations
Symbol Definition
AESO Alberta Electric System Operator
AI Artificial Intelligence
BA Balancing Authority
BTM Behind-the-meter
CI Computational Intelligence
DA Day-ahead
DE Differential Evolution
DER Distributed Energy Resource
DG Diesel Generator
EMS Energy Management System
FFNN Feed-Forward Neural Network
GE General Electric
HE Hour Ending
PI Prediction Interval
PICP Prediction Interval Coverage Probability
PP Pool Price
PSO Particle Swarm Optimization
RH Rolling Horizon
SA Simulated Annealing
Chapter 1
Introduction
1.1 Research Motivation
The idea of microgrids was first introduced in the technical literature by Lasseter [1] as a solution
for the reliable integration of Distributed Energy Resources (DERs), including Energy Storage
Systems (ESSs) and controllable loads. Microgrids are integrated energy systems operating as
an autonomous grid, which can be either in parallel to or islanded from the existing power grid.
Despite this general definition for microgrids, there is still an ambiguity of what is and is not
a microgrid in the literature. This vagueness mainly comes from the size and type of energy
resources within such small-scale power systems. For instance, inclusion of a renewable power
generation resource, Combined Heat and Power (CHP), or some form of energy storage as well as
network controls for optimization of generation and loads is one of the criteria for a remote electric
In a recent report (2nd Quarter of 2017), Navigant Research states that there are currently
more than 1,840 known microgrid projects across 135 countries with a total capacity of 19,279.4
MW [3]. In this report, seven major market segments with their shares are defined: remote (45%),
(9%), military (5%), and direct current (less than 1%) microgrids [2]. Although remote projects
constitute the majority of microgrids with a total of 8,708.1 MW, the commercial/industrial sector
is anticipated to become the fastest-growing microgrid sector, representing more than 35% of the
With the fast and worldwide development of microgrids, energy management of such electric
systems becomes important. The optimal utilization of available resources within the microgrid is
essential for operating the system with the lowest possible cost, which requires advanced tools and
energy management techniques [5]. In practice, operating a grid-connected microgrid in the most
economic way can be a challenging task due to a number of reasons. The integration of renewable
energy in microgrids causes an additional complexity in their operation because of the uncertainties
attached to such intermittent energy resources. Another challenge in the operation of microgrids
is the non-smooth behavior of the electricity load at building and/or microgrid levels. Unlike the
smooth variations of electricity loads in power systems, microgrid loads can have severe variations
that negatively impact their predictability [6]. In addition, grid-connected microgrids capable of
trading energy with the main grid are subject to the risks of fluctuations in electricity market prices,
which can affect the economics of the microgrid. Hence, accurate short-term forecasting tools are
Forecasts are important because they are the main inputs to the optimization platform for the
wind energy, the only approach to deal with the generation intermittency in operation time scale
is to forecast the energy over an extended period of time. Wind power forecasts are needed for
operators to control balancing, operation and safety of the system [8]. Additionally, electricity
load forecast is an indispensable task for the operation of a microgrid, as many operating decisions,
such as dispatch scheduling of generating capacity and demand side management, are based
on load forecasts [9]. It has been noted that the forecasted loads as well as forecasted generation
of renewable resources are the main inputs for optimal energy management [7] and generation
scheduling [10] in microgrids. Moreover, electricity price forecasts become essential for the
operation of grid-connected microgrids. Accurate price forecasts help the operator with effective
strategies for the energy arbitrage with the main grid. Microgrids tend to rely only on local energy
resources when the electricity price is high, whereas it is economical to purchase energy from the
grid when electricity prices are lower than the cost of generating power locally.
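As a rough illustration of the decision rule described above, the following minimal Python sketch compares hourly price forecasts with the cost of local generation. The function name, the price values and the local generation cost are hypothetical and chosen only for illustration; an actual scheduler would also account for unit constraints, storage and forecast uncertainty.

```python
def grid_or_local(price_forecast, local_cost):
    """For each hour, choose 'grid' when the forecasted price is below the
    (assumed constant) cost of local generation, otherwise 'local'."""
    return ["grid" if price < local_cost else "local" for price in price_forecast]


# Hypothetical hourly price forecasts ($/MWh) and local generation cost ($/MWh).
prices = [22.0, 35.0, 140.0, 60.0]
local_cost = 75.0

schedule = grid_or_local(prices, local_cost)
print(schedule)  # ['grid', 'grid', 'local', 'grid']
```

In this toy case, only the third hour has a forecasted price above the local cost, so it is the only hour served by local resources.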
Therefore, this thesis provides methodologies for short-term energy forecasting that support
the optimal operation of microgrids, which is its end goal. In particular, wind power generation,
electricity loads and electricity market prices are the main focus in this thesis. Energy
forecasts are applied to an optimization platform to provide operational schedules of the resources
in the microgrid. Furthermore, this thesis aims to investigate the most suitable optimization plat-
1.2 Research Objectives and Scope
The main objective of this thesis is to provide efficient short-term forecasting models required for
the optimal operation of microgrids. In particular, this thesis has four specific objectives. The first
is to develop a short-term wind power forecasting model. The second objective is to implement
a short-term forecasting model for volatile electricity loads in microgrids. The third objective is
to develop a short-term prediction model for electricity market prices required for the economical
operation of a grid-connected microgrid. The last objective of this thesis is to investigate the most
Figure 1.1 illustrates a general platform for optimal operation of a grid-connected microgrid.
The energy management system contains forecasting models as well as the optimization frame-
work. The forecasting models are fed by required input data located in a database. Forecasts of
wind power generation, electricity loads and electricity prices along with the operational infor-
mation of all microgrid components are sent to the optimization framework for optimal operation
scheduling of the microgrid. The final outputs are the schedules for dispatchable units within the
In this manuscript-based thesis, I use publicly available data of different electricity markets,
such as British Columbia, Alberta, Ontario in Canada, as well as electricity markets in New York
and Texas in the U.S. In addition, I use the electricity demand data of a building in British Columbia
Institute of Technology (BCIT) campus and a campus building at the University of Calgary for the
electricity load forecasting purpose. Although the developed forecasting methodologies have been
tailored to the characteristics of the time series of interest, they can be used for other markets or
In Chapter 2, I focus on building a short-term forecasting model for wind power generation.
According to the literature on wind power forecasting methodologies, nonlinear models such as
artificial neural networks have shown better prediction performance than linear models, e.g., linear
regression models, as wind power is generally a non-linear function of its input features. Therefore,
I base the structure of the model on a state-of-the-art artificial neural network. In addition to the
forecasting engine, the training algorithm plays a key role in the performance of forecasting tools.
Thus, to enable the model to capture high fluctuations of wind power time series, I equip the
algorithm that optimizes the free parameters of the model in the training phase. This helps achieve
the first objective of this thesis, which is the development of a wind power prediction tool with
high forecasting accuracy. The wind power forecasts are essential in the optimal operation of
microgrids with wind turbines. An improvement in wind power forecasts means the real-time
Therefore, unexpected actual wind generation would not result in operational risks for other units,
In Chapter 3, the focus is on the development of a short-term forecasting model for electricity
loads in microgrids. First, I perform an analysis to highlight the main challenge for load forecasting
in microgrids. To do so, the characteristics of the historical load of a microgrid are compared with
those of electricity loads in two power systems. In other words, I compare the volatility of the
microgrid load with the volatility of the electricity load in two power systems using two measures.
In addition, I perform another experiment to analyze the ramp events in the electricity loads of
the microgrid versus a power system. Knowing the characteristics of the microgrid loads, I then
propose a forecasting methodology based on a state-of-the-art neural network with high capability
of capturing severe variations in non-smooth load time series. Therefore, the objective of this
chapter is to propose a load forecasting model that can provide accurate forecasts for the optimal
operation of microgrids. An improvement in load forecast accuracy can be translated into less
power mismatch in real-time operation of the microgrid. Overestimating the electricity load brings
more controllable units online, which results in excess power generation, whereas underestimating
the load results in supply shortfall that could result in load curtailments in stand-alone operation or
Chapter 4 presents a methodology for short-term electricity price forecasting that is specifically
tailored to enhance the operation of microgrids. When it comes to price forecasting, it is of high
importance to take into account the main application for the prediction tool. In this chapter, the
application is the operation of a behind-the-meter battery energy storage system within a microgrid.
Hence, I first formulate the operation scheduling of the storage system using common sets of
forecasts, i.e., day-ahead forecasts and rolling horizon forecasts. Then, I propose a new operation
framework for the storage system using intra-hour rolling horizon that can potentially enhance the
economics of the microgrid. To achieve this, I perform an analysis to evaluate the potential of high-
resolution market data, i.e., market clearing prices, in capturing the price spikes using the publicly
available data of four North American electricity markets. Accordingly, I build a price forecasting
tool that includes high-resolution and low-resolution market data with high capability of capturing
price spikes. Thus, the objective in this chapter is to construct such a price forecasting model that
can enhance the economics of the grid-connected microgrid by an effective energy arbitrage with
In Chapter 5, I focus on the optimal operation of a microgrid using the generated forecasts.
Since the forecasts are not perfect, the uncertainty related to them can negatively affect the operation.
There are two main approaches to mitigate the forecast uncertainties: the rolling horizon
technique and prediction intervals. In the rolling horizon technique, point forecasts are updated
every hour to provide more accurate prediction for the remaining hours of the operating day. Al-
ternatively, prediction intervals provide a range in which the actual values are expected to fall with
a certain confidence level. Using the prediction intervals, robust optimization can be used in the
operation of microgrid in which the worst case scenario is considered. In this chapter, I first pro-
vide simple methodologies for the electricity load, electricity price and wind power forecasting for
both point and interval forecasts. Then, I formulate optimization platforms for both deterministic
and probabilistic approaches. Afterwards, each platform is evaluated by three different scenarios
of wind power generation and electricity price. The objective of this chapter is to investigate the
impact of different optimization approaches on the economic performance of the microgrid under
1.3 Research Contributions
To achieve the objectives stated above, several contributions are made to the existing literature. A
summary of the main contributions is provided in the following paragraphs. These contributions
The first contribution of this thesis is to build a short-term forecasting model for wind power
generation. Having access to wind power generation in the province of Alberta, Canada, the pro-
posed methodology is constructed based on aggregated wind power from all wind farms in the
province. This model can be used for providing point forecasts of wind power for power systems,
wind farms or small wind turbines with minor adjustments if needed (please see Appendix A for
further information). For the microgrid application, the operator can generate wind power forecasts
and then apply them into an optimization platform to determine the operational schedules of the
dispatchable units within the microgrid. The forecasting engine is built on a Wavelet Neural Net-
work (WNN) which is an efficient artificial neural network model. Due to the local properties of
wavelets and the ability of adapting the wavelet shape according to the training data set instead of
adapting the parameters of the fixed shape activation function, WNNs offer higher generalization
capability compared to the classical feed-forward neural networks [11]. This makes them suitable
methods for predicting volatile time series. To enhance the performance of this forecasting en-
gine, the activation functions of the hidden neurons are constructed based on multi-dimensional
Morlet wavelets. To train the forecasting engine, I propose a new stochastic evolutionary algo-
rithm, named Improved Clonal Selection Algorithm (ICSA), which optimizes the free parameters
of the WNN for wind power prediction. The obtained numerical results confirm the validity of the
As the second contribution of this thesis, I develop a Short-term Load Forecasting (STLF)
model for microgrids, which is ultimately used for enhancing the energy management of available
resources in microgrids. First, I perform a data analysis to compare the characteristics of a power
system load versus an electricity load in a microgrid. The case study is British Columbia’s power
system and British Columbia Institute of Technology (BCIT) campus microgrid. As the analy-
sis suggests, the microgrid load has highly volatile behavior and consequently, this non-smooth
characteristic decreases the predictability of such a time series. Therefore, in this thesis, an STLF
prediction model is proposed based on Self-Recurrent Wavelet Neural Network (SRWNN) as the
and adapted to train the SRWNN. The numerical results show the effectiveness of this forecasting
model to deal with severe variations of the electricity load in microgrids.
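To make the notion of load volatility more concrete, the sketch below computes k-hour ramps, i.e., the change in load over k hours, for a short load series. This is only one possible way to quantify severe variations; the function names, the peak-based normalization and the sample building load values are assumptions for illustration and are not taken from the thesis data.

```python
def ramps(load, k):
    """k-hour ramps: change in load between hour t and hour t + k."""
    return [load[t + k] - load[t] for t in range(len(load) - k)]


def normalized_ramps(load, k):
    """Ramps expressed as a fraction of the peak load (an assumed normalization),
    so that series of very different sizes can be compared."""
    peak = max(load)
    return [r / peak for r in ramps(load, k)]


# Hypothetical hourly building load (kW).
building_load = [40.0, 55.0, 30.0, 80.0, 60.0]

one_hour = ramps(building_load, 1)   # [15.0, -25.0, 50.0, -20.0]
two_hour = ramps(building_load, 2)   # [-10.0, 25.0, 30.0]
largest = max(abs(r) for r in normalized_ramps(building_load, 1))
print(one_hour, two_hour, largest)   # largest 1-hour ramp is 0.625 of peak
```

Comparing the distribution of such normalized ramps for a building and for a large power system gives a simple picture of how much less smooth the smaller load is.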
The third contribution of this thesis is to develop a Short-term Price Forecasting (STPF) model
for the operation of a grid-connected microgrid. As a common mode of operation for microgrids,
they can trade energy with the main grid. Therefore, there is a potential for an energy arbitrage.
As mentioned, electricity price forecasting is very dependent on the application. Since the oper-
ation of the microgrid is of interest, I develop a methodology specifically tailored to benefit the
tem. The forecasting model takes advantage of high-resolution market data, e.g., market clearing
prices, that carries the most recent information about the condition of the electricity market. I first
conduct an analysis to highlight the significance of the candidate high-resolution market data, and
its potential capability to capture sharp variations in market prices. To develop this model, I focus
on Ontario’s electricity market as the case study. It should be noted that different jurisdictions
have different electricity markets. However, the proposed model can be adapted to other electricity
markets as all have similar high-resolution data. Moreover, an effective algorithm, named Intra-
day Rolling Horizon (IRH), is designed to embed the generated price forecasts in an optimization
platform. This optimization algorithm schedules the operation of the storage system in such a way
that the microgrid can make a profit by trading energy with the main grid.
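The general idea of acting on refreshed forecasts can be sketched as follows. This is not the IRH algorithm developed in the thesis; it is a deliberately simple greedy rolling-horizon loop in which the forecast is refreshed at every hour and only the current decision is applied. The function names, the charge and discharge rule, the unit battery size and the price values are all hypothetical, and losses and power limits are ignored.

```python
def rolling_horizon_arbitrage(actual_prices, forecast_fn, capacity_mwh=1.0, step_mwh=1.0):
    """Greedy sketch: at each hour, refresh the forecast of the remaining hours,
    then charge, discharge or idle based on the current price."""
    soc = 0.0       # state of charge (MWh)
    profit = 0.0    # cumulative profit ($)
    actions = []
    for t, price in enumerate(actual_prices):
        future = forecast_fn(t)  # refreshed forecasts for hours t+1, t+2, ...
        if soc > 0 and (not future or price > max(future)):
            # No later hour is expected to pay more: sell stored energy now.
            energy = min(step_mwh, soc)
            soc -= energy
            profit += energy * price
            actions.append("discharge")
        elif future and soc < capacity_mwh and price < min(future):
            # Every later hour is expected to be more expensive: buy now.
            energy = min(step_mwh, capacity_mwh - soc)
            soc += energy
            profit -= energy * price
            actions.append("charge")
        else:
            actions.append("idle")
    return actions, profit


# Hypothetical hourly prices ($/MWh); the stub below returns perfect forecasts.
prices = [30.0, 20.0, 50.0, 90.0, 40.0]
actions, profit = rolling_horizon_arbitrage(prices, lambda t: prices[t + 1:])
print(actions, profit)  # ['idle', 'charge', 'idle', 'discharge', 'idle'] 70.0
```

With the perfect-forecast stub, the battery buys at the cheapest hour and sells at the most expensive later hour; replacing the stub with an imperfect forecaster shows how forecast errors translate into lost arbitrage profit.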
The fourth and last contribution of this thesis is to investigate the impact of different uncertainty
mitigation approaches on the economic performance of microgrids under different scenarios.
Two main strategies have been introduced in the literature to deal with the uncertainties of fore-
casts, i.e., rolling horizon technique and probabilistic forecasting. The rolling horizon technique
updates the point forecasts on every forecasting step, while the probabilistic forecasting provides
prediction intervals with a certain confidence level as opposed to point forecasts. First, I imple-
ment models to generate point and interval forecasts for wind power generation, electricity loads
and electricity prices. I use the wind power generation data of the Taber wind farm in Alberta, the
electricity demand data of a building at the University of Calgary and the electricity price of
Alberta’s electricity market in this study. Then, I develop deterministic and probabilistic optimization
platforms to feed the forecasts to the operation of the microgrid. Two different scenarios of high
wind and high electricity price are considered. As the main contribution of this chapter, I inves-
tigate the impact of these different approaches on the economic performance of microgrids. This
helps the operator to apply the most effective approach that leads to higher economic benefits of
the microgrid.
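One common way to check whether prediction intervals deliver their stated confidence level is the Prediction Interval Coverage Probability (PICP), which appears in the list of abbreviations. A minimal sketch is given below: PICP is the fraction of observed values that fall inside their intervals. The function name and the sample values are hypothetical and serve only to illustrate the calculation.

```python
def picp(actual, lower, upper):
    """Prediction Interval Coverage Probability: share of observations that
    lie within [lower, upper] for the corresponding hour."""
    inside = sum(1 for a, lo, hi in zip(actual, lower, upper) if lo <= a <= hi)
    return inside / len(actual)


# Hypothetical observations and interval bounds (e.g., MW of wind power).
actual = [10.0, 12.0, 9.0, 15.0]
lower = [8.0, 11.0, 10.0, 13.0]
upper = [11.0, 14.0, 12.0, 16.0]

coverage = picp(actual, lower, upper)
print(coverage)  # 0.75: three of the four observations are covered
```

If intervals are built for a nominal 90% confidence level, a PICP well below 0.9 on test data would suggest that the intervals are too narrow.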
1.4 Thesis Organization
In this manuscript-based thesis, Chapters 2, 3, and 4 are works that have been published as journal
papers. Chapter 5 is a journal paper submitted for publication. The articles have been modified for
coherency of the thesis. However, the contents are the same as the papers. The rest of this thesis is
organized as follows:
• Chapter 2 is titled “Wind Power Forecast Using Wavelet Neural Network Trained by
Improved Clonal Selection Algorithm”. This chapter was published in the journal
Energy Conversion and Management [12]. Dr. Amjady is a co-author of this paper.
implemented the models, performed the simulations and analyzed the numerical
grids”. This chapter was published in the journal Energy and Buildings [13]. Dr.
models, performed the simulations and prepared the analyses along with the paper
write-up.
Behind-the-meter Storage Systems”. This chapter was published in the journal
a co-author of this paper who provided valuable feedback on this work. I imple-
mented the models, performed the simulations and analyzed the numerical results
Microgrids”. This chapter has been submitted as a manuscript to the journal IEEE
• Appendices A, B, C and D are also included in the thesis to provide further infor-
wind farm levels and benchmark models in Chapter 2, the formulation of the train-
in Chapter 4.
Chapter 2
Wind Power Forecast Using Wavelet Neural Network Trained by Improved Clonal Selection Algorithm
2.1 Introduction
In recent years, wind power has been the fastest growing renewable electricity generation tech-
nology in the world [15, 16]. The worldwide wind capacity reached approximately 300 GW by
the end of June 2013, out of which 13.9 GW were added in the first six months of 2013 [17]. In
particular, Canada installed 377 MW during the first half of 2013, which is 50% more than in the
previous period of 2012 [17]. In the Province of Alberta, Canada, the installed capacity
reached 1087 MW in late 2012, and is expected to grow to 2388 MW by 2016 [18]. Despite the
environmental benefits of wind power [19], it has an intermittent nature [20], which could affect
One approach to deal with wind power intermittency in operation time scale is to forecast it
over an extended period of time. Accurate wind power forecasting can improve the economical
and technical integration of large capacities of wind energy into the existing electricity grid [23].
Wind forecasts are important for system operators to control balancing, operation and safety of the
grid [8]. On the other hand, wind power forecast errors might sometimes require system operators
to re-dispatch the system in real time. The costs of re-dispatch affect electricity prices and system
performance [24]. Moreover, reserve requirements are connected to wind forecast uncertainty
[25]. Hence, reducing the costs of re-dispatch and contribution of spinning reserves by more
accurate wind power prediction can effectively increase system operation efficiency. For instance,
Footnote 1: © 2015 Elsevier Ltd. Reprinted, with permission, from [12]: H. Chitsaz, N. Amjady, and H. Zareipour, "Wind power forecast using wavelet neural network trained by improved Clonal selection algorithm", Energy Conversion and Management, vol. 89, pp. 588-598, January 2015.
the economic benefits of accurate wind forecasting were assessed by GE Energy for the New York
State Energy Research and Development Authority (NYSERDA) and the NYISO (all terms are
defined in the list of symbols in this thesis). In that study, it was estimated that $125 million, or
36%, of the cost reduction is associated with state-of-the-art wind power forecasting, which is about
80% of the estimated cost reduction that could be achieved with a perfect wind power production
forecast [26].
Hence, various approaches have been proposed to improve wind power forecasting accuracy in
the literature. In one group of models, wind speed and other climate variables are predicted using
Numerical Weather Prediction (NWP) models, and those forecasts are used to predict the wind
power output of a wind turbine or a wind farm [27, 28] using turbine or farm production curves. In
another group of models, the NWP forecasts, or self-generated climate variables forecasts, are fed
into secondary time series models to predict wind power output for a turbine, a farm or system level
wind power production. The time series models may be built based on ensemble forecasting [29],
statistical approaches [30, 31], or artificial intelligence techniques [32, 33]. In a third group of
models, only past power production values are used in univariate models to predict future wind
power values [34, 35]. Despite improvements in wind power forecasting methods, wind power
forecasts still suffer from relatively high errors, ranging from 8% to 22% (in terms of normalized
mean squared error) depending on several factors, such as, forecasting horizon, type of forecasting
The contribution of the present paper is to propose a new forecasting technique for short-term
wind power forecasting. In particular, we develop a wind power forecasting engine based on
Wavelet Neural Network (WNN) with multi-dimensional Morlet wavelets as the activation func-
tions of the hidden neurons and maximum correntropy criterion as the error measure of the training
phase. We propose a stochastic search technique, which is an improved version of the Clonal selection
algorithm, for training the forecasting engine. The significance of the proposed forecasting technique
is that the combination of the WNN and the proposed training strategy is capable of capturing
highly non-linear patterns in the data and of delivering improved forecast accuracy. Particularly, high
exploitation capability of the proposed training strategy enables it to find more optimal solutions
The remaining sections of the paper are organized as follows. A brief literature review on wind
power forecasting models is provided in Section 2.2. The architecture of the wind forecasting
model is introduced in subsection 2.3.1. The proposed training strategy is then presented in sub-
section 2.3.2. The results of the proposed wind forecasting method, obtained for the real-world test
cases, are compared with the results of several other prediction approaches in section 2.4. Section
In this section, a literature review of the existing wind power forecasting models is provided. As
mentioned in section 2.1, the forecasting methods based on time series, either statistical models
or artificial intelligence models, use historical wind power data recorded at the wind farms along
with the historical data of the exogenous meteorological variables such as wind speed, temperature
and humidity, provided that the data is available. Auto-Regressive Moving Average (ARMA)
models [37], Auto-Regressive Integrated Moving Average (ARIMA) model [38], and Fractional
ARIMA (FARIMA) model [31], have already been applied to wind speed and wind power predic-
tion. Although time series models are simple forecasting methods and can be easily implemented,
most of them are linear predictors, while wind power is generally a non-linear function of its input
features.
Artificial intelligence techniques, especially artificial neural networks, have been used in sev-
eral papers to predict wind power generation [39]. Recurrent neural network [40], Radial Basis
Function (RBF) neural network [41], and Multi-Layer Perceptron (MLP) neural networks [35],
have been proposed for wind power forecasting. Although neural networks can model nonlinear
input/output mapping functions, a single neural network with traditional training mechanisms has
limited learning capability and may not be able to correctly learn the complex behavior of wind
signal. To remedy this problem, combinations of neural networks with each other and with fuzzy
inference systems such as Adaptive Neuro-Fuzzy Inference System (ANFIS) [42, 43], and Hy-
brid Iterative Forecast Method (combining MLP neural networks) [44], have also been suggested
for wind speed and power prediction. However, such models and especially fuzzy logic models,
involve high complexity and a long processing time in the case of many rules [45].
Another approach to tackle the complex behavior of wind power time series is using wavelet
transform. In [46], it has been discussed that wavelets can effectively be used for both stationary
and non-stationary time series analysis (footnote 2), and that is one of the reasons for the wide and diverse
applications of wavelets. Wind speed and power prediction approaches based on wavelet transform,
as a preprocessor to decompose wind speed/power time series, and ANFIS [47], Auto Regres-
sive Moving Average (ARMA) [48], Artificial Neural Network (ANN) [49], and Support Vector
Regression (SVR) [50], as forecast engines, have been presented. As for SVM-based models,
they depend heavily on appropriate tuning of parameters and involve a complex optimization
process [45]. Wavelets can also be applied in a more efficient structure called a wavelet neural network,
in which wavelet functions are used as the activation functions of the neurons in neural networks.
In [51], wavelet has been used in the form of WNN for wind speed prediction and it is trained
by extended Kalman filter. Since such a model consists of many scaled and shifted wavelets of
the utilized mother wavelet, it requires a powerful training algorithm to efficiently train the model
and not to be trapped in local optima while finding the best input/output mapping function of the
model.
In [52], a wind power prediction strategy including a Modified Hybrid Neural Network and
Enhanced Particle Swarm Optimization (EPSO) has been proposed. In this paper, a developed
Footnote 2: This footnote was added in response to a question raised by a committee member in the PhD oral examination.
A time series is stationary if its mean, variance and autocorrelation do not change over time, i.e., constant mean and
variance over time. Non-stationary data is often transformed to become stationary. This is to obtain meaningful sample
statistics such as means, variances, and correlations with other variables. Such statistics are useful as descriptors of
future behavior only if the series is stationary. For example, if the series is consistently increasing over time, the
sample mean and variance will grow with the size of the sample, and they will always underestimate the mean and
variance in future periods.
evolutionary algorithm, i.e., EPSO, is presented to empower the training phase of the utilized neural
network, which is generally a combination of three simple MLPs. Taking into consideration the
advantages of wavelet transform in the form of WNN and evolutionary algorithms as the training
algorithms, we propose a wind power prediction model, which is elaborated in the next section.
2.3 Methodology
In this section, we provide the details of the proposed forecasting engine and its training strategy.
Briefly, the proposed forecasting technique is composed of a WNN structure with Morlet wavelet
functions as activation functions in the hidden layer and a new training strategy. These components
Wavelet transform has been used in some recent research works for wind forecasting, as a pre-
processor to decompose wind speed/power time series to a set of sub-series [47–50]. The future
values of the sub-series are predicted by ANFIS [47], SVR [50], ARMA [48] and ANN [49] and
then combined by the inverse WT to form the forecast value of wind power/speed. Another ap-
proach to utilize wavelet in a forecast process is through constructing wavelet neural network in
which a wavelet function is used as the activation function of the hidden neurons of an ANN. For
instance, WNNs with Mexican hat and Morlet wavelets, shown in Fig. 2.1, as the activation functions
of the hidden neurons have been applied to another problem, i.e. price forecasting in electricity
markets, in [11, 49], respectively. Due to the local properties of wavelets and the ability of
adapting the wavelet shape according to the training data set instead of adapting the parameters of
the fixed shape activation function, WNNs offer higher generalization capability compared to the
classical feed forward ANNs [11]. Recently, a WNN using Mexican hat mother wavelet function
is proposed for wind speed forecast [51]. However, Morlet wavelet has vanishing mean oscillatory
behavior with more diverse oscillations with respect to Mexican hat wavelet, which can be seen
[Figure 2.1: two panels, "Mexican hat function" and "Morlet function"; horizontal axis X from -5 to 5, vertical axis ψ(X) from -1 to 1.]
from Fig. 2.1, and so it can better localize high frequency components in frequency domain and
various changes in time domain of severely non-smooth time series, e.g. wind power. In [53], it
is mentioned that Morlet leads to a better electricity price forecast compared with Mexican hat.
In this paper, we propose a WNN with multi-dimensional Morlet wavelet as the activation func-
tion of the hidden neurons for wind power forecasting. In the proposed method, we implement a
new training algorithm which can efficiently search for the global optimum solution, while a sim-
ple gradient method is used as the training algorithm of the WNN in [53]. In addition, since the
performance of these two activation functions has not been illustrated in [53], we compare these
functions in numerical results in order to demonstrate the effectiveness of Morlet wavelet over
Architecture of the WNN is shown in Fig. 2.2, which is a three-layer feed-forward structure. In
Fig. 2.2, X = [x1 , x2 , ..., xm ] is the input vector of the forecast process and y is the target variable.
The inputs x1 , x2 , ..., xm of the forecasting engine can be from the past values of the target variable
and past and forecast values of the related exogenous variables. For instance, past values of wind
power along with the past and forecast values of wind speed, wind direction, temperature and
humidity can be considered for wind power prediction, provided that their data is available [33].
A feature selection technique can be used to refine these candidate features and select the most
effective inputs for the forecast process. In this research work, we use the feature selection method
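[Diagram: input layer with inputs x1, x2, ..., xm passed through identity nodes I; hidden layer with wavelet nodes F1, F2, ..., Fn; output layer summing the hidden outputs weighted by w1, w2, ..., wn and the direct inputs weighted by v1, v2, ..., vm to produce y.]
Figure 2.2: Architecture of the proposed wind forecasting engine (WNN with Morlet wavelet)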
of [52]. This method is based on the information theoretic criterion of mutual information and
selects the most informative inputs for the forecast process by filtering out the irrelevant and re-
dundant candidate features through two stages. In the first stage, so-called irrelevancy filter, mutual
information between each candidate input, i.e. xi(t), and the target variable is computed. The candidate
input with a higher value of mutual information has more common information content with
the target variable. The candidate inputs with calculated mutual information value greater than a
relevancy threshold TH1 are considered as the relevant features of the forecast process, which are
retained for the next stage, while other candidate inputs with mutual information value lower than
TH1 are considered as irrelevant features, which are filtered out. In the second stage, so-called
redundancy filter, redundant features among the selected candidate inputs from the first stage are
found and filtered out. A higher value of mutual information between two selected candidates, e.g.,
xk(t) and xl(t), means more common information between these two candidates and thus, they
have a higher level of redundancy. Therefore, the redundancy of each selected feature xk(t) with
the other candidate inputs is measured. Afterwards, if the measured redundancy becomes greater
between this candidate and its rival, which has the maximum redundancy with xk(t), one with
lower relevancy should be filtered out [52]. The selected candidate features in the second stage
are considered as the inputs of the wind power forecast engine. Moreover, the values of
the thresholds TH1 and TH2 are tuned by cross-validation. Since this method is not
the focus of this paper, it is not further discussed here. The interested reader can refer to [52] for
details of this feature selection method. In addition, here, the target variable is the wind power of the
next time interval, for which the forecasting engine produces a prediction. Multi-period forecast,
e.g. prediction of wind power for the next forecast steps, is reached via recursion, i.e. by feeding
input variables with the forecaster’s outputs. For instance, forecasted wind power for the first hour
is used as y(t − 1) for wind power prediction of the second hour provided that y(t − 1) is among
The forecasting engine should construct the input/output mapping function of X ⇒ y. The
activation function of the input layer nodes is the identity function, i.e. I(x) = x. In other words,
the input layer only propagates the inputs of the WNN to the next layers. The activation function
of the hidden layer nodes of the WNN, i.e. multi-dimensional Morlet wavelet, is constructed as
follows:
$$ F_i(x_1, x_2, \ldots, x_m) = \prod_{j=1}^{m} \psi_{a_i,b_i}(x_j), \quad \forall i = 1, 2, \ldots, n \tag{2.1} $$
$$ \psi_{a_i,b_i}(x_j) = \psi\left(\frac{x_j - b_i}{a_i}\right) \tag{2.2} $$
where n indicates the number of hidden neurons of WNN, and one-dimensional Morlet wavelet
$$ \psi(x) = e^{-0.5x^2} \cos(5x) \tag{2.3} $$
In (2.1) and (2.2), ψai ,bi is the scaled and shifted version of ψ(.) with ai and bi as the scale and
shift parameters, respectively. Each activation function Fi (.) has its own ai and bi . Based on (2.1),
Fi (.) is m-dimensional wavelet function of x1 , x2 , ..., xm constructed by the tensor product of one-
dimensional Morlet wavelets ψai ,bi . Finally, the output of the WNN, denoted by y in Fig. 2.2, is
computed as follows:
$$ y = \sum_{i=1}^{n} w_i F_i(x_1, x_2, \ldots, x_m) + \sum_{j=1}^{m} v_j x_j \tag{2.4} $$
where wi is the weight between the ith hidden neuron and the output node, and vj is the direct input
weight between the jth input and the output node. In other words, the output of the WNN is obtained
combination of inputs, i.e. xj . Thus, the proposed WNN not only can benefit from the capabilities
of wavelet functions, such as their ability to capture cyclical behaviors, but also can capture trends
of the signal. Based on the above formulation, the vector of the free parameters of the WNN,
denoted by Z, is as follows:
$$ Z = [v_1, \ldots, v_m, w_1, \ldots, w_n, a_1, \ldots, a_n, b_1, \ldots, b_n] \tag{2.5} $$
Therefore, the WNN has NP = m + 3n free parameters, which should be determined by the
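The forward computation of (2.1) to (2.4) can be illustrated with a minimal NumPy sketch. The function names (morlet, wnn_forward) and the toy dimensions and values are assumptions made for illustration only; they are not taken from the thesis.

```python
import numpy as np

def morlet(x):
    """One-dimensional Morlet mother wavelet of (2.3)."""
    return np.exp(-0.5 * x**2) * np.cos(5.0 * x)

def wnn_forward(x, v, w, a, b):
    """Output of the WNN for one input vector x, following (2.1)-(2.4).

    x: inputs, shape (m,);  v: direct input weights, shape (m,)
    w: hidden-to-output weights, shape (n,)
    a, b: scale and shift of each hidden neuron, shape (n,)
    """
    # (2.2): scaled and shifted argument for every neuron/input pair, shape (n, m)
    u = (x[None, :] - b[:, None]) / a[:, None]
    # (2.1): tensor product of 1-D wavelets over the m inputs, shape (n,)
    F = np.prod(morlet(u), axis=1)
    # (2.4): wavelet part plus direct linear part
    return float(w @ F + v @ x)

# Toy example (hypothetical values): m = 3 inputs, n = 2 hidden neurons
x = np.array([0.2, -0.1, 0.4])
v = np.array([0.5, -0.3, 0.1])
w = np.array([1.0, -1.0])
a = np.array([1.0, 2.0])
b = np.array([0.0, 0.5])
y = wnn_forward(x, v, w, a, b)
num_free_params = v.size + w.size + a.size + b.size  # m + 3n
```

With all wavelet weights set to zero, the sketch reduces to the linear term of (2.4), which is one way to sanity-check an implementation.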
We propose a new training strategy to train the developed WNN. This strategy is based on improved
Clonal selection algorithm. In the following, at first, the Clonal Selection Algorithm (CSA) is
briefly introduced. Then, the proposed Improved CSA (ICSA) is presented and adapted as the
When an antigen (e.g., a virus) invades the human body, the biological immune system selects
the antibodies that can effectively recognize and destroy the antigen. The selection mechanism of
the immune system operates based on the affinity of the antibodies with respect to the invading
antigen. CSA is an efficient optimization method, inspired by the selection mechanism of the
biological immune system, proposed by De Castro and Von Zuben [54]. This method has successfully
been applied to optimization and pattern recognition domains [54,55] and also unit commitment of
power systems [56] in recent years. The operation of CSA can be summarized as the following
step by step procedure [54]:
Step 1: Randomly produce the initial population of CSA within the allowable ranges. Each indi-
vidual of the population, so called antibody in CSA, is a candidate solution for the optimization
problem including its decision variables, called genes in CSA [54]. Here, each antibody of the
population of CSA includes NP free parameters of WNN, shown in (2.5). The number of antibodies
in the population is denoted by N. The generation number g is set to zero (g = 0). In general,
$$ Z_k^g = [Z_{k,1}^g, Z_{k,2}^g, \ldots, Z_{k,NP}^g], \quad \forall k = 1, 2, \ldots, N \tag{2.6} $$
where the lth gene or decision variable Z_{k,l}^g (l = 1, ..., NP) can be from v1, ..., vm or w1, ..., wn or
a1, ..., an or b1, ..., bn as shown in (2.5). The four sets of vj, wi, ai and bi of each individual are
randomly initialized with uniform distribution in the intervals [-1, 1], [-1, 1], [0.5, 2] and [-3, 3],
respectively. It is noted that the interval of [-1, 1] is the most common range for weight and bias
initialization of neural networks [57]. According to Fig. 2.1, the output of the Morlet wavelet function
becomes very close to zero for input values larger than 3 and smaller than -3 and therefore, the
initial values for bi are set in the interval of [-3, 3]. Finally, with regard to the initialization of ai, the
interval of [0.5, 2] is chosen as it is not an extreme interval that would make the function too spread out or too dense.
Step 2: Determine the affinity of the antibodies with respect to the antigen. In the optimization
problems, usually there is no explicit antigen population to be recognized, but an objective function
responds to the evaluation of the objective function for the given antibody [56]. Here, the proposed
WNN is trained, i.e. its free parameters are optimized, by CSA. Thus, training error of the WNN
is considered as the objective function of CSA, which should be minimized. Since training error
of ANN-based forecasting engines is widely measured in terms of Mean Squared Error (MSE), it
Step 3: Sort antibodies based on their training error values in terms of MSE (i.e., the objective
function values) such that the best antibody with the lowest MSE ranks the first.
Step 4: Copy the antibodies based on their position in the sorted population:
$$ nc_k = \mathrm{Round}\left(\frac{\beta N}{k}\right), \quad \forall k = 1, 2, \ldots, N \tag{2.7} $$
where nc_k is the number of antibodies copied from the kth antibody; the Round(.) function rounds
its real argument up or down to the nearest integer value; β is a constant coefficient which indicates
the rate of copy. Thus, an antibody with lower MSE and higher rank (lower k) will be copied
more than those with higher MSE. At the end of this step, the number of copied antibodies will be
NC as follows:
$$ NC = \sum_{k=1}^{N} \mathrm{Round}\left(\frac{\beta N}{k}\right) \tag{2.8} $$
Performance of this step is graphically shown in Fig. 2.3. As seen, the first antibody with the
highest rank is copied nc1 times, the second one nc2 times and so on.
Step 5: Mutate the NC antibodies, produced in step 4, using the hypermutation operator [55].
Step 6: Determine MSE value for each mutated antibody. Among the NC mutated and N original
antibodies, select NS antibodies (NS < N) with the lowest MSE values. These NS antibodies
Step 7: Randomly generate N − NS new antibodies for the next generation. These randomly
generated antibodies enhance search diversity of CSA, and consequently, the algorithm takes the
Step 8: Increment the generation number (g → g + 1). If the termination criterion, such as
maximum number of generations, is satisfied, the algorithm is terminated and the best antibody of
the last generation, owning the lowest MSE, is determined as the final solution of CSA; otherwise,
go back to step 2 and repeat this cycle. The termination criterion used for the training of the WNN
In the above algorithm, N, NS, and β are user-defined settings of CSA. In this algorithm,
antibodies are evolved through the mutation, which is the key operator of CSA. The proposed
Improved CSA (ICSA) includes two enhancements for this operator as follows. De Castro and
Figure 2.3: Representation of step 4 (copy operator) of CSA
Von Zuben [54] discuss that the mutation rate should be inversely proportional to the antigenic
affinity: the higher the affinity, the smaller the mutation rate. Thus, in an optimization problem,
more optimal candidate solutions should be less mutated. This general idea is modeled in the
where MSE_k is the MSE of the kth antibody and MSE_min is the lowest MSE among the antibodies of the
current population; N_k^{Mut} represents the number of genes of the antibodies copied from the kth antibody,
produced in step 4, that should be mutated (N_k^{Mut} ≤ NP); the coefficient ρ controls the mutation
rate such that higher ρ leads to lower values of N_k^{Mut}. Thus, fewer decision variables are mutated
in antibodies with lower MSE (more optimal candidate solutions) and more decision variables are
mutated in antibodies with higher MSE (less optimal candidate solutions), and consequently, the
individuals of ICSA population are mutated in a coordinated manner. As a result, not only does
ICSA allow antibodies with lower MSE to be copied more than those with higher MSE, but also
controls the mutation process of copied antibodies in accordance with their MSE using (2.9). Note
that although the mutation operation is applied to the NC copied antibodies as shown in step 5,
only N values of N_k^{Mut} should be computed, since the antibodies copied from one individual have
After determining N_k^{Mut} for the antibodies of ICSA, the set of genes that should be mutated
are randomly selected among the decision variables of each antibody. The hypermutation operator
of step 5 adds a normal random variable with zero mean and constant variance to each decision
variable of the mutating antibody [39]. Here, a more effective mutation operation inspired from
$$ Z_{k,l}^{g+1} = \left[1 - e^{-\rho \frac{MSE_{min}}{MSE_k}}\right] Z_{k,l}^{g} + e^{-\rho \frac{MSE_{min}}{MSE_k}} \left(Z_{k_1,l}^{g} - Z_{k_2,l}^{g}\right), \quad 1 \le k \ne k_1 \ne k_2 \le NC, \; 1 \le l \le NP \tag{2.10} $$
where Z_{k,l}^{g} and Z_{k,l}^{g+1} represent gene l of antibody k in two successive generations g and g + 1. As
seen, to mutate the kth antibody, this mutation operator uses the values of decision variables in two
In [58], it has been discussed that DE mutation operator by computation of difference between
two randomly chosen individuals from the population (here, antibodies k1 and k2 ), determines
a function gradient in a given area (not in a single point), and therefore, prevents trapping the
solution in a local optimum of the objective function. Moreover, to enhance the search diversity of
the proposed mutation operation, it is separately applied to each gene of the mutating antibodies.
Additionally, the exp(.) term of (2.9) and its complement 1 − exp(.) are used in (2.10). If the kth
antibody Z_{k,l}^{g} is a good candidate solution, the exp(.) term and its complement become close to 0
and 1, respectively. Thus, the next generation decision variables mainly take their values from the
current generation decision variables and small gradient terms are added to them. Consequently,
ICSA can search promising areas of the solution space with small steps or high resolution, which is
known as exploitation capability. Based on this capability, a stochastic search technique can extract
optimal solutions of the solution space from its promising areas. High exploitation capability of
the proposed ICSA allows it to find more optimal solutions for the optimization problem of WNN
training. In other words, the WNN using ICSA can better learn the complex input/output mapping
function of the wind power forecast process and so predict its future values with higher accuracy.
However, if the kth antibody is a poor candidate solution, the exp(.) term and its complement become
close to 1 and 0, respectively, and so the next generation decision variables mainly obtain their
values from large gradient terms. Hence, ICSA can move out from the non-promising areas of the
solution space by large steps. Finally, note that due to the effect of the mutation operation of (2.10),
even the copied antibodies of the same antibody lead to different individuals after the mutation
operation. Thus, at the end of step 5 of the proposed ICSA, NC diverse candidate solutions are
added to N original antibodies (Fig. 2.3), which further enhance the search ability of ICSA.
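One generation of the procedure described above (ranking, copying by (2.7), mutation by (2.10), selection and random refill) can be sketched as follows. This is an illustrative sketch, not the thesis implementation: the function names are hypothetical, the sphere function stands in for the WNN training error, and the number of mutated genes is assumed to be Round(NP × exp(.)) with a minimum of one gene, because the exact form of (2.9) is not reproduced in the text above.

```python
import numpy as np

def copy_counts(n_pop, beta):
    """Clones per rank k = 1..N as in (2.7): nc_k = Round(beta * N / k)."""
    ranks = np.arange(1, n_pop + 1)
    return np.floor(beta * n_pop / ranks + 0.5).astype(int)

def icsa_generation(pop, objective, beta, rho, n_select, low, high, rng):
    """One generation (steps 2-7) on a population of shape (N, NP)."""
    n_pop, n_par = pop.shape
    # Steps 2-3: evaluate and sort so that the best antibody ranks first
    err = np.array([objective(z) for z in pop])
    order = np.argsort(err)
    pop, err = pop[order], err[order]
    # Step 4: clone each antibody according to its rank
    counts = copy_counts(n_pop, beta)
    clones = np.repeat(pop, counts, axis=0)
    parent_err = np.repeat(err, counts)
    n_clones = clones.shape[0]
    # exp(.) term shared by (2.9) and (2.10); small for good antibodies
    e = np.exp(-rho * err[0] / np.maximum(parent_err, 1e-12))
    # ASSUMPTION: (2.9) is taken as N_mut = Round(NP * exp(.)), at least 1
    n_mut = np.clip(np.rint(n_par * e).astype(int), 1, n_par)
    mutated = clones.copy()
    for k in range(n_clones):
        others = np.delete(np.arange(n_clones), k)
        # Step 5: mutate randomly chosen genes, drawing k1 and k2 per gene
        for l in rng.choice(n_par, size=n_mut[k], replace=False):
            k1, k2 = rng.choice(others, size=2, replace=False)
            # (2.10): keep part of the gene, add a differential term
            mutated[k, l] = (1 - e[k]) * clones[k, l] + e[k] * (clones[k1, l] - clones[k2, l])
    # Step 6: keep the n_select best among originals and mutated clones
    pool = np.vstack([pop, mutated])
    pool_err = np.concatenate([err, [objective(z) for z in mutated]])
    best = pool[np.argsort(pool_err)[:n_select]]
    # Step 7: refill with random antibodies to preserve diversity
    fresh = rng.uniform(low, high, size=(n_pop - n_select, n_par))
    return np.vstack([best, fresh])

# Toy run on a stand-in objective (hypothetical settings)
rng = np.random.default_rng(0)
sphere = lambda z: float(np.sum(z**2))
pop = rng.uniform(-1, 1, size=(6, 4))
best_before = min(sphere(z) for z in pop)
new_pop = icsa_generation(pop, sphere, beta=1.0, rho=2.0, n_select=4,
                          low=-1.0, high=1.0, rng=rng)
best_after = min(sphere(z) for z in new_pop)
```

Because the original antibodies compete with the mutated clones in step 6, the best objective value in this sketch cannot get worse from one generation to the next.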
The termination criterion is an important aspect of the proposed training strategy, as it can af-
fect the performance of the proposed forecasting engine. A low number of ICSA generations may
lead to insufficient training of the forecasting engine and cause the WNN to incorrectly learn the
input/output mapping function of the forecast process, i.e. X ⇒ y. On the other hand, a large
number of ICSA generations increases the computation burden and more importantly may lead to
over-fitting problem of the WNN. When over-fitting occurs in a neural network based forecast en-
gine, it memorizes the training samples instead of learning them, and thus, while the neural network
obtains very low training error, its generalization capability (i.e. its ability to reply to unseen fore-
cast samples) degrades. To avoid these problems, a termination criterion based on cross-validation
technique is used for the training phase of the proposed forecasting engine. In this technique, the
whole gathered historical data is divided into training and validation samples. The WNN is trained
by the proposed ICSA trying to minimize MSE of training samples in each generation. However,
at the end of each generation (step 8), the WNN is tested on the unseen validation samples. For
instance, validation samples can be some samples at the end of the historical data interval, i.e. the
closest historical data to the forecasting horizon. When MSE of the unseen validation samples, as
a measure of the WNN’s forecast error, begins to rise, the generalization capability of the WNN
begins to degrade, and therefore, the training phase should be terminated. Then, the best individual
of the WNN in the generation leading to the minimum validation error, which is expected to yield
the maximum generalization capability of the WNN, is selected as the final solution of the training
Although MSE has widely been used in forecasting models as a training error measure, its
applicability to train a neural network is optimal only if the probability distribution of the predic-
tion errors is Gaussian. However, wind power forecast error presents a non-Gaussian shape [59].
Minimizing the squared error is equivalent to minimizing the variance of the error distribution.
Accordingly, the higher moments (e.g., skewness, kurtosis, etc.) are not captured, but they contain
information that should be passed to the free parameters of the neural network instead of remain-
ing in the error distribution. Ricardo Bessa et al. in [59] proposed some new training error criteria
based on minimizing the information content of the error distribution (instead of minimizing the
where G is the Gaussian kernel, Ne is the number of training samples, ε_i is the error for the ith training
sample, and σ² is the variance. Therefore, MCC approximates a non-Gaussian shape by summation
of Gaussian functions corresponding to the errors of training samples. For more detailed information
relating to MCC, refer to [59]. It should be noted that MCC is a maximization problem, while the
proposed training algorithm in section 2.3.2, i.e. ICSA, was described as a minimization problem
since it was based on MSE. Hence, to adapt this criterion to the proposed training algorithm,
minimization of 1/MCC or −MCC can be considered instead of maximization of MCC. It is noted
that the value of correntropy is always positive [60]. Moreover, the values of MSE_k and MSE_min
are simply replaced by (1/MCC)_k and (1/MCC)_min (or (−MCC)_k and (−MCC)_min), respectively,
in equations (2.9) and (2.10). The main reason for including MSE in the structure of the proposed
algorithm is because of the popularity and common use of MSE criterion in training phase of
forecasting models. Moreover, the minimization of errors for training samples is often the objective
function of forecasting problems, and thus, it is more tangible to deal with a minimization problem
in this area.
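The conversion of the correntropy criterion into a minimization objective can be illustrated with a short sketch. It assumes the common sample form of correntropy, i.e. the average of a normalized Gaussian kernel evaluated at the training errors; the kernel normalization and the function names (correntropy, mcc_loss) are assumptions for illustration and should be checked against [59].

```python
import numpy as np

def correntropy(errors, sigma):
    """Sample correntropy: mean Gaussian kernel of the errors (assumed form of MCC)."""
    errors = np.asarray(errors, dtype=float)
    kernel = np.exp(-errors**2 / (2.0 * sigma**2)) / (np.sqrt(2.0 * np.pi) * sigma)
    return float(kernel.mean())

def mcc_loss(errors, sigma, form="inverse"):
    """Minimization surrogate for MCC: 1/MCC (default) or -MCC."""
    c = correntropy(errors, sigma)
    return 1.0 / c if form == "inverse" else -c

# Hypothetical training errors; the last one is an outlier
errors = np.array([0.02, -0.05, 0.01, 0.90])
loss_inverse = mcc_loss(errors, sigma=0.5)
loss_negative = mcc_loss(errors, sigma=0.5, form="negative")
mse = float(np.mean(errors**2))  # for comparison with the squared-error criterion
```

Either surrogate can take the place of the MSE values in the ranking and mutation steps, since both decrease as the correntropy increases. Very large errors combined with a small sigma can drive the kernel values towards zero, in which case the -MCC form avoids dividing by a vanishing number.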
In order to generate forecasts, two parameters must be decided, namely, forecast interval and fore-
cast horizon. Forecast interval determines the length of each time step into the future (e.g., 10
minutes or one hour), and forecast horizon determines how many time steps into the future are of
interest. Both factors depend on the application of the forecasts. For instance, if the forecasts are
used for very short-term adjustments in operation schedules, the forecast horizon could be as short
as a few hours. Furthermore, the forecast interval may also vary depending on the application. For
example, while a unit commitment algorithm may consider hourly intervals, an economic dispatch
algorithm may look into shorter intervals (e.g., 5 minutes). Note that in generating the forecasts,
selecting the forecast interval is sometimes limited by the availability of the data. For example, in
Alberta, the meteorological towers that measure weather factors at wind farm sites are set to collect
the data for every 10-minute interval. Thus, the forecast interval cannot be less than 10 minutes if
In this paper, we have chosen to generate hourly wind power production forecasts for Alberta's
power system for up to 6 hours into the future as our test case. Alberta's electricity market
is a real-time market with an hourly settlement interval. Although the system marginal price is
determined every minute, the supply and demand offers must be submitted to the system operator
for hourly intervals. The bids and offers may be changed up to two hours before the operation
hour, which is referred to as the T-2 window. The majority of slower generators do not strategically
adjust their prices and bid at $0/MWh. Faster units, on the other hand, actively watch the
supply-demand balance in the market and adjust their strategies. These units normally act based on
the developments in the market in the short-term, usually the next few hours. In particular, for any
given operation hour, forecasts of wind power generated before the start of the T-2 window, when
the market participants can still change their bids and offers, are of significant value. In addition,
given the real-time nature of the market, and considering the fact that only short lead-time units
behave strategically, the system operator is mainly concerned with the supply-demand balance for
the next hour or two. Thus, while forecasts for longer horizons are important, the ones for the
short forecast horizons are particularly valuable for the system operator and market participants in
Alberta.
The forecasts of meteorological variables, e.g., wind speed, through NWP models are known
to be useful in wind power forecasts for longer forecast horizons [41]. Thus, we choose not to
include such data into our model since we are focusing on short-term prediction, and generate
forecasts solely based on past power production values. More specifically, 100 hourly lagged
values of wind power are considered as the candidate inputs, which are processed by the feature
selection technique to select a minimum subset of the most informative features for the proposed
forecasting engine. Furthermore, the 60 days prior to each forecast day are considered as the historical
data, divided into 59 days as the training set and the one day before the forecast day as the validation set.
The second week of March, June, September, and December of year 2012 are considered as test
weeks. For each test week, the forecasts are updated in two different ways in this paper. In the first
part of the numerical results, we update the forecasts every 6 hours, and thus, 28 sets of forecasts
are generated for each of the representative weeks. In this part, the error measures are evaluated
based on the average of errors for the individual forecasts over the entire week. In the second part
of the results, we test the model by updating the forecasts every hour, i.e., for every test hour, there
will be six versions of forecasts produced at previous hours. For these forecasts, we evaluate the
error measures for each forecast horizon, as further discussed later in this section.
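The construction of the candidate inputs and the training/validation split described above can be sketched as follows. This is only an illustrative sketch and not the code used in this work: the function names `make_lagged_dataset` and `split_train_validation`, the use of NumPy, and the synthetic series are assumptions made for the example. The feature selection step itself is not shown.

```python
import numpy as np

def make_lagged_dataset(wp, n_lags=100):
    """Build candidate inputs from an hourly wind power series.

    Row i of X holds WP(t-1), ..., WP(t-n_lags) for the target y[i] = WP(t).
    """
    wp = np.asarray(wp, dtype=float)
    if wp.size <= n_lags:
        raise ValueError("series must be longer than n_lags")
    # windows[i] = wp[i : i + n_lags]; the last window has no target, so drop it
    windows = np.lib.stride_tricks.sliding_window_view(wp, n_lags)[:-1]
    X = windows[:, ::-1]   # reverse so that column 0 is lag 1
    y = wp[n_lags:]
    return X, y

def split_train_validation(X, y, hours_per_day=24, val_days=1):
    """Keep the last val_days of samples for validation (assumed hourly data)."""
    n_val = hours_per_day * val_days
    return (X[:-n_val], y[:-n_val]), (X[-n_val:], y[-n_val:])

# Synthetic example: 60 days of targets plus 100 earlier hours for the lags
rng = np.random.default_rng(0)
wp = rng.uniform(0.0, 900.0, size=60 * 24 + 100)
X, y = make_lagged_dataset(wp, n_lags=100)
(train_X, train_y), (val_X, val_y) = split_train_validation(X, y)
```

With 60 days of hourly targets, the split leaves 59 days (1416 samples) for training and the final day (24 samples) for validation.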
To show an example for selection of inputs using the feature selection technique, the selected
features for the third forecasting window of September test week, i.e. the third 6-hours of Septem-
20), WP(t − 21), WP(t − 22), WP(t − 23), WP(t − 24), WP(t − 25), WP(t − 26), WP(t − 28)]
These 20 features are selected from the 100 candidate inputs WP(t − 1), WP(t − 2), ..., WP(t − 100).
Two error criteria are used in this paper to evaluate forecast errors: normalized Root Mean
Square Error (nRMSE) and normalized Mean Absolute Error (nMAE) defined as follows:
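\[
\mathrm{nRMSE} = \sqrt{\frac{1}{NH}\sum_{t=1}^{NH}\left(\frac{WP_{ACT}(t) - WP_{FOR}(t)}{WP_{Cap}}\right)^{2}} \times 100 \tag{2.12}
\]
\[
\mathrm{nMAE} = \frac{1}{NH}\sum_{t=1}^{NH}\left|\frac{WP_{ACT}(t) - WP_{FOR}(t)}{WP_{Cap}}\right| \times 100 \tag{2.13}
\]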
where WPACT(t) and WPFOR(t) indicate the actual and forecast values of wind power for hour t.
Also, NH is the number of hours, which is 168 for the test weeks, and WPCap is the total wind
power capacity of the aggregated wind farms, which is 861, 941, 941 and 1087 MW in March, June,
September and December of year 2012, respectively, due to the growth of wind power capacity in
Alberta.
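As a minimal sketch of how the two error criteria in (2.12) and (2.13) could be computed, the following functions take the actual values, the forecasts and the installed capacity. The function names `nrmse` and `nmae` and the use of NumPy are assumptions for illustration only.

```python
import numpy as np

def nrmse(actual, forecast, capacity):
    """Normalized RMSE in percent, following Eq. (2.12)."""
    err = (np.asarray(actual, dtype=float) - np.asarray(forecast, dtype=float)) / capacity
    return float(np.sqrt(np.mean(err ** 2)) * 100.0)

def nmae(actual, forecast, capacity):
    """Normalized MAE in percent, following Eq. (2.13)."""
    err = (np.asarray(actual, dtype=float) - np.asarray(forecast, dtype=float)) / capacity
    return float(np.mean(np.abs(err)) * 100.0)

# Hypothetical three-hour example with the June 2012 capacity of 941 MW
actual = [400.0, 500.0, 600.0]
forecast = [430.0, 470.0, 600.0]
print(nrmse(actual, forecast, 941.0), nmae(actual, forecast, 941.0))
```

Both measures are normalized by the installed capacity rather than by the actual production, so they remain well defined in hours with zero wind power.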
For these forecasts, at the end of each 6-hour window, when the wind power values of 6 hours
become available, the historical data is updated to perform the wind power prediction of the next 6
hours. Thus, each forecast horizon includes 6 forecast steps. However, one week or 168 hours are
considered as the evaluation period for the error criteria of nRMSE and nMAE to better evaluate
The results obtained from the proposed forecasting method, i.e. WNN with Morlet wavelet
function, ICSA training algorithm and MCC training criterion, in comparison with the same model
but consisting of MSE training criterion instead of MCC are shown in Table 2.1. We have also
generated the forecasts based on some other popular models, i.e., the persistence method, and RBF
and MLP neural networks. A brief description of these models is presented in Appendix B. For the sake of a
Table 2.1: nRMSE (%) and nMAE (%) results for the four test weeks of year 2012

Model             Error    Mar.    Jun.    Sep.    Dec.    Average
Persistence       nRMSE    13.71   15.14   18.44   12.49   14.95
                  nMAE     10.08   10.79   13.11    8.84   10.71
RBF               nRMSE    18.32   14.57   18.62   14.11   16.40
                  nMAE     13.32   10.45   13.77   10.24   11.95
MLP               nRMSE    15.36   15.62   19.80   12.32   15.78
                  nMAE     12.42   11.56   14.54    9.02   11.89
WNN with MSE      nRMSE    12.38   14.99   17.66   11.65   14.17
                  nMAE      9.36   10.64   12.49    8.53   10.26
Proposed Method   nRMSE    12.23   12.48   16.68   11.58   13.24
                  nMAE      9.22    9.64   11.73    8.22    9.70

fair comparison, all of these methods have the same historical data except the persistence method
that does not require training samples. Observe from Table 2.1 that WNN with MSE as the training
measure outperforms the three other forecast methods. For instance, the average nRMSE and
Moreover, as seen from the results of Table 2.1, considering MCC as the training error measure
can significantly improve the forecasting accuracy of wind power prediction. For instance, the
average nRMSE and average nMAE for the proposed method are respectively 6.6% and 5.5%
lower than those for WNN with MSE, and 11.4% and 9.4% lower than those for the persistence
method, which clearly show the advantage of using MCC error measure in training phase of a wind
power forecasting model. Note that while persistence forecasts are a useful benchmark, especially
for one-step-ahead forecasts, they do not provide any information on variations and ramps in a
multi-step-ahead forecasting practice. Thus, despite their average errors being relatively close to
those of the proposed method, they do not contain ramping information and are thus less useful.
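The relative improvements quoted above follow directly from the average errors in Table 2.1. The short sketch below reproduces this arithmetic; the helper name `relative_improvement` is an assumption for illustration, and the numbers are the Table 2.1 averages.

```python
def relative_improvement(baseline_error, proposed_error):
    """Percentage reduction of the proposed error relative to a baseline error."""
    return (baseline_error - proposed_error) / baseline_error * 100.0

# Average (nRMSE, nMAE) values taken from Table 2.1
averages = {
    "Persistence": (14.95, 10.71),
    "WNN with MSE": (14.17, 10.26),
    "Proposed Method": (13.24, 9.70),
}

proposed_nrmse, proposed_nmae = averages["Proposed Method"]
for name in ("WNN with MSE", "Persistence"):
    base_nrmse, base_nmae = averages[name]
    print(name,
          round(relative_improvement(base_nrmse, proposed_nrmse), 1),
          round(relative_improvement(base_nmae, proposed_nmae), 1))
```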
Fig. 2.4 graphically shows the forecast results of different forecasting models for June 10, 2012,
in which there is a sharp downward ramp, and June 12, 2012, in which there is an upward ramp.
Observe from this figure that the proposed method, shown in red color, can satisfactorily follow the
trend and ramps of the measured wind power, shown in dotted black color, for both days. Neural
[Figure 2.4 panels: June 10, 2012 and June 12, 2012. x-axis: Hour; y-axis: Wind power production (MW). Series: Proposed, MLP, Persistence, RBF, Measured.]
Figure 2.4: Forecast results of different models for two different days
network based models, e.g., MLP and RBF shown in marked blue color, can also follow ramps of
wind power to some extent, although they sometimes miss the correct direction. However, the
persistence model, shown in green, cannot provide any ramp information or good forecast values,
especially when there is a ramp in the time series; hence, this model can only be useful for very
To illustrate the effectiveness of Morlet wavelet function as the activation function of WNN,
the proposed method, i.e., WNN with Morlet wavelet function, ICSA training algorithm and MCC
training criterion, is compared with the same model but with the Mexican hat wavelet function (i.e.,
WNN with Mexican hat wavelet function, ICSA training algorithm and MCC training criterion)
in Table 2.2. Improved performance of the proposed method can be seen from Table 2.2 such that
wind power forecast accuracy of the proposed method is better than the WNN with Mexican hat
In the next numerical experiment, the effectiveness of the proposed training strategy is eval-
uated. For this purpose, the proposed ICSA is replaced with several other well-known stochastic
Table 2.2: Comparison of the proposed method and a WNN with Mexican hat wavelet

Model                  Error    Mar.    Jun.    Sep.    Dec.    Average
WNN with Mexican hat   nRMSE    13.77   13.23   17.12   12.22   14.08
                       nMAE     10.13    9.80   12.54    8.71   10.30
Proposed Method        nRMSE    12.23   12.48   16.68   11.58   13.24
                       nMAE      9.22    9.64   11.73    8.22    9.70

search techniques including Simulated Annealing (SA), Particle Swarm Optimization (PSO), Dif-
ferential Evolution (DE), and CSA, while the other parts of the suggested wind power forecasting
engine, i.e. WNN with Morlet wavelet function and MCC training criterion, are kept unchanged.
As seen from Table 2.3, the proposed ICSA leads to the lowest wind forecasting errors in all test
weeks among all stochastic search techniques. Compared with SA, PSO, DE and CSA, it improves
the average nRMSE by 20.7%, 25.1%, 20.8% and 9.8%, and the average nMAE by 22.4%, 25.0%,
22.5% and 9.4%, respectively, demonstrating the effectiveness of the proposed training
strategy.
Table 2.3: Comparison of the proposed training strategy, i.e. ICSA, with SA, PSO, DE and CSA

Algorithm                Error    Mar.    Jun.    Sep.    Dec.    Average
SA                       nRMSE    19.95   16.23   16.96   13.70   16.71
                         nMAE     14.73   11.57   13.53   10.18   12.50
PSO                      nRMSE    18.33   17.69   20.70   14.01   17.68
                         nMAE     14.18   13.38   14.37    9.88   12.95
DE                       nRMSE    17.51   17.43   17.94   13.97   16.72
                         nMAE     13.39   13.34   13.08   10.25   12.52
CSA                      nRMSE    13.79   13.47   18.29   13.22   14.69
                         nMAE     10.49   10.29   12.80    9.21   10.70
Proposed Method (ICSA)   nRMSE    12.23   12.48   16.68   11.58   13.24
                         nMAE      9.22    9.64   11.73    8.22    9.70

Finally, we applied the proposed method to predict wind power for the year 2012 so as to have
a comprehensive evaluation of its performance. Hence, 10 months from March to December 2012
have been considered in the last numerical experiment and monthly errors are reported in Table
2.4. Note that the data for the first two months are used as training samples for the prediction
of March, and therefore, no prediction result can be reported for January and February.
Table 2.4: Wind power prediction error results of the proposed method for 10 months

Test month   nRMSE (%)   nMAE (%)
Mar.         11.89       8.32
Apr.         11.98       8.46
May          12.32       9.26
Jun.         13.69       9.74
Jul.         10.71       7.29
Aug.         12.08       8.05
Sep.         13.26       8.78
Oct.         11.35       7.78
Nov.         12.21       8.64
Dec.         11.52       7.80
Average      12.10       8.41

According to this table, monthly errors for months March and December are very close to those
earlier presented for the associated test weeks in these months. For instance, monthly nRMSE
and nMAE are respectively 11.52% and 7.80% for the month December, and weekly nRMSE
and nMAE are 11.58% and 8.22%, respectively, for the test week of December. Errors related
to September test month shown in Table 2.4 are considerably lower than those for the test week
of September presented in previous tables. The reason is that there are more severe ramps in the
second week of September selected as the test week compared with other weeks in this month, and
therefore, the average error of the month is lower than the error associated with the second week
of this month. On the contrary, forecasting errors related to test month of June presented in Table
2.4 are higher than those earlier presented for the test week of June. Here, wind power data for
the second week of June has been more predictable than other weeks in this month. Moreover,
the average errors presented in Table 2.4, i.e., 12.10% and 8.41% in terms of nRMSE and nMAE,
respectively, are close to, and even lower than, those presented for the four test weeks, i.e., 13.24% and
9.70%. This supports the comparative results presented in Tables 2.1, 2.2 and 2.3 between
the proposed method and other methods based on the test weeks’ consideration. Furthermore,
considering forecasting errors presented in Tables 2.1, 2.2 and 2.3, no matter which test week is
considered (e.g., with high or low predictability), the proposed method demonstrated its higher
wind power forecast accuracy compared with other benchmark models.
Prediction errors can also be calculated for each look-ahead forecast distinctly (i.e., each forecast
hour). In some of the mainly academic literature, the average of prediction errors for 1-hour ahead
up to the last hour of the forecast horizon is calculated. For instance, in 6-hour ahead prediction,
each forecasting window includes 1-hour to 6-hour ahead forecasts, and the error is the average
of the prediction errors of these 6 forecast values, as performed in subsection 2.4.1. However,
in most commercial forecasting tools, the forecasts are updated after each forecast interval, and
forecast accuracy is evaluated for each look-ahead interval individually. Specifically, the values
of wind power for the first 6 hours, i.e., WP(t + 1), ..., WP(t + 6), are predicted at time t. When
the observed value of wind power for hour t + 1 becomes available, the data is updated and
forecasts for the next 6 hours, i.e., WP(t + 2), ..., WP(t + 7), are generated at time t + 1.
In other words, the data is updated every hour, while the forecast horizon remains 6 hours ahead.
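The hourly update scheme and the per-horizon error evaluation can be sketched as follows. This is an illustrative sketch only: the function names are assumptions, and the persistence rule (repeating the last observed value) is used as a stand-in forecaster in place of the proposed forecasting engine.

```python
import numpy as np

def persistence_forecast(history, horizon):
    """Stand-in model: repeat the last observed value for every look-ahead step."""
    return np.full(horizon, history[-1], dtype=float)

def per_horizon_nrmse(series, capacity, start, horizon=6, model=persistence_forecast):
    """nRMSE (%) for each look-ahead step, with forecasts re-issued every hour.

    At forecast origin t, the observations series[:t] are known and the model
    predicts series[t], ..., series[t + horizon - 1].
    """
    series = np.asarray(series, dtype=float)
    sq_err = [[] for _ in range(horizon)]
    for t in range(start, len(series) - horizon + 1):
        forecast = model(series[:t], horizon)
        actual = series[t:t + horizon]
        for h in range(horizon):
            sq_err[h].append(((actual[h] - forecast[h]) / capacity) ** 2)
    return [float(np.sqrt(np.mean(e)) * 100.0) for e in sq_err]

# Synthetic one-week series (168 hours) after a 24-hour warm-up
rng = np.random.default_rng(1)
wp = np.clip(500.0 + np.cumsum(rng.normal(0.0, 30.0, size=192)), 0.0, 941.0)
print(per_horizon_nrmse(wp, capacity=941.0, start=24))
```

Errors are grouped by look-ahead step, so every test hour contributes to each of the six horizon-specific error measures.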
In the following numerical experiment, the proposed method is applied for aggregated wind power
forecast of Alberta in 10 test weeks, including the second weeks of March to December 2012,
such that the prediction accuracy of different look-ahead forecasts within the 6-hour window is
evaluated. Fig. 2.5 demonstrates the results of this numerical experiment including the curves
of average nRMSE for the six look-ahead forecasts, obtained from the proposed approach. We
also present the same measures of accuracy for the third-party forecasts currently employed by
the Alberta Electric System Operator [18]. Observe from this figure that forecast accuracy for
both models decreases as the look-ahead horizon increases due to higher cumulative errors. For
instance, the average forecast error of the proposed method for 1-hour-ahead forecast, i.e. forecast
values generated 1 hour before the real time, is 5.04%, while for 6-hour-ahead, i.e. forecast values
Compared to the third-party forecasts, the proposed method results in better or comparable
forecast accuracy up to 3 hours ahead. For instance, the proposed model improves the accuracy
[Figure 2.5 plot: nRMSE (%) versus look-ahead forecast (hour, 1 to 6). Series: Third-party Forecasts, Proposed Method.]
Figure 2.5: Average nRMSE errors of two forecasting models in different look ahead forecasts
for 1-hour-ahead forecast by 30.29%. Note that at any settlement interval, the next three hours are
particularly important to the system operator in Alberta. This is because the real-time nature of
the market requires that the next hour or two be monitored properly to ensure system security
and supply-demand balance. Strategic market participants also need to watch the next three hours
because they are just outside the T-2 window. For the three longer look-ahead windows, the
third-party forecasts outperform the ones generated by our model. This is mainly because the third-party
forecasts include NWP data. Thus, a combination of the forecasts generated by our model for the
first three hours and the third-party forecasts for the last three hours would provide a more accurate
2.5 Conclusion
In this paper, a new wind power prediction strategy is proposed. A WNN with multi-dimensional
Morlet wavelet as the activation function of the hidden neurons and MCC as the training crite-
rion is applied as the forecasting engine to implement the input/output mapping function of wind
prediction process. A new stochastic search technique, named ICSA, which is an improved
version of the clonal selection algorithm, is proposed and adapted as the training procedure to optimize
the free parameters of the forecasting engine. Effectiveness of the whole proposed wind power
forecasting strategy as well as the effectiveness of its main components including the suggested
WNN and training procedure is extensively evaluated by real-world data for wind power predic-
tion. As regards the training algorithm, the proposed ICSA outperforms the other stochastic search
algorithms including SA, PSO, DE and CSA in terms of both nRMSE and nMAE, illustrating the
effectiveness of the proposed training strategy. Moreover, the suggested Morlet wavelet function
results in more accurate wind power forecasts than the Mexican hat wavelet function, and the MCC
training criterion leads to lower wind power prediction errors than the traditional training error measure
of MSE.
Chapter 3
Microgrids 1
3.1 Introduction
Micro-grids are integrated energy systems composed of distributed energy resources and multiple
electrical loads operating as an autonomous grid, which can be either in parallel to or islanded from
the existing power grid. A micro-grid can be considered as a small-scale version of the traditional
power grid whose small scale results in far fewer line losses and lower demand on transmission
infrastructure. All of these advantages are consequently motivating an increased demand for
Considering the fast and worldwide development of micro-grids, their optimal operation re-
quires advanced tools and techniques. In particular, Short-Term Load Forecast (STLF) is an in-
dispensable task for the operation of a micro-grid. In conventional power systems, STLF is an
important tool for reliable and economic operation of power systems, as many operating decisions,
such as dispatch scheduling of generating capacity, demand side management, security assessment
and maintenance scheduling of generators, are based on load forecast [9, 62–66]. Load forecasts
also have significant roles in energy transactions, market shares and profits in competitive elec-
tricity markets [66, 67]. Different prediction strategies have already been presented for the STLF
of traditional power systems over the years. These methodologies are generally divided into two
main groups: classical statistical techniques and computational intelligence techniques. Reviews on
1 © 2015 Elsevier Ltd. Reprinted, with permission, from [13]: H. Chitsaz, H. Shaker, H. Zareipour, D. Wood,
and N. Amjady, “Short-term Electricity Load Forecasting of Buildings in Microgrids”, Energy and Buildings,
vol. 99, pp. 50-60, July 2015.
some of these strategies can be found in [9, 62, 64–67].
In a similar way, STLF is a key factor in operation of micro-grids such as energy management
for optimal utilization of available resources in order to minimize the operation cost or any environ-
mental impact of a micro-grid [5]. Moreover, STLF for a micro-grid can be used for profitable trade
of electric energy within the grid. In other words, it is important for the operator of a micro-grid
to determine the amount of exchanged power with a wholesale energy market so as to maximize
the total benefit [68]. It has also been discussed that the forecasted loads as well as forecasted
generation of renewable resources are the main inputs for optimal energy management [7, 69] and
However, modeling and forecasting of micro-grids’ loads can be more complex tasks than
those usually applied for conventional power systems, as the load time series of micro-grids is
more volatile in comparison with the load of power systems, as demonstrated later in the present
paper. Since a micro-grid is considerably smaller than a traditional power system,
the load of a micro-grid includes more fluctuations. In other words, the inertia in small-scale
systems is low and therefore, the smoothness of load time series in such systems degrades. Using a
criterion to measure the volatility of a time series, it will be shown in this paper that the volatility of
load time series for a micro-grid is considerably higher than that for a conventional power system.
As a result, there is a need to adapt a suitable STLF model to the volatile behavior of micro-grid load
time series. Despite the importance of STLF for micro-grids, only a few works have been presented in
this area. The authors in [70] present an on-line learning model based on Multiple Classifier Systems
(MCSs) for short-term load forecasting of micro-grids, and the model was tested on real data of a
micro-grid. A bi-level prediction strategy is proposed in [6] for STLF in micro-grids. This strategy
is composed of a forecaster including neural network and evolutionary algorithm in the lower level
and an enhanced differential evolution algorithm in the upper level for optimizing the performance
of the forecaster. The proposed model in [6] is designed having the aggregated micro-grid load in
mind. However, the present paper focuses on forecasting the individual loads within a
micro-grid, with potentially significantly higher volatility compared to the aggregated micro-grid
load. Forecasting individual micro-grid load components is important for operation scheduling and
Some research works have also been presented regarding electricity load prediction for resi-
dential areas and buildings [72–74]. The proper consumption of electricity in buildings leads to
lower operational costs. If the facility manager could predict the electricity demand of the build-
ing, actions could consequently be taken to reduce the amount of energy and therefore, reduce the
operational cost of the building [74]. A few works have been published very recently in the area of
energy prediction of buildings. For instance, long-term energy consumption of a residential area in
South West China has been studied in [75]. In this reference, an Artificial Neural Network (ANN)
model is compared with some other prediction models, including Grey model, regression model,
polynomial model and polynomial regression model, to forecast the total energy consumption of
the residential area, and it is shown that ANN model outperforms the other models. Having access
to detailed data of a six-story multi-family residential building located on the Columbia University
campus in New York City, the authors in [76] were able to conduct a comparative spatial analysis
to forecast the energy consumptions of units, floors and the whole building for different temporal
intervals (e.g., 10-min, hourly and daily). The results indicate that the most effective models are
built with hourly consumption at the floor level providing that high resolution and granular data
is available via advanced smart metering devices. In [77], a Case-Based Reasoning (CBR) model,
demand in an office building located in Varennes, Quebec, Canada. Three forecasting horizons
of 3-hour, 6-hour and 24-hour ahead have been simulated with hourly prediction resolution, and
the results demonstrate that the prediction capability of the model is improved when the hori-
zon is reduced to 3-hour ahead. Authors in [78] have proposed a new methodology for electrical
consumption forecasting based on end-use decomposition and similar days. Total consumption
forecast is also obtained from end-use consumptions and the data of selected days. In [79], a
building-level neural network-based ensemble model is presented for day-ahead electricity load
forecasting, and it is shown that the presented model outperforms SARIMA (Seasonal Autoregressive
Integrated Moving Average) by up to 50%. However, the comparisons are made only with the SARIMA
model, a linear statistical model that may not be capable of capturing high nonlinearity
To summarize the main points, micro-grids can bring considerable benefits to power systems,
such as supplying loads in remote areas, reducing total system expansion planning cost, reducing
carbon emission through coordinated utilization of Renewable Energy Sources (RESs), provid-
ing cheaper electricity through proper energy management of available resources and energy trade
with the main grid, and improving system reliability and resiliency by providing dispatchable power
for use during peak power conditions or emergency situations. Moreover, it was discussed that a
short-term load forecasting tool is of high importance in optimal energy management and secure
operation of micro-grids. In this way, some research works have been conducted to develop load
forecasting models with higher accuracy. However, as discussed above, a few works have focused
ment of forecast accuracy is still needed in this area. In the present paper, a forecast method is
proposed for the STLF of micro-grids with the focus of electricity load prediction for individual
buildings. The main contribution of this paper is applying a Self-Recurrent Wavelet Neural Net-
work (SRWNN) forecasting engine for electricity load prediction of micro-grids. Moreover, the
Levenberg-Marquardt (LM) learning algorithm is implemented to train the SRWNN. The proposed
method improves the forecast accuracy for highly volatile and non-smooth time series of micro-
grid electricity load. The higher the forecast accuracy of electricity load, the more efficient energy
The remaining parts of the paper are organized as follows. Section 3.2 provides a data analysis
on different electricity load time series to draw a distinction between the load of a micro-grid and a
power system. The proposed forecasting method consists of the SRWNN as the forecasting engine
Table 3.1: Comparison of electricity load time series in terms of volatility

Volatility index        BCIT    British Columbia’s    California’s
                                System Load           System Load
Daily volatility (%)    8.34    2.66                  3.18
Weekly volatility (%)   7.09    2.28                  3.15

and LM as the training algorithm, and is presented in Section 3.3. The proposed load forecasting
method is tested on real-world test cases and the results are compared with the results of some
other prediction approaches in Section 3.4. Finally, Section 3.5 concludes the paper.
A data analysis is presented in this section so as to compare the characteristics of a micro-grid load
time series and electricity load in power systems. The British Columbia Institute of Technology
(BCIT) in Vancouver, the Province of British Columbia (BC), Canada, is considered as the micro-
grid test case studied in this paper. BCIT’s Burnaby campus is Canada’s first Smart Power Micro-
grid comprised of power plants (including renewable resources of wind and photovoltaic modules),
campus loads, command and control (including substation automation, micro-grid control center
and distributed energy management), and communication network [80]. The load data used in this
work is from one building with a peak value of 694 kW from March 2012 to March 2013, within
the BCIT micro-grid. Hereafter, we refer to this load as BCIT. To draw a comparison between the
characteristics of a micro-grid load and power system load level, the load time series of two power
systems, i.e., British Columbia where BCIT micro-grid is located, and California, are analyzed.
Electricity load follows daily and weekly periodicities. In this way, we consider two measures
for volatility analysis, i.e., daily volatility and weekly volatility. These measures are based on the
standard deviation of logarithmic returns over a time window. In general, daily volatility quantifies
the overall change in hourly electricity load from one day to another, and weekly volatility mea-
sures the load changes in subsequent weeks. For more details regarding aforementioned volatility
Figure 3.1: One-year hourly load data of BC’s power system and the building in BCIT. (a) BC’s power system electricity load; (b) Building electricity load in BCIT.
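A minimal sketch of a volatility index based on the standard deviation of logarithmic returns is given below. The exact definition used in this work is not reproduced here; the sketch assumes hourly data, a lag of 24 hours for daily volatility and 168 hours for weekly volatility, and a standard deviation taken over the whole sample. The function name `log_return_volatility` is hypothetical.

```python
import numpy as np

def log_return_volatility(load, lag):
    """Standard deviation (in %) of the log returns ln(L(t) / L(t - lag)).

    Assumed convention: lag = 24 for daily and lag = 168 for weekly volatility
    of an hourly load series; the whole sample is used as the time window.
    """
    load = np.asarray(load, dtype=float)
    if np.any(load <= 0.0):
        raise ValueError("load values must be positive to take log returns")
    returns = np.log(load[lag:] / load[:-lag])
    return float(np.std(returns) * 100.0)

# Synthetic hourly load: a daily pattern with multiplicative noise, four weeks long
rng = np.random.default_rng(2)
hours = np.arange(24 * 28)
load = (400.0 + 150.0 * np.sin(2.0 * np.pi * hours / 24.0)) * rng.lognormal(0.0, 0.05, hours.size)
print(log_return_volatility(load, 24), log_return_volatility(load, 168))
```

Under this convention, a load series that repeats exactly from one day to the next has zero daily volatility, so larger values indicate weaker daily periodicity.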
One year hourly load data has been considered for British Columbia’s and California’s power
systems for the same period, i.e., from March 2012 to March 2013. Observe from Table 3.1 that
both daily and weekly volatility indices for the micro-grid are considerably higher than those for
the power systems, which demonstrates the low smoothness of micro-grid load time series. For instance,
daily volatility related to the micro-grid is 8.34%, while it is respectively 2.66% and 3.18% for
British Columbia’s and California’s power systems. It means that electricity load of the micro-grid
fluctuates more severely from one day to another compared with that of power systems. Like-
wise, weekly fluctuations are more severe in the micro-grid than those in a power system. As a
result, daily and weekly periodicities of electricity load in a micro-grid are noticeably low, and
Fig. 3.1 illustrates one-year hourly load data of British Columbia’s power system and that of
the building in BCIT. It is noted that the data is normalized to the maximum value. As seen, the
aggregated electricity load in a large area (e.g., the province of British Columbia) is noticeably
different from the aggregated load in a building. For instance, British Columbia’s load follows a
common seasonal pattern, as the load decreases in the spring in April and starts to increase in the
fall in October. The building’s load follows a fairly similar seasonal pattern. From the beginning
of the academic year in September, electricity load starts increasing, and it starts decreasing in
[Figure 3.2: (a) 1-hour ramp distribution; (b) 2-hour ramp distribution. x-axis: % of the peak load; y-axis: Number of occurrences. Series: BC power system, Building in BCIT.]
February. Moreover, it is seen that fluctuations of load are more severe for a building compared
with those for a power system. These variations in load time series of the micro-grid graphically
To have a better understanding of such severe load fluctuations for a building, hourly changes
of load, i.e., the difference between the two observations at subsequent hours so-called 1-hour
ramps, can be taken into consideration. Fig. 3.2 (a) shows the distribution (with equally spaced
bins of 1% of the peak load) of 1-hour ramps for both the building in BCIT and BC’s power system
loads. Note that the negative values show downward ramps. As seen, hourly upward and downward
ramps with an amplitude of more than 5% of the peak load have occurred more frequently in the
building in BCIT. The most severe ramp in BC’s power system load is a ramp up
with an amplitude of almost 9% of the peak load, while it is a ramp up of more than 20% of the
peak load for the building. Similarly, Fig. 3.2 (b) illustrates the distribution for 2-hour ramps, i.e.,
load variations in two-hour duration. As the longer time is considered for ramps, the larger ramps
Table 3.2: Ramp events in electricity load time series

                  Interval                1-hour                  2-hour
                  (% of the peak load)    Building   BC power     Building   BC power
                                          in BCIT    system       in BCIT    system
Ramp Up (RU)      5% ≤ RU < 10%           556        504          818        971
                  10% ≤ RU < 15%          84         0            362        352
                  15% ≤ RU < 20%          12         0            121        55
                  RU > 20%                4          0            21         0
Ramp Down (RD)    5% ≤ RD < 10%           520        259          1003       1272
                  10% ≤ RD < 15%          48         0            283        141
                  15% ≤ RD < 20%          9          0            57         0
                  RD > 20%                0          0            11         0

will be detected. Obviously, sharp upward and downward ramps have more frequently happened
To provide more detailed statistics of ramps, Table 3.2 shows the number of upward and downward
ramps for 1-hour and 2-hour durations. For instance, there have been 100 1-hour upward ramps of more
than 10% of the peak load in the BCIT building load, while no 1-hour ramp up of more than 10% of
the peak load has occurred in the BC system load. With regard to 2-hour ramps,
55 ramp ups of more than 15% of the peak load occurred in BC, compared with
142 for the BCIT load. This table also demonstrates that the number of downward ramps is
smaller than the number of upward ramps when large ramps are concerned.
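Ramp statistics of the kind reported in Table 3.2 can be obtained with a simple counting procedure, sketched below. The function name `count_ramps`, the normalization by the maximum of the series as the peak load, the use of every overlapping window, and the treatment of a ramp of exactly 20% as part of the last bin are assumptions made for this illustration.

```python
import numpy as np

def count_ramps(load, duration=1, edges=(5.0, 10.0, 15.0, 20.0)):
    """Count upward and downward ramps per amplitude bin (% of the peak load).

    A ramp is the load change over `duration` hours, evaluated on every
    overlapping window. Bins are [5, 10), [10, 15), [15, 20) and [20, inf).
    """
    load = np.asarray(load, dtype=float)
    ramps = (load[duration:] - load[:-duration]) / load.max() * 100.0
    edges = tuple(edges)
    bins = list(zip(edges, edges[1:] + (np.inf,)))
    up, down = {}, {}
    for lo, hi in bins:
        up[(lo, hi)] = int(np.sum((ramps >= lo) & (ramps < hi)))
        down[(lo, hi)] = int(np.sum((-ramps >= lo) & (-ramps < hi)))
    return up, down

# Small hypothetical series with a peak of 100 (values in kW)
example = [50.0, 57.0, 45.0, 45.0, 20.0, 100.0]
up_1h, down_1h = count_ramps(example, duration=1)
up_2h, down_2h = count_ramps(example, duration=2)
print(up_1h, down_1h)
```

Calling the function with `duration=1` and `duration=2` gives the 1-hour and 2-hour columns, respectively, for a given load series.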
Based on the above descriptions, prediction of electricity load time series of a building seems to
be more difficult than that of a power system since high volatility lowers the predictability. Conse-
quently, it is required to adapt a forecasting model so as to cope with the challenging characteristics
of such time series. In the next section, a forecasting model is proposed to capture the dynamic
3.3 The forecasting model
The discussion in section 3.2 showed that dealing with micro-grid load time series is a more chal-
lenging task compared with a power system, and therefore, traditional STLF will not result in
satisfactory accuracy in micro-grid load prediction. In this way, the SRWNN forecasting engine
is firstly presented in this section and the training algorithm is then implemented to set the free
The wavelet theory has been applied through two different approaches for forecast processes. The
first one is using the wavelet transform as a preprocessor to decompose the load time series into
its low and high frequency components. Each component is separately processed by a forecast
engine [82]. The other approach is constructing the wavelet neural network (WNN) in which a
wavelet function is used as the activation function of the hidden neurons of a Feed-Forward Neural
Network (FFNN). The WNN was first introduced in [83] for approximating nonlinear functions.
Due to the local properties of wavelets and the concept of adapting the wavelet shape according to
training data set instead of adapting the parameters of the fixed shape basis function, WNNs have
better generalizability compared to the classical FFNNs, and therefore, these are more appropriate
The SRWNN is a modified model of WNN including the properties of the dynamics of Re-
current Neural Networks (RNNs) [84] and the fast convergence of WNNs, which has successfully
been applied to estimating and controlling nonlinear systems [85]. Since the SRWNN has a self-
recurrent mother wavelet layer, it can store the past information of wavelets and well capture the
complex nonlinear systems³ [86]. Having self-feedback loops and input direct terms, SRWNN
³ This footnote was added in response to a question raised by a committee member in the PhD oral examination.
A linear system is a system in which the output can be represented by a linear combination of inputs. In time series
forecasting, linear regression models are an example of linear systems in which the output (forecast) is a linear
weighted average of a set of inputs. A nonlinear system, however, is a system in which the change of the output is
not proportional to the change of the input. In prediction processes, artificial neural networks are an example of such a
nonlinear mapping function of a set of inputs to the output.
44
has improved capabilities compared to WNN, such as its dynamic response and information stor-
ing ability. Therefore, SRWNN has been applied as a forecast engine in this paper to overcome
the volatile and non-smooth behavior of the load time series in a micro-grid. Moreover, SRWNN
does not include limitations, such as dependency on appropriate tuning of parameters and complex
optimization process, which are likely to be found in models such as Support Vector Machines
(SVMs) [45].
The architecture of the SRWNN, shown in Fig. 3.3, is a feed forward network with four layers.
As seen, X = [x1 , ..., xM ] is the input vector of the forecast engine and y is the target variable.
The inputs x1 , ..., xM of the forecast engine can be from the past values of the target variable and
past and forecast values of the related exogenous variables. For instance, past values of electricity
load along with the past and forecast values of temperature can be considered for electricity load
A feature selection technique can be used to refine these candidate features and select the most
effective inputs for the forecast process. In this research work, we use the feature selection method
of [52]. This method is based on the information theoretic criterion of mutual information and
selects the most informative inputs for the forecast process by filtering out the irrelevant and re-
dundant candidate features through two stages. In the first stage, which is called irrelevancy filter,
mutual information between each candidate input, i.e. xi (t), and the target variable is calculated.
The higher value of mutual information for xi (t) means the more common information content
of this feature with the target variable. The candidate inputs with computed mutual information
value greater than a relevancy threshold, denoted by T H1 , are considered as the relevant features
of the forecast process, which are retained for the next stage. However, other candidate inputs with
mutual information value lower than T H1 are considered as irrelevant features, which are filtered
out. In the second stage, which is called the redundancy filter, redundant features among the
candidate inputs selected by the relevancy filter are found and filtered out. Two selected
candidates, e.g., xk (t) and xl (t), with a high value of mutual information share more common
information, i.e., have a high level of redundancy. Thus, the redundancy of each selected feature
xk (t) with the other candidate inputs is calculated. Then, if the measured redundancy is greater
than a redundancy threshold, denoted by T H2 , xk (t) is considered a redundant candidate input.
Hence, between this candidate and its rival, which has the maximum redundancy with xk (t), the one
with lower relevancy should be filtered out [52]. The selected candidate features in the relevancy
filter are considered as the inputs of the load forecasting engine. Moreover, fine-tuning the values
of the thresholds T H1 and T H2 is performed by cross validation technique. Since this method is not
the focus of this paper, it is not further discussed here. The interested reader can refer to [52]
for details of this

Figure 3.3: Architecture of the SRWNN.
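The two-stage filter described above can be sketched as follows. This is only an illustration under stated assumptions, not the method of [52]: the histogram-based estimator `mutual_information`, the use of pairwise mutual information as the redundancy measure, and the names `two_stage_filter`, `th1` and `th2` are all hypothetical choices made for this sketch.

```python
import math
from collections import Counter


def mutual_information(a, b, bins=8):
    """Histogram estimate (in bits) of the mutual information of two sequences."""
    def discretize(v):
        lo, hi = min(v), max(v)
        width = (hi - lo) / bins or 1.0  # fall back to 1.0 for constant series
        return [min(int((x - lo) / width), bins - 1) for x in v]

    da, db = discretize(a), discretize(b)
    n = len(a)
    pa, pb, pab = Counter(da), Counter(db), Counter(zip(da, db))
    return sum((c / n) * math.log2(c * n / (pa[i] * pb[j]))
               for (i, j), c in pab.items())


def two_stage_filter(candidates, target, th1, th2):
    """candidates: dict name -> sequence. Returns the names that survive both stages."""
    relevance = {k: mutual_information(v, target) for k, v in candidates.items()}
    # Stage 1 (irrelevancy filter): keep candidates whose relevance exceeds th1.
    kept = [k for k in candidates if relevance[k] > th1]
    # Stage 2 (redundancy filter): while some kept feature is too redundant with
    # its closest rival, drop whichever of the two is less relevant to the target.
    changed = True
    while changed:
        changed = False
        for k in list(kept):
            rivals = [(mutual_information(candidates[k], candidates[l]), l)
                      for l in kept if l != k]
            if not rivals:
                break
            redundancy, rival = max(rivals)
            if redundancy > th2:
                loser = k if relevance[k] < relevance[rival] else rival
                kept.remove(loser)
                changed = True
                break
    return kept
```

In practice the thresholds would be tuned by cross validation, as stated in the text; the values used to exercise the sketch are arbitrary.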
Therefore, the target variable is the electricity load of the next time interval, which the
forecasting engine predicts using the past values of electricity load and calendar effects.
Moreover, multi-period forecasting, e.g. load prediction for the next 24 hours, is achieved via
recursion, i.e. by feeding the forecaster’s outputs back as input variables. For instance, forecasted load for
the first hour is used as y(t − 1) for load prediction of the second hour provided that y(t − 1) is
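The recursive multi-period scheme can be illustrated with a short sketch. The function name `recursive_forecast` and the callable `one_step_model` are hypothetical; any one-step-ahead forecaster that maps a window of lagged values to the next value could be plugged in.

```python
def recursive_forecast(one_step_model, history, n_lags, horizon=24):
    """Roll a one-step-ahead model forward by feeding its own outputs back in."""
    window = list(history[-n_lags:])      # most recent n_lags observations
    forecasts = []
    for _ in range(horizon):
        y_hat = one_step_model(window)    # forecast for the next time interval
        forecasts.append(y_hat)
        window = window[1:] + [y_hat]     # the forecast becomes the newest lag
    return forecasts
```

Because each step consumes earlier forecasts instead of measurements, errors can accumulate along the horizon, which is one reason a model is usually judged on the full 24-hour trajectory.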
The input layer of the forecast engine transmits M input variables, which are selected by the
feature selection technique, to the next layer without any changes. The second layer, which is
called the wavelet layer, consists of N × M neurons that each has a self-feedback loop. In this
paper, Morlet wavelet function has been considered as the activation function of neurons in the
\psi(x) = e^{-0.5 x^{2}} \cos(5x) \qquad (3.1)
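As a quick illustration of (3.1), the Morlet mother wavelet and its analytical derivative can be written in a few lines. The derivative is included only because gradient-based training needs it; the function names are chosen for this sketch.

```python
import math


def morlet(x):
    """Morlet mother wavelet of (3.1): exp(-0.5 x^2) * cos(5 x)."""
    return math.exp(-0.5 * x * x) * math.cos(5.0 * x)


def morlet_derivative(x):
    """Analytical derivative of the Morlet wavelet with respect to x."""
    return -math.exp(-0.5 * x * x) * (x * math.cos(5.0 * x) + 5.0 * math.sin(5.0 * x))
```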
In SRWNN, a wavelet of each node is derived from its mother wavelet as below:
\psi_{i,j}(r_{i,j}) = \psi\left(\frac{u_{i,j} - b_i}{a_i}\right), \quad r_{i,j} = \frac{u_{i,j} - b_i}{a_i} \qquad (3.2)
where ψi,j is the scaled and shifted version of Morlet mother wavelet with ai and bi as the scale
and shift parameters, respectively. In addition, the inputs of the wavelets in (3.2) are as follows:
where z^{-1} is the time delay; thus, the input of this layer contains the memory term ψi,j z^{-1}, which
can store the past information of networks, and θi,j denotes the weight of the self-feedback loop,
which represents the rate of information storage. This feature is the main difference between a
SRWNN and a WNN. In fact, the SRWNN is the same as WNN when all θi,j are equal to zero.
However, it is noted that the initial values for θi,j are usually considered zero, which means there
is no feedback initially.
where wi is the weight between the ith neuron of the product layer and the output node, vj is the
direct input weight between the jth input and the output node, and g is the bias of the output node.
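A forward pass through the four layers can be sketched as below. Since the layer equations are not reproduced in this text, the sketch assumes the usual SRWNN structure: each wavelet input is the network input plus the delayed wavelet output weighted by θi,j, each product node multiplies the M wavelets of one row, and the output adds the weighted products, the direct input terms and the bias g. The class name `SRWNN` and its methods are hypothetical.

```python
import math


def morlet(x):
    return math.exp(-0.5 * x * x) * math.cos(5.0 * x)


class SRWNN:
    def __init__(self, a, b, theta, w, v, g):
        self.a, self.b = a, b          # scale and shift of each product node (length N)
        self.theta = theta             # N x M self-feedback weights
        self.w, self.v, self.g = w, v, g
        n, m = len(theta), len(v)
        # Wavelet outputs delayed by one step; zero means no stored information yet.
        self.memory = [[0.0] * m for _ in range(n)]

    def n_params(self):
        n, m = len(self.theta), len(self.v)
        return m + 3 * n + m * n + 1   # count of free parameters stated in the text

    def step(self, x):
        # Direct input terms and output bias.
        y = self.g + sum(vj * xj for vj, xj in zip(self.v, x))
        for i, row in enumerate(self.theta):
            product = 1.0
            for j, xj in enumerate(x):
                u = xj + row[j] * self.memory[i][j]           # assumed wavelet input
                psi = morlet((u - self.b[i]) / self.a[i])     # scaled and shifted wavelet
                self.memory[i][j] = psi                       # stored for the next step
                product *= psi
            y += self.w[i] * product
        return y
```

With all self-feedback weights set to zero, repeated calls with the same input return the same output, which mirrors the remark that the SRWNN reduces to a WNN in that case.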
Therefore, the output of SRWNN is obtained from a combination of multi-dimensional wavelet
functions, i.e. Ψi , as well as a combination of inputs, i.e. xj . In other words, the proposed model
not only can benefit from the capabilities of wavelet functions, such as their ability to capture
cyclical behaviors, but also can capture trends of the signal. In addition, SRWNN can benefit from
its dynamic response by storing the past information of wavelets in self-feedback loops (equation
3.3) to capture complex nonlinearities. Based on the aforementioned formulation, the vector of the
Therefore, the SRWNN has N_P = M + 3N + M × N + 1 free parameters, which are determined by
the training method. It should also be noted that the SRWNN model presented in this paper differs
from the SRWNN proposed in [86]. There are two differences between these two models. First,
there is an additional external bias (i.e., g) to the output layer of the presented SRWNN in this
work. A bias can increase or lower the net input of the activation function, depending on whether
it is positive or negative, respectively [87]. Consequently, biases can enhance the input/output
mapping function by adding another feature to neural networks. Second, Morlet wavelet functions
have been used as the activation functions in the wavelet layer of the SRWNN in this paper, whereas
the second derivative of the Gaussian function, i.e., the Mexican hat wavelet function, is used in
[86]. Although the Mexican hat wavelet function has successfully been used in
WNN models for forecasting applications due to its superiorities over Daubechies wavelets [11], it
has been shown that Morlet wavelets outperform Mexican hat wavelets for prediction applications
[12, 53]. Therefore, we applied Morlet mother wavelets as the activation functions in SRWNN in
our paper.
In this subsection, a training algorithm is implemented to set the free parameters of the SRWNN
denoted by P in (3.6). Since the mother wavelet function used in the SRWNN, i.e. Morlet wavelet
function, is differentiable with respect to all free parameters, the Levenberg-Marquardt (LM) learn-
ing algorithm can be used in this regard. This learning algorithm was applied to train the neural
networks by Hagan and Menhaj in [88]. Due to the advantages of the LM algorithm, such as
accurate training and fast convergence, it has been recommended in many research works, and
therefore, it is implemented for training the SRWNN in this paper. The LM algorithm is briefly
Moreover, the termination criterion used for the training of the SRWNN is based on early-
stopping technique. Accordingly, the whole available data is divided into training and validation
samples. The SRWNN is trained using the training samples and the error for validation samples is
monitored in each iteration. When the validation error keeps rising over a number of iterations,
usually five, the training phase is stopped and the values of the free parameters from the
iteration with the lowest validation error are stored as the final solution of the training algorithm.
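The early-stopping rule can be sketched independently of the training algorithm itself. In this illustration `update` stands for one training iteration (for example one LM step) and `validation_error` evaluates the validation samples; both are hypothetical callables, and counting iterations without improvement is one possible reading of "begins to rise during some number of iterations".

```python
import copy


def train_with_early_stopping(update, validation_error, params, patience=5, max_iter=1000):
    """Stop after `patience` iterations without improvement; return the best parameters."""
    best_params = copy.deepcopy(params)
    best_err = validation_error(params)
    bad_iterations = 0
    for _ in range(max_iter):
        params = update(params)              # one iteration on the training samples
        err = validation_error(params)       # monitored on the validation samples
        if err < best_err:
            best_params, best_err = copy.deepcopy(params), err
            bad_iterations = 0
        else:
            bad_iterations += 1
            if bad_iterations >= patience:
                break
    return best_params, best_err
```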
In this paper, we mainly focus on 24-hour ahead load prediction with hourly forecast steps. Day-
ahead load forecasting can bring significant operational advantages for energy management of
micro-grids. For instance, the BCIT micro-grid consists of different types of generating units (e.g.,
thermal, wind and PV units), and day-ahead load predictions are used for energy management pur-
poses. In other words, optimal utilization of available resources is achieved using load forecasting
in order to minimize the operation cost for the BCIT campus micro-grid. Moreover, as this micro-
grid can operate in both stand alone and grid-connected modes, accurate load forecasts can be used
for profitable trade of electric energy within the British Columbia power system.
The same load time series data of the building in BCIT and two power systems are used for nu-
merical experiments of this section. Based on the data analyses presented in section 3.2, electricity
load not only depends on the load profile of the previous day, i.e., daily periodicity, but also the
load pattern of the previous week, i.e., weekly periodicity. To capture such patterns, 192 candidate
inputs have been considered as lagged hourly load data, i.e., {L_{t−192}, ..., L_{t−1}}, where L_t indicates
the electricity load at time t. The feature selection technique selects the most informative lagged
load values from these candidate inputs. Calendar information is also highly important for a load
forecasting model so as to capture weekly and seasonal patterns. For instance, either considering
the day of the week or differentiating weekends and weekdays is a common way presented in the
literature [5, 64, 89]. Thus, weekends and holidays are considered in this work using a binary vari-
able for detecting weekends and holidays from weekdays. The month of the year is also used in
some cases [89]; however, it is not considered in this paper since the seasonality factor is already
captured, as the model is re-trained every day. Furthermore, temperature data as an exogenous vari-
able has been used to improve load forecasting prediction since temperature time series usually has
high relevancy to electricity consumption time series [64, 66, 67, 90]. Accordingly, based on
publicly available data, seven daily values of temperature for the previous week (i.e., T_{d−7}, ..., T_{d−1}),
and the daily forecast value of the temperature for the prediction day (i.e., T_d) were first considered
for the model, where T_d represents the average daily temperature for day d. However, numerical
experiments for the BCIT test case revealed that low resolution temperature data, i.e. daily data,
cannot improve the accuracy for hourly load forecast. Therefore, we tested historical hourly tem-
perature data (located in Vancouver) and also used the same time series for temperature forecasts,
i.e., perfect forecasts, in order to observe if hourly temperature data can enhance the forecast results
for BCIT test case. For this purpose, lagged hourly temperature data, i.e., {T_{t−192}, ..., T_{t−1}}, are
considered as 192 candidate inputs that feed the feature selection stage along with 192 candidate
inputs for load data. The feature selection technique then selects the most informative candidates
among the candidates of load and temperature and transfers them to the model. Only a few
temperature inputs are among the selected inputs, which indicates the low correlation between
the temperature time series and the load time series of BCIT. The low correlation results from the
fact that the electric load of this building is mainly lighting. Considering the mild temperatures
in Vancouver, the heating load is not as significant. The numerical results also supported this low
correlation, as hourly temperature data with even perfect forecasts could not improve the forecast
accuracy of the model. Therefore, temperature inputs are not considered for the numerical results
in this paper.
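The candidate input set described above (192 lagged hourly loads plus a binary calendar variable) could be assembled as in the following sketch. The function name `build_samples` and the list-based data layout are assumptions made for illustration; the feature selection stage would then prune these candidates.

```python
def build_samples(load, is_workday, n_lags=192):
    """Return (features, target) pairs for one-step-ahead load forecasting.

    load:       list of hourly load values
    is_workday: list of 0/1 flags per hour (0 = weekend or holiday, 1 = weekday)
    """
    samples = []
    for t in range(n_lags, len(load)):
        lags = load[t - n_lags:t]                 # L_{t-192}, ..., L_{t-1}
        features = lags + [is_workday[t]]         # calendar flag of the target hour
        samples.append((features, load[t]))
    return samples
```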
To show the effectiveness of different forecasting engines, SRWNN is compared with two other
efficient neural network-based forecasting models, i.e., WNN and Multi-Layer Perceptron (MLP).
It is noted that statistical models (e.g., Autoregressive Integrated Moving Average (ARIMA) model)
are not considered in this paper since such techniques are basically linear methods and have limited
capability to capture nonlinearities in the load series [91, 92]. Therefore, we chose two efficient
Computational Intelligence (CI) based models, i.e., MLP as an efficient Feed Forward Neural
Network (FFNN) and WNN as an effective model combining nonlinear mapping merits of FFNNs
Hence, 10 test months of hourly load data from the building in BCIT from May 2012 to Febru-
ary 2013 are considered for 24-hour ahead load prediction. It is noted that the first two months
of the historical data is used for training of the forecast engine and so the results of the first two
months cannot be presented here. Two error criteria are used in this paper to evaluate forecast er-
rors: (i) normalized Root Mean Square Error (nRMSE) and (ii) normalized Mean Absolute Error
nMAE = \frac{1}{N} \sum_{t=1}^{N} \left| \frac{L_{ACT}(t) - L_{FOR}(t)}{L_{Peak}} \right| \times 100 \qquad (3.8)
where L_{ACT}(t) and L_{FOR}(t) indicate the actual and forecast values of electricity load for hour t.
Moreover, N indicates the number of hours for each month, and L_{Peak} is the peak value of the electricity
load over the year, which is 694 kW for this test case. Observe from Table 3.3 that SRWNN out-
performs the other forecasting models in all test months and in terms of both nRMSE and nMAE.
For instance, the average nRMSE and average nMAE of SRWNN are (5.67-4.98)/5.67=12.1%
and (4.26-3.74)/4.26=12.2% lower than those of WNN, and (8.78-4.98)/8.78=43.2% and
(6.54-3.74)/6.54=42.8% lower than those for MLP, respectively. This table demonstrates that for a highly
volatile time series, i.e. micro-grid electricity load, a SRWNN forecasting model can more
efficiently cope with the variations and non-smooth behavior of the time series.

Table 3.3: Forecasting errors, in %, of SRWNN, WNN and MLP for 10 test months.
           MLP             WNN             SRWNN
Month      nRMSE   nMAE    nRMSE   nMAE    nRMSE   nMAE
May         8.44   6.22     5.96   4.05     5.23   3.80
Jun.        9.92   7.55     5.44   4.27     4.86   3.80
Jul.       10.41   7.92     7.04   5.26     5.43   4.01
Aug.       10.40   8.14     6.57   4.95     6.46   4.80
Sep.       11.88   8.40     7.83   6.01     6.28   4.82
Oct.       10.45   7.68     4.81   3.83     4.24   3.28
Nov.        6.34   4.89     4.62   3.56     4.30   3.21
Dec.        6.21   4.58     4.54   3.35     4.22   3.05
Jan.        6.93   4.74     4.86   3.40     4.25   3.11
Feb.        6.85   5.29     5.06   3.94     4.58   3.50
Average     8.78   6.54     5.67   4.26     4.98   3.74
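The two error criteria can be computed as in the sketch below. `nmae` follows (3.8) directly; `nrmse` assumes the analogous definition with the same peak-load normalization, because the nRMSE equation is not reproduced in this text. The function names are chosen for this illustration.

```python
import math


def nmae(actual, forecast, peak):
    """Normalized mean absolute error in percent, as in (3.8)."""
    n = len(actual)
    return 100.0 * sum(abs(a - f) for a, f in zip(actual, forecast)) / (n * peak)


def nrmse(actual, forecast, peak):
    """Normalized root mean square error in percent (assumed peak-normalized)."""
    n = len(actual)
    mse = sum((a - f) ** 2 for a, f in zip(actual, forecast)) / n
    return 100.0 * math.sqrt(mse) / peak
```

Normalizing by the yearly peak load (694 kW for the BCIT test case) rather than by the hourly actual value keeps the metric well behaved during low-load hours.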
Moreover, Fig. 3.4 illustrates the carpet charts of monthly mean absolute errors for different
hours of the day for SRWNN and WNN on BCIT test case. This figure clearly shows that large
errors for both models usually occur between 12:00 and 16:00 when the load peaks. However,
this colormap shows lower errors during the peak hours for SRWNN in comparison with the
WNN. More importantly, the superiority of SRWNN over WNN is revealed during the upward
ramps in the morning. As analyzed in section 3.2, sharp upward ramps occur more than downward
ramps for BCIT test case, and consequently, any improvements in forecasting ramp up events can
considerably enhance the forecast accuracy of this load time series. Fig. 3.5 demonstrates the
average of mean absolute errors for all 10 months. According to these two curves, SRWNN shows
lower yearly errors during the morning ramp, which usually occurs from 7:00 to 12:00. In
addition, there is an improvement in ramp down forecasting from 16:00 to 18:00.
Curves of generated forecasts and real data for a good forecasting day, i.e. November 15,
and a bad forecasting day, i.e. September 7, are shown in Fig. 3.6. Fig. 3.6(a) shows that
there are sharp changes and variations on September 7. Sharp spikes could result from the high
temperatures during specific days, which increase the electricity consumption of the buildings for
air conditioning. As a consequence of such severe ramps, the forecasting model has difficulty
capturing these sudden large variations in electricity load. The major error is the magnitude error
that occurs during the peak load. On the contrary, the variations on November 15, shown in
Fig. 3.6(b), are smoother, so the forecasting model could capture the upward ramp accurately. As
a result, the challenge of high volatility and sharp ramps in micro-grid time series is evidently
distinct from power system loads, and makes such time series less predictable.

Figure 3.4: Mean absolute error (kW) of different hours of the day in different months. Panels: (a) SRWNN, (b) WNN.

Figure 3.5: 10-month mean absolute error for different hours of the day (x-axis: hour, 1 to 24; y-axis: mean absolute error in kW; curves: SRWNN and WNN).
Figure 3.6: Samples for bad (a) and good (b) forecasting days. Panels: (a) Bad forecasting day, (b) Good forecasting day.

In the next experiment, forecasting errors of different days of the week for the same 10-month
period are separately considered to observe the users’ behavior. It is noted that the electricity
consumption of the building is mainly from lighting as mentioned earlier in this section. Here, users’
behavior is represented by considering the calendar effect as the inputs of the model. A binary
variable for differentiating weekends and holidays from weekdays is used, i.e., zero represents
weekends and holidays, while one represents weekdays. Fig. 3.7 demonstrates the forecasting
errors with and without the calendar effects. First, observe that the average of nRMSE considering
the calendar effect, i.e., 4.78%, is lower than that when the calendar effect is not included, i.e.,
5.32%. Moreover, according to the figure, the highest error occurs on Mondays, which is the first
working day at the campus. Calendar inputs can efficiently capture such behavior of the users. For
instance, the forecasting error in terms of nRMSE for Monday considerably decreased from 7.32%
to 5.83% when the calendar effect is taken into account. In addition, the standard deviation of the
error associated with different days of week has decreased from 0.97% to 0.57% using the calendar
effect. In other words, the model performs in a more robust way for predicting different days of
the week. According to Fig. 3.7, the difference between the maximum and the minimum errors
with calendar, i.e., 1.85%, and without calendar, i.e., 2.83%, also shows the better performance of
the model including the calendar effect. As a result, users’ behavior can be efficiently captured by

Figure 3.7: Forecasting errors for different days of the week
In the last experiment, the proposed forecasting model is applied to predict two power system
time series. The main goal of this numerical experiment is to show how forecast accuracy of
SRWNN improves, compared with WNN, as the volatility of the time series increases. Hence, from
a power system with low volatility to one with higher volatility, forecast accuracy improvements
increase for SRWNN. In this way, the same test cases for British Columbia’s and California’s power
systems are considered. Table 3.4 shows the obtained forecast error results (based on the average
of 10-month error) for both SRWNN and WNN models. Firstly, this table demonstrates noticeably
lower forecast errors of both models for prediction of power systems’ load data compared with
those for a micro-grid illustrated in Table 3.3. For instance, the nRMSE of SRWNN is 4.98% for the
micro-grid compared with 2.29% for British Columbia’s power system. Besides, Table
3.1 shows the volatility for British Columbia’s power system time series is the lowest in terms of
both daily and weekly volatility indices. Consequently, it is expected to have higher predictability
for British Columbia’s power system compared to the micro-grid and California’s power system.
Table 3.4 indicates the forecasting errors for British Columbia are lower than those for California,

Table 3.4: Forecasting errors of SRWNN and WNN for two power systems.
                    WNN             SRWNN           Improvements (%)
Power system        nRMSE   nMAE    nRMSE   nMAE    nRMSE   nMAE
British Columbia     2.46   1.81     2.29   1.69      6.9    6.6
California           3.67   2.57     3.38   2.37      7.9    7.8
Secondly, Table 3.4 shows how effective the SRWNN becomes as the volatility of a time series
increases. As seen from the last two columns of Table 3.4, the forecast accuracy improvements
obtained from SRWNN in terms of nRMSE and nMAE are respectively (2.46-2.29)/2.46=6.9%
and (1.81-1.69)/1.81=6.6% for BC’s power system. Similarly, there are 7.9% and 7.8% forecast
accuracy improvements in terms of nRMSE and nMAE for California’s power system, respec-
tively. Therefore, since the volatility of California’s power system is higher than that for British
Columbia’s, SRWNN obtained higher improvement of forecast accuracy compared with WNN for
California’s power system. In other words, California’s load contains higher daily and weekly
volatilities, and consequently, the SRWNN can capture these variations and provide more accurate
forecasts compared with WNN. To have a better sense of these percentage errors, forecast accuracy
improvement in terms of mean absolute error is around 93 MW, which is almost twice as big as the
capacity of Kumeyaay wind farm, i.e. 50 MW, located in San Diego, California [93]. As a result,
as the volatility of the time series increases, the performance of SRWNN improves in comparison
with WNN. As mentioned earlier in this section, load forecast accuracy can be improved using
weather forecast data as exogenous inputs to the forecasting model. For instance, load forecasting
models utilized in California ISO (CAISO) include weather forecasts, such as temperature, dew
point, wind speed and cloud cover, for the next 9 days from 24 weather stations [94]. It is noted
that including such exogenous inputs to the model depends on the availability of the public data.
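The improvement percentages quoted from Table 3.4 follow from the relative error reduction of SRWNN over WNN, as the short check below illustrates. The dictionary layout and the function name `improvement` are chosen only for this sketch; the numbers are the tabulated errors.

```python
def improvement(baseline, proposed):
    """Relative error reduction of the proposed model over the baseline, in percent."""
    return 100.0 * (baseline - proposed) / baseline


# (WNN error, SRWNN error) pairs taken from Table 3.4, in percent.
table_3_4 = {
    "British Columbia": {"nRMSE": (2.46, 2.29), "nMAE": (1.81, 1.69)},
    "California": {"nRMSE": (3.67, 3.38), "nMAE": (2.57, 2.37)},
}

improvements = {
    system: {metric: round(improvement(*pair), 1) for metric, pair in metrics.items()}
    for system, metrics in table_3_4.items()
}
```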
The computation time of the SRWNN model for the training phase is less than 35 seconds for
a one-day prediction for the test cases of this paper, measured on a Mac with an Intel Core i5
2.7 GHz processor and 12 GB of RAM. Although this computation time is larger than that for
WNN, which is less than 11 seconds, it is completely acceptable within a 24-hour decision making
3.5 Conclusions
STLF is an important tool for reliable and economic operation of power systems as many operating
decisions are based on load forecast, e.g., dispatch scheduling of generating units, security assess-
ment and demand side management. Likewise, precise STLF for a micro-grid can enhance the
management of its renewable and conventional resources and improve the economics of energy
trade with electricity markets. Considering volatile and non-smooth characteristics of load time
series of micro-grids compared with power systems’ electricity load, a new forecasting method is
proposed to deal with such challenges in this paper. The proposed method has the structure of a
SRWNN as the forecasting engine, in which feedback loops have been added to a WNN so as to
better capture nonlinear complexities of volatile time series. The LM learning algorithm is im-
plemented to train the SRWNN, i.e., adjusting the free parameters of the SRWNN. High volatility
of a micro-grid load was shown by defining a volatility criterion and comparing with the volatil-
ity of two power systems’ load data. The effectiveness of the proposed forecasting method was
demonstrated by real-world load data of a micro-grid and power systems. The results show that the
proposed SRWNN model leads to more accurate forecasts when a volatile time series prediction is
of interest.
Acknowledgements
Partial support for this work came from the Natural Sciences and Engineering Research Council
of Canada (NSERC) and the ENMAX Corporation under the Industrial Research Chairs program.
Moreover, the authors would like to thank Dr. Hassan Farhangi and Dr. Ali Palizan of British
Columbia Institute of Technology (BCIT) for providing data and invaluable insight.
Chapter 4
4.1 Introduction
The market for behind-the-meter (BTM) battery energy storage is growing rapidly. For example,
in the third quarter of 2015, more than 13 MW of such units were installed, which indicates a
15 times year-over-year growth [95]. Behind-the-meter storage enables consumers to have more
control over their own energy usage [96]. Depending on the market structure, the use of BTM
storage may vary. Where flat rates are charged by utilities, a common use of BTM storage is for
demand charge management [97]. In competitive markets where large customers are charged at
wholesale pool price rates, BTM storage could be employed for avoiding peak prices. In such
cases, an insight into price fluctuations is essential for the optimal use of the storage system. The
focus of this work is the operation of a storage system owned by a large consumer that purchases
electricity from the wholesale market, and thus, needs short-term price forecasting (STPF) for
operation scheduling.
Most electricity price forecasting studies in the literature have considered short horizons be-
cause of the proximity to the real-time operation [98]. Different methodologies have been pro-
posed for point, probabilistic, and threshold forecasting of the electricity market price in the lit-
erature [98]. While the point forecasting approaches provide single-valued predictions [99], the
probabilistic forecasting models can quantify the uncertainties associated with point forecasts using
do not require exact values for future prices but apply pre-specified price thresholds as the ba-
sis for the decision-making process [103]. Therefore, the threshold forecasting models work as
classification problems and simply provide different classes for future prices [104].
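To make the classification view concrete, the sketch below maps a price to a class index given pre-specified thresholds. The function name `price_class` and the example threshold values are hypothetical and are not taken from [103] or [104].

```python
import bisect


def price_class(price, thresholds):
    """Return the index of the price class defined by ascending price thresholds.

    Class 0 is below the first threshold, class 1 lies between the first and
    second thresholds, and so on; a price equal to a threshold falls in the
    higher class.
    """
    return bisect.bisect_right(sorted(thresholds), price)
```

For example, with hypothetical thresholds of 50 and 100 $/MWh, a price of 75 $/MWh falls in class 1, so a threshold forecaster only has to predict that class rather than the exact price.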
Moreover, a few research works have presented methodologies aiming to detect price spikes
using statistical [105–107] or data mining approaches [108–110]. In [105], a recursive dynamic
factor analysis (RDFA) is combined with a Kalman Filter model for electricity price forecast. It is
shown that the proposed model has better forecasting accuracy than three other approaches under
the presence of price spikes; however, a solid price spike detection analysis has not been provided.
In [106], authors proposed a closed loop prediction mechanism including neural networks and
feature selection techniques to predict both price spike occurrences and values. An autoregres-
sive approach has been used to model the time series of price spikes in the Australian electricity
market in [107]. Classification techniques along with feature selections [108, 110] and similarity
searching methods [109] have also been applied to detect price spikes. However, these proposed
methodologies are based on analyzing the historical data. Such analyses might not be reliable
in practice because the most influential factors creating price spikes are unplanned generation or
transmission line outages, which are neither predictable nor possible to model [103]. In addition,
price spikes are distinguished by defining fixed price thresholds in the presented works. Although
simple to implement, fixed thresholds may not be effective because price statistics vary
significantly from month to month.
Most STPF models developed in the literature use hourly historical/forecast data of different
explanatory variables [111] to predict hourly prices. In [112], the authors found that using
intra-day prices with 30-min resolution could improve the short-term forecasts of base-load electricity
prices in the U.K. market. However, the price settlement process in real-time markets has
a higher resolution than an hourly basis. For instance, Market Clearing Prices (MCPs) are set
every five minutes in Ontario’s [113], California’s [114], Texas’ [115] and New York’s electricity
markets [116], and every minute in Alberta’s electricity market [117]. Such high-resolution
market data contains recent updates on the electricity market conditions, and can be used along
with hourly market data for predicting the hourly prices. The major benefit from utilizing higher
resolution data is to capture price variations and detect price spikes efficiently.
In our previous works (e.g., [104, 118, 119]), we demonstrated that unlike load forecasting
where predicting the absolute value of demand is critical in operation scheduling, price forecasting
accuracy should be measured in terms of economic savings/gains. Thus, the main contribution
of the present paper is to propose an intra-hour rolling-horizon electricity price forecasting strat-
egy that is specifically developed to optimize the operation of behind-the-meter Battery Energy
Storage Systems (BESS). Hence, two objectives are of interest in this paper: an efficient price
forecasting tool as well as optimal operation of the BESS using generated price forecasts. The
proposed strategy takes advantage of high-resolution five-minute market clearing price data in the
real-time market to generate hourly market price forecasts. Using intra-hour market clearing price
information enables the method to detect if the current hour is going to have a high price. With the
capability to capture high prices, the battery is discharged accordingly to offset energy purchase
from the grid, and thus save on energy costs. A simple regression-based forecasting model is de-
veloped that is computationally very light and can be effectively used on micro-grid management
firmware systems.
The remaining parts of the paper are organized as follows. The operation scheduling of a BESS
and the proposed price forecasting strategy are described in section 4.2. In section 4.3, Ontario’s
electricity market is briefly presented as the case study in this paper. Afterwards, the proposed
forecasting strategy is evaluated from both statistical and economic perspectives. Finally, section
4.2 Methodology
In this section, the operation of a BESS is outlined considering different forecasting perspectives.
4.2.1 Operation of a BESS
With a behind-the-meter BESS, energy can be stored during off-peak hours when prices are low
and then injected back to load/grid during peak periods to reduce the amount of energy purchased
from the wholesale market at high prices. The price-based optimal operation scheduling of such a
Subject to Φ
where the objective function is to maximize the net energy arbitrage saving, denoted by S. T
represents the scheduling horizon (e.g., T = 24 for day-ahead scheduling), λ̂t ($/MWh) is the
price forecast for hour t, and Pt (MW) is the variable showing the power consumed (Pt < 0)
from the grid or injected (Pt > 0) back to the micro-grid by the BESS at hour t. The objective
function is subject to a set of battery operation and technical constraints, denoted by Φ, such as,
charging/discharging rates, rated energy capacity, battery state of charge and number of cycles per
Equations (4.2) and (4.3) ensure that the energy stored in the BESS at hour ending t, denoted
by Et , is within the allowable range. Pt is the amount of scheduled power for charging (Pt < 0)
and discharging (Pt > 0). Eini is the initial stored energy in the battery, Eemg is the energy related
to the emergency load (150 kWh), and Emax is the maximum capacity for the battery (500 kWh).
Pch,max and Pdis,max are maximum charging and discharging rates. Equations (4.5) and (4.6) are
auxiliary constraints to count the number of full cycles per day, denoted by NCycle . M is a large
positive number and ut is defined as a flag variable that is set to ut = 1 when the battery is in the
charging mode. It should be noted that this formulation only includes the basic, most dominant
features of a storage unit in order to show the impact of price forecasts on the BESS’s operation.
The solution of this optimization problem is the scheduled power of the BESS for each hour,
denoted by Pt∗ . The Mixed-Integer Linear Programming (MILP) solver of the optimization toolbox
in MATLAB was used to solve the optimization problem of battery operation. When the actual
prices, λt , are realized, the after-the-fact value of the net energy arbitrage saving (S ∗ ) can be
calculated as:
\[ S^{*} = \sum_{t=1}^{T} P_t^{*} \lambda_t \qquad (4.7) \]
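As a rough illustration of how a schedule and Equation (4.7) fit together, the Python sketch below replaces the MILP with a brute-force search over a single charge/discharge pair. This is not the formulation used in this work: the one-pair rule, the function names, the 0.2 MWh energy figure and the price values are all assumptions made for illustration, and the constraints in Φ are not modeled.

```python
from typing import List, Sequence


def single_cycle_schedule(price_forecast: Sequence[float], energy_mwh: float) -> List[float]:
    """Pick one charging hour followed by one discharging hour.

    Hypothetical stand-in for the MILP: it only enforces "charge before
    discharge" and moves `energy_mwh` within a single hour each way.
    Returns P_t per hour (P_t < 0 charging, P_t > 0 discharging).
    """
    n = len(price_forecast)
    best_gain, best_pair = 0.0, None
    for charge in range(n):
        for discharge in range(charge + 1, n):
            gain = price_forecast[discharge] - price_forecast[charge]
            if gain > best_gain:
                best_gain, best_pair = gain, (charge, discharge)
    schedule = [0.0] * n
    if best_pair is not None:  # stay idle when no profitable pair exists
        schedule[best_pair[0]] = -energy_mwh
        schedule[best_pair[1]] = energy_mwh
    return schedule


def realized_saving(schedule: Sequence[float], actual_prices: Sequence[float]) -> float:
    """After-the-fact saving S* = sum over t of P_t* times lambda_t (Equation 4.7)."""
    return sum(p * lam for p, lam in zip(schedule, actual_prices))


forecast = [30.0, 10.0, 20.0, 80.0, 40.0]  # $/MWh, made-up forecast values
actual = [25.0, 15.0, 20.0, 60.0, 90.0]    # $/MWh, made-up realized values
plan = single_cycle_schedule(forecast, energy_mwh=0.2)
saving = realized_saving(plan, actual)
```

The plan is built from the forecasts but valued at the actual prices, so a forecast that misses the true peak (the last hour in this toy data) leaves part of the possible saving uncaptured.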
In this approach, the schedules are set once for the whole scheduling horizon and never revised. A
common practice for dealing with forecast uncertainty is instead to update the forecasts, and with them the schedules, at each forecasting step
(e.g., each hour). This is called the Rolling Horizon (RH) approach, which has successfully been
applied to a few scheduling and energy management problems under load and wind uncertainties
in the literature [123, 124]. Basically, the optimization problem is run on an hourly basis using
the updates on price forecasts. As a result, schedules of the BESS can efficiently be updated once
This approach is presented in Algorithm (1), where h represents the forecasting origin; that
is, the hour at which the forecasts are generated and used in the optimization platform. h = 0 is
the last hour of day D − 1 when the first set of forecasts is generated for day D. \( \hat{\lambda}_t^{(h)} \) is the price
forecast for hour t generated at hour h, and \( P_t^{(h)} \) is the power of the BESS for hour t scheduled at
forecasting origin h. According to Algorithm (1), for scheduling the BESS from hour t up to hour
T , the optimization problem is run one hour in advance, hour t − 1. For instance, forecasts are
updated at hour 1, i.e., forecast origin h = 1, and fed into the optimization problem at the same
hour to solve for the optimal scheduling of the BESS for hour 2 up to hour T (t = 2, ..., T ).
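The hour-by-hour re-solving described above can be sketched in Python as follows. The loop mirrors the conventional RH idea only in outline: the forecaster and solver are passed in as callables, and the threshold-based `toy_solver`, the forecast table and the unit energy steps are hypothetical placeholders rather than the optimization problem of this work.

```python
from typing import Callable, List, Sequence

Forecaster = Callable[[int], List[float]]                 # h -> forecasts for hours h+1..T
Solver = Callable[[Sequence[float], float], List[float]]  # (forecasts, stored energy) -> plan


def rolling_horizon(T: int, forecast_at: Forecaster, solve: Solver, e_ini: float) -> List[float]:
    """Re-solve at every origin h = 0..T-1 and commit only the next hour."""
    committed: List[float] = []
    energy = e_ini
    for h in range(T):
        forecasts = forecast_at(h)       # price forecasts for t = h+1..T made at origin h
        plan = solve(forecasts, energy)  # schedule for t = h+1..T
        p_next = plan[0]                 # only hour t = h+1 is actually executed
        committed.append(p_next)
        energy -= p_next                 # 1-hour steps: discharging (P > 0) drains the store
    return committed


def toy_solver(forecasts: Sequence[float], energy: float) -> List[float]:
    """Hypothetical rule: discharge one unit at or above 50 $/MWh, recharge below it."""
    plan: List[float] = []
    e = energy
    for price in forecasts:
        if price >= 50 and e >= 1:
            plan.append(1.0)
            e -= 1
        elif price < 50 and e < 1:
            plan.append(-1.0)
            e += 1
        else:
            plan.append(0.0)
    return plan


# Made-up forecasts: each origin h sees an updated view of the remaining hours.
forecast_table = {0: [20.0, 60.0, 30.0], 1: [70.0, 30.0], 2: [80.0]}
executed = rolling_horizon(3, lambda h: forecast_table[h], toy_solver, e_ini=0.0)
```

Only the first element of each plan is executed; the rest is discarded and recomputed at the next origin, which is what lets later forecast updates change the schedule.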
Algorithm 1 Scheduling based on the conventional RH
1: h = 0
2: while h ≤ T − 1 do
3:     Solve the optimization problem:
           \max_{P_t^{(h)}} S = \sum_{t=h+1}^{T} P_t^{(h)} \hat{\lambda}_t^{(h)}
           Subject to Φ
4:     h = h + 1
5: end while
With this approach, updating the schedule of the BESS leads to a more adaptive and efficient
operation. The after-the-fact value of the net energy arbitrage saving is:
\[ S^{*} = \sum_{t=1}^{T} P_t^{*(h)}\big|_{h=t-1} \, \lambda_t \qquad (4.8) \]
\( P_t^{*(h=t-1)} \) is the power of the BESS for hour t that has been scheduled at forecast origin h (h = t − 1). In
it may not be the most effective approach for operating a behind-the-meter BESS. The reason is
that such an RH approach applies the market data of the previous hour to generate the forecast
updates for the current hour. As illustrated in Fig. 4.1, the conventional RH provides the price forecast
(\( \hat{\lambda}_1^{(0)} \)) and the corresponding battery schedule (\( P_1^{(0)} \)) for the first hour of the current day (t = 1)
sometime in the last hour of the previous day (t = 24) at forecasting origin h = 0. The battery
schedule is fixed for the whole first hour. Schedules are provided for the
whole operation horizon (e.g., up to T = 24); however, the schedules for t = 2, ..., T will be
updated once the next forecasts are available (h = 1). This approach may not be able to efficiently
capture electricity price spikes, as the most recent market data is not incorporated. Consequently,
the operational scheduling of the BESS is negatively affected by missing severe price variations.
In this paper, a forecasting strategy is proposed to provide more informative forecasts to en-
hance the operation of a BESS. In this strategy, severe variations in electricity price can potentially
be captured by predicting the price for any hour during the same hour. The proposed rolling hori-
zon framework is called Intra-hour Rolling Horizon (IRH). The operation scheduling of the BESS
using the proposed time-frame is presented in Algorithm (2) and also illustrated in Fig. 4.2. Ac-
cordingly, the first updates on price forecasts are generated at forecast origin h = 1, which is the
Figure 4.1: Graphical representation of Algorithm 1
first hour of the current day (day D) as opposed to h = 0 (the last hour of day D − 1) presented in
Algorithm (1). As seen in Step 3 of Algorithm (2), the first hour is separated from the remaining
hours of the scheduling horizon when the optimization problem is solved. γ denotes the fraction of
time that the forecasting model requires to generate \( \hat{\lambda}_t^{(h)} \) in practice. For this fraction of the current
hour, the BESS follows the operation that has been scheduled in the previous update. Therefore,
only the second fraction of the current hour proportional to (1−γ) is considered in the optimization
Subject to Φ
4: h=h+1
5: end while
As shown in Fig. 4.2, at the forecasting origin h = 1 that corresponds to the first hour (t = 1)
of the current day, the first set of price forecasts is generated, i.e., \( \hat{\lambda}_1^{(1)}, \hat{\lambda}_2^{(1)}, \ldots, \hat{\lambda}_T^{(1)} \). In a practical
live forecasting system, it takes a fraction of the first hour, i.e., γ, before the forecasts become
available. This is the time needed for the forecasting tool to fetch new data, process them and
generate new forecasts and communicate them to the optimization platform. All communication
Figure 4.2: Graphical representation of Algorithm 2
delays and latencies are included in this time. Afterwards, the price forecasts are fed into the
optimization algorithm to schedule the BESS for the (1 − γ) fraction of the first hour, shown in the
first term of step 3, and all remaining hours of the scheduling horizon, shown in the second term
of step 3. The optimization platform applies the price forecasts to provide the most economic
hourly schedules for the BESS operation up to T. In the first fraction of the second hour (t = 2),
proportional to γ, the battery follows the operation instruction that was scheduled in the first hour
for the second hour, i.e., \( P_2^{(1)} \). Once the second updates on price forecasts become available (at
forecast origin h = 2), the optimization algorithm applies them to update battery schedules for
the remaining fraction of the current hour, \( P_2^{(2)} \), and the remaining hours, i.e., \( P_3^{(2)}, \ldots, P_T^{(2)} \). This
process is repeated until the last hour of the operating horizon, T . As a result, the proposed IRH
can update the price forecasts and consequently operation schedules of the battery for the current
hour up to T while standing in the current hour. In the conventional RH approach, by contrast, the
latest update for the operation of the battery for any given hour is prepared one hour in advance. The
\[ S^{*} = (1-\gamma)\, P_1^{*(1)} \lambda_1 + \sum_{t=2}^{T} \left[ \gamma\, P_t^{*(h)}\big|_{h=t-1} + (1-\gamma)\, P_t^{*(h)}\big|_{h=t} \right] \lambda_t \qquad (4.9) \]
Equation (4.9) suggests that the saving value associated with the first hour of the scheduling period
is proportional to \( (1-\gamma) P_1^{*(1)} \). In other words, the first fraction of only the first hour of the
scheduling horizon is not considered because the algorithm does not take into account the price
forecast generated in the last hour of the previous day. For any remaining hour of the scheduling
period, the first part of the saving comes from \( \gamma P_t^{*(h)}\big|_{h=t-1} \), which covers the operation scheduled
in the previous hour. The second part is related to the operation scheduled in the same hour,
\( (1-\gamma) P_t^{*(h)}\big|_{h=t} \). In this way, the optimization problem can benefit from more recent forecast
updates generated during the operation hour. The operation scheduling of the BESS with the
proposed strategy is evaluated and compared with the conventional RH approach in section 4.3.
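Equation (4.9) can be evaluated directly once the two schedules for each hour are known. The Python sketch below is a minimal illustration under assumed inputs: the function name, the list layout (index 0 holds hour 1) and all numerical values are hypothetical.

```python
from typing import Sequence


def irh_realized_saving(
    gamma: float,
    prev_origin: Sequence[float],
    same_origin: Sequence[float],
    actual_prices: Sequence[float],
) -> float:
    """After-the-fact saving of the IRH approach, following Equation (4.9).

    prev_origin[i]: power for hour i+1 scheduled at origin h = t-1 (entry 0 is unused)
    same_origin[i]: power for hour i+1 scheduled at origin h = t
    """
    # First hour: only the (1 - gamma) fraction is covered by a schedule.
    saving = (1.0 - gamma) * same_origin[0] * actual_prices[0]
    # Remaining hours: blend the older schedule (gamma) with the fresh one (1 - gamma).
    for i in range(1, len(actual_prices)):
        blended = gamma * prev_origin[i] + (1.0 - gamma) * same_origin[i]
        saving += blended * actual_prices[i]
    return saving


# Made-up two-hour example with gamma = 1/6.
s_star = irh_realized_saving(1 / 6, [0.0, 0.1], [0.2, -0.2], [60.0, 30.0])
```

With these toy numbers the first hour contributes (5/6)(0.2)(60) = 10 and the second hour contributes [(1/6)(0.1) + (5/6)(−0.2)](30) = −4.5.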
In general, prediction models consist of three main components: 1) data pre-processing, 2) feature
selection, and 3) model selection [103]. In this section, a brief background on each of these
components is provided, and the methods developed in this work are described as well.
1) Data Pre-processing: Also known as data cleaning, this component performs preliminary
analyses on the raw data gathered for the prediction purpose, such as dealing with missing values,
removing the outliers, and normalizing the data [125]. In this work, outliers such as price spikes
are limited by a threshold calculated from the mean and variance of the time series. Hence,
outliers remain in the time series, as they carry important information about its nature,
while their abnormal values are reduced so that they do not negatively affect the learning capability of
2) Feature Selection: This component selects a subset of features among all candidate fea-
tures from the original dataset according to a feature goodness criterion [126]. There are several
features influencing the electricity spot price, including historical load and price, imports/exports,
capacity excess/shortfall, historical reserves, and generation types as well as calendar effects, e.g.,
day of week, month of year, seasonal and holiday effects [111]. In this paper, the Mutual Information
(MI) technique [52] was applied to the original database with hourly resolution to select the
smallest subset of key features contributing to forecast accuracy, while discarding
the remaining insignificant features. This feature selection technique is briefly described
in Appendix D, and the detailed formulations can be found in [52].
3) Model Selection: The forecasting model constructs the input/output mapping function,
where inputs come from the feature selection and the output is the forecast value. A number
of approaches have been presented in the literature for electricity price prediction for various pre-
diction horizons, prediction steps and applications. In general, the point forecasting engines fall
into three main categories of statistical time series models, computational intelligence models, and
hybrid models [111]. For instance, ensemble price forecasts from individual models are gener-
ated using linear regression models for price-directed demand management in smart grids [127].
In [11], an adaptive Wavelet Neural Network (WNN) is proposed for short-term price forecasting
that outperforms a number of prediction approaches, such as statistical models (Auto-regressive
Integrated Moving Average, ARIMA), Multi-Layer Perceptron (MLP) and Radial Basis Function
(RBF) neural networks, and a fuzzy neural network (FNN). Each group of models has its own
strengths and weaknesses. Selecting a model depends on the nature of the data and the application.
Weron [98] has provided a comprehensive review of state-of-the-art forecasting models for the
electricity price.
In the present work, the forecasting model itself is not the main focus, and therefore, an autore-
where yt is the output of the model (the price forecast). The first term on the right-hand side
of the formulation shows the auto-regressive part with lagged values of the target variable and
the corresponding parameters, i.e., ai . The second term represents the exogenous variables and
their associated parameters, i.e., bi . The parameters of the model are determined in the training
phase. \( \varepsilon_t \) is a normal white noise process with zero mean and constant variance. The input features of the
forecasting model are discussed later in this section. Despite the simplicity of the regression model,
it is specifically tailored in the proposed forecasting strategy for application to the operation of a
storage system. In other words, this work aims to build a forecasting strategy to provide informative
forecasts such that the operation of a BESS is optimized. In particular, the forecasting strategy
should be capable of capturing high prices and severe variations in real-time prices as much as
possible. The better price variations can be captured, the better an energy storage control system
can adapt its scheduling strategy. Accordingly, high-resolution market information is used along
with low-resolution market data in order to enable the model to detect sudden price fluctuations.
The low-resolution data includes the hourly market data selected by the feature selection. The
high-resolution data could be from any informative market data with higher resolution than one
Our studies showed that market clearing prices carry significant information regarding the price
variations in electricity markets. MCPs are set every five minutes in many electricity markets
[113–116], and the average of twelve MCPs in an hour represents the hourly price. MCPs contain
the information about the most recent state of the electricity market. For instance, in the event
of a contingency in power systems, e.g., generation outages, the consequent power imbalance is
reflected as a change of MCP in supply-demand curve. Here, four North American electricity
markets are studied to demonstrate the effectiveness of MCPs in detecting price variations. MCP
values over the year 2015 are considered for Ontario’s (Independent Electricity System Operator,
IESO), Alberta’s (Alberta Electric System Operator, AESO), Texas’ (Electric Reliability Council
of Texas, ERCOT), and New York’s (New York Independent System Operator, NYISO) electricity
markets. It is noted that MCPs are set every minute in Alberta’s market; hence, the average of
five one-minute MCPs is calculated to form MCPs with five-minute resolution in this study.
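The two averaging steps mentioned above (one-minute MCPs to five-minute MCPs, and twelve five-minute MCPs to an hourly price) amount to simple block means. The Python sketch below illustrates them; the function names and the sample values are assumptions for illustration only.

```python
from statistics import fmean
from typing import List, Sequence


def five_minute_mcps(one_minute_mcps: Sequence[float]) -> List[float]:
    """Average consecutive blocks of five one-minute MCPs."""
    if len(one_minute_mcps) % 5 != 0:
        raise ValueError("expected a multiple of five one-minute values")
    return [fmean(one_minute_mcps[i:i + 5]) for i in range(0, len(one_minute_mcps), 5)]


def hourly_price(mcps: Sequence[float]) -> float:
    """Hourly price as the average of the twelve five-minute MCPs in the hour."""
    if len(mcps) != 12:
        raise ValueError("expected twelve five-minute MCPs")
    return fmean(mcps)


minute_prices = [float(m) for m in range(60)]  # made-up one-minute MCPs for one hour
mcps = five_minute_mcps(minute_prices)
price = hourly_price(mcps)
```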
Being able to capture high-price hours is important in order to increase the profit gained by an
energy storage system. Price spikes are abnormally high prices that can be
distinguished by statistical methods based on the historical data. In [109], a price spike threshold is
defined using the mean (µ) and the standard deviation (σ) of historical prices as TSpike = µ + 2σ.
Electricity prices greater than TSpike are considered spikes. Thresholds are calculated for each
month, as the electricity price has seasonal trends. Once the price spikes for each month are
distinguished, the capability of MCPs to detect those price spikes is of interest in
this experiment. A threshold for MCPs, TMCP, is defined to investigate whether MCPs can
detect an identified price spike. For any hour in which a price spike has been identified,
MCPs in that hour that are greater than TMCP are likely to reflect the price spike in that
hour.
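The thresholding logic described above can be sketched in a few lines of Python. Two points are assumptions rather than statements from the text: the population standard deviation is used for σ, and an hour is treated as flagged when at least one of its first l MCPs exceeds TMCP. Function names and price values are hypothetical.

```python
from statistics import fmean, pstdev
from typing import List, Sequence


def spike_threshold(prices: Sequence[float], k: float = 2.0) -> float:
    """Threshold mu + k * sigma (k = 2 gives T_Spike; sigma is the population std here)."""
    return fmean(prices) + k * pstdev(prices)


def spike_hours(prices: Sequence[float], threshold: float) -> List[int]:
    """Indices of hourly prices strictly above the threshold."""
    return [i for i, p in enumerate(prices) if p > threshold]


def mcps_flag_spike(mcps_in_hour: Sequence[float], t_mcp: float, l: int) -> bool:
    """True if any of the first l MCPs of the hour exceeds T_MCP (assumed reading)."""
    return any(m > t_mcp for m in mcps_in_hour[:l])


month = [20.0] * 9 + [120.0]        # made-up hourly prices for one "month"
t_spike = spike_threshold(month)    # mean 30, sigma 30, so the threshold is 90
spikes = spike_hours(month, t_spike)
first_mcp_only = mcps_flag_spike([40.0, 95.0, 130.0], t_spike, l=1)
first_two_mcps = mcps_flag_spike([40.0, 95.0, 130.0], t_spike, l=2)
```

In this toy hour the first MCP alone does not reveal the spike, while waiting for the second one does, which is the trade-off between detection and waiting time.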
Here, two important factors should be considered: 1) the value of TMCP and 2) the number of
MCPs in the given hour. Regarding TMCP, two values, TMCP,1 = µ + 2σ and TMCP,2 = µ + σ,
are considered. The reason for considering two values is that an MCP with a value greater than
TMCP,1 = TSpike can clearly detect a price spike, while a price spike can also
potentially be detected by a lower MCP value, i.e., one above TMCP,2. With regard to the number of MCPs, it
is evident that the more MCPs are considered during an hour, the higher the chance of detecting
the price spike in the same hour. On the other hand, the cost of considering more MCPs is the
time lost during the hour waiting for new MCPs to be released. In other words, the
more MCPs are used, the longer the forecasting model has to wait to generate the forecast for a
given hour; such a late forecast may not be as useful for decision-making. In this
Spike Prediction Accuracy (SPA) measures the ability of correctly predicting spike occur-
Fig. 4.3 illustrates the value of SPA versus the number of MCPs considered for detecting price
spikes in four electricity markets. SPA is calculated for each month of the year 2015, and the average
over the year is reported. Fig. 4.3(a) takes into account TMCP,1 for evaluating the capability of
MCPs in spike detection. Evidently, the more MCPs are included, the higher the SPA value.
For instance, the chance of detecting a price spike is over 90% considering nine MCPs. However,
even considering only the first MCP in each hour results in a chance of at least around 40% of detecting a
spike (ERCOT). Fig. 4.3(b) shows the same experiment by considering TMCP,2. In fact, the values
Figure 4.3 panels: (a) TMCP,1 = µ + 2σ, (b) TMCP,2 = µ + σ
for MCPs do not need to be as high as TMCP,1 = TSpike to be able to detect the spike. A high MCP
with the value lower than TSpike could also be capable of detecting a high hourly price. SPA values
considering TMCP,2 are expected to be higher than those with TMCP,1 as the threshold. For instance,
there is at least 50% chance of detecting a spike using only the first MCP in an hour. Thus, this
experiment shows the effectiveness of MCP values as high-resolution market data for capturing
high prices. Inspired by this high capability, a forecasting strategy is proposed in this work to
The proposed forecasting strategy consists of two separate forecasting models, high-resolution
model (MHR ) and low-resolution model (MLR ), with different purposes, illustrated in Fig. 4.4. The
forecasting models provide day-ahead price predictions with an hourly resolution. MHR generates
the price forecast for the first step of the forecasting horizon, i.e., 1-hour-ahead forecast. Because of
this very short prediction horizon, MHR can take advantage of the high-resolution data available in
the current hour ending. Note that Hour Ending (HE) denotes the time when an hour ends, e.g., HE
23 represents the time period 22:00-23:00. Thus, hourly inputs selected by the feature selection,
\( SX^{HR} = \{ sx_1^{HR}, \ldots, sx_m^{HR} \} \), as well as a number of MCPs in the current HE, are fed to MHR to
predict the hourly electricity price of the same HE. A vector of high-resolution inputs is denoted
Figure 4.4 panels: (a) High-resolution model (MHR), (b) Low-resolution model (MLR)
by \( X_{MCP} = \{ MCP_1, MCP_2, \ldots, MCP_l \} \), where MCP1 corresponds to the MCP that is set for the
first five-minute interval, and so forth. l represents the number of MCPs fed to the high-resolution model.
As mentioned, MHR provides the price prediction for the first step of the forecasting horizon, i.e.,
the current HE. If the after-the-fact value of the current hour turns out to be a price spike, there is a
high chance that at least one of the included MCPs has an unusually high value leading to this hourly price
spike. Hence, the inclusion of these MCPs in the model increases the value of the output of the
model, i.e., the 1-hour-ahead price prediction. The more MCP values are included in
the model, the higher the chance that one of them reveals the hourly spike.
Once the first step is predicted, the forecast is used as an input in MLR, which provides
multi-step-ahead price predictions for the remaining hours of a day recursively. MLR is fed by
illustrates the structure of both models. Hourly Price Forecast (HPF) denotes the output for both
models.
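The hand-off between the two models can be sketched as below: the high-resolution model forecasts the current HE from hourly features and MCPs, and the low-resolution model then proceeds recursively, taking the previous forecast as one of its inputs. The callables, the feature layout (a dictionary keyed by HE) and the toy linear rules with their coefficients are hypothetical; the actual inputs of MLR are not reproduced here.

```python
from typing import Callable, Dict, Sequence

HighResModel = Callable[[Sequence[float], Sequence[float]], float]  # (hourly features, MCPs) -> price
LowResModel = Callable[[Sequence[float], float], float]             # (hourly features, previous forecast) -> price


def forecast_remaining_day(
    m_hr: HighResModel,
    m_lr: LowResModel,
    hourly_features: Dict[int, Sequence[float]],
    mcps_current_he: Sequence[float],
    current_he: int,
    last_he: int = 24,
) -> Dict[int, float]:
    """Return {HE: hourly price forecast} for current_he..last_he."""
    # Step 1: the high-resolution model covers the current hour ending.
    forecasts = {current_he: m_hr(hourly_features[current_he], mcps_current_he)}
    # Step 2: the low-resolution model fills in the remaining hours recursively.
    for he in range(current_he + 1, last_he + 1):
        forecasts[he] = m_lr(hourly_features[he], forecasts[he - 1])
    return forecasts


def toy_m_hr(features: Sequence[float], mcps: Sequence[float]) -> float:
    """Hypothetical rule: equal weights on one hourly feature and the first MCP."""
    return 0.5 * features[0] + 0.5 * mcps[0]


def toy_m_lr(features: Sequence[float], previous_forecast: float) -> float:
    """Hypothetical rule: mostly the hourly feature, partly the previous forecast."""
    return 0.8 * features[0] + 0.2 * previous_forecast


features = {22: [30.0], 23: [40.0], 24: [50.0]}  # made-up hourly inputs
hpf = forecast_remaining_day(toy_m_hr, toy_m_lr, features, [90.0], current_he=22)
```

A high first MCP (90 in this toy case) lifts the forecast for the current HE, and that effect then propagates, with decreasing weight, into the forecasts for later hours.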
The forecasting strategy applies the two forecasting models in the IRH framework. This
approach can mitigate the impact of forecast uncertainty by updating the forecasts at
each forecasting step over the forecasting horizon. In this work, updates of the price forecasts are
generated once new observations of the required market data are available, i.e., every hour. Fig.
4.5 demonstrates how the IRH updates the forecasts employing MHR and MLR. As
seen, the first set of forecasts is generated in HE 01 for all hours of the current day. MHR
generates the price forecast for the first hour, i.e., HPF 1, while MLR provides the forecasts for the
remaining hours of the day, i.e., HPF 2 to HPF 24. Forecasting horizons for MHR and MLR are
shown by red and blue arrows in the figure, respectively. Once the first hour has passed, both models
are used to update the forecasts up to HE 24 of the current day. MHR predicts the price for the
second HE, HPF 2, and MLR provides HPF 3 to HPF 24. This process is repeated every hour to
update the forecasts up to the last hour of the current day. Updates can go beyond the current day if
the optimization problem of the storage system is designed for multiple days. Once the updates are
available, they can be applied to the optimization platform of a behind-the-meter BESS to adjust
charging/discharging strategies.
In this section, first, Ontario’s electricity market is briefly described as a case study. The proposed
price forecasting strategy is then evaluated from both statistical and economic points of view.
Ontario’s wholesale electricity market comprises real-time energy and operating reserve markets.
Ontario demand is supplied by installed capacity within the province and imports from neighboring
power systems. The IESO accepts the lowest-cost offers to supply electricity until sufficient
generation is available to meet the demand. MCPs are set every five minutes, and the Hourly
Ontario Energy Price (HOEP), which is the average of the twelve MCPs in each hour, is published
Pre-Dispatch Price (PDP) is the publicly available price prediction published by the IESO.
PDPs are updated hourly and provided according to the conventional rolling horizon approach
explained in section 4.2.1. For instance, the 1-hour-ahead PDP for HE 2 is published in HE 1. This
is the most recent update on future electricity prices available in this market. A data analysis over a
two-year period (January 2014 to December 2015) reveals a significant deviation of 1-hour-ahead
PDP from HOEP values, i.e., over 40% error on average. For optimizing the operation of a behind-
the-meter storage system, this source of price forecasts may not be helpful because high prices are
likely to be missed. For this reason, the proposed price forecasting strategy is applied to Ontario’s
market to improve the forecasting accuracy, and consequently operation of such participants.
In [128], the authors evaluated several potential input variables for predicting HOEP. The re-
sults reveal that none of temperature, predicted shortfalls and predicted transmission constraints
can improve the forecast accuracy of hourly electricity prices in Ontario. The feature selection
selects hourly features including PDP, Ontario’s demand forecast, supply cushion, and lagged values
of HOEP as inputs of the two forecasting models. Although PDP is likely to have an error with
respect to electricity prices in real time, it is regarded as an initial price forecast for the two fore-
casting models. In [129], it is stated that price spikes are usually observed when the value of
supply cushion is very low, e.g., 10%. Supply cushion is an important publicly available variable
that includes the information about forecasts of demand and non-dispatchable generations (e.g.,
wind and solar generations), and import schedules. The number of MCPs considered for the high-
resolution model can be influential on the performance of the forecasting strategy from statistical
and economic aspects. This is studied in the following sections. Note that all the required market
data for this study are publicly available on IESO’s website [130].
4.3.2 Statistical Analysis
In this section, the proposed price forecasting strategy is evaluated from the statistical perspective,
i.e., the prediction errors of the generated price forecasts are analyzed. For this, the proposed
forecasting strategy is tested on publicly available data of Ontario’s electricity market in year 2015.
An error measure of Mean Absolute Error (MAE) is considered to evaluate the forecasting errors
where N denotes the number of hours in a month. PACT(t) and PFOR(t) represent the actual and
forecast values of price at hour t, respectively. Monthly forecasting errors in terms of MAE are
shown in Fig. 4.6. In this figure, the proposed strategy is applied with different numbers of MCPs
fed to MHR, e.g., l = 1, 2, ..., 6. As expected, the more MCP values are used, the lower the prediction
error, because of a higher chance of detecting price spikes during the predicted hour.
The average of forecasting errors in terms of MAE are respectively $7.24/MWh, $6.33/MWh,
son, 1-hour-ahead PDP published by the IESO is also shown in blue for all months. The proposed
forecasting strategy, even with l = 1, results in lower forecasting errors in all test months compared
to PDP values. The average forecasting error for PDP is $9.02/MWh. Therefore, using only
one MCP in the proposed strategy results in an improvement of about 20% in forecast accuracy in comparison
with PDP.
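The error measure and the quoted improvement can be checked with a short Python sketch. The MAE below is the standard definition (mean of absolute differences between actual and forecast prices over N hours), stated here as an assumption; the sample price series is made up, and $7.24/MWh is taken to be the average MAE for l = 1.

```python
from typing import Sequence


def mean_absolute_error(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """MAE = (1/N) * sum over t of |P_ACT(t) - P_FOR(t)|."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("series must be non-empty and of equal length")
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def relative_improvement(baseline_error: float, new_error: float) -> float:
    """Fractional reduction of the error with respect to the baseline."""
    return (baseline_error - new_error) / baseline_error


mae = mean_absolute_error([10.0, 20.0, 30.0], [12.0, 18.0, 33.0])  # toy series
gain = relative_improvement(9.02, 7.24)  # PDP vs. proposed strategy, $/MWh
```

Here `gain` evaluates to roughly 0.197, in line with the improvement of about 20% reported above.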
Fig. 4.7 displays hourly actual prices of Ontario’s market against 1-hour-ahead forecasts from
the proposed strategy and PDP values during the first week of August 2015. As seen, the proposed
model can satisfactorily track the price fluctuations and spikes. It should be noted that in many
scheduling and operational optimization problems, it is the occurrence of a price peak, rather than its
exact value, that is of interest. This is because the control system of a BESS only needs to know
when a price spike occurs, not its magnitude. For instance, there was a price spike of
$371/MWh at HE 41 of this test week, where the proposed model effectively detected this price
Figure 4.6: Forecasting errors in different months
To evaluate the performance of the proposed strategy in price spike detection, confusion
matrices for different numbers of MCPs are presented in Table 4.1. Here, the confusion matrix is a
2 × 2 square matrix indicating two classes of prices, i.e., price spikes and normal prices, for actual
and predicted price values. In this experiment, a total of 201 spikes are distinguished from
normal prices using TSpike, defined in section 4.2.2, for each month in 2015. For instance, including
one MCP (l = 1), 102 spikes have been correctly detected, while 99 spikes have been missed.
This results in 50.7% accuracy in spike detection, shown in the last column of the table. By contrast,
only 15 of the 8559 normal prices have incorrectly been detected as price spikes,
which leads to 99.8% accuracy in this class. As the number of MCP values used in the model is
increased, the spike detection accuracy is enhanced significantly, e.g., to 76.1% for l = 6, which is
expected according to Fig. 4.3. Although considering more MCPs results in more accurate
forecasts, an economic aspect should also be considered to assess whether more MCPs lead to higher
Figure 4.7: Price values for the first week of August 2015
In this section, the performance of the proposed forecasting strategy is evaluated from an economic
point of view. Thus, the generated price forecasts from the proposed strategy are applied to opera-
tion scheduling of a behind-the-meter BESS within a micro-grid facility in Ontario. The micro-grid
is designed to operate as backup power during a power outage of the main grid. A 500 kW Li-ion
BESS should provide emergency power for critical loads in a building of this micro-grid. The
remaining capacity of the BESS can be utilized to trade energy with the main grid.
Real-time electricity prices can provide end-users in electricity markets with the opportunity to
reduce their electricity costs by strategically responding to prices that vary with the time
of day [131]. The size of the battery is very small compared to the total load of the micro-grid,
and thus its operation does not cause any major issues in power flows. Hence, the BESS in the
micro-grid is modeled as an individual agent with the objective of maximizing its profit (or the net
energy arbitrage saving). This can also be interpreted as reducing the amount of energy purchased
from the wholesale market during peak hours when the electricity prices are high. The energy
is stored during off-peak hours when prices are low and injected back to the micro-grid/grid during
peak periods. It should be noted that other factors may impact the operation of a battery system
Table 4.1: Confusion matrices
No. MCP   Actual   Predicted Spike   Predicted Normal   Accuracy
l = 1     Spike    102               99                 50.7%
          Normal   15                8544               99.8%
l = 2     Spike    126               75                 62.7%
          Normal   8                 8551               99.9%
l = 3     Spike    133               68                 66.2%
          Normal   11                8548               99.9%
l = 4     Spike    140               61                 69.7%
          Normal   10                8549               99.9%
l = 5     Spike    147               54                 73.1%
          Normal   8                 8551               99.9%
l = 6     Spike    153               48                 76.1%
          Normal   7                 8552               99.9%
inside a microgrid (e.g., renewable energy fluctuations or load balance). The microgrid operator
may consider those factors, in addition to prices, when making charging/discharging decisions.
A significant input for this optimization problem is electricity price forecasts of Ontario’s mar-
ket, which are provided by the proposed forecasting strategy. The formulation for this optimiza-
tion problem is provided in Algorithm (2). Accordingly, the optimization problem is run every
hour up to the last hour of the day. The value of γ is calculated using the number of MCPs (l),
γ = (l + 1)/12.
In Ontario’s market, MCP values are published 3 minutes after each five-minute interval, and
this delay should be taken into account in the optimization problem. For instance, given that only
the first MCP is used, the forecasting strategy generates the forecasts eight minutes past each hour.
The forecasts are then used in the BESS optimization platform, and hence 10 minutes are expected
to elapse before the scheduling updates of the BESS are available. The emergency load is 150 kWh
and the Depth of Discharge (DOD) for this battery is 70%. This leaves 200 kWh of available capacity
for the operation of this battery in energy arbitrage. Let us assume that the BESS can only have one
full charge-discharge cycle per day. Obviously, the more cycles the battery can be operated per
day, the more profit can be gained; however, more cycling may decrease the battery’s lifetime.
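The timing and capacity arithmetic above can be written out as a short Python sketch. The function names are hypothetical, and generalizing the "10 minutes for l = 1" figure to (l + 1) five-minute intervals is an assumption that simply restates γ = (l + 1)/12 in minutes.

```python
from fractions import Fraction

MCP_INTERVAL_MIN = 5       # MCPs are set every five minutes
PUBLICATION_DELAY_MIN = 3  # each MCP is published 3 minutes after its interval


def forecast_ready_minute(l: int) -> int:
    """Minute past the hour at which the l-th MCP is published and forecasts can be made."""
    return l * MCP_INTERVAL_MIN + PUBLICATION_DELAY_MIN


def schedule_ready_minute(l: int) -> int:
    """Minute past the hour at which the updated schedule is assumed to be available."""
    return (l + 1) * MCP_INTERVAL_MIN


def gamma(l: int) -> Fraction:
    """Fraction of the hour that still follows the previous schedule: (l + 1) / 12."""
    return Fraction(l + 1, 12)


def arbitrage_capacity(rated: Fraction, depth_of_discharge: Fraction, emergency: Fraction) -> Fraction:
    """Capacity left for arbitrage: rated capacity times DOD, minus the emergency reserve."""
    return rated * depth_of_discharge - emergency


g1 = gamma(1)
usable = arbitrage_capacity(Fraction(500), Fraction(7, 10), Fraction(150))
```

For l = 1 this gives forecasts at minute 8, schedules at minute 10 and γ = 1/6, and the capacity calculation returns 200, matching the figures in the text.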
In the first experiment, the proposed strategy generates forecasts considering different numbers
of MCPs as inputs (l = 1, ..., 6). The aforementioned operation scheduling is then applied to
calculate the amount of money saved in energy cost. The goal is to find an optimal value for l (or
γ). The total money saved in each month is shown in Fig. 4.8. Interestingly, this figure
shows that the more accurate forecasts generated with a higher number of MCPs do not necessarily
lead to higher profits gained by the BESS. The monthly averages of the money saved, shown in the
last column, are $254.1, $251.7, $246.5, $238.5, $225.9 and $221.1 for l = 1 to l = 6, respectively.
The relationship between the forecasting performance and the corresponding economic value can be
interpreted using the parameter γ. An increase in the value of l means that the forecasting strategy
has to wait longer to use more MCPs as inputs of the model. This consequently leads to a higher
value for γ in the optimization problem. With a larger γ, although the generated forecasts for the
current hour are more accurate, a smaller portion of the current hour, proportional to (1 − γ), uses
such accurate forecasts for scheduling the BESS. This can be seen from Fig. 4.2 when the value
of γ increases and (1 − γ) decreases. According to the same figure, the first portion of the current
hour, proportional to γ, has already been scheduled in the previous hour using the forecasts generated
in that hour, which are not the latest forecast update for the current hour. As a result, more
accurate predictions from a higher value of l are not necessarily more effective for the BESS from
the economic point of view. Thus, the optimal value of γ that results in the highest economic value
is γ = 1/6 (γ = (l + 1)/12 with l = 1). This means that only the first MCP of the current hour should be
fed into the model. Thus, the forecasting strategy can be tuned by including only the first MCP in its
high-resolution model, because it leads to the highest economic benefits although the forecasting
For the sake of comparisons, four operational strategies are considered based on available price
1. PDP Scheduling: This scheme considers PDP values with hourly updates published by the IESO
as available price forecasts. PDPs are applied to the optimization problem in accordance with the
Figure 4.8: Total money saved for different number of MCPs
2. Proposed Strategy Scheduling: In this scheme, price forecasts with hourly updates generated
by the proposed forecasting strategy (with l = 1) are fed into the optimization platform (Algorithm
(2)). The operational schedules of the BESS are updated every hour as the updates of price fore-
casts are generated. The scheduling horizon is kept the same, up to HE 24 of the day.
3. Ad-hoc Strategies: It is considered that the BESS has an unchanging operation strategy rather
than finding the optimal operation scheduling using price forecasts fed into the optimization prob-
lem. Two different ad-hoc strategies for the operation of the BESS are considered as below:
3.a. Ad-hoc #1: Weighted averages of electricity prices from 2002 to 2013 are calculated for each
month separately. Hours with the lowest and the highest electricity prices are correspondingly con-
sidered for charging and discharging. For instance, the charging hour is HE 4 and the discharging
3.b. Ad-hoc #2: Charging and discharging decisions for the current day are made based on the pro-
file of electricity price in the previous day. Hours with the lowest and the highest electricity prices
in the previous day are considered for charging and discharging in the current day, respectively.
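The rule of ad-hoc #2 can be sketched in a few lines of Python. This is only a minimal illustration: the price profiles, the 0.4 MWh of traded energy and the 80% efficiency applied at discharge are hypothetical values chosen for the example, not data from this study.

```python
def adhoc2_hours(prev_day_prices):
    """Return (charge_HE, discharge_HE), hour-ending indices starting at 1,
    taken from the lowest and highest prices of the previous day."""
    hours = range(len(prev_day_prices))
    charge = min(hours, key=lambda h: prev_day_prices[h]) + 1
    discharge = max(hours, key=lambda h: prev_day_prices[h]) + 1
    return charge, discharge

def arbitrage_saving(prices, charge_he, discharge_he, energy_mwh, efficiency=1.0):
    """Saving ($) from buying energy_mwh at charge_he and using it at discharge_he.
    The efficiency is applied to the discharged energy (an assumption)."""
    cost = prices[charge_he - 1] * energy_mwh
    avoided = prices[discharge_he - 1] * energy_mwh * efficiency
    return avoided - cost

# Hypothetical hourly prices ($/MWh) for the previous and the current day.
yesterday = [20.0] * 24
yesterday[3], yesterday[17] = 5.0, 90.0   # cheapest at HE 4, most expensive at HE 18
today = [22.0] * 24
today[3], today[17] = 8.0, 60.0

charge_he, discharge_he = adhoc2_hours(yesterday)
saving = arbitrage_saving(today, charge_he, discharge_he, energy_mwh=0.4, efficiency=0.8)
print(charge_he, discharge_he, round(saving, 2))
```

Because the decision uses only yesterday's profile, the saving depends entirely on how well that profile repeats in the current day.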
In this experiment, the potential profit that could be achieved by operating the BESS based on
Figure 4.9: Monthly total money saved by operating the BESS
perfect forecasts is calculated. Having the perfect price forecasts, the micro-grid could have po-
tentially saved $4,688 in total in energy cost by operating the BESS over the year 2015. Applying
the proposed forecasting strategy to the developed optimization platform, 62% of the potential
saving could be captured (total $2,937). Scheduling based on available PDP could only capture
43% of this potential (total $2,019). Hence, applying the proposed forecasting strategy could result
in around 45% improvement over the use of available PDP in the profit gained by operating the
BESS. The results indicate that the total profits based on two ad-hoc strategies are only $1,407 and
$1,120, respectively. Fig. 4.9 illustrates the total amount of money saved by operating the BESS
for each month by adopting different strategies compared with the potential profit. It is also noted
that the size of the battery can affect the total energy arbitrage saving, while this is not the focus of
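The shares quoted above follow directly from the annual totals. The small Python check below recomputes them from the dollar figures given in the text; the dictionary keys and function name are illustrative.

```python
# Annual savings ($) quoted in the text for the year 2015.
perfect_forecast_saving = 4688.0
strategy_saving = {
    "proposed": 2937.0,
    "PDP": 2019.0,
    "ad-hoc #1": 1407.0,
    "ad-hoc #2": 1120.0,
}

def capture_ratio(saving, potential):
    """Share of the perfect-forecast saving captured by a strategy."""
    return saving / potential

capture = {name: capture_ratio(s, perfect_forecast_saving)
           for name, s in strategy_saving.items()}
# Relative gain of the proposed strategy over PDP-based scheduling.
gain_over_pdp = strategy_saving["proposed"] / strategy_saving["PDP"] - 1.0

for name, ratio in capture.items():
    print(f"{name:10s} captures {100 * ratio:.1f}% of the potential saving")
print(f"proposed vs PDP: {100 * gain_over_pdp:.1f}% higher saving")
```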
An example of the operation of the BESS on October 2nd, 2015 is presented for different
strategies in Fig. 4.10. Observe that the proposed strategy could effectively detect the price spike
at HE 10, while PDP failed to do so. Therefore, the BESS is scheduled to discharge when the price
is at its highest value. For both ad-hoc strategies, equal discharging rates are considered. It is also
noted that the charging hour for ad-hoc #2 is at HE 6, which overlaps in the figure with the one for
Figure 4.10: BESS schedules based on different strategies for October 2nd, 2015
PDP scheduling. The amount of money saved by applying the proposed strategy is $140.58 for
this particular day. This amount is 52 times greater than the corresponding amount obtained from
scheduling based on PDP values, i.e., $2.67. The amounts of saved money are respectively $3.03
and $3.59 for the first and the second adopted ad-hoc strategies. This figure clearly depicts the
effect of accurately detecting price spike occurrences in the operation of a storage system. In this
practical application, it is of high importance to predict the exact timing of price spikes in order
to discharge the battery accordingly. It is noted that the higher the difference between electricity
prices at charging and discharging hours, the higher would be the saved money by operating the
BESS.
In the last experiment, we evaluate the potential for gaining higher economic values by increas-
ing the number of cycles in the operation of the BESS. It is noted that frequent and deep cycles
accelerate cyclic aging and consequently reduce the life of the battery. For any type of battery,
the number of cycles is usually defined as a function of depth of discharge, which can be obtained by
a fitting technique using detailed experimental data provided by manufacturers [132]. However,
since we do not have access to such data, we assume a constant depth of discharge for analyzing the
economic impact of a higher number of cycles. Fig. 4.11 shows the total saved money obtained
Figure 4.11: Economic effect of number of cycles for the BESS
by increasing the number of cycles. As expected, the higher the number of cycles, the higher the
potential profit would be if perfect price forecasts were available. For instance, the potential profit
increases from $4,688 with one cycle per day to $6,283 with 4 cycles per day. It is observed from
this figure that the total saved money starts to saturate once the battery is operated with more than
two cycles per day. This is because the opportunity for arbitrage during a day is limited by
the price profile, and therefore, an increase in the number of cycles does not necessarily lead to a
significant increase in the total saved money. A higher number of cycles for the proposed strategy,
on the other hand, does not result in any noticeable change in the total saved money. This can be
explained by the effect of forecasting errors along with the limited arbitrage capacity during a day.
Finally, the high forecasting errors associated with PDP even lead to lower total saved money as the
number of cycles increases. As a result, this BTM BESS is economically better off being operated with one
4.4 Conclusions
In this paper, a forecasting strategy is proposed to provide accurate price forecasts for operation
of behind-the-meter storage systems. This strategy includes two separate forecasting models to
take advantage of high-resolution market data along with hourly data in order to capture price
spikes as much as possible. The proposed intra-hour rolling horizon framework is applied to
update the forecasts on an hourly basis. According to the statistical analysis, the proposed strategy
results in a 20% improvement in forecast accuracy compared to available PDPs, and has a high capability
of detecting price spikes. The generated forecasts are fed to an optimization platform for operation
scheduling of the BESS within a micro-grid facility. It is concluded that the BESS can deliver more
economic value using the price predictions generated by the proposed forecasting strategy, 62%
of the potential saving, in comparison with a number of other strategies, e.g., 43% of the potential
Acknowledgment
The authors would like to thank NRGStream Inc. for providing a complimentary license to use
their market data collection platform. We would also like to thank GE Digital Energy for financial
support of this work. Finally, we want to thank Dr. Mostafa Kazemi for his comments on operation
Chapter 5
Microgrids 1
5.1 Nomenclature
Parameters:
$P_t^{L,Up}$        Upper bound of load forecast for hour $t$, (MW)
$\lambda_t^{Low}$   Lower bound of price forecast for hour $t$, (\$/MWh)
Variables:
$P_{i,t}^{Gen}$     Power of diesel generator $i$ at hour $t$, (MW)
$u_t^{Grid}$        State of the grid at hour $t$
5.2 Introduction
With the increasing interest in distributed energy resources and microgrids in recent years, the
optimal operation of such energy systems has become important. In practice, operating a
grid-connected microgrid in the most economical way possible can be a challenging task due to a number
their operation because of the uncertainties attached to such intermittent energy resources [134].
In addition, grid-connected microgrids capable of trading energy with the main grid are subject
to the risks of fluctuations in electricity market prices, which can affect the economics of the
microgrid [135]. Another challenge in the operation of microgrids is the non-smooth behavior
of the electricity load at microgrid levels [6]. Unlike the smooth variations of electricity loads in
power systems, microgrid loads can have severe variations that negatively affect their predictability.
Hence, accurate short-term forecasting tools are essential for economic energy management in
microgrids [7].
Many approaches have been presented in the literature for energy management of microgrids.
Typically, point forecasts of the electricity load, price and renewable energy generation are fed to
an optimization problem to schedule the dispatchable units within the microgrid [136]. However,
it has been argued that point forecasting does not provide any information regarding uncertainties
associated with forecasts, and thus cannot be fully relied on for decision-making [101]. Hence,
different strategies have been introduced to mitigate the effect of forecasting errors on the operation
of microgrids.
An efficient approach to deal with the effect of forecasting errors is to update forecasts, and
consequently the set-points of dispatchable units at each scheduling step (e.g., every hour or 15-
min) [137]. This is called the Rolling Horizon (RH) technique that can significantly reduce the
effect of the forecasting errors on microgrid operation [123, 138]. Another approach to deal with
uncertainties associated with forecasts is to generate different scenarios for predictions [134, 139,
140]. For instance, Khodaei [139] considered 1000 scenarios for sensitivity analysis of load, price
and wind forecast errors with uniform random error of 10%, 30% and 30%, respectively. Monte
Carlo simulations were used to generate 10000 scenarios for electricity load in [140]. However,
in addition to a high computational burden [141], such scenarios are usually generated randomly
according to the distribution of random variables [142], which may not be available or even realistic
enough [143].
forecasting. This gives an operator insight into the extent to which forecasts can be trusted
[144]. Prediction intervals are used in a Robust Optimization (RO) formulation that considers the
worst-case scenario for operation scheduling of the microgrid [143,145]. RO is used to incorporate
the uncertainty from wind power generation in the operation of microgrids in [146–148]. To do so,
prediction intervals are considered as lower and higher bounds of forecasts based on the distribution
of uncertain variables.
of alternative methods and their merits under different circumstances has not been investigated in
the literature. In this paper, as the first approach, the RH technique is applied to update the point
forecasts, and consequently the schedules of the microgrid are updated every hour. In the second
approach, we apply the Quantile Regression Averaging (QRA) probabilistic model to generate
prediction intervals (PIs) for the electricity load, price and wind power generation with different
confidence levels. The generated PIs are fed into the RO formulation to find the optimal schedules
of the microgrid components in the worst-case scenario. The third approach is the combination
of the first two strategies, i.e., generating PIs with RH technique for hourly updates. The main
contribution of this paper is exploring the impact of these three approaches on the economic
performance of microgrids under different scenarios of wind power generation levels and electricity
price volatilities. The significance of this work is to determine which approach the operator should
adopt to meet the highest economic performance of the microgrid under different scenarios.
The remaining parts of the paper are organized as follows. The forecasting methodologies for
electricity load, price and wind power generation are presented in section 5.3. In section 5.4, the
optimization platforms for the operation of the microgrid are formulated. Statistical and economic
analyses are provided in section 5.5. Finally, section 5.6 concludes the paper.
In this section, the forecasting methodologies for electricity load, price and wind power are pre-
sented. The generated forecasts are then fed into the optimization algorithm for operation schedul-
Deterministic forecasting models provide point predictions of the target variable of interest. Many
approaches have been presented for short-term point forecasting of electricity load [72], electricity
price [98] and wind power generation [149] in the literature. In general, the forecasting models fall
into three main groups: statistical models, neural network-based models and hybrid models. Each
model has its own strengths and weaknesses; hence, there is no single model that can always
outperform others in terms of different forecasting criteria. Since the forecasting engine itself is
not the contribution of this paper, we implemented the well-known linear autoregressive model
with exogenous variable (ARX) for predicting electricity loads, market prices and wind power
where $y_t$ is the output of the model (i.e., the forecast) at hour $t$. The first term on the right-hand
side of the formulation represents the auto-regressive part with lagged values of the target variable
($y_{t-k}$) and the corresponding parameters ($a_k$). The second term includes the exogenous variables
($x_{t-l}$) with the parameters ($b_l$). The parameters of the model are determined in the training phase.
The noise term $\varepsilon_t$ is a normal white noise process with zero mean and finite variance.
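To make the structure of the ARX forecast concrete, the following Python sketch evaluates the two terms described above for given parameters. It is a minimal illustration under stated assumptions: the lag indices are assumed to start at 1 for both sums, the noise term is omitted because a point forecast uses its zero mean, and the parameter values and inputs are hypothetical.

```python
def arx_forecast(a, b, y_lags, x_lags):
    """Point forecast of an ARX model.

    a[k-1] multiplies y_{t-k} (auto-regressive part);
    b[l-1] multiplies x_{t-l} (exogenous part).
    y_lags = [y_{t-1}, y_{t-2}, ...], x_lags = [x_{t-1}, x_{t-2}, ...].
    """
    ar_part = sum(a_k * y_k for a_k, y_k in zip(a, y_lags))
    exo_part = sum(b_l * x_l for b_l, x_l in zip(b, x_lags))
    return ar_part + exo_part

# Hypothetical parameters, as if already estimated in the training phase.
a = [0.5, 0.25]          # weights of y_{t-1} and y_{t-2}
b = [2.0]                # weight of x_{t-1} (e.g., a temperature input)
y_hat = arx_forecast(a, b, y_lags=[100.0, 90.0], x_lags=[10.0])
print(y_hat)
```

In practice the parameters would be estimated by least squares on the training samples; that step is not shown here.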
Despite the simple structure of the ARX model, an effective forecasting strategy, in which the
forecasting model is implemented, could significantly enhance the performance of such models.
For instance, a forecasting strategy may include: i) an efficient pre-processing stage, in which the
most informative inputs are selected, ii) a precise forecasting timeline that runs the online model
at the exact time when all the essential inputs are available, and iii) a proper post-processing stage
to fine-tune the generated forecasts. The forecasting strategies for electricity load, price and wind
A few research works have highlighted the challenges of electricity load predictions for residential
buildings and microgrids due to high volatility of such loads compared to electric loads of power
systems [13, 76]. To overcome the non-smooth behavior of the microgrid load, an effective pre-
processing stage is very important. In this paper, the feature selection technique presented in [150]
is implemented to select the most informative input features. The candidate inputs consist of
lagged values of the electricity load, temperature, minimum and maximum loads of the previous
day and the same day in the previous week, and the hours corresponding to occurrences of peak
load and ramp-up for previous day and the same day in the previous week.
The forecasting strategy also includes two separate models for forecasting electricity loads of
weekdays, and weekends and holidays. The strategy of generating forecasts for different days of
the week from different models would result in better forecasting accuracy since the electricity load is
highly dependent on daily load profiles. In the training phase, the parameters of the model, defined
in equation (5.1), are determined. Having used two models for load forecasting, the historical data
are divided into two groups of weekdays, and weekends and holidays.
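The split of the historical data into the two groups can be sketched as follows. This is an illustrative Python snippet only: the holiday set and the load values are hypothetical inputs, and each group would then be used to train its own model.

```python
from datetime import date

def day_group(d, holidays=frozenset()):
    """Classify a date as 'weekday' or 'weekend_holiday'."""
    if d.weekday() >= 5 or d in holidays:   # 5 = Saturday, 6 = Sunday
        return "weekend_holiday"
    return "weekday"

def split_by_day_type(samples, holidays=frozenset()):
    """Split (date, load) samples into the two training groups."""
    groups = {"weekday": [], "weekend_holiday": []}
    for d, load in samples:
        groups[day_group(d, holidays)].append((d, load))
    return groups

# Hypothetical daily peak loads (kW); the holiday list is user-supplied.
holidays = frozenset({date(2016, 7, 1)})
samples = [
    (date(2016, 7, 1), 610.0),   # Friday, listed as a holiday
    (date(2016, 7, 2), 540.0),   # Saturday
    (date(2016, 7, 4), 720.0),   # Monday
]
groups = split_by_day_type(samples, holidays)
print({name: len(rows) for name, rows in groups.items()})
```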
When it comes to short-term electricity price forecasting, the application for using generated fore-
casts becomes critical. Here, the application for the price forecasting tool is the operation schedul-
ing of a grid-connected microgrid. A microgrid does not need to submit any offers/bids into the
market, and it is mainly treated as a load with local generation. Hence, the operation and energy ar-
bitrage of microgrids are usually through real-time markets for which the predictions for real-time
electricity prices are required. Note that this is subject to the structure of the electricity market of
interest; however, real time settlements are also performed even in the markets that have day-ahead
Moreover, price forecasting methodologies could vary significantly for different electricity
markets with different structures. Thus, a price prediction tool should be tailored specifically for
a particular electricity market. In this paper, we focus on Alberta’s electricity market in Canada.
Alberta’s real-time electricity market is a Balancing Authority (BA) connected to the Western
Electricity Coordinating Council (WECC). The Alberta Electric System Operator (AESO) oversees the
competitive electricity market and operates the power grid. The electricity price, established every
minute, is the System Marginal Price (SMP) and the Pool Price (PP) is the average of SMPs in
every hour. The AESO publishes SMPs, PPs, as well as price forecasts up to 3 hours ahead on its
website [151]. Using the same feature selection technique [150] as applied to load forecasting, the
main drivers for pool price predictions include lagged pool prices, SMPs, AESO’s price predic-
tions for 3 hours ahead, hourly average of historical pool prices for the last week, and historical
Having access to such publicly available market data and using the multi-variate linear regres-
sion model presented in (5.1), a forecasting strategy is proposed to predict hourly pool prices of
Alberta’s market using three different forecasting models. In our previous research work [14], we
showed that high-resolution market data such as market clearing prices (e.g., SMP in Alberta’s
market) are of high importance in capturing severe price variations in a real-time market. In this
paper, we take advantage of the first one-minute SMP along with the pool price and the demand
forecasts published by the AESO to generate the pool price forecast for the first hour of the
forecasting horizon (e.g., 24-hour-ahead). Thus, the first model, M1P, is specifically used for 1-hour-ahead pool price
prediction. The AESO’s forecasts can also be used as inputs for generating 2-hour-ahead and 3-
hour-ahead pool price forecasts. To do so, M2P is fed by AESO’s pool price forecasts as well as
the demand forecast and the average of pool price over the last week for a given hour of the day.
Finally, lagged pool prices, the average of pool prices for a given hour of the day over the last
week, and demand forecasts are fed into M3P to generate pool price forecasts from 3 to 24 hours
ahead.
This forecasting strategy was used to effectively generate price forecasts for Alberta’s electric-
A microgrid may consist of a few small wind turbines, which are usually less than 50 kW for
residential areas [152]. The aggregate wind power generation is of interest for short-term predic-
tion. Having applied the same feature selection technique, the selected features with the highest
impact on wind power generation are lagged values of wind power and wind speed, and forecast
values of wind speed and temperature and their squared values. The historical and forecast values
of weather data are usually available online on weather station websites. In this paper, we used
the Environment Canada website to gather the weather data [153]. The regression model (5.1) is fed by
these input features to generate hourly wind power forecasts for the next 24 hours.
Deterministic forecasting does not provide any information about uncertainties of the provided
point forecasts. Alternatively, probabilistic forecasting is a way to quantify such prediction uncer-
tainties. In this way, when prediction intervals with a specific confidence level are generated, an
operator can assess the extent to which these results could be trusted. Various state-of-the-art prob-
abilistic forecasting models have been presented in the literature [101]. Presented by Nowotarski
and Weron in [154], Quantile Regression Averaging (QRA) is used as an efficient method for
generating probabilistic forecasts for the electricity load, electricity price and wind power generation in
this paper. QRA provides prediction intervals using point forecasts from different individual
deterministic models [155]. To do so, individual models can have different forecasting engines (e.g.,
different statistical models and/or neural networks), or the same forecasting model (e.g., a regres-
sion model) can be applied with different sets of input features to generate various point forecasts.
In this paper, we used the latter approach to generate different point forecasts.
To generate prediction intervals using QRA, the following procedure is performed: i) a number
of individual deterministic models are developed, ii) the individual models are trained using a
number of training samples, iii) point forecasts from the individual models are generated, iv) QRA
is trained using the generated point forecasts, and finally v) prediction intervals for the test samples
are generated. Using a set of point forecasts, the QRA model is trained as follows:
where $Q(q \mid \cdot)$ is the conditional $q$th quantile of the target variable, $X_t$ is the vector of $n$ point
forecasts for a given time $t$, and $\beta_q$ is the vector of parameters for quantile $q$. The parameters are
determined by minimizing the loss function over the vector of parameters for a particular $q$th
quantile, as follows:
$$\min_{\beta_q} \; \sum_{t:\,A_t \geq X_t\beta_q} q\,|A_t - X_t\beta_q| \;+ \sum_{t:\,A_t < X_t\beta_q} (1-q)\,|A_t - X_t\beta_q| \qquad (5.3)$$
where At is the actual value of the target variable at time t. In this process, the parameters of the
In this work, we consider five confidence levels of 50%, 60%, 70%, 80% and 90% to investigate
their impacts on the operation of the microgrid. With a confidence level of 90%, prediction inter-
vals are expected to cover the real values of the target variable with the probability of 90%. To have
a confidence level of 90%, the lower and the upper quantiles should be 5% and 95%, respectively.
Thus, the value of the quantile should be set to 5% (q = 0.05) to estimate the vector of parameters
for the lower bound, $\beta_{0.05}$. Likewise, the value of the quantile is set to 95% (q = 0.95) to determine
the vector of parameters for the upper bound, $\beta_{0.95}$. A similar process is performed for other
confidence levels. In addition, similar to the point forecasting models, the prediction intervals are
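As a hedged sketch of the two ingredients described in this subsection, the Python snippet below maps a confidence level to its lower and upper quantiles and evaluates the loss of Equation (5.3) for a given parameter vector. The function names and the sample numbers are illustrative; the minimization over the parameter vector itself (typically a linear program) is not shown.

```python
def quantile_pair(confidence_level):
    """Lower and upper quantiles of a central prediction interval,
    e.g. 0.90 -> (0.05, 0.95)."""
    lower = (1.0 - confidence_level) / 2.0
    return lower, 1.0 - lower

def qra_loss(q, actuals, point_forecasts, beta):
    """Loss of Equation (5.3) for quantile q.

    point_forecasts[t] is the vector X_t of individual point forecasts,
    beta is the parameter vector for this quantile.
    """
    total = 0.0
    for a_t, x_t in zip(actuals, point_forecasts):
        fitted = sum(b * x for b, x in zip(beta, x_t))
        err = a_t - fitted
        # q * |err| when the actual is at or above the fit, (1 - q) * |err| otherwise.
        total += q * err if err >= 0 else (1.0 - q) * (-err)
    return total

# Hypothetical data: three hours, two individual point forecasts per hour.
actuals = [10.0, 12.0, 12.0]
X = [[9.0, 11.0], [12.0, 12.0], [10.0, 8.0]]
beta = [1.0, 0.0]          # uses only the first individual forecast

q_low, q_up = quantile_pair(0.90)
loss_up = qra_loss(0.95, actuals, X, beta)
loss_low = qra_loss(0.05, actuals, X, beta)
print(q_low, q_up, loss_up, loss_low)
```

With these numbers the fit under-forecasts in two hours, so it is penalized much more heavily at the upper quantile than at the lower one, which is what pushes the upper-bound parameters upward during training.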
5.4 Optimization Platform
In this section, after a brief overview of microgrids, deterministic and robust optimization platforms
A grid-connected microgrid with different components is considered in this paper. The microgrid
includes an electric load, a number of Diesel Generators (DGs), a Battery Energy Storage System
(BESS), as well as wind power generation from a few small wind turbines. The microgrid is
connected to the main grid, and thus it can purchase shortage power from the grid and sell excess
power to the grid as well. The objective is to schedule the units of the microgrid such that the
total cost of serving the microgrid load is minimized. An Energy Management System (EMS)
is the main microgrid controller that sends directives to the dispatchable units. The optimization
platform could be embedded in the EMS. In this way, the required forecasts of the load, price, and
wind power are fed into the optimization problem, and the outputs are schedules of the units.
For the microgrid with components mentioned above, a deterministic operation scheduling prob-
lem of the microgrid is formulated in this section. Deterministic scheduling is one way to schedule
the operation of a microgrid, in which a set of point forecasts (e.g., load, price and wind power) is
fed to the optimization problem, while their associated uncertainties are not considered [141, 142].
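$$\min_{\Psi} \; \sum_{t=H}^{N_T} \left[ \hat{\lambda}_t P_t^{Grid} + \sum_{i=1}^{N_G} \left( FC_{i,t}(P_{i,t}^{Gen}) + SUC_{i,t} \right) \right] \qquad (5.4)$$
$$FC_{i,t}(P_{i,t}^{Gen}) = a_i u_{i,t} + b_i P_{i,t}^{Gen} \qquad (5.5)$$
s.t.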
In the formulations above, the objective is to minimize the cost of operating the microgrid up
to NT hours into the future, e.g., 24-hour-ahead, shown in Equation (5.4). The total cost includes
the fuel costs and start-up costs of generators, plus the power purchased from the main grid. The
fuel cost, shown in Equation (5.5), is a function of the power generated by the generator. The
objective function is subject to a set of operational constraints. The start-up cost is formulated
using the state of the generator, presented in Equations (5.6) and (5.7). Equation (5.8) shows the
power balance constraint, in which the total generation should be equal to the electric load at each
time. PtGrid is positive when the microgrid purchases energy from the grid, while it is negative
when the microgrid sells the excess power to the grid. In other words, when PtGrid is negative,
the grid is considered as a load that is supplied by the local generation in the microgrid.
Equations (5.9) and (5.10) show the operation limits of diesel generators. It should also be noted that
ramp up/down constraints of diesel generators are ignored in this study due to fast ramping rates
of small-scale diesel generators. Equations (5.11)-(5.15) also represent the operational constraints
of the BESS. The outputs of this optimization problem are the optimal values for the set of
variables, i.e., $\Psi = \{P_{i,t}^{Gen}, u_{i,t}, SUC_{i,t}, P_t^{Grid}, P_t^{B}, E_t^{B}\}$. In other words, the solution of the
optimization problem is the schedules of the dispatchable/controllable units in the microgrid. The rolling
horizon framework updates the forecasts at each forecasting origin, i.e., H, and consequently the
schedules every hour to ensure the most economical operation of the microgrid in an operating
day.
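For a given candidate schedule, the objective of Equation (5.4) with the fuel cost of Equation (5.5) can be evaluated as in the Python sketch below. This is not the optimization itself, only a cost evaluator. The start-up cost rule (a fixed charge whenever a unit switches from off to on, with all units assumed off before the first hour) is an assumption made for the example, and all numerical values are hypothetical.

```python
def fuel_cost(a_i, b_i, u, p_gen):
    """Equation (5.5): FC = a_i * u + b_i * P_gen."""
    return a_i * u + b_i * p_gen

def total_cost(price, p_grid, gens):
    """Objective (5.4) for a given schedule over hourly steps.

    price[t]  : forecast price ($/MWh)
    p_grid[t] : grid exchange (MW), positive = purchase, negative = sale
    gens      : list of dicts with keys a, b, startup, u (on/off list), p (MW list)
    """
    cost = 0.0
    for t in range(len(price)):
        cost += price[t] * p_grid[t]
        for g in gens:
            cost += fuel_cost(g["a"], g["b"], g["u"][t], g["p"][t])
            was_on = g["u"][t - 1] if t > 0 else 0   # assumed off before the horizon
            if g["u"][t] == 1 and was_on == 0:
                cost += g["startup"]                 # assumed start-up cost rule
    return cost

# Hypothetical three-hour schedule with one diesel generator.
price = [30.0, 80.0, 40.0]
p_grid = [0.5, -0.1, 0.3]
gens = [{"a": 5.0, "b": 60.0, "startup": 10.0, "u": [0, 1, 0], "p": [0.0, 0.4, 0.0]}]
cost = total_cost(price, p_grid, gens)
print(round(cost, 2))
```

In a rolling horizon setting, such an evaluation would be repeated from each forecasting origin H with the updated forecasts.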
Robust optimization is an interval based approach in which a confidence gap is defined around
the forecast parameters [156]. The worst case realization of the uncertainties is then evaluated
for the decision-making process. Therefore, RO can be fed by prediction intervals with different
confidence levels for the operation scheduling of the microgrid. The RO formulation is presented
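$$\min_{\Psi} \; \max_{\Omega} \; \sum_{t=H}^{N_T} \left[ \lambda_t P_t^{Grid} + \sum_{i=1}^{N_G} \left( FC_{i,t}(P_{i,t}^{Gen}) + SUC_{i,t} \right) \right] \qquad (5.16)$$
s.t.
$$P_t^{Grid} + \sum_{i=1}^{N_G} P_{i,t}^{Gen} + P_t^{W} + \eta_B P_t^{B} = P_t^{L} \qquad (5.17)$$
$$\lambda_t^{Low} \leq \lambda_t \leq \lambda_t^{Up} \qquad (5.18)$$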
Here, the uncertain parameters in the objective function (Equation (5.16)) and the power
balance constraint (Equation (5.17)) are $\Omega = \{\lambda_t, P_t^{L}, P_t^{W}\}$. The objective is to minimize the cost
with respect to Ψ, the set of decision variables. Meanwhile the cost is maximized with respect
to the set of uncertain parameters, i.e., Ω, in order to reduce the risk of uncertain parameters on
decision making. Observe from Equations (5.18) - (5.20) that uncertainties vary in specific inter-
vals. These are prediction intervals provided by the probabilistic forecasting models. The robust
optimization evaluates the worst case of uncertainties and guarantees a level of cost. Thus, the
actual cost of operating the microgrid would be lower than the guaranteed level if the actual values
To solve this min-max optimization problem using commercial solvers, it must be converted
into a simple minimization problem. It is observed from Equations (5.16) - (5.20) that the
optimization problem is linear with respect to uncertain parameters. Therefore, the worst-case would be
obtained on the lower or upper bounds of these uncertain parameters. The extreme points of the
uncertain parameters electricity load and non-dispatchable generation can be easily determined. A
higher load and a lower non-dispatchable generation (i.e., wind) would result in a higher operation
cost. Therefore, the worst-case solution would be obtained when the non-dispatchable generation
is at its lower uncertainty bound (i.e., $P_t^{W,Low}$) and the load is at its upper uncertainty bound (i.e.,
$P_t^{L,Up}$).
$$P_t^{Grid} + \sum_{i=1}^{N_G} P_{i,t}^{Gen} + P_t^{W,Low} + \eta_B P_t^{B} = P_t^{L,Up} \qquad (5.21)$$
Hence, the power balance constraint is accordingly replaced with (5.21), where the upper bound of
load and the lower bound of wind power generation prediction intervals replace their corresponding
uncertain parameters.
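The effect of this replacement can be illustrated by solving Equation (5.21) for the grid exchange at a single hour. The Python sketch below is only illustrative; the interval bounds, generator output and battery power are hypothetical values in MW.

```python
def worst_case_grid_power(load_interval, wind_interval, p_gen_total, p_batt, eta_b):
    """Solve the robust balance (5.21) for P_grid at one hour.

    load_interval, wind_interval: (lower, upper) prediction intervals.
    The worst case uses the upper load bound and the lower wind bound.
    """
    load_up = load_interval[1]
    wind_low = wind_interval[0]
    return load_up - wind_low - p_gen_total - eta_b * p_batt

# Hypothetical one-hour example.
p_grid_robust = worst_case_grid_power(
    load_interval=(0.70, 0.78),
    wind_interval=(0.02, 0.06),
    p_gen_total=0.50,
    p_batt=0.10,
    eta_b=0.8,
)
# Same schedule evaluated at the interval midpoints, for comparison.
p_grid_mid = 0.74 - 0.04 - 0.50 - 0.8 * 0.10
print(round(p_grid_robust, 3), round(p_grid_mid, 3))
```

The robust schedule plans for a larger grid purchase than a schedule based on the interval midpoints, which is the price of protection against the worst case.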
However, it is not as easy to determine which bound of the uncertain parameter electricity
price would lead to the worst case. This is because it could be the upper bound in some cases
(e.g., when buying energy from the grid) or the lower bound in other cases (e.g., when injecting
power to the grid). Therefore, a binary variable is first defined, denoted by $u_t^{Grid}$, in order to define
been formulated in Equations (5.22) and (5.23). Using $u_t^{Grid}$ and a big enough positive number,
denoted by $M$, we can determine the worst case of the uncertain parameter $\lambda_t$. This method is also
known as the Big M method [156]. $B_t$ is a replacement variable for the nonlinear term $\lambda_t P_t^{Grid}$ in
the objective function. The following equations determine the worst case for the electricity price
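$$u_t^{Grid} \in \{0, 1\} \qquad (5.22)$$
$$P_t^{Grid} \leq u_t^{Grid} M \qquad (5.23)$$
$$B_t - M \leq \lambda_t^{Low} P_t^{Grid} - (1 - u_t^{Grid}) M \qquad (5.26)$$
$$B_t + M \geq \lambda_t^{Low} P_t^{Grid} + (1 - u_t^{Grid}) M \qquad (5.27)$$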
Equations (5.24) and (5.25) ensure that the uncertain parameter $\lambda_t$ is set to $\lambda_t^{Up}$ when the
microgrid purchases energy from the grid, i.e., the worst case. Similarly, equations (5.26) and (5.27)
i.e., the worst case. Hence, the final objective function can be re-formulated as follows:
$$\min_{\Psi} \; \sum_{t=H}^{N_T} \left[ B_t + \sum_{i=1}^{N_G} \left( FC_{i,t}(P_{i,t}^{Gen}) + SUC_{i,t} \right) \right] \qquad (5.28)$$
where we could eliminate the max part of the optimization. Therefore, a simple minimization
problem is formulated in a manner that can be input to any commercial solver. The final objective
With this formulation, the cost of the operation of the microgrid is minimized when the worst case
of uncertain parameters occurs. This suggests that the cost of operation would be lower if the
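The worst-case price logic that the Big M constraints encode can be checked numerically. The Python sketch below is an illustration rather than the solver formulation: it returns the worst-case energy cost term for a given grid exchange and compares it with a brute-force maximum over prices inside the interval. The price bounds and exchange values are hypothetical.

```python
def worst_case_energy_cost(p_grid, price_low, price_up):
    """Worst-case value of price * p_grid over price in [price_low, price_up].

    Purchasing (p_grid >= 0): the upper price bound is the worst case.
    Selling (p_grid < 0): the lower price bound is the worst case (least revenue).
    """
    return price_up * p_grid if p_grid >= 0 else price_low * p_grid

def brute_force_worst_case(p_grid, price_low, price_up, steps=100):
    """Maximum of price * p_grid over an evenly spaced grid of prices."""
    prices = [price_low + k * (price_up - price_low) / steps for k in range(steps + 1)]
    return max(price * p_grid for price in prices)

# Hypothetical price interval ($/MWh) and grid exchanges (MW).
low, up = 20.0, 50.0
buy_cost = worst_case_energy_cost(0.2, low, up)     # purchase of 0.2 MW
sell_cost = worst_case_energy_cost(-0.2, low, up)   # sale of 0.2 MW (negative cost)
print(buy_cost, sell_cost)
```

Since the cost term is linear in the price, the maximum always sits at one end of the interval, which is why a single binary variable per hour is enough to select it.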
5.5 Numerical Results
In this study, the electric load consumption of a building at the University of Calgary, Calgary,
Alberta, Canada, is considered as the microgrid load. The electricity load data of this campus
building cover January 2016 to December 2016, with a peak value of 968 kW. For the electricity
price, the data of pool prices from Alberta’s electricity market is considered for the year 2016. In
addition, the hourly data of Taber wind farm in Alberta, Canada, in year 2015 is used for generating
wind power forecasts. The wind power data is scaled down to 100 kW to be consistent with a
normal residential capacity. The battery energy storage system has a capacity of 200 kW/400
kWh with an efficiency of 80%. In addition, two diesel generators are considered within the
microgrid, with capacities of 500 kW for generator #1 and 600 kW for generator #2.
In this section, a brief statistical evaluation is provided for generated point and interval forecasts to
show their accuracy. An error measure of Mean Absolute Error (MAE) is considered to evaluate
where $N_h$ is the number of hours in the test month, and $ACT(t)$ and $FOR(t)$ are the actual
and forecast values at hour $t$, respectively. For the electricity load and wind power generation,
the normalized MAE is expressed as a percentage of the peak load and wind power capacity,
respectively. Table 5.1 shows the results of point forecasting errors for 1-hour-ahead predictions for
electricity load, electricity price and wind power generation in terms of MAE for 12 test months.
In addition, the generated price forecasts are compared with forecasts published by the AESO
as a benchmark to show the accuracy of the developed methodology. The average error for
the electricity load is 9.9 kW, which is only about 1% of the peak load. The wind power forecasting
error averaged 6.77 kW, which accounts for 6.77% of the total wind capacity. These figures demonstrate a
satisfactory performance for load and wind power prediction models. The last two columns show
Table 5.1: Errors of point forecasts in terms of MAE
Test     Load    Wind    Price MAE ($/MWh)
Month    (%)     (%)     AESO     Proposed
Jan.     0.98    7.01    2.93     1.98
Feb.     0.89    6.89    0.53     0.52
Mar.     0.97    7.20    0.39     0.38
Apr.     0.94    7.52    0.45     0.33
May      1.07    5.91    1.06     0.66
Jun.     0.90    5.63    0.53     0.46
Jul.     0.88    5.42    1.10     0.99
Aug.     0.94    6.50    2.42     1.39
Sep.     1.02    6.62    0.78     0.67
Oct.     1.15    7.45    2.25     1.66
Nov.     1.18    7.82    0.51     0.61
Dec.     1.38    7.38    1.71     1.39
Avg.     1.02    6.77    1.22     0.92
the effectiveness of the developed price forecasting model for Alberta’s electricity market. As
shown, the average of the error is $1.22/MWh for the price forecasts generated by the AESO, while
the developed model resulted in $0.92/MWh MAE. In other words, there is 25% improvement in
Two error measures are also used for evaluating the probabilistic forecasts: the Prediction Interval
Coverage Probability (PICP) and the Mean Prediction Interval Width (MPIW) [98], defined as follows:
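\[ \mathrm{PICP} = \frac{1}{N_h} \sum_{t=1}^{N_h} c_t ; \qquad c_t = \begin{cases} 1, & \mathrm{ACT}(t) \in [L_t, U_t] \\ 0, & \text{otherwise} \end{cases} \tag{5.30} \]
\[ \mathrm{MPIW} = \frac{1}{N_h} \sum_{t=1}^{N_h} (U_t - L_t) \tag{5.31} \]
where $U_t$ and $L_t$ are the upper and lower bounds of the prediction interval for hour t, respectively.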
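The two measures in (5.30) and (5.31) can be sketched in a few lines of Python. The function names and the four-hour series below are hypothetical. The last step widens every interval by an arbitrary 1.5 units on each side to show how coverage and width move together.

```python
def picp(actual, lower, upper):
    """Share of hours whose actual value lies inside [lower, upper]."""
    hits = [1 if lo <= a <= up else 0 for a, lo, up in zip(actual, lower, upper)]
    return sum(hits) / len(hits)


def mpiw(lower, upper):
    """Average width of the prediction intervals."""
    widths = [up - lo for lo, up in zip(lower, upper)]
    return sum(widths) / len(widths)


# Hypothetical four-hour example.
actual = [10.0, 12.0, 15.0, 11.0]
lower = [9.0, 13.0, 14.0, 8.0]
upper = [11.0, 14.0, 16.0, 10.0]

coverage = picp(actual, lower, upper)
width = mpiw(lower, upper)

# Widening each interval raises the coverage, at the price of a larger width.
wide_lower = [lo - 1.5 for lo in lower]
wide_upper = [up + 1.5 for up in upper]
wide_coverage = picp(actual, wide_lower, wide_upper)
wide_width = mpiw(wide_lower, wide_upper)

print(coverage, width, wide_coverage, wide_width)
```

In this toy case the coverage rises from 0.5 to 1.0 while the mean width grows from 1.75 to 4.75, which is why the two measures are read together.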
Table 5.2 displays the yearly average of errors for probabilistic forecasts of the electricity load,
price and wind power for different Confidence Levels (CL). PICP is an indication of the proportion
of the time that generated prediction intervals contain the actual values of interest. It is observed
Table 5.2: Errors of probabilistic forecasts

        Load             Wind             Price
CL      PICP    MPIW     PICP    MPIW     PICP    MPIW
        (%)     (kW)     (%)     (kW)     (%)     ($/MWh)
50%     50.2    14.5     48.7     8.7     52.1    0.62
60%     60.1    18.9     57.9    11.5     61.6    0.95
70%     70.2    23.8     69.5    15.0     71.5    1.46
80%     80.5    31.1     78.4    20.8     80.6    2.21
90%     90.3    42.2     88.9    30.1     90.5    3.48

that the coverage probabilities for the different confidence levels are above the nominal
probabilities for electricity load and price (by 0.1 to 2.1 percentage points). Although the PICP values for wind power are slightly
below the nominal probability (by 0.5 to 2.1 percentage points), the results show an acceptable level of coverage for all confidence
levels. A satisfactorily large PICP can be easily achieved by widening prediction intervals from
either side. However, such intervals are too conservative and less useful in practice, as they do not
show the variation of the targets. Therefore, MPIW is an alternative measure to show how wide
the intervals are. This table clearly demonstrates that MPIW increases as the nominal confidence
level increases. However, even for 90% nominal confidence level, the average width of generated
prediction intervals is very small for the electricity load, price and wind power. For instance, the
MPIW for the electricity load prediction with 90% confidence level is 42.2 kW, which is only about 4.4% of
the peak load (968 kW). The lower the average width of the prediction intervals, the higher the quality of the
generated probabilistic forecasts. It should also be noted that confidence levels less than 70% are
not usually considered in practical applications. The inclusion of 50% and 60% confidence levels
is only to investigate how they would impact the performance of the microgrid from an economic
perspective.
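The reading of Table 5.2 given above can be reproduced with a short Python sketch. The values are copied from Table 5.2, the 968 kW peak load and the 100 kW wind capacity are those stated earlier in this chapter, and the helper names are hypothetical.

```python
PEAK_LOAD_KW = 968.0
WIND_CAPACITY_KW = 100.0

# Table 5.2: confidence level (%) -> series -> (PICP in %, MPIW)
TABLE_5_2 = {
    50: {"load": (50.2, 14.5), "wind": (48.7, 8.7), "price": (52.1, 0.62)},
    60: {"load": (60.1, 18.9), "wind": (57.9, 11.5), "price": (61.6, 0.95)},
    70: {"load": (70.2, 23.8), "wind": (69.5, 15.0), "price": (71.5, 1.46)},
    80: {"load": (80.5, 31.1), "wind": (78.4, 20.8), "price": (80.6, 2.21)},
    90: {"load": (90.3, 42.2), "wind": (88.9, 30.1), "price": (90.5, 3.48)},
}


def coverage_gap(level, series):
    """PICP minus the nominal confidence level, in percentage points."""
    picp_value, _ = TABLE_5_2[level][series]
    return round(picp_value - level, 1)


def relative_width(level, series, base_kw):
    """MPIW as a percentage of a base value in kW."""
    _, mpiw_value = TABLE_5_2[level][series]
    return round(100.0 * mpiw_value / base_kw, 1)


for level in sorted(TABLE_5_2):
    print(level,
          coverage_gap(level, "load"),
          coverage_gap(level, "wind"),
          coverage_gap(level, "price"),
          relative_width(level, "load", PEAK_LOAD_KW),
          relative_width(level, "wind", WIND_CAPACITY_KW))
```

A positive gap means the intervals cover the actual values more often than the nominal level requires; a negative gap means slight under-coverage.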
In this section, the performance of the generated forecasts is evaluated from an economic point of
view. To do so, the point forecasts are fed into the deterministic optimization problem presented
in section 5.4.2, and the prediction intervals are used as inputs to the RO problem introduced in
section 5.4.3. The optimization problem schedules the units for the operating day and the cost of
operation is calculated considering actual values of electricity load, price and wind power
generation. The objective is to find which approach leads to better economics of the microgrid under
different scenarios.
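The evaluation procedure described above (schedule with forecasts, then settle the cost against actual values) can be illustrated with a toy Python sketch that contrasts a day-ahead (DA) schedule with a rolling horizon (RH) schedule. This is only a generic pattern under stated assumptions, not the optimization models of sections 5.4.2 and 5.4.3: the stand-in "optimizer" simply buys the forecast load, the forecast error is assumed to grow with lead time, and all numbers and function names are hypothetical.

```python
# Toy day with four hours; all numbers are hypothetical.
ACTUAL_LOAD_KW = [50.0, 60.0, 80.0, 70.0]
PRICE = 20.0               # $/MWh, flat grid price (assumption)
IMBALANCE_PRICE = 100.0    # $/MWh, cost of covering any deviation (assumption)
ERROR_PER_LEAD_HOUR = 2.0  # kW of forecast error added per hour of lead time


def forecast_load(issue_hour, n_hours):
    """Stand-in forecaster whose error grows with the lead time."""
    return [ACTUAL_LOAD_KW[issue_hour + k] + ERROR_PER_LEAD_HOUR * (k + 1)
            for k in range(n_hours)]


def optimize(load_forecast):
    """Stand-in optimizer: schedule purchases equal to the forecast load."""
    return list(load_forecast)


def day_ahead_schedule(n_hours):
    """Forecast once, then fix the schedule for the whole day."""
    return optimize(forecast_load(0, n_hours))


def rolling_horizon_schedule(n_hours):
    """Re-forecast and re-optimize every hour; commit only the first decision."""
    committed = []
    for hour in range(n_hours):
        plan = optimize(forecast_load(hour, n_hours - hour))
        committed.append(plan[0])
    return committed


def realized_cost(schedule_kw):
    """Settle a schedule against the actual load (kW over one hour = kWh)."""
    cost = 0.0
    for scheduled, actual in zip(schedule_kw, ACTUAL_LOAD_KW):
        cost += PRICE * scheduled / 1000.0
        cost += IMBALANCE_PRICE * abs(actual - scheduled) / 1000.0
    return cost


n = len(ACTUAL_LOAD_KW)
cost_perfect = realized_cost(ACTUAL_LOAD_KW)
cost_da = realized_cost(day_ahead_schedule(n))
cost_rh = realized_cost(rolling_horizon_schedule(n))
print(cost_perfect, cost_rh, cost_da)
```

In this toy setting the RH schedule only ever commits one-hour-ahead decisions, so its realized cost falls between the perfect-forecast cost and the DA cost. The actual ranking in the thesis depends on the scenario, as the experiments below show.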
In the first experiment, a scenario is considered with the electricity price of Alberta’s market
in 2016 along with the wind power generation capacity of 100 kW. This wind capacity is almost
10% of the peak load and hence, it is considered a low wind profile in this study. Hourly price
in Alberta’s market averaged $18.28/MWh, which was a remarkable decrease of 45% from 2015.
Even the average pool price during the peak period (i.e., 7 a.m. to 11 p.m.) was only $19.73/MWh
[157]. Thus, this shows a low, smooth price profile for the first scenario. Also, given the low ratio
of total wind power generation to the peak load (i.e., almost 10%), the first scenario investigates
the performance of two approaches with low price and wind profiles.
The total cost of operating the microgrid over a year considering different approaches is shown
in Fig. 5.1. With access to perfect forecasts, the operation cost of the microgrid could be
as low as $71,215 over the year. While the day-ahead deterministic scheduling resulted in the
total operation cost of $102,327, the RH approach could reduce the cost to $90,403 (i.e., 11.5%
improvement). As seen, the day-ahead probabilistic approach could slightly decrease the total cost
with 50% and 60% confidence levels, whereas any further increase in the confidence level led
to higher operation costs. Moreover, applying both approaches together with all confidence levels
could not outperform the deterministic RH strategy. Due to smooth wind and price profiles, the
point forecasts have high accuracy and thus, applying robust scheduling leads to an unnecessarily
In the next experiment, a high wind profile with the same price profile is studied to investigate
the impact of high wind power generation. To create a high wind scenario, the wind capacity is
increased to 500 kW, which is approximately 50% of the peak load. As seen from Fig. 5.2, higher
wind power capacity clearly resulted in lower operation cost overall compared to the first scenario.
Here, perfect forecasts could lower the total cost to $68,079. Observe that with high wind level,
Figure 5.1: Costs for Scenario #1: low price and low wind
[Bar chart. Vertical axis: Cost (thousand $). Horizontal axis: Deterministic and Probabilistic (50%, 60%, 70%, 80%, 90%). Series: RH, DA.]
the combination of RH and probabilistic methods for 50% and 60% confidence levels would lead
to the lowest possible operation cost. Therefore, in case of a smooth price profile, the uncertainty
related to high wind generation can be mitigated with robust scheduling in a more effective way.
The higher the confidence level of interest, the higher the total cost could be. This is because higher
confidence levels lead to more conservative scheduling and consequently higher costs. On the other
hand, low confidence levels might not be very attractive to operators. The proper confidence level
needs to be determined based on the level of wind power penetration in such a scenario.
In the final experiment, the effect of a volatile, non-smooth electricity price on the operation of
the microgrid is assessed. To do so, the price profile of the same market in year 2013 is considered.
Alberta’s market had an average pool price of $80.19/MWh over 2013 [157]. The peak prices
averaged $106.13/MWh, including numerous price spikes (with the price cap of $999.99/MWh) in
2013, which indicates a very high electricity price profile. In terms of volatility, the root mean square
of the price was $187.5/MWh in 2013 compared to $22.2/MWh in 2016. This shows a
volatile price profile in year 2013. Considering a low wind generation profile along with high
prices, Fig. 5.3 demonstrates the operation costs for different strategies. As shown in this figure,
Figure 5.2: Costs for Scenario #2: low price and high wind
[Bar chart. Vertical axis: Cost (thousand $). Horizontal axis: Deterministic and Probabilistic (50%, 60%, 70%, 80%, 90%). Series: RH, DA.]
there is a significant cost reduction in the operation of the microgrid due to high potential for energy
arbitrage with the main grid. When the electricity price is low, the microgrid purchases energy from
the grid and also charges the BESS. As the price increases, the excess power generated within the
microgrid is injected into the grid to make a profit. Interestingly, given perfect forecasts, the
total annual cost of the microgrid is -$6,141; that is, the microgrid makes a profit over the year.
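The arbitrage behavior described above can be illustrated with a naive threshold rule in Python. This sketch is not the optimization model used in the thesis: the price thresholds and the four-hour price series are hypothetical, and the 80% efficiency is assumed to be a round-trip figure applied on charging. The 200 kW/400 kWh rating is the BESS size stated earlier in this chapter.

```python
POWER_KW = 200.0       # BESS power rating from the text
CAPACITY_KWH = 400.0   # BESS energy capacity from the text
EFFICIENCY = 0.80      # assumed round-trip efficiency, applied on charging


def arbitrage_cost(prices, low, high):
    """Net cost ($) of a greedy rule: buy at or below `low`, sell at or above `high`."""
    soc = 0.0   # stored energy in kWh
    cost = 0.0
    for price in prices:  # hourly prices in $/MWh
        if price <= low and soc < CAPACITY_KWH:
            stored = min(POWER_KW * EFFICIENCY, CAPACITY_KWH - soc)
            bought = stored / EFFICIENCY      # energy drawn from the grid
            soc += stored
            cost += price * bought / 1000.0   # kWh -> MWh
        elif price >= high and soc > 0.0:
            sold = min(POWER_KW, soc)
            soc -= sold
            cost -= price * sold / 1000.0
    return cost


# Hypothetical hourly prices: two cheap hours followed by two expensive ones.
prices = [15.0, 18.0, 120.0, 900.0]
net_cost = arbitrage_cost(prices, low=20.0, high=100.0)
print(f"net cost = {net_cost:.2f} $")
```

A negative result means the storage earns more from selling than it spends on charging, which corresponds to the negative annual cost reported for perfect forecasts. A greedy rule like this one sells at the first high price it sees, whereas an optimization with good price forecasts would hold energy for the most expensive hours.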
The deterministic RH approach resulted in the lowest operation cost with 92% improvement
over the DA deterministic method. This is mainly due to the high capability of the price forecasting
model with the rolling horizon framework in capturing high prices for timely energy arbitrage.
Although the DA probabilistic method with 50%, 60% and 70% confidence levels and the RH
probabilistic method with all confidence levels could improve on the deterministic DA approach, they failed
to outperform the deterministic RH strategy. This is because the RO acts conservatively and might not
take advantage of high price hours. Hence, the deterministic RH approach can significantly enhance the economics
of microgrids when the electricity prices contain severe variations and spikes.
Figure 5.3: Costs for Scenario #3: high price and low wind
[Bar chart. Vertical axis: Cost (thousand $). Horizontal axis: Deterministic and Probabilistic (50%, 60%, 70%, 80%, 90%). Series: RH, DA.]
5.6 Conclusions
In this chapter, the operation of a grid-connected microgrid is studied using both deterministic and
probabilistic strategies. Point and interval forecasts are generated for the electricity load, price and
wind power generation. The generated forecasts are fed into an optimization platform to operate
the microgrid with the lowest possible cost. Two strategies to mitigate the uncertainties related
to forecasts, i.e., the rolling horizon technique and prediction intervals, are implemented, and their
impact on the economic performance of microgrids is investigated using different scenarios. It
is concluded that the level of wind power generation and the volatility of the electricity price are
the main drivers to select the effective approach for the operation of a microgrid. The numerical
experiments showed that prediction intervals with robust optimization can improve the economics
of the microgrid when wind power generation is considerable with respect to the peak load.
However, when the electricity price is highly volatile, the deterministic rolling horizon strategy is the
most economical way to operate the microgrid, while RO might not fully take advantage of high
Chapter 6
Conclusions
In this thesis, short-term forecasting tools are developed for the operation of microgrids. The
three sets of forecasts required for this purpose are i) the power generation from renewable energy
resources within the microgrid, ii) the electricity load of the microgrid, and iii) the price of the
electricity market in which the microgrid is located and operated. Therefore, a wind power forecasting
model is developed to provide predictions for short-term wind power generation. Then, the special
characteristics of electricity loads in microgrids are analyzed and compared to loads in power
systems. A prediction methodology is accordingly built to accommodate the volatile behavior of loads
is proposed for the electricity market price to enable the microgrid to trade energy with the main
grid in the most economical way. To operate the microgrid, forecasts are fed into an optimization
algorithm that provides the operational schedules of the dispatchable resources in the microgrid.
In the last stage of this thesis, different optimization platforms for the operation of microgrids are
implemented. Then, the impacts of different uncertainty mitigation approaches on the economic
The overall significance of this thesis is that it focuses on supportive tools for the development
of microgrids. The outcome of this thesis could help the microgrid operators to operate their
systems in a more reliable, efficient and economical way. The detailed conclusions of each chapter
In Chapter 2, a prediction method is developed for wind power generation. The model is based
the proposed wind power forecasting model as well as its main components, e.g., the training
procedure, is extensively evaluated by real-world wind power data. The statistical performance of
the proposed model is compared with that of existing wind power forecasts for Alberta’s power
system from a third-party company. The results show the satisfactory performance of the proposed
model for short-term horizons. The contribution of this chapter is the development of an efficient
forecasting model that can provide accurate short-term wind power predictions for aggregated or
individual wind farms. The significance of this work is the application of wind power forecasting
Chapter 3 proposes a short-term load forecasting model for the operation of microgrids.
Considering the volatile and non-smooth characteristics of load time series of microgrids compared with
power systems’ electricity loads, the proposed forecasting method aims to deal with such challenges.
The model has the structure of a state-of-the-art neural network-based forecasting engine,
i.e., Self-recurrent Wavelet Neural Network (SRWNN), capable of capturing nonlinear complexities
of volatile time series. The Levenberg-Marquardt learning algorithm is implemented to train
the forecasting engine. The effectiveness of the proposed forecasting model is demonstrated using
real-world load data of a microgrid and two power systems. The results show that the proposed
model leads to more accurate forecasts when the prediction of a volatile time series, i.e., microgrid
loads, is of interest. Thus, the main contribution of this chapter is the development of a forecasting
model that can capture non-smooth behavior of electricity loads in microgrids. The significance
of this work is that accurate load forecasts can enhance the energy management of both renewable
and conventional resources in the microgrid and also improve the economics of energy trades with
electricity markets.
In Chapter 4, a forecasting strategy is proposed to provide accurate price forecasts for the oper-
ation of behind-the-meter storage systems within microgrids. This strategy includes two separate
forecasting models to take advantage of high-resolution market data along with hourly data in
order to capture price spikes as much as possible. Moreover, the forecasting models are embedded
in an intra-hour rolling horizon framework in order to update the forecasts on an hourly basis.
Using real-world price data from Ontario’s electricity market, the proposed strategy is evaluated
from both statistical and economic perspectives. From statistical analysis, the proposed strategy
results in 20% improvement in forecast accuracy compared to available pre-dispatch prices from
the system operator, and has a high capability of detecting price spikes. For economic assessments,
the generated forecasts are fed to an optimization platform for the operation scheduling of a battery
energy storage system within a microgrid facility. It is concluded that the storage system
can bring better economic return using the price predictions generated by the proposed forecasting
strategy, 62% of the potential saving, in comparison with a number of other strategies, e.g., 43%
of the potential saving using available pre-dispatch prices. Hence, the contribution of this chapter
is to develop a price forecasting strategy that can efficiently capture severe variations in electricity
prices, e.g., price spikes. The significance of this work is that the detection of high electricity
prices can enhance the economics of grid-connected microgrids by timely energy arbitrage with
Chapter 5 summarizes the findings of an investigation into alternative approaches for mitigating
the impact of forecast errors in the operation of microgrids. The operation of a grid-connected
microgrid is considered using both deterministic and probabilistic strategies. Using real-world
data, point and interval forecasts are first generated for the electricity load, price and wind power
generation. The generated forecasts are then fed into an optimization platform to operate the
microgrid with the lowest possible cost. The rolling horizon technique and prediction intervals, as
the two strategies to mitigate the uncertainties related to forecasts, are evaluated and compared
using different scenarios. It is concluded that the level of wind power generation and the volatility
of the electricity price are the main drivers to select the effective approach for the operation of a
microgrid. The numerical experiments show that prediction intervals with robust optimization can
improve the economics of the microgrid when wind power generation is considerable with respect
to the peak load. However, when the electricity price is highly volatile, the deterministic rolling
horizon strategy is the most economical way to operate the microgrid, while robust optimization
might not fully take advantage of high price hours and their potential energy arbitrage
opportunities. Thus, the contribution of this chapter is to explore the most efficient optimization platform
for the operation of a microgrid given a set of deterministic and probabilistic forecasts. The
significance of this work is that it helps the operator to adopt the most economical approach to operate
the microgrid under different scenarios of wind integration levels and market conditions.
data with an hourly resolution. Wind energy could have severe variations during
an hour that might not be effectively captured by the hourly average of the data.
Considering a higher resolution for wind power prediction, e.g., 15 minutes, could
better capture sub-hourly volatility of such time series. As an extension to this work,
the development of a wind power forecasting method with such higher resolutions
could improve the forecast accuracy. However, an optimization platform also needs
campus building might have a totally different load pattern than a building con-
which different technologies are used to increase the energy efficiency (e.g., triple-
glazed windows, occupancy sensors and smart timing schedules), the consumed
energy might continuously change with high magnitudes. Hence, an extension to
this chapter could be investigating different types of buildings with various load
patterns, and developing effective forecasting models accordingly. This could po-
ing/microgrid.
market data in the proposed intra-hour optimization platform for microgrids. Since
the forecasting engine is not the main focus of this chapter, a simple linear re-
of a more efficient forecasting model that can, in particular, provide more accurate
forecasts for longer horizons, e.g., from 6 to 24 hours ahead. This can potentially
voltaic (PV) solar energy units within the microgrid. There will not be significant
the same as other sources, e.g., wind power generation. PV and wind production are
often anti-correlated, and there could be value in combining them just for that rea-
son. This will reveal which mitigation approach can accommodate the uncertainty
5. Since the load forecasting error in Chapter 5 was significantly lower than those for
wind power generation and the electricity price, the research focused only on sce-
narios for high wind penetration and high price volatility. Hence, another extension
to Chapter 5 could be investigating different types and levels of the electricity load,
Bibliography
[1] B. Lasseter, “Microgrids [distributed power generation],” Power Engineering Society Winter
andnanogrids-an-emerging-energy-access-solution-ecosystem
and-the-challenges
industrial-microgrid-market-tips-scales
energy management for a microgrid,” IEEE Transactions on Industrial Electronics, vol. 60,
[6] N. Amjady, F. Keynia, and H. Zareipour, “Short-term load forecast of microgrids by a new
bilevel prediction strategy,” IEEE Transactions on Smart Grid, vol. 1, no. 3, pp. 286–294,
December 2010.
tion, monitoring and replanning approach for optimal energy management in microgrids,”
[8] K. Rohrig and B. Lange, “Improvement of the power system reliability by prediction of
wind power generation,” Power Engineering Society General Meeting, 2007. IEEE, pp. 1–
8, 2007.
[9] E. Paparoditis and T. Sapatinas, “Short-term load forecasting: The similar shape functional
time-series predictor,” IEEE Transactions on Power Systems, vol. 28, no. 4, pp. 3818–3825,
November 2013.
day ahead generation scheduling for micro-grids with renewable sources,” 2011 IEEE PES
Innovative Smart Grid Technologies Asia (ISGT), pp. 1–6, November 13-16 2011.
[11] N. M. Pindoriya, S. N. Singh, and S. K. Singh, “An adaptive wavelet neural network-based
energy price forecasting in electricity markets,” IEEE Transaction on Power System, vol. 23,
[12] H. Chitsaz, N. Amjady, and H. Zareipour, “Wind power forecast using wavelet neural net-
work trained by improved clonal selection algorithm,” Energy Conversion and Management,
[13] H. Chitsaz, H. Shaker, H. Zareipour, D. Wood, and N. Amjady, “Short-term electricity load
forecasting of buildings in microgrids,” Energy and Buildings, vol. 99, pp. 50–60, July 2015.
C. Gil, “Wind turbine selection for wind farm layout using multi-objective evolutionary
algorithms,” Expert Systems with Applications, vol. 41, no. 15, pp. 6585 – 6595, 2014.
[18] AESO. [Online]. Available: https://www.aeso.ca
of Mexico,” Renewable and Sustainable Energy Reviews, vol. 14, no. 9, pp. 2830 – 2840,
2010.
Sierra, “Is the wind a periodical phenomenon? the case of Mexico,” Renewable and Sus-
tainable Energy Reviews, vol. 15, no. 1, pp. 721 – 728, 2011.
[21] Y. Xu, Z.-Y. Dong, Z. Xu, K. Meng, and K. P. Wong, “An intelligent dynamic security as-
sessment framework for power systems with wind power,” IEEE Transactions on Industrial
[22] P. Hu, R. Karki, and R. Billinton, “Reliability evaluation of generating systems containing
wind power and energy storage,” Generation, Transmission Distribution, IET, vol. 3, no. 8,
Agugliaro, “Wind energy resource in northern Mexico,” Renewable and Sustainable Energy
[24] J. Cardell, L. Anderson, and C. Y. Tee, “The effect of wind and demand uncertainty on
electricity prices and system performance,” Transmission and Distribution Conference and
[25] M. Black and G. Strbac, “Value of bulk energy storage for managing wind power fluctua-
tions,” IEEE Transactions on Energy Conversion, vol. 22, no. 1, pp. 197–205, 2007.
[27] N. Chen, Z. Qian, I. Nabney, and X. Meng, “Wind power forecasts using gaussian processes
and numerical weather prediction,” IEEE Transactions on Power Systems, vol. 29, no. 2, pp.
[28] M. Khalid and A. Savkin, “A method for short-term wind power prediction with multiple
observation points,” IEEE Transactions on Power Systems, vol. 27, no. 2, pp. 579–586, May
2012.
[29] P. Kou, D. Liang, F. Gao, and L. Gao, “Probabilistic wind power forecasting with on-
line model selection and warped gaussian process,” Energy Conversion and Management,
parison of two new short-term wind-power forecasting systems,” Renewable Energy, vol. 34,
[31] R. G. Kavasseri and K. Seetharaman, “Day-ahead wind speed forecasting using f-arima
models,” Renewable Energy, vol. 34, no. 5, pp. 1388 – 1393, 2009.
forecasting model (wrf): A case study in peru,” Energy Conversion and Management,
[33] S. Fan, J. Liao, R. Yokoyama, L. Chen, and W. jen Lee, “Forecasting the wind generation
[34] M. Milligan, M. Schwartz, and Y. Wan, “Statistical wind power forecasting models:
results for u.s. wind farms,” NREL, Tech. Rep. NREL/CP-500-33956, May 2003. [Online].
Available: http://www.nrel.gov/docs/fy03osti/33956.pdf
[35] K. Methaprayoon, C. Yingvivatanapong, W. jen Lee, and J. Liao, “An integration of ann
wind power estimation into unit commitment considering the forecasting uncertainty,” IEEE
pubs/65613.pdf
[37] E. Erdem and J. Shi, “Arma based approaches for forecasting the tuple of wind speed and
direction,” Applied Energy, vol. 88, no. 4, pp. 1405 – 1414, 2011.
[38] A. Sfetsos, “A novel approach for the forecasting of mean hourly wind speed time series,”
mization methods applied to renewable and sustainable energy: A review,” Renewable and
Sustainable Energy Reviews, vol. 15, no. 4, pp. 1753 – 1766, 2011.
[40] T. Barbounis and J. Theocharis, “Locally recurrent neural networks for long-term wind
speed and power prediction,” Neurocomputing, vol. 69, no. 4-6, pp. 466 – 496, 2006.
[41] G. Sideratos and N. Hatziargyriou, “Using radial basis neural networks to estimate wind
power production,” Power Engineering Society General Meeting, 2007. IEEE, pp. 1–7,
2007.
[42] C. Potter and M. Negnevitsky, “Very short-term wind forecasting for tasmanian power gen-
eration,” IEEE Transactions on Power Systems, vol. 21, no. 2, pp. 965–972, 2006.
[43] H. Pousinho, V. Mendes, and J. Catalao, “A hybrid pso-anfis approach for short-term wind
power prediction in portugal,” Energy Conversion and Management, vol. 52, no. 1, pp. 397
– 402, 2011.
[44] N. Amjady, F. Keynia, and H. Zareipour, “A new hybrid iterative method for short-term
wind speed forecasting,” European Transactions on Electrical Power, vol. 21, no. 1, pp.
581–595, 2011.
short-term wind speed and power,” Renewable and Sustainable Energy Reviews, vol. 34,
[47] J. Catalão, H. M. I. Pousinho, and V. Mendes, “Hybrid intelligent approach for short-term
wind power forecasting in portugal,” Renewable Power Generation, IET, vol. 5, no. 3, pp.
251–257, 2011.
[48] D. Faria, R. Castro, C. Philippart, and A. Gusmao, “Wavelets pre-filtering in wind speed
“Application of wavelet and neural network models for wind speed and power generation
[50] P. Chen, H. Chen, and R. Ye, “Chaotic wind speed series forecasting based on wavelet
packet decomposition and support vector regression,” IPEC, 2010 Conference Proceedings,
[51] L. J. Ricalde, G. Catzin, A. Y. Alanis, and E. N. Sanchez, “Higher order wavelet neural
networks with kalman learning for wind speed forecasting,” IEEE Symposium on Computa-
[52] N. Amjady, F. Keynia, and H. Zareipour, “Wind power prediction by a new forecast engine
composed of modified hybrid neural network and enhanced particle swarm optimization,”
IEEE Transactions on Sustainable Energy, vol. 2, no. 3, pp. 265–276, July 2011.
[53] L. Wu and M. Shahidehpour, “A hybrid model for day-ahead price forecasting,” IEEE Trans-
[54] L. de Castro and F. Von Zuben, “Learning and optimization using the clonal selection prin-
ciple,” IEEE Transactions on Evolutionary Computation, vol. 6, no. 3, pp. 239–251, 2002.
[55] Q. Wang, C. Wang, and X. Gao, “A hybrid optimization algorithm based on clonal selection
[56] G. C. Liao, “Application of an immune algorithm to the short-term unit commitment prob-
lem in power system operation,” IEEE Proceedings on Generation, Transmission and Dis-
[58] A. Slowik and M. Bialko, “Training of artificial neural networks using differential evolution
[59] R. Bessa, V. Miranda, and J. Gama, “Entropy and correntropy against minimum square error
in offline and online three-day ahead wind power forecasting,” IEEE Transactions on Power
[60] W. Liu, P. P. Pokharel, and J. C. Principe, “Correntropy: Properties and applications in cor-
on Signal Processing, vol. 55, no. 11, pp. 5286–5298, November 2007.
microgrids
[62] J. Taylor and P. McSharry, “Short-term load forecasting methods: An evaluation based
on european data,” IEEE Transactions on Power Systems, vol. 22, no. 4, pp. 2213–2219,
November 2007.
[63] T. Hong, M. Gui, M. Baran, and H. Willis, “Modeling and forecasting hourly electric load by
multiple linear regression with interactions,” Power and Energy Society General Meeting,
[64] H. Hippert, C. Pedreira, and R. Souza, “Neural networks for short-term load forecasting:
a review and evaluation,” IEEE Transactions on Power Systems, vol. 16, no. 1, pp. 44–55,
February 2001.
[65] Y. Wang, Q. Xia, and C. Kang, “Secondary forecasting based on deviation analysis for short-
term load forecasting,” IEEE Transactions on Power Systems, vol. 26, no. 2, pp. 500–507,
May 2011.
[66] E. Ceperic, V. Ceperic, and A. Baric, “A strategy for short-term load forecasting by support
vector regression machines,” IEEE Transactions on Power Systems, vol. 28, no. 4, pp. 4356–
[67] Y. Goude, R. Nedellec, and N. Kong, “Local short and middle term electricity load forecast-
ing with semi-parametric additive models,” IEEE Transactions on Smart Grid, vol. 5, no. 1,
low voltage grid: A market-based multiperiod optimization model,” Electric Power Systems
[69] A. Mohamed, V. Salehi, and O. Mohammed, “Real-time energy management algorithm for
mitigation of pulse loads in hybrid microgrids,” IEEE Transactions on Smart Grid, vol. 3,
[70] P. Chan, W.-C. Chen, W. Ng, and D. Yeung, “Multiple classifier system for short term
[71] M. Shahidehpour and M. Khodayar, “Cutting campus energy costs with hierarchical con-
trol,” IEEE Electrification Magazine, vol. 1, no. 1, pp. 40– 56, September 2013.
“A review on applications of ANN and SVM for building electrical energy consumption
forecasting,” Renewable and Sustainable Energy Reviews, vol. 33, pp. 102 – 109, 2014.
neural network prediction method for electrical consumption forecasting based on building
end-uses,” Energy and Buildings, vol. 43, no. 11, pp. 3112 – 3119, 2011.
[74] A. H. Neto and F. A. S. Fiorelli, “Comparison between detailed model simulation and arti-
ficial neural network for forecasting building energy consumption,” Energy and Buildings,
[75] S. Farzana, M. Liu, A. Baldwin, and M. U. Hossain, “Multi-model prediction and simulation
of residential building energy in urban areas of chongqing, south west china,” Energy and
multi-family residential buildings using support vector regression: Investigating the impact
diction tool for commercial buildings using case-based reasoning,” Energy and Buildings,
[78] G. Escriva-Escriva, C. Roldan-Blay, and C. Alvarez-Bel, “Electrical consumption forecast
using actual data of building end-use decomposition,” Energy and Buildings, vol. 82, pp. 73
– 81, 2014.
[79] J. G. Jetcheva, M. Majidpour, and W. P. Chen, “Neural network model ensembles for
building-level electricity load forecasts,” Energy and Buildings, vol. 84, pp. 214 – 223,
2014.
microgrid/
The case of Ontario,” Energy Policy, vol. 35, pp. 4739–4748, 2007.
[82] N. Amjady and F. Keynia, “Short-term load forecasting of power systems by combination
of wavelet transform and neuro-evolutionary algorithm,” Energy, vol. 34, no. 1, pp. 46 – 57,
2009.
[83] Q. Zhang and A. Benveniste, “Wavelet networks,” IEEE Transactions on Neural Networks,
[84] J. Vermaak and E. Botha, “Recurrent neural networks for short-term load forecasting,” IEEE
Transactions on Power Systems, vol. 13, no. 1, pp. 126–132, February 1998.
[85] S. J. Yoo, J. B. Park, and Y. H. Choi, “Adaptive dynamic surface control of flexible-joint
robots using self-recurrent wavelet neural networks,” IEEE Transactions on Systems, Man,
and Cybernetics, Part B: Cybernetics, vol. 36, no. 6, pp. 1342–1355, December 2006.
[86] ——, “Indirect adaptive control of nonlinear dynamic systems using self recurrent wavelet
neural networks via adaptive learning rates,” Information Sciences, vol. 177, no. 15, pp.
[87] S. S. Haykin, Neural Networks: A Comprehensive Foundation. Prentice Hall, 1999.
[88] M. T. Hagan and M. B. Menhaj, “Training feedforward networks with the marquardt al-
gorithm,” IEEE Transactions on Neural Networks, vol. 5, no. 6, pp. 989–993, November
1994.
“Short-term load forecasting for microgrids based on artificial neural networks,” Energies,
[90] A. Pandey, D. Singh, and S. Sinha, “Intelligent hybrid wavelet models for short-term load
forecasting,” IEEE Transactions on Power Systems, vol. 25, no. 3, pp. 1266–1273, August
2010.
[91] B.-L. Zhang and Z.-Y. Dong, “An adaptive neural-wavelet model for short term load fore-
casting,” Electric Power Systems Research, vol. 59, pp. 121–129, 2001.
[92] N. Amjady and A. Daraeepour, “Mixed price and load forecasting of electricity markets by a
new iterative prediction method,” Electric Power Systems Research, vol. 79, pp. 1329–1336,
2009.
2792 kumeyaay.php
1c578a8751b30.pdf
newsletters/january-2016/emergence-of-the-behind-the-meter-energy-storage-market
tation of grid frequency regulation using behind-the-meter batteries compensating for fast
load demand variations,” IEEE Transactions on Power Systems, vol. 32, no. 1, pp. 484 –
demand charge reduction,” National Renewable Energy Laboratory (NREL), January 2015.
[98] R. Weron, “Electricity price forecasting: A review of the state-of-the-art with a look into
the future,” International Journal of Forecasting, vol. 30, no. 4, pp. 1030–1081, December
2014.
approach using wavelet, firefly algorithm, and fuzzy ARTMAP for day-ahead electricity price
forecasting,” IEEE Transactions on Power Systems, vol. 28, no. 2, pp. 1041–1051, May 2013.
[100] C. Wan, Z. Xu, Y. Wang, Z. Y. Dong, and K. P. Wong, “A hybrid approach for probabilistic
forecasting of electricity price,” IEEE Transactions on Smart Grid, vol. 5, no. 1, pp. 463–
energy forecasting: Global energy forecasting competition 2014 and beyond,” International
[102] C. Wan, M. Niu, Y. Song, and Z. Xu, “Pareto optimal prediction intervals of electricity
price,” IEEE Transactions on Power Systems, vol. 32, no. 1, pp. 817–819, January 2017.
future electricity market prices,” IEEE Transactions on Power Systems, vol. 26, no. 1, pp.
[104] D. Huang, H. Zareipour, W. D. Rosehart, and N. Amjady, “Data mining for electricity
on Smart Grid, vol. 3, no. 2, pp. 808–817, June 2012.
[105] H. C. Wu, S. C. Chan, K. M. Tsui, and Y. Hou, “A new recursive dynamic factor analysis for
point and interval forecast of electricity price,” IEEE Transactions on Power Systems, vol. 28,
[106] N. Amjady and F. Keynia, “A new prediction strategy for price spike forecasting of day-
ahead electricity markets,” Applied Soft Computing, vol. 11, no. 6, pp. 4246–4256, April
2011.
[107] T. Christensen, A. Hurn, and K. Lindsay, “Forecasting spikes in electricity prices,”
International Journal of Forecasting, vol. 28, no. 2, pp. 400–411, June 2012.
[108] J. H. Zhao, Z. Y. Dong, X. Li, and K. P. Wong, “A framework for electricity price spike
analysis with advanced data mining methods,” IEEE Transactions on Power Systems, vol. 22,
[109] X. Lu, Z. Y. Dong, and X. Li, “Electricity market price spike forecast with data mining
techniques,” Electric Power Systems Research, vol. 73, no. 1, pp. 19–29, January 2005.
day-ahead electricity markets using decision trees,” 12th International Conference on the
markets: A review and evaluation,” International Journal of Electrical Power & Energy
[112] K. Maciejowska and R. Weron, “Short- and mid-term forecasting of baseload electricity
prices in the U.K.: The impact of intra-day price relationships and market fundamentals,”
IEEE Transactions on Power Systems, vol. 31, no. 2, pp. 994–1005, March 2016.
[113] Independent Electric System Operator (IESO) - Pricing. [Online]. Avail-
able: http://www.ieso.ca/Pages/Ontario’s-Power-System/Electricity-Pricing-in-Ontario/
How-Wholesale-Electricity-Price-is-Determined.aspx
http://www.caiso.com/market/Pages/MarketProcesses.aspx
[115] Electric Reliability Council of Texas (ERCOT) - pricing. [Online]. Available: http:
//www.ercot.com/mktinfo/prices
[116] New York Independent System Operator (NYISO) - Pricing. [Online]. Available:
http://www.nyiso.com
[117] Alberta Electric System Operator (AESO) - Pricing. [Online]. Available: http:
//www.aeso.ca/rulesprocedures/18592.html
[121] M. Kazemi, H. Zareipour, M. Ehsan, and W. D. Rosehart, “A robust linear approach for of-
fering strategy of a hybrid electric energy company,” IEEE Transactions on Power Systems,
[122] S. Shafiee, P. Zamani-Dehkordi, H. Zareipour, and A. M. Knight, “Economic assessment of
a price-maker energy storage facility in the Alberta electricity market,” Energy, vol. 111, pp.
537–547, 2016.
microgrid energy management system based on the rolling horizon strategy,” IEEE Trans-
timization framework for the simultaneous energy supply and demand planning in micro-
[125] J. Han and M. Kamber, Data Mining: Concepts and Techniques. Morgan Kaufmann
Publishers, 2006.
regression of relevance vector machines for electricity pricing signal forecasting in smart
grids,” IEEE Transactions on Smart Grid, vol. 6, no. 6, pp. 2997–3005, November 2015.
[128] C. P. Rodriguez and G. J. Anders, “Energy price forecasting in the Ontario competitive power
system market,” IEEE Transactions on Power Systems, vol. 19, no. 1, pp. 366–374, February
2004.
domain market information to forecast Ontario’s wholesale electricity prices,”
IEEE Transactions on Power Systems, vol. 21, no. 4, pp. 1707–1717, November 2006.
Pages/Power-Data/default.aspx
[131] A. H. Mohsenian-Rad and A. Leon-Garcia, “Optimal residential load control with price
[132] G. He, Q. Chen, C. Kang, P. Pinson, and Q. Xia, “Optimal bidding strategy of battery
storage in power markets considering performance-based regulation and battery cycle life,”
IEEE Transactions on Smart Grid, vol. 7, no. 5, pp. 2359–2367, September 2016.
economic performance of microgrids,” IEEE Transactions on Smart Grid, under review,
2017.
[134] W. Su, J. Wang, and J. Roh, “Stochastic energy scheduling in microgrids with intermittent
renewable energy resources,” IEEE Transactions on Smart Grid, vol. 5, no. 4, pp. 1876–
[135] C. Ju, P. Wang, L. Goel, and Y. Xu, “A two-layer energy management system for microgrids
with hybrid energy storage considering degradation costs,” IEEE Transactions on Smart
Grid, 2017.
[136] S.-J. Ahn, S.-R. Nam, J.-H. Choi, and S.-I. Moon, “Power scheduling of distributed genera-
tors for economic and stable operation of a microgrid,” IEEE Transactions on Smart Grid,
[137] A. Parisio, E. Rikos, and L. Glielmo, “A model predictive control approach to microgrid
operation optimization,” IEEE Transactions on Control Systems Technology, vol. 22, no. 5,
[138] D. Olivares, C. Cañizares, and M. Kazerani, “A centralized energy management system for
isolated microgrids,” IEEE Transactions on Smart Grid, vol. 5, no. 4, pp. 1864–1875, July
2014.
[139] A. Khodaei, “Microgrid optimal scheduling with multi-period islanding constraints,” IEEE
Transactions on Power Systems, vol. 29, no. 3, pp. 1383–1392, May 2014.
smart meters data and temperature dependent thermal load modeling,” IEEE Transactions
[141] W. Shi, N. Li, C.-C. Chu, and R. Gadh, “Real-time energy management in microgrids,”
IEEE Transactions on Smart Grid, vol. 8, no. 1, pp. 228–238, January 2017.
[142] T. Niknam, F. Golestaneh, and A. Malekpour, “Probabilistic energy and operation manage-
devices based on point estimate method and self-adaptive gravitational search algorithm,”
[143] K. P. Kumar and B. Saravanan, “Recent techniques to model uncertainties in power gener-
ation from renewable energy sources and loads in microgrids – a review,” Renewable and
[144] J. Che and J. Wang, “Short-term electricity prices forecasting based on support vector re-
gression and auto-regressive integrated moving average modeling,” Energy Conversion and
Management, vol. 51, no. 10, pp. 1911–1917, October 2010.
[145] R. Wang, P. Wang, and G. Xiao, “A robust optimization approach for energy generation
scheduling in microgrids,” Energy Conversion and Management, vol. 106, pp. 597–607,
October 2015.
[146] R. Gupta and N. K. Gupta, “A robust optimization based approach for microgrid operation
in deregulated environment,” Energy Conversion and Management, vol. 93, pp. 121–131,
January 2015.
[147] C. Zhang, Y. Xu, Z. Y. Dong, and J. Ma, “Robust operation of microgrids via two-stage
coordinated energy storage and direct load control,” IEEE Transactions on Power Systems,
[148] E. Kuznetsova, C. Ruiz, Y.-F. Li, and E. Zio, “Analysis of robust optimization for decen-
tralized microgrid energy management under uncertainty,” Electrical Power and Energy
[149] S. Soman, H. Zareipour, O. Malik, and P. Mandal, “A review of wind power and wind speed
forecasting methods with different time horizons,” North American Power Symposium, Ar-
[150] N. Amjady and F. Keynia, “Day-ahead price forecasting of electricity markets by mutual
[151] Alberta Electric System Operator (AESO) - Energy Trading System. [Online]. Available:
http://ets.aeso.ca
Pages/Operations-WorkPlace-Centre/Bearspaw-OWC/Wind-need-for-small-turbine.aspx
[154] J. Nowotarski and R. Weron, “Computing electricity spot price prediction intervals using
quantile regression and forecast averaging,” Computational Statistics, vol. 30, no. 3, pp.
[155] B. Liu, J. Nowotarski, T. Hong, and R. Weron, “Probabilistic load forecasting via quantile
regression averaging on sister forecasts,” IEEE Transactions on Smart Grid, vol. 8, no. 2,
[156] M. Kazemi, H. Zareipour, N. Amjady, W. D. Rosehart, and M. Ehsan, “Operation scheduling
of battery storage systems in joint energy and ancillary services markets,” IEEE Transac-
[157] AESO - Annual Market Statistics Report 2017. [Online]. Available: https://www.aeso.ca/
market/market-and-system-reporting/annual-market-statistic-reports/
[158] R. Velo, P. Lopez, and F. Maseda, “Wind speed estimation using multilayer perceptron,”
Appendix A
In this section, the performance of the proposed wind power forecasting methodology, presented in
Chapter 2, is evaluated at a wind farm level. The objective is to investigate the forecasting accuracy
as the scale of wind power generation is reduced from the system level to the wind farm level. To
do so, the historical wind power generation data of Taber wind farm located in Alberta, Canada is
used. For the sake of a fair comparison, the same test months presented in Table 2.4 are considered.
As shown in Table A.1, the forecasting errors in terms of both nMAE and nRMSE are, on average,
higher at the wind farm level than those for the aggregated wind generation. However, the table
also shows that the forecasting errors at the wind farm level are lower in the two months of June
and July. Moreover, the forecasting errors are noticeably higher in the winter months. Overall, this
numerical experiment shows that the proposed forecasting methodology performs satisfactorily for
The average forecasting error is higher at the wind farm level because the aggregate wind
generation from geographically dispersed wind farms is expected to be slightly smoother
than that for an individual wind farm. This could become significant if wind farms are placed in
Appendix B
Benchmark models
Here, the three forecasting methods used as benchmarks in Table 2.1, i.e., Persistence, MLP and RBF,
are briefly described.
Persistence is the simplest forecasting model as it assumes the forecast value at a certain time
in the future is the same as the last measured value. Therefore, this naive method is useful for
very short-term prediction purposes, while its forecast error significantly increases as the forecast
horizon increases.
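The persistence rule described above can be sketched in a few lines of Python (NumPy). This is an illustration only: the function names `persistence_forecast` and `persistence_mae` and the ramp-shaped test signal are assumptions introduced here, not part of the benchmark setup of Table 2.1.

```python
import numpy as np


def persistence_forecast(history, horizon):
    """Repeat the last measured value for every step of the forecast horizon."""
    history = np.asarray(history, dtype=float)
    return np.full(horizon, history[-1])


def persistence_mae(series, horizon):
    """Mean absolute error of the h-step-ahead persistence forecast over a series."""
    series = np.asarray(series, dtype=float)
    # The forecast for time t + horizon is the value measured at time t.
    errors = series[horizon:] - series[:-horizon]
    return float(np.mean(np.abs(errors)))


# Illustrative data only: a steadily ramping signal.
ramp = np.arange(10.0)
print(persistence_forecast(ramp, 3))
print([persistence_mae(ramp, h) for h in (1, 3, 6)])
```

For the ramp signal, the h-step persistence error equals h, which illustrates how the error of this naive method grows with the forecast horizon.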
Multi-Layer Perceptron (MLP) and Radial Basis Function (RBF) networks are well-known Artificial
Neural Networks (ANNs), which have been successfully applied to forecasting problems in power
systems. MLP is a feed-forward artificial neural network model capable of creating a mapping
function between sets of input data and a set of corresponding outputs. It consists of multiple
layers of nodes, called neurons, and each layer is connected to the next one.
Neurons are the processing elements of the network and apply activation functions, such as
linear, logarithmic sigmoid and hyperbolic tangent sigmoid functions. The last one is used as the
activation function of neurons in this thesis, as it results in better performance. The weights
connecting the neurons of adjacent layers and the biases connected to each neuron are free parameters that
In RBF neural networks, radial basis functions are used as the activation functions of neurons
in the hidden layer and linear functions for the neurons of the output layer. In this network, the
vector distance between the input weights vector and the input vector is calculated (using the dot
product of the two) and then multiplied by the bias. Afterwards, the result is transferred to the
radial basis function. The output of the first layer is then transferred to the second layer in which a
linear function is the activation function of the output neuron. More details about MLP and RBF
neural networks can be found in [41, 57, 158].
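The forward passes of the two benchmark networks described above can be sketched as follows in Python (NumPy). This is a minimal sketch under stated assumptions: a single hidden layer, a Gaussian radial basis function exp(-n^2), and the names `mlp_forward`, `rbf_forward`, `centers` and `spread_bias` are all illustrative choices, not the configuration used for the benchmarks.

```python
import numpy as np


def mlp_forward(x, W1, b1, W2, b2):
    """One-hidden-layer MLP: hyperbolic tangent hidden neurons, linear output neurons."""
    hidden = np.tanh(W1 @ x + b1)
    return W2 @ hidden + b2


def rbf_forward(x, centers, spread_bias, W2, b2):
    """RBF network: the distance to each hidden neuron's weight vector is scaled
    by that neuron's bias and passed through a Gaussian basis (assumed form);
    the output layer is linear."""
    dist = np.linalg.norm(centers - x, axis=1)
    hidden = np.exp(-(dist * spread_bias) ** 2)
    return W2 @ hidden + b2


# Illustrative values only.
x = np.array([0.5, -1.0])
centers = np.array([[0.5, -1.0], [2.0, 0.0]])
print(rbf_forward(x, centers, np.array([1.0, 1.0]),
                  np.array([[2.0, 0.0]]), np.array([0.1])))
print(mlp_forward(x, np.zeros((3, 2)), np.zeros(3),
                  np.ones((1, 3)), np.array([0.25])))
```

A hidden RBF neuron responds most strongly (activation 1) when the input coincides with its weight vector, whereas an MLP neuron responds to a weighted sum of the inputs.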
Appendix C
The task of the forecasting engines is to learn the mapping function between a specified set of
input/output pairs $\{(X_1, t_1), (X_2, t_2), \ldots, (X_Q, t_Q)\}$, known as training samples. $Q$ indicates the
number of training samples. $X_q$ and $t_q$ are the $q$-th input vector and the corresponding target output
of the forecasting model, respectively. Mean squared error (MSE) is usually considered to be the
where $y_q$ is the output of the forecasting engine when $X_q$ is fed as the input of the forecasting
as follows:
where $P$ is the vector of the free parameters according to (3.6), $k$ represents the iteration number,
and $I$ is the identity matrix. $J$ is the Jacobian matrix composed of the first derivatives of the
network errors with respect to all its free parameters, and $J^{\top} J$ is the (Gauss-Newton
approximation of the) Hessian matrix. Considering (A.1) as the performance function that should be
minimized, the gradient of (A.1) can be shown as $J^{\top} e$.
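For reference, the performance function and the parameter update discussed above can be written in the standard LM form of [88]. The expressions below are a sketch consistent with the symbols defined in this appendix; the $1/Q$ normalization of the MSE and the stacking of the errors into the vector $e$ are assumptions.

```latex
% Assumed standard forms, following the LM formulation in [88]
\begin{align}
  \mathrm{MSE} &= \frac{1}{Q} \sum_{q=1}^{Q} e_q^{2}
                = \frac{1}{Q} \sum_{q=1}^{Q} (t_q - y_q)^{2}, \\
  P_{k+1} &= P_k - \left[ J^{\top} J + \mu I \right]^{-1} J^{\top} e,
  \qquad e = [e_1, e_2, \ldots, e_Q]^{\top}.
\end{align}
```

With this form, a large $\mu$ gives $P_{k+1} \approx P_k - (1/\mu) J^{\top} e$, i.e., a gradient descent step of size $1/\mu$, while $\mu = 0$ leaves the Gauss-Newton step.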
The main modification of the LM algorithm with respect to Newton’s method is the parameter
µ, such that the algorithm reduces to Newton’s method if µ is zero in (A.2). When µ is large,
the LM algorithm tends to gradient descent with a small step size, i.e., (1/µ), while for small µ the
LM algorithm tends to Newton’s method. Since Newton’s method is faster and more accurate
than gradient descent, the aim is to shift toward Newton’s method as quickly as possible. Thus,
µ is divided by a factor β (β > 1) after each successful step, i.e., a reduction in the MSE given in
(A.1). Conversely, µ is multiplied by the factor β when a tentative step increases the MSE.
Therefore, the MSE is always reduced at each iteration of the algorithm [88]. The initial value
of µ is usually set to 0.01 and β is usually set to 10. For further details regarding the LM
training algorithm, the interested reader can refer to [88]. The implementation of the LM learning
Since computation of the Jacobian matrix is the most important part of the LM algorithm, it is
required to determine the first derivative of the network errors with respect to each free parameter
of (3.6) in the SRWNN, i.e., $v_j$, $w_i$, $a_i$, $b_i$, $\theta_{i,j}$, and $g$. The elements in the Jacobian matrix are
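\begin{align}
\frac{\partial e}{\partial v_j} &= \frac{\partial (t - y)}{\partial v_j} = -x_j, \quad j = 1, 2, \ldots, M \tag{A.3} \\
\frac{\partial e}{\partial w_i} &= \frac{\partial (t - y)}{\partial w_i} = -\Psi_i, \quad i = 1, 2, \ldots, N \tag{A.4} \\
\frac{\partial e}{\partial a_i} &= \frac{\partial (t - y)}{\partial a_i} = -w_i \frac{\partial \Psi_i}{\partial a_i}, \quad i = 1, 2, \ldots, N \tag{A.5} \\
\frac{\partial \Psi_i}{\partial a_i} &= \sum_{j=1}^{M} \left[ \frac{d\psi_{i,j}}{da_i} \prod_{l=1,\, l \neq j}^{M} \psi(r_{i,l}) \right], \tag{A.6} \\
\frac{d\psi_{i,j}}{da_i} &= \frac{-r_{i,j}}{a_i}\, \psi'(r_{i,j}), \tag{A.7} \\
\frac{\partial e}{\partial b_i} &= \frac{\partial (t - y)}{\partial b_i} = -w_i \frac{\partial \Psi_i}{\partial b_i}, \quad i = 1, 2, \ldots, N \tag{A.8} \\
\frac{\partial \Psi_i}{\partial b_i} &= \sum_{j=1}^{M} \left[ \frac{d\psi_{i,j}}{db_i} \prod_{l=1,\, l \neq j}^{M} \psi(r_{i,l}) \right], \tag{A.9} \\
\frac{d\psi_{i,j}}{db_i} &= \frac{-1}{a_i}\, \psi'(r_{i,j}), \tag{A.10} \\
\frac{\partial e}{\partial \theta_{i,j}} &= -w_i \frac{\partial \Psi_i}{\partial \theta_{i,j}}, \quad i = 1, \ldots, N, \; j = 1, \ldots, M \tag{A.11} \\
\frac{\partial \Psi_i}{\partial \theta_{i,j}} &= \frac{\psi_{i,j}\, z^{-1}}{a_i}\, \psi'(r_{i,j}) \prod_{l=1,\, l \neq j}^{M} \psi(r_{i,l}) \tag{A.12} \\
\frac{\partial e}{\partial g} &= \frac{\partial (t - y)}{\partial g} = -1 \tag{A.13}
\end{align}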
Therefore, the Jacobian matrix with the size of $Q \times N_P$ can be computed using (A.3) to (A.13)
and all free parameters of the SRWNN are updated using (A.2). The procedure of the LM learning
1. Set the iteration number to 1, i.e., $k = 1$. Randomly initialize the free parameters
$v_j$, $w_i$, $a_i$, $b_i$, $\theta_{i,j}$ and $g$ of the forecasting engine within their allowable ranges for
2. Present all $X_q$ and compute the corresponding SRWNN outputs $y_q$ using (3.5).
Moreover, compute the corresponding errors $e_q$ and the performance index MSE
using (A.1).
4. Update the free parameters of the forecasting engine using (A.2) to obtain $P_{k+1}$.
5. Compute the performance index MSE using $P_{k+1}$. If the new MSE is smaller than
the one computed in step 2, reduce the parameter µ by the factor β, and save $P_{k+1}$.
termination criterion can be the maximum number of iterations. However, the early
of the training algorithm in this paper as it can monitor the prediction ability of
SRWNN forecast engine for the unseen samples and terminate the training process
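The LM procedure outlined in this appendix can be sketched generically in Python (NumPy). This is a hedged illustration rather than the thesis implementation: the update is assumed to take the standard form of [88], P_{k+1} = P_k - (J^T J + µI)^{-1} J^T e; the function name `levenberg_marquardt`, the toy exponential model, and the stopping tolerances are assumptions introduced here. For the SRWNN, `residual` would evaluate t - y via (3.5) and `jacobian` would be assembled from (A.3) to (A.13). Early stopping on validation samples is not shown.

```python
import numpy as np


def levenberg_marquardt(residual, jacobian, p0, mu=0.01, beta=10.0,
                        max_iter=100, tol=1e-12, mu_max=1e10):
    """Generic LM loop: residual(p) returns e (length Q) and
    jacobian(p) returns de/dp (shape Q x N_P)."""
    p = np.asarray(p0, dtype=float)
    e = residual(p)
    mse = float(np.mean(e ** 2))
    for _ in range(max_iter):
        J = jacobian(p)
        # Assumed standard LM step: solve (J^T J + mu I) step = J^T e.
        step = np.linalg.solve(J.T @ J + mu * np.eye(p.size), J.T @ e)
        p_new = p - step
        e_new = residual(p_new)
        mse_new = float(np.mean(e_new ** 2))
        if mse_new < mse:
            # Successful step: accept it and move toward Newton's method.
            improvement = mse - mse_new
            p, e, mse = p_new, e_new, mse_new
            mu /= beta
            if improvement < tol:
                break
        else:
            # Tentative step increased the MSE: reject it and move toward gradient descent.
            mu *= beta
            if mu > mu_max:
                break
    return p, mse


# Toy problem (illustrative only): fit y = p0 * exp(p1 * x) to noise-free data.
x = np.linspace(0.0, 4.0, 20)
t = 3.0 * np.exp(-0.7 * x)


def residual(p):
    return t - p[0] * np.exp(p[1] * x)


def jacobian(p):
    # Derivatives of e = t - y with respect to p0 and p1.
    return np.column_stack([-np.exp(p[1] * x), -p[0] * x * np.exp(p[1] * x)])


p_hat, mse_hat = levenberg_marquardt(residual, jacobian, [1.0, 0.0])
print(p_hat, mse_hat)
```

The defaults µ = 0.01 and β = 10 follow the values quoted in this appendix. Rejected steps leave the parameters unchanged, so the stored MSE never increases.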
Appendix D
This thesis implements the feature selection technique proposed in [52] for electricity price
prediction. Built on the information-theoretic criterion of mutual information, this method
selects the most informative input features for the forecast process by filtering out irrelevant
and redundant candidate features in two stages. In this work, a vector of candidate features is
formed including market clearing prices, hourly Ontario electric prices, pre-dispatch prices, supply
In the first stage, called the irrelevancy filter, the mutual information between each candidate
feature, i.e., $f_i(t)$, and the target variable is calculated. A higher value of mutual information
for $f_i(t)$ means that this feature shares more information content with the target variable. Defining a
relevancy threshold denoted by $T_{Rel}$, the candidate inputs with a calculated mutual information value
greater than $T_{Rel}$ are considered as the relevant features of the forecast process. These features are
retained for the second stage, while the other candidate features, whose mutual information values are
lower than $T_{Rel}$, are considered irrelevant and are filtered out.
In the second stage, called the redundancy filter, redundant features among the candidates
retained by the irrelevancy filter are detected and filtered out. Two candidates retained
from the first stage, e.g., $f_k(t)$ and $f_l(t)$, with a high value of mutual information share more
common information, which indicates a high level of redundancy. Therefore, the redundancy of each
selected feature $f_k(t)$ with the other candidate inputs is first calculated. Defining a redundancy
threshold denoted by $T_{Red}$, if the measured redundancy is greater than $T_{Red}$, $f_k(t)$ is considered a
redundant candidate feature. Then, between this candidate and the feature that has the maximum
redundancy with $f_k(t)$, the one with lower relevancy is filtered out [13].
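The two filtering stages can be sketched as follows in Python (NumPy). This is a simplified illustration, not the method of [52] as implemented in this thesis: the histogram-based mutual information estimate, the number of bins, the synthetic data, the threshold values, and the names `mutual_information` and `two_stage_filter` are all assumptions made for the example.

```python
import numpy as np


def mutual_information(x, y, bins=8):
    """Histogram-based estimate of the mutual information (in bits)."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return float(np.sum(pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])))


def two_stage_filter(features, target, t_rel, t_red, bins=8):
    """features: dict mapping a feature name to a 1-D array of samples."""
    # Stage 1 (irrelevancy filter): keep features whose relevance exceeds t_rel.
    relevance = {name: mutual_information(f, target, bins)
                 for name, f in features.items()}
    selected = [name for name in features if relevance[name] > t_rel]

    # Stage 2 (redundancy filter): for a feature whose maximum redundancy with
    # another retained feature exceeds t_red, drop the less relevant of the pair.
    removed = True
    while removed:
        removed = False
        for k in selected:
            others = [l for l in selected if l != k]
            if not others:
                break
            redundancy = {l: mutual_information(features[k], features[l], bins)
                          for l in others}
            l_max = max(redundancy, key=redundancy.get)
            if redundancy[l_max] > t_red:
                drop = k if relevance[k] < relevance[l_max] else l_max
                selected.remove(drop)
                removed = True
                break
    return selected, relevance


# Synthetic example (illustrative only).
rng = np.random.default_rng(0)
target = rng.normal(size=2000)
f1 = target + 0.1 * rng.normal(size=2000)   # relevant
f2 = f1 + 0.01 * rng.normal(size=2000)      # relevant but redundant with f1
f3 = rng.normal(size=2000)                  # irrelevant
selected, relevance = two_stage_filter({"f1": f1, "f2": f2, "f3": f3},
                                       target, t_rel=0.1, t_red=1.0)
print(selected)
```

In this synthetic case, `f3` is removed by the irrelevancy filter and only one of the near-duplicate pair `f1`, `f2` survives the redundancy filter. In practice the two thresholds would be tuned, for example by cross-validation.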
The candidate features retained by the redundancy filter are considered as the inputs of the price
forecasting model. Cross-validation is used for fine-tuning the values of
the thresholds $T_{Rel}$ and $T_{Red}$. Since this method is not the focus of this thesis, the interested reader
Appendix E
I, Dr. Hamidreza Zareipour, hereby grant permission to Mr. Hamed Chitsaz to reuse the below
three articles in his thesis titled “Developing Energy Forecasting Tools in Power Systems:
Application to Microgrids”.
1. H. Chitsaz, N. Amjady, and H. Zareipour, “Wind power forecast using wavelet neural
network trained by improved clonal selection algorithm,” Energy Conversion and
Management, vol. 89, pp. 588–598, January 2015.
I agree to the terms outlined in the University of Calgary Non-Exclusive Distribution License. I
am aware that all University of Calgary Theses are also archived by the Library and Archives
Canada (LAC) and the University of Calgary Theses may be submitted to ProQuest.
Date:
Signature:
To Whom It May Concern:
I, Dr. David Wood, hereby grant permission to Mr. Hamed Chitsaz to reuse the below article in
his thesis titled “Developing Energy Forecasting Tools in Power Systems: Application to
Microgrids”.
I agree to the terms outlined in the University of Calgary Non-Exclusive Distribution License. I
am aware that all University of Calgary Theses are also archived by the Library and Archives
Canada (LAC) and the University of Calgary Theses may be submitted to ProQuest.
Date:
Signature:
To Whom It May Concern:
I, Dr. Nima Amjady, hereby grant permission to Mr. Hamed Chitsaz to reuse the below two articles
in his thesis titled “Developing Energy Forecasting Tools in Power Systems: Application to
Microgrids”.
1. H. Chitsaz, N. Amjady, and H. Zareipour, “Wind power forecast using wavelet neural
network trained by improved clonal selection algorithm,” Energy Conversion and
Management, vol. 89, pp. 588–598, January 2015.
I agree to the terms outlined in the University of Calgary Non-Exclusive Distribution License. I
am aware that all University of Calgary Theses are also archived by the Library and Archives
Canada (LAC) and the University of Calgary Theses may be submitted to ProQuest.
Date:
Signature:
To Whom It May Concern:
I, Dr. Hamid Shakerardakani, hereby grant permission to Mr. Hamed Chitsaz to reuse the below
article in his thesis titled “Developing Energy Forecasting Tools in Power Systems: Application
to Microgrids”.
I agree to the terms outlined in the University of Calgary Non-Exclusive Distribution License. I
am aware that all University of Calgary Theses are also archived by the Library and Archives
Canada (LAC) and the University of Calgary Theses may be submitted to ProQuest.
Date:
Signature:
To Whom It May Concern:
I, Dr. Payam Zamani-Dehkordi, hereby grant permission to Mr. Hamed Chitsaz to reuse the below
article in his thesis titled “Developing Energy Forecasting Tools in Power Systems: Application
to Microgrids”.
I agree to the terms outlined in the University of Calgary Non-Exclusive Distribution License. I
am aware that all University of Calgary Theses are also archived by the Library and Archives
Canada (LAC) and the University of Calgary Theses may be submitted to ProQuest.
Date:
Signature:
To Whom It May Concern:
I, Dr. Palak Parikh, hereby grant permission to Mr. Hamed Chitsaz to reuse the below article in
his thesis titled “Developing Energy Forecasting Tools in Power Systems: Application to
Microgrids”.
I agree to the terms outlined in the University of Calgary Non-Exclusive Distribution License. I
am aware that all University of Calgary Theses are also archived by the Library and Archives
Canada (LAC) and the University of Calgary Theses may be submitted to ProQuest.
Date:
Signature:
Rightslink® by Copyright Clearance Center 2017-11-29, 9:03 PM
Please note that, as the author of this Elsevier article, you retain the right to include it in a thesis or
dissertation, provided it is not published commercially. Permission is not required, but please ensure
that you reference the journal as the original source. For more information on this and on your other
retained rights, please visit: https://www.elsevier.com/about/our-business/policies/copyright#Author-
rights
Copyright © 2017 Copyright Clearance Center, Inc. All Rights Reserved. Privacy statement. Terms and Conditions.
Comments? We would like to hear from you. E-mail us at customercare@copyright.com
https://s100.copyright.com/AppDispatchServlet Page 1 of 1
Rightslink® by Copyright Clearance Center 2017-11-24, 2:41 PM
The IEEE does not require individuals working on a thesis to obtain a formal reuse license, however,
you may print out this statement to be used as a permission grant:
Requirements to be followed when using any portion (e.g., figure, graph, table, or textual material) of an IEEE
copyrighted paper in a thesis:
1) In the case of textual material (e.g., using short quotes or referring to the work within these papers) users
must give full credit to the original source (author, paper, publication) followed by the IEEE copyright line © 2011
IEEE.
2) In the case of illustrations or tabular material, we require that the copyright line © [Year of original
publication] IEEE appear prominently with each reprinted figure and/or table.
3) If a substantial portion of the original paper is to be used, and if you are not the senior author, also obtain the
senior author's approval.
1) The following IEEE copyright/ credit notice should be placed prominently in the references: © [year of original
publication] IEEE. Reprinted, with permission, from [author names, paper title, IEEE publication title, and
month/year of publication]
2) Only the accepted version of an IEEE copyrighted paper can be used when posting the paper or your thesis
on-line.
3) In placing the thesis on the author's university website, please display the following message in a prominent
place on the website: In reference to IEEE copyrighted material which is used with permission in this thesis, the
IEEE does not endorse any of [university/educational entity's name goes here]'s products or services. Internal or
personal use of this material is permitted. If interested in reprinting/republishing IEEE copyrighted material for
advertising or promotional purposes or for creating new collective works for resale or redistribution, please go to
http://www.ieee.org/publications_standards/publications/rights/rights_link.html to learn how to obtain a License
from RightsLink.
If applicable, University Microfilms and/or ProQuest Library, or the Archives of Canada may supply single copies
of the dissertation.