
A Fast Technique for Smart Home Management:


ADP With Temporal Difference Learning
Chanaka Keerthisinghe, Student Member, IEEE, Gregor Verbič, Senior Member, IEEE,
and Archie C. Chapman, Member, IEEE

Abstract—This paper presents a computationally efficient smart home energy management system (SHEMS) using an approximate dynamic programming (ADP) approach with temporal difference learning for scheduling distributed energy resources. This approach improves the performance of an SHEMS by incorporating stochastic energy consumption and PV generation models over a horizon of several days, using only the computational power of existing smart meters. In this paper, we consider a PV-storage (thermal and battery) system; however, our method can extend to multiple controllable devices without the exponential growth in computation that other methods such as dynamic programming (DP) and stochastic mixed-integer linear programming (MILP) suffer from. Specifically, probability distributions associated with the PV output and demand are kernel estimated from empirical data collected during the Smart Grid Smart City project in NSW, Australia. Our results show that ADP computes a solution much faster than both DP and stochastic MILP, and provides only a slight reduction in quality compared to the optimal DP solution. In addition, incorporating a thermal energy storage unit using the proposed ADP-based SHEMS reduces the daily electricity cost by up to 26.3% without a noticeable increase in the computational burden. Moreover, ADP with a two-day decision horizon reduces the average yearly electricity cost by 4.6% over a daily DP method, yet requires less than half of the computational effort.

Index Terms—Demand response, smart home energy management, distributed energy resources, approximate dynamic programming, dynamic programming, stochastic mixed-integer linear programming, value function approximation, temporal difference learning.

NOMENCLATURE

k            Time-step.
K            Total number of time-steps.
i            Index of controllable devices.
I            Total number of controllable devices.
j            Index of stochastic variables.
J            Total number of stochastic variables.
s_k^{i,j}    State of i or j at time-step k.
x_k^i        Decision of controllable device i at time-step k.
ω_k^j        Variation of stochastic variable j at time-step k.
C_k          Cost or reward at time-step k.
π            Policy.
r            Realisation of random information.
R            Total number of random realisations.
α            Stepsize.
V_k^π        Expected future cost/reward for following π from k.
s^{i,max}    Maximum state of a controllable device.
s^{i,min}    Minimum state of a controllable device.
μ^i          Efficiency of a controllable device [%].
l^i          Losses of a controllable device per time-step.
a            Index of a segment in the VFA.
A_k          Total number of segments at time-step k.
z̄_{ka}       Capacity of a segment.
v̄_{ka}       Slope of a segment.
n            A particular scenario.
N            Total number of scenarios in stochastic MILP.

I. INTRODUCTION

FUTURE electrical power grids will be based on a "demand-following-supply" paradigm, because of the increasing penetration of intermittent renewable energy sources and advances in technology enabling non-dispatchable loads. A cornerstone of achieving this is demand side management, which can be roughly divided into demand response (DR) and direct load control. This research focuses on DR, which is used to reduce electricity costs on one side and to provide system services on the other. In particular, we focus on residential customers, because roughly 30% of the energy use in Australia is comprised of residential loads [1] and their diurnal patterns drive daily and seasonal peak loads.

Currently, DR is implemented using small-scale voluntary programs and fixed pricing strategies. The main drawback of using the same pricing strategy for all customers is the possibility of inducing unwanted peaks in the demand curve as customers respond to prices. Also, it is not possible for residential and small commercial users to directly participate in wholesale energy or ancillary services markets because of the regulatory regimes and computational requirements. As such, DR revolves around the interaction between an aggregator and customers, as shown in Fig. 1. In many cases, the retailer acts as the aggregator. The aggregator's task is to construct a scheme that coordinates, schedules or otherwise controls part of a participating user's load via interaction with its smart home energy management system (SHEMS) [2].
Fig. 1. Customers, aggregator and the wholesale electricity market.

In the context of DR, the aggregator sends control signals in the form of electricity price signals to the SHEMS. The SHEMS then schedules and coordinates the customer's distributed generation (DG), storage and flexible loads, collectively known as distributed energy resources (DERs), to minimise energy costs while maintaining a suitable level of comfort. In particular, this research assumes that the exact electricity price signals are available from a DR aggregator/retailer in the form of time-of-use pricing (ToUP). ToUP is chosen as it is prevalent in Australia [3], [4].

The DERs in the smart homes considered in this paper comprise a PV unit, a battery and a thermal energy storage (TES) unit (i.e., an electric water heater). In the existing literature, a range of DERs have been used to achieve DR [5]-[8]. However, our choice stems from Australia's increasing penetration of rooftop PV and battery storage systems in response to rising electricity costs, decreasing technology costs and an existing fleet of hot water storage devices [9], [10]. According to AEMO, the payback period for residential PV-storage systems is already below 15 years in South Australia, with the other states to follow suit in less than a decade [11]. Similarly, residential users with PV-storage systems in the USA have been forecast to reach grid parity within the next decade [12].

The SHEMS schedules the DERs in such a way that the electrical power drawn from the grid is minimised, especially during peak periods. Given the recent drop in PV costs, feed-in tariffs (FiTs) in Australia are significantly lower than the retail tariffs paid by households, and it is anticipated that this may happen in other parts of the world too. As such, selling power back to the electrical grid is uneconomical, and households have a strong incentive to self-consume as much locally generated power as possible. Therefore, when PV generation is higher than electrical demand, an effective SHEMS should either store the surplus energy or consume it using controllable loads.

The underlying optimisation problem here is a sequential decision making process under uncertainty. As shown in [13], SHEMSs that consider stochastic variables such as variations in PV output and in electrical and thermal demand using appropriate probability distributions yield better quality schedules than those obtained from a deterministic model. Moreover, in [14] we showed that extending the optimisation horizon beyond one day is economically beneficial, as the SHEMS can exploit inter-daily variations in consumption and solar-insolation patterns. Given this, stochastic mixed-integer linear programming (MILP) [13], [15]-[18], particle swarm optimisation [19], [20] and dynamic programming (DP) [21], [22] have previously been proposed to solve the SHEMS problem. In particular, DP accommodates full non-linear controllable device models and produces close-to-optimal solutions when finely discretised. However, DP is infeasible if the optimisation horizon is extended or when multiple controllable devices are added to the SHEMS, due to the exponential growth in the state and action spaces [23].

Given these insights, in [24] we reported our preliminary work towards implementing a computationally efficient SHEMS using approximate dynamic programming (ADP) with temporal difference learning, an approach that has successfully been applied to the control of grid-level storage in [25]. In this proposed approach, we obtain policies from value function approximations (VFAs).1 Our choice to use VFAs is based on the observation that they work best in time-dependent problems with regular load, energy and price patterns, relatively high noise and less accurate forecasts (errors grow with the horizon) [26]. Other ADP methods are better suited to applications with different characteristics [27]-[33].

1 A value function describes the expected future cost of following a policy from a given state.

Building on this, the main contributions of this paper are the development of a computationally efficient SHEMS using ADP with temporal difference learning, which can incorporate multiple controllable devices, and a demonstration of its practical implementation using empirical data. Specifically, the proposed ADP method enables us to:
1) incorporate stochastic input variables without a noticeable increase in the computational burden;
2) extend the decision horizon with less computational burden to consider uncertainties over several days, which results in significant financial benefits;
3) enable integration of multiple controllable devices with less computational burden than DP;
4) integrate the SHEMS into an existing smart meter, as it uses less memory compared to existing methods.

In order to show the performance of ADP, we use it with a two-day decision horizon over three years using a rolling horizon approach, where the expected electricity cost of each day is minimised considering the uncertainties over the next day. The daily performance is benchmarked against DP and stochastic MILP by applying them to three different scenarios with different electrical and thermal demand patterns and PV outputs. The three-year evaluation is benchmarked against a daily DP approach by applying it to 10 smart homes. Moreover, the PV output and demand profiles are from data [34] collected during the Smart Grid Smart City project by Ausgrid and their
consortium partners in NSW, Australia, which investigated the benefits and costs of implementing a range of smart grid technologies in Australian households [35]. Specifically, the SHEMSs estimate the probability distributions associated with the PV output and electrical demand from empirical data using kernel regression, which is more realistic than assuming parametric forms. In addition, throughout the paper we demonstrate the benefits of residential PV-storage systems with a SHEMS.

The paper is structured as follows: Section II states the stochastic energy management problem and the existing solution techniques. This is followed by a description of the ADP formulation in Section III. The stochastic variable models are described in Section IV. Section V presents the simulation results and the discussion.

II. SMART HOME ENERGY MANAGEMENT PROBLEM

In this section, we describe the general formulation of the sequential stochastic optimisation problem, formulate our stochastic SHEMS problem as a sequential stochastic optimisation problem, and present a short description of the stochastic MILP and DP methods used to solve this problem.

A. General Sequential Stochastic Optimisation Problem

A sequential stochastic optimisation problem comprises:
• A sequence of time-steps, K = {1 ... k ... K}, where k and K denote a particular time-step and the total number of time-steps in the decision horizon, respectively.
• A set of controllable devices, I = {1 ... i ... I}, where each i is represented using:
  - A state variable, s_k^i ∈ S.
  - A decision variable, x_k^i ∈ X, which is a control action.
• A set of non-controllable inputs, J = {1 ... j ... J}, where each j is represented using:
  - A state variable, s_k^j ∈ S.
  - A random variable, ω_k^j ∈ Ω, capturing exogenous information or perturbations.
Given this, we let s_k = [s_k^i ... s_k^I, s_k^j ... s_k^J]^T, x_k = [x_k^i ... x_k^I]^T, and ω_k = [ω_k^j ... ω_k^J]^T. Note that the state variables contain the information that is necessary and sufficient to make the decisions and compute the costs, rewards and transitions.
• Constraints on the state and decision variables.
• Transition functions, s_{k+1} = s^M(s_k, x_k, ω_k), describing the evolution of states from k to k+1, where s^M(·) is the system model that consists of controllable device i's operational constraints, such as power flow limits, efficiencies and losses. Note that the transition functions are only required for the controllable devices.
• An objective function:

    F = E[ Σ_{k=1}^{K} C_k(s_k, x_k, ω_k) ],   (1)

where C_k(s_k, x_k, ω_k) is the contribution (i.e., the cost or reward of energy, or a discomfort penalty) incurred at time-step k, which accumulates over time.

B. Instantiation

Fig. 2. Illustration of electrical and thermal energy flows in a smart home, and the state, decision and random variables used to formulate the problem.

The objective of the SHEMS is to minimise energy costs over a decision horizon. The sequential stochastic optimisation problem is solved before the start of each day, using either a daily or a two-day decision horizon. In this paper we consider a PV unit, a battery and a hot water system (TES unit), as depicted in Fig. 2. We use a single inverter for both the battery and the PV, which is becoming popular in Australia.

In order to optimise performance, a SHEMS needs to incorporate the variations in the PV output and the electrical and thermal demand of the household. Given this, we model the stochastic variables using their mean as state variables and their variation as random variables. This enables us to use an algorithmic strategy that separates the transition function into a deterministic term, using the mean, and a random term, using the variation (discussed in Section III). In some cases, electricity prices may be considered as stochastic variables. However, in this paper, we assume that the exact electricity prices are available before the start of the decision horizon from a residential DR aggregator/retailer (i.e., in the form of ToUP).

In more detail, we cast our SHEMS problem as the sequential stochastic optimisation formulation in Section II-A as follows: The daily decision horizon is a 24-hour period, divided into K = 48 time-steps with a 30-minute resolution. We do this similarly for the two-day decision horizon. Here, a 30-minute time resolution is chosen to match typical dispatch timelines, because the PV and demand data from the Smart Grid Smart City project [34] are only available at 30-minute intervals. If required, the proposed ADP approach can increase the time resolution with less computational burden compared to existing methods. The controllable devices are the battery and the TES, while the non-controllable inputs are the PV output and the electrical and thermal demand.

As depicted in Fig. 2, for each time-step, k, in the decision horizon, state variables are used to represent: the battery SOC, s_k^b; the TES SOC, s_k^t; the mean PV output, s_k^{pv}; the mean electrical demand, s_k^{d,e}; the mean thermal demand, s_k^{d,t}; and the electricity tariff, s_k^p. Control variables consist of: the charge and discharge rates of the battery, x_k^b; and the electric water heater input, x_k^{wh}. Random variables are: the variations in PV output, ω_k^{pv}; the variations in thermal demand, ω_k^{d,t}; and the variations in electrical demand, ω_k^{d,e}. We use empirical data to estimate the probability distributions associated with the uncertain variables using kernel regression, which is more realistic than assuming parametric approaches (discussed in detail in Section IV).
The energy balance constraint is given by:

    s_k^{d,e} + ω_k^{d,e} + x_k^{wh} = μ^i x_k^i + x_k^g,   (2)

where x_k^i = s_k^{pv} + ω_k^{pv} − μ^b x_k^b is the inverter power at the DC side (a positive value means power into the inverter); μ^i is the efficiency of the inverter (note that the efficiency is 1/μ^i when the inverter power is negative); μ^b is the efficiency of the battery action corresponding to either charging or discharging; and x_k^g is the electrical grid power. The charge rate of the battery is constrained by the maximum charge rate, x_k^{b+} ≤ γ^c, and the discharge rate of the battery is constrained by the maximum discharge rate, x_k^{b−} ≥ γ^d. The electric water heater input should never exceed the maximum possible electric water heater input, x_k^{wh} ≤ γ^{wh}. In order to satisfy the thermal demand at all time-steps, we make sure that the TES has enough energy at each time-step, s_k^{t,req}, to satisfy the thermal demand for the next 2 hours. Therefore, the energy stored in the TES is always within the limits:

    s^{t,req} ≤ s_k^t ≤ s^{t,max}.   (3)

The energy stored in the battery should be within the limits s^{b,min} ≤ s_k^b ≤ s^{b,max}.

Transition functions govern how the state variables evolve over time. The battery SOC, denoted s_k^b ∈ [s^{b,min}, s^{b,max}], progresses by:

    s_{k+1}^b = (1 − l^b(s_k^b)) s_k^b − x_k^{b−} + μ^{b+} x_k^{b+},   (4)

where l^b(s_k^b) models the self-discharging process of the battery. The TES SOC is denoted s_k^t ∈ [s^{t,req}, s^{t,max}], and evolves according to:

    s_{k+1}^t = (1 − l^t(s_k^t)) s_k^t − (s_k^{d,t} + ω_k^{d,t}) + μ^{wh} x_k^{wh},   (5)

where l^t(s_k^t) models the thermal loss of the TES and μ^{wh} is the efficiency of the electric water heater. In the above, both transition functions are non-linear functions of the state.

Fig. 3. Characteristics of the battery and the inverter.

The discharge efficiency of the battery and the efficiency of the inverter are non-linear, and the different ways in which the stochastic MILP, DP and ADP approaches represent them are illustrated in Fig. 3. These indicate that DP and ADP can directly incorporate non-linear characteristics, while linear approximations have to be made with stochastic MILP. For all the implemented SHEMSs the following device characteristics are the same: the charging efficiency of the battery is μ^{b+} = 1; the minimum and maximum battery SOC are 2 kWh and 10 kWh, respectively; the maximum charge and discharge rates of the battery are 2 kWh; the electric water heater efficiency is μ^{wh} = 0.9, while its maximum possible input is 3 kWh; and the TES limit is set to 12 kWh.
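To make the device models concrete, the following Python sketch implements the transitions (4)-(5) and the per-time-step cost (7) under simplifying assumptions of our own: constant losses in place of the state-dependent terms l^b(·) and l^t(·), a constant inverter efficiency, and energies expressed in kWh per 30-minute step. It is a minimal illustration, not the authors' MATLAB implementation.

```python
# Minimal sketch of the battery transition (4), the TES transition (5)
# and the per-step cost (7). Constant losses and a constant inverter
# efficiency are simplifying assumptions made for illustration only.

def battery_step(s_b, x_b_minus, x_b_plus, l_b=0.001, mu_b_plus=1.0):
    """Battery SOC transition (4): discharge, charge and self-discharge."""
    return (1.0 - l_b) * s_b - x_b_minus + mu_b_plus * x_b_plus

def tes_step(s_t, d_t, omega_dt, x_wh, l_t=0.005, mu_wh=0.9):
    """TES SOC transition (5): thermal draw plus electric heater input."""
    return (1.0 - l_t) * s_t - (d_t + omega_dt) + mu_wh * x_wh

def step_cost(price, d_e, omega_de, x_i, x_wh, mu_i=0.95):
    """Cost (7): tariff times the net grid import implied by balance (2)."""
    return price * (d_e + omega_de + x_wh - mu_i * x_i)

# One half-hour step: 1.5 kWh electrical demand, 0.5 kWh water heating,
# battery idle, no PV, peak tariff of $0.47/kWh.
print(battery_step(6.0, 0.0, 0.0))          # ~5.994 kWh
print(tes_step(8.0, 0.4, 0.05, 0.5))        # ~7.96 kWh
print(step_cost(0.47, 1.5, 0.1, 0.0, 0.5))  # ~$0.99
```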
The optimal policy, π*, is a choice of action for each state, π : S → X, that minimises the expected sum of future costs over the decision horizon; that is:

    F^{π*} = min_{π} E[ Σ_{k=1}^{K} C_k(s_k, π(s_k), ω_k) ],   (6)

where C_k(s_k, x_k, ω_k) is the cost incurred at a given time-step, which is given by:

    C_k(s_k, x_k, ω_k) = s_k^p ( s_k^{d,e} + ω_k^{d,e} − μ^i x_k^i + x_k^{wh} ).   (7)

Note that we do not use any specific user comfort criteria in the contribution function. However, we endeavour to supply the thermal demand at all time-steps without any user discomfort by penalising undesired states of the TES in DP and ADP, and by directly using the constraint (3) in stochastic MILP. The problem is formulated as an optimisation of the expected contribution because the contribution is generally a random variable due to the effect of ω_k. In all the SHEMSs, we obtain the decisions x_k = π(s_k) = [x_k^b, x_k^{wh}] depending on the state variables s_k = [s_k^b, s_k^t, s_k^{d,e}, s_k^{pv}, s_k^{d,t}, s_k^p] and the realisations of the random variables ω_k = [ω_k^{pv}, ω_k^{d,e}, ω_k^{d,t}] at each time-step.

C. Solution Techniques

The first method we use is a scenario-based MILP approach, which we referred to as stochastic MILP in [23]. This technique requires us to linearise the constraints and transition functions mentioned in Section II-B and to model the problem as a mathematical programming problem. The second method we use is DP, in which we model our problem as a Markov decision process (MDP). This method enables us to incorporate all the non-linear constraints and transition functions with no additional computational burden over using linear constraints and transition functions. Details of these methods are as follows:

1) Stochastic MILP: The deterministic version of the SHEMS problem can be solved using a MILP approach, which optimises a linear objective function subject to linear constraints with continuous and integer variables [13]. Note that the transition functions presented in Section II-B are considered as constraints in the MILP formulation. Integer variables are used to model the power flow directions.

In order to incorporate stochasticity, a large set of scenarios is generated by sampling from all combined realisations of the stochastic variables mentioned in Section II-B. A larger number of scenarios should improve the solutions generated by better incorporating the stochastic variables, but this imposes a greater computational burden. Therefore, heuristic scenario reduction techniques are employed to obtain a scenario set of size N, which can be solved within a given time with reasonable accuracy.
Given this, a scenario-based stochastic MILP formulation of the problem is described by:

    min Σ_{n=1}^{N} P^n(s^{j,n}) Σ_{k=1}^{K} ( s_k^{p,buy} x_k^{g+} − s_k^{p,sell} x_k^{g−} ),   (8)

where P^n(s^{j,n}) is the probability of a particular scenario n corresponding to realisations of the stochastic variables s^j, subject to Σ_{n=1}^{N} P^n(s^{j,n}) = 1.

For each realised scenario, the optimisation problem is solved for the whole horizon at once using a standard MILP solver, so the solution time grows exponentially with the length of the horizon. Here CPLEX is used; however, all commercial solvers give solutions of similar quality. As such, in the existing literature, a one-day optimisation horizon is typically assumed. Moreover, the solutions are of lower quality because of the linear approximations made and the inability to incorporate all the probability distributions [23]. In response to these limitations, DP was proposed in [22] to improve the solution quality.
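As an illustration of how a scenario set for (8) might be assembled, the following Python sketch draws sample paths and weights them uniformly. The Gaussian draws are stand-ins for the kernel-estimated distributions of Section IV, and the uniform weighting is our simplification; the paper instead applies heuristic scenario reduction to reach the final set of N scenarios.

```python
import random

# Sketch: build a scenario set for the stochastic MILP (8). Each
# scenario is a path of PV and demand variations over K time-steps.
# Gaussian noise is a placeholder for the kernel-estimated models.

def sample_scenario(K=48, sigma_pv=0.10, sigma_de=0.15):
    omega_pv = [random.gauss(0.0, sigma_pv) for _ in range(K)]
    omega_de = [random.gauss(0.0, sigma_de) for _ in range(K)]
    return omega_pv, omega_de

N = 100                                # reduced scenario count
scenarios = [sample_scenario() for _ in range(N)]
probabilities = [1.0 / N] * N          # the P^n in (8), summing to one
assert abs(sum(probabilities) - 1.0) < 1e-9
```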
2) Dynamic Programming (DP): The problem in (6) is easily cast as an MDP due to the separable objective function and the Markov property of the transition functions. Given this, DP solves the MDP form of (6) by computing a value function V^π(s_k). This is the expected future cost of following a policy, π, starting in state s_k, and is given by:

    V^π(s_k) = Σ_{s' ∈ S} P(s' | s_k, π(s_k), ω_k) [ C(s_k, π(s_k), s') + V^π(s') ].   (9)

An optimal policy, π*, is one that minimises (6), and which also satisfies Bellman's optimality condition:

    V_k^{π*}(s_k) = min_{π} { C_k(s_k, π(s_k)) + E[ V_{k+1}^{π}(s') | s_k ] }.   (10)

The expression in (10) is typically computed using backward induction, a procedure called value iteration, and then an optimal policy is extracted from the value function by selecting a minimum-value action for each state. This is the key functional point of difference between DP and stochastic MILP. DP enables us to plan offline by generating value functions for every time-step. Once we have the value functions, we can make faster online decisions using (10) (more details are towards the end of this section). Note that a value function at a given time-step consists of the expected future cost from all the states. This process of mapping states and actions is not possible with stochastic MILP.

Fig. 4. A deterministic DP example using a battery storage, where the expected future cost is calculated using (10). The instantaneous contributions from the battery decisions are on the edges of the lines, while the expected future cost is below the states. The optimal policy satisfies (6) and is obtained using (10).

An illustration of a deterministic DP using a simplified model of a battery storage is shown in Fig. 4. At every time-step, there are three battery SOC states (i.e., highest, middle and lowest) and three possible battery actions that result in different instantaneous costs. At the last time-step, k = K, the expected future cost from the desired state, s_K = M, is zero, while the other two states are penalised with a large cost. This is an important step that allows us to control the end-of-day battery SOC (discussed in Section V). The expected future cost at every possible state is calculated using (10), which is the minimum of the combined instantaneous cost that results from the decision that we take and the expected future cost from the state we end up in at the next time-step. In Fig. 4, the instantaneous cost is on the edges of the lines, while the expected future cost is below the states. An optimal policy is extracted from the value functions by selecting a minimum-value action for each state using (10). For example, from s_1^b, if we take the optimal decision to go to s_2^b = L, then the total combined cost of 10 consists of an instantaneous cost of 2 and an expected future cost of 8. Even though the expected future cost of 7 from s_2^b = M is lower than the expected future cost from s_2^b = L, the instantaneous cost that takes us there is 4, so the total combined cost is 11. Given this, the expected future cost of following the optimal policy from s_1^b is 10, and at time-step 2 we will be at s_2^b = L.
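This kind of backward induction takes only a few lines of code. The Python sketch below mirrors the structure of the Fig. 4 example (three SOC states and a penalised terminal condition), but the move costs in the table are invented for illustration, so its numbers do not reproduce the figure.

```python
# Backward induction (10) on a toy deterministic battery model in the
# spirit of Fig. 4: three SOC states (L, M, H) and a cost per move.
# The cost table is invented; only the mechanics match the paper.

STATES = ("L", "M", "H")
K = 48  # half-hour steps in a daily horizon

MOVE_COST = {("L", "L"): 2, ("L", "M"): 4, ("L", "H"): 6,
             ("M", "L"): 1, ("M", "M"): 2, ("M", "H"): 4,
             ("H", "L"): 0, ("H", "M"): 1, ("H", "H"): 2}

# Terminal condition: the desired end-of-day state M is free, the others
# are heavily penalised, which pins the end-of-day SOC as described above.
V = {s: (0.0 if s == "M" else 1e9) for s in STATES}
policy = {}
for k in range(K - 1, -1, -1):      # step backward in time
    V_new = {}
    for s in STATES:
        best = min(STATES, key=lambda s2: MOVE_COST[(s, s2)] + V[s2])
        policy[(k, s)] = best
        V_new[s] = MOVE_COST[(s, best)] + V[best]
    V = V_new

print(V["M"], policy[(0, "M")])     # expected future cost from M at k = 0
```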
There are several reasons to prefer DP over MILP. First, DP produces close-to-optimal solutions when the value functions obtained during the offline planning phase are from finely discretised state, action and outcome spaces. Second, in practical applications, the SHEMS can make real-time decisions using the policy implied by (10). This means that at each time-step, the optimal decision from the current state can be executed. Note that (10) is a simple linear program at each time-step, so it is computationally feasible using existing smart meters. This stands in contrast to a stochastic MILP formulation, which would involve solving the entire stochastic MILP program, which is computationally difficult even for the offline planning. Third, we can always obtain a solution with DP regardless of the constraints and the inputs, while MILP fails to find a solution when the constraints are not satisfied. From our experience, MILP fails to find a solution when the end-of-day TES SOC is fixed, because the energy flowing out of the TES unit, which is the thermal demand of the household, cannot be controlled. We can overcome this by either removing the end-of-day TES constraint or by specifying a range of values. However, this means that we end up with a sub-optimal level of TES SOC at the end of the day, or require user interaction to adjust the TES SOC, which we should avoid in practical applications.

However, the required computation to generate value functions using DP grows exponentially with the size of the state, action and outcome spaces. One way of overcoming this problem is to approximate the value function, while maintaining the benefits of DP.

III. APPROXIMATE DYNAMIC PROGRAMMING (ADP)

ADP, also known as forward DP, is an algorithmic strategy for approximating a value function, which steps forward in time, compared to the backward induction used in value iteration. Policies in ADP are extracted from these VFAs [29]. Similar to DP, ADP operates on an MDP formulation of the problem, so all the non-linear constraints and transition functions in Section II-B can be incorporated with the same computational burden as modelling linear transition functions and constraints. ADP is an anytime optimisation solution technique,2 so we always obtain a solution regardless of the constraints and the inputs. In this instance, the problem is formulated as a maximisation problem for convenience.

2 An anytime algorithm is an algorithm that returns a feasible solution even if it is interrupted prematurely. The quality of the solution, however, improves if the algorithm is allowed to run until the desired convergence.

A. Policy-Based Value Function Approximation

VFAs are obtained iteratively, and here the focus is on approximating the value function around a post-decision state vector, s_k^x, which is the state of the system at discrete time k, soon after making the decisions but before the realisation of any random variables [29]. This is because approximating the expectation within the max or min operator in (10) is difficult in large practical applications, as transition probabilities from all the possible states are required. Pseudo-code of the method used to approximate the value function is given in Algorithm 1, which is a double-pass algorithm referred to as temporal difference learning with a discount factor λ = 1, or TD(1).

Algorithm 1: ADP Using Temporal Difference Learning, TD(1)
1:  Initialise V̄_k^0, k ∈ K.
2:  Set r = 1 and k = 1.
3:  Set s_1.
4:  while r ≤ R do
5:    Choose a sample path ω^r.
6:    for k = 0, ..., K do
7:      Solve the deterministic problem (17).
8:      for i = 1, ..., I do
9:        Find the right and left marginal contributions (18).
10:     end for
11:     if k < K then
12:       Find the post-decision states (11) and the next pre-decision states (12).
13:     end if
14:   end for
15:   for k = K, ..., 0 do
16:     Calculate the marginal values (19).
17:     Update the estimates of the marginal values (20).
18:     Update the VFAs using the CAVE algorithm.
19:     Combine the value functions of each controllable device (16).
20:   end for
21:   r = r + 1.
22: end while
23: Return the value function approximations V̄_k^R for all k.

Given this, the original transition function s_{k+1} = s^M(s_k, x_k, ω_k) is divided into the post-decision state:

    s_k^x = s^{M,x}(s_k, x_k),   (11)

and the next pre-decision state:

    s_{k+1} = s^{M,ω}(s_k^x, ω_k),   (12)

which are used in line 12 of Algorithm 1.

Fig. 5. Illustration of the modified Markov decision process, which separates the state variables into post-decision states and pre-decision states.

An example of the new modified MDP is illustrated in Fig. 5, which uses the mean and the variation of the stochastic variables to obtain the post-decision and next pre-decision states, respectively. In more detail, at s_1 there are three possible decisions that take us to three post-decision states, which correspond to the highest, middle and lowest states. However, the next pre-decision state s_2 depends on the random variables ω_1. Given this, the new form of the value function is written as:

    V̄_k^π(s_k) = max_{x_k} { C_k(s_k, x_k) + V̄_k^{π,x}(s_k^x) },   (13)

where V̄_k^{π,x}(s_k^x) is the VFA around the post-decision state s_k^x, given by:

    V̄_k^{π,x}(s_k^x) = E[ V_{k+1}^π(s_{k+1}) | s_k^x ].   (14)

This method is computationally feasible because E[V_{k+1}^π(s_{k+1}) | s_k^x] is a function of the post-decision state s_k^x, which is a deterministic function of x_k.
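A minimal Python sketch of the split (11)-(12) clarifies why this works: the controllable state is advanced deterministically by the decision, and only afterwards is the realised variation folded in. The field names below are ours, chosen for illustration.

```python
# Sketch of the post-decision/pre-decision split (11)-(12). The battery
# state responds deterministically to the decision; the stochastic
# demand state carries its mean until the variation omega is realised.

def post_decision_state(soc, x_b, mean_demand):
    """(11): the state immediately after the decision, before any noise."""
    return {"soc": soc - x_b, "demand": mean_demand}

def next_pre_decision_state(s_x, omega_de):
    """(12): fold the realised demand variation into the state."""
    return {"soc": s_x["soc"], "demand": s_x["demand"] + omega_de}

s_x = post_decision_state(soc=6.0, x_b=1.0, mean_demand=1.2)  # discharge 1 kWh
s_next = next_pre_decision_state(s_x, omega_de=0.15)
print(s_x, s_next)
```

Because the expectation in (14) is taken conditional on s_k^x, the VFA can be stored as a function of the post-decision state alone, which is what makes the forward pass in Algorithm 1 a sequence of deterministic problems.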
However, in order to solve (13), we still need to calculate the value functions in (14) for every possible state s_k^x for all k. This can be computationally difficult, since s_k^x is continuous and multidimensional, so we approximate (14). Two strategies are employed.

First, we construct lookup tables for the VFAs in (14) that are concave and piecewise linear in the resource dimension of all the state variables of the controllable devices [25]. For example, in the VFA for k = 49, which is depicted in Fig. 6(a), the expected future rewards stay the same after approximately 7 kWh, so if we are at 7 kWh at time-step k = 48, charging the battery further will have no future rewards and will only incur an instantaneous cost if the electricity has to come from the grid. However, if the electricity price or demand is high, then we can discharge the battery, as the expected future rewards will only decrease slightly. Given this, we never charge the storage when there is no marginal value, so the slopes of the VFA are always greater than or equal to zero. Accordingly, the VFA is given by:

    V̄_k^{i,x}(s_k^{i,x}) = Σ_{a=1}^{A_k} v̄_{ka} z_{ka},   (15)

where Σ_a z_{ka} = s_k^{i,x} and 0 ≤ z_{ka} ≤ z̄_{ka} for all a. Here, z_{ka} is the resource coordinate variable for segment a ∈ (1 ... A_k), A_k ∈ A, z̄_{ka} is the capacity of the segment and v̄_{ka} is its slope. Other strategies that could be used for this step are parametric and non-parametric approximations of the value functions [29].

Fig. 6. (a) Expected future reward or VFA (14) for following the optimal policy vs. the state of the battery for time-steps k = 49 and k = 60, and (b) the value of the objective function (i.e., reward) vs. iterations for the ADP approach, together with the expected value from DP.
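Evaluating the concave piecewise-linear VFA (15) amounts to filling the segments from left to right, since concavity guarantees non-increasing slopes. A minimal Python sketch, with segment slopes and capacities invented to echo the flattening behaviour described for Fig. 6(a):

```python
# Evaluate a concave piecewise-linear VFA (15): the resource level s
# fills segments in order; slopes are non-increasing by concavity.

def vfa_value(s, slopes, capacities):
    """Sum of v_bar_a * z_a with segments filled left to right up to s."""
    value, remaining = 0.0, s
    for v_bar, z_bar in zip(slopes, capacities):
        z = min(remaining, z_bar)
        value += v_bar * z
        remaining -= z
        if remaining <= 0.0:
            break
    return value

# Invented segments: marginal value flattens to zero near full charge.
slopes = [0.40, 0.30, 0.15, 0.05, 0.00]
capacities = [2.0, 2.0, 1.5, 1.5, 3.0]     # kWh per segment
print(vfa_value(6.0, slopes, capacities))  # expected future reward at 6 kWh
```

In the forward pass (17), this is the quantity that the one-step deterministic problem maximises together with the instantaneous contribution.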
Second, we handle the multidimensional state space by generating independent VFAs for each controllable device, which are then combined to obtain the optimal policy. The separable VFA is given by:

    V̄_k(s_k^x) = Σ_{i=1}^{I} V̄_k^i(s_k^{i,x}).   (16)

It is possible to separate the VFAs for the battery and the TES because their state transitions are independent, as shown in (4) and (5), respectively. Instead, the inter-device coupling between the battery and the TES arises only through their effect on the energy costs, so it is captured in the contribution function (7). If the state transition functions of the controllable devices depend on each other, then the VFAs are not separable and we have to use multidimensional value functions. In such situations, the number of iterations needed for VFA convergence will increase, and concavity needs to be generalised as well.

In more detail, Algorithm 1 proceeds as follows:
1) Set the initial VFAs to zero (i.e., all the slopes to zero) or to an initial guess to speed up the convergence (lines 1-3). Estimates for the initial VFAs can be obtained by solving the deterministic problem using MILP. The value of the initial starting state s_1^i is assumed.
2) For each realisation of the random variables, step forward in time by solving the following deterministic problem (line 7) using the VFA from the previous iteration:

    x_k^r = arg max_{x_k ∈ X_k} { C(s_k^r, x_k) + V̄_k^{r−1}(s_k^{x,r}) }
          = arg max_{x_k ∈ X_k} { C(s_k^r, x_k) + Σ_{a=1}^{A_k^{r−1}} v̄_{ka}^{r−1} z_{ka} }.   (17)

3) Determine the positive and negative marginal contributions, ĉ_k^{r,i+}(s_k^{r,i}) and ĉ_k^{r,i−}(s_k^{r,i}), respectively (line 9), for each controllable device, using:

    ĉ_k^{r,i+}(s_k^{r,i}) = [ c_k^{r,i+}(s_k^{r,i+}, x_k^{r,i+}) − c_k^{r,i}(s_k^{r,i}, x_k^{r,i}) ] / δs,
    ĉ_k^{r,i−}(s_k^{r,i}) = [ c_k^{r,i}(s_k^{r,i}, x_k^{r,i}) − c_k^{r,i−}(s_k^{r,i−}, x_k^{r,i−}) ] / δs,   (18)

where s_k^{r,i+} = s_k^{r,i} + δs, x_k^{r,i+} = X^π(s_k^{r,i+}), and δs is the mesh size of the state space. We do this similarly for s_k^{r,i−} and x_k^{r,i−}.
4) Find the post-decision and the next pre-decision states using (11) and (12), respectively. The transition functions of the controllable devices can be non-linear (line 12).
5) Starting from K, step backward in time to compute the slopes, v̂_k^{r,i+}, which are then used to update the VFA (line 16). Compute v̂_k^{r,i+} as:

    v̂_k^{r,i+}(s_k^{r,i}) = ĉ_K^{r,i+}(s_K^{r,i})  if k = K, and
    v̂_k^{r,i+}(s_k^{r,i}) = ĉ_k^{r,i+}(s_k^{r,i}) + Δ_k^{r,i+} v̂_{k+1}^{r,i+}(s_{k+1}^{r,i})  otherwise,   (19)

where Δ_k^{r,i+} = (1/δs) s^M(x_k^{r,i} − x_k^{r,i+}) is the marginal flow (i.e., whether or not there is a change in the energy in the storage as a result of the perturbation). We do this similarly for v̂_k^{r,i−}(s_k^{r,i}). Note that we take the power coming out of the storage as negative.
6) Update the estimates of the marginal values [27]:

    v̄_{k−1}^{r,i+}(s_{k−1}^{x,r,i}) = (1 − α^{r−1}) v̄_{k−1}^{r−1,i+}(s_{k−1}^{x,r,i}) + α^{r−1} v̂_k^{r,i+},   (20)
where α is a "stepsize", α ∈ (0, 1], and similarly for v̄_{k−1}^{r,i−}(s_{k−1}^{x,r,i}) (line 17). In this research, a harmonic stepsize formula is used, α = b/(b + r), where b is a constant. This stepsize formula satisfies the conditions ensuring that the values will converge as r → ∞ [36].
7) Use the concave adaptive value estimation (CAVE) algorithm to update the VFAs [37] (line 18).
8) Combine the value functions of each device using (16).
9) Repeat this procedure over R iterations, which are generated randomly according to the probability distributions of the random variables. We find that R = 1000 realisations is enough for the objective function to come within an acceptable accuracy even for the worst possible scenario. We investigated a range of scenarios, and an example is given in Fig. 6(b).
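Steps 5)-7) amount to exponentially smoothing the sampled marginal values into the segment slopes. The following Python sketch shows the update (20) with the harmonic stepsize; the constant b = 25 and the constant sampled marginal value are our illustrative choices, not values from the paper.

```python
# Smoothing update (20) for one VFA segment slope, using the harmonic
# stepsize alpha = b / (b + r). Both b and the sampled marginal value
# below are illustrative.

def stepsize(r, b=25.0):
    return b / (b + r)

def update_slope(v_bar_old, v_hat, r):
    alpha = stepsize(r)
    return (1.0 - alpha) * v_bar_old + alpha * v_hat

slope = 0.0                      # initial VFA slope (line 1, all zeros)
for r in range(1, 1001):         # R = 1000 iterations, as in the paper
    v_hat = 0.30                 # stand-in for the sampled marginal value
    slope = update_slope(slope, v_hat, r)
print(round(slope, 3))           # approaches 0.30 as r grows
```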
Note that when solving a deterministic SHEMS problem using ADP, the post-decision and next pre-decision states are the same, because the random variables will be zero. However, the remaining steps in Algorithm 1 stay the same. This means that, with ADP, there is no noticeable computational burden for considering variation in the stochastic variables compared to solving the deterministic problem. On the other hand, with DP we have to loop over all the possible combinations of realisations of the random variables, which significantly increases the computational burden.

Now we present the probabilistic models of the stochastic variables.

IV. ESTIMATING STOCHASTIC VARIABLE MODELS

In order to optimise performance, it is important for a SHEMS to incorporate variations in the PV output and the electrical and thermal demand, and to do so over a horizon of several days. The benefits of using a stochastic optimisation over a deterministic optimisation are discussed in Section V-D and in [13], [19], [20], [22], and [23]. Given this, SHEMSs require the mean PV output and the demand, with their appropriate probability distributions, before the start of the decision horizon. The effects of these random variations on the SHEMS problem are discussed below:
1) PV output depends on solar insolation, a forecast of which can be obtained before the horizon starts with reasonable accuracy from weather forecasting services. PV output is important to the SHEMS problem as it is a key source of energy and is expected to be closely coupled with the battery storage profile. Failing to accommodate variation in PV generation would be expected to increase costs to the household, as more power is imported from the grid.
2) Electrical demand of the household depends on the number of occupants and their behavioural patterns, which is difficult to predict in the real world. In the context of a SHEMS, electrical demand should be supplied from the DG units, the storage units and the electrical grid. Failure to accommodate variations in electrical demand may result in additional costs to the household.
3) Thermal demand is also difficult to predict in the real world, so failure to accommodate variations in thermal demand may result in user discomfort.

In this paper, the mean PV output and electrical demand are from a data set collected during the Smart Grid Smart City project. The data set [34] consists of PV output and electrical demand measurements at 30-minute intervals over 3 years for 50 households. In real-world applications, the SHEMSs estimate the mean PV output using the weather forecast [38] and the mean electrical demand using a suitable demand prediction algorithm [39], [40].

Fig. 7. Probability density functions of the (a) PV output over different times of a sunny day (January 1st, 2013), and (b) electrical demand over different times of a high demand day (July 2nd, 2012).

Commonly, the probability distributions associated with these stochastic variables are modelled as Gaussian or skew-Laplace distributions. However, in this paper, we kernel estimate the probability distributions of the PV output and electrical demand using a hierarchical approach, where we first cluster the total daily empirical data, and then kernel estimate the probability distributions within each cluster. In more detail, we obtain the probability distributions of the PV output, which depend on the time and the type of day (sunny, normal or cloudy), in two steps (a minimal sketch follows the list):
1) First, we cluster the daily empirical data using a k-means algorithm to obtain 3 clusters with different total daily PV generations, corresponding to sunny, normal and cloudy days.
2) Second, for each time-step in the corresponding clusters, we estimate a probability distribution of the PV output using an Epanechnikov kernel estimating technique.
Note that the draws from the kernel estimates within a day-type are independent. We obtain the probability distributions of the electrical demand in a similar way, except that the clustering is done according to days with high, normal or low demand levels.
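The following NumPy sketch shows the two-step estimation on synthetic data. The data array stands in for the Smart Grid Smart City measurements, and the one-dimensional k-means, the cluster count and the kernel bandwidth are our illustrative choices.

```python
import numpy as np

# Two-step sketch: (1) cluster days by total PV generation with k-means
# (k = 3: cloudy/normal/sunny); (2) within one cluster, estimate an
# Epanechnikov kernel density for a single time-step. Synthetic data
# replaces the real measurements; the bandwidth h is illustrative.

rng = np.random.default_rng(0)
daily_pv = rng.gamma(shape=2.0, scale=0.3, size=(365, 48))  # kWh per step

def kmeans_1d(x, k=3, iters=50):
    centres = np.quantile(x, [0.2, 0.5, 0.8])               # ordered start
    for _ in range(iters):
        labels = np.argmin(np.abs(x[:, None] - centres[None, :]), axis=1)
        centres = np.array([x[labels == j].mean() for j in range(k)])
    return labels

labels = kmeans_1d(daily_pv.sum(axis=1))       # day type for each day

def epanechnikov_pdf(sample, grid, h=0.3):
    u = (grid[:, None] - sample[None, :]) / h
    kern = 0.75 * np.maximum(1.0 - u ** 2, 0.0)
    return kern.mean(axis=1) / h

sunny = daily_pv[labels == 2]                  # one of the three clusters
grid = np.linspace(0.0, sunny[:, 24].max(), 200)
pdf = epanechnikov_pdf(sunny[:, 24], grid)     # midday PV distribution
```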
The probability distributions of the PV output and electrical demand follow skewed unimodal distributions, as depicted in Fig. 7. It is worthwhile to note that, before the start of the decision horizon, the SHEMS uses the predicted mean PV output and electrical demand to determine the type of day, and hence the corresponding probability distribution. Note that the prediction is accurate enough to decide the type of day, but not accurate enough to use a deterministic optimisation. Given this, we investigate the effects on the total electricity costs from deterministic and stochastic SHEMSs, using a range of possible PV output and demand profiles, in Section V-D.

Finally, we construct the magnitudes of the thermal demand and the times at which they occur making use of the Australian Standard AS4552 [41] and the hot water plug readings in [35]. We assume a Gaussian distribution for the thermal demand because there is not enough empirical data to obtain a reasonable distribution.

Now we show the performance of the presented SHEMSs using real data.

V. SIMULATION RESULTS AND DISCUSSION

TABLE I. Daily Optimisation Results for the Three Scenarios.

There are three sets of simulations. The first set consists of discussions about: the challenges of estimating the end-of-day SOC; the benefits of PV-storage systems; the quality of the solutions from ADP, DP and stochastic MILP; and the benefits of a stochastic optimisation over a deterministic one (Sections V-A–V-D). The second set has discussions on: computational aspects; and the effects of extending the decision horizon (Sections V-E and V-F). The third set is about the year-long optimisation (Section V-G). The time-of-use electricity tariff consists of off-peak, shoulder and peak periods at $0.11, $0.20 and $0.47 per kWh, respectively. On weekdays, the off-peak period is between 12 am - 7 am and 10 pm - 12 am, and the peak is between 2 pm - 8 pm. On weekends the peak period is replaced by the shoulder. Stochastic MILP uses 6000 scenarios. MATLAB is used to implement all the SHEMSs.
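This tariff can be written directly as a function of the half-hour time-step; a small Python sketch (weekday/weekend logic only, with steps indexed from midnight):

```python
# Time-of-use tariff from Section V for half-hour steps k = 0..47:
# off-peak $0.11, shoulder $0.20, peak $0.47 per kWh.

def tou_price(k, weekend=False):
    hour = k * 0.5
    if hour < 7 or hour >= 22:            # 12 am-7 am and 10 pm-12 am
        return 0.11
    if not weekend and 14 <= hour < 20:   # 2 pm-8 pm weekday peak
        return 0.47
    return 0.20                           # shoulder; replaces peak on weekends

assert tou_price(30) == 0.47              # 3 pm on a weekday
assert tou_price(30, weekend=True) == 0.20
```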
A. Challenges of Estimating the End-of-Day SOC

Fig. 8. PV output and electrical and thermal demand for (a) Scenario 1, (b) Scenario 2 and (c) Scenario 3.

In the first set of simulations, we discuss the challenges of estimating the end-of-day battery SOC (Section V-A), the benefits of residential PV-storage systems (Section V-B), the performance of the three SHEMSs over a day (Section V-C) and the benefits of using a stochastic optimisation over a deterministic one (Section V-D), using three scenarios for a Central Coast, NSW, Australia, based residential building, shown in Fig. 8: Scenarios 1, 2 and 3 are on August 20th, 2012, July 2nd, 2012, and January 1st, 2013, respectively. The PV system size is 2.2 kWp.

From our preliminary investigations, and as depicted in Fig. 6(a), the expected future rewards at the start of day two (time-step k = 49) only increase slightly after 6 kWh, which suggests that using half of the available battery capacity as the start-of-day and end-of-day battery SOC (s_1^b = s_K^b = 6 kWh) in daily optimisations with DP is a valid assumption. However, we observe that on days with a low morning demand, high PV output and medium-high evening demand (see Fig. 8(c)), s_1^b = s_K^b = 2 kWh gives the best results, because the battery can be used to supply the evening demand and there is no need to charge it back. However, the next day's electricity cost can significantly increase if we are anticipating a high morning demand and low-high PV output (see Fig. 8(a-b)). Because of such situations, it is beneficial to control the end-of-day battery SOC by considering uncertainties over several days, which is our special focus of attention in Sections V-E and V-F.
B. Benefits of PV-Storage Systems

Residential PV-storage systems using a DP-based SHEMS result in significant financial benefits, which is evident from the electricity cost of the three scenarios in three instances (i.e., the benchmark cost with neither PV nor storage, the cost with PV but no storage, and the PV-storage system with SHEMSs) in Table I. The daily electricity cost can be reduced by 6.2%, 6.1% and 22.6% for Scenarios 1, 2 and 3, respectively, if there is only a PV system. We can further improve this by having a battery and effectively controlling its SOC using a SHEMS, in which the battery is charged to a certain level from solar power and the electrical grid before peak periods. A DP-based SHEMS constrained to a 60% start-of-day and end-of-day battery SOC reduces the total electricity cost by a further 42.25%, 17.33% and 36.72% for Scenarios 1, 2 and 3, respectively. That is a total cost reduction of 48.45%, 23.43% and 59.32% for Scenarios 1, 2 and 3, respectively, by controlling both the battery and the TES. As shown in Fig. 8, the inhabitants are away during the day for Scenarios 1 and 3, so the extra PV generation is stored in the battery, which shows the benefits of having storage. In contrast to Scenarios 1 and 3, in Scenario 2 the electrical demand exceeds the PV generation, so the benefit of the battery is minimal. The electricity costs of controlling the TES in Scenarios 1 and 3 are the lowest, because the surplus of solar and battery power is used to charge the TES instead of sending it back to the grid, as FiTs are negligible in Australia. Note that we obtain the benchmark cost of the TES by assuming a dummy control system, which operates regardless of the electricity price. The dummy TES control electricity cost varies depending on the time the hot water is used.

C. Quality of the Solutions From ADP, DP and Stochastic MILP

ADP and DP result in better quality solutions than stochastic MILP, as they both incorporate the stochastic input variables using appropriate probabilistic models and non-linear constraints [24]. However, ADP results in slightly lower quality schedules compared to the optimal DP solutions, because the value functions used are approximations. This is evident in Table I. Note that the DP-based SHEMS with two controllable devices (battery and thermal) is computationally intractable for use in an existing smart meter, and we only use it to compare solutions with ADP.

D. Benefits of a Stochastic Optimisation Over a Deterministic Optimisation

Fig. 9. The total electricity cost of the PV-battery systems of Scenarios 1-3 from deterministic and stochastic ADP for a range of possible PV output and demand profiles, which are generated by adding Gaussian noise with varying standard deviation and zero mean to the actual PV output and electrical demand.

A stochastic optimisation always performs at least as well as a deterministic one, and we investigate this using a range of possible PV output and demand profiles, as shown in Fig. 9. Fig. 9 is obtained as follows: first, we obtain VFAs from both the stochastic and deterministic optimisations using ADP. The stochastic optimisation uses the kernel-estimated probability distributions of the PV output and electrical demand, while the deterministic ADP only uses the predicted mean PV output and electrical demand (i.e., all the random variables are zero). Second, we obtain the total electricity cost for different possible PV output and demand profiles, which are generated by adding Gaussian noise with varying standard deviation and zero mean to the actual PV output and electrical demand. The mean absolute errors between the actual (i.e., zero Gaussian noise) and the predicted mean electrical demand are 0.238%, 0.696% and 0.117% for Scenarios 1, 2 and 3, respectively, which are the initial points. Note that the forecast errors associated with residential electrical demand predictions are typically very high, and our aim here is not to minimise the forecast errors but to find a suitable stochastic optimisation technique that performs well under uncertainty.

ADP enables us to incorporate the stochastic input variables without a noticeable increase in the computational effort over a deterministic ADP. Moreover, stochastic ADP requires less computational effort than deterministic DP. An ADP-based stochastic optimisation for a PV-battery system can reduce the total electricity cost by 13.62%, 0.16% and 94.67% for Scenarios 1, 2 and 3, respectively, in instances without Gaussian noise. The benefits for Scenarios 1 and 3 are noticeable, and their forecast errors are what we can expect from electrical demand prediction algorithms [39], [40]. The benefits from the stochastic optimisation are minimal when the forecast errors are very high (Scenario 2) or very low, both of which are highly unlikely. In Scenario 2, the stochastic optimisation gives slightly lower quality results beyond a 0.2 kW standard deviation of Gaussian noise, because both the forecast and kernel estimation errors are high. However, these scenarios are highly unlikely, and the resulting cost is negligible compared to the benefits from a stochastic optimisation. Moreover, the initial point of Scenario 2 already has a mean absolute error of 0.696, which is highly unlikely to increase any further in a practical scenario. In summary, even though the benefits vary with the scenario and the forecast error, a stochastic optimisation performs better than or the same as a deterministic one, and ADP provides these benefits without a noticeable increase in the computational effort.

E. Computational Aspects

In our second set of simulation results, we discuss the computational performance of the three solution techniques (Section V-E) and the benefits of extending the decision horizon (Section V-F) for households with PV-battery systems.
Fig. 10. Effects of extending the decision horizon for PV-battery systems: (a) the computational time of ADP, DP and stochastic MILP against the length of the decision horizon, and (b) the normalised electricity cost, with error bars, against the length of the decision horizon.

ADP computes a solution much faster than both DP and stochastic MILP and, most importantly, ADP with a two-day decision horizon computes a solution with less than half of the computational time of a daily DP, as depicted in Fig. 10(a). The computational time of both the ADP and DP based SHEMSs increases linearly as we increase the decision horizon; however, ADP has a smaller slope. This linear increase with DP is because the state transitions in this problem are only between two adjacent time-steps, so the time does not increase exponentially. The computational time of ADP with a two-day decision horizon will only increase by 4 minutes when the TES is added, while a finely discretised DP-based SHEMS takes approximately 2.5 hours.

F. Effects of Extending the Decision Horizon

Extending the decision horizon beyond one day to consider inter-daily variations in PV output and electrical demand results in significant benefits, as depicted in Fig. 10(b), which shows the normalised electricity cost vs. the length of the decision horizon. We obtain our results using a DP-based SHEMS for 10 households over 2 months.3 Our results show that increasing the decision horizon beyond two days has no significant benefits. However, increasing the decision horizon up to one week is beneficial in some situations, e.g., off-the-grid systems and cases with high variations in PV output and demand. The benefits of the two-day decision horizon vary depending on the household, which we discuss in the next section.

3 Here we use DP as we want to obtain the exact solution. In a practical implementation, extending the decision horizon with DP is difficult as the computational power is limited in existing smart meters.

G. Year-Long Optimisation

TABLE II. Yearly Optimisation Results for Two Households Over 3 Years.

Fig. 11. Electricity cost savings over 3 years for 10 households, where the blue lines indicate the 25th and 75th percentiles and the red lines indicate the median.

In our third set of simulation results, we compare the ADP-based SHEMS with a two-day decision horizon to a DP approach with a daily decision horizon over three years for 10 households with PV-battery systems. We omit the TES as we have already identified its benefits in Section V-B, and moreover, a yearly optimisation using DP with two controllable devices is computationally difficult. The time periods of the three years are: year 1 from July 1st, 2012 to June 30th, 2013; year 2 from July 1st, 2011 to June 30th, 2012; and year 3 from July 1st, 2010 to June 30th, 2011. The electricity cost savings for all the households over the three years are given in Fig. 11, and in Table II we present detailed results for two households in Central Coast (Household 1) and Sydney (Household 2), in NSW, Australia. The PV sizes of Households 1 and 2 are 2.2 kWp and 3.78 kWp, respectively.

The proposed ADP-based SHEMS implemented using a two-day decision horizon that considers variations in PV output and electrical demand reduces the average yearly electricity cost by 4.63% compared to a daily DP-based SHEMS, as depicted in Fig. 11. We also find that the average yearly savings are 5.12%, 3.89% and 4.95% for years 1-3, respectively. This is because 2013 was a sunny year compared to 2012 and 2011, so the two-day optimisation has greater benefits. For example, if we are anticipating a sunny weekend with low demand, then we can completely discharge the battery on Friday night. However, if we have a half-charged battery on Friday night, then we will be wasting most of the free energy, as the storage capacity is limited.
3302 IEEE TRANSACTIONS ON SMART GRID, VOL. 9, NO. 4, JULY 2018

that a daily DP results in a significant cost savings of 12.04% [2] A. C. Chapman, G. Verbič, and D. J. Hill, “Algorithmic and strategic
($107.35) and 1.91% ($39.35) for Households 1 and 2, respec- aspects to integrating demand-side aggregation and energy manage-
ment methods,” IEEE Trans. Smart Grid, vol. 7, no. 6, pp. 2748–2760,
tively, compared to a daily stochastic MILP approach. The Nov. 2016.
difference in the savings is because of the following reasons. [3] Time-of-Use Pricing, Ausgrid, Sydney, NSW, Australia. [Online].
In scenarios with high demand (i.e., Household 2), most of the Available: http://www.ausgrid.com.au/Common/Customer-Services/
Homes/Meters/Time-of-use-pricing.aspx#.WDO97lwpHIU
time the battery will discharge its maximum possible power [4] Flexible Pricing FAQs, Energy Australia, Melbourne, VIC,
to the household during peak periods, so the battery and the Australia. [Online]. Available: https://www.energyaustralia.com.au/
inverter will operate in the maximum efficiencies even though faqs/residential/flexiblepricing-plan-information
MILP solver does not consider the non-linear constraints. The [5] S. Lu et al., “Centralized and decentralized control for demand
response,” in Proc. IEEE PES Innov. Smart Grid Technol. (ISGT),
converse is happening for scenarios with low demand (i.e., Anaheim, CA, USA, 2011, pp. 1–8.
Household 1). [6] M. Pipattanasomporn, M. Kuzlu, and S. Rahman, “An algorithm for
For demonstration purposes, Table II shows optimisation results for two households over three years. With the proposed ADP-based SHEMS, a residential PV-battery system reduces the total electricity cost over the three years by 57.27% and 50.95% for Households 1 and 2, respectively, compared to the 22.80% and 28.60% improvements achieved with a PV system alone. It is important to note that a DP-based SHEMS with a two-day decision horizon may produce a slightly better solution. However, it is computationally demanding, and the available computational power will remain limited, because it is not worthwhile investing in specialised equipment to solve this problem given the savings on offer.
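To be explicit about how these figures can be read, and assuming the percentages in Table II are computed against the electricity cost of the same household with no PV or storage, the reported savings follow from
\[
\text{savings}\,[\%] = \frac{C^{\text{none}} - C^{\text{system}}}{C^{\text{none}}} \times 100,
\]
so that, for Household 1, the PV-battery system with the ADP-based SHEMS leaves a residual cost of $(1 - 0.5727)\,C^{\text{none}} \approx 0.43\,C^{\text{none}}$ over the three years.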
[12] “The economics of grid defection when and where distributed solar gen-
eration plus storage competes with traditional utility service,” Rocky
VI. C ONCLUSION Mountain Inst., Boulder, CO, USA, Tech. Rep., 2014.
[13] Z. Chen, L. Wu, and Y. Fu, “Real-time price-based demand response
This paper shows the benefits having a smart home energy management for residential appliances via stochastic optimization
management system and presented an approximate dynamic and robust optimization,” IEEE Trans. Smart Grid, vol. 3, no. 4,
pp. 1822–1831, Dec. 2012.
programming approach for implementing a computationally
[14] C. Keerthisinghe, G. Verbič, and A. C. Chapman, “Evaluation of a
efficient smart home energy management system with sim- multi-stage stochastic optimisation framework for energy management
ilar quality schedules as with dynamic programming. This of residential PV-storage systems,” in Proc. Aust. Univ. Power Eng.
approach enables us to extend the decision horizon to up Conf. (AUPEC), Perth, WA, Australia, Sep./Oct. 2014, pp. 1–6.
[15] M. C. Bozchalui, S. A. Hashmi, H. Hassen, C. A. Canizares, and
to a week with high resolution while considering multiple K. Bhattacharya, “Optimal operation of residential energy hubs in smart
devices. Our results indicate that these improvements pro- grids,” IEEE Trans. Smart Grid, vol. 3, no. 4, pp. 1755–1766, Dec. 2012.
vide financial benefits to households employing them in a [16] F. De Angelis et al., “Optimal home energy management under dynamic
electrical and thermal constraints,” IEEE Trans. Ind. Informat., vol. 9,
smart home energy management system. Moreover, stochas- no. 3, pp. 1518–1527, Aug. 2013.
tic approximate dynamic programming always performs better [17] J. Wang, Z. Sun, Y. Zhou, and J. Dai, “Optimal dispatching model of
over deterministic approximate dynamic programming under smart home energy management system,” in Proc. IEEE Innov. Smart
Grid Technol. Asia (ISGT Asia), Washington, DC, USA, 2012, pp. 1–5.
uncertainty without a noticeable increase in the computational
[18] K. C. Sou, J. Weimer, H. Sandberg, and K. H. Johansson, “Scheduling
effort. smart home appliances using mixed integer linear programming,” in
In practical applications, value function approximations generated by approximate dynamic programming during an offline planning phase can be used to compute online decisions quickly. This is not possible with stochastic mixed-integer linear programming, and generating value functions with dynamic programming is computationally difficult. In future work, we will learn the initial value function approximations for approximate dynamic programming from historical results, which will further speed up the convergence of the value function approximations.
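The following is a minimal sketch of this offline/online split, using the piecewise-linear value function approximation from the nomenclature (segment capacities z, slopes v̄, stepsize α). All names are ours and the simplifications are deliberate: a single battery, one value function per time-step, and no projection step to restore the shape of the approximation after an update.

```python
# Sketch (assumed names, single battery). Offline, ADP produces for each
# time-step a piecewise-linear VFA over the post-decision storage level,
# stored as segment capacities z[a] and slopes v[a]. Online, a decision is
# a cheap one-step lookahead against that precomputed VFA.

def vfa(z, v, s):
    """Evaluate the piecewise-linear VFA at storage level s [kWh]."""
    total, level = 0.0, 0.0
    for cap, slope in zip(z, v):
        take = min(cap, max(0.0, s - level))
        total += slope * take
        level += cap
    return total

def online_decision(actions, stage_cost, next_storage, z, v):
    """Greedy one-step lookahead: minimise the immediate cost plus the
    approximate expected future cost of the resulting storage level."""
    return min(actions, key=lambda x: stage_cost(x) + vfa(z, v, next_storage(x)))

def td_update(z, v, s, sampled_slope, alpha=0.1):
    """TD-style learning pass: smooth an observed marginal value at
    storage level s into the slope of the segment containing s."""
    level = 0.0
    for a, cap in enumerate(z):
        if s <= level + cap:
            v[a] = (1.0 - alpha) * v[a] + alpha * sampled_slope
            return
        level += cap
```

Only `vfa` and `online_decision` need to run on the smart meter; the `td_update` passes, seeded for instance from historical results, can run during the overnight planning phase, which is what keeps the online step cheap.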
Given the benefits outlined in this paper and the possible future work, we recommend the use of approximate dynamic programming in smart home energy management systems.
Chanaka Keerthisinghe (S'10) received the B.E. (Hons.) and M.E. (Hons.) degrees in electrical and electronic engineering from the University of Auckland, New Zealand, in 2011 and 2012, respectively. He is currently pursuing the Ph.D. degree with the Centre for Future Energy Networks, University of Sydney, Australia. His research interests include demand response, energy management in residential buildings, wireless charging of electric vehicles, and the application of approximate dynamic programming and optimization techniques in power systems.

Gregor Verbič received the B.Sc., M.Sc., and Ph.D. degrees in electrical engineering from the University of Ljubljana, Ljubljana, Slovenia, in 1995, 2000, and 2003, respectively. In 2005, he was a North Atlantic Treaty Organization Natural Sciences and Engineering Research Council of Canada Post-Doctoral Fellow with the University of Waterloo, Waterloo, ON, Canada. Since 2010, he has been with the School of Electrical and Information Engineering, University of Sydney, Sydney, NSW, Australia. His expertise is in power system operation, stability and control, and electricity markets. His current research interests include the integration of renewable energy into power systems and markets, optimization and control of distributed energy resources, demand response, and energy management in residential buildings. He was a recipient of the IEEE Power and Energy Society Prize Paper Award in 2006. He is an Associate Editor of the IEEE TRANSACTIONS ON SMART GRID.

Archie C. Chapman (M'14) received the B.A. degree in math and political science and the B.Econ. (Hons.) degree from the University of Queensland, Brisbane, QLD, Australia, in 2003 and 2004, respectively, and the Ph.D. degree in computer science from the University of Southampton, Southampton, U.K., in 2009. He is currently a Research Fellow in Smart Grids with the School of Electrical and Information Engineering, Centre for Future Energy Networks, University of Sydney, Sydney, NSW, Australia. He has expertise in game-theoretic and reinforcement learning techniques for optimization and control in large distributed systems. His research focuses on integrating renewables into legacy power networks using distributed energy and load scheduling methods, and on designing tariffs and market mechanisms that support efficient use of existing infrastructure and new controllable devices.