Energy Storage Management via Deep Q-Networks
Abstract—Energy storage devices represent environmentally friendly candidates to cope with volatile renewable energy generation. Motivated by the increase in privately owned storage systems, this paper studies the problem of real-time control of a storage unit co-located with a renewable energy generator and an inelastic load. Unlike many approaches in the literature, no distributional assumptions are made on the renewable energy generation or the real-time prices. Building on the deep Q-networks algorithm, a reinforcement learning approach utilizing a neural network is devised in which the storage unit's operational constraints are respected. The neural network approximates the action-value function, which dictates what action (charging, discharging, etc.) to take. Simulations indicate that near-optimal performance can be attained with the proposed learning-based control policy for the storage units.

Index Terms—Energy storage, energy management systems, reinforcement learning, deep neural networks, data-driven control.

I. INTRODUCTION

Renewable energy generation has expanded rapidly in order to enable the environmental, social, and economic benefits of future smart grids. Due to their intermittent nature, renewable energy sources increase the variability of the energy generation portfolio and introduce new challenges of real-time control. Energy storage systems provide an opportunity to increase grid efficiency and cope with fluctuating renewable energy generators [1]. The ability of batteries to respond quickly to changes in the system makes them ideal candidates for a wide range of power systems applications, including spatio-temporal energy arbitrage [2], peak shaving [3], frequency regulation [4], and congestion management [2], [5].

This paper focuses on optimal energy management for a single power consumer (or microgrid system) with an energy storage unit, a renewable energy source, and an inelastic load demand. Privately owned and operated storage units are being rapidly installed by both large-scale energy consumers [6] and individuals [7]. In addition, social benefits have been identified for energy consumer ownership of energy storage units [8].

The operation of energy storage units has recently received significant attention. One line of research concerning this problem assumes a deterministic time-varying environment [4], [9], [10]. Thus, the operational and planning decisions are made by solving an off-line optimization problem that takes into account operational losses and/or installation costs.

On the other hand, several prior works [11]–[14] have studied real-time or online energy management under stochastic assumptions regarding the environment. The authors of [11], [12] assumed full knowledge of idealized stationary stochastic processes for the demand and the renewable energy generation. However, even under stationary assumptions, such distributions are very hard for the consumer to obtain. In [13], a real-time management framework was proposed that combines an off-line optimal solution with a sliding-window optimization requiring estimates of the demand, renewable energy generation, and prices. Estimating these using only consumer-side data is one challenge; complexity is another. An analytic solution was sought in [14], where it was shown that the optimal solution can be characterized using two thresholds: the consumer charges the energy storage until it is fully charged when the energy price is below a certain threshold, and discharges it until it is empty if the price is above another threshold. Although the approach is simple and well-motivated, the two thresholds can be obtained only if the demand and energy prices are known in advance.

Reinforcement learning (RL) is an area of machine learning that deals with an agent learning how to behave in an environment by performing actions and observing the results. Classical RL, e.g., Q-learning [15], considers an environment with discrete states, which is not suitable for the power systems application, where the state of the system is mostly continuous. Combining advances in deep learning [16] with Q-learning resulted in deep Q-networks (DQN) [17], which achieved human-level control in several Atari video games. In a nutshell, deep neural networks were proposed for estimating the action-value function.

In this paper, we propose a learning-based framework for operating energy storage units building on the DQN algorithm, which matches the continuous nature of the energy management system. The proposed approach learns a control policy through interactions with the environment, with no distributional knowledge of the load, the renewable energy generation, or the power prices. A neural network is fitted using historical observations to approximate the action-value function. Therefore, the consumer can implement the action with the highest expected reward. The control policy ensures that the battery operational constraints are never violated. The simulation results show near-optimal performance of the proposed approach.
The remainder of this paper is organized as follows. In Section II, the energy management system model is presented. Then, the DQN approach is introduced in Section III. The proposed framework is put forth in Section IV. In Section V, the simulation results are presented, and the paper is concluded in Section VI.

II. ENERGY MANAGEMENT SYSTEM MODELING

This paper studies the operation of a storage unit owned by an electricity consumer. The charging and discharging of the storage unit are assumed to be locally controlled. The consumer purchases power from the grid to charge the battery, or sells its stored energy to the grid. The temporal axis is discretized into time slots of duration $\delta$. We denote the state of charge of the battery at time slot $t$ by $e_t$. In addition, let the capacity of the battery be denoted by $E$ (in kWh). The consumer also has a renewable energy source installed. Fig. 1 depicts the model considered in this paper.

Let $b_t^+$ and $b_t^-$ denote the charging and discharging rates of the battery at time slot $t$. The charging and discharging rates are bounded as follows

$$\eta_c\, b_t^+ \leq r_c, \qquad \frac{b_t^-}{\eta_d} \leq r_d \qquad (1)$$

where $\eta_c \in (0, 1]$ and $\eta_d \in (0, 1]$ are the charging and discharging efficiency coefficients, and $r_c$ and $r_d$ are the maximum charging and discharging rates. Therefore, the change in the state of charge of the battery can be written as

$$e_{t+1} = e_t + \delta\, \eta_c\, b_t^+ - \delta\, \frac{b_t^-}{\eta_d} \qquad (2)$$

where $\delta$ is the slot duration. The decision to purchase $b_t$ power from the grid is feasible if $0 \leq e_{t+1} \leq E$ and (1) is satisfied. The consumer receives a utility $u_t(b_t + d_t - r_t, p_t)$ at time slot $t$, which depends on the net energy withdrawn from the grid $(b_t + d_t - r_t)$, where $d_t$ denotes the load demand and $r_t$ the renewable energy generation, and on the energy price $p_t$. Then, the energy storage is controlled in order to minimize the discounted cost function given by

$$C_t = \mathbb{E}\Big\{ \sum_{t'=t}^{\infty} \gamma^{t'}\, u_{t'}(b_{t'} + d_{t'} - r_{t'}, p_{t'}) \Big\} \qquad (3)$$

where $\gamma \in (0, 1)$ is the discount factor.
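As a minimal illustration of the storage model, the following Python sketch implements the state-of-charge update (2) and the feasibility checks implied by (1) and the capacity bounds; the function names and numerical values are illustrative, not from the paper.

```python
def soc_update(e_t, b_plus, b_minus, delta, eta_c, eta_d):
    """State-of-charge update, Eq. (2): charging stores energy at
    efficiency eta_c; discharging drains the battery at rate b_minus/eta_d."""
    return e_t + delta * eta_c * b_plus - delta * b_minus / eta_d

def is_feasible(e_t, b_plus, b_minus, delta, eta_c, eta_d, E, r_c, r_d):
    """Check the rate bounds, Eq. (1), and the capacity bounds 0 <= e_{t+1} <= E."""
    if eta_c * b_plus > r_c or b_minus / eta_d > r_d:   # Eq. (1)
        return False
    e_next = soc_update(e_t, b_plus, b_minus, delta, eta_c, eta_d)
    return 0.0 <= e_next <= E

# Example: charging at 30 kW for a 5-minute slot (delta in hours).
print(soc_update(e_t=50.0, b_plus=30.0, b_minus=0.0,
                 delta=5 / 60, eta_c=0.95, eta_d=0.95))  # -> 52.375 kWh
```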
III. THE DQN APPROACH

In many physical systems, the state of the environment is naturally continuous, and hence, the classic Q-learning approaches are not directly applicable.

The deep reinforcement learning approach [17], [18] utilizes a neural network that approximates the action-value function $Q(s, a)$, which is defined as the maximum expected return achievable by following any strategy. That is, the optimal action-value function can be expressed as

$$Q^\star(s, a) = \max_\pi\, \mathbb{E}\Big[ \sum_{t'=t}^{\infty} \gamma^{t'-t}\, r_{t'} \,\Big|\, s_t = s,\ a_t = a,\ \pi \Big]. \qquad (4)$$
Known as universal function approximators, neural networks are well suited to representing the action-value function. A deep neural network with weights $\theta$ is used to approximate $Q(s, a)$; such a network is referred to as a Q-network.

The optimal action-value function $Q^\star(s, a)$ obeys the Bellman equation, which can be written as

$$Q^\star(s_t, a_t) = r_t + \gamma \max_{a'} Q^\star(s_{t+1}, a'). \qquad (5)$$
The idea behind the training of reinforcement learning methods is to use the Bellman equation to devise an iterative update rule. Such an update rule eventually converges to the optimal action-value function [15] under mild conditions on the step size. The learning procedure is summarized in Algorithm 1, where the ε-greedy approach is used to ensure adequate exploration of the state space.
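To make the iterative rule concrete, a tabular Q-learning step derived from (5) can be sketched as follows; the toy discretization, step size alpha, and discount gamma are illustrative assumptions (the paper itself works with continuous states and a Q-network).

```python
import numpy as np

n_states, n_actions = 10, 3      # toy discretization, for illustration only
Q = np.zeros((n_states, n_actions))
alpha, gamma = 0.1, 0.99         # assumed step size and discount factor

def q_learning_step(s, a, r, s_next):
    """Move Q(s, a) toward the Bellman target r + gamma * max_a' Q(s', a')."""
    target = r + gamma * Q[s_next].max()
    Q[s, a] += alpha * (target - Q[s, a])
```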
During the learning process, we utilize the so-called experience replay technique [19]. This approach stores tuples representing past experiences in a replay memory. Hence, at each update, a randomly drawn minibatch of past experiences is selected to perform gradient updates on the neural network parameterizing the action-value function. Algorithm 1 shows the implementation of the experience replay mechanism within our learning procedure.
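A minimal sketch of such a replay memory, assuming uniform sampling and an illustrative capacity and batch size, is:

```python
import random
from collections import deque

class ReplayMemory:
    """Bounded buffer of (s, a, r, s') transitions; sampling uniformly at
    random breaks the temporal correlation between consecutive experiences."""
    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)

    def store(self, s, a, r, s_next):
        self.buffer.append((s, a, r, s_next))

    def sample(self, batch_size=32):
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))
```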
Algorithm 1 Deep Q-Networks Learning
Input: 0 < γ < 1, 0 < ε < 1, and 0 < κ < 1.
1: Initialization: Replay memory D, and action-value function Q(s, a; θ) with random weights θ₀.
2: while t ≤ T do
3:    With probability ε, select a random action aₜ.
4:    Otherwise, select aₜ = arg maxₐ′ Q(sₜ, a′; θₜ).
5:    Implement action aₜ.
6:    Observe the reward rₜ and the new system state sₜ₊₁.
7:    Store the tuple (sₜ, aₜ, rₜ, sₜ₊₁) in D.
8:    Sample a random minibatch J from D.
9:    Set yⱼ = rⱼ + γ maxₐ′ Q(sⱼ₊₁, a′; θₜ) for all j ∈ J.
10:   Perform a gradient step on Σⱼ∈J (yⱼ − Q(sⱼ, aⱼ; θₜ))².
11:   Update ε = κε.
12: end while
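One possible PyTorch realization of the minibatch update (lines 9 and 10 of Algorithm 1) is sketched below. The network architecture, optimizer, learning rate, state dimension, and placeholder batch are illustrative assumptions rather than the authors' exact choices.

```python
import torch
import torch.nn as nn

# Assumed Q-network: 4 state features in, one Q-value per action out
# (three actions: charge, discharge, idle).
q_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 3))
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
gamma = 0.99

def dqn_gradient_step(s, a, r, s_next):
    """Regress Q(s_j, a_j; theta) onto y_j = r_j + gamma * max_a' Q(s_{j+1}, a')."""
    with torch.no_grad():                        # targets are held fixed (line 9)
        y = r + gamma * q_net(s_next).max(dim=1).values
    q_sa = q_net(s).gather(1, a.long().unsqueeze(1)).squeeze(1)
    loss = ((y - q_sa) ** 2).sum()               # squared Bellman error (line 10)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Example minibatch of 32 transitions (random placeholders).
s = torch.randn(32, 4); a = torch.randint(0, 3, (32,))
r = torch.randn(32); s_next = torch.randn(32, 4)
dqn_gradient_step(s, a, r, s_next)
```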
IV. ENERGY STORAGE MANAGEMENT USING DQN

In this section, we present the learning-based energy storage management system. At any time slot, the consumer chooses to charge the battery, discharge it, or to set $b_t$ to zero. The consumer utilizes a deep neural network to approximate the action-value function $Q(s, a)$.

The consumer aims at minimizing the discounted cost function (3) using the DQN approach. In the proposed approach, after sufficient exploration of the state space, the consumer implements the action that maximizes $Q(s, a)$. Afterwards, the consumer receives a reward $r_t$ for the implemented action and observes the new state of the system $s_{t+1}$. Then, by the Bellman equation, the value of $Q(s_t, a_t)$ at optimality should be $r_t + \gamma \max_a Q(s_{t+1}, a)$. Therefore, a training sample is created to update the parameters of the neural network using the observed reward and the new state of the system, as in Algorithm 1.

The proposed learning approach is model-free in the sense that no assumptions are made on the processes generating the energy prices or the renewable energy generation. The approach learns the mapping using samples obtained through interactions with the system.
Utilizing the DQN approach, the control policy $\pi$ of the storage unit is defined such that

$$a_t = \mu_t(s_t) = \arg\max_{a'} Q(s_t, a') \qquad (6)$$

where the function $Q(s_t, a)$ is given by the deep neural network. In order to ensure the feasibility of the actions implemented using this policy, the rates of charge and discharge have to be chosen such that the battery is neither overcharged nor over-discharged. When the selected action is to charge the storage unit, the rate of charge is set to the maximum charging rate $r_c$, unless the state of charge would exceed the maximum storage capacity if the battery were charged at this rate for a time duration of length $\delta$. Therefore, the charging rate is set to

$$b_t^+ = \max\{0, \min\{r_c, (E - e_t)/(\delta \eta_c)\}\}$$

which ensures that the state of charge of the battery is always kept below $E$. Similarly, when the selected action is to discharge the battery, the discharge rate is set to

$$b_t^- = \min\{r_d, \max\{0, e_t \eta_d / \delta\}\}$$

which ensures that there is enough energy available to discharge.
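These two projections translate directly into code; a minimal sketch, using the same symbols as the formulas above:

```python
def charge_rate(e_t, E, r_c, delta, eta_c):
    """Charging action: the largest rate that cannot overfill the battery."""
    return max(0.0, min(r_c, (E - e_t) / (delta * eta_c)))

def discharge_rate(e_t, r_d, delta, eta_d):
    """Discharging action: the largest rate the stored energy can sustain."""
    return min(r_d, max(0.0, e_t * eta_d / delta))
```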
V. EXPERIMENTS

We now present the results of our experiments to assess the performance of the learning-based energy management framework. We use a time slot duration δ = 5 minutes. Measured load data is utilized to simulate the customer's energy consumption. The load data was measured at 18 buses, with a 1-second resolution, from a feeder in Anatolia, California, during a day in the summer of 2012 [20]. In order to obtain enough training data, we concatenate the load profiles at the 18 buses in the feeder, which results in a power consumption sequence covering 18 days at a 1-second resolution. Therefore, for each period of 5 minutes duration, we have 300 load measurements.
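Assuming the 300 per-slot samples are averaged into a single slot-level value (one natural aggregation; the paper's exact choice is not stated here), the preprocessing might look like:

```python
import numpy as np

def to_slots(load_1s, slot_seconds=300):
    """Average 1-second load samples into 5-minute slot values."""
    n = len(load_1s) // slot_seconds * slot_seconds
    return np.asarray(load_1s[:n]).reshape(-1, slot_seconds).mean(axis=1)

# 18 concatenated daily profiles at 1-second resolution -> 5-minute slots:
# load_18days = np.concatenate([bus_profiles[i] for i in range(18)])
# slot_load = to_slots(load_18days)
```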
The proposed approach is compared against two benchmark algorithms. The first algorithm assumes exact knowledge of the renewable energy generation forecast as well as the energy prices during the optimization period. The second algorithm assumes exact knowledge of the energy forecasts and prices for a small future time interval. The detailed description of both approaches follows.

Optimal solver: assumes that the energy prices and forecasts are known exactly for the entire optimization period.

[Figure: real-time energy price (¢/kWh) versus hour.]
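For illustration, a sliding-window baseline of the second kind can be sketched with cvxpy, assuming a linear stage cost $p_t(b_t + d_t - r_t)\delta$ over a horizon of H slots; the function name, horizon length, and cost form are assumptions, not the paper's exact formulation.

```python
import cvxpy as cp

def mpc_action(e0, d, r, p, E, r_c, r_d, eta_c, eta_d, delta, H=12):
    """Short-horizon lookahead with perfect forecasts of load d, renewable
    generation r, and prices p over H slots; returns the first decision."""
    b_plus = cp.Variable(H, nonneg=True)    # charging rates
    b_minus = cp.Variable(H, nonneg=True)   # discharging rates
    e = cp.Variable(H + 1)                  # state-of-charge trajectory
    cons = [e[0] == e0, e >= 0, e <= E,
            eta_c * b_plus <= r_c, b_minus / eta_d <= r_d]   # Eq. (1)
    for t in range(H):
        cons.append(e[t + 1] == e[t] + delta * eta_c * b_plus[t]
                    - delta * b_minus[t] / eta_d)            # Eq. (2)
    net = b_plus - b_minus + d[:H] - r[:H]  # net power drawn from the grid
    cost = cp.sum(cp.multiply(p[:H], net)) * delta
    cp.Problem(cp.Minimize(cost), cons).solve()
    return b_plus.value[0], b_minus.value[0]
```

Only the first slot's decision is applied; the window is then shifted forward and the problem is re-solved, in the spirit of model predictive control.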
[Figure: state of charge (kWh) versus hour under the DQN, optimal, and MPC policies (two plots).]

REFERENCES

[1] W. A. Braff, J. M. Mueller, and J. E. Trancik, "Value of storage technologies for wind and solar energy," Nature Climate Change, vol. 6, no. 10, p. 964, 2016.
[2] H. Pandžić, Y. Wang, T. Qiu, Y. Dvorkin, and D. S. Kirschen, "Near-optimal method for siting and sizing of distributed storage in a transmission network," IEEE Trans. Power Syst., vol. 30, no. 5, pp. 2288–2300, 2015.
[3] D. Gayme and U. Topcu, "Optimal power flow with large-scale storage integration," IEEE Trans. Power Syst., vol. 28, no. 2, pp. 709–717, 2013.
[4] Y. V. Makarov, P. Du, M. C. Kintner-Meyer, C. Jin, and H. F. Illian, "Sizing energy storage to accommodate high penetration of variable energy resources," IEEE Trans. Sustain. Energy, vol. 3, no. 1, pp. 34–40, 2012.
[5] L. S. Vargas, G. Bustos-Turu, and F. Larraín, "Wind power curtailment and energy storage in transmission congestion management considering power plants ramp rates," IEEE Trans. Power Syst., vol. 30, no. 5, pp. 2498–2506, 2015.
[6] Y. Guo, Z. Ding, Y. Fang, and D. Wu, "Cutting down electricity cost in internet data centers by using energy storage," in Proc. IEEE Global Telecommunications Conference (GLOBECOM), 2011, pp. 1–5.
[7] S. B. Peterson, J. Whitacre, and J. Apt, "The economics of using plug-in hybrid electric vehicle battery packs for grid storage," Journal of Power Sources, vol. 195, no. 8, pp. 2377–2384, 2010.
[8] R. Sioshansi, "Welfare impacts of electricity storage and the implications of ownership structure," The Energy Journal, pp. 173–198, 2010.