BESS Aided Renewable Energy Supply Using Deep Rein

This article has been accepted for publication in a future issue of this journal, but has not been
fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TGCN.2021.3136363, IEEE
Transactions on Green Communications and Networking
BESS Aided Renewable Energy Supply using Deep Reinforcement Learning for 5G and Beyond
Hao Yuan, Guoming Tang, Deke Guo, Kui Wu, Xun Shao, Keping Yu, Wei Wei
H. Yuan and D. Guo are with the Science and Technology on Information Systems Engineering Laboratory, National University of Defense Technology, Changsha, Hunan, China. G. Tang is with the Peng Cheng Laboratory, Shenzhen, Guangdong, China. K. Wu is with the Department of Computer Science, University of Victoria, Victoria, BC, Canada. X. Shao is with the School of Regional Innovation and Social Design Engineering, Kitami Institute of Technology, Kitami, Japan. Keping Yu is with the Global Information and Telecommunication Institute, Waseda University, Shinjuku, Tokyo, Japan. W. Wei is with the School of Computer Science and Engineering, Xi'an University of Technology, Xi'an, China.
Guoming Tang and Deke Guo are the corresponding authors.
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/

Abstract: The year 2020 witnessed the unprecedented development of 5G networks, along with the widespread deployment of 5G base stations (BSs). Nevertheless, the enormous energy consumption of BSs and the huge energy cost it incurs have become significant concerns for mobile operators. With the continuous decline in the cost of renewable energy, equipping the power-hungry BSs with renewable energy generators could be a sustainable solution. In this work, we propose an energy storage aided renewable energy supply solution for the BS, which can supply clean energy to the BS and store surplus energy for backup usage. Specifically, to flexibly regulate the battery's discharging/charging, we propose a deep reinforcement learning based regulating policy, which can adapt to the dynamic renewable energy generation as well as the varying power demands. Our experiments using real-world data on renewable energy generation and power demands demonstrate that our power supply solution can achieve a cost saving ratio of 77.9%, compared to the case with traditional power grid supply.

Index Terms: 5G base stations, BESS, renewable energy supply, deep reinforcement learning

I. INTRODUCTION
The 5G network is considered a promising technology to significantly improve the way we live [1]. Compared to 4G/LTE, it can provide users with higher bandwidth and lower latency and thus enable various cutting-edge mobile services, such as the Internet of Vehicles [2], Virtual Reality [3], and Smart Medical Home [4]. Nevertheless, because the 5G base station (BS) adopts high frequency bands, its signal coverage range is much shorter than that of 4G/LTE. Consequently, mobile operators need to deploy a large number of 5G BSs to tackle the problem of poor signal coverage. This would result in an ultra-dense BS deployment, especially in "hotspot" areas, as illustrated in Fig. 1.
Building and operating BSs at such a large scale requires enormous investments and resources. According to a field survey in the cities of Guangzhou and Shenzhen, China, the full-load power consumption of a typical 5G BS is about 2 to 3 times that of a 4G one [5]. Considering the ultra-dense deployment of 5G BSs, this could lead to a tenfold increase in energy consumption. In addition, with the increasing emphasis on environmental protection, many governments have shut down some coal-fired power plants, resulting in severe power shortages in some areas. In this regard, how to effectively reduce energy consumption and improve energy efficiency are critical problems.
Renewable energies such as solar and wind are eco-friendly with zero carbon emissions and have become popular in more scenarios in recent years [6]. Owing to the continuing price decline of photovoltaic (PV) modules and wind turbines, the installation cost of renewable energy has dramatically decreased over the past decade, e.g., a 61% reduction in the cost of solar equipment from 2010 to 2017 is reported in [7]. Such cost reductions shorten the payback period of renewable energy investment from a couple of years to several months [8]. The above observations indicate the great potential of renewable energy for fossil fuel replacement and carbon emission reduction.
This has inspired mobile operators to utilize renewable energy as an auxiliary power supply to tackle the huge power demand at 5G BSs. In some developing countries, solar power has already been applied to supply the BSs, and in some cases it accounts for over 8% of the total electricity usage [9].
By installing PV panels and wind turbines near the BSs, the maximum power from the solar and wind generators can reach up to 8.5 kW and 6.0 kW, respectively, which could remarkably cut down the energy that the communication equipment draws from the traditional power grid.

[Figure 1: illustration with labeled elements: low-orbit satellite, mobile base station, macro cell, wind generator, solar panel, power line, power grid, and transformer.]
Fig. 1. A vision of the future radio access network (RAN) in 5G and beyond, which consists of macro and small cells, and also includes the mobile and space BSs. For the purpose of green communication, all the BSs could be supplied by both renewable energy and the power grid.

To maximize the utilization of renewable energy, energy storage can be strategically utilized such that energy can be continuously provided, as renewable (e.g., solar or wind) energy is intermittent and unstable. Meanwhile, most BSs are equipped with backup batteries to safeguard the BS's normal functioning against power outages, making these batteries a natural choice for energy storage. Besides, with the continuous price decline in battery storage in recent years [10], [11], combining battery storage with renewable energy generators could offer even greater cost-reduction potential. Specifically, i) when the generated renewable power is less than the power demand (e.g., during the peak hours), the battery can be discharged to flatten the peak power demands, and ii) when the generated renewable power is more than the power demand (e.g., during the off-peak hours), the battery can be charged to store the surplus renewable energy.
In this paper, we propose a battery energy storage system (BESS) aided renewable energy supply solution for the 5G network and beyond. Aiming at energy cost reduction for mobile operators, our solution maximizes the utilization of renewable energy and thus minimizes the utilization of the power grid (i.e., fossil energy). Specifically, the energy charge can be continuously reduced by the generated renewable power, and the demand charge can be reshaped and flattened through strategic battery discharging/charging operations.
When designing the optimal control strategy for battery discharging/charging operations, we are faced with several challenges. Firstly, the renewable energy generation and power demand vary greatly in both spatial and temporal dimensions and are thus hard to predict. Secondly, owing to the physical constraints of the battery discharging/charging operations (e.g., discharge/charge efficiency), it is complicated to design the optimal battery controlling policy. Thirdly, as the battery's capacity and lifetime are limited and shorten with the discharge/charge cycles, it is necessary yet nontrivial to trade off between the cost of the battery's degradation/replacement and the gain of renewable energy storage.
By tackling the above challenges, we make the following contributions in this work:
• We present the BESS aided renewable energy supply paradigm for 5G BS operations, in which the battery discharging/charging operation is modelled as an optimization problem. The model is comprehensive in that it takes into account the practical considerations of dynamic power demand and renewable energy generation, as well as battery specifications and physical constraints.
• To cope with the intermittent renewable energy generation and dynamic BS power demand, we propose a deep reinforcement learning (DRL) based battery discharging/charging operating policy. It can update the network parameters by interacting with the environment in real time, so as to improve its decision-making efficiency.
• We conduct extensive evaluations using a real-world BS deployment scenario and BS traffic load traces. The results show that the proposed DRL-based battery discharging/charging policy can effectively utilize the renewable energy and cut down the energy cost, with a cost saving ratio of up to 77.9%.
The rest of the paper is organized as follows. In Sec. II, we introduce the background of this paper. In Sec. III, we give the system models and formulations of the problem, and then propose the BESS aided renewable energy supply solution in Sec. IV. We develop a DRL-based battery discharging/charging controlling policy in Sec. V. We evaluate the proposed method by experiments with a real-world dataset in Sec. VI. We present the related work in Sec. VII and conclude the paper in Sec. VIII.

II. BACKGROUND
A. Base Station Power Demand
The power demand pattern of a BS is mainly determined by its location and associated with the behavior of the users there. The demand usually also shows a periodic pattern (e.g., with a one-day or one-week period). As shown in Fig. 2, in this paper we mainly consider three types of BSs, located in resident, office, and comprehensive areas, which account for nearly ninety percent of the total demands [12]. In detail, the characteristics of these power demand patterns are as follows.

[Figure 2: three panels plotting power consumption (watt) against time (hour) over one week (0 to 168 hours).]
(a) Power demand pattern of BSs at resident area.
(b) Power demand pattern of BSs at office area.
(c) Power demand pattern of BSs at comprehensive area.
Fig. 2. Power demand patterns of BSs at different areas in a one-week period [12].

• Power Demand of BSs at Resident Area: The power demands of this type of BSs increase rapidly in the evening, as most people stay at home after work. Compared with weekdays, the power demands remain at high levels on weekends.
• Power Demand of BSs at Office Area: The power demands of this type of BSs remain at a high level in the daytime, when most people are at work. Besides, because fewer people work on weekends, the weekend power demands are much lower than those on weekdays.
• Power Demand of BSs at Comprehensive Area: Due to the diversity of the requests, the power demand patterns of this type of BSs are more stable than those of the above two types: they constantly remain at a high level in the daytime and evening and drop to the valley in the late night and early morning.
The first two types of power demand patterns change relatively dramatically, which leaves large saving potential, especially for the demand charge, as discussed next.

B. Energy Cost of 5G BS
The energy cost of the mobile operator typically consists of two components: i) the energy charge, which is based on the total amount of electricity consumed (in kWh) throughout the entire billing cycle, i.e., the interval from one billing statement date to the next (e.g., one month), and ii) the demand charge, which is based on the peak power demand (in kW) during the billing cycle. Specifically, the demand charge is regarded as a penalty for the extra load burden imposed on the power grid.
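To make the two bill components concrete, the following minimal sketch computes both for two load profiles with the same energy use but different peaks. The function name `bill_components`, the prices, and the load values are hypothetical and chosen only for illustration; they are not taken from the paper or from any real tariff.

```python
def bill_components(power_kw, slot_hours, energy_price, demand_price):
    """Return (energy_charge, demand_charge) for one billing cycle.

    power_kw     : grid power drawn in each time slot (kW)
    slot_hours   : slot length (hours)
    energy_price : hypothetical price per kWh
    demand_price : hypothetical price per kW of peak demand
    """
    energy_kwh = sum(p * slot_hours for p in power_kw)  # total consumption
    peak_kw = max(power_kw)                             # peak demand
    return energy_price * energy_kwh, demand_price * peak_kw


# Two illustrative profiles with identical energy use (20 kWh) but different peaks.
flat_profile = [5.0, 5.0, 5.0, 5.0]
peaky_profile = [2.0, 2.0, 2.0, 14.0]

print(bill_components(flat_profile, 1.0, 0.25, 10.0))   # (5.0, 50.0)
print(bill_components(peaky_profile, 1.0, 0.25, 10.0))  # (5.0, 140.0)
```

Under these assumed prices, the energy charge is identical for both profiles while the demand charge differs, which is why flattening the peak matters for the total bill.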
For example, for a commercial data center consuming 10 MW on peak and 6 MW on average, the monthly energy charge and demand charge amount to around $24,000 and $165,500, respectively [13]. The demand charge could be up to 8x the energy charge; therefore, effectively cutting down the demand charge could remarkably reduce the energy cost. However, there seems to be no practical way to flatten the peak power demands of 5G BSs by shifting load, e.g., shifting the real-time demands of mobile users to off-peak hours could lead to long delays for some classes of jobs [14].

III. SYSTEM MODEL
In this section, we present the system models, basic assumptions, and problem formulation. For clarity, the major notations used in this paper are explained in Table I.

A. Scenario Overview
As illustrated in Fig. 3, the proposed BESS aided renewable energy supply solution deployed at each 5G BS mainly includes: i) a renewable energy generator, e.g., a PV panel and a wind turbine, which is deployed near the 5G BS system and generates renewable energy for the system, ii) a battery storage, which stores the surplus renewable energy and acts as the power source for the BS as needed, and iii) a controller, which obtains the environment state (i.e., the measurement data) so as to control the battery discharging/charging operations through control signals. In addition to the standard meter, as shown in Fig. 3, an additional generation meter is installed in the BS power supply system to measure the renewable energy generation. Furthermore, following commands from the controller, the distribution panel is responsible for switching between the renewable energy and grid energy and ensures a continuous and stable electricity supply for the BS.
As the essential component of the BESS aided renewable energy supply solution, the controller determines how efficient this paradigm is. Specifically, at each scheduling point, the controller needs to decide the amount of power supplied from either the battery or the power grid. The scheduling decisions should be made based on the power demands and battery states in real time, so that the utilization of renewable energy can be enhanced and the total energy cost can be minimized.
Note that the feasibility of the implementation illustrated in Fig. 3 has been preliminarily verified in practice. According to [15], small integrated renewable energy generators have been provided by some commercial companies for the BS system, and they are easily deployed in both open rural and crowded urban environments.

B. BS Power Supply and Demand
The power of each 5G BS is mainly supplied by three parts: the power grid, the generated renewable energy, and the stored energy. In particular, i) when the generated renewable energy is more than the power demand (e.g., during the off-peak hours), each 5G BS is supplied only by the renewable energy (i.e., off-grid) and the surplus renewable energy is stored in the battery storage, and ii) when the generated renewable energy is less than the power demand (e.g., during the peak hours), each 5G BS is supplied by all three parts in a cooperative way.
In this paper, we consider a discrete time model, where the entire billing cycle (e.g., one month) is equally split into T consecutive slots of length ∆t, indexed by the set 𝒯 = {1, 2, ..., T}. For an arbitrary 5G BS, the power demand during the entire billing cycle can be represented by a power demand vector:
$$\mathbf{d} := [d(1), d(2), \cdots, d(T)] \tag{1}$$
where d(t) is the power demand in time slot t, which can be obtained from the power meter readings at each BS.

C. Renewable Energy Generation
By harvesting energy from renewable energy resources, the BSs can be powered in an environmentally friendly and cost-efficient way. In this paper, in order to make the model extensible, we denote the renewable energy generation vector as:
$$\mathbf{g} := [g(1), g(2), \cdots, g(T)] \tag{2}$$
In this work, we choose two typical renewable energy sources as auxiliary power supplies, i.e., solar energy (denoted by g^s(t)) and wind energy (denoted by g^w(t)). Accordingly, for an arbitrary time slot t, the renewable energy generation can be represented by:
$$g(t) = g^{s}(t) + g^{w}(t) \tag{3}$$
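As a small illustration of the discrete-time model in Eqs. (1)-(3), the sketch below splits a billing cycle into slots, sums the solar and wind series slot by slot, and lists the slots in which generation exceeds demand. The helper names and all numeric values are assumptions made for this example only.

```python
def slots_per_cycle(cycle_hours, slot_hours):
    """Number of slots T when a billing cycle is split into equal slots."""
    return round(cycle_hours / slot_hours)


def total_generation(g_solar, g_wind):
    """Eq. (3): g(t) = g^s(t) + g^w(t), applied slot by slot."""
    if len(g_solar) != len(g_wind):
        raise ValueError("solar and wind series must cover the same slots")
    return [s + w for s, w in zip(g_solar, g_wind)]


def surplus_slots(demand, generation):
    """1-based indices t of the slots where g(t) > d(t)."""
    return [t for t, (d, g) in enumerate(zip(demand, generation), start=1) if g > d]


# Hypothetical four-slot example (kW).
d = [1.0, 1.25, 1.25, 1.0]
g_s = [0.0, 1.0, 1.0, 0.25]
g_w = [0.5, 0.5, 0.5, 0.5]

g = total_generation(g_s, g_w)
print(slots_per_cycle(30 * 24, 0.25))  # 2880 slots for a 30-day cycle with 15-minute slots
print(g)                               # [0.5, 1.5, 1.5, 0.75]
print(surplus_slots(d, g))             # [2, 3]
```

The slots returned by `surplus_slots` are those in which, under the supply rule of Sec. III-B, the BS could run off-grid and surplus energy could be stored.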
[Figure 3: block diagram of the power supply module of a 5G base station, with labeled elements: solar panel, wind turbine, generation meter, distribution panel, standard meter, battery storage, power grid, controller, power demand, measurement data, and control signals.]
Fig. 3. An exemplified implementation of the BESS aided renewable energy supply solution for the 5G BS.

We assume that if the total generated renewable energy is beyond the power demand (i.e., g(t) > d(t)), the power is supplied in proportion to the renewable energy generated. The generation of both sources varies during a certain period (e.g., one day) and is affected by similar factors such as weather, temperature, wind speed, and so on.
1) Solar Energy Generation: The power generated by the solar PV system mainly depends on three factors: global horizontal irradiance (GHI(t)), outdoor temperature (Temp(t)), and time of day (ToD(t)). By arranging solar PV cells in series/parallel, the solar PV system can harvest energy and convert it into DC power to charge the battery storage and supply the power demand. The power generated by the solar PV system at time slot t can be calculated by the following function:
$$g^{s}(t) = F_{S}(GHI(t), Temp(t), ToD(t)) \tag{4}$$
where F_S(·) is a known, non-linear function defined in PVLIB [16]. Accordingly, the solar energy generation during the entire billing cycle can be represented by a vector:
$$\mathbf{g}^{s} := [g^{s}(1), g^{s}(2), \cdots, g^{s}(T)] \tag{5}$$
2) Wind Energy Generation: The power generated by the wind turbine generator fluctuates randomly with time and mainly depends on the wind velocity (WV(t)), weather system (WS(t)), and hub height (HH(t)). The wind turbine typically generates energy in two stages: it first converts the wind power into mechanical energy and then transforms it into electricity. The amount of power generated by the wind turbine at time slot t can be calculated by the following function:
$$g^{w}(t) = F_{W}(WV(t), WS(t), HH(t)) \tag{6}$$
where F_W(·) is a known, non-linear function defined in [17]. Accordingly, the wind energy generation during the entire billing cycle can be represented by a vector:
$$\mathbf{g}^{w} := [g^{w}(1), g^{w}(2), \cdots, g^{w}(T)] \tag{7}$$

D. Battery Specification
At an arbitrary time slot t, the state of the battery is modeled as follows:
$$\chi(t) := \langle SoE(t), SoC(t), DoD(t) \rangle \tag{8}$$
where SoE, SoC, and DoD represent the state of effective capacity, state of charge, and depth of discharge of the battery, respectively. Specifically, i) SoE indicates the current effective capacity of the battery, as a percentage of its initial capacity (denoted as π), ii) SoC indicates the current energy stored in the battery, as a percentage of the current effective capacity, and iii) DoD indicates how much energy the battery has released, as a percentage of the current effective capacity.
To simplify the optimization problem, we discretize the SoC of a battery into M equally spaced states (e.g., M = 10, i.e., {10%, 20%, ..., 100%}). Accordingly, the DoD is also discretized (e.g., release 10% from 90%, i.e.,
90% → 80%). Besides, for an arbitrary time slot t, in order to prevent the battery from over-discharging/charging, we use SoCmax and SoCmin to indicate the upper and lower bounds of the SoC, respectively:
$$SoC_{min} \le SoC(t) \le SoC_{max} \tag{9}$$

TABLE I
SUMMARY OF NOTATIONS
Notation        Description
d(t)            power demand of 5G BS in time slot t
g(t)            renewable energy generation in time slot t
b(t)            discharging/charging operations in time slot t
χ(t)            battery state in time slot t
p(t)            power supplied by the power grid in time slot t
pmax            peak power consumption supplied by power grid
π               initial capacity of the battery
C^e(t)          energy charge of 5G BS in time slot t
C^d(t)          demand charge of 5G BS in time slot t
C^u(t)          investment cost in time slot t
λe              price of energy charge
λd              price of demand charge
λu              price of investment cost
α, β            charging and discharging efficiencies
R+, R−          max charge and discharge rates of battery
s(t)            environment state in time slot t
a(t)            action taken by the agent in time slot t
r(t)            reward of the action in time slot t
ψ               mapping policy from environment states to actions
R(a(t), s(t))   reward function of the DQN
Q, Q̃            Q-values of the main net and target net
θ, θ̃            parameters of the main net and target net

IV. BESS AIDED RENEWABLE ENERGY SUPPLY
The battery storage is deployed at the 5G BS. It can be charged by the surplus renewable energy (generated by the solar PV and wind turbine systems) and discharged to reshape the power demand, so as to maximize the utilization of renewable energy (or minimize the utilization of fossil fuel) and reduce the electricity bill.
We define the battery discharging/charging operations by a battery operation vector:
$$\mathbf{b} := [b(1), b(2), \cdots, b(T)] \tag{10}$$
where b(t) is a real-valued variable that indicates the amount of discharging/charging. In detail, i) a positive value indicates discharging power from the battery storage to the 5G BS during time slot t, ii) a negative value indicates charging from the renewable energy to the battery storage, and iii) a zero value indicates that no discharging/charging operation is performed.
Meanwhile, the discharging/charging operations are constrained by the maximum charging rate and maximum discharging rate, denoted as R+ and R−, respectively, i.e., the largest power with which the battery can be recharged and can supply within a time slot:
$$-R^{+} \le b(t) \le R^{-} \tag{11}$$
Besides, the battery storage needs to meet the following conditions in discharging/charging operations:
$$b(t) \le 0, \quad \text{if } g(t) - d(t) \ge 0 \tag{12a}$$
$$b(t) > 0, \quad \text{if } g(t) - d(t) < 0 \tag{12b}$$
which state that the battery storage can only be charged when there is surplus renewable energy after supplying the 5G BS, and that the battery storage cannot be simultaneously charged and discharged in any time slot.
Due to the power loss (e.g., AC-DC conversion and battery leakage [18]) that occurs during discharging from the battery storage to the power grid (or charging from renewable energy to the battery storage), we denote the actual discharging/charging operations from/to the battery by:
$$\tilde{b}(t) = \begin{cases} b(t)/\alpha, & \text{if } b(t) \le 0 \\ \beta \cdot b(t), & \text{if } b(t) > 0 \end{cases} \tag{13}$$
where α ∈ (0, 1) and β ∈ (0, 1) represent the charging and discharging efficiencies, respectively.
Given the power demand of the 5G BS (i.e., d(t)), the renewable energy generation (i.e., g(t)), and the battery discharging/charging operations (i.e., b(t)), we can derive the vector of power supplied by the power grid:
$$\mathbf{p} := [p(1), p(2), \cdots, p(T)] \tag{14}$$
where, for an arbitrary time slot t, p(t) is given by:
$$p(t) = \max\{0,\; d(t) - g(t) - \tilde{b}(t)\} \tag{15}$$

A. Energy Cost
The energy cost billed to mobile operators over the entire billing cycle typically consists of two components, energy charge and demand charge, a billing policy that is widely applied
in previous work [13], [14], [19]. We introduce them in detail as follows.
• Energy Charge: charged on the total amount of electricity consumed (in kWh) throughout the entire billing cycle, at a unit price in $/kWh denoted by λe.
• Demand Charge: charged on the peak power supplied by the power grid (in kW) during the entire billing cycle, at a unit price in $/kW denoted by λd.
Therefore, the incurred cost of energy charge of the whole system in each time slot t can be represented by:
$$C^{e}(t) = \lambda_{e} \cdot p(t) \cdot \Delta t \tag{16}$$
Accordingly, the incurred cost of demand charge of the whole system in each time slot t can be represented by:
$$C^{d}(t) = \lambda_{d} \cdot \max\{0,\; p(t) - p_{max}\} \tag{17}$$
where pmax records the peak power consumption during the past t − 1 time slots. For an arbitrary time slot t, if p(t) − pmax > 0, pmax is updated to p(t) accordingly.

B. Investment Cost
Every use of the equipment (solar PV, wind turbine, and battery storage) reduces its lifetime by a certain amount, which matters to the investor. Therefore, it is important to understand, detail, and quantify the various factors influencing the performance loss curves. For the accuracy of our model, we quantify the investment cost in every time slot as follows.
1) Renewable Energy Generator Cost: As the modules of a renewable energy generation system age, they gradually lose some performance. In this paper, we assume that the decline of the system is linear and positively related to its time in use. We denote the lifetime of the renewable energy generator as L, which indicates the total time it can be used. For an arbitrary time slot t, the remaining lifetime of the renewable energy generator is denoted as l(t), which is constrained by 0 ≤ l(t) ≤ L. The renewable energy generator has to be discarded and replaced by a new one if l(t) ≤ 0. Given the remaining lifetime of the renewable energy generator at time t − 1, the remaining lifetime at time t is updated by:
$$l(t) = l(t-1) - \Delta t \cdot u(t) \tag{18}$$
where u(t) is defined by:
$$u(t) = \begin{cases} 1, & \text{if using} \\ 0, & \text{if not using} \end{cases} \tag{19}$$
We formulate the using cost of the renewable energy generator in each time slot t as:
$$C^{u}(t) = \lambda \cdot \frac{\Delta t \cdot u(t)}{L} \tag{20}$$
where λ is the investment cost of a new renewable energy generator.
We extend the model of the renewable energy generator to specific systems, i.e., the solar PV system and the wind turbine system. In detail, i) for the solar PV system, we denote the remaining lifetime, the using cost, and the investment as l^s(t), C^{us}(t), and λs, respectively, and ii) for the wind turbine system, we denote the remaining lifetime, the using cost, and the investment as l^w(t), C^{uw}(t), and λw, respectively. Accordingly, we can derive the using cost of the solar PV system and the wind turbine system by replacing the corresponding symbols in Eq. 20.
2) Battery Storage Degradation Cost: Every discharge/charge cycle does some "harm" to the battery (typically lead-acid) and reduces its capacity and lifetime. In particular, a deep discharge severely affects its internal structure and can even permanently damage the battery (e.g., in the case of overdischarging). The battery has to be discarded and replaced by a new one when the effective capacity drops to the "ineffective" level, denoted by SoEine in this paper.

[Figure 4: plot of the number of cycles (0 to 80000) against DoD (%) (0 to 100).]
Fig. 4. Relationship between DoD levels and battery lifetime (in number of discharge/charge cycles) for LI battery [20].

As illustrated in Fig. 4, each level of DoD has a corresponding number of discharge/charge cycles; thus, we can formulate the battery storage degradation cost from the relationship between the two. Given a state of battery at time slot t, i.e.,
⟨SoE(t), SoC(t), DoD(t)⟩, the SoE decrease of the battery during this time slot can be measured by:
$$\Delta SoE(t) = \begin{cases} 0, & \text{if } b(t) \le 0 \\ \dfrac{1 - SoE_{ine}}{h(DoD(t-1) + \Delta DoD(t))}, & \text{if } b(t) > 0 \end{cases} \tag{21}$$
where h(·) maps an input DoD level to the total number of discharge/charge cycles (exemplified in Fig. 4), and ∆DoD(t) gives the increase of DoD and can be calculated by:
$$\Delta DoD(t) = \frac{b(t)\,\Delta t}{\pi} \tag{22}$$
With the above expression of the SoE decrease in each time slot t, we can then formulate the degradation cost of the battery storage at each time slot t as:
$$C^{ub}(t) = \lambda_{b} \cdot \Delta SoE(t) \tag{23}$$
where λb is a coefficient converting the battery degradation to a monetary cost, with the unit of "$/SoE decrease".
To sum up, the total investment cost in each time slot t can be calculated as:
$$C^{u}(t) = C^{us}(t) + C^{uw}(t) + C^{ub}(t) \tag{24}$$

C. Optimization Formulation and Difficulty Analysis
The battery discharging/charging operations are controlled by the controller. Given the state χ(t − 1) of the battery storage in time slot t − 1, the state in time slot t can be updated by:
$$\chi(t) \leftarrow \begin{cases} SoE(t) = SoE(t-1) - \Delta SoE(t) \\ SoC(t) = SoC(t-1) - b(t)\,\Delta t / \pi \\ DoD(t) = DoD(t-1) + \Delta DoD(t) \end{cases} \tag{25}$$
For the entire billing cycle, we need to find the optimal battery discharging/charging controlling policy that minimizes the total electricity bill, which is defined as follows.
$$\min_{b(t)} \sum_{t=1}^{T} \left( C^{e}(t) + C^{d}(t) + C^{u}(t) \right) \tag{26a}$$
$$\text{s.t. } (9), (11), (12), \text{ and } (25), \quad \forall t \in \mathcal{T} \tag{26b}$$
When solving the above optimization problem, however, we are faced with the following three challenges.
1) Uncertainty of Renewable Energy: Renewable energy generation is affected by multiple factors such as outdoor temperature and wind velocity. Owing to the unpredictable and intermittent nature of these factors, it is hard to accurately forecast the renewable energy generation (i.e., g(t)) and to make the optimal discharging/charging operations (i.e., b(t)) of the battery storage without accurate information in advance. Therefore, we need a method that tackles the uncertainty of renewable energy generation.
2) Dynamic of Power Demand: In our modeled problem, we assume the power demand (i.e., d(t)) is known in advance, so that the problem can essentially be optimized in an offline way. However, such an assumption is unrealistic in practice. In fact, traditional offline optimization methods (e.g., dynamic programming [21], [22]) can hardly find the globally optimal solution, as the power demand can be obtained only when the workload arrives at the 5G BS. Thus, an online method that deals with the dynamic power demands (i.e., d(t)) and makes optimal discharging/charging operations (i.e., b(t)) is greatly needed.
3) High Computation Complexity: The optimization problem in Eq. 26 has embedded NP-hard sub-problems. Firstly, in every time slot t, the controller needs to search the action space (mainly determined by M) to find the optimal discharging/charging operation (i.e., b(t)). To simplify the optimization problem, in this paper we discretize the SoC of the battery into M equally spaced states; in a real scenario, however, the state of the battery is continuous, which leads to an enormous search space. Secondly, it is challenging for the controller to continuously make the optimal discharging/charging operation throughout the entire billing cycle (i.e., over all T slots).
To tackle the above three challenges, we propose an online discharging/charging operation controlling method based on deep reinforcement learning (DRL) in the following section.

V. A DRL-BASED BATTERY OPERATION APPROACH
Recent breakthroughs in deep reinforcement learning (DRL) [23] provide a promising technique for enabling effective experience-driven control, which exploits past experience (e.g., historical battery discharging/charging operations) for better decision-making by adapting to the current state of the environment.
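Before the DRL components are introduced, the per-slot model of Sec. IV can be summarized as a single environment transition. The sketch below is only an illustration under stated assumptions: it applies Eqs. (13), (15)-(17), (21)-(23), and (25) as written, omits the generator using cost of Eq. (20), does not enforce constraints (9), (11), and (12), and takes the cycle-life map h(·) as a caller-supplied function. The names `Battery` and `step`, the parameter values, and the constant cycle-life map are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Battery:
    soe: float  # state of effective capacity (fraction of initial capacity)
    soc: float  # state of charge (fraction)
    dod: float  # depth of discharge (fraction)


def step(bat: Battery, d: float, g: float, b: float, p_max: float, *,
         dt: float, pi: float, alpha: float, beta: float,
         lam_e: float, lam_d: float, lam_b: float, soe_ine: float,
         h: Callable[[float], float]):
    """One time slot: returns (new battery state, p(t), updated p_max, slot cost)."""
    # Eq. (13): actual operation after charging (alpha) / discharging (beta) losses.
    b_eff = b / alpha if b <= 0 else beta * b
    # Eq. (15): power that still has to come from the grid.
    p = max(0.0, d - g - b_eff)
    # Eqs. (16)-(17): energy charge and incremental demand charge.
    c_e = lam_e * p * dt
    c_d = lam_d * max(0.0, p - p_max)
    p_max = max(p_max, p)
    # Eqs. (22), (21), (23): DoD change, SoE decrease, degradation cost.
    d_dod = b * dt / pi
    d_soe = (1.0 - soe_ine) / h(bat.dod + d_dod) if b > 0 else 0.0
    c_ub = lam_b * d_soe
    # Eq. (25): battery state update.
    new_bat = Battery(bat.soe - d_soe, bat.soc - b * dt / pi, bat.dod + d_dod)
    return new_bat, p, p_max, c_e + c_d + c_ub


# Hypothetical parameters; h is a placeholder constant cycle-life map.
params = dict(dt=1.0, pi=10.0, alpha=0.9, beta=0.9, lam_e=0.1, lam_d=10.0,
              lam_b=100.0, soe_ine=0.8, h=lambda dod: 1000.0)
bat0 = Battery(soe=1.0, soc=0.8, dod=0.2)
bat1, p, p_max, cost = step(bat0, d=5.0, g=1.0, b=2.0, p_max=2.0, **params)
print(bat1, p, p_max, cost)
```

A learning agent would call such a transition once per time slot, choosing b(t) from the observed state; accumulating the returned slot costs over the cycle gives the objective in Eq. (26a) without the generator terms.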
9
We consider DRL particularly suitable for online discharging/charging operations
because: i) it is capable of handling a high-dimensional state space (such as in
AlphaGo [24]), which is an advantage over traditional Reinforcement Learning (RL) [25],
and ii) it is able to deal with highly dynamic, time-variant environments such as
time-varying power demand and renewable energy generation. Next, we introduce the basic
components and concepts of DRL and the proposed DRL-based battery discharging/charging
controlling policy in detail.

A. Components & Concepts

A typical DRL framework consists of five key components: agent, state, action, policy,
and reward. The concept and design of each component in our DRL-based battery
discharging/charging controlling policy are explained as follows.
• Agent: The role of the agent is to make decisions in every episode by interacting
  with the environment. Specifically, at the beginning of each time slot, it determines
  the discharging/charging operations (i.e., b(t)) according to the current state (e.g.,
  d(t), g(t) and χ(t)) of the environment. The objective is to find an optimal battery
  discharging/charging controlling policy to minimize the total electricity bill during
  the entire billing cycle.
• State: At each episode, the agent first observes the state of the current environment
  to take action. In order to take the optimal action at each episode, the current state
  should cover as much information as possible. In this paper, we define the state vector
  of the current environment as s(t) = [d(t), g(t), χ(t), p_max], which includes the
  current information of the power demand, the renewable energy generation, the battery
  storage and the peak power consumption.
• Action: After observing the state of the environment, the agent will take an action
  accordingly. In our problem, the action is to control the battery discharging/charging
  operations in each time slot, i.e., b(t), specifically, i) whether the battery should
  be discharged or charged, and ii) how much energy should be discharged or charged. We
  denote the action taken at time t by a(t), which is equivalent to b(t).
• Policy: The battery discharging/charging controlling policy ψ(s(t)) : S → A defines
  the mapping relationship from the state space to the action space, where S and A
  represent the state space and the action space, respectively. Specifically, the
  controlling policy can be represented by the set of a(t) = ψ(s(t)), which maps the
  state of the environment to the action at time slot t.
• Reward: After interacting with the environment, the agent will receive a reward r(t)
  (calculated by the reward function R(s(t), a(t))), which indicates the effect of the
  action in this episode, so as to update the controlling policy. The objective of the
  agent is to find a policy ψ that maximizes the total reward through continuous
  interaction with the environment. The design of the reward function significantly
  affects the performance of the DRL-based algorithm, and we introduce its details in
  the next subsection.

To sum up, at each episode, the agent observes the state s(t), takes an action a(t)
generated by the policy ψ, and receives a reward r(t) calculated by the reward function
R(s(t), a(t)). The objective of the proposed DRL-based battery discharging/charging
controlling policy is to take the optimal action in every episode so as to maximize the
total reward.

B. Reward Function Design

At the end of each time slot, the agent evaluates the performance of the action using a
reward function, which transforms the performance statistics to a numerical utility
value. For an arbitrary time t, the agent observes the state s(t), takes the action a(t)
and adopts the following reward function to assess the performance of the controlling
action:

$$R(s(t), a(t)) = \exp\left( V^{e}(t) + V^{d}(t) + V^{u}(t) \right) \tag{27}$$

in which:
• V^e(t) = −C^e(t) measures the reward of the incremental energy charge caused by the
  action in time slot t.
• V^d(t) = −C^d(t) measures the reward of the incremental demand charge caused by the
  action in time slot t.
• V^u(t) = −C^u(t) measures the reward of the investment cost caused by the action in
  time slot t.
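As a minimal sketch of Eq. (27), the reward can be computed from the three per-slot cost terms. The function name `reward` and its argument names are illustrative assumptions; the cost terms C^e(t), C^d(t), and C^u(t) are defined earlier in the paper and are simply passed in here as numbers.

```python
import math


def reward(energy_charge: float, demand_charge: float, investment_cost: float) -> float:
    """Reward of one time slot following Eq. (27).

    Each V term is the negated cost term, V^x(t) = -C^x(t), and the
    reward is the exponential of their sum.
    """
    v_e = -energy_charge      # V^e(t) = -C^e(t)
    v_d = -demand_charge      # V^d(t) = -C^d(t)
    v_u = -investment_cost    # V^u(t) = -C^u(t)
    return math.exp(v_e + v_d + v_u)
```

Because the exponent is the negated total cost, the reward lies in (0, 1] for non-negative costs and increases as the cost of the action decreases.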
[Figure 5: block diagram. Environment: power demand, renewable energy generation, battery storage. Controller: replay buffer, main net, target net, loss function.]
Fig. 5. The framework of the learning process in DQN. For simplicity, we denote s(t + 1) as s′. After interacting with the environment,
the agent (i.e., controller) will determine the specific discharging/charging operation.
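The building blocks shown in Fig. 5 can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `ReplayBuffer`, `epsilon_greedy`, `td_target`, and `squared_td_loss` are hypothetical, Q-values are passed in as plain lists instead of being produced by a neural network, and the `done` flag stands for the terminal case of the target in Alg. 1. Exploration happens with probability ε, following the prose of Sec. V-C; Alg. 1 lists the two probabilities the other way round, so the branches should be swapped if that reading is intended.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity store of (s, a, r, s_next, done) transitions."""

    def __init__(self, capacity):
        # A bounded deque drops the oldest transition once it is full.
        self.buffer = deque(maxlen=capacity)

    def store(self, s, a, r, s_next, done):
        self.buffer.append((s, a, r, s_next, done))

    def sample(self, batch_size):
        # Uniform sampling breaks the correlation between consecutive samples.
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def epsilon_greedy(q_values, epsilon):
    """Pick a random action with probability epsilon, else the greedy one."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def td_target(r, q_next_target, gamma, done):
    """Target Q-value: r at a terminal step, else r + gamma * max Q_target(s', a')."""
    if done:
        return r
    return r + gamma * max(q_next_target)


def squared_td_loss(targets, q_taken):
    """Mini-batch estimate of the squared error between targets and main-net Q-values."""
    return sum((y - q) ** 2 for y, q in zip(targets, q_taken)) / len(q_taken)
```

In a full agent, `q_values` and `q_next_target` would come from the main net and the target net, respectively, and the loss would be minimized with SGD on the main-net parameters.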
At the end of each time slot, the agent evaluates the performance of the action by the
reward r(t) calculated by the reward function R(s(t), a(t)). In the DRL-based framework,
the objective is to maximize the expected cumulative discounted reward:

$$r(t) = \mathbb{E}\left[ \sum_{k=t}^{\infty} \gamma^{k} R(s(t), a(t)) \right] \tag{28}$$

where γ ∈ (0, 1] is the discount accumulative factor indicating the degree of emphasis
on future rewards; a higher γ places more emphasis on future rewards.

C. Learning Process Design

The learning process of the algorithm adopts a deep neural network (DNN) called Deep
Q-Network (DQN) to derive the correlation between each state-action pair (s(t), a(t))
and its value function Q(s(t), a(t)), which is the expected discounted cumulative
reward. If the environment is in state s(t) and follows action a(t), the value function
of the state-action pair (s(t), a(t)) can be represented as:

$$Q(s(t), a(t)) = \mathbb{E}\left[ r(t) \mid s(t), a(t) \right] \tag{29}$$

After obtaining the value of each state-action pair (s(t), a(t)), the agent selects the
action a(t) with the ε-greedy policy ψ, that is, it randomly selects the action with
probability ε, and chooses the action with the maximum of Q(s(t), a(t)) with probability
1 − ε, i.e., argmax_{a(t)} Q(s(t), a(t)).

As illustrated in Fig. 5, two effective techniques were introduced in [23] to improve
stability: replay buffer and target network. Specifically,
• Replay Buffer: Unlike traditional reinforcement learning, DQN applies a replay buffer
  to store state transition samples in the form of ⟨s(t), a(t), r(t), s(t + 1)⟩
  collected during learning. Every κ time steps, the DRL-based agent updates the DNN
  with a mini-batch of experiences from the replay buffer by means of stochastic
  gradient descent (SGD): θ_{i+1} = θ_i − σ∇_θ Loss(θ), where σ is the learning rate. A
  higher learning rate leads to faster parameter updates. However, the algorithm is then
  more affected by abnormal data, which makes it easy to diverge and difficult to
  converge. Compared with Q-learning (which only uses immediately collected samples),
  randomly sampling from the replay buffer allows the DRL-based agent to break the
  correlation between sequentially generated samples and to learn from more
  independently and identically distributed past experiences. Thus, the replay buffer
  can smooth out learning and avoid oscillations or divergence.
• Target Network: There are two neural networks with the same structure but different
  parameters in DQN, the main net and the target net. Q(s, a; θ) and Q(s, a; θ̃)
  represent the current Q-value and target Q-value generated by the main net and the
  target net, respectively. The DRL-based agent uses the target net to estimate the
  target Q-value Q̃ for training the DQN. Every τ time steps, the target net copies the
  parameters from the main net, whose parameters are updated in real time. After
  introducing the target net, the target Q-value remains unchanged for a period of time,
  which reduces the correlation between the current Q-value and the target Q-value and
  improves the stability of the algorithm.

Accordingly, the DQN can be trained by the loss:

$$\mathrm{Loss}(\theta) \leftarrow \mathbb{E}\left[ \left( \tilde{Q} - Q(s(t), a(t); \theta) \right)^{2} \right] \tag{30}$$

where θ denotes the network parameters of the main net, and Q̃ is the target Q-value
calculated by:

$$\tilde{Q} \leftarrow r(t) + \gamma \max_{a(t+1)} Q(s(t+1), a(t+1); \tilde{\theta}) \tag{31}$$

where θ̃ denotes the network parameters of the target net, which are updated every τ
time slots by copying from the main net.

To sum up, the learning process is depicted by the pseudo-code in Alg. 1. The controller
first initializes the replay buffer and the parameters (i.e., θ and θ̃) of the main net
and target net, respectively (Lines 1-3 in Alg. 1). After obtaining the value of each
state-action pair (s(t), a(t)), the agent selects the action a(t) with the ε-greedy
policy ψ, and then performs the action a(t) and interacts with the environment (Lines
6-7 in Alg. 1). Next, the agent receives the reward r(t) and observes the next state
s(t + 1) of the environment, and meanwhile stores the transition
⟨s(t), a(t), r(t), s(t + 1)⟩ into the RB (Lines 8-9 in Alg. 1). Every κ time steps, the
agent updates the DNN by Eq. 30 with a mini-batch of experiences from the replay buffer
by means of stochastic gradient descent (SGD). The target net copies the parameters of
the main net every τ time steps (Lines 10-13 in Alg. 1). During the learning process, we
set the learning rate σ to 0.001, the ε in the ε-greedy method to 0.9, the discount
accumulative factor γ to 0.9, and the step parameters τ and κ both to 2000. For the
whole battery discharging/charging scheduling process, the algorithm has an overall
computational complexity of O(C_conv · T), where C_conv = Σ_{i=1}^{n} C_in^i C_out^i
represents the sum of the products of the input channels (neurons) and the output
channels (neurons) of the i-th linear layer, leading to a faster convergence speed
compared to other DRL algorithms.

Algorithm 1: Battery Controlling Algorithm with DRL
Input: Power demand of BS d(t) and renewable energy generation g(t), 1 ≤ t ≤ T
Output: Discharging/charging actions a(t), 1 ≤ t ≤ T
1  Initialize replay buffer (RB) to capacity N;
2  Initialize main net Q with random weights θ;
3  Initialize target net Q̃ with weights θ̃ = θ;
4  for episode = 1 : MaxLoop do
5      for t = 1 : T do
6          Get environment state s(t);
7          a(t) = argmax_a Q(s(t), a(t); θ) with prob. ε, or a random action with prob. 1 − ε;
8          Execute action a(t) and receive r(t) and s(t + 1);
9          Store ⟨s(t), a(t), r(t), s(t + 1)⟩ into RB;
10         Randomly sample a mini-batch of experience ⟨s(i), a(i), r(i), s(i + 1)⟩ from RB by every κ steps;
11         Q̃ = r(t) if the episode terminates at step t + 1, else r(t) + γ max_{a(t+1)} Q(s(t + 1), a(t + 1); θ̃);
12         Perform SGD on (Q̃ − Q(s, a; θ))² w.r.t. θ;
13         Set Q̃ = Q by every τ steps;
14     end
15 end

VI. PERFORMANCE EVALUATION

We evaluate the performance of the proposed DRL-based battery discharging/charging
controlling policy through extensive numerical analysis.

A. Experiment Setup

1) BS and Power Consumption Data: In order to show the performance of the proposed
method, we mainly consider 5G BSs deployed in three areas, i.e., resident area, office
area, and comprehensive area, whose power consumption within a one-week period is
illustrated in Fig. 2, and we assume the power consumption of the same type of BS in
different cities (e.g., Beijing, Shanghai and Guangzhou) is the same. For simplicity, we
denote the BSs deployed in the resident, office, and comprehensive areas as type I, type
II, and type III, respectively. We apply the BESS aided renewable energy supply solution
to different types of BSs in different cities under different weather conditions and
evaluate its performance through extensive simulation experiments.
[Figure 6: two line plots of output power (kW) versus time (hour, 0 to 24). Panel (a): solar PV on a clear day, partial cloudy day, and cloudy day. Panel (b): wind turbine on a high-wind day, middle-wind day, and low-wind day.]
Fig. 6. (a) The solar PV output power patterns under different weather conditions (i.e., GHI(t), Temp(t), and ToD(t)) in one day period.
(b) The wind turbine output power patterns under different weather conditions (i.e., WV(t), WS(t), and HH(t)) in one day period.
2) Renewable Energy Generation Data: In Sec. III-C, we introduce the factors that impact
the generation of renewable energy. For simplicity, we divide the weather conditions
into three types. Accordingly, the output power patterns of the solar PV and wind
turbine are each divided into three types. Specifically, for the solar PV, the weather
conditions are divided into the clear day, partial cloudy day, and cloudy day; for the
wind turbine, the weather conditions are divided into the high wind velocity, middle
wind velocity, and low wind velocity. The output power patterns of the solar PV and wind
turbine under different weather conditions are illustrated in Fig. 6, and the time slot
∆t is 15 minutes in our experiment.

3) Equipment Parameter Settings: In this study, we use 15 Panasonic Sc330 solar modules,
each with a power rating of 330 W, and a JFNH-5kW wind turbine of Qingdao Jinfan Energy
Science and Technology Co., Ltd. For the battery storage, we consider the mainstream
lithium-ion (LI) battery on the current market. We then refer to [14], [26], [27] for
the parameter settings of the electricity billing policy and battery configurations, and
the main parameter settings are summarized in Table II.

TABLE II
PARAMETER SETTINGS

                  Parameter                   Setting
Billing Policy    billing cycle window W      one month (30 days)
                  energy charge price λe      US$0.049/kWh (1)
                  demand charge price λd      US$16.08/kW (1)
Battery Config.   battery cost λb             US$271/kWh (2)
                  discharge efficiency α      85%
                  charge efficiency β         99.9%
                  max charge rate R+          16 MW
                  max discharge rate R−       8 MW
Solar PV          power rating gs             4950 W
                  price λs                    US$3950
                  lifetime Ls                 25 years
Wind Turbine      power rating gw             6000 W
                  price λw                    US$4500
                  lifetime Lw                 20 years
(1) Prices of energy/demand charges in 2018, referring to the contract in [26].
(2) Battery capacity costs in 2018, referring to the data in [27].

4) Scenario Settings: As the generation of renewable energy is significantly affected by
the weather conditions, we choose three representative cities in China for this paper,
i.e., Beijing, Shanghai, and Guangzhou, which have different weather patterns during the
billing cycle window (i.e., from 1st June 2020 to 30th June 2020). We compare and
analyze the overall energy cost (including energy charge, demand charge and investment
cost), detailed controlling results and return of investment (ROI) for three types of
BSs (i.e., type I, type II, and type III BSs) in these cities, and the day-by-day
weather conditions in these cities during the billing cycle window are shown in Fig. 7.
Specifically, i) Beijing has more clear days during the billing cycle window, ii)
Shanghai is in the plum rain season during the billing cycle window, thus it has more
high-wind days but fewer clear days, and iii) Guangzhou has relatively more cloudy days
and low-wind days than the other two cities.

B. Performance under Different Weather Conditions

As is shown in Fig. 6, the output power patterns of the solar PV and wind turbine are
both divided into three types under different weather conditions. Accordingly, the
weather pattern can be divided into nine types: clear & high-wind day, clear &
middle-wind day, clear & low-wind day, partial cloudy & high-wind day, partial cloudy &
middle-wind day, partial cloudy & low-wind day, cloudy & high-wind day, cloudy &
middle-wind day, and cloudy & low-wind day.

The power supply patterns under different weather conditions in one day period of 5G BSs
in the resident, office, and comprehensive areas are illustrated in Fig. 8, Fig. 10, and
Fig. 11 (in the appendix), respectively. As we can see, the BESS aided renewable energy
supply solution could significantly reduce the power drawn from the grid (and hence the
energy charge and demand charge). Specifically, as radiation and wind velocity increase,
renewable energy generation increases accordingly. It could cover most of the power
demand and reduce the power supplied from the power grid. In particular, on high-wind
days, the power demand could be totally supplied by the renewable energy and battery
storage, requiring no power from the grid.

After calculating the power supply paradigm under different weather patterns, we can
derive the electricity bill of these three types of BSs during the billing cycle in
different cities (i.e., different weather patterns, which are illustrated in Fig. 7),
and the results from all the set scenarios are summarized in Table III.

Specifically, for a single 5G BS without the proposed power supply paradigm, the energy
charge and the demand charge are $45.6 and $22.8, respectively. However, after utilizing
the BESS aided renewable energy supply solution on the 5G BS, the electricity bill is
significantly reduced. Especially in Shanghai, which has relatively more clear and
high-wind days, the energy charge and the demand charge can be reduced to $3.8 and $9.1,
respectively. Although there exists equipment degradation during the discharge/charge
cycles, the investment cost still stays at a well accepted level. The highest cost
saving for the BS which utilizes the proposed power supply paradigm in Beijing,
Shanghai, and Guangzhou in one billing cycle is $50.4, $50.7 and $49.5, respectively.
Accordingly, the saving ratio can be up to 74.4%, 74.8% and 73.2%, respectively.

C. Performance under Different Types of BSs

As different types of BSs have diverse power demands, which result in different energy
charges and demand charges, the performance of deploying the BESS aided renewable energy
supply solution could be different.

Specifically, as is shown in Table III, the type I BS has the highest cost saving
compared to the other two types of BSs, i.e., $50.4 in Beijing, $50.7 in Shanghai, and
$49.5 in Guangzhou. This is because the type I BS has the biggest power demand and peak
value (near 1450 watts), giving it great potential in energy saving and peak power
shaving. Besides, as the type II BS's power demands are relatively small, the generated
and stored renewable energy can effectively reduce the power grid supply. Therefore it
has the highest saving ratio, i.e., 76.4% in Beijing, 77.9% in Shanghai, and 75.6% in
Guangzhou.

D. ROIs of Different Scenarios

The return of investment (ROI) is a financial metric defined by the benefit (cost saving
in our case) divided by the total investment. It indicates the probability of gaining a
return from an investment and has been widely used to evaluate the efficiency of an
investment [29]. Typically, a bigger ROI value indicates a higher investment efficiency.
With the costs of the renewable energy generator and battery storage (given in Table
II), the total investments can be calculated. Accordingly, the ROIs can be derived with
the results in Table III.

The ROIs of different types of BSs deployed in different cities are shown in Table IV.
Specifically, the type I BS has the highest ROI, which reaches 5.43% in Beijing, 5.46%
in Shanghai, and 5.33% in Guangzhou, indicating a relatively high investment efficiency
for the operators. This is because the type I BS has the biggest cost saving. As the
equipment's cost is estimated to decrease dramatically in the future [30], the ROI could
rise significantly in 5G and beyond. Additionally, as we can see, the city with more
clear and high-wind days obtains a bigger ROI value, thus the proposed solution is more
suitable for cities with more sunny and windy days.

It is worth noting that we assume the deployed renewable energy generator and the
battery storage only supply power to one single 5G BS, and thus the surplus renewable
energy (when the battery is full)
[Figure 7: day-by-day chart (days 1 to 30) of irradiance and wind velocity for Beijing, Shanghai, and Guangzhou. Legend: clear day, partial cloudy day, cloudy day; high-wind day, middle-wind day, low-wind day.]
Fig. 7. The weather data is obtained from [28], and the billing cycle window is from 1st June 2020 to 30th June 2020.
[Figure 8: nine stacked plots of output power (watt, 0 to 1500) versus time (hour, 0 to 24), each split into power grid, battery storage, wind turbine, and solar PV. Panels: (a) clear & high-wind day, (b) clear & middle-wind day, (c) clear & low-wind day, (d) partial cloudy & high-wind day, (e) partial cloudy & middle-wind day, (f) partial cloudy & low-wind day, (g) cloudy & high-wind day, (h) cloudy & middle-wind day, (i) cloudy & low-wind day.]
Fig. 8. The power supply pattern of a single 5G BS at area of resident is supplied by different power supply methods under different weather
conditions in one day period.
will be discarded. This actually leads to a relatively low utilization, as given in this
work. In practice, the generated renewable energy could be supplied to multiple BSs [5],
so that the ROI and utilization of the renewable energy could be further improved.

TABLE III
RESULTS SUMMARY (ONE BILLING CYCLE)

BS Type   Scenario                  Energy      Demand      Investment  Cost        Saving
                                    Charge ($)  Charge ($)  Cost ($)    Saving ($)  Ratio (%)
Type I    No deployment             44.6        23.1        0           /           /
          Deployment in Beijing     5.0         12.0        0.4         50.4        74.4
          Deployment in Shanghai    4.7         12.0        0.4         50.7        74.8
          Deployment in Guangzhou   5.9         12.0        0.3         49.5        73.2
Type II
Type III

TABLE IV
ROIS OF DIFFERENT TYPES OF BSS IN DIFFERENT CITIES

BS Type    Beijing   Shanghai   Guangzhou
Type I     5.43%     5.46%      5.33%
Type II    4.97%     5.06%      4.91%
Type III   5.11%     5.21%      5.00%

E. Total Electricity Bill under Different Algorithms

In order to reflect the performance of the proposed method, we compare its total
electricity bill with that of two baseline algorithms.
• AC: which uses the actor-critic (AC) method [31], one of the DRL methods, to make the
  discharging/charging scheduling operations.
• Max: which satisfies the BS's current power demand to the greatest extent.

As shown in Fig. 9, because Max only meets the current power demand to the greatest
extent and lacks predictability, it may consume too much power in the beginning and fail
to discharge continuously when the power demand is high later. AC needs to train two
networks (i.e., actor network and critic network), resulting in poor stability. Compared
to AC, DQN can complete the operation in a shorter time and converges within 300
iterations, which demonstrates the efficiency of the proposed method.

[Figure 9: three bar charts of total electricity bill (0 to 40) for DQN, AC, and Max in Beijing, Shanghai, and Guangzhou. Panels: (a) type I BS, (b) type II BS, (c) type III BS.]
Fig. 9. Total electricity bills under different scheduling algorithms.

VII. RELATED WORK

The most relevant literature can be divided into the following two categories.

A. Base Station Energy-saving Method

With the increase of the BS power consumption, energy-efficient cellular networks have
recently received significant attention. One commonly used scheme is to switch the BS
on/off according to the BS traffic load [32]–[34]. Intuitively, we can switch on the BS
when the traffic load at the BS is high, and switch it off or turn it to sleep mode when
the traffic load at the BS is low. In addition, by combining with AI, the accuracy of
the traffic load prediction can be improved so that the corresponding energy-saving
policies can be elaborately formulated. However, due to the shutdown of some BSs, the
traffic latency may increase, degrading the QoS of wireless services.

For energy management, peak power shaving is a preferable approach to overcome the
uneconomic and inefficient nature of peak power supply,
TABLE V
LITERATURE SUMMARY

Literature  Objective                                     Approach                          Solution
[32]        minimize the weight sum of energy and delay   BS ON-OFF switching               Decompose into two subproblems.
[33]        minimize energy over a period                 BS ON-OFF switching               Evaluate on the network impact of turning ON/OFF.
[34]        maximize the energy efficiency                BS sleep mode design              Maximize a quasi-convex lower bound.
[35]        maximize the energy efficiency                peak shaving                      Utilize a nonlinear programming method.
[36]        maximize the energy efficiency                peak shaving using EV             Convex optimization.
[37]        achieve load balancing                        energy-aware resource allocation  Min-max algorithm.
[38]        improve global resource utilization           energy-aware resource allocation  Opportunistic scheduling algorithm.
making the load curve flatter by reducing the peak amount of load and shifting it to
times of lower load [35], [36]. Specifically, peak power shaving is achieved through
charging the energy storage system when demand is low (off-peak period) and discharging
energy when demand is high (on-peak period). For task offloading, the total power
consumption can also be reduced by dispatching tasks to BSs with lower loads [37], [38].
The relevant literature is summarized in Table V.

B. Battery Storage Optimal Control

The optimal control of energy storage has been extensively studied in the past. Most
related works formulate an optimization problem that aims to maximize the revenue
generated by the battery storage co-located with a renewable energy generator. Babacan
et al. [39] proposed a convex program to minimize the electricity bill of operators.
Ratnam et al. [40] aimed to maximize the daily operational savings that accrue to
customers while penalizing large voltage swings stemming from reverse power flow and
peak load. Kazhamiaka et al. [41] studied the profitability of residential PV-storage
systems in three jurisdictions and set up an integer linear program to determine the
battery controlling policy. These works assume the generation of renewable energy and
the power demand are known in advance and can be optimized in an offline way. However,
these assumptions are impractical in the real world.

Several papers study the optimal control of batteries under uncertainty and randomness.
Guan et al. [42] utilized a reinforcement learning method to minimize the homeowner's
cost by taking an action that yields the best expected reward. EnergyBoost [18] could
provide the ability to predict the renewable energy generation and power demand.
However, these works are only applied in the home scenario, which generates little
demand compared to a 5G BS. Therefore, we propose the DRL-based method to tackle the
problem of large and constrained state and action spaces and the uncertainty of
renewable energy generation and power demand.

VIII. CONCLUSIONS

To cope with the ever-increasing electricity bill for mobile operators in the 5G era, we
proposed a BESS aided energy supply solution for the 5G BS system, which models the
battery discharging/charging as an optimization problem. With our proposed solution,
besides the power grid, a BS can be powered by the renewable energy and the battery
storage, to cut down the total energy cost. To solve the problem under dynamic power
demands and renewable energy generation, we developed a DRL-based approach to the BESS
operation that accommodates all factors in the modeling phase and makes decisions in
real time. To evaluate the performance of our solution, we chose three cities with
different weather patterns for experiments. The experimental results show that our power
supply solution can achieve a cost saving ratio of 74.8% during the entire billing cycle
and improve the renewable energy utilization.

In the future, with further development of communication technology (e.g., B5G/6G),
there will be more mobile BSs and air BSs equipped with more batteries, which could rely
heavily on renewable energy. Designing an effective battery discharging/charging policy
to ensure the high QoS of mobile networks is also an interesting and challenging problem
for future work.

ACKNOWLEDGMENT

This work is partially supported by the National Natural Science Foundation of China
(No. 61802421 and No. U19B2024) and National Natural Science Foundation of Hunan
Province (No. 2019JJ30029), Telecommunications Advancement
Foundation (Japan) Research Grant, RIEC Nationwide Cooperative Research Projects,
Research Institute of Electrical Communication, Tohoku University, Japan, H31/B18, ROIS
NII Open Collaborative Research 2021 (21FA03).

REFERENCES

[1] J. G. Andrews, S. Buzzi, W. Choi, S. V. Hanly, A. Lozano, A. C. Soong, and J. C. Zhang, “What will 5G be?” IEEE Journal on Selected Areas in Communications, vol. 32, no. 6, pp. 1065–1082, 2014.
[2] M. Gerla, E.-K. Lee, G. Pau, and U. Lee, “Internet of vehicles: From intelligent grid to autonomous cars and vehicular clouds,” in 2014 IEEE World Forum on Internet of Things (WF-IoT). IEEE, 2014, pp. 241–246.
[3] G. C. Burdea and P. Coiffet, Virtual reality technology. John Wiley & Sons, 2003.
[4] E. D. Muse, P. M. Barrett, S. R. Steinhubl, and E. J. Topol, “Towards a smart medical home,” The Lancet, vol. 389, no. 10067, p. 358, 2017.
[5] G. Tang, Y. Wang, and H. Lu, “Shiftguard: Towards reliable 5G network by optimal backup power allocation,” in IEEE SmartGridComm, 2020, pp. 1–6.
[6] H. Lund, “Renewable energy strategies for sustainable development,” Energy, vol. 32, no. 6, pp. 912–919, 2007.
[7] R. Fu, D. Feldman, R. Margolis, M. Woodhouse, and K. Ardani, “US solar photovoltaic system cost benchmark: Q1 2017,” EERE Publication and Product Library, Tech. Rep., 2017.
[8] J. A. Turner, “A realizable renewable energy future,” Science, vol. 285, no. 5428, pp. 687–689, 1999.
[9] X. Wang, A. V. Vasilakos, M. Chen, Y. Liu, and T. T. Kwon, “A survey of green mobile networks: Opportunities and challenges,” Mobile Networks and Applications, vol. 17, no. 1, pp. 4–20, 2012.
[10] B. Nykvist and M. Nilsson, “Rapidly falling costs of battery packs for electric vehicles,” Nature Climate Change, vol. 5, no. 4, pp. 329–332, 2015.
[11] A. Mondal, S. Misra, and M. S. Obaidat, “Distributed home energy management system with storage in smart grid using game theory,” IEEE Systems Journal, vol. 11, no. 3, pp. 1857–1866, 2015.
[12] H. Wang, F. Xu, Y. Li, P. Zhang, and D. Jin, “Understanding mobile traffic patterns of large scale cellular towers in urban environment,” in Proceedings of the 2015 Internet Measurement Conference, 2015, pp. 225–238.
[13] H. Xu and B. Li, “Reducing electricity demand charge for data centers with partial execution,” in Proceedings of the 5th International Conference on Future Energy Systems, 2014, pp. 51–61.
[14] M. Dabbagh, B. Hamdaoui, A. Rayes, and M. Guizani, “Shaving data center power demand peaks through energy storage and workload shifting control,” IEEE Transactions on Cloud Computing, 2017.
[15] Qingdao Jinfan Energy Science & Technology Co., Ltd., “Renewable energy generator,” http://www.jinfanenergy.cn, 2019.
[16] W. F. Holmgren, R. W. Andrews, A. T. Lorenzo, and J. S. Stein, “Pvlib python 2015,” in 2015 IEEE 42nd Photovoltaic Specialist Conference (PVSC). IEEE, 2015, pp. 1–5.
[18] B. Qi, M. Rashedi, and O. Ardakanian, “Energyboost: Learning-based control of home batteries,” in Proceedings of the Tenth ACM International Conference on Future Energy Systems, 2019, pp. 239–250.
[19] Y. Shi, B. Xu, B. Zhang, and D. Wang, “Leveraging energy storage to optimize data center electricity cost in emerging power markets,” in Proceedings of the Seventh International Conference on Future Energy Systems, 2016, pp. 1–13.
[20] B. Aksanli, T. Rosing, and E. Pettis, “Distributed battery control for peak power shaving in datacenters,” in IEEE IGCC, 2013, pp. 1–8.
[21] D. K. Maly and K.-S. Kwan, “Optimal battery energy storage system (BESS) charge scheduling with dynamic programming,” IEE Proceedings-Science, Measurement and Technology, vol. 142, no. 6, pp. 453–458, 1995.
[22] A. Oudalov, R. Cherkaoui, and A. Beguin, “Sizing and optimal operation of battery energy storage system for peak shaving application,” in 2007 IEEE Lausanne Power Tech. IEEE, 2007, pp. 621–625.
[23] V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski et al., “Human-level control through deep reinforcement learning,” Nature, vol. 518, no. 7540, pp. 529–533, 2015.
[24] D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. Van Den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot et al., “Mastering the game of Go with deep neural networks and tree search,” Nature, vol. 529, no. 7587, pp. 484–489, 2016.
[25] R. S. Sutton and A. G. Barto, Reinforcement learning: An introduction. MIT Press, 2018.
[26] Dominion Energy South Carolina, Inc., “Rate 23 - industrial power service,” https://etariff.psc.sc.gov/Organization/TariffDetail/150?OrgId=411, 2020.
[27] US Department of Energy, “Energy storage technology and cost characterization report,” https://www.energy.gov/eere/water/downloads/energy-storage-technology-and-cost-characterization-report, 2019.
[28] China Meteorological Administration, “Historical weather forecast,” http://www.weather.com.cn/, 2020.
[29] Wikipedia, “Return on investment,” https://en.wikipedia.org/wiki/Return_on_investment, 2020.
[30] National Renewable Energy Laboratory (NREL), “Cost projections for utility-scale battery storage,” https://www.nrel.gov/docs/fy19osti/73222.pdf, 2019.
[31] V. R. Konda and J. N. Tsitsiklis, “Actor-critic algorithms,” in Advances in Neural Information Processing Systems, 2000, pp. 1008–1014.
[32] K. Son, H. Kim, Y. Yi, and B. Krishnamachari, “Base station operation and user association mechanisms for energy-delay tradeoffs in green cellular networks,” IEEE Journal on Selected Areas in Communications, vol. 29, no. 8, pp. 1525–1536, 2011.
[33] E. Oh, K. Son, and B. Krishnamachari, “Dynamic base station switching-on/off strategies for green cellular networks,” IEEE Transactions on Wireless Communications, vol. 12, no. 5, pp. 2126–2136, 2013.
[34] C. Liu, B. Natarajan, and H. Xia, “Small cell base station sleep strategies for energy efficiency,” IEEE Transactions on Vehicular Technology, vol. 65, no. 3, pp. 1652–1661, 2015.
[17] A. Jahid, M. S. Hossain, M. K. H. Monju, M. F. Rahman, [35] E. Reihani, M. Motalleb, R. Ghorbani, and L. S. Saoud, “Load
and M. F. Hossain, “Techno-economic and energy efficiency peak shaving and power smoothing of a distribution grid with
analysis of optimal power supply solutions for green cellular high renewable energy penetration,” Renewable energy, vol. 86,
base stations,” IEEE Access, vol. 8, pp. 43 776–43 795, 2020. pp. 1372–1379, 2016.
[36] C. G. Tse, B. A. Maples, and F. Kreith, “The use of plug-in hybrid electric vehicles for peak shaving,” Journal of Energy Resources Technology, vol. 138, no. 1, 2016.
[37] Y. Bejerano and S.-J. Han, “Cell breathing techniques for load balancing in wireless LANs,” IEEE Transactions on Mobile Computing, vol. 8, no. 6, pp. 735–749, 2009.
[38] A. Sang, X. Wang, M. Madihian, and R. D. Gitlin, “Coordinated load balancing, handoff/cell-site selection, and scheduling in multi-cell packet data systems,” in Proceedings of the 10th annual international conference on Mobile computing and networking, 2004, pp. 302–314.
[39] O. Babacan, E. L. Ratnam, V. R. Disfani, and J. Kleissl, “Distributed energy storage system scheduling considering tariff structure, energy arbitrage and solar PV penetration,” Applied Energy, vol. 205, pp. 1384–1393, 2017.
[40] E. L. Ratnam, S. R. Weller, and C. M. Kellett, “An optimization-based approach to scheduling residential battery storage with solar PV: Assessing customer benefit,” Renewable Energy, vol. 75, pp. 123–134, 2015.
[41] F. Kazhamiaka, P. Jochem, S. Keshav, and C. Rosenberg, “On the influence of jurisdiction on the profitability of residential photovoltaic-storage systems: A multi-national case study,” Energy Policy, vol. 109, pp. 428–440, 2017.
[42] C. Guan, Y. Wang, X. Lin, S. Nazarian, and M. Pedram, “Reinforcement learning-based control of residential energy storage systems for electric bill minimization,” in 2015 12th Annual IEEE Consumer Communications and Networking Conference (CCNC). IEEE, 2015, pp. 637–642.

Hao Yuan received the B.S. degree in management science and engineering from National University of Defense Technology, Changsha, China, in 2019. He is currently working towards the M.S. degree in the same department. His main research interests include edge computing and green communication.

Guoming Tang is a research fellow at the Peng Cheng Laboratory, Shenzhen, Guangdong, China. He received his Ph.D. degree in Computer Science from the University of Victoria, Canada, in 2017, and the Bachelor’s and Master’s degrees from the National University of Defense Technology, China, in 2010 and 2012, respectively. He was also a visiting research scholar of the University of Waterloo, Canada, in 2016. His research mainly focuses on green computing, computing for green and edge computing.

Deke Guo received the B.S. degree in industry engineering from the Beijing University of Aeronautics and Astronautics, Beijing, China, in 2001, and the Ph.D. degree in management science and engineering from the National University of Defense Technology, Changsha, China, in 2008. He is currently a Professor with the College of System Engineering, National University of Defense Technology, and is also with the College of Intelligence and Computing, Tianjin University. His research interests include distributed systems, software-defined networking, data center networking, wireless and mobile systems, and interconnection networks. He is a senior member of the IEEE and a member of the ACM.

Kui Wu received the BSc and the MSc degrees in computer science from Wuhan University, China, in 1990 and 1993, respectively, and the PhD degree in computing science from the University of Alberta, Canada, in 2002. He joined the Department of Computer Science, University of Victoria, Canada, in 2002, where he is currently a Full Professor. His research interests include smart grid, mobile and wireless networks, and network performance evaluation. He is a senior member of the IEEE.

Xun Shao received his Ph.D. in information science from the Graduate School of Information Science and Technology, Osaka University, Japan, in 2013. From 2013 to 2017, he was a researcher with the National Institute of Information and Communications Technology (NICT) in Japan. Currently, he is an Assistant Professor at the School of Regional Innovation and Social Design Engineering, Kitami Institute of Technology, Japan. His research interests include distributed systems and networking. He is a member of the IEEE and IEICE.

Keping Yu received the M.E. and Ph.D. degrees from the Graduate School of Global Information and Telecommunication Studies, Waseda University, Tokyo, Japan, in 2012 and 2016, respectively. He was a Research Associate and a Junior Researcher with the Global Information and Telecommunication Institute, Waseda University, from 2015 to 2019 and 2019 to 2020, respectively, where he is currently an Assistant Professor. His research interests include smart grids, information-centric networking, the Internet of Things, artificial intelligence, blockchain, and information security. He is a Member of the IEEE.
Wei Wei received the M.S. and Ph.D. degrees from Xi’an Jiaotong University, Xi’an, China, in 2005 and 2011, respectively. He is currently an Associate Professor with the School of Computer Science and Engineering, Xi’an University of Technology, Xi’an. He has run many funded research projects as a Principal Investigator and Technical Member. He has published over 100 research articles in international conferences and journals. His current research interests include wireless networks, wireless sensor network applications, image processing, mobile computing, distributed computing, pervasive computing, the Internet of Things, and sensor data clouds. He is a Senior Member of the China Computer Federation. He is an Editorial Board Member of Future Generation Computer System, IEEE Access, Ad Hoc & Wireless Sensor Network, Institute of Electronics, Information and Communication Engineers, and KSII Transactions on Internet and Information Systems.
APPENDIX A
RESULTS FROM OFFICE & COMPREHENSIVE AREAS
[Figure 10: three rows of three panels; y-axis: Output Power (watt); x-axis ticks: 0 to 24; legend: Power Grid, Wind Turbine, Solar PV.]
Fig. 10. Power supply patterns of a single 5G BS in the office area under different power supply methods and weather conditions over a one-day period.
[Figure 11: three rows of three panels; y-axis: Output Power (watt); x-axis ticks: 0 to 24; recovered legend entry: Battery Storage.]
Fig. 11. Power supply patterns of a single 5G BS in the comprehensive area under different power supply methods and weather conditions over a one-day period.
