Abstract—Recently, the issue of energy efficiency in wireless networks has attracted much research attention due to the growing concern over global warming and operators' profitability. We focus on the energy efficiency of base stations because they account for 80% of the total energy consumed in a wireless network. In this paper, we intend to reduce the energy consumption of a base station by dynamically activating and deactivating the modular resources at the base station depending on the instantaneous network traffic. We propose an online reinforcement learning algorithm that continuously adapts to the changing network traffic in deciding which action to take to maximize energy saving. As an online algorithm, the proposed scheme does not require a separate training phase and can be deployed immediately. Simulation results confirm that the proposed algorithm can achieve more than 50% energy saving without compromising network service quality, which is measured in terms of user blocking probability.

Index Terms—Green wireless networks, energy efficient base station, reinforcement learning, online Q-Learning.

I. INTRODUCTION

The increasingly popular term “green wireless networks” refers to technologically advanced wireless networks with improved energy efficiency [1]. We are interested in improving energy efficiency to reduce carbon emissions and to lower the energy cost of the network operator ([2], [3]). Carbon emission is an important environmental issue because the increasing release of carbon directly into the atmosphere is perceived as the cause of the global warming crisis. In addition to this environmental aspect, escalating energy cost has eaten into the profitability of wireless network operators.

In wireless cellular networks, there are various existing efforts to improve energy efficiency at the base stations because they collectively account for about 80% of the total energy consumption. These efforts can be broadly classified into two categories, namely (a) improving the energy efficiency of the base station itself, and (b) reducing the required number of base stations for each telecommunication network. In the first category, the efforts involve controlling the transmission power more optimally through parameter optimization after taking into account coverage and capacity requirements [4], or re-designing the base stations using equipment and components that are more energy efficient [5]. According to [5], the power amplifier is the most critical component in a base station as it accounts for almost 40% of total energy consumption, and significant improvement in energy efficiency can be achieved by using an advanced power amplifier. In the second category, the efforts include minimizing the number of base stations deployed by using a higher density of low-power micro and pico base stations [6]. The works in both categories consider the telecommunication network to carry a specific fixed volume of traffic, which is generally a representation of the peak-hours scenario. In reality, network traffic is dynamic as it changes over time. For instance, network traffic is usually high during office hours when many business activities are carried out, and it drops to a minimal level at night when most people are asleep.

It is not energy efficient to keep base stations as fully functional as at the peak hours while the actual network traffic is at its minimum. In [7], dynamic network traffic characteristics are exploited to reduce energy consumption by shutting down some base stations during a low-traffic period, or by controlling the cell size depending on the traffic load. However, these existing works assume that each base station is a single entity which must be controlled as a whole unit. We envisage that future green base stations will have their resources organized as a collection of modular units, each with its own energy consumption profile. These modular resource units can be radios, baseband processors, feeders, power amplifiers, air-conditioners, etc. This modular model is similar to the system model adopted in [8].

With the modular base station, this paper intends to exploit the dynamic nature of network traffic to reduce energy consumption by dynamically activating and deactivating the resource units. We propose a reinforcement learning algorithm for the base station such that it can continuously adapt to the ever-changing network traffic in deciding whether to turn on an additional module, to turn off an already activated module, or to maintain the status quo. Reinforcement learning is a machine learning technique that helps an agent decide which action to take in a given environment so as to maximize some notion of cumulative reward [9]. In our context, the actions are what to do with the activation of modular resource units, the environment is the time-varying network traffic, and the reward is related to the amount of energy saved. To realize reinforcement learning, we have designed an online Q-Learning algorithm, which does not require a separate training phase before deployment.

The rest of this paper is organized as follows. We describe the system model in Section II. In Section III, we propose the online Q-Learning algorithm. Section IV presents and discusses the evaluation results. This paper ends with concluding remarks in Section V.
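Although the algorithm itself is specified in Section III, the core of the online Q-Learning approach can be illustrated with a minimal sketch. In the snippet below, the state encoding (number of active modules together with a quantized traffic level) and the epsilon-greedy exploration are assumptions made purely for illustration; only the three actions and the use of the standard one-step Q-Learning update [10] follow directly from the description above:

import random
from collections import defaultdict

# The three actions described above: activate one more module, deactivate one,
# or keep the current configuration.
ACTIONS = ("turn_on", "turn_off", "do_nothing")

class OnlineQLearningAgent:
    """Minimal online Q-Learning sketch. The state could be, for example, a tuple
    of (number of active modules, quantized traffic level) -- an assumption here."""

    def __init__(self, alpha=0.9, gamma=0.1, epsilon=0.1):
        self.alpha = alpha            # learning rate
        self.gamma = gamma            # discount factor
        self.epsilon = epsilon        # exploration probability (assumed epsilon-greedy)
        self.q = defaultdict(float)   # Q(s, a) table, initialized to zero

    def select_action(self, state):
        # Explore occasionally, otherwise exploit the current Q estimates.
        if random.random() < self.epsilon:
            return random.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        # Standard one-step Q-Learning update [10]:
        #   Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        best_next = max(self.q[(next_state, a)] for a in ACTIONS)
        td_error = reward + self.gamma * best_next - self.q[(state, action)]
        self.q[(state, action)] += self.alpha * td_error

Being online, such an agent interleaves action selection and the Q-update at every time slot, which is why no separate training phase is needed before deployment.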
II. SYSTEM MODEL

We assume a discrete time model for the wireless network where the time domain is divided into repetitive time slots of a fixed duration T, as illustrated in Fig. 1. In the model, system parameters change their values only at the beginning of a time slot.

A user is active when making an outgoing call or receiving an incoming call. When the call ends, the user becomes inactive and is considered departed from the system. Let u[n] be the number of active users in a given time slot n. The value of u[n] depends on u[n − 1], on λ[n], which is the number of newly arrived users in time slot n, and on µ[n], which is the number of newly departed users in time slot n. As such,

u[n] = u[n − 1] + λ[n] − µ[n].    (2)

In our system model, λ[n] and µ[n] are random variables with time-varying average values. For example, as illustrated in Fig. 2, the average value of λ[n] is much higher at 2 PM than at 2 AM. This is a clear representation of the different network traffic conditions at different times of a day.
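As a concrete illustration of this traffic model, the short sketch below simulates u[n] over one day according to (2). The Poisson arrival and departure statistics, the five-minute slot duration, and the specific daily profile for the mean of λ[n] are assumptions made only for this example; the model itself requires only that λ[n] and µ[n] be random variables with time-varying averages:

import math
import random

SLOTS_PER_DAY = 288   # assuming a slot duration T of 5 minutes

def mean_arrivals(n):
    # Assumed daily profile for the mean of lambda[n]:
    # high during office hours, minimal at night (cf. Fig. 2).
    hour = (n % SLOTS_PER_DAY) * 24.0 / SLOTS_PER_DAY
    if 8.0 <= hour <= 18.0:
        return 6.0
    return 1.0

def poisson(mean):
    # Knuth's method for drawing a Poisson-distributed random variate.
    threshold, k, p = math.exp(-mean), 0, 1.0
    while p > threshold:
        k += 1
        p *= random.random()
    return k - 1

u = 0
trace = []
for n in range(SLOTS_PER_DAY):
    arrivals = poisson(mean_arrivals(n))                # lambda[n]
    departures = min(u, poisson(0.2 * u)) if u else 0   # mu[n], assumed proportional to u[n-1]
    u = u + arrivals - departures                       # Eq. (2)
    trace.append(u)

Under such a profile, u[n] naturally rises during office hours and falls at night, which is exactly the kind of dynamics the proposed algorithm exploits.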
Fig. 5. Average energy saved (kJoule) per time slot with different learning rates α and discount factors γ. (x-axis: discount factor.)

Fig. 5 shows the average energy saved per time slot at different learning rates and discount factors. The results confirm that energy saving can reach as high as 80% with proper settings. It is desirable for the Q-Learning to have a high learning rate and a low discount factor, implying that it should learn fast while not looking far into the future.
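In terms of the illustrative agent sketched at the end of the Introduction, this operating point simply means instantiating the learner with a large α and a small γ; the specific numbers below are assumptions chosen to match that qualitative finding, not values read off Fig. 5:

# With a high learning rate and a low discount factor, one observed reward
# largely overwrites the stored estimate. Starting from Q(s, a) = 0, a single
# update with reward r = 10 and max_a' Q(s', a') = 2 gives
#   Q(s, a) <- 0 + 0.9 * (10 + 0.1 * 2 - 0) = 9.18,
# i.e., the estimate is dominated by the immediate energy saving.
agent = OnlineQLearningAgent(alpha=0.9, gamma=0.1)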
Fig. 6. User blocking probability with different learning rates α and limits of action space, x. (x-axis: limit of action space; y-axis: probability of user blocking; learning rates 0.1 to 0.9.)

Fig. 6 and Fig. 7 show that the significant energy saving can be achieved without compromising the network service quality measured in terms of user blocking probability. In general, the blocking probability can be kept low, at around 0.025. Although this is not as low as the 0.01 that we aim for, the slight increase in blocking probability is probably the price to pay for the very significant energy saving.

V. CONCLUSION

We envisage the future green wireless base station to have modular resource units that can be separately and dynamically activated or deactivated depending on the network traffic load. We have proposed an online Q-Learning (reinforcement learning) algorithm to perform the dynamic resource activation in the face of ever-changing network traffic. Simulation results confirm that the proposed scheme can achieve significant energy saving, as much as 80%, without compromising network service quality, since the user blocking probability can be kept as low as 0.025.

REFERENCES

[1] Ziaul Hasan, Hamidreza Boostanimehr and Vijay K. Bhargava, "Green Cellular Networks: A Survey, Some Research Issues and Challenges", IEEE Communications Surveys & Tutorials, Fourth Quarter 2011.
[2] "Energy Aware Radio and NeTwork TecHnologies (EARTH)", http://www.ict-earth.eu/.
[3] "Towards Real Energy-efficient Network Design (TREND)", http://www.fp7-trend.eu/.
[4] Holger Claussen, Lester T. W. Ho and Florian Pivit, "Effects of Joint Macrocell and Residential Picocell Deployment on the Network Energy Efficiency", IEEE PIMRC, September 2008.
[5] Jyrki T. Louhi, "Energy Efficiency of Modern Cellular Base Stations", International Conference on Telecommunications Energy, pp. 475-476, October 2007.
[6] Fred Richter, Albrecht J. Fehske and Gerhard P. Fettweis, "Energy Efficiency Aspects of Base Station Deployment Strategies for Cellular Networks", IEEE Vehicular Technology Conference, September 2009.
[7] Zhisheng Niu, Yiqun Wu, Jie Gong and Zexi Yang, "Cell Zooming for Cost-Efficient Green Cellular Networks", IEEE Communications Magazine, Vol. 48, No. 11, pp. 74-79, November 2010.
[8] Salah-Eddine Elayoubi, Louai Saker and Tijani Chahed, "Optimal Control for Base Station Sleep Mode in Energy Efficient Radio Access Networks", IEEE INFOCOM, pp. 106-110, April 2011.
[9] Richard S. Sutton and Andrew G. Barto, "Reinforcement Learning: An Introduction", The MIT Press, Cambridge, MA, 1998.
[10] Christopher J. C. H. Watkins and Peter Dayan, "Technical Note: Q-Learning", Machine Learning, Vol. 8, pp. 279-292, 1992.