Low Complexity Online Radio Access Technology Selection Algorithm in LTE-WiFi HetNet
IEEE TRANSACTIONS ON MOBILE COMPUTING, VOL. 19, NO. 2, FEBRUARY 2020
Abstract—In an offload-capable Long Term Evolution (LTE)- Wireless Fidelity (WiFi) Heterogeneous Network (HetNet), we consider
the problem of maximization of the total system throughput under voice user blocking probability constraint. The optimal policy is
threshold in nature. However, computation of optimal policy requires the knowledge of the statistics of system dynamics, viz., arrival
processes of voice and data users, which may be difficult to obtain in reality. Motivated by the Post-Decision State (PDS) framework to
learn the optimal policy under unknown statistics of system dynamics, we propose, in this paper, an online Radio Access Technology
(RAT) selection algorithm using Relative Value Iteration Algorithm (RVIA). However, the convergence speed of this algorithm can be
further improved if the underlying threshold structure of the optimal policy can be exploited. To this end, we propose a novel structure-aware online RAT selection algorithm which reduces the feasible policy space, thereby offering lower storage and computational
complexity and faster convergence. This algorithm provides a novel framework for designing online learning algorithms for other
problems and hence is of independent interest. We prove that both the algorithms converge to the optimal policy. Simulation results
demonstrate that the proposed algorithms converge faster than a traditional scheme. Also, the proposed schemes perform better than
other benchmark algorithms under realistic network scenarios.
Index Terms—User association, LTE-WiFi offloading, constrained MDP, threshold policy, stochastic approximation
1 INTRODUCTION
1536-1233 © 2019 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
Authorized licensed use limited to: UNIVERSITY PUTRA MALAYSIA. Downloaded on March 10,2020 at 05:36:03 UTC from IEEE Xplore. Restrictions apply.
ROY ET AL.: LOW COMPLEXITY ONLINE RADIO ACCESS TECHNOLOGY SELECTION ALGORITHM IN LTE-WIFI HETNET 377
possessing large memory requirement. Additionally, due to the associated exploration mechanism, their convergence rate is slow, especially under large state space.

1.2 Our Contribution
In this paper, our primary contribution is to propose online learning algorithms to maximize the total system throughput subject to a constraint on the voice user blocking probability without knowing the statistics of the arrival processes of voice/data users in an LTE-WiFi HetNet. To address the issue of slow convergence of existing learning schemes in the literature, we propose a Post-Decision State (PDS) learning algorithm which speeds up the learning process by removing the action exploration. This approach is based on a reformulation of the Relative Value Iteration Algorithm (RVIA) equation and can be implemented online in the Stochastic Approximation (SA) framework. Furthermore, the PDS learning algorithm has a lower space complexity than that of Q-learning [1] because, instead of the state-action pair values, we need to store only the value functions associated with states. We also prove the convergence of the PDS learning RAT selection algorithm to optimality.

We have shown in [19] that the optimal policy has a threshold structure, wherein after a certain threshold on the number of WiFi data users, data users are served using LTE. A similar property exists for the admission of voice users [19], where after a certain threshold on the number of LTE data and voice users, voice users are blocked. In this paper, we exploit the threshold properties in [19] and propose a structure-aware learning algorithm which, instead of the entire policy space, searches for the optimal policy only within the set of threshold policies. This reduces the convergence time as well as the computational and storage complexity in comparison to that of the proposed PDS learning algorithm. We prove that the threshold vector iterates in the proposed structure-aware learning algorithm indeed converge to the globally optimal solution. Note that the analytical methodologies presented in this paper to learn the optimal threshold policy are developed independently and can be applied to any learning problem where the optimal policy is threshold in nature.

Although we make some simplifying assumptions to facilitate the analysis, the performance of the proposed schemes is studied in realistic conditions without the simplifying assumptions. Extensive simulations are conducted in ns-3 [29], a discrete event network simulator, to characterize the performance of the proposed algorithms. It is observed through simulations that the proposed structure-aware RAT selection online learning algorithm outperforms the PDS learning algorithm, providing faster convergence to optimality. Performance comparison of the proposed algorithms is made with the Q-learning based RAT selection algorithm [1]. Furthermore, we observe that the proposed algorithms outperform other benchmark algorithms under realistic network scenarios like the presence of channel fading, dynamic resource scheduling and user mobility. We use 3GPP recommended parameters for the simulations.

Our key contributions can be summarized as follows.

- Based on the PDS paradigm, we propose an online algorithm for RAT selection in an LTE-WiFi HetNet. The convergence proof for the proposed algorithm is provided.
- We further exploit the threshold nature of optimal policies [19] and propose a novel structure-aware online association algorithm. Theoretical and simulation results indicate that the knowledge of the threshold property helps in achieving reductions in storage and computational complexity as well as in convergence time. We also prove that the proposed scheme converges to the true value function.
- The proposed structure-aware algorithm provides a novel framework that can be applied for designing online learning algorithms for other problems and hence is of independent interest.
- Performances of the proposed algorithms are compared with other online algorithms in the literature [1].
- Performances of the proposed algorithms are compared with other benchmark RAT selection algorithms under realistic scenarios.

The rest of the paper is organized as follows. Section 2 describes the system model. In Section 3, the problem formulation within the framework of constrained MDP is described. The system model and formulation adopted in our paper are analogous to [1], [14], [19]. The developments described after Section 3 are our point of departure. We introduce the notion of PDS in Section 4. Sections 5 and 6 propose the PDS learning algorithm and the structure-aware learning algorithm, respectively, for RAT selection in an LTE-WiFi HetNet. A comparison of computational and storage complexities of the proposed and traditional algorithms is provided in Section 7. Simulation results are presented in Section 8, followed by conclusions in Section 9. The proofs are available in the supplemental materials file, which is available in the IEEE Computer Society Digital Library at http://doi.ieeecomputersociety.org/10.1109/TMC.2019.2892983.

2 SYSTEM MODEL
As demonstrated in Fig. 1, we consider a system consisting of a WiFi AP inside the coverage area of an LTE BS, both connected to a centralized controller using ideal lossless links. We assume that voice and data users are present at any geographical point in the coverage area of the LTE BS. Data users outside the common coverage area of the LTE BS and the WiFi AP always get associated with the LTE BS. Therefore no decision is involved in this case. Hence, without loss of generality, we take into account only those data users which are present in the dual coverage area of the LTE BS and the WiFi AP. Data users can be associated with either the LTE BS or the WiFi AP. We assume that in LTE, voice and data users are allocated resources from a common resource pool. We assume that voice and data user arrivals are Poisson processes with means λ_v and λ_d, respectively. Service times for voice and data users follow exponential distributions with means 1/μ_v and 1/μ_d, respectively. Assumptions on service times follow justifications in [30]. All the users are assumed to be stationary.

Remark 1. For brevity of notation, we have considered a single LTE BS and a single WiFi AP. However, the system model can be generalized to multiple LTE BSs and WiFi APs with small modifications. When the coverage areas of multiple APs/BSs do not overlap, we need to consider the number of users in each AP/BS in the state space. In case of multiple overlapping APs/BSs, we can construct a
ated with the PDS ŝ ∈ S. Thus, we have

V̂(ŝ) = E_{s′}[V(s′)],

where the expectation E_{s′} is taken over all the pre-decision states which are reachable from the post-decision state ŝ. Let the transition probability from PDS ŝ to pre-decision

However, the scheme (9) is a primal RVIA algorithm which solves a dynamic programming equation for a fixed value of the LM β. To obtain optimality in β, β is to be iterated along the timescale h(n), as described below:

β_{n+1} = Λ[β_n + h(n)(B_n − B_max)],   (10)
where the projection operator Λ projects the value of the LM onto the interval [0, L] for a large L > 0. Therefore, the primal-dual RVIA can be described as follows. If the system is at PDS ŝ at the nth iteration, then do the following:

V̂_{n+1}(ŝ) = (1 − g(γ(ŝ, n)))V̂_n(ŝ) + g(γ(ŝ, n)){max_a [r(s′, a, β) + V̂_n(ŝ′)] − V̂_n(ŝ*)},   (11)
V̂_{n+1}(ŝ″) = V̂_n(ŝ″), ∀ŝ″ ≠ ŝ,

β_{n+1} = Λ[β_n + h(n)(B_n − B_max)].   (12)

The assumptions on g(n) and h(n) (Equations (7) and (8)) ensure that the two quantities are updated on two different timescales. The value of the LM is updated on a slower timescale than that of the value function. From the slower LM timescale point of view, V̂(ŝ) appears to be equilibrated in accordance with the current LM value, and from the faster timescale view, the LM appears to be almost constant. This two-timescale scheme induces a "leader-follower" behavior. The slow (fast) timescale iterate does not interfere in the convergence of the fast (slow) timescale iterate.

Theorem 1. The schemes (11)-(12) converge to (V̂*, β*) "almost surely" (a.s.).

Proof. The proof is provided in Section 10.1 in the supplemental material file. □

Based on the analysis presented above, the two-timescale PDS online learning algorithm is described in Algorithm 1. As described in the algorithm, the value functions associated with different states, the LM and the number of iterations are initialized at the beginning. Based on a random event (arrival or departure of a voice/data user), the system state is initialized. When the current PDS of the system is ŝ, the system chooses an action which maximizes the R.H.S. expression in Equation (9). Based on the observed reward in the current PDS ŝ′, V̂(ŝ) is updated along with the LM. This process is repeated for every decision epoch.

Algorithm 1. PDS Learning Algorithm
1: Initialize number of iterations k ← 1, value function vector V̂(ŝ) ← 0, ∀ŝ ∈ S, and the LM β ← 0.
2: while TRUE do
3:   Determine the event (arrival/departure) in the current decision epoch.
4:   Choose action a which maximizes the R.H.S. expression in Equation (9).
5:   Update the value function of PDS ŝ using (9).
6:   Update the LM according to Equation (10).
7:   Update ŝ ← ŝ′ and k ← k + 1.
8: end while

6 STRUCTURE-AWARE ONLINE RAT SELECTION ALGORITHM
In this section, we propose a learning algorithm exploiting the threshold properties of the optimal policy. The PDS learning algorithm proposed in Section 5 does not take into account the threshold nature of the optimal policy and hence optimizes over the entire policy space. However, utilizing the threshold nature of the optimal policy, the policy space can be reduced significantly. To this end, we propose a structure-aware online learning algorithm which searches for the optimal policy only within the set of threshold policies, providing faster convergence than the PDS learning algorithm. Note that the independent methodologies developed in this section can be applied to any learning problem having similar structural properties.

6.1 Gradient Based Online Algorithm
Let the throughput increment in WiFi when the number of WiFi users increases from k to (k + 1) be denoted by R̃_{W,D}(k). Therefore, R̃_{W,D}(k) = (k + 1)R_{W,D}(k + 1) − kR_{W,D}(k). We assume the following.

Assumption 1. R̃_{W,D}(k) is a non-increasing function of k.

This assumption is in line with the full buffer traffic model [20].

A summary of the structural properties of the optimal policy is as follows. Detailed proofs of the structural properties can be found in [19].

(1) Up to a threshold on the number of WiFi data users (say k_th), serve data users in WiFi (A3) and then serve them using LTE (A2) until LTE is full. When LTE is full, i.e., (i + j) = C, the optimal policy is to serve all data users using WiFi until k = W, where an incoming data user is blocked.
(2) {∀i, j | (i + j) < C} and a voice user arrival, A4 (A2) is better than A2 (A4) if k < k_th (k ≥ k_th).
(3) {∀i, j | (i + j) < C} and a voice user arrival, if the optimal action in state (i, j, k) is blocking, then the optimal action in state (i + 1, j, k) is blocking.
(4) {∀i, j | (i + j) = C} and a voice user arrival, if the optimal action in state (i, j, k) is blocking, then the optimal action in state (i + 1, j − 1, k) is blocking.

Using the first two properties, we can eliminate a number of suboptimal actions. In the case of a data user arrival (event E2) and departures of voice and data users (events E3, E4 and E5), a single decision is involved. This may provide improved convergence because, contrary to an online algorithm without any knowledge of the structural properties, we no longer need to learn optimal actions in some states. The only event where multiple decisions are involved is the voice user arrival (event E1). As stated in Properties 3 and 4, the value of the threshold on i, where the optimal action changes to blocking, is a function of j and k. Thus, if we have the knowledge of the values of the thresholds, we can characterize the policy completely. The idea is to optimize over the threshold vector (say θ) using an update rule, so that the value of the threshold vector θ converges to the optimal value. Before proceeding further, we determine the dimension of θ using the analysis presented below.

Using Properties 1 and 2, we can identify three regions.

(1) 0 ≤ k < k_th: Using Property 1, we have j = 0. For each value of k, we need to know the value of the threshold, which belongs to the set {0, 1, .., C}.
(2) k = k_th: Using Property 1, k = k_th ⇒ j ≥ 0. Thus, it boils down to computing a single threshold which belongs to the set {0, 1, .., C − j} (Property 3), for each value of j (0 ≤ j < C). Also, we need to compute a single threshold for (i + j) = C (Property 4).
(3) W ≥ k > k_th: Using Property 1, k > k_th ⇒ (i + j) = C. Thus, using Property 4, we need to obtain the threshold of blocking for (W − k_th − 1) values of k.

Therefore the dimension of θ is (k_th + C + W − k_th) = (C + W).

Remark 3. When the state space becomes too large, it becomes cumbersome to represent a policy, since this requires tabulating actions corresponding to each state. Due to the threshold nature of the optimal policy, the representation using the threshold vector becomes computationally efficient. Instead of storing the optimal action corresponding to each state, we just need to store (C + W) individual thresholds.

We consider a class of threshold policies which can be described in terms of the threshold vector θ. The main idea behind the online algorithm is to compute the gradient of the system metric, i.e., the average reward of the system, with respect to θ and improve the policy by updating the value of θ in the direction of the gradient. Therefore, following [35], one needs to compute the gradient of the system metric. To express the dependence of the parameters associated with the underlying Markov chain on θ explicitly, we need to redefine the notations. Let the transition probability associated with the Markov chain {X_n}, as a function of θ, be given by

P_{ss′}(θ) = P(X_{n+1} = s′ | X_n = s, θ).

Assumption 2. We assume that for every s, s′ ∈ S, P_{ss′}(θ) is a bounded, twice differentiable function, and the first and second derivatives of P_{ss′}(θ) are bounded.

Let the average reward of the Markov chain, the steady state stationary probability of state s, and the value function of state s (as functions of θ) be denoted by ρ(θ), π(s, θ) and V(s, θ), respectively. The following proposition provides a closed-form expression for the gradient of the average reward of the system. A proof for the same can be found in [35]. Although [35] considers a generalized case where the reward function depends on θ, in our case the same proof holds, with the exception that the gradient of the reward function is zero.

Proposition 1. Under the assumptions on P_{ss′}(θ) stated before, we have

∇ρ(θ) = Σ_{s∈S} π(s, θ) Σ_{s′∈S} ∇P_{ss′}(θ) V(s′, θ).   (13)

Hence, we can compute the value of ∇ρ(θ) (or ∇P_{ss′}(θ)) to construct an incremental scheme similar to a stochastic gradient algorithm for the threshold values, of the form

θ_{n+1} = θ_n + h(n)∇ρ(θ_n),   (14)

where θ_n represents the value of the threshold vector in the nth iteration on the slower timescale h(n). Given a threshold θ, we assume that the state transition in state s = (i, j, k) is given by P_0(s′|s) if i < θ(T), and by P_1(s′|s) otherwise, where θ(T) denotes the component of θ which corresponds to state s. Specifically,

T = k + j, if (i + j) ≠ C;   T = C + k, if (i + j) = C.   (15)

Therefore, according to the two-timescale gradient based learning framework, on the faster timescale we have

V_{n+1}(s, θ) = (1 − g(γ(s, n)))V_n(s, θ) + g(γ(s, n))[r(s, a, β) + V_n(s′, θ) − V_n(s*, θ)],   (16)
V_{n+1}(s″, θ) = V_n(s″, θ), ∀s″ ≠ s.

For example, if the current state is s = (i, 0, 0) and i < θ_n(0), then the state transition is determined by P_0(s′|s) (accept in LTE (A2)), i.e., s′ = (i + 1, 0, 0); else, s′ is determined by P_1(s′|s) (blocking (A1)), i.e., s′ = (i, 0, 0). The value functions corresponding to other states are kept unchanged.

Note that the above scheme works for a fixed value of the threshold vector θ and the LM β. To obtain the optimal value of θ, θ is to be iterated along the slower timescale h(n). Note that, although the individual components of the threshold take discrete values, we interpolate them to the continuous domain to be able to apply the online update rule. Since the threshold policy is a step function (governed by P_0(s′|s) up to a threshold and by P_1(s′|s) thereafter) defined at discrete points, Assumption 2 is not satisfied at every point. Therefore we approximate the threshold policy in state s by a randomized policy which is a function of θ (f(s, θ), say). We define

P_{ss′}(θ) ≜ P_1(s′|s)f(s, θ) + P_0(s′|s)(1 − f(s, θ)),

where f(s, θ(T)) = e^{(i − θ(T) − 0.5)} / (1 + e^{(i − θ(T) − 0.5)}), in state s = (i, j, k), provides a convenient approximation to the step function.

Remark 4. The rationale behind the choice of this function is the fact that it is continuously differentiable, and the derivative is nonzero everywhere.

While designing an online update scheme for θ, instead of ∇ρ(θ_n) (see Equation (14)), we can evaluate ∇P_{ss′}(θ). The steady-state stationary probabilities inside the summation in Equation (13) can be omitted by performing averaging over time. We have

∇P_{ss′}(θ) = (P_1(s′|s) − P_0(s′|s))∇f(s, θ).   (17)

In the right hand side of Equation (17), we incorporate a multiplication factor of 1/2, since multiplication by a constant term does not alter the online scheme. The physical significance of this operation is that at any iteration, we have state transitions according to P_0(·|·) and P_1(·|·) with equal probabilities. The update of θ on the slower timescale h(n) is as follows:

θ_{n+1}(T) = Δ_T[θ_n(T) + h(n)∇f(s, θ_n(T))(−1)^{α_n} V_n(s′, θ_n)],   (18)
θ_{n+1}(T′) = θ_n(T′), ∀T′ ≠ T,

where α_n is a random variable which takes values 0 and 1 with equal probabilities. When it takes the value 0, s′ is determined by P_1(s′|s), otherwise by P_0(s′|s). The averaging property of SA then leads to the effective drift (17). Depending on the state visited, the Tth component of the vector θ is updated as shown in Equation (18). For example, if the current state is (1, 0, 0), then θ_n(0) is updated (see Equation (15)), and the other components are kept unchanged. The projection operator Δ_T is a function which ensures that the iterates remain bounded in the interval [0, M(T)], where

M(T) = C − (T − k_th), if k_th ≤ T ≤ (k_th + C);   M(T) = C, otherwise.   (19)
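To make the randomized threshold policy and the slower-timescale update concrete, the following sketch implements the sigmoid approximation f, its derivative with respect to θ, and one projected θ update in the spirit of Equation (18). It is an illustrative toy for a single threshold component only: the step-size schedule, state sampling, and the stand-in for V_n(s′, θ) are our own assumptions, not the paper's simulator.

```python
import math
import random

def f(i, theta):
    """Sigmoid approximation of the threshold step function:
    close to 0 for i well below theta, close to 1 well above it."""
    x = i - theta - 0.5
    return math.exp(x) / (1.0 + math.exp(x))

def grad_f(i, theta):
    """Derivative of f with respect to theta (nonzero everywhere)."""
    s = f(i, theta)
    return -s * (1.0 - s)  # d/dtheta of sigmoid(i - theta - 0.5)

def project(theta, upper):
    """Projection operator: keep the iterate inside [0, M(T)]."""
    return max(0.0, min(upper, theta))

def theta_update(theta, i, V_next, h_n, M_T):
    """One slower-timescale update of a single threshold component.

    alpha_n picks one of the two transition kernels with equal
    probability; (-1)**alpha_n sets the sign of the increment, and
    V_next stands in for the learned value of the sampled next state.
    """
    alpha_n = random.randint(0, 1)
    drift = grad_f(i, theta) * ((-1) ** alpha_n) * V_next
    return project(theta + h_n * drift, M_T), alpha_n

random.seed(1)
theta = 3.0
for n in range(1, 1001):
    h_n = 1000.0 / (n + 1000.0)   # toy decreasing step size (assumption)
    i = random.randint(0, 10)     # toy state component
    V_next = random.random()      # stand-in for V_n(s', theta)
    theta, _ = theta_update(theta, i, V_next, h_n, M_T=10.0)
assert 0.0 <= theta <= 10.0
```

Because grad_f is nonzero everywhere, every visit to a state nudges its threshold component, which is exactly why the step function is smoothed before differentiation.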
TABLE 3
Computational and Storage Complexities of Different Algorithms

Algorithm | Storage complexity | Computational complexity
Q-learning [1] | O(|S| |A|) | O(|A|)
PDS learning | O(|S|) = O(C²W) | O(|A|)
Structure-aware learning | O(C² + CW) | O(1)

Similar to Algorithm 1, to obtain the optimal value of β, β is to be iterated along the same timescale h(n), as specified below:

β_{n+1} = Λ[β_n + h(n)(B_n − B_max)].   (20)

Remark 5. The dynamics of the LM and the threshold vector do not depend on each other directly. However, both the β and θ iterates depend on the value functions on the faster timescale. Therefore θ is updated on the same timescale as that of β, without requiring a third timescale.

Theorem 2. The schemes (16), (18) and (20) converge to optimality a.s.

Proof. The proof is provided in Section 10.2 in the supplemental material file. □

Based on the analysis described above, the structure-aware online learning algorithm is stated in Algorithm 2. As described in the algorithm, the value functions associated with different states, the LM, the threshold vector and the number of iterations are initialized at the beginning. When the current state of the system is s, the system chooses the action which is given by the current value of the threshold vector. Based on the observed reward, V(s) and θ are updated along with the LM. This process is repeated for every decision epoch.

Algorithm 2. Structure-Aware Learning Algorithm
1: Initialize number of iterations k ← 1, value function V(s) ← 0, ∀s ∈ S, the LM β ← 0 and the threshold vector θ ← 0.
2: while TRUE do
3:   Choose action a given by the current value of the threshold vector θ.
4:   Update the value function of state s using (16).
5:   Update the LM according to Equation (20).
6:   Update the threshold vector according to Equation (18).
7:   Update s ← s′ and k ← k + 1.
8: end while

7 COMPARISON OF COMPLEXITIES OF LEARNING ALGORITHMS
In this section, we provide a comparison of the storage and computational complexities of traditional Q-learning [1] and the proposed PDS learning and structure-aware learning algorithms. We summarize the storage and computational complexities of these schemes in Table 3.

The Q-learning algorithm [1] stores value functions for every state-action pair, i.e., |S| |A| values, and updates the value function of one state-action pair at a time. While updating the value function, Q-learning evaluates |A| functions. The PDS learning algorithm (see Equation (11)) requires storing |S| PDS value functions and the feasible actions in every state, i.e., |S| values. While updating the PDS value function, the PDS learning algorithm evaluates |A| functions, resulting in a per-iteration complexity of |A|.

In the case of the structure-aware learning algorithm, we no longer need to store |S| value functions. Rather, by virtue of the threshold nature of the optimal policy, we consider three cases.

(1) 0 ≤ k < k_th: Since we have j = 0, for each value of k, we need to store (C + 1) value functions.
(2) k = k_th: k = k_th ⇒ j ≥ 0. Thus, we need to store (C + 1 − j) value functions, for each value of j (0 ≤ j ≤ C).
(3) W ≥ k > k_th: k > k_th ⇒ (i + j) = C. Therefore, we need to store the value functions of (C + 1) states for each value of k.

Therefore, the total number of value functions which need to be stored is (C + 1)k_th + (C + 1)(C + 2)/2 + (C + 1)(W − k_th), which is equal to (C + 1)(C + 2)/2 + (C + 1)W. Note that this is a considerable reduction in storage complexity in comparison to the PDS learning scheme, which has a storage complexity of O(C²W). For example, when W = C, the storage complexity reduces from O(C³) to O(C²). Furthermore, the feasible actions corresponding to each state need not be stored separately, since the threshold vector completely characterizes the policy. The per-iteration computational complexity of this scheme (see Equation (16)) is O(1). This scheme also involves updating a single component of the threshold vector (Equation (18)) with a computational complexity of O(1).

8 SIMULATION RESULTS
In this section, the proposed PDS learning and structure-aware learning algorithms are simulated in ns-3 to characterize and compare their convergence behaviors. Convergence rates of the proposed algorithms are compared with that of Q-learning, as proposed in our earlier work [1]. Simulation results establish that the proposed PDS learning algorithm provides improved convergence over Q-learning. Furthermore, it is observed that the knowledge of the structural properties indeed reduces the convergence time.

8.1 Simulation Model and Evaluation Methodology
The simulation setup comprises a 3GPP LTE BS and an operator-deployed IEEE 802.11g WiFi AP. All users are assumed to be stationary. Data users are distributed uniformly within a 30 m radius of the WiFi AP, which is approximately 50 m away from the LTE BS. The LTE and WiFi network parameters used in the simulations are chosen based on 3GPP models [36], [37] and the saturation throughput [20] IEEE 802.11g WiFi model [6], and are described in Tables 4 and 5. We consider the generation of CBR uplink traffic for voice and data users in LTE. This is implemented in ns-3 using an application (similar to the ON/OFF application) developed by us.

For the update of the PDS value functions, the threshold vector and the LM, we consider g(n) = 1/(⌊n/10⌋ + 2)^{0.6} and h(n) = 1000/n.

8.2 Convergence Analysis
Figs. 3a and 3b illustrate how the Q-learning, PDS learning and structure-aware learning algorithms converge with an increasing number of iterations (n). We keep λ_v = λ_d = 1. It is evident that both the proposed algorithms outperform Q-learning in terms of convergence speed. Contrary to PDS learning, even after a considerable amount of iterations,
TABLE 4: LTE Network Model. TABLE 5: WiFi Network Model.
Fig. 3. Plot of total system throughput versus number of iterations (n) for different algorithms.
scenarios. We compare the total system throughput and WiFi, if any, is offloaded to LTE. While offloading, we always
voice user blocking probability performance of the pro- choose the user with the worst channel.
posed algorithms with that of the on-the-spot offloading
[38] and LTE-preferred schemes [19].
Although in the system model (See Section 2) we consider 8.4.1 Voice User Arrival Rate Variation
single resource block allocation to LTE data users, in simula- Fig. 6a depicts the blocking percentage of voice users for
tions we relax this assumption and consider proportional on-the-spot offloading, LTE-preferred and the proposed
fair scheduling for the LTE BS which dynamically assigns algorithms for varying v . Since on-the-spot offloading
resources to the users based on user bandwidth demand. blocks voice user when LTE reaches its capacity, blocking
Users randomly generate individual bandwidth demands. probability of voice users increases with v . Since PDS
However, we assume that the maximum data rate achievable learning and structure-aware learning algorithms learn in
for a single data user is 5 Mbps and the bottleneck is in the which states blocking is to be chosen as the optimal
access network. Furthermore, in the previous sections, there is no consideration of channel fading effects in LTE and WiFi. To address that, whenever we choose an action involving offloading of a user from one RAT to another (A4 and A5), the user with the worst channel is selected for offloading, since a user with a bad channel, in general, contributes the least throughput to the system. For example, whenever A4 is chosen and we offload a data user from LTE to WiFi, we always choose the data user with the lowest Signal-to-Noise Ratio (SNR). We consider the Extended Pedestrian A model [39] for fading in LTE and Rayleigh fading for WiFi.

In on-the-spot offloading [38], data users always choose WiFi unless WiFi coverage is not present. Therefore, in our system model, on-the-spot offloading always associates data users with WiFi until capacity is reached in WiFi. Voice users are associated with LTE, and when LTE reaches its capacity, voice users are blocked. In the LTE-preferred scheme [19], voice and data users are associated with LTE until LTE reaches its capacity. When LTE reaches its capacity and a voice user arrives, the voice user is blocked if there is no data user in LTE. Otherwise, one existing data user is offloaded to WiFi if capacity is available in WiFi. Upon the departure of an existing voice or data user from LTE, an existing data user in WiFi is offloaded back to LTE.

The voice user blocking probabilities corresponding to these algorithms converge to the same value. Since the proposed algorithms may block voice users even when the LTE system has not reached its capacity, their blocking probability values are marginally higher than that of on-the-spot offloading: voice users may be blocked to save LTE resources for future data user arrivals, which have a higher throughput contribution to the system. The LTE-preferred scheme blocks a voice user when the LTE system is full and there is no data user in LTE. Therefore, on-the-spot offloading and the LTE-preferred scheme provide similar blocking probability performance.

Fig. 6b illustrates the total system throughput performance of different algorithms under varying λ_v. With an increase in λ_v, the average number of voice users in the system increases while the number of WiFi data users remains the same. Therefore, in the case of on-the-spot offloading, the total system throughput increases with λ_v. However, since the throughput of voice users is small compared to that of data users, the rate of increase is very small. The PDS learning and structure-aware learning algorithms learn the optimal policy, which performs significant load balancing via A4 and A5. Also, while offloading, the proposed algorithms take the channel state of users into account. Thus, these algorithms outperform on-the-spot offloading in terms of total system throughput, with performance improvement varying from 10.72 percent (for λ_v = 0.13) to 28.72 percent (for λ_v = 0.07). With an increase in λ_v, the LTE-preferred scheme starts offloading data users to WiFi to accommodate incoming voice users. Under low WiFi load, the total throughput of the system increases. As the WiFi load increases, the rate of increase decreases. However, both the proposed algorithms perform better than the LTE-preferred scheme.

TABLE 6
Average Per-Iteration Time of Different Algorithms

Algorithm                 | Average per-iteration time (ms)
--------------------------|--------------------------------
Q-learning [1]            | 49.66
PDS learning              | 40.07
Structure-aware learning  | 32.119
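The two baseline association rules and the worst-SNR offloading choice described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the capacities, the state fields, and the fallback behavior when a RAT is full are assumptions introduced here.

```python
# Sketch of the baseline association rules and the worst-SNR offload
# choice. Capacities, state fields, and full-RAT fallbacks are
# illustrative assumptions, not the paper's exact model.

LTE_CAPACITY = 10   # assumed max users (voice + data) in LTE
WIFI_CAPACITY = 20  # assumed max data users in WiFi

def pick_offload_candidate(lte_data_users):
    """For offloading actions (A4/A5), select the user with the worst
    channel, i.e., the lowest SNR, since such a user contributes the
    least throughput to the system."""
    return min(lte_data_users, key=lambda u: u["snr"])

def on_the_spot(state, arrival):
    """On-the-spot offloading: data users always join WiFi while it has
    room; voice users join LTE and are blocked once LTE is full."""
    if arrival == "data":
        return "wifi" if state["wifi_data"] < WIFI_CAPACITY else "block"
    return "lte" if state["lte_users"] < LTE_CAPACITY else "block"

def lte_preferred(state, arrival):
    """LTE-preferred: both user types join LTE until it is full. A voice
    arrival at a full LTE is blocked if LTE holds no data user;
    otherwise one LTE data user is offloaded to WiFi (if WiFi has room)
    to admit the voice user."""
    if state["lte_users"] < LTE_CAPACITY:
        return "lte"
    if arrival == "voice":
        if state["lte_data"] > 0 and state["wifi_data"] < WIFI_CAPACITY:
            return "offload_data_then_admit"
        return "block"
    return "wifi" if state["wifi_data"] < WIFI_CAPACITY else "block"
```

For example, `pick_offload_candidate([{"snr": 14.2}, {"snr": 3.1}])` returns the user with SNR 3.1, matching the rule that the worst-channel user is offloaded first.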
Authorized licensed use limited to: UNIVERSITY PUTRA MALAYSIA. Downloaded on March 10,2020 at 05:36:03 UTC from IEEE Xplore. Restrictions apply.
ROY ET AL.: LOW COMPLEXITY ONLINE RADIO ACCESS TECHNOLOGY SELECTION ALGORITHM IN LTE-WIFI HETNET 387
Fig. 5. Plot of total system throughput versus sum of step sizes till the nth iteration (Σ_{k=1}^{n} g(k)) for different algorithms.

Fig. 6. Plot of voice user blocking fraction and total system throughput for different algorithms. (a) Voice user blocking percentage versus λ_v. (b) Total system throughput versus λ_v (λ_d = 1/20, μ_v = 1/60, and μ_d = 1/10). (c) Voice user blocking percentage versus λ_d. (d) Total system throughput versus λ_d (λ_v = 1/6, μ_v = 1/60, and μ_d = 1/10).
8.4.2 Data User Arrival Rate Variation

As observed in Fig. 6c, since in on-the-spot offloading data and voice users are served using WiFi and LTE, respectively, changes in λ_d do not impact the blocking probability of voice users. The performances of both the PDS learning and structure-aware learning algorithms are similar to that of on-the-spot offloading. Due to the presence of a constraint on the voice user blocking probability, most of the voice users are blocked when the LTE system reaches capacity. Therefore, the proposed algorithms do most of the blocking of voice users in the same decision epochs as on-the-spot offloading. Since the LTE-preferred scheme blocks voice users only when LTE does not have available capacity and there is no data user in LTE, the blocking probabilities of the LTE-preferred scheme and on-the-spot offloading are the same.

Since on-the-spot offloading associates data users with WiFi, the load in WiFi increases with λ_d. As a result, as λ_d increases (see Fig. 6d), the effect of contention and channel fading reduces the rate of increase of throughput. Both the proposed algorithms perform better than on-the-spot offloading by virtue of optimal RAT selection and load balancing actions, which reduce the effect of contention in WiFi. Also, while offloading, the proposed algorithms take the channel state of users into account.
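The flatness of the voice blocking curve in λ_d under on-the-spot offloading has a simple analytical cross-check: voice traffic there sees LTE as a classical M/M/C/C loss system, so its blocking probability follows the Erlang-B formula (cf. the Erlang analysis in [30]), in which λ_d never appears. The sketch below uses illustrative capacities and rates, not the paper's exact settings.

```python
def erlang_b(offered_load, servers):
    """Erlang-B blocking probability, computed with the stable
    recursion B(0) = 1, B(c) = a*B(c-1) / (c + a*B(c-1))."""
    b = 1.0
    for c in range(1, servers + 1):
        b = offered_load * b / (c + offered_load * b)
    return b

# Assumed values: voice arrival rate lambda_v = 1/6, mean holding time
# 1/mu_v = 60 s, and C = 10 LTE voice channels.
offered_load = (1 / 6) / (1 / 60)  # lambda_v / mu_v = 10 Erlangs

# lambda_d never enters the computation, which is exactly why the voice
# blocking curve is flat in lambda_d for on-the-spot offloading.
for lambda_d in (0.1, 0.3, 0.5):
    print(f"lambda_d = {lambda_d}: voice blocking = "
          f"{erlang_b(offered_load, 10):.4f}")
```

With these assumed numbers, the same blocking value is printed for every λ_d, mirroring the flat curves in Fig. 6c.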
388 IEEE TRANSACTIONS ON MOBILE COMPUTING, VOL. 19, NO. 2, FEBRUARY 2020
Fig. 7. (a) Total system throughput versus λ_v (λ_d = 1/20, μ_v = 1/60, and μ_d = 1/10). (b) Total system throughput versus λ_d (λ_v = 1/6, μ_v = 1/60, and μ_d = 1/10).
Therefore, the proposed algorithms outperform on-the-spot offloading, with performance improvement varying from 20 percent (for λ_d = 0.1) to 54.6 percent (for λ_d = 0.5). As λ_d increases, the LTE-preferred scheme starts offloading more data users to WiFi, and the system throughput therefore increases. Under high λ_d, the effect of contention is smaller than under on-the-spot offloading, resulting in better performance than on-the-spot offloading. However, the proposed algorithms still perform better than the LTE-preferred scheme.

8.5 Consideration of User Mobility

In this section, we evaluate how the proposed algorithms perform in comparison to on-the-spot offloading and the LTE-preferred scheme in the face of user mobility. In addition to the ns-3 simulation settings described in the last section, we also consider the random waypoint model [40] for the mobility of voice and data users. As evident from Figs. 7a and 7b, although the total system throughputs provided by different algorithms change due to mobility, the comparative performance of the proposed algorithms with respect to on-the-spot offloading and the LTE-preferred scheme remains the same. Since mobility does not have any impact on the blocking probability of voice users, the blocking probability performances of the considered algorithms are exactly the same as those described in Figs. 6a and 6c.

9 CONCLUSIONS

In this paper, we have proposed a PDS learning algorithm which can be implemented online without knowledge of the statistics of the arrival processes. It has been proved that the algorithm converges to the optimal policy. Furthermore, another online algorithm, which exploits the threshold structure of the optimal policy, has been proposed. The knowledge of the threshold structure provides improvements in computational and storage complexity and convergence time. The proposed algorithm provides a novel framework that can be applied for designing online learning algorithms for any general problem and is of independent interest. We have proved that the structure-aware learning algorithm converges to the globally optimal threshold vector. Simulation results have been presented to exhibit how the PDS paradigm and the knowledge of structural properties provide improvement in convergence time over traditional online association algorithms. Moreover, simulation results establish that the proposed schemes outperform on-the-spot offloading and LTE-preferred schemes under realistic network scenarios.

ACKNOWLEDGMENTS

This work has been funded by the Ministry of Electronics and Information Technology (MeitY), Government of India, as part of the "5G Research and Building Next Gen Solutions for Indian Market" project.

REFERENCES

[1] A. Roy, P. Chaporkar, and A. Karandikar, "An on-line radio access technology selection algorithm in an LTE-WiFi network," in Proc. IEEE Wireless Commun. Netw. Conf., 2017, pp. 1-6.
[2] Y. He, M. Chen, B. Ge, and M. Guizani, "On WiFi offloading in heterogeneous networks: Various incentives and trade-off strategies," IEEE Commun. Surveys Tuts., vol. 18, no. 4, pp. 2345-2385, Oct.-Dec. 2016.
[3] Cisco, "Cisco visual networking index: Global mobile data traffic forecast update, 2013-2018," White Paper, 2014.
[4] V. G. Nguyen, T. X. Do, and Y. Kim, "SDN and virtualization-based LTE mobile network architectures: A comprehensive survey," Wireless Pers. Commun., vol. 86, no. 3, pp. 1401-1438, 2016.
[5] N. M. Akshatha, P. Jha, and A. Karandikar, "A centralized SDN architecture for the 5G cellular network," in Proc. IEEE 5G World Forum, 2018, pp. 147-152.
[6] Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) Specifications, IEEE Standard 802.11-2012, Part 11, 2012.
[7] 3GPP TR 37.834 v0.3.0, "Study on WLAN/3GPP Radio Interworking," Jun. 2013. [Online]. Available: http://www.3gpp.org/DynaReport/37834.htm
[8] A. Whittier, P. Kulkarni, F. Cao, and S. Armour, "Mobile data offloading addressing the service quality versus resource utilisation dilemma," in Proc. IEEE 27th Annu. Int. Symp. Pers. Indoor Mobile Radio Commun., 2016, pp. 1-6.
[9] F. Moety, M. Ibrahim, S. Lahoud, and K. Khawam, "Distributed heuristic algorithms for RAT selection in wireless heterogeneous networks," in Proc. IEEE Wireless Commun. Netw. Conf., 2012, pp. 2220-2224.
[10] E. Aryafar, A. Keshavarz-Haddad, M. Wang, and M. Chiang, "RAT selection games in HetNets," in Proc. IEEE INFOCOM, 2013, pp. 998-1006.
[11] K. Lee, J. Lee, Y. Yi, I. Rhee, and S. Chong, "Mobile data offloading: How much can WiFi deliver?" IEEE/ACM Trans. Netw., vol. 21, no. 2, pp. 536-550, Apr. 2013.
[12] D. Suh, H. Ko, and S. Pack, "Efficiency analysis of WiFi offloading techniques," IEEE Trans. Veh. Technol., vol. 65, no. 5, pp. 3813-3817, May 2016.
[13] N. Cheng, N. Lu, N. Zhang, X. Zhang, X. S. Shen, and J. W. Mark, "Opportunistic WiFi offloading in vehicular environment: A game-theory approach," IEEE Trans. Intell. Transp. Syst., vol. 17, no. 7, pp. 1944-1955, Jul. 2016.
[14] A. Roy and A. Karandikar, "Optimal radio access technology selection policy for LTE-WiFi network," in Proc. IEEE Int. Symp. Model. Optimization Mobile Ad Hoc Wireless Netw., 2015, pp. 291-298.
[15] G. S. Kasbekar, P. Nuggehalli, and J. Kuri, "Online client-AP association in WLANs," in Proc. IEEE Int. Symp. Model. Optimization Mobile Ad Hoc Wireless Netw., 2006, pp. 1-8.
[16] K. Khawam, S. Lahoud, M. Ibrahim, M. Yassin, S. Martin, M. El Helou, and F. Moety, "Radio access technology selection in heterogeneous networks," Phys. Commun., vol. 18, pp. 125-139, 2016.
[17] S. Barmpounakis, A. Kaloxylos, P. Spapis, and N. Alonistioti, "Context-aware, user-driven, network-controlled RAT selection for 5G networks," Comput. Netw., vol. 113, pp. 124-147, 2017.
[18] B. H. Jung, N. O. Song, and D. K. Sung, "A network-assisted user-centric WiFi-offloading model for maximizing per-user throughput in a heterogeneous network," IEEE Trans. Veh. Technol., vol. 63, no. 4, pp. 1940-1945, May 2014.
[19] A. Roy, P. Chaporkar, and A. Karandikar, "Optimal radio access technology selection algorithm for LTE-WiFi network," IEEE Trans. Veh. Technol., vol. 67, no. 7, pp. 6446-6460, Jul. 2018.
[20] G. Bianchi, "Performance analysis of the IEEE 802.11 distributed coordination function," IEEE J. Sel. Areas Commun., vol. 18, no. 3, pp. 535-547, Mar. 2000.
[21] M. El Helou, M. Ibrahim, S. Lahoud, K. Khawam, D. Mezher, and B. Cousin, "A network-assisted approach for RAT selection in heterogeneous cellular networks," IEEE J. Sel. Areas Commun., vol. 33, no. 6, pp. 1055-1067, Jun. 2015.
[22] E. Khloussy, X. Gelabert, and Y. Jiang, "Investigation on MDP-based radio access technology selection in heterogeneous wireless networks," Comput. Netw., vol. 91, pp. 57-67, 2015.
[23] R. Li, Z. Zhao, J. Zheng, C. Mei, Y. Cai, and H. Zhang, "The learning and prediction of application-level traffic data in cellular networks," IEEE Trans. Wireless Commun., vol. 16, no. 6, pp. 3899-3912, Jun. 2017.
[24] K. Kumar, A. Gupta, R. Shah, A. Karandikar, and P. Chaporkar, "On analyzing Indian cellular traffic characteristics for energy efficient network operation," in Proc. IEEE 21st Nat. Conf. Commun., 2015, pp. 1-6.
[25] U. Paul, A. P. Subramanian, M. M. Buddhikot, and S. R. Das, "Understanding traffic dynamics in cellular data networks," in Proc. IEEE INFOCOM, 2011, pp. 882-890.
[26] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. Cambridge, MA, USA: MIT Press, 1998.
[27] K. Adachi, M. Li, P. H. Tan, Y. Zhou, and S. Sun, "Q-Learning based intelligent traffic steering in heterogeneous network," in Proc. IEEE Veh. Technol. Conf. Spring, 2016, pp. 1-5.
[28] S. Anbalagan, D. Kumar, D. Ghosal, G. Raja, and V. Muthuvalliammai, "SDN-assisted learning approach for data offloading in 5G HetNets," Mobile Netw. Appl., vol. 22, no. 4, pp. 771-782, 2017.
[29] ns-3 simulator, Dec. 2018. [Online]. Available: http://code.nsnam.org/ns-3-dev/
[30] T. Bonald and J. W. Roberts, "Internet and the Erlang formula," ACM SIGCOMM Comput. Commun. Rev., vol. 42, no. 1, pp. 23-30, 2012.
[31] E. Altman, Constrained Markov Decision Processes. Boca Raton, FL, USA: CRC Press, 1999.
[32] M. L. Puterman, Markov Decision Processes: Discrete Stochastic Dynamic Programming. Hoboken, NJ, USA: Wiley, 2014.
[33] F. J. Beutler and K. W. Ross, "Optimal policies for controlled Markov chains with a constraint," J. Math. Anal. Appl., vol. 112, no. 1, pp. 236-252, 1985.
[34] V. S. Borkar, Stochastic Approximation: A Dynamical Systems Viewpoint. Cambridge, U.K.: Cambridge Univ. Press, 2008.
[35] P. Marbach and J. N. Tsitsiklis, "Simulation-based optimization of Markov reward processes," IEEE Trans. Autom. Control, vol. 46, no. 2, pp. 191-209, Feb. 2001.
[36] 3GPP TR 36.814 v9.0.0, "Further advancements for E-UTRA physical layer aspects," Mar. 2010. [Online]. Available: http://www.3gpp.org/dynareport/36814.htm
[37] 3GPP TR 36.839 v11.1.0, "Mobility enhancements in heterogeneous networks," Dec. 2012. [Online]. Available: http://www.3gpp.org/dynareport/36839.htm
[38] F. Mehmeti and T. Spyropoulos, "Performance analysis of 'on-the-spot' mobile data offloading," in Proc. IEEE Global Commun. Conf., 2013, pp. 1577-1583.
[39] 3GPP TS 36.104 V10.2.0, "Base Station (BS) Radio Transmission and Reception," Apr. 2011. [Online]. Available: http://www.3gpp.org/dynareport/36104.htm
[40] D. B. Johnson and D. A. Maltz, "Dynamic source routing in ad hoc wireless networks," in Mobile Computing. Berlin, Germany: Springer, 1996, pp. 153-181.
[41] N. Salodkar, A. Bhorkar, A. Karandikar, and V. S. Borkar, "An on-line learning algorithm for energy efficient delay constrained scheduling over a fading channel," IEEE J. Sel. Areas Commun., vol. 26, no. 4, pp. 732-742, May 2008.
[42] V. R. Konda and V. S. Borkar, "Actor-critic-type learning algorithms for Markov decision processes," SIAM J. Control Optimization, vol. 38, no. 1, pp. 94-123, 1999.
[43] J. Abounadi, D. Bertsekas, and V. S. Borkar, "Learning algorithms for Markov decision processes with average cost," SIAM J. Control Optimization, vol. 40, no. 3, pp. 681-698, 2001.
[44] V. S. Borkar and S. P. Meyn, "The ODE method for convergence of stochastic approximation and reinforcement learning," SIAM J. Control Optimization, vol. 38, no. 2, pp. 447-469, 2000.
[45] V. S. Borkar, "An actor-critic algorithm for constrained Markov decision processes," Syst. Control Lett., vol. 54, no. 3, pp. 207-213, 2005.

Arghyadip Roy received the BE degree from Jadavpur University, Kolkata, India, in 2010, and the MTech degree from IIT Kharagpur, India, in 2012. He is currently a research scholar with the Department of Electrical Engineering, IIT Bombay, India. He previously worked with the Samsung R&D Institute-Bangalore, India. His research interests include resource allocation, optimization, and control of stochastic systems. He is a student member of the IEEE.

Vivek Borkar received the BTech degree in electrical engineering from IIT Bombay, in 1976, the MS degree in systems and control from Case Western Reserve University, in 1977, and the PhD degree in electrical engineering and computer science from the University of California at Berkeley, in 1980. He has held positions at the TIFR Center for Applicable Mathematics and the Indian Institute of Science in Bengaluru, and the Tata Institute of Fundamental Research, Mumbai, before joining IIT Bombay, Mumbai, as institute chair professor of electrical engineering in Aug. 2011. He has held visiting positions with the University of Twente, MIT, the University of Maryland at College Park, and the University of California at Berkeley. He is a fellow of the IEEE, the American Math. Society, TWAS, and the science and engineering academies in India. His research interests include stochastic optimization and control, covering theory, algorithms, and applications.

Prasanna Chaporkar received the MS degree from the Faculty of Engineering, Indian Institute of Science, Bangalore, India, in 2000, and the PhD degree from the University of Pennsylvania, Philadelphia, Pennsylvania, in 2006. He was an ERCIM post-doctoral fellow with ENS, Paris, France, and NTNU, Trondheim, Norway. Currently, he is an associate professor with the Indian Institute of Technology, Mumbai. His research interests include resource allocation, stochastic control, queueing theory, and distributed systems and algorithms. He is a member of the IEEE.

Abhay Karandikar is currently director of IIT Kanpur (on leave from IIT Bombay). He is also a member (part-time) of the Telecom Regulatory Authority of India (TRAI). In IIT Bombay, he served as institute chair professor with the Electrical Engineering Department, dean (Faculty Affairs) from 2017 to 2018, and head of the Electrical Engineering Department from 2012 to 2015. He is the founding member of the Telecom Standards Development Society, India (TSDSI), India's standards body for telecom. He was the chairman of TSDSI from 2016 to 2018. His research interests include resource allocation in wireless networks, software defined networking, frugal 5G, and rural broadband. He is a member of the IEEE.

For more information on this or any other computing topic, please visit our Digital Library at www.computer.org/csdl.