
Available online at www.sciencedirect.com

ScienceDirect

Procedia Computer Science 175 (2020) 315–324

www.elsevier.com/locate/procedia

The 15th International Conference on Future Networks and Communications (FNC)
August 9-12, 2020, Leuven, Belgium
Deep Reinforcement Learning Based Resource Allocation For Narrowband Cognitive Radio-IoT Systems

K.F Muteba a,∗, K Djouani a,b, T.O Olwal a

a F'SATI, Tshwane University of Technology (TUT), Staatsartillerie Road, Pretoria 0001, South Africa
b LISSI, University of Paris Est Creteil (UPEC), Creteil, France

Abstract

Narrowband Internet-of-Things (NB-IoT) is a low-power wide-area (LPWA) technology developed by the Third Generation Partnership Project (3GPP) with the objective of enabling a wide range of low-cost, low-power IoT devices in the 5G era. As the number of IoT devices continues to increase, the demand for spectrum allocation grows proportionately. The NB-IoT spectrum allocation is limited to between 180 kHz and 200 kHz and is not sufficient to accommodate the exponential surge in the number of NB-IoT devices; hence the need to efficiently allocate the available spectrum to the NB-IoT devices. Furthermore, in an attempt to enhance coverage in NB-IoT networks, recent relevant studies (3GPP Release 13) have introduced the concept of repeated transmission. Since repeated transmissions ensure coverage enhancement but cause spectrum wastage, traditional resource allocation is not appropriate for NB-IoT networks. Motivated by this research gap, we propose a NB-Cognitive Radio-IoT (NB-CR-IoT) technique which integrates Cognitive Radio (CR) techniques into the operation of conventional NB-IoT. The resulting architecture seeks to foster efficient opportunistic spectrum access in distributed heterogeneous networks. We further formulate the resource allocation problem as a deep Q-learning problem, solved by reducing the number of repeated transmissions and allocating more IoT devices in the NB-IoT network. The results in this contribution indicate that the DQN outperforms the traditional Q-learning algorithm.
© 2020 The Authors. Published by Elsevier B.V.
This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Peer-review under responsibility of the Conference Program Chairs.

Keywords: Narrowband-Cognitive IoT, LPWA, spectrum allocation, 3GPP, Q-learning, Deep Q-learning.

1. Introduction

The 21st century is witnessing an era in which the connectivity of computer and cellular devices has surpassed conventional limits towards what is today called the Internet of Things (IoT). Consequently, the number of connected devices will increase explosively: it has been predicted that 5.5 million devices will be connected to the network every day [1]. This fast expansion will have a major impact on the manner in which we live, communicate, work and interact

∗ Corresponding author. Tel.: +27734945736
E-mail address: franckdestjean@gmail.com
10.1016/j.procs.2020.07.046
316 K.F. Muteba et al. / Procedia Computer Science 175 (2020) 315–324

with things. Based on the above statement, how to achieve massive device connection over the network becomes an issue for IoT.
In order to meet the requirements of IoT, the 3GPP introduced Narrowband Internet of Things in Release 13 with the objective of creating cheaper devices compatible with the LTE band, with long range and long battery lifetime [3]. Many industries such as Huawei and Ericsson have taken part in the standardization of the NB-IoT system and consider it to be a big step toward the evolution of 5G IoT networks [4]. NB-IoT reuses LTE patterns, including channel modulation and coding schemes, numerologies and higher-layer communication protocols. In downlink transmission, orthogonal frequency-division multiple access (OFDMA) is used with 180 kHz of bandwidth, while in uplink transmission, single-carrier frequency-division multiple access (SC-FDMA) is used. This warrants the fast deployment of NB-IoT products by existing LTE equipment and software vendors [5]. With 180 kHz of minimum spectrum requirement, narrowband IoT (NB-IoT) can be deployed in three possible operational modes, i.e., (i) standalone, (ii) in the guard carriers of existing LTE/UMTS spectrum, or (iii) within an existing LTE carrier (in-band) by replacing one or more PRBs [3].
Compared to other Low Power Wide Area Network (LPWAN) technologies such as LoRaWAN and Sigfox, which operate in unlicensed bands, NB-IoT operates in licensed bands, meaning it can operate in cellular bands. Hence, NB-IoT is more reliable and offers better quality of service (QoS) [7]. However, the concept of repeated transmission, introduced as an important solution to improve the coverage of NB-IoT systems, causes spectrum wastage and reduces system throughput. Hence the interest in bringing new techniques for efficient spectrum allocation.
Some studies [7], [11], [12] used reinforcement learning. Though they addressed the defects of traditional resource allocation, they are not appropriate for high-dimensional state spaces and variable selection. Therefore, motivated by this gap, this paper proposes a NB-Cognitive Radio-IoT (NB-CR-IoT) resource allocation solution with the objective of reusing vacant channels in licensed spectrum, together with a deep learning strategy (DQN) to handle high-dimensional state-space variable selection.
The remainder of this paper is organized as follows: Section 2 reviews related work. Section 3 describes the proposed system model of the fully distributed NB-IoT setup with the relevant NB-IoT numerology and discusses the proposed solution. Section 4 presents the simulation results. Finally, Section 5 concludes the paper.

2. Related work

Regarding optimization in NB-IoT networks, the authors in [4] proposed an algorithm aiming to minimise the number of consumed subframes while ensuring that each device can transmit data to the base station. The proposed algorithm effectively reduces the number of consumed subframes by 50% compared to the baseline. The authors in [8] proposed a link adaptation algorithm to improve the time and resource allocation for NB-IoT devices. Here, an algorithm which selects a repetition number and modulation coding scheme was designed for each NB-IoT device, using an inner and outer loop to guarantee transmission reliability and improve the throughput of the NB-IoT system. The proposed method outperforms the straightforward method and the repetition-dominated method by saving more active time and resource consumption. In [9] the authors proposed an iterative algorithm based on a cooperative approach in uplink transmission; here a time-slot data transmission and power allocation scheme for optimizing the overall channel gain is presented. A review done by [10] considered scheduling techniques for resource allocation in NB-IoT devices. The authors proposed an algorithm which uses pre-divided resource allocation to accommodate the users, with the objective of maximizing spectrum utilization. However, all the above solutions have not considered the time-varying heterogeneity of NB-IoT devices; they assumed a steady behaviour of the NB-IoT network.
Regarding reinforcement learning (RL), some work has been done. In [7] the authors proposed a reinforcement learning-based algorithm for dynamic spectrum access with the objective of improving coverage while simultaneously decreasing the number of repetitions, hence reducing energy consumption. The proposed algorithm efficiently selects the channels with the highest probability of being available, the best coverage distance and the lowest number of repetitions. In [11] the authors used a Q-learning algorithm to tackle the congestion problem of Machine Type Communication (MTC) in LTE; an Access Class Barring (ACB) scheme was proposed to optimize the success probability of the Random Access Channel (RACH) procedure. In [12] the authors proposed a mechanism using a Q-learning algorithm to limit the number of resources allocated to each traffic class according to its QoS class identifier.

Although the proposed reinforcement learning methods outperform traditional resource allocation, they cannot solve the multi-group optimization problem in NB-IoT networks because of their incapacity to deal with high-dimensional state spaces and variable selection. Hence, there is a need for a novel approach; we therefore propose a NB-CR-IoT technique using a Deep Reinforcement Learning algorithm (DQN) to overcome such drawbacks.

Regarding cognitive radio, a few studies have been done considering the shortage of wireless spectrum. The authors in [13] emphasized a basic understanding of IoT whose objective is to empower general objects to learn, think and understand the environment by effectively integrating the operational process of human cognition into the design of IoT, and thereby presented the cognitive processing techniques that lie at the heart of IoT. In [14] the authors highlight cognitive radio (CR) functionalities, especially spectrum sensing, and discuss how CR can be useful for the realization of a better IoT paradigm. The authors in [15] give a background on CR and IoT, survey novel approaches and then discuss research challenges related to the use of CR technology for IoT. In [16] the authors proposed a solution to mitigate internet connection congestion by using CR-enabled device-to-device communication. The results obtained indicate a 30% reduction in congestion.

3. Proposed model

In this section, the traditional LTE-A network is considered where NB-IoT devices transmit their data on their
allocated channels. However, due to the increased number of NB-IoT use cases, the spectrum allocated for NB-IoT
systems might not be sufficient to service the large number of devices. Therefore, the system model illustrated in Fig.
1 below is considered for opportunistic NB-IoT operation on other licensed channels.

Fig. 1: A Fully Distributed Learning Architecture of a NB-IoT Resource Allocation Scenario.

It is also assumed that the network topology changes are recorded and updated in a Q-table for comprehensive cognition [17]. A set K = {1, 2, · · · , K} of NB-IoT access points (APs) is uniformly distributed under the coverage of an LTE base station. Under the coverage of each NB-IoT AP, IoT devices are randomly distributed, seeking UL opportunistic transmission over an LTE licensed channel. Due to the opportunistic nature of the system illustrated in Fig. 1, it is assumed that the NB-IoT devices possess cognitive capabilities, which results in the NB-CR-IoT system.

3.1. Primary Network Assumptions

It is assumed that the primary network has a set M = {1, 2, · · · , M} of APs, each one having a set N = {1, 2, · · · , N} of channels. Under the coverage of each AP, there is a set K = {1, 2, · · · , K} of NB-CR-IoT devices, uniformly distributed according to a Poisson process and seeking uplink opportunistic transmission. Due to the opportunistic nature of the system, a primary channel can be sensed to be idle, which is represented by H0, or found to be busy, which is represented

by H1. Therefore, the probabilities of finding a channel in its idle state or in its busy state, respectively, are given as discussed in [18]:

\[ P(H_0) = \frac{\lambda_p - \mu_p}{\lambda_p}, \qquad P(H_1) = \frac{\mu_p}{\lambda_p}. \tag{1} \]

Here, λp is the PU arrival rate, which is modeled using a Poisson distribution, and µp represents the channel holding time, which denotes the expected time that the PU is present in its channel and is modeled as an exponential distribution.
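The probabilities in (1) can be sketched numerically; the arrival and holding values below are hypothetical, chosen only so that the two probabilities can be checked to sum to one.

```python
def channel_state_probs(lambda_p: float, mu_p: float):
    """Eq. (1): probability of finding a primary channel idle (H0) or busy (H1),
    given the PU arrival rate lambda_p and the channel holding parameter mu_p."""
    p_idle = (lambda_p - mu_p) / lambda_p
    p_busy = mu_p / lambda_p
    return p_idle, p_busy

# Hypothetical values: 10 PU arrivals per unit time, holding parameter 4
p0, p1 = channel_state_probs(10.0, 4.0)
print(p0, p1)  # 0.6 0.4 -- the two probabilities sum to 1
```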

3.2. Secondary Network Assumptions

The secondary network is assumed to use the preemptive resume identical (PRI-M/M/n) queueing model, with an exponential arrival process, an exponential service process, n channels discovered through spectrum sensing, and a first-come-first-served (FCFS) service policy. The NB-CR-IoT devices sense the wireless spectrum to ascertain channel status and, upon discovering an idle channel, arrive according to a Poisson process with parameter λs and are served with rate µs. In order to overcome contention for the same channel among multiple NB-IoT devices, a distributed channel selection scheme is adopted [19], which uses the same seed to generate a pseudo-random channel-selection sequence.
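The shared-seed idea behind the channel selection scheme of [19] can be sketched as follows: because every device seeds its generator identically, all devices reproduce the same channel-visiting sequence without exchanging messages. The function below is a hypothetical illustration, not the scheme's actual specification.

```python
import random

def channel_sequence(seed: int, n_channels: int, length: int) -> list:
    """Generate a pseudo-random channel-selection sequence from a shared seed.
    Every NB-CR-IoT device using the same seed obtains the identical sequence,
    which avoids contention for the same idle channel."""
    rng = random.Random(seed)
    return [rng.randrange(n_channels) for _ in range(length)]

# Two devices configured with the same seed agree on the sequence:
assert channel_sequence(42, n_channels=8, length=5) == channel_sequence(42, 8, 5)
```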

3.3. Average System Throughput

Average system throughput is the amount of data transmitted in a given time period T. If Ctot denotes the total number of channels discovered to be idle within T, then Cs denotes the number of channels successfully utilized by the NB-CR-IoT devices, which serves as a function of their satisfaction degree in terms of the overall spectral efficiency. Therefore, when the licensed system is not active in its channel and ignoring overheads, the system capacity can be defined as follows:

\[ \eta_k = C_s \log_2(1 + \gamma_k), \tag{2} \]

where γk is the signal-to-interference-plus-noise ratio (SINR) of the transmitted signal measured at the NB-CR-IoT device, defined as follows:

\[ \gamma_k = \frac{P_k |h_{k,k}|^2}{P_0 |h_{0,k}|^2 + \underbrace{\sum_{j \in K \setminus \{k\}} P_j |h_{j,k}|^2}_{\text{interference from other APs}} + \sigma_k^2}, \tag{3} \]

where Pk is the transmission power of the kth AP, Pj denotes the transmission power of the jth AP (excluding the kth), h j,k is the corresponding channel gain, and σk² denotes the variance of the additive white Gaussian noise.
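Equations (2) and (3) can be transcribed directly; all powers, channel gains and noise values below are hypothetical placeholders, since the paper does not publish its simulation traces.

```python
import math

def sinr(p_k, h_kk, p_0, h_0k, interferers, noise_var):
    """Eq. (3): SINR at the k-th NB-CR-IoT device. `interferers` is a list of
    (P_j, h_jk) pairs for the other APs, j in K \\ {k}."""
    interference = sum(p_j * abs(h_jk) ** 2 for p_j, h_jk in interferers)
    return p_k * abs(h_kk) ** 2 / (p_0 * abs(h_0k) ** 2 + interference + noise_var)

def capacity(c_s, gamma_k):
    """Eq. (2): capacity over C_s successfully utilised channels."""
    return c_s * math.log2(1 + gamma_k)

# Hypothetical link: unit transmit power, one interfering AP
g = sinr(p_k=1.0, h_kk=1.0, p_0=0.1, h_0k=0.5, interferers=[(0.2, 0.3)], noise_var=0.01)
print(capacity(c_s=4, gamma_k=g))
```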

3.4. Proposed DQN Solution

In this subsection, the resource allocation problem is solved using a DQN, an efficient method for analysing the problem and producing near-optimal results. In order to understand how a DQN operates, one needs to be familiar with RL and MDPs as already presented in [12]. For the sake of brevity, the DQN procedure was developed from the basis of Q-learning, which is a classic and commonly used RL algorithm. At this point we can describe the DQN

procedure for the resource allocation. Thus, a DQN is one of the most recent and most popular algorithms in RL,
developed through deep reinforcement learning (DRL). The basic operation of a DQN is illustrated in Fig. 2 below.

Fig. 2: Flowchart Depicting the Operating Principle of a DQN in Resource Allocation.

With reference to Fig. 2 above, the Q-function update rule using the temporal-difference technique is as follows:

\[ Q_{t+1}(s_t, a_t) \leftarrow Q_t(s_t, a_t) + \alpha \left[ \hat{Q}(s_t, a_t) - Q_t(s_t, a_t) \right], \tag{4} \]

where

\[ \hat{Q}(s_t, a_t) = r_t + \gamma \max_{a'} Q(s', a'). \tag{5} \]
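The tabular update of (4) and (5) can be sketched in a few lines; α = 0.5 and γ = 0.98 follow Table 2, while the states, actions and reward below are hypothetical.

```python
from collections import defaultdict

def q_update(Q, s, a, r, s_next, actions, alpha=0.5, gamma=0.98):
    """One temporal-difference update, Eqs. (4)-(5):
    target = r + gamma * max_a' Q(s', a');  Q(s,a) += alpha * (target - Q(s,a))."""
    target = r + gamma * max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])
    return Q[(s, a)]

Q = defaultdict(float)          # Q-table initialised to zero, as in Algorithm 1
q_update(Q, s=0, a=1, r=1.0, s_next=1, actions=[0, 1])
print(Q[(0, 1)])                # 0.5: alpha * (1.0 + 0.98 * 0 - 0)
```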
However, since this approach involves a large number of learning steps to reach optimality, it becomes impractical in situations with large state-action spaces. In this paper, a recent DRL advancement, namely a DQN with experience replay, which learns faster even in large state-action spaces, is employed. Instead of using a Q-table to update the Q-values, a deep neural network (DNN) is used as an efficient nonlinear approximator of the Q action-value function. Thus, in order to improve the learning performance, at each time-step the learning agent uses the experience tuple ei(t) = (si(t), ai(t), ri(t + 1), si(t + 1)) from the environment, which has been stored in the replay buffer D [22].
Moreover, in a DQN each agent utilizes two separate neural networks (NNs) as Q-network approximators: one as an action-value function approximator Q(s, a; θi) and the other as a target action-value function approximator Q̂(s, a; θi−), where θi and θi− denote the parameters (i.e., weights) of each NN. At each learning step, the parameters θi of each agent's action-value function are updated using mini-batches of entries sampled randomly from the replay memory buffer D, through a stochastic gradient descent (SGD) backward propagation algorithm driven by the error function, while the target parameters θi− are periodically copied from θi. The DQN algorithm thus combines Q-learning with NNs, using a deep neural network (DNN) as the Q-value network and the mean-square error (MSE) as the loss function:

\[ L(\theta) = \mathbb{E}\left[ \left( \hat{Q}(s, a; \theta^-) - Q(s, a; \theta) \right)^2 \right], \tag{6} \]

where θ− denotes the target network parameters and

\[ \hat{Q}(s, a; \theta^-) = r_t + \gamma \max_{a'} Q(s', a'; \theta^-). \tag{7} \]

This means that the loss function in (6) can be determined based on the second term of (4). To solve (4) and obtain the optimal Q-value, one needs to find its gradient with respect to the network parameters θ by training the NN using an SGD approach detailed in [23]. The loss function is used to update the network parameters using an SGD algorithm through back-propagation, which is derived according to the chain rule as follows:

\[ \frac{\partial L(\theta)}{\partial \theta} = \mathbb{E}\left[ \left( \hat{Q}(s, a; \theta^-) - Q(s, a; \theta) \right) \frac{\partial Q(s, a; \theta)}{\partial \theta} \right]. \tag{8} \]

The action selection procedure used in this paper is the Softmax action selection strategy defined in [22], combined with an ε-greedy approach: an action is randomly chosen from the action set A with probability ε; otherwise the action with the highest action-value is chosen. The experience replay component of the DQN utilizes stochastic prioritization, which generates the probability of choosing a certain transition for replay and is used to avoid a greedy prioritization of services. The network weights are also updated through backward propagation as follows:

\[ \theta_{t+1} \leftarrow \theta_t + \alpha \left[ \hat{Q}(s, a; \theta^-) - Q(s_t, a_t; \theta_t) \right] \nabla_{\theta_t} Q(s_t, a_t; \theta_t), \tag{9} \]

where Q̂ denotes the action values estimated by the second (target) network, which is updated less frequently for stability purposes. The procedure for resource allocation is outlined in Algorithm 1 below.
Algorithm 1: Deep Q-learning Algorithm for Resource Allocation in NB-CIoT Systems
Input: K, αt, γt, T, S, ε, θ
Output: π, r(s, a; θ)
01: Initialize Q(s, a) = 0, ∀s ∈ S and a ∈ A, γt = 0.98, ε = 0.1
02: For each NB-CIoT device do
03:   Randomly generate an initial state s1 = st
04:   Initialize replay memory buffer D
05:   For each time-slot t do
06:     Monitor the state-space S
07:     Select action at = arg maxa Q(st, a|θ)
08:     Independently execute the action, receive reward rk, observe the next state s′, and with probability ε receive a reject signal
09:     If reject is true then
10:       Rerun DQN to get new action a′
11:       If at ≠ a′ then
12:         Replace at with a′
13:       End If
14:     End If
15:     Set s′ = (st, at, st+1) and store transition (st, at, rt, s′) in D
16:     Sample a mini-batch of transitions from D
17:     Update the network using (6) and the network weights using (9)
18:   End For
19:   Update target action-value function Q̂ ← Q
20: End For
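Algorithm 1 can be sketched in Python as follows. A plain table of Q-values stands in for the neural network of Table 2 so that the control flow (ε-greedy selection at line 07, replay at lines 15-16, target synchronisation at line 19) stays visible; this is an illustration under those simplifications, not the paper's MATLAB implementation.

```python
import random

class SketchDQN:
    """Minimal stdlib sketch of Algorithm 1: epsilon-greedy selection, experience
    replay and a periodically synchronised target estimator."""

    def __init__(self, n_states, n_actions, alpha=0.5, gamma=0.98,
                 epsilon=0.1, batch=32, target_sync=1000, seed=1):
        self.rng = random.Random(seed)
        self.q = [[0.0] * n_actions for _ in range(n_states)]        # online estimator
        self.q_target = [row[:] for row in self.q]                   # target estimator
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.batch, self.target_sync = batch, target_sync
        self.replay, self.step = [], 0

    def act(self, s):
        """Line 07: explore with probability epsilon, otherwise exploit."""
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.q[s]))
        return max(range(len(self.q[s])), key=lambda a: self.q[s][a])

    def observe(self, s, a, r, s_next):
        """Lines 15-17: store the transition, then learn from a mini-batch."""
        self.replay.append((s, a, r, s_next))
        sample = self.rng.sample(self.replay, min(self.batch, len(self.replay)))
        for s_i, a_i, r_i, s_n in sample:
            target = r_i + self.gamma * max(self.q_target[s_n])      # Eqs. (5)/(7)
            self.q[s_i][a_i] += self.alpha * (target - self.q[s_i][a_i])
        self.step += 1
        if self.step % self.target_sync == 0:                        # line 19
            self.q_target = [row[:] for row in self.q]

# Toy interaction loop with a hypothetical 4-state, 2-channel environment
agent = SketchDQN(n_states=4, n_actions=2)
a = agent.act(0)               # pick a channel for state 0
agent.observe(0, a, 1.0, 1)    # feed back the observed reward
```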

4. Simulation Results

In this section, we present the simulation parameters, the performance evaluation results and a discussion of the proposed algorithms. The MATLAB™ simulation tool is used to evaluate the effectiveness of the proposed algorithms. It is assumed that the channel gains remain constant over the duration of each time-slot. Fig. 3 illustrates the performance evaluation of the belief that, based on the spectrum sensing results, a vacant channel will still be vacant in the next time-slot. It is assumed that NB-CR-IoT devices use enhanced blindly combined energy detection [24] to sense the licensed spectrum and detect vacant channels. The probability that the primary channel is idle at the beginning of the next time-slot is given by Bayes' rule [25].
The belief is plotted against increasing values of the probability of detection, where at the minimum value of the probability of detection, the belief that the next time-slot will see the channel vacant is the same as the current belief. However, as the value of the detection probability increases, the belief that the channel will still be free in the next time-slot also increases. For a low value of the current belief, the increase is quadratic, but as the current belief is increased, the belief of a vacant channel assumes an almost linear behavior. The reason for this behavior can be argued from the perspective of balancing prior information against new information, which is an application of partially observable MDPs (POMDPs). POMDPs combine an MDP with hidden Markov models to model system dynamics that connect unobserved system states to observations. In POMDPs, the agent can perform actions that affect the system and cause the system state to change, with the goal of maximizing a reward that depends on the sequence of system states and the agent's actions. The agent uses its observations to form a belief (i.e., a belief state) of the state in which the system currently is. This belief state is expressed as a probability distribution over the states, and the solution of the POMDP is a policy prescribing which action is optimal for each belief state.

Fig. 3: Belief that the primary channel will still be vacant in the next time-slot vs. probability of detection.
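The belief evolution described above can be illustrated with a one-step Bayesian update. The exact update used in [25] may differ; the formula below, which conditions the prior belief on a "channel sensed idle" observation given a detection probability p_d and a false-alarm probability p_f, is an assumption for illustration only.

```python
def updated_belief(prior_idle: float, p_d: float, p_f: float) -> float:
    """Posterior belief that the channel is idle after sensing it as idle,
    by Bayes' rule. p_d: probability of detecting a busy channel (a busy
    channel is mis-sensed as idle with probability 1 - p_d); p_f: probability
    of a false alarm on an idle channel."""
    evidence = prior_idle * (1 - p_f) + (1 - prior_idle) * (1 - p_d)
    return prior_idle * (1 - p_f) / evidence

# A higher detection probability strengthens the belief in a vacant channel,
# consistent with the trend described for Fig. 3:
print(updated_belief(0.5, p_d=0.6, p_f=0.1))
print(updated_belief(0.5, p_d=0.95, p_f=0.1))
```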

The NB-CR-IoT satisfaction degree is evaluated as a function of their increasing number in the network as illustrated
in Fig. 4 below.

Fig. 4: Illustration of NB-CR-IoT Satisfaction Degree.

The NB-CR-IoT satisfaction degree illustrated in Fig. 4 above has been evaluated on a scale of 1 to 5, where 1 represents poor satisfaction and 5 the maximum satisfaction. The exploration rate has been set to 0.1 for both compared algorithms, and the proposed DQN is seen to outperform the traditional Q-learning algorithm by a margin at the maximum number of NB-CR-IoT devices. Satisfaction is used in this paper as a quantitative measure of how the NB-CR-IoT devices perceive their network experience. According to the ITU-T recommendation P.800.1, for both compared algorithms the quality perception is "good" until the number of NB-CR-IoT devices reaches 8, after which it becomes "fair" up to the maximum number of devices in the system.
The achievable bit rates for both algorithms, also plotted as a function of an increasing number of NB-CR-IoT devices, are evaluated in Fig. 5 below.

Fig. 5: Illustration of NB-CR-IoT Achievable Bit Rates.

The achievable bit rates for NB-CR-IoT devices are illustrated in Fig. 5 above for both algorithms. The achievable bit rates using the proposed DQN technique are better by 7.41%, owing to the nature of DQNs, which offer an obvious improvement over the Q-learning technique in terms of state-space exploration. A summary of the basic simulation parameters for traditional signal processing optimization is tabulated in Table 1.

Simulation parameter                          Value

Carrier frequency, fc                         2.1 GHz
System bandwidth, ξ                           180 kHz
Primary network radius                        1000 m
Packet arrival process                        Poisson
Number of NB-IoT devices, K                   50
Number of licensed channels                   100
Number of primary systems                     10
Number of channel states                      3
Primary system SINR threshold, γ0(pu)         10 dB
Noise spectral density, σ                     -174 dBm/Hz
Symbol rate, fs = 1/Ts                        500e3 symbols/sec
Minimum packet arrival rate, λt               2 packets/time-slot

Table 1: Simulation Parameters.

The summary of the hyperparameters used in the proposed Q-network is tabulated in Table 2.

Hyperparameter                                Value

Hidden layer neurons                          256
Hidden layer activation function              Logistic sigmoid
Output layer activation function              Gibbs Softmax function
Exploration rate, ε                           0.1
Minibatch size                                32
Replay memory size                            10000
Target Q-network update frequency             1000
Finite iteration horizon, T                   2000
Learning rate, α                              0.5
Discount factor, γ                            0.98

Table 2: Simulation Hyperparameters.
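The hyperparameters of Table 2 fix the shape of the Q-network's forward pass: one 256-neuron logistic-sigmoid hidden layer followed by a Gibbs/Softmax output. The sketch below uses random placeholder weights purely to show that shape; the trained values would come from the SGD updates of Eq. (9), and the input dimension and action count are hypothetical.

```python
import math
import random

def q_network_forward(x, w1, b1, w2, b2):
    """Forward pass matching Table 2: logistic-sigmoid hidden layer,
    Gibbs/Softmax output layer producing action probabilities."""
    hidden = [1.0 / (1.0 + math.exp(-(sum(wi * xi for wi, xi in zip(row, x)) + b)))
              for row, b in zip(w1, b1)]
    logits = [sum(wi * hi for wi, hi in zip(row, hidden)) + b
              for row, b in zip(w2, b2)]
    exps = [math.exp(z - max(logits)) for z in logits]   # numerically stable softmax
    total = sum(exps)
    return [e / total for e in exps]

rng = random.Random(0)
n_in, n_hidden, n_actions = 8, 256, 4                    # 256 neurons per Table 2
w1 = [[rng.gauss(0, 0.1) for _ in range(n_in)] for _ in range(n_hidden)]
b1 = [0.0] * n_hidden
w2 = [[rng.gauss(0, 0.1) for _ in range(n_hidden)] for _ in range(n_actions)]
b2 = [0.0] * n_actions
probs = q_network_forward([0.5] * n_in, w1, b1, w2, b2)
print(sum(probs))  # the action probabilities sum to 1
```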

5. Conclusion

In this paper, a NB-CR-IoT system is proposed in which NB-IoT systems are enabled to reuse the licensed spectrum band in an opportunistic manner using cognitive radio techniques. The first part of this paper integrated CR techniques into the operation of NB-IoT for opportunistic spectrum access in distributed heterogeneous networks. Since in opportunistic spectrum access the safety of primary systems needs to be managed through spectrum sensing, the usable length of a vacant channel is reduced; thus, efficiency measures were used to derive the effective NB-CR-IoT capacity. The second part infused intelligence into the designed NB-CR-IoT system through the use of AI strategies, by formulating the resource allocation problem using MDPs, which was solved using conventional Q-learning. The results of this contribution indicate that NB-CR-IoT devices cognitively select available channels by using the interaction between Q-learning and the MDP. Since the conventional Q-learning technique suffers from over-estimation, a DQN strategy is proposed to overcome this drawback. The DQN operates using two networks that are synchronised from time to time to minimise the over-estimation of the conventional Q-learning technique. The results indicate that the DQN outperforms the traditional Q-learning algorithm.

6. Acknowledgment

This work is supported in part by the National Research Foundation of South Africa (Grant Number: 90604).
Opinions, findings and conclusions or recommendations expressed in any publication generated by the NRF supported
research are those of the author(s) alone, and the NRF accepts no liability whatsoever in this regard.

References

[1] R. Konduru, and M. R. Bharamagoudra, “Challenges and Solutions of Interoperability on IoT: How Far Have We Come in Resolving the IoT
Interoperability Issues,” International Conference On Smart Technologies For Smart Nation (SmartTechCon), Bangalore, pp. 572 - 576, 2017.
[2] K.F Muteba, K. Djouani, and T.O Olwal, “A comparative Survey Study on LPWA IoT Technologies: Design, considerations, challenges and
solutions,”Procedia Computer Science vol.155, pp. 636 - 641, 2019.
[3] 3GPP TR 45.820 v13.1.0. “Cellular System Support for Ultra-Low Complexity and Low Throughput Internet of Things,” Available: [Online],
November 2015.
[4] Y-J. Yu and J-K. Wang “Uplink Resource Allocation for Narrowband Internet of Things (NB-IoT) Cellular Networks,” APSIPA Annual Summit
and Conference, 2018.
[5] A. Hoglund et al., “Overview of 3GPP Release 14 Enhanced NB-IoT,” IEEE Network, vol. 31, no. 6, pp. 16 - 22, 2017.
[6] K.F Muteba, K. Djouani, and T.O Olwal, “Challenges and Solutions of Spectrum Allocation in NB-IoT Technology,” SATNAC, Ballito, KwaZulu-Natal, South Africa, Sept 2019.
[7] M. Chafii, F. Bader, and J. Palicot, “Enhancing Coverage in Narrowband-IoT Using Machine Learning,” IEEE Wireless Communications and
Networking Conference (WCNC), Barcelona, pp. 1 - 6, 2018.
[8] C. Yu, L. Yu, Y. Wu, Y. He, and Q. Lu, “Uplink Scheduling and Link Adaptation for Narrowband Internet of Things Systems,” IEEE Access,
vol. 5, pp. 1724 - 1734, 2017
[9] H. Malik, H. Pervaiz, M. Mahtab Alam, Y. Le Moullec, A. Kuusik and M. Ali Imran, “Radio Resource Management Scheme in NB-IoT
Systems,” IEEE Access, vol. 6, pp. 15051 - 15064, 2018.
[10] R. Boisguene, S. Tseng, C. Huang, and P. Lin, “A survey on NB-IoT Downlink Scheduling: Issues and Potential Solutions,” 13th International
Wireless Communications and Mobile Computing Conference (IWCMC), pp. 547 - 555, Valencia, 2017.
[11] J. Moon and Y. Lim, “A reinforcement learning approach to access management in wireless cellular networks,” Wireless Communications and
Mobile Computing ,2017.
[12] E. C. Santos, “A Simple Reinforcement Learning Mechanism for Resource Allocation in LTE-A Networks with Markov Decision Process and
Q-Learning,” arXiv:1709.09312v1 [cs.AI], 27 September 2017.
[13] Q. Wu, G. Ding, Y. Xu, S. Feng, Z. Du, J. Wang and K. Long, “Cognitive Internet of Things: A New Paradigm Beyond Connection,” Sensors, vol. 1, no. 2, pp. 129 - 143, April 2014.
[14] A. Khan, M. Rehmani, and A. Rachedi, “When Cognitive Radio meets the Internet of Things?,”IWCMC, pp. 469-474, September 2016.
[15] P. Rawat, K. Deep singh, and JM. Bonnin, “ Cognitive Radio for M2M and Internet of Things,”Comput. commun, 2016.
[16] M. Nitti, M. Murroni,M Fadda and L. Atzori, “Exploiting Social Internet of Things Features in Cognitive Radio,” IEEE Access, vol 4 pp.
9204-9212, 2016.
[17] S. Goudarzi,M. N. Kama, M. H. Anisi, S. A. Soleymani, F. Doctor, “Self-Organizing Traffic Flow Prediction with an Optimized Deep Belief
Network for Internet of Vehicles,” Sensors, vol. 18, pp. 3459, 2018.
[18] A. Shakeel, R. Hussain, A. Iqbal, I. L. Khan, Q. U. Hasan and S. A. Malik, “Spectrum Handoff based on Imperfect Channel State Prediction
Probabilities with Collision Reduction in Cognitive Radio Ad Hoc Networks,” Sensors, vol. 19, no. 4741, pp. 1 - 24, 31 October 2019.
[19] P. Thakur, A. Kumar, S. Pandit, G. Singh, and S. N. Satashia, “Performance Analysis of Cognitive Radio Networks Using Channel Prediction
Probabilities and Improved Frame Structure,” Digital Communications and Networks, vol. 4, Issue 4, pp. 287 - 295, November 2018.
[20] P. Yang, L. Li, J. Yin, H. Zhang, W. Liang, W. Chen, and Z. Han, “Dynamic Spectrum Access in Cognitive Radio Networks Using Deep
Reinforcement Learning and Evolutionary Game,” International Conference on Communications(ICC), Beijing, China, 16 - 18 August 2018.
[21] A. Azzouna, A. Guezmil, A. Sakly, and A. Mtibaa, “Resource Allocation for Multi-user Cognitive Radio Systems using Multi-agent Q-
Learning,” Procedia Computer Science, vol. 10, pp. 46 - 53, 2012.
[22] M. C. Hlophe and B. T. Maharaj, “QoE-Driven Resource Allocation for SUs with Heterogeneous Traffic using Deep Reinforcement Learning,” in Proceedings of the 2nd IEEE Wireless Africa Conference, Pretoria, South Africa, 18 - 20 August 2019.
[23] M. C. Hlophe and B. T. Maharaj, “Spectrum Occupancy Reconstruction in Distributed Cognitive Radio Networks Using Deep Learning”, IEEE
Access, vol. 7, no. 2, pp. 14294 - 14307, January 2019.
[24] M. C. Hlophe, B. T. Maharaj, and S. Hamouda, “Distributed Spectrum Sensing in Cognitive Radio Systems,” in Proceedings of the IEEE AFRICON, Cape Town, South Africa, 18 - 20 September 2017.
[25] V. Q. Do and I. Koo, “Learning Frameworks for Cooperative Spectrum Sensing and Energy-Efficient Data Protection in Cognitive Radio
Networks,” Applied Sciences, vol. 8, no. 722, pp. 1 - 24, 04 May 2018.
