DOI: 10.1007/s41650-018-0012-7
© Posts & Telecom Press and Springer Singapore 2018. Research paper.
Manuscript received Aug. 09, 2017; accepted Jan. 16, 2018.
S. Hao, H. Y. Zhang, M. K. Song. School of Computer, Wuhan University, Wuhan 430079, China.
This work is supported by the National Natural Science Foundation of China (No. 61772386) and the Guangdong Provincial Science and Technology Project (No. 2015B010131007).

Abstract—The mobile Ad Hoc network (MANET) is a self-organizing and self-configuring wireless network consisting of a set of mobile nodes. The design of efficient routing protocols for MANET has always been an active area of research. However, existing routing algorithms do not scale well enough to ensure route stability when the mobility and distribution of nodes vary with time. In addition, each node in a MANET has only limited initial energy, so energy conservation and balance must be taken into account. Within this dynamic network environment, an efficient routing algorithm should be not only stable but also energy-saving and energy-balanced. To address these problems, we propose a stable and energy-efficient routing algorithm for MANET based on learning automata (LA) theory. First, we construct a new node stability measurement model and define an effective energy ratio function. On that basis, we assign each node a weighted value, which is used as the iteration parameter for LA. Next, we construct an LA theory-based feedback mechanism for the MANET environment to optimize the selection of available routes, and we prove the convergence of our algorithm. The experiments show that our proposed LA-based routing algorithm for MANET achieves the best performance in route survival time, energy consumption, and energy balance, and acceptable performance in end-to-end delay and packet delivery ratio.

Keywords—MANET routing, stability measurement model, effective energy ratio function, learning automata theory, feedback mechanism, optimization

I. INTRODUCTION

The mobile Ad Hoc network (MANET) is composed of a set of self-organizing wireless mobile nodes that communicate without a fixed communication network infrastructure[1-5]. Owing to its flexible and dynamic nature, MANET has been widely used in areas such as military communications, disaster area communications, and emergency rescue. MANET has also been used to support vehicle communication by constructing the vehicular Ad Hoc network (VANET). To enhance communication quality, MANET needs efficient routing protocols. In this network, each node has stochastic mobility and distribution, causing the network topology to vary with time. It is therefore difficult to ensure the stability of selected routes. Considering the limited initial energy of each node, it is also important to reduce energy consumption and to balance residual energy. To date, many MANET routing algorithms have been proposed to enhance communication performance.

A. Motivation

Although a large amount of meaningful work has been conducted on the design of MANET routing, none of this work considers the impact of node distribution, which varies over time and affects route stability. Work that focuses on enhancing the stability of routing topology by using mobility prediction models (e.g., Refs. [6-15]) may therefore not fully account for mobility control. In addition, most existing MANET routing algorithms use traditional heuristic algorithms as their optimization methods (e.g., Refs. [16-22]); these may lack expansibility and offer minimal hand-tuning, resulting in relatively high computation costs in a dynamic environment. From the engineering application point of view, MANET has been widely used in many infrastructure and emergency communication applications. The new generation of MANET can use the non-orthogonal multiple access technique, which greatly enhances its performance. The existing problems and practical value discussed above motivated us to design a stable and energy-efficient routing algorithm for MANET.

B. Contributions

The major contributions of this paper can be summarized as follows.
2 Journal of Communications and Information Networks
1) We construct a new stability measurement model to measure the stability of relay nodes in a route, and we define an effective energy ratio function.
2) We introduce learning automata (LA) theory to optimize the process of route selection.
3) We construct a MANET environment feedback mechanism using LA theory. With the help of this feedback mechanism, we can choose the optimal path from the source node to the destination node; this path is more stable and ensures energy conservation and balance.
4) We provide a rigorous mathematical proof of the algorithm's convergence, which has not been done successfully in previous work.

C. Paper Outline

The rest of our paper is organized as follows. Related work on MANET routing algorithms is reviewed in section II. A brief overview of routing protocols and LA is presented in section III. The system model is presented in section IV. The proof of convergence is provided in section V. The simulation results and experimental analysis are presented in section VI. Finally, section VII concludes the paper.

II. RELATED WORK AND OUR SOLUTION

To the best of our knowledge, most current routing algorithms proposed for MANET focus mainly on enhancing routing stability (e.g., Refs. [6-13]). For example, in Ref. [7], B. An et al. proposed a mobility-based hybrid routing protocol. By dividing the network into several dynamic and adaptive teams, the mobility behavior of each team can be predicted by constructing a team mobility model. A hybrid routing protocol is then proposed for team communication. However, in a practical network environment, each node has independent stochastic mobility, and a team mobility model cannot reflect the changes within a team. If the relative mobility of nodes in a team is high enough, the accuracy of this model is necessarily reduced. In Ref. [8], A. Bentaleb et al. proposed a new mobility model based on the Doppler shift, which can estimate relative speed. This mobility model enables the design of a mobility-based clustering routing protocol. Although this method ensures that the selected route has relatively high stability, the nodes must periodically exchange HELLO packets with neighbor nodes, which increases their energy consumption. In Ref. [11], R. Suraj et al. used movement history and a genetic policy to construct a direction prediction adjacency matrix. This technique proposes a new approach to mobility prediction that does not depend on probabilistic methods and is based entirely on a genetic algorithm (GA). However, this method cannot ensure normalization, and it incurs relatively high computation costs (owing to the prediction adjacency matrix). In Ref. [12], T. Manimegalai et al. proposed a routing algorithm based on an animal communication strategy (ACS). Using ACS, the authors constructed an animal behavior mechanism to improve path construction and retention, enhancing route stability. This method is similar to the ant colony algorithm, which uses history information to adjust the relay nodes. The authors state that node density is very important for enhancing route topology and communication stability. This rule is modeled on the gregarious habits of animals: a node with a relatively high node density is chosen as the relay node, and the adjustment mechanism for the relay node follows the same rule. It should be stressed that, in this method, the source node needs to periodically send a request packet to the destination node.

In addition, many routing algorithms focus on improving other qualities (e.g., Refs. [16-22]), including energy consumption, end-to-end delay, and the packet delivery ratio. In a typical example[17], S. Chettibi et al. proposed an adaptive maximum-lifetime routing policy based on a reinforcement learning strategy. With the help of a heuristic principle (the Q-learning method), it optimizes the route lifetime and minimizes control overhead. In this method, the authors construct an intelligent battery model, which helps a node adjust its transmission power based on the dynamic environment. In Ref. [18], a network routing protocol based on the Russian Doll method is proposed. In this method, the routing protocol can choose the best multi-criteria solution to reduce energy consumption and enhance the throughput ratio. In Ref. [19], S. Chettibi et al. proposed a routing algorithm based on the ant colony optimization technique. Using the ant colony algorithm, the authors constructed an estimate model to represent route preference. In this method, the battery of each node is assumed to be intelligent enough to adjust its battery loss rate. The heuristic strategy proposed in that paper relies on this precondition, which may not fit all MANETs. Ref. [20] proposed selecting a stable route by using a fuzzy logic system. In Ref. [21], P. Srivastava et al. achieved QoS-aware routing by using an artificial neural network (ANN). This approach used a convolution calculation to achieve good routing QoS performance. In this algorithm, the optimization targets are the packet delivery ratio and end-to-end delay; for this reason, the parameters needed in the convolutional neural network (CNN) are the packet delivery ratio and end-to-end delay. It is worth noting that the convolution calculation model constructed in that paper can raise computation costs: because the authors used a two-layer CNN, the measurement parameters have to be computed multiple times. In addition, as a general rule, the computation accuracy and time of a CNN depend on the number of layers.

We point out the advantages and disadvantages of each
A Stable and Energy-Efficient Routing Algorithm Based on Learning Automata Theory for MANET 3
routing algorithm[6-22]. It should be noted that none of the works mentioned considers the impact of node distribution, which varies with time and also affects route topology stability (the route life cycle). In fact, the survival time of a route is influenced not only by the relative mobility between nodes but also by the distribution of nodes. Overall energy conservation and balance should also be taken into consideration. In addition to the above two points, it must be stressed that the traditional heuristic techniques and machine learning methods used to design MANET routing protocols generally lack expansibility. They offer minimal hand-tuning and incur relatively high computation costs. To resolve these problems, this paper proposes a stable and energy-efficient routing algorithm for MANET using LA theory. Compared with traditional machine learning methods and heuristic algorithms, LA theory has the following advantages: (1) LA theory is supported by a complete mathematical proof[23-26]; (2) LA theory is capable of global optimization at relatively low cost in a dynamic environment; (3) LA theory has good expansibility, which is needed to optimize large-scale MANET routing performance; (4) LA theory maps the computation space to a probability space, ensuring normalization. The optimization efficiency of traditional heuristic algorithms and machine learning methods (e.g., ACO, PSO, GA) relies on the construction of a heuristic function. In a dynamic environment, it is difficult to construct a good enough heuristic function. In addition, these methods cannot always ensure normalization. LA theory is generally used in bio-computation and stochastic system control, which can be regarded as dynamic environments. Owing to the dynamic features of MANET, we are able to use LA theory to find the optimal route among the available paths.

In our solution, we begin by constructing a new node stability measurement model and defining the effective energy ratio function. On this basis, we introduce a node-weighted value function, which is used as the iteration parameter. We then use LA theory to construct a MANET environment feedback model. In this feedback model, each node is equipped with a learning automaton, enabling it to take action by sensing the surrounding environment. Based on LA theory, this process can be represented through a rigorous linear probability iteration; in other words, each relay node in the available paths updates its weighted value according to the feedback signal, which represents the result of sensing the environment (we use a judging function to distinguish the type of feedback signal, as explained in part A of section IV). When the feedback signal is a reward signal, the node increases its weighted value; otherwise, it decreases its weighted value. Thus, the current node can decide which node should be chosen as the next hop from a group of available hop nodes. Accordingly, the path value defined in this feedback mechanism is increased or decreased (before executing this mechanism, we find all available paths from the source node to the destination node, as explained in part A of section IV). Because LA converges, we can finally choose the optimal path, i.e., the one with the highest path value among all available paths. The chosen path will be stable enough while ensuring overall energy saving and balance.

III. PRELIMINARIES AND BACKGROUND

In this section, we present a brief overview of routing protocols and some preliminary information on LA theory.

A. Overview of Routing Protocols

Based on how routing information is obtained and maintained[20,27], routing protocols can be divided into several categories. In general, we classify them into three kinds: proactive, reactive, and hybrid protocols. Proactive routing protocols periodically update routing messages so that routes are always available for data packet transmission. Reactive protocols initiate route discovery on demand: when the source node has data packets to send to a given destination node, it initiates route discovery by broadcasting a route request packet. On receiving the request packet, relay nodes rebroadcast it. This process continues until the request packet arrives at the destination node. Similar to a handshake mechanism, the destination node generates a route reply packet and sends it to the source node; in other words, the reply packet follows, in reverse, the route taken by the corresponding request packet. As a compromise, hybrid routing protocols combine these two approaches and can be used in networks with a hierarchical structure. Generally, proactive protocols cause more energy consumption, which degrades the network life cycle. Hybrid routing protocols use more control information than reactive protocols and require a hierarchical network structure.

B. Learning Automata

LA theory is a self-learning mechanism based on the theory of stochastic processes[23,26]. As an adaptive decision-making system, LA can enhance performance by using previous knowledge to choose the best action from a limited set of actions through repeated interactions with a random environment. Basic LA contains three key factors: a random environment, an automaton, and a feedback system. The automaton chooses actions, and the random environment responds to these actions by producing a feedback signal. Based on its effect on the automaton, the feedback signal is either a positive signal (reward signal) or a negative signal (penalty signal). Over time, the automaton learns from the feedback signal to find an optimal action (Fig. 1 shows the operating principle of LA).
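The description above stays qualitative, so the following Python sketch shows one concrete automaton learning from reward and penalty signals. It assumes the classical linear reward-inaction (L_R-I) scheme from the general LA literature, not the improved update factors this paper introduces later, and a stationary environment in which action i is penalized with a fixed probability c[i]. The function names, the step size a, and the penalty probabilities are hypothetical choices made for illustration.

```python
import random


def lri_update(p, chosen, rewarded, a):
    """Linear reward-inaction (L_R-I) update of an action-probability vector.

    On a reward, probability mass moves toward the chosen action with step
    size a; on a penalty, the vector is left unchanged (the "inaction" part).
    """
    if not rewarded:
        return list(p)
    return [pi + a * (1.0 - pi) if i == chosen else (1.0 - a) * pi
            for i, pi in enumerate(p)]


def average_penalty(p, c):
    """Average penalty M = sum_i c_i * p_i for action probabilities p."""
    return sum(ci * pi for ci, pi in zip(c, p))


def run_automaton(c, a=0.05, steps=5000, seed=1):
    """Simulate an L_R-I automaton in a stationary random environment.

    c[i] is the penalty probability of action i (hypothetical values).
    """
    rng = random.Random(seed)
    n = len(c)
    p = [1.0 / n] * n  # start as the pure-chance automaton
    for _ in range(steps):
        chosen = rng.choices(range(n), weights=p)[0]
        rewarded = rng.random() >= c[chosen]  # penalty occurs with probability c[chosen]
        p = lri_update(p, chosen, rewarded, a)
    return p


if __name__ == "__main__":
    c = [0.7, 0.2, 0.5]  # hypothetical penalty probabilities
    p = run_automaton(c)
    print("action probabilities:", [round(x, 3) for x in p])
    print("M0 (pure chance):", round(sum(c) / len(c), 3))
    print("M (after learning):", round(average_penalty(p, c), 3))
```

With a small step size, the probability vector usually concentrates on the action with the lowest penalty probability, so the learned average penalty falls below that of the pure-chance automaton. This is not guaranteed for every run, because L_R-I is only ε-optimal.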
Figure 1 Learning automata: the learning automaton outputs action α to the random environment (MANET), which returns feedback β.

Definition 1 (environment) The random environment is an object interacting with the automaton. Usually, we set E = {A, B, C} to describe the random environment, where A = {α1, α2, ..., αt} represents the limited set of inputs performed by the automaton, and αt represents the action in time slot t. Similarly, B = {β1, β2, ..., βt} represents a limited set of responses from the random environment, and βt represents the response from the environment in time slot t. C = {c1, c2, ..., ct} is the set of penalty probabilities associated with the given action α in time slot t.

Using the definition of ci, the average penalty M(t) can be defined by the following expression, where β(t) = 1 denotes a penalty:

$$
\begin{aligned}
M(t) &= E[\beta(t) \mid P(t)] \\
&= E[\beta(t) \mid p_1(t), p_2(t), \cdots, p_n(t)] \\
&= \Pr[\beta(t) = 1 \mid P(t)] \\
&= \sum_{i=1}^{n} \Pr[\beta(t) = 1 \mid \alpha(t) = \alpha_i] \cdot \Pr[\alpha(t) = \alpha_i] \\
&= \sum_{i=1}^{n} c_i \cdot p_i(t),
\end{aligned}
$$

where P(t) represents the action probability vector at instant t. Thus, the average penalty for the pure-chance automaton is expressed as

$$
M_0 = \frac{1}{n} \sum_{i=1}^{n} c_i .
$$

Definition 2 (automaton) In a general sense, an automaton is a working system that does not need guidance from the outside. From the mathematical point of view, the automaton can be defined as {A, B, Q, T, G} (A = {α1, α2, ..., αt} and B = {β1, β2, ..., βt} have been defined in Definition 1). Q = {q1(t), q2(t), ..., qn(t)} represents the state in time slot t. T: Q × A × B → Q denotes the state transition function of the automaton, which determines how the automaton moves to the next state from the current state and input. We usually use the following formula to represent the state transition from instant t to t + 1:

$$
q(t+1) = T\big(q(t), \alpha(t), \beta(t)\big).
$$

G determines the output based on the state at instant t:

$$
\alpha(t) = G\big(q(t)\big).
$$

IV. SYSTEM MODEL

In this section, we construct a new stability measurement model and define an effective energy ratio function. On this basis, we define the node-weighted value function, which can be used to construct a MANET environment feedback mechanism (a routing learning process).

A. Network Model

In general, a MANET can be described by an undirected graph G = (V, E), where V represents the set of nodes and E represents the set of edges[28,29]. Therefore, a path in the network can be regarded as a set of nodes connected to each other from the source node to the destination node. In our paper, all nodes exist in a 2D rectangular scenario and communicate through a common broadcast channel using omnidirectional antennas. They have the same transmission range r. Assuming that the distance between nodes i and j is represented as Dist(i, j) and that j is a neighbor node of i, Dist(i, j) is no more than r. Additionally, we do not consider the impact of an interference range or of collision avoidance on the shared wireless channel. These preconditions have been widely adopted in previous work.

B. Node Stability Measurement Model

The chosen relay node in an available path from the source node to the destination node should be stable enough to ensure path stability. Unlike previous studies, which used only node velocity and did not consider the impact of the distribution of neighbor nodes on stability, we measure node stability by estimating the average time a node remains connected to its neighbor nodes. Without any loss of generality, we assume that vi represents the velocity of node i in a small time slot, and that the velocity of the node remains unchanged during this short period. Hence, we can estimate the survival time ti,j of the link between node i and any of its neighbor nodes j using the following formula:

$$
r^2 = \left[(v_{ix} - v_{jx}) \cdot t_{i,j} + \mathrm{Dist}(i,j)_x\right]^2 + \left[(v_{iy} - v_{jy}) \cdot t_{i,j} + \mathrm{Dist}(i,j)_y\right]^2, \tag{1}
$$

where v()x and v()y represent the horizontal and vertical components of velocity, respectively; Dist()x and Dist()y represent the horizontal and vertical components of the current distance between node i and node j; and node j is a neighbor node of node i. Solving (1) for its positive root gives the expression of ti,j:

$$
t_{i,j} = \frac{\sqrt{H_b^2 - H_a \left(H_c - r^2\right)} - H_b}{H_a}, \tag{2}
$$
where Ha, Hb, Hc can be represented as

$$
\begin{aligned}
H_a &= (v_{ix} - v_{jx})^2 + (v_{iy} - v_{jy})^2, \\
H_b &= \mathrm{Dist}(i,j)_x \cdot (v_{ix} - v_{jx}) + \mathrm{Dist}(i,j)_y \cdot (v_{iy} - v_{jy}), \\
H_c &= [\mathrm{Dist}(i,j)_x]^2 + [\mathrm{Dist}(i,j)_y]^2.
\end{aligned} \tag{3}
$$

Thus, we can use a weighted average function to measure node i's stability (Nsi):

$$
Ns_i = \sum_{j=1}^{m} \pi\{\text{current distance} = \mathrm{Dist}(i,j)\} \cdot t_{i,j}, \qquad \sum_{j=1}^{m} \pi_{ij} = 1, \tag{4}
$$

where π{current distance = Dist(i, j)} represents the weight of ti,j, i.e., the limiting probability that the current distance between i and j is Dist(i, j). To give the expression of π{current distance = Dist(i, j)}, we first deduce the expression of Prob{Dist(i, j) < r} (r is the transmission range defined in part A of section IV).

Without loss of generality, the node mobility in this MANET is independent and stochastic. Thus, we can deduce the spatial PDF (probability density function) of any node in the MANET using the following uniform distribution:

$$
f(x, y) = \frac{1}{L \cdot W}, \quad x \in (0, L),\ y \in (0, W), \tag{5}
$$

where L and W represent the length and width of the MANET area, respectively. Therefore, the joint PDF of the horizontal and vertical distance components between nodes i and j can be represented as follows (note that the joint PDF depends on the setting of the network scenario):

$$
\begin{gathered}
f_{ij}(D_x, D_y) = \frac{4}{L^2 W^2} \cdot (L - D_x)(W - D_y), \\
\sqrt{D_x^2 + D_y^2} = \mathrm{Dist}(i,j) = D, \quad D_x = \mathrm{Dist}(i,j)_x, \quad D_y = \mathrm{Dist}(i,j)_y.
\end{gathered} \tag{6}
$$

Thus, Prob{Dist(i, j) ≤ r} can be expressed as

$$
\Pr\{\mathrm{Dist}(i,j) = D \le r\} =
\begin{cases}
\displaystyle \int_0^{r} \int_0^{\sqrt{r^2 - D_x^2}} f_{ij}(D_x, D_y)\, dD_y\, dD_x, & 0 < r \le W, \\[2ex]
\displaystyle \int_0^{\sqrt{r^2 - W^2}} \int_0^{W} f_{ij}(D_x, D_y)\, dD_y\, dD_x + \int_{\sqrt{r^2 - W^2}}^{r} \int_0^{\sqrt{r^2 - D_x^2}} f_{ij}(D_x, D_y)\, dD_y\, dD_x, & W < r < L, \\[2ex]
\displaystyle \int_0^{\sqrt{r^2 - W^2}} \int_0^{W} f_{ij}(D_x, D_y)\, dD_y\, dD_x + \int_{\sqrt{r^2 - W^2}}^{L} \int_0^{\sqrt{r^2 - D_x^2}} f_{ij}(D_x, D_y)\, dD_y\, dD_x, & L \le r \le \sqrt{L^2 + W^2}.
\end{cases} \tag{7}
$$

Simplifying the integral (7) using the element-integral method gives

$$
\Pr\{\mathrm{Dist}(i,j) = D \le r\} = \frac{4D}{L^2 W^2}\, f_{\mathrm{unit}}(D), \tag{8}
$$

where funit(D) can be written as

$$
f_{\mathrm{unit}}(D) =
\begin{cases}
\dfrac{\pi L W}{2} - L r - W r + \dfrac{r^2}{2}, & 0 < r \le W, \\[2ex]
L W \arcsin\dfrac{W}{r} + L\sqrt{r^2 - W^2} - \dfrac{W^2}{2} - L r, & W < r < L, \\[2ex]
L W \left(\arcsin\dfrac{W}{r} - \arccos\dfrac{L}{r}\right) + L\sqrt{r^2 - W^2} - \dfrac{W^2}{2} + W\sqrt{r^2 - L^2} - \dfrac{L^2}{2} - \dfrac{r^2}{2}, & L \le r \le \sqrt{L^2 + W^2}.
\end{cases} \tag{9}
$$

In general, the transmission range r is less than the width W. With the help of the spline interpolation method[30,31], we can deduce the expression of π{current distance = Dist(i, j)}:

$$
\pi\{\text{current distance} = \mathrm{Dist}(i,j) = D\} = \Pr\{\mathrm{Dist}(i,j) = D \le r < W\}^{-1} \cdot \left[\frac{2\pi}{LW} - \frac{8(L+W)D}{L^2W^2} + \frac{6D^2}{L^2W^2}\right]. \tag{10}
$$

Remark 1 D is a simplified representation of Dist() in the previous formulas. We obtain expression (10) through the spline interpolation method because it satisfies the Lipschitz condition[30,32].

By substituting formula (10) into formula (4), we can obtain Ns.

C. Effective Energy Ratio Function

In an available path, the chosen relay node should have enough power to ensure the transmission of packets. We use the effective energy ratio function (Er) to represent the energy level of the relay node:

$$
Er_i = \frac{Re_i}{Ie_i}, \tag{11}
$$

where Rei and Iei represent the residual energy and initial energy of node i, respectively. Thus, we can define the node-weighted value (Nw) as

$$
\begin{gathered}
Nw_i = \omega_1 \cdot Ns'_i + \omega_2 \cdot Er_i, \\
Ns'_i = \frac{Ns_i - Ns_{\min}}{Ns_{\max} - Ns_{\min}}, \\
\omega_1 + \omega_2 = 1,
\end{gathered} \tag{12}
$$

where ω() represents the weighting factors and Ns'i represents the normalized value of Nsi. To find the optimal route among all
continuous update process. For this reason, we can present the generic update process of the node-weighted value as follows:

$$
Nw'(t+1) = E[Nw(t+1) \mid Nw(t)] - Nw(t). \tag{19}
$$

Based on the Kolmogorov criterion[23], formula (19) can be represented as

$$
\begin{aligned}
Nw'_{ij}(t) &= \Lambda \cdot Nw_{ij}(t)\,[1 - Nw_{ij}(t)]\,E[\beta_{ij}(t)] - \Lambda \cdot \sum_{q \ne j} Nw_{iq}(t)\, Nw_{ij}(t)\, E[\beta_{iq}(t)] \\
&= \Lambda \cdot Nw_{ij}(t) \sum_{q \ne j} Nw_{iq}(t)\,\big[E[\beta_{ij}(t)] - E[\beta_{iq}(t)]\big],
\end{aligned} \tag{20}
$$

where the second equality uses Σq Nwiq(t) = 1, i.e., 1 − Nwij(t) = Σq≠j Nwiq(t).

Based on the definition in LA theory, Λ represents the generic maximum update rate in our improved LA. Generally, the update rate is around 0.10 (in our paper, we set δ = 0.10 and ε = 0.10). Thus, by differentiation, we reach the following conclusion:

$$
\frac{\partial \sum_{q} Nw_{iq}\, E[\beta_{iq}(t)]}{\partial Nw_{ij}} = E[\beta_{ij}(t)]. \tag{21}
$$

Let Q(Nw) = Σq Nwiq E[βiq(t)], so that, by (21), ∂Q(Nw)/∂Nwij = E[βij(t)]. Then Nw'ij(t) can be represented as

$$
\begin{aligned}
Nw'_{ij}(t) &= \Lambda\, Nw_{ij}(t) \cdot \sum_{q \ne j} Nw_{iq} \left[\frac{\partial Q(Nw)}{\partial Nw_{ij}} - \frac{\partial Q(Nw)}{\partial Nw_{iq}}\right], \\
Nw'_{ij}(t) &= \mathrm{function}\big(Nw(t)\big).
\end{aligned} \tag{22}
$$

Clearly, Nw'ij is not a direct function of the time slot t. This conclusion allows us to use the spline interpolation method[31] to rewrite the node-weighted value Nw(t) as

$$
Nw(t) = P_\Lambda(\tau), \tag{23}
$$

where τ ∈ [nΛ, (n + 1)Λ] and PΛ(τ) is a piecewise constant interpolation function (in LA theory, P(t) represents the action probability at instant t). We now only need to consider the convergence of PΛ(·).

Generally, the generic maximum update rate Λ is close to 0[23]. Hence, based on approximation theory[33], we can assert that PΛ(τ) weakly converges to the solution set of an ordinary differential equation[30], which can be represented as

$$
\begin{aligned}
\frac{dx_{ij}}{d\tau} &= x_{ij}(\tau) \sum_{q \ne j} x_{iq} \left[\frac{\partial Q(x)}{\partial x_{ij}} - \frac{\partial Q(x)}{\partial x_{iq}}\right], \\
x(0) &= P_\Lambda(0),
\end{aligned} \tag{24}
$$

where x(·) denotes Nw(·) in our study. It must be noted that formula (24) is a particular case of the weak convergence theorem, as it relies on the fact that P(t), β(t), α(t) constitute a Markov process. In addition, β(t) and α(t) (defined in Definition 1) take values in a compact metric space. The outputs of the LA come from a finite set, and the reinforcement signals take values from the closed interval [0, 1]. It is important to note that, in our proposed algorithm, α(t) represents the result of sensing the environment, which carries two types of information, good and bad. Hence, α(t) takes values in a finite set. Similarly, β(t) represents the feedback mechanism for the relay nodes, which can also be represented as two probability iteration formulas. For this reason, β(t) in our paper also takes values in a finite set. In addition, the computation space is defined in [0, 1], which meets the condition.

In our routing algorithm, we have improved the update factor using formula (17). It should be stressed here that a(t) and b(t) are represented as two exponential functions. We can easily find that

$$
\max\{a(t)\} = \delta, \qquad \max\{b(t)\} = \varepsilon. \tag{25}
$$

Therefore, we can confirm that the update rate is in [0, 1], which also meets the condition. We find that formula (24) (an ordinary differential equation) has a unique solution determined by the initial value x(0).

In this proof, j represents the chosen next hop node of node i, and q represents the ID of a hop node not chosen by node i. Hence, Nwij represents the edge between nodes i and j, which is chosen by selecting the next hop node j from the current node i. The value of Nwij is equal to the current node-weighted value Nwj. The generic maximum update rate Λ can have two parameters: a reward coefficient and a penalty coefficient. Thus, we are able to provide a convergence proof using LA theory.

VI. SIMULATION RESULTS OF THE PROPOSED ALGORITHM

This section evaluates the performance of the proposed algorithm. Using NS-3[34,35], we compare our proposed algorithm (LASEERA) with ACSRA (Ref. [12]), ANNQARA (Ref. [21]), and classical AODV.

A. Simulation Parameters

Tab. 1 shows the network parameters.

Our experiment uses the Random Waypoint model to describe node motion, and the standard energy module parameters of NS-3 (Power End, Power Start, Current A) to describe the energy consumed by transmission. It is important to stress that the parameters δ and ε must be neither too large nor too small. Like the memory factor of a Markov chain, these two parameters determine the maximum update rate of the LA. If these parameters are too large, the optimization process rate
[Figure: average residual energy (J) vs. maximum velocity (m/s) for LASEERA, ANNQARA, ACSRA, and AODV]

has lower energy variance than ACSRA, ANNQARA, or AODV. In fact, our proposed routing algorithm's energy variance is on average 25.5%, 24.9%, and 34.9% lower than that of ACSRA, ANNQARA, and AODV, respectively. This means that our proposed routing algorithm performs better in balancing energy. In addition, the energy variance decreases as the number of nodes increases, although the relationship is not strictly linear.

[Figure: energy variance for LASEERA, ANNQARA, ACSRA, and AODV]

Figure 9 Energy variance vs. velocity (energy variance vs. maximum velocity in m/s for LASEERA, ANNQARA, ACSRA, and AODV)
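The comparison above reports energy variance and averaged percentage reductions without spelling out the arithmetic. The following minimal sketch assumes that energy variance is the population variance of per-node residual energy at the end of a run, and that each reported average is the mean of point-wise relative differences across the simulation sweep. Neither definition is stated in this section, and all data values are hypothetical.

```python
from statistics import mean, pvariance


def energy_variance(residual_energies):
    """Population variance of per-node residual energy (assumed metric definition)."""
    return pvariance(residual_energies)


def average_relative_reduction(ours, baseline):
    """Mean percentage by which `ours` lies below `baseline` across matched sweep points."""
    if len(ours) != len(baseline) or not ours:
        raise ValueError("need two non-empty series of equal length")
    return 100.0 * mean((b - o) / b for o, b in zip(ours, baseline))


if __name__ == "__main__":
    # Hypothetical residual energies (J) of five nodes at the end of one run.
    print(energy_variance([0.82, 0.79, 0.85, 0.80, 0.84]))
    # Hypothetical energy-variance curves over a velocity sweep.
    ours = [0.30, 0.32, 0.35]
    other = [0.40, 0.43, 0.47]
    print(round(average_relative_reduction(ours, other), 1))
```

A lower variance means residual energy is spread more evenly across nodes, which is the sense in which the text calls an algorithm better at balancing energy.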
is the reason why our algorithm has the smallest energy variance

the route contains more relay nodes when the number of nodes increases. The simulation results confirm this. We also find that AODV has a relatively high delay, which means that its original strategy does not achieve the shortest delay. It is interesting to discuss the delay performance of our algorithm and ANNQARA. As mentioned in the related work, ANNQARA uses the end-to-end delay as a parameter to construct its convolution calculation model (a two-layer CNN). Hence, it is to be expected that ANNQARA's delay performance is better than that of the other routing algorithms. Owing to the stability model constructed

relation to the number of nodes, when the maximum velocity is 10 m/s. We can see that our proposed routing algorithm (LASEERA) has a slightly lower packet delivery ratio than ACSRA and ANNQARA. In fact, our proposed routing algorithm's packet delivery ratio is on average 2.0% and 1.1% lower than those of ACSRA and ANNQARA, respectively. Compared with AODV, our routing algorithm's packet delivery ratio is 5.5% higher.

Figure 13 Packet delivery ratio vs. velocity (maximum velocity in m/s; LASEERA, ANNQARA, ACSRA, and AODV)

Figure 14 Control overhead vs. nodes (number of nodes; LASEERA, ANNQARA, ACSRA, and AODV)
need n − 2 nodes to construct a route between the source node and the destination node. These relay nodes are within the same transmission range. Based on the optimization strategy in our algorithm, each node needs to ascertain the current state of its neighbor nodes and then judge the type of feedback. In this way, a node needs to communicate with n − 3 neighbor nodes. Therefore, a one-time feedback process requires (n − 2)(n − 3) communications. As mentioned above, the optimization process can be finished in a finite time period (a finite number of iterations). Assuming that the number of iterations is M, the total number of communications will be M(n − 2)(n − 3), that is, O(n²) for a fixed M. We note that, in practical terms, the computational complexity would normally be less than that shown in this example. To measure the computational complexity of our algorithm in practice, we use the control overhead as the evaluation metric.

Fig. 14 shows the impact of the number of nodes on the amount of control overheads. It is clear that AODV has the smallest amount of control overheads, while ANNQARA and ACSRA have higher control overheads than our routing algorithm. This means that, as a learning-based routing protocol, our routing algorithm has an acceptable amount of control overheads, which are 22.7% lower than those of ANNQARA and 25.2% lower than those of ACSRA. By contrast, they are 38.3% higher than those of AODV.

Analyzing the simulation results shows that AODV has the smallest amount of control overheads. This is because AODV does not use any additional optimization method to optimize the chosen route, which reduces the control overheads. As mentioned previously, ANNQARA uses a two-layer CNN, which increases computation costs. For this reason, ANNQARA's control overheads are larger than those of our routing algorithm. ACSRA needs periodic control packets to adjust its relay nodes, which also increases control overheads. Its control overheads are therefore larger than those of our routing algorithm. The above results confirm that MANET routing algorithms that use optimization methods incur higher control overheads and therefore higher computation costs.

Fig. 15 shows the impact of velocity on control overheads. Again, AODV has the smallest amount of control overheads, while ANNQARA and ACSRA have higher control overheads than our routing algorithm. This means that, as a learning-based routing protocol, our routing algorithm has an acceptable amount of control overheads: 26.6% lower than ANNQARA and 27.5% lower than ACSRA. By contrast, our proposed algorithm's control overheads are 30.2% higher than those of AODV.

Figure 15 Control overhead vs. velocity (normalized control overhead vs. maximum velocity in m/s; LASEERA, ANNQARA, ACSRA, and AODV)

Analyzing the simulation results shows that control overheads increase with velocity. The reason is easy to understand: increasing velocity reduces route stability, and frequent route reconstruction requires more control packets. Hence, control overheads increase.

The simulation results above offer the following generalized findings:
(1) Our proposed routing algorithm (LASEERA) has the best performance in route survival time, energy consumption, and energy balance.
(2) The intelligent algorithms (LASEERA, ANNQARA, and ACSRA) incur additional costs to control their optimization strategies, which increases their control overheads. Compared with the other intelligent algorithms (ANNQARA and ACSRA), our algorithm has acceptable control overhead.
(3) No algorithm can optimize all of the metrics without paying some additional cost. The optimization strategy determines which metrics it can optimize, and an efficient optimization strategy requires that its additional costs remain acceptable.
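The communication count from the complexity analysis, $M(n-2)(n-3)$ for $M$ feedback iterations in which each of the $n-2$ relay nodes exchanges feedback with $n-3$ neighbors, makes the cost side of finding (2) concrete. The following Python sketch simply evaluates that expression; the function name `feedback_communications` and the iteration bound `M = 10` are hypothetical choices for illustration, not values used in the simulations.

```python
def feedback_communications(n: int, iterations: int) -> int:
    """Evaluate the estimate M(n - 2)(n - 3) for a route with n nodes.

    Assumes, as in the complexity analysis, n - 2 relay nodes that each
    communicate with n - 3 neighbors per feedback round. Illustrative only.
    """
    if n < 3:
        raise ValueError("the estimate needs at least 3 nodes")
    if iterations < 0:
        raise ValueError("iterations must be non-negative")
    relays = n - 2
    neighbors_per_relay = n - 3
    return iterations * relays * neighbors_per_relay


if __name__ == "__main__":
    M = 10  # hypothetical iteration bound, not a value from the paper
    for n in (20, 40, 80):
        count = feedback_communications(n, M)
        # count / n**2 approaches M as n grows, i.e. O(n^2) for a fixed M.
        print(n, count, round(count / n**2, 2))
```

With $M$ fixed, the printed ratio of the count to $n^2$ approaches $M$ as $n$ grows, which is the $O(n^2)$ behavior stated in the analysis; the actual number of iterations would depend on how quickly the automata converge.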
VII. CONCLUSION AND FUTURE WORK

In this paper, we proposed an energy-efficient, stable routing algorithm based on LA theory for MANET and presented the research steps clearly. First, we constructed a new node stability measurement model and defined an effective energy ratio function; together, these were used to define the node-weighted value. Second, we constructed a feedback mechanism for the MANET environment, in which each node is equipped with a learning automaton that executes an optimization process and updates the node's own weighted value according to different feedback signals, which are generated by sensing the node's ambient network environment. In this process, we also improved the basic LA so that each node can sense variations in feedback signal strength over time. In addition, we provided a rigorous mathematical proof of the convergence of our proposed routing algorithm, which earlier studies have not done well. Simulation experiments showed that our routing algorithm has the best performance in route survival time, energy consumption, and energy balance, and achieves acceptable performance in end-to-end delay and packet delivery ratio.
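The feedback mechanism summarized above follows the general learning-automaton pattern: a node keeps a probability vector over its possible actions, samples one, and reinforces it when the environment returns a favorable signal. As a point of reference only, the Python sketch below implements the textbook linear reward-inaction ($L_{R-I}$) scheme, not the improved LA proposed in this paper. Treating actions as candidate neighbors, the learning rate of 0.05, and the fixed reward probabilities are all hypothetical assumptions.

```python
import random


class LearningAutomaton:
    """Textbook linear reward-inaction (L_R-I) automaton over r actions.

    Illustrative only; this is NOT the improved LA proposed in the paper.
    Hypothetically, each action stands for one candidate neighbor node.
    """

    def __init__(self, num_actions: int, learning_rate: float = 0.05) -> None:
        if num_actions < 1 or not 0.0 < learning_rate < 1.0:
            raise ValueError("need >= 1 action and 0 < learning_rate < 1")
        self.a = learning_rate
        # Start from a uniform action-probability vector.
        self.p = [1.0 / num_actions] * num_actions

    def choose(self, rng: random.Random) -> int:
        # Sample an action index according to the current probabilities.
        return rng.choices(range(len(self.p)), weights=self.p, k=1)[0]

    def update(self, action: int, rewarded: bool) -> None:
        # L_R-I: on reward, shift probability mass toward the chosen action;
        # on penalty, leave the vector unchanged ("inaction").
        # The update keeps the probabilities summing to 1.
        if not rewarded:
            return
        for j in range(len(self.p)):
            if j == action:
                self.p[j] += self.a * (1.0 - self.p[j])
            else:
                self.p[j] *= 1.0 - self.a


def simulate(reward_probs, steps=5000, seed=1):
    """Run one automaton against a stationary, hypothetical environment."""
    rng = random.Random(seed)
    la = LearningAutomaton(len(reward_probs))
    for _ in range(steps):
        action = la.choose(rng)
        la.update(action, rng.random() < reward_probs[action])
    return la.p


if __name__ == "__main__":
    # Hypothetical reward probabilities for three candidate neighbors.
    print([round(x, 3) for x in simulate([0.3, 0.8, 0.5])])
```

In the paper's setting, the reward signal would instead be derived from the node's sensed ambient network environment, and the improved LA additionally tracks variations in feedback signal strength over time, which this stationary sketch does not model.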
In future work, the following research directions are worth pursuing:
(1) Extending this algorithm to a layered network structure.
(2) Improving this algorithm for energy-harvesting MANET scenarios.
(3) Designing a QoS cross-layer routing algorithm to extend our current work.
ABOUT THE AUTHORS

Sheng Hao was born in Lanzhou. He received his B.E. and M.S. degrees in computer science and technology from Wuhan University. He is now a Ph.D. candidate in architecture. His research interests include wireless networks, communication theory, and complex network theory. (Email: 2008301500139@whu.edu.cn)

Huyin Zhang [corresponding author] was born in Shanghai. He received his Ph.D. degree in computer science and technology from Wuhan University. He is a professor at Wuhan University. His research interests include high performance computing, network quality of service, and new generation network architecture. (Email: zhy2536@whu.edu.cn)

Mengkai Song was born in Wuhan. He received his undergraduate degree in computer science and technology from Wuhan University. He is now a graduate student in computer networks. His research interests include wireless networks and differential privacy. (Email: mksong@whu.edu.cn)