
Journal of Communications and Information Networks, 2018
DOI: 10.1007/s41650-018-0012-7
© Posts & Telecom Press and Springer Singapore 2018                                         Research paper

A Stable and Energy-Efficient Routing Algorithm Based on Learning Automata Theory for MANET

Sheng Hao, Huyin Zhang*, Mengkai Song
* Corresponding author, Email: zhy2536@whu.edu.cn

Abstract—The mobile Ad Hoc network (MANET) is a self-organizing and self-configuring wireless network, consisting of a set of mobile nodes. The design of efficient routing protocols for MANET has always been an active area of research. In existing routing algorithms, however, the current work does not scale well enough to ensure route stability when the mobility and distribution of nodes vary with time. In addition, each node in MANET has only limited initial energy, so energy conservation and balance must be taken into account. An efficient routing algorithm should not only be stable but also energy saving and balanced, within the dynamic network environment. To address the above problems, we propose a stable and energy-efficient routing algorithm based on learning automata (LA) theory for MANET. First, we construct a new node stability measurement model and define an effective energy ratio function. On that basis, we give the node a weighted value, which is used as the iteration parameter for LA. Next, we construct an LA theory-based feedback mechanism for the MANET environment to optimize the selection of available routes and prove the convergence of our algorithm. The experiments show that our proposed LA-based routing algorithm for MANET achieves the best performance in route survival time, energy consumption, and energy balance, and acceptable performance in end-to-end delay and packet delivery ratio.

Keywords—MANET routing, stability measurement model, effective energy ratio function, learning automata theory, feedback mechanism, optimization

Manuscript received Aug. 09, 2017; accepted Jan. 16, 2018.
S. Hao, H. Y. Zhang, M. K. Song. School of Computer, Wuhan University, Wuhan 430079, China.
This work is supported by the National Natural Science Foundation of China (No. 61772386) and the Guangdong provincial science and technology project (No. 2015B010131007).

I. INTRODUCTION

The mobile Ad Hoc network (MANET) is composed of a set of self-organized wireless mobile nodes, communicating without a fixed communication network infrastructure[1-5]. Owing to its flexible and dynamic nature, MANET has been widely used in areas such as military communications, disaster area communications, and emergency rescues. MANET has also been used to ensure vehicle communication by constructing the vehicular Ad Hoc network (VANET). To enhance communication quality, MANET needs efficient routing protocols. In this network, each node has stochastic mobility and distribution, causing the network topology to vary with time. It is therefore difficult to ensure the stability of selected routes. Considering the limited initial energy of each node, it is important to reduce energy consumption and also to balance residual energy. To date, many MANET routing algorithms have been proposed to enhance communication performance.

A. Motivation

Although a large amount of meaningful work has been conducted on the design of MANET routing, none of this work considers the impact of node distribution, which varies over time and affects route stability. Work that focuses on enhancing the stability of routing topology and using mobility prediction models (e.g., Refs. [6-15]) may therefore not fully consider mobility control. In addition, most existing MANET routing algorithms use traditional heuristic algorithms as their optimization methods (e.g., Refs. [16-22]); these may lack expansibility and offer minimal hand-tuning, resulting in relatively high computation costs in a dynamic environment. From the engineering application point of view, MANET has been widely used in many infrastructure and emergency communication applications. The new generation of MANET can use the non-orthogonal multiple access technique, which greatly enhances its performance. The existing problems and practical value discussed above motivated us to design a stable and energy-efficient routing algorithm for MANET.

B. Contributions

The major contributions of this paper can be summarized as follows.
1) We construct a new stability measurement model to measure the relay node stability in a route and define an effective energy ratio function.
2) We introduce learning automata (LA) theory to optimize the process of route selection.
3) We construct a MANET environment feedback mechanism using LA theory. With the help of this feedback mechanism, we can choose the optimal path from the source node to the destination node, which will be more stable and will ensure energy conservation and balance.
4) We provide a rigorous and skillful mathematical proof to authenticate the algorithm's convergence, which has not been done successfully in previous work.

C. Paper Outline

The rest of our paper is organized as follows. The related work on MANET routing algorithms is reviewed in section II. A brief overview of routing protocols and LA is presented in section III. The system model is presented in section IV. The proof of convergence is provided in section V. The simulation results and experiment analysis are presented in section VI. Finally, a conclusion summarizes these findings in section VII.

II. RELATED WORK AND OUR SOLUTION

To the best of our knowledge, most current routing algorithms proposed for MANET focus mainly on enhancing routing stability (e.g., Refs. [6-13]). Typically, in Ref. [7], B. An et al. proposed a mobility-based hybrid routing protocol. By dividing the network into several dynamic and adaptive teams, the mobility behavior of each team can be predicted by constructing a team mobility model. A hybrid routing protocol is then proposed for team communication. However, in a practical network environment, each node has independent stochastic mobility; a team mobility model cannot reflect the changes in a team. If the relative mobility of nodes in a team is high enough, the accuracy of this model must be reduced. In Ref. [8], A. Bentaleb et al. proposed a new mobility model, based on the Doppler shift, which can estimate relative speed. Using this mobility model enables the design of a mobility-based clustering routing protocol. Although this method ensures that the selected route has relatively high stability, the nodes must periodically exchange HELLO packets with neighbor nodes, which, of course, increases the energy consumption of the nodes. In Ref. [11], R. Suraj et al. used movement history and a genetic policy to construct a direction prediction adjacency matrix. This technique proposes a new approach to mobility prediction, which does not depend on probabilistic methods and is completely based on a genetic algorithm (GA). However, it must be noted that this method cannot ensure normalization; it also incurs relatively high computation costs (owing to the prediction adjacency matrix). In Ref. [12], T. Manimegalai et al. proposed an animal communication strategy (ACS) based routing algorithm. By using ACS, the authors constructed an animal behavior mechanism to improve path construction and retention, enhancing route stability. This method is similar to the ant colony algorithm, which uses history information to adjust the relay nodes. In this method, the authors declared that the density of the node is very important for enhancing route topology and communication stability. This rule is modeled on the gregarious habits of animals. Therefore, a node that has a relatively high node density is chosen as the relay node. The adjustment mechanism for the relay node also follows this rule. It should be stressed that, in this method, the source node needs to periodically send a request packet to the destination node.

In addition, many routing algorithms focus on improving other qualities (e.g., Refs. [16-22]), including energy consumption, end-to-end delay, and the packet delivery ratio. In a typical example[17], S. Chettibi et al. proposed an adaptive maximum-lifetime routing policy based on a reinforcement learning strategy. With the help of the heuristic principle (Q-learning method), it optimizes the route lifetime and minimizes control overhead. In this method, the author constructs an intelligent battery model, which can help the node adjust its transmission power, based on the dynamic environment. In Ref. [18], a network routing protocol is proposed, based on the Russian Doll method. In this method, the routing protocol can choose the best multi-criteria solution to reduce energy consumption and enhance the throughput ratio. In Ref. [19], S. Chettibi et al. proposed a routing algorithm based on the ant colony optimization technique. By using the ant colony algorithm, the authors constructed an estimate model to represent route preference. In this method, the battery equipped with a node is sufficiently intelligent to adjust its battery loss rate. The heuristic strategy proposed in that paper relies on this precondition, which may not fit all MANETs. Ref. [20] proposed to select a stable route by using a fuzzy logic system. In Ref. [21], P. Srivastava et al. achieved QoS-aware routing by using an artificial neural network (ANN). This approach used a convolution calculation to achieve good performance in routing QoS (in this algorithm, the optimization targets are the packet delivery ratio and end-to-end delay; for this reason, the parameters needed in the convolutional neural network (CNN) are the packet delivery ratio and end-to-end delay). It is worth noting that the convolution calculation model constructed in that paper can raise computation costs (as the authors have used a two-layer CNN, the measurement parameters have had to be computed multiple times). In addition, as a general rule, the computation accuracy and time of a CNN always depend on the number of layers.
We point out the advantages and disadvantages of each routing algorithm[6-22]. It should be noted that none of the works mentioned considers the impact of node distribution, which varies with time and also affects route topology stability (the route life cycle). In fact, the survival time of a route is influenced not only by the relative mobility between nodes but also by the distribution of nodes. Overall energy conservation and balance should also be taken into consideration. In addition to the above two points, it must be stressed that the traditional heuristic techniques and machine learning methods used to design MANET routing protocols generally lack expansibility. They allow minimal hand-tuning and incur relatively high computation costs. To resolve these existing problems, this paper proposes a stable and energy-efficient routing algorithm for MANET using LA theory. Compared with traditional machine learning methods and heuristic algorithms, LA theory has the following advantages: (1) LA theory is supported by a complete mathematical proof[23-26]; (2) LA theory is capable of global optimization and results in relatively low costs in a dynamic environment; (3) LA theory has good expansibility, which is needed to optimize large-scale MANET routing performance; (4) LA theory can map the computation space to a probability space, ensuring normalization. The optimization efficiency of traditional heuristic algorithms and machine learning methods (e.g., ACO, PSO, GA) relies on the construction of a heuristic function. Considering the dynamic environment, it is difficult to construct a good enough heuristic function. In addition, these methods cannot always ensure normalization. As a general rule, LA theory is used in bio-computation and stochastic system control, which can be regarded as a dynamic environment. Owing to the dynamic features of MANET, we are able to use LA theory to find the optimal route from the available paths.

In our solution, we begin by constructing a new node stability measurement model and defining the effective energy ratio function. On this basis, we introduce a node-weighted value function, which is used as the iteration parameter. We then use LA theory to construct a MANET environment feedback model. In this feedback model, each node is equipped with a learning automaton, enabling it to take action by sensing the surrounding environment. Based on LA theory, this process can be represented through a rigorous linear probability iteration; in other words, the relay node in the available paths updates its weighted value according to the feedback signal, which represents the result after sensing the environment (we have used a judging function to distinguish the type of feedback signal, explained in part A of section IV). When the feedback signal is a reward signal, the node will increase its weighted value. Conversely, it will reduce its weighted value. Thus, the current node can decide which node should be chosen as the next hop node from a group of available hop nodes. Accordingly, the path value defined in this feedback mechanism will be increased or reduced (before executing this mechanism, we find all available paths from the source node to the destination node, as explained in part A of section IV). Because of the convergence of LA, we can finally choose the optimal path with the highest path value from all of the available paths. The chosen path will be stable enough to ensure overall energy saving and balance.

III. PRELIMINARIES AND BACKGROUND

In this section, we present a brief overview of routing protocols and some preliminary information on LA theory.

A. Overview of Routing Protocols

Based on their relation with routing information[20,27], routing protocols can be divided into several categories. In general, we classify the protocols into three kinds: proactive, reactive and hybrid protocols. Proactive routing protocols periodically update routing messages so that they can ensure data packet transmission. Reactive protocols initiate route discovery on demand. That means when the source node has data packets to be sent to a given destination node, it initiates route discovery by broadcasting the route request packet. While receiving the request packet, the relay nodes will rebroadcast it again. This process continues until the request packet arrives at the destination node. Similar to the handshake mechanism, the destination node generates a route reply packet and sends it to the source node. In other words, the reply packet tracks the reverse route already taken by the corresponding request packet. As a compromise scheme, hybrid routing protocols combine these two routing protocols and can be used in hierarchical structure networks. Generally, proactive protocols cause more energy consumption, which, of course, degrades the network life cycle. Hybrid routing protocols use more control information than reactive protocols and need a hierarchical structure network.

B. Learning Automata

LA theory is a self-learning mechanism based on the theory of stochastic processes[23,26]. As an adaptive decision-making system, an LA can enhance performance by using previous knowledge to choose the best action from a limited set of actions through repeated interactions with a random environment. A basic LA contains three key factors: a random environment, an automaton and a feedback system. The automaton chooses actions based on the random environment, and the environment responds to these actions by producing a feedback signal. Based on the effect on the automaton, the feedback signal can be divided into a 'positive' signal (reward signal) or a 'negative' signal (penalty signal). Over a period of time, the automaton can learn from the feedback signal to find an optimal action (Fig. 1 shows the operating principle of LA).
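
To make this operating principle concrete, the following minimal Python sketch simulates a two-action learning automaton interacting with a stationary random environment under a linear reward-penalty update. The penalty probabilities and learning rates are illustrative values chosen for this sketch, not parameters from the paper.

```python
import random

# Two actions with illustrative (hypothetical) penalty probabilities c_i.
penalty_prob = [0.7, 0.2]
p = [0.5, 0.5]           # action probability vector P(t)
a, b = 0.1, 0.1          # linear reward and penalty rates

random.seed(0)
for t in range(2000):
    # The automaton selects an action according to P(t).
    i = 0 if random.random() < p[0] else 1
    # The random environment answers with a penalty (beta = 1) or a reward (beta = 0).
    penalized = random.random() < penalty_prob[i]
    if not penalized:
        p[i] += a * (1.0 - p[i])     # reward: push p_i toward 1
    else:
        p[i] *= (1.0 - b)            # penalty: shrink p_i
    p[1 - i] = 1.0 - p[i]            # keep the action probabilities normalized

print("final action probabilities:", p)   # mass drifts toward the lower-penalty action
```

Over many interactions the probability mass drifts toward the action with the lower penalty probability, which is the behavior that the routing algorithm later exploits when relay nodes are rewarded or punished.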
Figure 1 Learning automata (the automaton applies an action α to the random environment (MANET), which returns a feedback signal β)

Definition 1 (environment) The random environment is an object interacting with the automaton. Usually, we set E = {A, B, C} to describe the random environment, where A = {α1, α2, ..., αt} represents the limited set of inputs performed by the automaton, and αt represents the action in time slot t. Similarly, B = {β1, β2, ..., βt} represents a limited set of responses from the random environment, and βt represents the response from the environment in time slot t. C = {c1, c2, ..., ct} is the set of penalty probabilities, which is associated with the given action α in time slot t.

Using the definition of ci, the average penalty M(t) can be defined by the following expression:

M(t) = E[β(t) | P(t)] = E[β(t) | p1(t), p2(t), ..., pn(t)] = Pr[β(t) = 0 | P(t)] = Σ_{i=1}^{n} Pr[β(t) = 0 | α(t) = αi] · Pr[α(t) = αi] = Σ_{i=1}^{n} ci · pi(t),

where P(t) represents the action probability at instant t. Thus, the average penalty for the pure chance automaton is expressed as

M0 = (1/n) Σ_{i=1}^{n} ci.

Definition 2 (automaton) In a general sense, an automaton is a work system which does not need guidance from the outside. From the mathematical point of view, the automaton can be defined as {A, B, Q, T, G} (A = {α1, α2, ..., αt} and B = {β1, β2, ..., βt} have been defined in Def. 1). Q = {q1(t), q2(t), ..., qn(t)} represents the state in time slot t. T : Q × B × A → Q denotes the state transition function of the automaton, which determines the way in which the automaton transfers to the next state from the current state and input. We usually use the following formula to represent the state transition from instant t to t + 1:

q(t + 1) = T(q(t), α(t), β(t)).

G determines the output based on the state at instant t:

α(t) = G(q(t)).

IV. SYSTEM MODEL

In this section, we construct a new stability measurement model and define an effective energy ratio function. On this basis, we define the node-weighted value function, which can be used to construct a MANET environment feedback mechanism (a routing learning process).

A. Network Model

In general, a MANET can be described by an undirected graph G = (V, E), where V represents the set of nodes and E represents the set of edges[28,29]. Therefore, a path in the network can be regarded as a set of nodes, which connect to each other from the source node to the destination node. In our paper, all nodes exist in a 2D rectangular scenario and communicate through a common broadcast channel using omnidirectional antennas. They have the same transmission range r. Assuming that the distance between nodes i and j can be represented as Dist(i, j) and that j is i's neighbor node, Dist(i, j) should be no more than r. Additionally, we do not have to consider the impact of an interference range or collision avoidance on the shared wireless channel. It should be explained that these preconditions have been widely adopted in previous work.

B. Node Stability Measurement Model

The chosen relay node in an available path from the source node to the destination node should be stable enough to ensure path stability. This distinguishes our work from previous studies, which used only node velocity and did not consider the impact of the distribution of neighbor nodes on stability. We measure the node stability by estimating the average time a node is connected to its neighbor nodes. Without any loss of generality, we assume that vi represents the mobility speed of node i in a small time slot, and that the velocity of the node remains unchanged during this short period of time. Hence, we can estimate the survival time of the link between node i and any of its neighbor nodes, using the following formula:

r² = [(vix − vjx) · ti,j + Dist(i, j)x]² + [(viy − vjy) · ti,j + Dist(i, j)y]², (1)

where v(·)x, v(·)y represent the horizontal component and vertical component of speed, respectively; Dist(·)x, Dist(·)y the horizontal component and vertical component of the current distance between node i and node j; node j is a neighbor node of node i.
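
As a quick numerical illustration of formula (1), the sketch below treats it as a quadratic in ti,j and returns the time at which the separation between the two nodes reaches the transmission range. The velocity and distance components are hypothetical example values; the closed-form solution given next in the paper is the analytical counterpart of this computation.

```python
import math

def link_survival_time(v_i, v_j, dist, r):
    """Numerically solve formula (1) for t_ij: the time at which the link length reaches r.

    v_i, v_j : (x, y) velocities of nodes i and j (assumed constant over the short slot)
    dist     : (x, y) components of the current distance Dist(i, j)
    r        : common transmission range
    The example values used below are hypothetical, not data from the paper.
    """
    dvx, dvy = v_i[0] - v_j[0], v_i[1] - v_j[1]
    dx, dy = dist
    # Expanding formula (1) gives a quadratic in t: A*t^2 + B*t + C = 0.
    A = dvx ** 2 + dvy ** 2
    B = 2.0 * (dvx * dx + dvy * dy)
    C = dx ** 2 + dy ** 2 - r ** 2
    if A == 0.0:                      # no relative motion: the link never expires here
        return math.inf
    disc = B ** 2 - 4.0 * A * C
    if disc < 0.0:
        return math.inf
    return max((-B + math.sqrt(disc)) / (2.0 * A), 0.0)   # larger (positive) root

# Example: relative x-speed 2 m/s, current x-distance 30 m, r = 50 m -> 10.0 s.
print(link_survival_time((2.0, 0.0), (0.0, 0.0), (30.0, 0.0), 50.0))
```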
By solving the above formula, we can get the expression of ti,j:

ti,j = (√(Hb² − Ha · Hc) − Hb) / Ha, (2)

where Ha, Hb, Hc can be represented as

Ha = (vix − vjx)² + (viy − vjy)²,
Hb = Dist(i, j)x · (vix − vjx) + Dist(i, j)y · (viy − vjy),
Hc = [Dist(i, j)x]² + [Dist(i, j)y]². (3)

Thus, we can use a weighted average function to measure node i's stability (Nsi):

Nsi = Σ_{j=1}^{m} π{current distance = Dist(i, j)} · ti,j, with Σ_{j=1}^{m} πij = 1, (4)

where π{current distance = Dist(i, j)} represents the weighted value of ti,j, which means the limiting probability that the current distance between i and j is Dist(i, j). To give the expression of π{current distance = Dist(i, j)}, we should first deduce the expression of Prob{Dist(i, j) < r} (r is the transmission range defined in part A of section III).

Without loss of generality, the node mobility in this MANET is independent and stochastic. Thus, we can deduce the spatial PDF (probability distribution function) of any node in the MANET using the following uniform distribution:

f(x, y) = 1/(L · W), x ∈ (0, L), y ∈ (0, W), (5)

where L, W respectively represent the length and width of the MANET. Therefore, the joint PDF between node i and j can be represented as follows, noting that the joint PDF relies on the setting of the network scenario:

fij(Dx, Dy) = (4/(L²W²)) · (L − Dx)(W − Dy),
√(Dx² + Dy²) = Dist(i, j) = D,
Dx = Dist(i, j)x; Dy = Dist(i, j)y. (6)

Thus, the expression of Prob{Dist(i, j) ≤ r} can be expressed as

for 0 < r ≤ W: Prob{Dist(i, j) = D ≤ r} = ∫₀^r ∫₀^√(r²−Dx²) fij(Dx, Dy) dDx dDy,
for W < r < L: Prob{Dist(i, j) = D ≤ r} = ∫₀^W ∫₀^√(r²−W²) fij(Dx, Dy) dDx dDy + ∫_√(r²−W²)^r ∫₀^√(r²−Dx²) fij(Dx, Dy) dDx dDy,
for L ≤ r ≤ √(L² + W²): Prob{Dist(i, j) = D < r} = ∫₀^W ∫₀^√(r²−W²) fij(Dx, Dy) dDx dDy + ∫_√(r²−W²)^L ∫₀^√(r²−Dx²) fij(Dx, Dy) dDx dDy. (7)

Simplifying the above integral (7) using the element-integral method:

Prob{Dist(i, j) = D ≤ r} = (4D/(L²W²)) · funit(D), (8)

where funit(D) can be written as

for 0 < r ≤ W: funit(D) = LW − Lr − Wr + πr²/2,
for W < r < L: funit(D) = LW · arcsin(W/r) + L√(r² − W²) − W²/2 − Lr,
for L ≤ r ≤ √(L² + W²): funit(D) = LW · arcsin(W/r) + L√(r² − W²) − W²/2 + W√(r² − L²) − L²/2 − r²/2. (9)

Generally, the transmission range r is always less than the width W. With the help of the spline interpolation method[30,31], we can deduce the expression of π{current distance = Dist(i, j)}:

π{current distance = Dist(i, j) = D} = Prob{Dist(i, j) = D ≤ r < W}⁻¹ · (2π/(LW) − 8(L + W)D/(L²W²) + 6D²/(L²W²)). (10)

Remark 1 D is a simplified representation of Dist(·) in the previous formulas. The reason we obtain the expression (10) through the spline interpolation method is that it meets the Lipschitz condition[30,32].

By substituting formula (10) into formula (4), we can obtain Ns.

C. Effective Energy Ratio Function

In an available path, the chosen relay node should have enough power to ensure the transmission of packets. We use the effective energy ratio function (Er) to represent the energy level of the relay node:

Eri = Rei / Iei, (11)

where Rei and Iei represent the residual energy and initial energy of node i, respectively. Thus, we can define the node weighted value (Nw) as

Nwi = ω1 · Ns′i + ω2 · Eri,
Ns′i = (Nsi − Nsmin) / (Nsmax − Nsmin),
ω1 + ω2 = 1, (12)

where ω(·) represents the weighting factor and Ns′i represents the normalized value of Nsi.
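
A small sketch of how formulas (11) and (12) combine is given below; the stability scores, energy values, and the equal weighting ω1 = ω2 = 0.5 (used later in the simulations) are example inputs, not the paper's implementation.

```python
def effective_energy_ratio(residual_energy, initial_energy):
    """Er_i of formula (11): fraction of the initial energy still available."""
    return residual_energy / initial_energy

def node_weight(ns_i, ns_min, ns_max, er_i, w1=0.5, w2=0.5):
    """Nw_i of formula (12): weighted sum of the normalized stability and the energy ratio."""
    assert abs(w1 + w2 - 1.0) < 1e-9, "the weighting factors must sum to 1"
    ns_norm = (ns_i - ns_min) / (ns_max - ns_min) if ns_max > ns_min else 0.0
    return w1 * ns_norm + w2 * er_i

# Hypothetical example: a stability score of 3.2 s where Ns ranges over 1.0-5.0 s,
# for a node that has consumed a quarter of its 2 J initial energy.
er = effective_energy_ratio(1.5, 2.0)          # -> 0.75
print(node_weight(3.2, 1.0, 5.0, er))          # -> 0.5*0.55 + 0.5*0.75 = 0.65
```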
To find the optimal route among all of the available routes, ensuring that it is not only stable but also guarantees energy conservation and balance, it is necessary to optimize the selection of paths. We note that traditional heuristic routing algorithms generally lack extendibility, mathematical rigor, and adaptability for a dynamic environment. Therefore, unlike previous work, we have chosen LA theory to complete the optimization process. In the next section, we discuss in more detail our LA theory-based, energy-efficient, stable routing algorithm.

V. STABLE AND ENERGY-EFFICIENT ROUTING ALGORITHM BASED ON LA THEORY

In this section, we first use LA theory to construct a MANET environment feedback mechanism. In other words, by judging the content of each reply message, we reward or punish the relay node through a rigorous iteration expression. We then provide a detailed algorithm implementation scheme. Finally, we prove the convergence of our algorithm.

A. MANET Environment Feedback Mechanism

When the source node plans to send packets to the destination node, it needs to find the available paths from the source node to the destination node. To find all of the available paths to the destination node, the source node broadcasts build-route messages to its neighbor nodes (flooding requests). Once the build-route message is received by the destination node, it will reply to the build-route message, so that all available paths from the source node to the destination node can be identified. Obviously, not all of these paths will be good enough to transmit data packets. Therefore, it is necessary to find the optimal path, which is sufficiently stable and ensures overall energy saving and balance.

Using LA theory, we construct a MANET environment feedback model, which is an optimization mechanism for route selection. In this model, each node is equipped with a learning automaton to execute the feedback mechanism. Fig. 2 shows the feedback mechanism of the LA. To find the optimal path from the available paths, the source node sends request packets to the destination node through the available paths. When it receives the request packets, the destination node responds by sending reply packets to the source node along the available paths. In this reply process, each relay node on an available path receives a reply packet from the next available hop node. Drawing on LA theory, the next available hop node senses the surrounding MANET environment and adds environment-feedback information to the reply message.

Figure 2 Feedback mechanism of learning automata (a node's learning automaton senses the surrounding MANET environment and generates a feedback signal, reward or penalty)

As mentioned in section II, there are two types of environment feedback. Based on the information content of the replies, we set two feedback criteria (Fig. 3 shows these two criteria, where j and k represent the next available hop nodes of node i):

(1) If a relay node receives a reply packet from the next hop node, and the information contained in this reply packet is "good information" (a reward signal), we will execute the reward scheme and the weighted value (Nw) of the next hop node will be increased accordingly.

(2) If a relay node receives the reply packet from its next hop node, and the information contained in this reply packet is "bad information" (a penalty signal), we will execute the penalty scheme and the weighted value (Nw) of the next hop node will be reduced accordingly.

Figure 3 Two feedback criteria (node i applies the reward scheme to next hop j on good information and the penalty scheme to next hop k on bad information)

Using the standard linear iteration equation of LA theory[23], the node's reward feedback scheme for receiving "good information" can be represented as

Nwi(t + 1) = Nwi(t) + a[1 − Nwi(t)]. (13)

The penalty feedback scheme for altering a node's weighted value on receipt of "bad information" can be represented as

Nwi(t + 1) = (1 − b)Nwi(t), (14)

where a and b represent the linear update rates of the weighted value; t represents the instant (in a discrete condition, it denotes the iteration count).
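
A minimal sketch of the linear schemes (13) and (14) is shown below; the update rates a = b = 0.1 and the starting weight are placeholder values.

```python
def update_weight(nw, good_information, a=0.1, b=0.1):
    """Apply formula (13) on a reward signal or formula (14) on a penalty signal."""
    if good_information:
        return nw + a * (1.0 - nw)    # reward: push Nw toward 1
    return (1.0 - b) * nw             # penalty: shrink Nw toward 0

nw = 0.5
nw = update_weight(nw, good_information=True)    # -> 0.55
nw = update_weight(nw, good_information=False)   # -> 0.495
print(nw)
```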
It is now essential to define a feedback judgement function, which can decide whether information is good or bad. Based on the definition of the feedback judgement function in LA theory[23], the feedback judgment function (ϕ) can be expressed as

ϕi(t) = Nwi(t) − Σ_{j=1}^{Ni} Nwj(t)/Ni, (15)

where Ni represents the number of node i's neighbor nodes. When ϕi(t) > 0 (better than the average level), the MANET environment has generated a reward feedback signal and the node has received good information; conversely, ϕi(t) < 0 (worse than the average level) means that the MANET environment has generated a penalty feedback signal and the node has received bad information.

We note that using only formulas (13) and (14) cannot reflect the feedback strength, which changes with the iterations. Therefore, we continue to optimize these two formulas, using the following expression to represent the improved feedback scheme:

If node i is rewarded and gets good information: Nwi(t + 1) = Nwi(t) + ai(t)[1 − Nwi(t)];
If node i is punished and gets bad information: Nwi(t + 1) = [1 − bi(t)]Nwi(t), (16)

where

a(t) = δ · µ(t),
b(t) = ε · [1 − µ(t)],
µ(t) = exp[−|ϕi(t)|]. (17)

As the node-weighted value updates, based on our proposed MANET feedback mechanism, the path-weighted value will be updated accordingly. In this feedback mechanism, the path value (P) of an available path from the source node to the destination node can be represented as

Pm(t) = Σ_{i=1}^{M} Nwi(t)/M, (18)

where M represents the number of relay nodes on an available path from the source node to the destination node and m represents the ID of the available path. With the help of this feedback mechanism, the node learns from the information by sensing the MANET environment (judging the type of feedback signal) and updates its weighted value. In this way, we can find the optimal path, with the highest path value P. Finally, the data packets will be transmitted through this optimal path, which is stable enough and ensures overall energy conservation and balance.

B. Algorithm Implementation Scheme

We give the detailed algorithm implementation scheme as follows.

Our proposed routing algorithm
Initialization: adjustable parameters of LA (δ, ε)
Input: the available paths between the source node and the destination node (α), and the relay nodes which belong to the available paths
Output: the optimal path
Execution steps:
1) send a packet through an available path;
2) on receiving the packet, the destination node sends a reply message to the source node;
3) for i = 1, 2, ..., m − 1 (i represents the relay node ID);
4) if relay node i receives a reply message from the next available hop node j and Nwj(t) − Σ_{k=1}^{Nj}(Nwk(t)/Nj) > 0;
5) use the reward feedback scheme to update the weighted value of node j;
6) Nwj(t + 1) = Nwj(t) + aj(t)[1 − Nwj(t)];
7) update the path value which contains this relay node j accordingly;
8) Pm(t + 1) = Pm(t) + (aj(t)[1 − Nwj(t)])/M;
9) else (node i receives the reply message from the next available hop node j and Nwj(t) − Σ_{k=1}^{Nj}(Nwk(t)/Nj) < 0);
10) use the penalty feedback scheme to update the weighted value of node j;
11) Nwj(t + 1) = [1 − bj(t)]Nwj(t);
12) update the path value which contains this relay node j accordingly;
13) Pm(t + 1) = Pm(t) + [bj(t)]/M;
End
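
To clarify how the steps above fit together, here is a hedged Python sketch of one destination-to-source reply pass along a single available path: it evaluates the judgment function of formula (15) for each relay's next hop, applies the improved scheme of formulas (16) and (17), and refreshes the path value of formula (18). The data structures, the neighbor lists, and the direct recomputation of the path value are simplifications for illustration rather than the NS-3 implementation used in the experiments.

```python
import math

DELTA, EPSILON = 0.10, 0.10   # reward and penalty coefficients (the values listed in Tab. 1)

def judgment(nw, neighbor_weights):
    """phi of formula (15): a node's weighted value minus its neighborhood average."""
    return nw - sum(neighbor_weights) / len(neighbor_weights)

def improved_update(nw, phi):
    """Improved feedback scheme of formulas (16)-(17)."""
    mu = math.exp(-abs(phi))
    if phi > 0:                                # reward signal ("good information")
        return nw + DELTA * mu * (1.0 - nw)
    return (1.0 - EPSILON * (1.0 - mu)) * nw   # penalty signal ("bad information")

def reply_pass(relay_weights, relay_neighbors):
    """One destination-to-source reply pass over a single available path.

    relay_weights   : list of Nw values for the relay nodes on the path
    relay_neighbors : one list of neighbor Nw values per relay node
    Returns the updated weights and the path value of formula (18).
    """
    for idx, (nw, nbrs) in enumerate(zip(relay_weights, relay_neighbors)):
        phi = judgment(nw, nbrs)
        relay_weights[idx] = improved_update(nw, phi)
    return relay_weights, sum(relay_weights) / len(relay_weights)

# Hypothetical three-relay path with example neighborhood weights.
weights = [0.6, 0.4, 0.7]
neighbors = [[0.5, 0.3], [0.6, 0.7], [0.5, 0.6]]
weights, path_value = reply_pass(weights, neighbors)
print(weights, path_value)
```

Repeating such passes over all available paths and keeping the path with the highest value mirrors the selection rule described above.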
C. Convergence Proof of Our Proposed Routing Algorithm

As mentioned above, our proposed algorithm uses LA theory to optimize the selected paths. As it is important to ensure the convergence of this process, we provide the following rigorous proof.

Theorem 1 The node-weighted value Nw(t) can converge if and only if the drift of the node-weighted value Nw′(t) is not a direct function of the instant t.

Proof

Lemma 1 The drift of the node-weighted value Nw′(t) is not directly a function of the instant t.

Lemma proof As the process of this algorithm can be regarded as an optimization problem, we can use approximation theory[32] to prove the algorithm. In our paper, each node has a learning automaton; for this reason, we can use the theory of stochastic processes to represent the genericity update process of the weighted value (in the dynamics control method, the one-step probability update is called the drift function of the probability, because the long update process can be regarded as a continuous update process). For this reason, we can present the genericity update process of the node-weighted value in this way:

Nw′(t + 1) = E[Nw(t + 1) | Nw(t)] − Nw(t). (19)

Based on the Kolmogorov criterion[23], formula (19) can be represented as

Nw′ij(t) = Λ · Nwij(t)[1 − Nwij(t)]E[βij(t)] − Λ · Σ_{q≠j} Nwiq(t)Nwij(t)E[βiq(t)]
         = Λ · Nwij(t) Σ_{q≠j} Nwiq(t)[E[βij(t)] − E[βiq(t)]], (20)

where the second equality uses the normalization of the weighted values, 1 − Nwij(t) = Σ_{q≠j} Nwiq(t).

Based on the definition in LA theory, Λ represents the genericity maximum update rate proposed in our improved LA. Generally, the update rate floats around 0.10 (in our paper, we set δ = 0.10 and ε = 0.10). Thus, we can reach the following conclusion by differentiating both sides of formula (20):

∂(Σ_{q≠j} Nwiq E[βiq(t)]) / ∂Nwij = E[βij(t)]. (21)

Let ∂(Σ_{q≠j} Nwiq E[βiq(t)]) / ∂Nwij = Q(Nw); then Nw′ij(t) can be represented as

Nw′ij(t) = Λ[Nwij(t) · Σ_{q≠j} Nwiq(Q(Nw)/Nwij − Q(Nw)/Nwiq)],
Nw′ij(t) = function(Nw(t)). (22)

Obviously, we conclude that Nw′ij is not a direct function of the time slot t. This conclusion allows us to use the spline interpolation method[31] to rewrite the node-weighted value Nw(t):

Nw(t) = PΛ(τ), (23)

where τ ∈ [nΛ, (n + 1)Λ] and PΛ(τ) is a piecewise constant interpolation function (in LA theory, P(t) represents the action probability at instant t). Now we only care about the convergence of PΛ(·).

Generally, the genericity maximum update rate Λ is close to 0[23]. Hence, based on approximation theory[33], we can make the assertion that PΛ(τ) weakly converges to the solution set of an ordinary differential equation[30], which can be represented as

dxij/dτ = xij(t) · Σ_{q≠j} xiq(Q(x)/xij − Q(x)/xiq),
x(0) = PΛ(0), (24)

where x(·) denotes Nw(·) in our study. It must be noted that formula (24) is a particular case of the weak convergence theorem, as it relies on the fact that P(t), β(t), α(t) constitute a Markov process. In addition, β(t), α(t) (defined in Definition 1) take on values in a compact metric space. The outputs of the LA derive from a finite set, and the reinforcement signals take values from the closed interval [0, 1]. It is important to note that, in our proposed algorithm, α(t) represents the result of sensing the environment, which has two types of information, good and bad. Hence, α(t) is a finite set. Similarly, β(t) represents the feedback mechanism for the relay nodes, which also can be represented as two probability iteration formulas. For this reason, β(t) in our paper is also a finite set. In addition, we can determine that the computation space has been defined in [0, 1], which, of course, meets the condition.

In our routing algorithm, we have improved the update factor using formula (17). It should be stressed here that a(t) and b(t) are represented as two exponential functions. We can easily find that

max{a(t)} = δ, max{b(t)} = ε. (25)

Therefore, we can confirm that the update rate is in [0, 1], which also meets the condition. We find that formula (24) (an ordinary differential equation) has a unique solution, which is based on the initial x(0).

In this proof, j represents the chosen next hop node of node i; q represents the ID of a hop node not chosen by node i. Hence Nwij represents the edge between nodes i and j, which is chosen by selecting the next hop node j from the current node i. The value of Nwij is equal to the current node-weighted value Nwj. The genericity maximum update rate Λ can have two parameters: a reward coefficient and a penalty coefficient. Thus, we are able to provide a convergence proof using LA theory.
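
The argument above is analytical; as a purely illustrative numerical check (not part of the paper), the sketch below repeatedly applies the improved scheme of formulas (16) and (17) against a stationary Bernoulli environment, with the neighborhood average held fixed, and observes the weighted value settling into a narrow band, which is the qualitative behavior that Theorem 1 guarantees. The reward probability and all other values are made-up examples.

```python
import math, random

DELTA, EPSILON = 0.10, 0.10
REWARD_PROB = 0.8        # hypothetical stationary environment: 80% of replies are "good"

random.seed(1)
nw, neighborhood_avg = 0.5, 0.5   # the neighborhood average is held fixed for simplicity
trace = []
for t in range(500):
    phi = nw - neighborhood_avg            # formula (15) with a frozen neighborhood
    mu = math.exp(-abs(phi))               # formula (17)
    if random.random() < REWARD_PROB:
        nw += DELTA * mu * (1.0 - nw)      # formula (16), reward branch
    else:
        nw *= 1.0 - EPSILON * (1.0 - mu)   # formula (16), penalty branch
    trace.append(nw)

# After the early transient, Nw fluctuates within a narrow band around its limit.
print(round(trace[0], 3), round(trace[249], 3), round(trace[499], 3))
```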
VI. SIMULATION RESULTS ABOUT THE PROPOSED ALGORITHM

This section evaluates the performance of the proposed algorithm. By using NS-3[34,35], we compare our proposed algorithm (LASEERA) with ACSRA (Ref. [12]), ANNQARA (Ref. [21]), and classical AODV.

A. Simulation Parameters

Tab. 1 shows the network parameters.

Table 1 Parameters
network scene scale: 500 m × 600 m
nodes number: {40, 60, 80, 100}
transmission range: 50 m
simulation time: 600 s
mobility model: random waypoint
speed range: 0-20 m/s
pause time: 0 s
initial distribution: completely uniform
MAC protocol: 802.11b
initial energy: 2 Joule
power end: 16.0206 dBm
power start: 16.0206 dBm
current A: 0.0174 Ampere
bandwidth: 250 kbit/s
packet size: 256 bytes
weighting coefficient: ω1 = ω2 = 0.5
reward coefficient: δ (0.10)
penalty coefficient: ε (0.10)

Our experiment uses the Random Waypoint model to describe the node motion and the standard energy module parameters of NS-3 (Power End, Power Start, Current A) to describe the energy transmission consumption. It is important to stress that the parameters δ and ε must not be too large or too small. Like the memory factor of a Markov chain, these two parameters determine the update behavior of the LA. If these parameters are too large, the optimization process rate will increase, but this process cannot ensure the robustness of the algorithm. If these parameters are too small, the optimization process rate will be reduced and the learning efficiency will not be good enough. In general, the value of the update rate floats around 0.10, which is an empirical value. In addition, we must fairly consider route stability and energy efficiency. We therefore set ω1 = ω2 = 0.5 (obviously, the sum of the weighting coefficients must be 1).

B. Experiment Metrics

To measure the performance of our proposed routing algorithm, we use the following metrics.
• Route survival time The time period when the route is connected. This metric is significant as a measure of the rationality and effectiveness of the node-stability measurement model used in our proposed routing algorithm.
• Residual energy The energy remaining in the node. This metric is used to reflect the level of energy consumption of our routing algorithm.
• Energy variance The difference in residual energy between nodes. This metric reflects the energy balance level. An efficient routing protocol can ensure that this metric is as small as possible.
• Packet delivery ratio The ratio of packets successfully delivered to the destination node. An efficient routing protocol can maintain this metric at a relatively high level.
• End-to-end delay The time it takes for data packets sent from the source node to reach the destination node. An efficient routing protocol can ensure that this metric is as small as possible.
• Normalized control overhead This metric reflects the extra overhead that results from using non-data-transmission packets, which are needed to construct the optimization strategy of our algorithm.

Fig. 4 shows the variation in average route survival time in relation to the number of nodes, when the maximum velocity is 10 m/s. It is clear that our proposed routing algorithm (LASEERA) has a higher route survival time than ACSRA, ANNQARA, or AODV. When compared with ACSRA, ANNQARA and AODV, our proposed routing algorithm's route survival time increases by an average of 22.5%, 24.9%, and 80.2%, respectively.

Figure 4 Survival time vs. nodes (average route survival time/s against the number of nodes, for LASEERA, ANNQARA, ACSRA, and AODV)

The relationship between route survival time and the number of nodes is not a monotonic relationship. Overall, the route survival time will slightly but not strictly increase with the number of nodes.

By analyzing the simulation results, we can confirm that our node stability measurement model is useful for enhancing route stability. In the ACSRA routing algorithm, the authors focus on efficiently adjusting the relay node, using only node density as the optimization parameter; this cannot provide a route survival time as long as that offered by our routing algorithm. In ANNQARA, the authors used a two-layer CNN to optimize the throughput ratio and end-to-end delay. This is why it cannot ensure as long a route survival time as our routing algorithm. Classical AODV uses the shortest path as its routing policy, which, of course, cannot ensure route stability (the shortest path cannot ensure the relative stability of the link).
Fig. 5 shows the variation in average route survival time in relation to maximum velocity, when the number of nodes is 40. It is clear that our proposed routing algorithm (LASEERA) has a higher route survival time than ACSRA, ANNQARA, or AODV. In fact, our proposed routing algorithm's route survival time is higher than those of ACSRA, ANNQARA, and AODV by an average of 37.4%, 41.5%, and 104.3%, respectively.

Figure 5 Survival time vs. velocity (average route survival time/s against maximum velocity/m·s−1)

By analyzing the simulation results, we can see that, as the velocity increases, the network topology stability decreases. This means that the links cannot be maintained long enough, which is why the route survival times decrease. Owing to the stability model constructed in our paper, our proposed algorithm has the best route survival time.

Fig. 6 shows the variation in average residual energy in relation to the number of nodes, when the maximum velocity is 10 m/s. It is clear that our proposed routing algorithm (LASEERA) produces higher residual energy than ACSRA, ANNQARA or AODV. In fact, our proposed routing algorithm's residual energy is an average of 21.9%, 18.3%, and 55.6% higher than ACSRA, ANNQARA and AODV, respectively.

Figure 6 Residual energy vs. nodes (average residual energy/J against the number of nodes)

The relationship between our routing algorithm's residual energy and the number of nodes is not a monotonic relationship. The residual energy of the two other routing algorithms (ACSRA and ANNQARA) decreases with the number of nodes, but the relationship is not strictly linear.

Analyzing the simulation results shows that our strategy is useful in reducing energy consumption. It is worth noting that, although we use only an effective energy ratio function in the optimization strategy (in common with the other three algorithms cited, we do not have an intelligent transmission power adjustment strategy), energy consumption is reduced. The first reason for this is that we are able to maintain the route life cycle as long as possible, as our stability control model decreases the route adjustment frequency, thus saving energy. The second reason is that the effective energy ratio function can ensure the node residual energy level. In other words, the probability that a node will not have enough energy to transmit a packet in the packet transmission process is lower than that in the other three algorithms. If the relay node does not have enough energy to transmit a packet, we must abandon this route and adjust the relay node accordingly, which naturally increases energy consumption. Because of its frequent route reconstruction, AODV has the highest energy consumption.

Fig. 7 shows the variation in average residual energy in relation to maximum velocity, when the number of nodes is 40. It is clear that our proposed routing algorithm (LASEERA) has higher residual energy than ACSRA, ANNQARA, or AODV. In fact, our proposed routing algorithm's residual energy is an average of 16.6%, 19.5%, and 40.5% higher than ACSRA, ANNQARA and AODV, respectively. Overall, residual energy decreases with the maximum velocity value.

Figure 7 Residual energy vs. velocity (average residual energy/J against maximum velocity/m·s−1)

By analyzing the simulation results, we can see that, as velocity increases, residual energy decreases due to the increase of the network reconstruction frequency, which inflates energy consumption. As detailed above, our optimization strategy can ensure the residual energy level, reducing the probability of relay node adjustment during the packet transmission process.
Hence, as velocity increases, our algorithm still has the best performance.

Fig. 8 shows the variation in energy variance in relation to the number of nodes, when the maximum velocity is 10 m/s. On the whole, our proposed routing algorithm (LASEERA) has lower energy variance than ACSRA, ANNQARA, or AODV. In fact, our proposed routing algorithm's energy variance is an average of 25.5%, 24.9%, and 34.9% lower than ACSRA, ANNQARA and AODV, respectively. This means that our proposed routing algorithm performs better in balancing energy. In addition, the energy variance decreases with the number of nodes; the relationship is not strictly linear.

Figure 8 Energy variance vs. nodes (energy variance against the number of nodes)

Analyzing the simulation results shows that our strategy is useful for balancing the overall residual energy. It should be noted that, in our method, the node chosen to be a relay node should have higher residual energy than neighboring nodes. In this way, we can ensure that the same nodes will not always be reused during the route reconstruction process. This is the reason why our algorithm has the smallest energy variance. In the other three routing algorithms, the authors have not considered this problem; as a result, their energy variance performance is not as good as ours. As the number of nodes increases, the reuse frequency of nodes decreases during the route reconstruction process. This explains why, as the number of nodes increases, the energy variance decreases.

Fig. 9 shows the variation in energy variance in relation to maximum velocity, when the number of nodes is 40. Clearly, our proposed routing algorithm (LASEERA) has less energy variance than ACSRA, ANNQARA or AODV. In fact, our proposed routing algorithm's energy variance is an average of 29.8%, 26.1%, and 38.0% lower than that of ACSRA, ANNQARA and AODV, respectively. This means that our proposed routing algorithm performs better in balancing energy.

Figure 9 Energy variance vs. velocity (energy variance against maximum velocity/m·s−1)

Analyzing the simulation results shows that, as velocity increases, so does energy variance. The relationship between them is not strictly linear. The reason for this is that, as velocity increases, so does the route reconstruction frequency. Given that there is a fixed number of nodes in the network, the probability of node reuse also increases, enhancing the value of the energy variance and decreasing the energy balance.

Fig. 10 shows the variation in end-to-end delay in relation to the number of nodes, when the maximum velocity is 10 m/s. We can see that our proposed routing algorithm (LASEERA) does not perform as well as ANNQARA but performs better than ACSRA in end-to-end delay. Compared to ACSRA, our algorithm (LASEERA)'s end-to-end delay is 42.2% lower on average. Compared to ANNQARA, our algorithm (LASEERA)'s end-to-end delay is 6.7% higher on average. Compared to AODV, our algorithm (LASEERA)'s end-to-end delay is 19.6% lower on average.

Figure 10 The end-to-end delay vs. nodes (average end-to-end delay/ms against the number of nodes)

Analyzing the simulation results shows that ACSRA has the highest end-to-end delay. The reason is that, in ACSRA, the node density is an important factor used to choose or adjust the relay node, which ensures that the chosen node has a relatively high node density.
This method may cause the route to contain more relay nodes when the number of nodes increases. From the simulation results, it is easy to follow this trend. We also find that AODV has a relatively high delay, which means the original strategy cannot guarantee the shortest delay. It is interesting to discuss the delay performance of our algorithm and ANNQARA. As we have mentioned in the related work, ANNQARA uses the end-to-end delay parameter to construct its convolution calculation model (a two-layer CNN). Hence, it is expected that the delay performance of ANNQARA is better than that of the other routing algorithms. Owing to the stability model constructed in our paper, the definition of the node weighted value already takes the distance factor (node distribution) into consideration. This is the reason why our proposed routing algorithm can achieve good delay performance. We note that the delay performance of our algorithm and ANNQARA is not influenced by the number of nodes (as a whole).

Fig. 11 shows the variation in end-to-end delay in relation to maximum velocity, when the number of nodes is 40. In this case, our proposed routing algorithm (LASEERA) performs less well than ANNQARA but better than ACSRA in end-to-end delay. Compared to ACSRA, our algorithm's end-to-end delay is an average of 11.7% lower. Compared to ANNQARA, its end-to-end delay is an average of 6.7% higher. Compared to AODV, its end-to-end delay is 11.3% lower, on average.

Figure 11 End-to-end delay vs. velocity (average end-to-end delay/ms against maximum velocity/m·s−1)

Analyzing the simulation results, we find that as velocity increases, delay increases (though not absolutely). The reason for this is that, as velocity increases, so does the route reconstruction frequency, leading to the frequent adjustment of relay nodes. In this situation, the route may need auxiliary nodes to ensure transmission from the current node to its next hop node, which, of course, adds to the number of relay nodes. The end-to-end delay increases accordingly.

Fig. 12 shows the variation in packet delivery ratio in relation to the number of nodes, when the maximum velocity is 10 m/s. We can see that our proposed routing algorithm (LASEERA) has a relatively low packet delivery ratio in comparison to ACSRA and ANNQARA. In fact, our proposed routing algorithm's packet delivery ratio is an average of 2.0% and 1.1% lower than those of ACSRA and ANNQARA, respectively. Compared to AODV, our routing algorithm's packet delivery ratio is 5.5% higher.

Figure 12 The packet delivery ratio vs. nodes (average packet delivery ratio against the number of nodes)

Analyzing the simulation results shows that ANNQARA has the best packet delivery ratio because it uses a convolution calculation model. ACSRA also has a better packet delivery ratio than our proposed algorithm. This reflects its unique strategy. As discussed above, a node that has relatively high density is a suitable relay node. This approach ensures that the relay node always has auxiliary nodes, guaranteeing transmission when the next hop node is malfunctioning. Hence, ACSRA's results are easy to understand. Of all of the algorithms, AODV has the worst packet delivery ratio.

Fig. 13 shows the variation in packet delivery ratio in relation to maximum velocity, when the number of nodes is 40. We can see that our proposed routing algorithm (LASEERA)'s packet delivery ratio is entirely lower than that of ACSRA and partly lower than that of ANNQARA. Compared to ACSRA, our proposed routing algorithm's packet delivery ratio is an average of 2.3% lower. Compared to ANNQARA, our algorithm (LASEERA)'s packet delivery ratio is an average of 0.8% lower. Compared to AODV, our algorithm (LASEERA)'s packet delivery ratio is an average of 5.3% higher.

Analyzing the simulation results shows that, with increased velocity, the packet delivery ratio decreases. The reason for this is easy to follow: route stability is reduced by increasing velocity. Hence, the packet delivery ratio decreases.
Figure 13 Packet delivery ratio vs. velocity (average packet delivery ratio against maximum velocity/m·s−1)

Now, we must analyze the computational complexity of our algorithm. In the worst possible conditions, we would need n − 2 nodes to construct a route between the source node and the destination node. These relay nodes are within the same transmission range. Based on the optimization strategy in our algorithm, we need to ascertain the current state of the neighbor nodes and then judge the type of feedback. In this way, a node needs to communicate with n − 3 neighbor nodes. Therefore, a one-time feedback process needs (n − 2)(n − 3) communications. As we have mentioned above, the optimization process can be finished in a finite time period (a finite number of iterations). Assuming that the finite number of iterations is M, the total number of communications will be M(n − 2)(n − 3), which is O(n²). We note that, in practical terms, the computational complexity would normally be less than that shown in this example. To measure the computational complexity of our algorithm, we use the control overhead to evaluate this metric.

Fig. 14 shows the impact of the number of nodes on the amount of control overhead. It is clear that AODV has the smallest amount of control overhead, while ANNQARA and ACSRA have higher control overhead than our routing algorithm. This means that, as a learning-based routing protocol, our routing algorithm has an acceptable amount of control overhead, which is 22.7% lower than ANNQARA and 25.2% lower than ACSRA. By contrast, it is 38.3% higher than AODV.

Figure 14 Control overhead vs. nodes (normalized control overhead against the number of nodes)

Analyzing the simulation results shows that AODV has the smallest amount of control overhead. The reason for this is that AODV does not use any additional optimization methods to optimize the chosen route, which, of course, reduces the control overhead. As mentioned in the previous section, ANNQARA uses a two-layer CNN, which increases computation costs. For this reason, ANNQARA's control overhead is larger than that of our routing algorithm. ACSRA needs periodical control packets to adjust its relay nodes, which can also increase control overhead. Its control overhead is therefore larger than that of our routing algorithm. The above results confirm that a MANET routing algorithm which uses optimization methods increases control overhead, boosting computation costs.

Fig. 15 shows the impact of velocity on control overhead. It is clear that AODV has the smallest amount of control overhead, while ANNQARA and ACSRA have higher control overhead than our routing algorithm. This means that, as a learning-based routing protocol, our routing algorithm has an acceptable amount of control overhead: 26.6% lower than ANNQARA and 27.5% lower than ACSRA. By contrast, our proposed algorithm's control overhead is 30.2% higher than that of AODV.

Figure 15 Control overhead vs. velocity (normalized control overhead against maximum velocity/m·s−1)

Analyzing the simulation results shows that, as velocity increases, so does the control overhead. The reason for this is easy to understand: route stability is reduced by increasing velocity (frequent route reconstruction requires more control packets). Hence, the control overhead increases.
(1) Our proposed routing algorithm (LASEERA) has the best performance in route survival time, energy consumption, and energy balance.

(2) The intelligence-based algorithms (LASEERA, ANNQARA, and ACSRA) incur additional costs to run their optimization strategies, which inevitably increases their control overhead. Compared with the other intelligence-based algorithms (ANNQARA and ACSRA), our algorithm achieves an acceptable control overhead.

(3) No algorithm can optimize all of the metrics without paying an additional cost. The optimization strategy determines which metrics can be optimized, and an efficient strategy relies on the precondition that its additional costs remain acceptable.
VII. CONCLUSION AND FUTURE WORK
In this paper, we proposed a stable, energy-efficient routing algorithm based on LA theory for MANET and described the research steps in detail. We first constructed a new node stability measurement model and defined an effective energy ratio function; these were used to define the node weighted value. Second, we constructed a MANET environment feedback mechanism in which each node is equipped with a learning automaton that executes the optimization process and updates the node's weighted value according to the feedback signals generated by sensing the ambient network environment. In this process, we also improved the basic LA so that each node can sense variations in feedback signal strength over time. In addition, we provided a rigorous mathematical proof of the convergence of the proposed routing algorithm, which had not been done well in earlier studies. The simulation experiments show that our algorithm achieves the best performance in route survival time, energy consumption, and energy balance, and acceptable performance in end-to-end delay and packet delivery ratio.
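For readers unfamiliar with how such an LA-based feedback loop operates, the sketch below shows a generic linear reward-penalty (LR-P) probability update driven by a time-varying feedback strength. It is only a simplified illustration of the mechanism summarized above, with hypothetical step sizes and a random stand-in for the feedback signal; it does not reproduce the exact improved LA scheme defined in this paper.

```python
import random

def la_update(probs, chosen, beta, a=0.1, b=0.05):
    """One generic linear reward-penalty (LR-P) update step.

    probs : action-selection probabilities (one per candidate next hop)
    chosen: index of the action (next hop) just tried
    beta  : feedback strength in [0, 1]; 1 = full reward, 0 = full penalty
    a, b  : reward and penalty step sizes (illustrative values)
    """
    k = len(probs)
    new = []
    for i, p in enumerate(probs):
        if i == chosen:
            # Move the chosen action's probability up on reward, down on penalty.
            new.append(p + beta * a * (1 - p) - (1 - beta) * b * p)
        else:
            # Redistribute the complementary change among the other actions.
            new.append(p - beta * a * p + (1 - beta) * b * (1 / (k - 1) - p))
    return new

# Toy usage: three candidate relay nodes, feedback strength varies over time.
probs = [1 / 3] * 3
for t in range(50):
    chosen = random.choices(range(3), weights=probs)[0]
    beta = random.uniform(0.0, 1.0)  # stand-in for the sensed feedback signal
    probs = la_update(probs, chosen, beta)
print([round(p, 3) for p in probs])
```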
In future studies, the following research directions are worth pursuing.

(1) Extending this algorithm to a layered network structure.

(2) Improving this algorithm for an energy-harvesting MANET scenario.

(3) Designing a QoS cross-layer routing algorithm to extend our current work.
ABOUT THE AUTHORS

Sheng Hao was born in Lanzhou. He received his B.E. and M.S. degrees in computer science and technology from Wuhan University. He is now a Ph.D. candidate in architecture. His research interests include wireless networks, communication theory, and complex network theory. (Email: 2008301500139@whu.edu.cn)

Huyin Zhang [corresponding author] was born in Shanghai. He received his Ph.D. degree in computer science and technology from Wuhan University. He is a professor at Wuhan University. His research interests include high-performance computing, network quality of service, and new-generation network architecture. (Email: zhy2536@whu.edu.cn)

Mengkai Song was born in Wuhan. He received his undergraduate degree in computer science and technology from Wuhan University. He is now a graduate student in computer networks. His research interests include wireless networks and differential privacy. (Email: mksong@whu.edu.cn)