
Energy Efficient Deep Reinforcement Learning Approach to Control the Traffic Flow
in IoT Networks for Smart City
Dr. Mrinai M. Dhanvijay 1*, Dr. Shailaja C. Patil 2
1* Assistant Professor, Department of Electronics and Telecommunication, Modern Education Society's College of Engineering, Pune, India.
2 Professor & Dean (R&D), Department of Electronics and Telecommunication, Rajarshi Shahu College of Engineering, Pune, India.
*Corresponding author’s mail id: mrinaimdhanvijay@gmail.com
Abstract
Traffic flow control plays a crucial role in the development of smart cities, as it directly impacts various key
components of urban infrastructure. With rapid developments in artificial intelligence and the availability of large
amounts of data, there has been growing interest in leveraging deep reinforcement learning (RL) techniques for
traffic control. These methods have shown promise in improving intersection efficiency and reducing travel time.
However, existing approaches often overlook the spatial and temporal correlations between intersections,
leading to suboptimal performance and increased waiting times. This paper proposes a decentralized traffic
management system that combines a Graph-Structured Correlation time spatial attention (GSCTSA) network with the
Asynchronous Advantage Actor-Critic algorithm (GSCTSA-A3C). The system utilizes traffic controllers installed at
each traffic light to collect and process real-time traffic data from nearby sensors and cameras. Because the data
are processed at the edge layer, closer to the intersections, rather than in the cloud, real-time decision-making
becomes feasible. The proposed GSCTSA-A3C model combines LSTM, correlational attention, and
Graph Attention Networks with the Asynchronous Advantage Actor-Critic (A3C) RL algorithm. It captures
spatio-temporal correlations and optimizes traffic light timings to improve traffic flow and reduce congestion.
The GSCTSA module extracts essential spatio-temporal correlations, while the A3C algorithm learns optimal
traffic control policies. Experimental outcomes demonstrate the effectiveness of the GSCTSA-A3C approach in
achieving efficient traffic management in IoT networks for smart cities.
Keywords: Traffic light control, Spatial-temporal correlation, A3C, IoT, Deep RL
1. Introduction
The use of Internet of Things (IoT) technologies in smart cities has led to the deployment of numerous sensors
to collect data in real-time from various sources [1]. One of the critical applications of IoT in smart cities is
traffic control. The increasing number of vehicles on the road has led to traffic congestion, which negatively
impacts the environment, public health, and productivity [2]. Traffic congestion is a major problem in urban
areas and has significant economic, social, and environmental impacts. IoT networks have the potential to
address this problem by enabling the collection and analysis of traffic data in real-time [3]. In this project, we
aim to develop a system that optimizes traffic flow in IoT networks for smart cities by considering factors such
as volume, speed, and occupancy.
Existing traffic control systems are limited in both energy efficiency and traffic flow optimization [4].
Traditional systems cannot handle large traffic volumes, which leads to congestion and delay [5]. They also
consume a considerable amount of energy, which raises operational costs and environmental impact [6, 7]. Much of
this energy goes to operating traffic signals and other equipment, and these systems are often inefficient, so
energy is wasted [8]. Traffic flow optimization is therefore essential for reducing energy consumption and
improving traffic flow [9]. It can be achieved with advanced algorithms that consider real-time traffic data and
adjust traffic signals accordingly [10, 11]. This research therefore introduces an energy-efficient technique,
based on artificial intelligence (AI), for controlling traffic flow in IoT networks for smart city applications;
the technique optimizes traffic flow while minimizing energy consumption. The system should take into account
various input factors, including traffic volume, speed, occupancy, weather, events, emergencies, public
transportation, and road infrastructure, and should be able to adapt to changing traffic conditions in real-time.
Existing approaches in traffic control often lack the correlational attention mechanism, which hinders their
ability to effectively capture and utilize spatial dependencies and relationships among various traffic elements,
including intersections, road segments, and vehicles. As a consequence, the control decisions made may not
fully consider the spatial context, leading to suboptimal performance. Moreover, the absence of spatial
correlation analysis can result in imprecise predictions and less efficient traffic light control strategies. Efficient
resource allocation, a crucial aspect of traffic light control, is compromised when spatial correlations are not
taken into account. This can lead to inefficiencies, congestion, and increased travel times for vehicles. To
overcome these limitations, the GSCTSA-A3C approach is introduced.
The following are the main contributions of the GSCTSA-A3C approach:
• By incorporating temporal and spatial information of the road network, we introduce the GSCTSA-A3C
approach for decentralized traffic management, which demonstrates significant performance
improvements in traffic signal control.
• An LSTM based on a correlational attention mechanism (CAM) is incorporated within the graph attention
mechanism to enhance the expressive power of the model. It transforms the characteristics of each node
into high-level representations, effectively capturing the spatial correlations in the traffic control
context.
• The CAM dynamically chooses the necessary external intersection features that influence the
behaviour of the traffic flow (such as traffic volume, weather conditions, road conditions, and time of
day) to capture the spatial correlations in the traffic data. Furthermore, when determining the
correlational attention weight, priority is given to the neighbouring intersection features that have a
significant impact on the target features, thereby enabling effective correlation estimation.
• The evaluation outcomes demonstrate that the GSCTSA-A3C technique outperforms existing
techniques in terms of waiting time, queue length, throughput, and average vehicle speed.
The remaining sections of the article are organized as follows: Section 2 presents the related work, followed by
the introduction of the decentralized GSCTSA-A3C for traffic light control in Section 3. Section 4 provides the
results, simulation setup, and discussion. Finally, Section 5 concludes the article.
2. Related Work
This section presents a review of the literature on traffic flow prediction, focusing on three broad categories of
techniques: meta heuristic algorithms, RL, and deep learning.
2.1 Meta heuristic algorithm in traffic flow prediction
Meta heuristic algorithms are used in traffic flow prediction because they can optimize complex and non-linear
objective functions and search for optimal or near-optimal solutions in large solution spaces. The performance
of traditional statistical and machine learning methods may deteriorate when facing these challenges, as their
capacity to capture the complex interactions among the variables is constrained. Meta heuristic algorithms can
overcome these limitations by searching for optimal or near-optimal solutions in large solution spaces. They can
also handle multiple objectives simultaneously and provide a set of solutions that trade-off between conflicting
objectives. This makes them suitable for traffic flow prediction tasks, where multiple factors need to be
considered and optimized simultaneously.
Angayarkanni et al. [12] proposed a hybrid Grey Wolf Optimization (GWO) and Bald Eagle Search (BES)
optimization to optimize the support vector regression parameters for traffic flow prediction. However, this
method may not be suitable when dealing with large amounts of traffic flow data. In such situations, a better
option is to use SVR with a hybrid optimization algorithm. Additionally, to address the issue of wavelet neural
networks being prone to getting stuck in local optima, Du et al. [13] employed an improved whale optimization
to optimize the parameters of the network, leading to improved accuracy and efficiency in traffic flow
prediction. However, meta heuristic algorithms typically require a large number of iterations to find the optimal
solution, making them computationally expensive, especially for large-scale traffic flow prediction problems.
They are also sensitive to their parameter settings, and choosing suitable settings can be time-consuming.
2.2 Deep learning technique in traffic flow prediction
Deep learning techniques are used in traffic flow prediction because they have shown promising results in
capturing the complex and non-linear relationships between traffic flow and other variables such as weather
conditions, time of day, and road network structure. Unlike traditional machine learning models that rely on
hand-engineered features, deep learning models can learn relevant features directly from the data, leading to
better performance and generalization. Recurrent neural networks (RNNs) are particularly well-suited for
modeling temporal sequences such as traffic flow, as they can capture the time-dependencies and context of the
data.
Several models based on the long short-term memory (LSTM) network and gated recurrent unit (GRU)
architectures have been applied to traffic flow prediction. While traditional LSTM networks perform well on
time series of short to medium length, they are not always reliable and can be unstable when processing
extended time series. To address this, Ma et al. [14] introduced a technique based on an improved LSTM and
traffic flow time series analysis. The model builds on both LSTM and BiLSTM networks, allowing it to use
future information and process large-scale sequences of traffic flow data more efficiently. However, these
models are limited to analyzing only the temporal features of the data and cannot fully capture the spatial
features of traffic flow.
Additionally, some approaches use cloud architectures for traffic flow prediction, but this may cause delays
since data centres and IoT devices are so far apart. To overcome these challenges and enhance the accuracy, Yu
et al. [15] introduced a cloud-edge IoT framework and a short-term traffic flow prediction model according to
spatial-temporal correlation (TFPM-STC). It employs principal component analysis to analyze the intersection
correlation and uses bi-directional GRU and Convolution-GRU models to extract the periodic and spatial-
temporal features of traffic flow.
Previous deep learning models used for traffic prediction have primarily focused on either spatial or temporal
features, and most of them have been single models. Although some hybrid models have been developed, they
have only integrated a limited number of factors using fusion analysis. However, the LSTM network has the
potential for multivariate analysis of precipitation, weather, and traffic data. To enhance the accuracy of flow
prediction, Narmadha and Vijayakumar [16] proposed a hybrid LSTM and CNN model that can capture
spatial and temporal features and perform multivariate analysis. However, deep learning techniques have some
challenges, such as high computational requirements, overfitting, and difficulties in interpretability.
2.3 Reinforcement learning technique in traffic flow prediction
Deep RL (DRL) is used for traffic flow prediction since it can handle the complexity and uncertainty of traffic patterns and
provide accurate and robust predictions. DRL algorithms can learn to extract relevant features from high-
dimensional input data, such as traffic volumes, speeds, and densities, and make decisions based on them to
optimize traffic flow. Furthermore, DRL can be used to optimize the performance of autonomous vehicles in
complex traffic scenarios, such as merging into a highway or navigating through intersections. By using DRL,
these vehicles can learn to make optimal decisions in real-time and adapt to changing traffic conditions. Overall,
DRL is a promising technique for traffic flow prediction and control, as it can handle the complexity and
uncertainty of traffic patterns and provide accurate and robust predictions.
Most existing traffic light control architectures are inefficient and cannot effectively accommodate
autonomous vehicle systems, leading to issues such as delays and waste of energy. To address these problems,
Mushtaq et al. [17] introduced a model that includes an adaptive traffic signal control system that uses DRL to
enhance flow of traffic at intersections during congested times. Additionally, a smart re-routing approach is
introduced for approaching traffic to redistribute the traffic load to alternative routes, thereby avoiding
congested intersections. RL is crucial for overcoming data sparsity issues and improving the accuracy and
stability of long-term prediction. So, Peng et al. [18] combine deep learning techniques, such as graph
convolution and LSTM, with RL to improve prediction in the case of incomplete and sparse data. However,
using DRL in traffic flow control leads to complexity and overfitting.
2.4 Reinforcement learning technique in traffic flow control
Abdoos and Bazzan [19] proposed a hierarchical multi-agent system (HMAS) for optimizing traffic signals in an
area. A local controller agent controls each junction and uses RL to choose the optimum traffic signal
management strategy based on local data. A higher-level agent, known as the region controller agent, receives
real-time local traffic information and uses it, along with all the data from the local controller agents, to
train an LSTM as a prediction component. The region controller agent then attempts to determine the appropriate
joint policy based on the predicted traffic. Combining RL with prediction modules improves the performance of
this strategy. The agents at different tiers of the hierarchy learn from different types of data, which allows
them to control the traffic lights efficiently in congested areas. However, this technique does not consider the
spatio-temporal correlation between intersections, which results in ineffective traffic flow management.
For efficient traffic control at intersections, Kumar et al. [20] proposed the Deep RL-based Traffic Light Control
System (DITLCS). This system utilizes cameras, RFID sensors, and other devices to collect vehicle traffic data
through a vehicular network. A fuzzy logic component determines the most suitable mode of operation according to the provided
traffic statistics. The deep RL component of DITLCS dynamically adjusts the phase in response to the volume
of traffic, allowing for increased vehicle throughput and shorter queue lengths at the intersection.
To address the challenges of scalability and partial observability in distributed multi-agent traffic signal
management, Wang et al. [21] proposed RACS (Reinforced Attention-based Coordination System). RACS,
based on Graph Attention Networks (GAT) with Advantage Actor-Critic (A2C), effectively controls traffic. It
dynamically learns the weighted impact of neighbouring junctions to determine the optimal traffic light action at
a local intersection, incorporating observations and policies from those intersections. The primary advantage of
RACS is its ability to gather geographical data from nearby junctions and incorporate their dynamic conditions
and reward information, leading to improved stability and performance.
Hu et al. [22] proposed the Mean Field Double Q-learning with Dynamic Timing Control (MFDQL-DTC)
algorithm for effective traffic control. This algorithm utilizes mean field approximation to model the behavior of
agents in nearby intersections and incorporates a Dynamic Timing Control (DTC) module to reduce unnecessary
waiting time and the number of stops, especially in unsaturated traffic flow scenarios. By minimizing pauses
and waiting times in unsaturated traffic flow, the MFDQL-DTC algorithm provides efficient traffic control.
Additionally, it demonstrates scalability for expansive city-size road networks.
Existing approaches for traffic light control often neglect the consideration of spatial and temporal correlations.
The absence of a CAM in traffic light control leads to limited spatial understanding, reduced prediction
accuracy, inefficient resource allocation, and suboptimal traffic coordination. To address these shortcomings and
enhance the performance and efficiency of the traffic light control system, it is crucial to incorporate the
correlational attention mechanism, which allows for a better understanding of spatial relationships and more
effective decision-making.
3. System design
The environment and the edge layer serve as the central components of the proposed traffic management
system, as illustrated in Fig. 1. The environment layer consists of traffic lights, automobiles, and other elements.
Each traffic signal is accompanied by a separate traffic control unit. In our proposed decentralized traffic control
architecture, separate traffic controllers are installed at each intersection. These controllers gather traffic
information from nearby sensors or cameras situated at the roadside. They process this input data to determine
the current traffic state, queue length, waiting time, weather conditions, and other relevant parameters.
Moreover, each traffic controller at an intersection extracts necessary features from neighbouring traffic signal
controllers. This decentralized approach allows for the processing and analysis of traffic data to occur at the
edge layer, closer to the intersections, instead of relying on cloud-based solutions. By avoiding the delays
associated with transmitting data to the cloud for processing, real-time decision-making becomes feasible. One
significant advantage of employing our proposed traffic control architecture is the ability to extract spatio-
temporal correlations from the features obtained at each intersection and neighbouring nodes. These correlations
provide insights into the interactions and dependencies among different intersections, enabling coordinated
traffic light control. By leveraging these correlations, the A3C algorithm can effectively manage and optimize
traffic light timings in a decentralized manner.
Figure 1: Illustration of the proposed traffic flow control architecture
Fig. 2 illustrates the framework of the proposed GSCTSA-A3C, which couples a Graph-Structured Correlation time
spatial attention (GSCTSA) network with Asynchronous Advantage Actor-Critic (A3C) to provide an energy-efficient
approach to controlling traffic flow in IoT networks for smart cities. The GSCTSA-A3C model is a deep RL model
that combines LSTM, correlational attention, and graph attention networks with the A3C RL algorithm. This
approach allows the model to learn optimal traffic control policies by taking into account both temporal and
spatial relationships in the data, as well as the interactions between various road network segments.
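As a point of reference for the A3C component, the sketch below shows the n-step advantage estimate that actor-critic methods of this family commonly use. It is a minimal illustration and not the authors' implementation: the function name `n_step_advantages`, the discount factor `gamma`, and the toy numbers are assumptions that do not appear in this paper.

```python
from typing import List, Sequence


def n_step_advantages(
    rewards: Sequence[float],
    values: Sequence[float],
    bootstrap_value: float,
    gamma: float = 0.99,  # assumed discount factor, not given in the paper
) -> List[float]:
    """Return A_t = R_t - V(s_t), where R_t is the discounted n-step return."""
    returns: List[float] = []
    running = bootstrap_value  # critic estimate for the state after the rollout
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return [ret - v for ret, v in zip(returns, values)]


# Toy rollout of two steps with a zero critic and no bootstrap value.
advantages = n_step_advantages([1.0, 1.0], [0.0, 0.0], 0.0, gamma=0.5)
```

In an asynchronous setting, each worker (here, each intersection agent) would compute such advantages on its own rollout before pushing gradients to the shared actor and critic.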
Figure 2: Framework of the proposed GSCTSA -A3C
The GSCTSA module comprises two main components: the CAM with LSTM and the graph attention. The
graph attention calculates weight coefficients among nodes (agents) and their nearby agents to extract time-
based correlations. Additionally, the CAM with LSTM is responsible for capturing complex spatial correlations
between the target (i.e., the intersection for which the traffic light control decisions are being made) and external
series (i.e., neighbouring intersections of the target intersection). It adjusts attention weights and modifies input
features into high-level representations using LSTM.
3.1 Proposed GSCTSA -A3C approach for traffic light control
The details of the GSCTSA-A3C implementation are described in this section. We specifically define the
structure of the traffic signal control network, the state, the action, and the reward.
State: The state is a numerical description of the road network environment as observed by each agent (traffic
signal controller). The queue length ($ll$), the traffic flow ($TF$), the weather ($We$), the presence of
pedestrians ($PP$), and the time of day ($TD$) are all taken into account. The observation $O_a$ of agent $a$ is
defined by Eq. (1).

$$O_a \triangleq \{ll_1, \ldots, ll_N, TF_1, \ldots, TF_N, We_1, \ldots, We_N, PP_1, \ldots, PP_N, TD_1, \ldots, TD_N\} \tag{1}$$

where N denotes the count of entrance paths present at the intersection. The state observation of each agent,
along with the state observation of the neighbouring agent, is provided as input to the GSCTSA module. The
GSCTSA module is responsible for extracting essential spatio-temporal correlations, as explained in detail in
section 3.2.
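The observation in Eq. (1) can be sketched as a flat feature vector grouped by feature type. This is a minimal illustration under stated assumptions: the class `ApproachReading`, the function `build_observation`, and the numeric encodings of weather, pedestrian presence, and time of day are hypothetical and are not specified in the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ApproachReading:
    """Hypothetical sensor reading for one entrance path of an intersection."""
    queue_length: float  # ll: vehicles queued on this approach
    traffic_flow: float  # TF: flow on this approach
    weather: float       # We: numeric weather code (encoding is an assumption)
    pedestrians: float   # PP: 1.0 if pedestrians are present, else 0.0 (assumption)
    time_of_day: float   # TD: time of day scaled to [0, 1] (assumption)


FIELDS = ["queue_length", "traffic_flow", "weather", "pedestrians", "time_of_day"]


def build_observation(readings: List[ApproachReading]) -> List[float]:
    """Flatten N readings into O_a, ordered feature by feature as in Eq. (1)."""
    return [getattr(r, name) for name in FIELDS for r in readings]


# Two entrance paths (N = 2) give an observation of length 5 * N = 10.
readings = [
    ApproachReading(3.0, 0.4, 0.0, 1.0, 0.5),
    ApproachReading(7.0, 0.9, 0.0, 0.0, 0.5),
]
obs = build_observation(readings)
```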
Action: Based on its current state observation, the intersection agent selects an action (a timing schedule) in
response to the environmental conditions. The time interval is used as the update frequency. The action chosen by
agent $a$ is given by Eq. (2).

$$action_a \triangleq \{du, sc_1, \ldots, sc_{np}\} \tag{2}$$

where $du$ is the time scaling parameter that determines the duration of the following cycle. The cycle duration
is restricted to the range $[du_{min} R, du_{max} R]$ to prevent the cycle from being too long or too short,
where $R$ represents the duration of the reference cycle. The scale factors for the length of every phase in the
following cycle are $sc_1, \ldots, sc_{np}$, and $np$ represents the total number of phases. Here, $R = 100$,
$du_{min} = 0.6$, and $du_{max} = 1$. This action space design makes traffic flow more predictable, which can
improve route planning for vehicles. It also allows the algorithm to be deployed directly on most signal
controllers, because the agent does not need to make frequent decisions.
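One way to turn such an action into concrete phase lengths is sketched below. The values of R, du_min, and du_max come from the text; the function name `phase_durations`, the clamping of `du`, and the normalization of the scale factors so that the phases sum to the cycle duration are assumptions made for illustration, and yellow intervals are ignored.

```python
from typing import List, Sequence

R = 100.0     # reference cycle duration from the text (seconds assumed)
DU_MIN = 0.6  # lower bound on the time scaling parameter
DU_MAX = 1.0  # upper bound on the time scaling parameter


def phase_durations(du: float, sc: Sequence[float]) -> List[float]:
    """Map an action {du, sc_1, ..., sc_np} to one duration per phase."""
    du = min(max(du, DU_MIN), DU_MAX)  # keep the cycle within [du_min*R, du_max*R]
    cycle = du * R
    total = sum(sc)
    if total <= 0:
        raise ValueError("scale factors must have a positive sum")
    # Assumption: each phase receives a share of the cycle proportional to sc_i.
    return [cycle * s / total for s in sc]


# Four phases, with the first phase weighted twice as heavily as the others.
durations = phase_durations(0.8, [2.0, 1.0, 1.0, 1.0])
```

Because the agent acts once per cycle rather than once per time step, a schedule of this kind can be handed to a fixed-cycle signal controller without further decisions during the cycle.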
Reward: The objective is to decrease the average vehicle delay while lowering energy use and enhancing
junction traffic flow. The reward increases as the waiting period decreases. Fairness is also taken into account
to prevent a small number of vehicles from experiencing excessive waiting times. The reward function of the
$a$-th agent is given by Eq. (3).

$$reward \triangleq \frac{1}{n_{wt}} \sum_{n=1}^{n_{wt}} \lambda \left[ 1 - \left( \frac{wt_n}{T_{wt}} \right)^{\alpha} \right] \tag{3}$$

where $n_{wt}$ denotes the number of vehicles waiting at the intersection, $wt_n$ denotes the waiting period of
vehicle $n$, $T_{wt}$ denotes the acceptable waiting period, and $\lambda$ and $\alpha$ are constants. $T_{wt}$
is set to 20, $\lambda$ is set to 0.15, and $\alpha$ is set to 2.
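The reward can be sketched in a few lines, assuming Eq. (3) has the form reward = (1/n_wt) * sum over waiting vehicles of lambda * [1 - (wt_n / T_wt)^alpha]. The constants are those stated in the text; the function name `reward` and the value returned when no vehicle is waiting are assumptions.

```python
from typing import Sequence

T_WT = 20.0    # acceptable waiting period
LAMBDA = 0.15  # scaling constant
ALPHA = 2.0    # exponent; values above 1 penalize long individual waits more


def reward(waiting_times: Sequence[float]) -> float:
    """Average per-vehicle reward over the vehicles waiting at the intersection."""
    n_wt = len(waiting_times)
    if n_wt == 0:
        # Assumption: with no waiting vehicles the agent gets the maximum reward.
        return LAMBDA
    total = sum(LAMBDA * (1.0 - (wt / T_WT) ** ALPHA) for wt in waiting_times)
    return total / n_wt


no_delay = reward([0.0, 0.0])  # every vehicle passes without waiting
at_limit = reward([20.0])      # one vehicle waits exactly T_wt
over_limit = reward([40.0])    # one vehicle waits twice T_wt
```

With alpha = 2, a vehicle that waits twice the acceptable period contributes a negative term three times the size of the maximum positive term, which is how this form discourages leaving a few vehicles waiting for a long time.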
3.1.1 Graph-Structured Correlation time spatial attention
In the proposed traffic light management system, a GSCTSA mechanism is utilized to extract spatial-temporal
correlations across multiple intersections from a set of time-dependent observation data. The neighbouring node
properties of the graph structure, represented as $y = \{y_1, y_2, y_3, \ldots, y_U\}$, reflect various external
factors and serve as input to the system. Here, $U$ denotes the total number of neighbouring intersections,
$y_u \in \mathbb{R}^m$ denotes the features of a specific node, and $m$ denotes the number of features in each
node. By applying the GSCTSA method, a new set of node features (i.e., the required spatio-temporal correlated
features), denoted as $i' = \{i'_1, i'_2, \ldots, i'_U\}$, is generated, where $i'_u \in \mathbb{R}^n$ denotes
the updated features of a node and $n$ denotes the number of features in each updated node. This process enables
the capture of the essential spatio-temporal correlations required for effective traffic light control using the
A3C framework. The illustration of the GSCTSA module is provided in Fig. 3.
Figure 3: Illustration of GSCTSA module
Capturing spatial correlations is necessary in traffic light control because the performance of a traffic light
system depends not only on the current state of traffic at a particular intersection (target) but also on the
surrounding conditions (external node). By analyzing the spatial correlations, we can gain insights into how
external factors influence the target variable and make more informed decisions about traffic light timing and
control strategies.
In GAT, learnable linear mappings are utilized to transform input characteristics into higher-level features,
enabling adequate representational capacity. Each node is subject to the weight matrix
$\omega \in \mathbb{R}^{n \times m}$, which yields $\omega y_u$.
However, linear mappings alone are insufficient since there are spatial correlations between the numerous
nearby intersections and the target. Therefore, we propose the use of an LSTM with correlational attention as a
mapping mechanism. The correlational attention mechanism (CAM) captures spatial correlations, while the
LSTM further enhances the input features, resulting in higher-level features.
The relevance or significance of individual features of a node (such as traffic volume, weather conditions, road
conditions, etc.) in relation to the target series (specific node) can vary over time. In traffic light control,
different factors may have varying levels of influence on the target node's behavior or state at different time
steps. For example, during rush hour, traffic volume might be a crucial factor in determining the optimal timing
of traffic lights. However, during late-night hours when traffic is minimal, other features like weather conditions
or pedestrian activity may become more significant. The significance of each feature can change due to various
factors such as time of day, traffic patterns, and environmental conditions. By recognizing and adapting to these
dynamic relationships, a traffic light control system can adjust its decision-making process and prioritize the
features that have the most impact on the target series at any given time step. This allows for more effective and
context-aware traffic management.
The target series and the $l$-th feature of the nodes are denoted $x = \{x_1, x_2, x_3, \ldots, x_U\}$,
$x_u \in \mathbb{R}$, and $y^l = (y^l_1, y^l_2, y^l_3, \ldots, y^l_U) \in \mathbb{R}^U$, respectively. The CAM
analyzes the relationships between the features of the neighbouring nodes and the target node and assigns
weights or importance values to these features. By adjusting the relevance of the neighbouring intersection
features according to the correlational attention method, the traffic light control system can take into account
the effect of adjacent nodes on the target node's state at different time steps.
This allows the system to capture spatial correlations and make informed decisions about traffic light timings or
control strategies that take into account the influence of nearby factors on the target node. The correlational
attention weight is determined using Eqs. (4) and (5).

$$f_u^l = z_f^{\top} \tanh\left( c_f [g'_{u-1}; t_{u-1}] + V_f y^l \right) \tag{4}$$

$$\beta_u^l = \mathrm{softmax}(f_u^l) = \frac{\exp(f_u^l)}{\sum_{j=1}^{m} \exp(f_u^j)} \tag{5}$$

where $g'_{u-1} \in \mathbb{R}^n$ represents the feature of node $u-1$, $t_{u-1}$ represents the preceding hidden
unit, and $[g'_{u-1}; t_{u-1}] \in \mathbb{R}^{2n}$ is a concatenation operation. The factors $z_f$,
$c_f \in \mathbb{R}^{U \times 2n}$, and $V_f \in \mathbb{R}^{U \times U}$ are learnable parameters. The attention
weight $\beta_u^l$ represents the relevance of the $l$-th input feature for node $u$.
Correlational attention approach is utilized to capture spatial correlations based on the attention weight.

$$H_u = \left( \beta_u^1 y_u^1, \beta_u^2 y_u^2, \ldots, \beta_u^m y_u^m \right)^{\top} \tag{6}$$

The correlation sequence feature $H_u$ takes the place of the initial input sequence $y_u$. Time series data
varies dynamically over time; hence, a mapping from $H_u$ to $g_u$ can be learned using a nonlinear function $L$
(the LSTM unit):

$$g_u = L(g'_{u-1}, H_u) \tag{7}$$

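The correlational attention step can be sketched in plain Python, assuming Eq. (4) takes the usual additive attention form f_u^l = z_f^T tanh(c_f [g'_{u-1}; t_{u-1}] + V_f y^l), followed by the softmax of Eq. (5) and the feature weighting of Eq. (6). The helper names, the toy dimensions (U = 2, n = 1, m = 3), and the parameter values are hypothetical; in the model these parameters would be learned.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(a: Matrix, v: Vector) -> Vector:
    """Plain matrix-vector product."""
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in a]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over the m feature scores, as in Eq. (5)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_scores(z_f: Vector, c_f: Matrix, v_f: Matrix,
                     state: Vector, feature_series: List[Vector]) -> Vector:
    """Assumed Eq. (4): one score per feature l; feature_series[l] is y^l in R^U."""
    base = matvec(c_f, state)  # c_f [g'_{u-1}; t_{u-1}]
    scores = []
    for y_l in feature_series:
        proj = matvec(v_f, y_l)  # V_f y^l
        hidden = [math.tanh(b + p) for b, p in zip(base, proj)]
        scores.append(sum(z * h for z, h in zip(z_f, hidden)))
    return scores


def weighted_features(beta: Vector, y_u: Vector) -> Vector:
    """Eq. (6): scale each feature of node u by its attention weight."""
    return [b * y for b, y in zip(beta, y_u)]


# Toy parameters (hypothetical).
z_f = [1.0, 1.0]
c_f = [[0.0, 0.0], [0.0, 0.0]]
v_f = [[1.0, 0.0], [0.0, 1.0]]
state = [0.3, -0.1]                                    # [g'_{u-1}; t_{u-1}] with n = 1
feature_series = [[0.0, 0.0], [1.0, 1.0], [0.0, 0.0]]  # y^1, y^2, y^3

beta = softmax(attention_scores(z_f, c_f, v_f, state, feature_series))
h_u = weighted_features(beta, [0.5, 2.0, 0.5])
```

In this toy setting the second feature series is the only one with non-zero values, so it receives the largest attention weight and dominates the weighted feature vector that is passed on to the LSTM.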
The Graph Attention Network (GAT) is used to give distinct neighbouring nodes varied weights. This weighting
process is done to capture and extract temporal correlations, which refer to the relationships or patterns that exist
over time between these neighbouring nodes. By assigning different weights, the system can emphasize or
prioritize certain historical information from relevant time steps. This historical information is then used in the
prediction process to estimate future traffic light states or behaviors. The updated node feature $g'_u$ is
calculated using the weight coefficient associated with node $u$. The LeakyReLU nonlinearity is then used to
construct the attention coefficient using a weight vector $b$.

$$f_{ui} = \mathrm{LeakyReLU}\left( b^{\top} [g_u; g'_i] \right) \tag{8}$$

where $[g_u; g'_i]$ represents the concatenation process, $g_u$ represents the feature of node $u$ after
undergoing the LSTM transformation, $G_u = \{g'_1, g'_2, \ldots, g'_{u-1}\}$ represents the collection of
neighbourhood features for node $u$, and $g'_i \in G_u$. The softmax function in Eq. (9) normalizes the attention
weights across all choices of node $i$ so that they are easily comparable across nodes.

$$b_{ui} = \frac{\exp\left( \mathrm{LeakyReLU}\left( b^{\top} [g_u; g'_i] \right) \right)}{\sum_{g'_l \in G_u} \exp\left( \mathrm{LeakyReLU}\left( b^{\top} [g_u; g'_l] \right) \right)} \tag{9}$$

where $b_{ui}$ denotes the weight of the time-based correlation at time $i$ and $u$, as well as the relevance of
node $i$ to node $u$. The features of neighbouring nodes are weighted together by the attention mechanism to
generate a new feature, denoted $g'_u$. This new feature is computed for node $u$ and captures the relevant
information from the neighbouring nodes, allowing for better understanding and consideration of the
surrounding environment in the traffic light control process.

$$g'_u = \frac{1}{u-1} \sum_{g'_i \in G_u} b_{ui} \, g'_i \tag{10}$$

where $g'_u$ represents the weighted sum of the neighbourhood node characteristics. The spatial and temporal
correlations between different time sequences are included in this weighted sum. The time steps that are useful
for regulating traffic signals are selected by the attention weight.
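The graph attention step of Eqs. (8) to (10) can be sketched as follows: a LeakyReLU score for each neighbour, a softmax over the neighbourhood, and a weighted sum scaled by the neighbourhood size. The function names, the LeakyReLU slope of 0.2, and the toy vectors are assumptions; the weight vector b would be learned in practice.

```python
import math
from typing import List

Vector = List[float]


def leaky_relu(x: float, slope: float = 0.2) -> float:
    """LeakyReLU; the negative slope of 0.2 is an assumed value."""
    return x if x > 0 else slope * x


def attention_coefficients(b: Vector, g_u: Vector, neighbours: List[Vector]) -> Vector:
    """Eqs. (8)-(9): softmax over LeakyReLU(b^T [g_u; g'_i]) for each neighbour i."""
    raw = [leaky_relu(sum(w * v for w, v in zip(b, g_u + g_i))) for g_i in neighbours]
    peak = max(raw)
    exps = [math.exp(r - peak) for r in raw]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate(coeffs: Vector, neighbours: List[Vector]) -> Vector:
    """Eq. (10): attention-weighted sum of neighbour features, divided by |G_u|."""
    count = len(neighbours)
    dim = len(neighbours[0])
    return [sum(c * g[d] for c, g in zip(coeffs, neighbours)) / count
            for d in range(dim)]


# Toy example with feature size n = 2 and two neighbours (hypothetical values).
b = [0.0, 0.0, 1.0, 0.0]  # weight vector over the concatenation [g_u; g'_i]
g_u = [1.0, 0.0]
neighbours = [[2.0, 0.0], [0.0, 0.0]]

coeffs = attention_coefficients(b, g_u, neighbours)
g_new = aggregate(coeffs, neighbours)
```

The first neighbour scores higher under this weight vector, so it receives the larger coefficient and contributes most of the aggregated feature.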
Here, the spatio-temporal correlations and the generation of $g'_u$ take into account the attributes and interactions
of neighbouring intersections over a range of time intervals. Instead of considering only the current state or a
single snapshot of neighbouring intersections, the system analyzes the historical information and patterns
observed over multiple time steps. This allows for a more comprehensive understanding of the traffic dynamics
and helps in making accurate predictions and decisions for traffic light control. By considering multiple time
steps, the system can capture temporal dependencies, trends, and fluctuations in traffic patterns. It enables the
identification of recurring patterns, peak traffic hours, and other time-based variations that may impact
congestion and traffic flow.
Dynamic traffic light control aims to optimize traffic flow in real-time by considering the current conditions and
the interactions between adjacent intersections. To achieve this, the system calculates a new feature for each
intersection, which represents the expected sequence feature at a specific time. The key idea is to capture the
spatio-temporal correlations between neighbouring intersections by assigning weights to their attributes. These
weights determine the importance of each neighbouring intersection's attributes in contributing to the new
feature of the target intersection. By considering the weighted attributes of neighbouring intersections, the
system can better understand the traffic patterns and dependencies between different intersections. This
information is crucial for making informed decisions and dynamically modifying the traffic light durations to
optimize traffic flow and reduce congestion.
4. Result and discussion
This section presents the simulation setup, the experimental results, and a comparison between the proposed and
existing traffic flow control methods.
4.1 Simulation setup
The GSCTSA-A3C approach is evaluated using Simulation of Urban Mobility (SUMO). The simulation uses a 4×4
road network with sixteen junctions in total, and each intersection is configured with 4 lanes carrying opposing
traffic directions. Only right-turn traffic is permitted in the rightmost lane, only through traffic in the centre
lane, and only left-turn traffic in the inner left lane. Figures 4 and 5 illustrate the simulation environment and
the signal phases, respectively. As shown in Figure 5, each junction uses 4-phase signal control. Each phase is
followed by a 3-second yellow light to clear the crossing. Vehicles are always allowed to turn right when there is
no conflicting traffic. The distance between adjacent junctions is 500 metres, and the speed limit is 40
kilometres per hour. The simulation parameters are listed in Table 1.
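The timing implied by this setup can be checked with a short sketch. The 3-second yellow, the four phases, the 500 m spacing, and the 40 km/h limit come from the text; the green durations in the example and the names `build_cycle`, `cycle_length`, and `free_flow_travel_time_s` are hypothetical, since the green times are what the agent is meant to choose.

```python
YELLOW_SECONDS = 3        # from the text: each phase is followed by a 3-second yellow
JUNCTION_SPACING_M = 500  # from the text: distance between junctions
SPEED_LIMIT_KMH = 40      # from the text: permitted speed


def build_cycle(green_seconds):
    """Return [(label, duration_s)] for one 4-phase cycle with yellow intervals."""
    if len(green_seconds) != 4:
        raise ValueError("expected one green duration per phase (4 phases)")
    schedule = []
    for index, green in enumerate(green_seconds, start=1):
        schedule.append((f"phase{index}_green", green))
        schedule.append((f"phase{index}_yellow", YELLOW_SECONDS))
    return schedule


def cycle_length(schedule):
    """Total duration of one signal cycle in seconds."""
    return sum(duration for _, duration in schedule)


def free_flow_travel_time_s():
    """Seconds needed to drive between two junctions at the speed limit."""
    speed_m_per_s = SPEED_LIMIT_KMH / 3.6
    return JUNCTION_SPACING_M / speed_m_per_s


# Hypothetical green durations (seconds) for the four phases.
schedule = build_cycle([30, 20, 30, 20])
```

With these assumed greens the cycle lasts 112 seconds, 12 of which are yellow, and a vehicle travelling at the speed limit needs about 45 seconds to reach the next junction.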
Table 1: Simulation setup
Parameter                         Value
Iterations                        500
Batch size                        32
Actor network’s learning rate     1×10⁻⁴
Critic network’s learning rate    5×10⁻⁵
Replay buffer                     10,000
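For reproducibility, the values in Table 1 can be collected in a single configuration object. The sketch below is one possible layout; the class name `TrainingConfig` and its field names are assumptions, and only the numeric values are taken from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    """Training settings from Table 1 (field names are illustrative)."""

    iterations: int = 500
    batch_size: int = 32
    actor_lr: float = 1e-4   # actor network's learning rate
    critic_lr: float = 5e-5  # critic network's learning rate
    replay_buffer_size: int = 10_000


config = TrainingConfig()
```

Freezing the dataclass prevents a setting from being changed accidentally between runs.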
Figure 4: Simulation environment
Figure 5: Illustration depicting the four phases at an individual intersection
4.2 Experimental results
This section compares the experimental results of the GSCTSA-A3C traffic flow management approach with those of
the HMAS [19], DITLCS [20], RACS-A2C [21], and MFDQL-DTC [22] algorithms in terms of average waiting time,
throughput, average queue length, and average speed.
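The four metrics are not defined formally in this section, so the sketch below uses common definitions as an assumption: mean waiting time per vehicle, number of vehicles that completed their trip, mean number of queued vehicles per sample, and total distance divided by total travel time. The record fields and the function name `summarise` are hypothetical.

```python
def summarise(vehicles, queue_samples):
    """Compute the four comparison metrics from simple simulation records.

    vehicles:      list of dicts with keys waiting_s, distance_m, travel_s, finished.
    queue_samples: list of queued-vehicle counts sampled during the run.
    """
    n = len(vehicles)
    total_travel = sum(v["travel_s"] for v in vehicles)
    return {
        # Mean time each vehicle spent stopped, in seconds.
        "avg_waiting_s": sum(v["waiting_s"] for v in vehicles) / n if n else 0.0,
        # Vehicles that reached their destination during the run.
        "throughput": sum(1 for v in vehicles if v["finished"]),
        # Mean number of queued vehicles over all samples.
        "avg_queue_len": sum(queue_samples) / len(queue_samples) if queue_samples else 0.0,
        # Distance-weighted mean speed in metres per second.
        "avg_speed_mps": sum(v["distance_m"] for v in vehicles) / total_travel
        if total_travel else 0.0,
    }


# Hypothetical records for three vehicles and three queue samples.
vehicles = [
    {"waiting_s": 10.0, "distance_m": 1000.0, "travel_s": 100.0, "finished": True},
    {"waiting_s": 30.0, "distance_m": 500.0, "travel_s": 100.0, "finished": True},
    {"waiting_s": 20.0, "distance_m": 300.0, "travel_s": 50.0, "finished": False},
]
metrics = summarise(vehicles, queue_samples=[2, 4, 6])
```

Lower waiting time and queue length, together with higher throughput and speed, indicate better control under these definitions.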
Figure 6: Comparison of proposed traffic flow control network with existing approaches: (i) average waiting time,
(ii) throughput, (iii) average queue length, (iv) average speed
The comparison outcomes of the proposed strategy for different traffic simulation settings are shown in Fig. 6.
The X-axis shows the number of iterations, and the Y-axis shows the relevant performance metric. According to
Fig. 6(i), the proposed method substantially lowers the average waiting time compared with the existing
techniques. The results indicate that the proposed technique learns the environment efficiently and adapts the
traffic signal phases to the volume and variety of the traffic, which reduces waiting. The comparison results for
throughput and queue length are shown in Figures 6(ii) and 6(iii), respectively.
Analysis of the results shows that the proposed approach significantly improves performance by reducing the queue
length and raising throughput. The decentralised architecture with A3C accounts for much of the improvement of
GSCTSA-A3C in throughput and average queue length. It dynamically modifies the signal phase in accordance with
the volume of vehicular traffic at each junction; as a result, more vehicles pass through the intersection, giving
a shorter wait and higher throughput. The average vehicle speed rises as a consequence. Fig. 6(iv) shows the
average speed of the vehicles, which is used to analyse the smoothness of the traffic. According to Fig. 6(iv),
the proposed solution achieves the highest average speed among all the techniques compared. Thus, the proposed
technique yields the best results when compared with HMAS [19], DITLCS [20], RACS-A2C [21], and MFDQL-DTC [22].
By decentralizing the control and leveraging spatio-temporal correlations, the proposed approach achieves several
benefits. Processing traffic data at the edge layer minimizes communication delays and enables faster
decision-making, leading to improved responsiveness in traffic signal control. Decentralized architectures can
also scale more efficiently because each intersection operates independently, allowing the control system to
handle a larger number of intersections without a centralized bottleneck. This approach enhances the system's
robustness, as the failure of one traffic controller does not affect the entire system. Intersections can continue
to operate autonomously even if communication or control is disrupted in certain areas. Each intersection adapts
to local traffic conditions and makes independent decisions based on its own data and neighbouring information.
This flexibility enables the traffic control system to respond effectively to varying traffic patterns and
optimize traffic flow in real time.
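The actor-critic update used by each controller relies on an advantage estimate, which can be sketched with the standard n-step formulation used in A3C-style training. The discount factor, the reward values, and the function name `n_step_advantages` are assumptions for illustration; this section does not state the values used in the experiments.

```python
def n_step_advantages(rewards, values, bootstrap_value, gamma=0.99):
    """Compute discounted n-step returns and advantages for one local rollout.

    rewards:         rewards collected at each step of the rollout.
    values:          critic estimates V(s_t) for the same steps.
    bootstrap_value: critic estimate for the state after the last step
                     (0.0 if the rollout ended in a terminal state).
    gamma:           discount factor (assumed value, not from the paper).
    """
    returns = []
    running = bootstrap_value
    # Walk backwards so each return reuses the one computed after it.
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    # Advantage: how much better the observed return was than the critic expected.
    advantages = [ret - value for ret, value in zip(returns, values)]
    return returns, advantages


# Hypothetical rollout: rewards are negative waiting-time penalties.
returns, advantages = n_step_advantages(
    rewards=[-1.0, -2.0, -3.0],
    values=[-2.0, -3.0, -3.0],
    bootstrap_value=0.0,
    gamma=0.5,
)
```

A positive advantage means the chosen signal action performed better than the critic predicted, so the actor is pushed towards it; a negative advantage pushes it away.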
5. Conclusion
In this study, we proposed a decentralized traffic management system based on the GSCTSA-A3C model,
which integrates correlational attention, Graph Attention Networks, and the A3C RL algorithm. The system
addresses the challenges of traffic control in smart cities by extracting spatio-temporal correlations from traffic
data and optimizing traffic light timings at the edge layer. By processing and analyzing data closer to the
intersections, real-time decision-making is enabled, avoiding the delays associated with cloud-based solutions.
The experimental results demonstrate the effectiveness of the proposed approach in achieving efficient traffic
flow and reducing congestion. The GSCTSA-A3C model effectively captures spatial and temporal relationships
in the data, allowing for informed traffic light control decisions. The decentralized architecture enables
coordinated traffic light control by extracting correlations between different intersections. By leveraging edge
computing, real-time decision-making, and optimized traffic light timings, the system minimizes energy consumption
and improves traffic management efficiency. The incorporation of correlation-based attention further enhances
the model's energy efficiency. Future work will focus on further enhancing the scalability and
performance of the system and exploring additional optimization techniques for traffic management in smart
cities.
Compliance with Ethical Standards:
Funding: There is no funding for this study.

Conflict of Interest: The authors declare that they have no conflict of interest.
Ethical approval: This article does not contain any studies with human participants and/or animals performed
by any of the authors.
Informed consent: There is no informed consent for this study.
Authors' contributions
All the authors have participated in writing the manuscript and have revised the final version. All authors read
and approved the final manuscript.
Data Availability Statement:
All authors contributed to the study conception and design. Material preparation, data collection and analysis
were performed by Mrinai Dhanvijay M and Shailaja Patil C. The first draft of the manuscript was written by
Mrinai Dhanvijay and all authors commented on previous versions of the manuscript. All authors read and
approved the final manuscript.
Conceptualization: Mrinai Dhanvijay M; Methodology: Mrinai Dhanvijay M; Formal analysis and investigation:
Mrinai Dhanvijay M, Shailaja Patil C; Writing - original draft preparation: Mrinai Dhanvijay M, Shailaja Patil C;
Writing - review and editing: Mrinai Dhanvijay M, Shailaja Patil C; Supervision: Shailaja Patil C.
References
1. Jiang D (2020) The construction of smart city information system based on the Internet of Things and
cloud computing. Computer Communications 150: 158-66.
2. Nguyen TH, Jung JJ (2021) Swarm intelligence-based green optimization framework for sustainable
transportation. Sustainable Cities and Society 71: 102947.
3. Agarwal P, Matta P, Sharma S (2021) Analysis based traffic flow control decision using IoT sensors.
Materials Today: Proceedings 46: b10707-11.
4. Liu Y, James JQ, Kang J, Niyato D, Zhang S (2020) Privacy-preserving traffic flow prediction: A
federated learning approach. IEEE Internet of Things Journal 7(8): 7751-63.
5. Guidoni DL, Maia G, Souza FS, Villas LA, Loureiro AA (2020) Vehicular traffic management based
on traffic engineering for vehicular ad hoc networks. IEEE Access 8: 45167-83.
6. Zheng H, Lin F, Feng X, Chen Y (2020) A hybrid deep learning model with attention-based conv-
LSTM networks for short-term traffic flow prediction. IEEE Transactions on Intelligent Transportation
Systems 22(11): 6910-20.
7. Karimzadeh M, Aebi R, de Souza AM, Zhao Z, Braun T, Sargento S, Villas L (2021) Reinforcement
learning-designed LSTM for trajectory and traffic flow prediction. In 2021 IEEE Wireless
Communications and Networking Conference (WCNC) 1-6. IEEE.
8. Chen S, Wang H, Meng Q (2021) An optimal dynamic lane reversal and traffic control strategy for
autonomous vehicles. IEEE Transactions on Intelligent Transportation Systems 23(4): 3804-15.
9. Celtek SA, Durdu A, Alı ME (2020) Real-time traffic signal control with swarm optimization methods.
Measurement 166: 108206.
10. Wang F, Zhu M, Wang M, Khosravi MR, Ni Q, Yu S, Qi L (2020) 6G-enabled short-term forecasting
for large-scale traffic flow in massive IoT based on time-aware locality-sensitive hashing. IEEE
Internet of Things Journal 8(7): 5321-31.
11. Majumdar S, Subhani MM, Roullier B, Anjum A, Zhu R (2021) Congestion prediction for smart
sustainable cities using IoT and machine learning approaches. Sustainable Cities and Society 64:
102500.
12. Yu X, Sun L, Yan Y, Liu G (2021) A short-term traffic flow prediction method based on spatial–
temporal correlation using edge computing. Computers & Electrical Engineering 93: 107219.
13. Ma C, Dai G, Zhou J (2021) Short-term traffic flow prediction for urban road sections based on time
series analysis and LSTM_BILSTM method. IEEE Transactions on Intelligent Transportation Systems
23(6): 5615-24.
14. Narmadha S, Vijayakumar V (2021) Spatio-Temporal vehicle traffic flow prediction using multivariate
CNN and LSTM model. Materials Today: Proceedings.
15. Du W, Zhang Q, Chen Y, Ye Z (2021) An urban short-term traffic flow prediction model based on
wavelet neural network with improved whale optimization algorithm. Sustainable Cities and Society
69: 102858.
16. Angayarkanni SA, Sivakumar R, Ramana Rao YV (2021) Hybrid Grey Wolf: Bald Eagle search
optimized support vector regression for traffic flow forecasting. Journal of Ambient Intelligence and
Humanized Computing 12: 1293-304.
17. Mushtaq A, Haq IU, Imtiaz MU, Khan A, Shafiq O (2021) Traffic flow management of autonomous
vehicles using deep reinforcement learning and smart rerouting. IEEE Access 9: 51005-19.
18. Peng H, Du B, Liu M, Liu M, Ji S, Wang S, Zhang X, He L (2021) Dynamic graph convolutional
network for long-term traffic flow prediction with reinforcement learning. Information Sciences 578:
401-16.
19. Abdoos M, Bazzan AL (2021) Hierarchical traffic signal optimization using reinforcement learning and
traffic prediction with long-short term memory. Expert Systems with Applications 171: 114580.
20. Kumar N, Rahman SS, Dhakad N (2020) Fuzzy inference enabled deep reinforcement learning-based
traffic light control for intelligent transportation system. IEEE Transactions on Intelligent
Transportation Systems 22(8): 4919-28.
21. Wang M, Wu L, Li J, He L (2021) Traffic signal control with reinforcement learning based on region-
aware cooperative strategy. IEEE Transactions on Intelligent Transportation Systems 23(7): 6774-85.
22. Hu T, Hu Z, Lu Z, Wen X (2023) Dynamic traffic signal control using mean field multi‐agent
reinforcement learning in large scale road‐networks. IET Intelligent Transport Systems.