
Optimising Public Bus Transit Networks Using Deep Reinforcement Learning

Ahmed Darwish1, Momen Khalil1, and Karim Badawi2

Abstract—Public transportation buses are an integral part of our cities and rely heavily on optimal planning of routes. The quality of the routes directly influences the quality of service provided to passengers, in terms of coverage, directness, and in-vehicle travel time. In addition, it affects the profitability of the transportation system, since the network structure directly influences the operational costs. We propose a system which automates the planning of bus networks based on a given demand. The system implements a paradigm, Deep Reinforcement Learning, which has not previously been used in the literature for solving the well-documented multi-objective Transit Network Design and Frequency Setting Problem (TNDFSP). The problem involves finding a set of routes in an urban area, each with its own bus frequency. It is considered an NP-Hard combinatorial problem with a massive search space. Compared to state-of-the-art paradigms, our system produced very competitive, and in several cases superior, results.

Index Terms—Deep Reinforcement Learning, Attention Models, Transit Network Design, Frequency Setting

I. INTRODUCTION

Public transportation is an important component of any urban city, and its quality naturally has a positive effect on the well-being of the city's residents. However, enhancing the quality of a public transportation system in terms of coverage and frequency comes at a cost to the operator, who needs to maintain more vehicles and employ more staff to operate them, which maps to extra Capital and Operating Expenditure (CAPEX and OPEX). The presence of these conflicting interests makes the process of planning a public transit network challenging.

Planning a bus-based transportation network is even less straightforward, since the infrastructure is less rigid than, for example, that of a rail-based mode of transportation. It can be even more complicated if the infrastructure is still undetermined, i.e., the locations of the bus stops are still unknown. This case, however, is outside the scope of this paper; we are mainly concerned with the optimisation of networks with determined infrastructure. According to [1], the design phase of such networks can be divided into five main stages: network design, frequency setting, timetable development, bus scheduling, and driver scheduling. The stages are highly dependent on each other, and the output of the last stage can be the starting point of another iteration of the planning process. Each stage can be thought of as a combinatorial problem, with a computational complexity lying in the NP-Hard domain [2]. This makes it almost impossible to optimise the whole pipeline with our current computational resources using a single algorithm.

Throughout the years, different algorithms have been proposed to optimise one or more stages separately from the others, using a whole spectrum of approaches. However, they relied heavily on hand-crafted heuristics. Naturally, these heuristics come from the intuition and experience of an expert, which can be a limitation, since one's perception of reality may not always be the most accurate. One common sub-problem is concerned with the optimisation of the first two stages together, known in the literature as the Transit Network Design and Frequency Setting Problem (TNDFSP). This problem aims to find the most efficient network topology, along with the respective frequency of each line, based on the customers' travel demands.

In this paper, we propose a novel approach to solving the TNDFSP using Deep Reinforcement Learning (DRL), exploiting the inherent combinatorial nature of the problem. Our approach utilises a modified Multi-Head Attention (MHA) network to output a transit network, which is post-processed and fed to a reward function that outputs the scalar values needed for back-propagation. Our model does not require any heuristics to be included in the algorithm, and it also allows us to prioritise the objectives to optimise (either customer satisfaction or operator cost minimisation) by fine-tuning a few custom hyperparameters before training.

Our main contribution is the use of DRL for solving the TNDFSP to output a complete, optimised network with frequencies, with no need whatsoever for pre-determined heuristics. We trained and evaluated our model on Mandl's benchmark network [4].

1 Ahmed Darwish (https://orcid.org/0000-0001-5793-4128) and Momen Khalil (https://orcid.org/0000-0001-7837-1870) are members of the R&D department of Idea in Motion Inc. They are also studying at the Computer Science and Engineering Faculty at the German University in Cairo (GUC), Egypt. ahmed.ahmeddarwish@student.guc.edu.eg, momen.khalil@student.guc.edu.eg

2 Dr. Karim Badawi is currently with Trapeze Group, Switzerland. He also teaches and heads a research group at the Department of Information Technology and Electrical Engineering at the Swiss Federal Institute of Technology Zurich (ETH Zürich), Switzerland. karim.badawi@trapezegroup.com, badawi@iis.ee.ethz.ch

II. LITERATURE REVIEW

A. Different Approaches for solving TNDFSP

Purely mathematical models have been designed and built to design the network routes. However, they remain too strict for the dynamic, always-changing nature of the real world, and they generally only work efficiently for the targeted variant of the problem [3].

Another group of solutions utilised hand-crafted heuristics to optimise the network; for example, such heuristics would be used to determine the topology of the initial population for a Genetic Algorithm [6].

One of the earliest significant efforts was carried out by Mandl [4], who applied his heuristics to a real network in Switzerland. This network is still considered a benchmark for evaluating algorithms today. Baaj and Mahmassani [5] proposed more complex heuristics than Mandl's, but they remain sophisticated, require a good understanding of the transportation system being optimised, and are not generally applicable to many real-world networks.

Evolutionary Algorithms have also shown significant effectiveness in solving this problem, abstracting away much of the needed domain knowledge and having more applicability to real life. Chakroborty and Wivedi [6] proposed a Genetic Algorithm (GA) with a multi-objective fitness function, focusing on user costs and coverage, which outperformed its predecessors but only solved the routing problem.

Many of the algorithms which followed started focusing more on the TNDFSP. Zhao and Zeng [8] combined Simulated Annealing, a probabilistic technique, with a GA to acquire improved results. Kidwai et al. [7] proposed an iterative multi-stage algorithm which minimises headways and the routes. Arbex and da Cunha also proposed a similar algorithm, with the operator cost as part of the constraints, in order to minimise the fleet size [9]. Jha et al. [10] applied a multi-objective particle swarm optimisation algorithm with multiple search strategies [11] to designing transit networks, with an additional objective of minimising the CO2 emissions of the buses. They produced state-of-the-art results for relatively larger numbers of lines when compared to the results of Arbex and da Cunha, but still remain inferior when the number of routes is 4.

B. Solving combinatorial problems using DRL

All the papers we surveyed involved pre-determined heuristics which are used to traverse the search tree of solutions until a good enough solution is found. Based on insights from the survey by Bengio et al. [12], we decided to experiment with the possibility of learning those heuristics using connectionist approaches. The survey did not discourage the use of this approach for highly-dimensional problems like the TNDFSP.

Several papers have tried to solve different graph combinatorial problems, such as the Travelling Salesman Problem (TSP), the Vehicle Routing Problem (VRP), and the Orienteering Problem (OP), all of which are, more or less, very specific instances of our problem.

Bello et al. [13] utilised a Pointer Network with an attention layer as an agent deciding which node to visit in the TSP, and trained the agent with an Actor-Critic approach. Dai et al. [14] implemented a self-feeding network which takes as input the embedding of a whole graph, masks out the nodes the agent is not allowed to pick, and keeps feeding the output back into the network until the solution is complete. Nazari et al. [15] proposed removing the LSTM encoder of the pointer network from Bello's work, since they claim there is no temporal value in the order the nodes are input to the network. In addition, they propose inputting a "dynamic element" to the attention mechanism, such that the network is more robust against dynamic properties of the Vehicle Routing Problem (VRP), and they utilise the A3C training method. Kool et al. [16] used the Multi-Head Attention [17] model instead of the typical single-head attention model, together with a different, non-connectionist baseline approximator to reduce the variance of the loss gradients. They report state-of-the-art results when compared to other RL-based combinatorial optimisation frameworks.

Kool et al. also mentioned the possibility of using their neural network architecture for similar problems, given a custom masking and reward function. Consequently, we decided to employ their architecture for the TNDFSP, due to its leading performance and the feasibility of adapting our problem to the model architecture.

III. METHODOLOGY

A. Problem Statement

As mentioned before, we have two conflicting objectives to optimise. The first, Customer (or user) Satisfaction, is concerned with a short trip time, a minimum number of transfers, and a short waiting time at stations. The second, Operator Cost, is concerned with minimising CAPEX and OPEX. For simplicity, and as commonly done in the literature, we assume that the number of vehicles needed to serve a specific network of stations is sufficient to represent such costs. We can formulate the problem in the following manner:

Given a tuple I = (G, D, T) consisting of a directed graph G = (S, R), where S denotes the set of bus stops and R denotes the set of roads connecting them, an OD-matrix D ∈ R^(|S|×|S|), and a travel time matrix T ∈ R^(|S|×|S|) (containing the time needed to go from one station to another), produce a set of routes, L, where each route is a path in graph G, which acts as one of the Pareto optimal solutions minimising the values U and O, which denote the customer cost and the operator cost, respectively.

The way the values U and O are computed is explained in the Reward Calculation section.

The sub-problems of our target problem cannot be solved simultaneously. Instead, they must be solved sequentially, since Frequency Setting requires the assignment of the transportation network's demand to the different routes, which cannot occur if no transit network exists. Therefore, a network has to be designed first, before calculating the headways.
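For concreteness, the snippet below shows one minimal way a problem instance I = (G, D, T) could be represented in code; the container layout and the toy values are illustrative assumptions only, not part of the benchmark data or of our exact implementation.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class TNDFSPInstance:
    """A problem instance I = (G, D, T) as formulated above.

    The road graph G = (S, R) is stored as an adjacency dictionary,
    `demand` is the OD-matrix D, and `travel_time` holds the direct
    travel times T (np.inf where no road exists between two stops).
    """
    roads: dict          # stop -> {neighbour: travel time}
    demand: np.ndarray   # D, shape (|S|, |S|)
    travel_time: np.ndarray  # T, shape (|S|, |S|)

# A toy 4-stop instance (values made up for illustration only).
roads = {0: {1: 3.0}, 1: {0: 3.0, 2: 4.0}, 2: {1: 4.0, 3: 2.0}, 3: {2: 2.0}}
demand = np.array([[0, 50, 20, 10],
                   [50, 0, 30, 15],
                   [20, 30, 0, 40],
                   [10, 15, 40, 0]], dtype=float)
travel_time = np.full((4, 4), np.inf)
np.fill_diagonal(travel_time, 0.0)
for i, nbrs in roads.items():
    for j, t in nbrs.items():
        travel_time[i, j] = t

instance = TNDFSPInstance(roads, demand, travel_time)
# A candidate solution L is then a list of paths in G, e.g. two routes:
routes = [[0, 1, 2], [1, 2, 3]]
```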

Assumptions and Hyperparameters: To help simplify the search space, we impose additional non-functional constraints (some in the form of hyperparameters) and assumptions about the nature of our inputs and outputs. The constraints are similar to those found in Arbex and da Cunha's paper [9]:

1) The output network is a connected graph. This means that all stops are reachable from any other stop.
2) Each and every stop in the graph must be present in at least one of the routes.
3) Routes must be distinct from each other. Identical routes are considered to be a single route with their frequencies combined.
4) No route can contain a cycle.
5) The number of output routes, |L|, is predefined by the operator.
6) All buses in the operator's fleet have the same capacity.
7) The number of stops in each route has to be within a predefined range. In addition, the estimated travel time to traverse a single route has to be bounded by a predefined value. Both limits account for the feasibility of bus schedules and avoid overtiring the bus drivers.

B. Model Architecture Overview

The neural network architecture we use in this paper, originally proposed by Kool et al., is based on the Encoder-Decoder architecture combined with Attention Mechanisms (AMs).

Initially, an encoder similar to the one in the Transformer architecture proposed by Vaswani et al. [17] is used to produce the node and graph embeddings, but without the positional encoding, since it is unnecessary for our purpose. The initial embeddings are passed through N attention layers, each consisting of two sub-layers: Multi-Head Attention and a fully connected Feed-Forward layer performing edge-wise projections. Following that, the encoder calculates a graph embedding as the mean of the final node embeddings. All of the embeddings are then used as input to the decoder.

The decoder depends on the embeddings produced by the encoder, the context vector, and the previously output values for deciding on the current decoding step. Decoding happens sequentially, where the context node embedding is passed from the encoder at each time step, along with the graph embeddings.

In each decoding step, the attention values are computed and used to calculate the log-probabilities:

$$u_{(c)j} = \begin{cases} -\infty & \text{for disallowed input} \\ C \cdot \tanh\left(\dfrac{q_{(c)}^{T} k_j}{\sqrt{d_k}}\right) & \text{otherwise} \end{cases} \quad (1)$$

C is a constant used for clipping the values before masking. $d_k$ is the query/key dimensionality, which is equal to the dimension of the embedding layer output divided by the number of attention heads.

Finally, the normalised probability vector, p, is computed using softmax, after which the output is determined using either sampling or greedy decoding:

$$p_i = \mathrm{softmax}(u_{(c)i}) \quad (2)$$

After the calculation of the reward of the output for some instance s, gradient descent is employed to minimise the loss, L(θ), where θ represents the neural network's weights. The REINFORCE approach is used to approximate the gradient, with a baseline, b(s). Moreover, Kool et al. proposed a rollout baseline with a periodically updated baseline policy. Thus, for a probability distribution $p_\theta(\pi|s)$, using policy π to solve an instance s, we have:

$$\nabla L(\theta|s) = \mathbb{E}_{p_\theta(\pi|s)}\left[\left(L(\pi) - b(s)\right)\nabla \log p_\theta(\pi|s)\right] \quad (3)$$

C. TNDFSP-specific Model Configurations

In this section, we explain how an instance of our problem is transformed into a combinatorial optimisation problem that can be solved by the model.

1) Encoder Input: To provide a better context to the decoder, the input graph is augmented with the demand and travel time information, which are better represented as edges, since the nodes have no useful intrinsic properties. In this case, the demand and travel time matrices can be thought of as adjacency matrices representing edges of the road network. The nodes of the graph are, consequently, redundant and are not input to the model, but are used by the masking function and for calculating the reward. As a result of this change, the model outputs a sequence of consecutive edges from the graph.

Two more inputs are the OD-matrix, δ, and a modified travel time matrix, d. The original travel time matrix contains travel time values only between nodes that already have a link between them. However, in order for the model to capture all the information about travel times in the graph, a new travel time matrix needs to be created to include the travel times between all the nodes, even if no direct link exists between them. This is achieved by applying the Floyd-Warshall shortest path algorithm [18][19] to the graph. The original matrix is later used in the masking function.

To differentiate between the different routes, and to have the model learn how to optimise separate routes, a special "delimiter" edge is appended to the beginning of the list of edges, which the model is forced to output before starting a new route. This edge's embeddings are calculated separately from the rest of the edges, using separate parameters, due to its special purpose.

The delimiter edge is input before the other edges, giving it an index of zero in the array of edges, x. Thus, the initial embeddings of our input are calculated through a learned linear projection with parameters $W^x$ and $b^x$ in the first embedding layer as follows:

$$h_i^{(0)} = \begin{cases} W_0^{x} x_i + b_0^{x} & \text{if } i = 0 \\ W^{x}\left[x_i, \delta_i, d_i\right] + b^{x} & \text{if } i = 1, \ldots, \tfrac{n(n-1)}{2} \end{cases} \quad (4)$$

[·] represents horizontal concatenation, and indexing the matrices using i is equivalent to accessing the value of the cell corresponding to edge i. n is the number of graph nodes.
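A minimal PyTorch sketch of this input construction is given below: it densifies the travel time matrix with Floyd-Warshall and applies the two linear projections of Eq. (4), one for the delimiter edge and one for ordinary edges. The tensor layout and the choice of base edge feature x_i are illustrative assumptions, not the exact implementation used in our experiments.

```python
import torch
import torch.nn as nn

def floyd_warshall(t: torch.Tensor) -> torch.Tensor:
    """All-pairs shortest travel times from the direct travel time matrix.

    t[i, j] is the direct travel time (inf if no road); the result is the
    dense matrix d used in Eq. (4).
    """
    d = t.clone()
    n = d.size(0)
    for k in range(n):
        d = torch.minimum(d, d[:, k:k + 1] + d[k:k + 1, :])
    return d

class EdgeEmbedding(nn.Module):
    """Initial embeddings h_i^(0) of Eq. (4), with a separate projection
    for the delimiter edge (index 0)."""
    def __init__(self, edge_feat_dim: int, embed_dim: int = 128):
        super().__init__()
        # W_0^x, b_0^x for the delimiter edge; its "feature" is a single scalar here.
        self.delimiter_proj = nn.Linear(1, embed_dim)
        # W^x, b^x for ordinary edges: [x_i, delta_i, d_i] -> embedding.
        self.edge_proj = nn.Linear(edge_feat_dim + 2, embed_dim)

    def forward(self, x, delta, d):
        # x:     (num_edges, edge_feat_dim) base edge features (placeholder choice)
        # delta: (num_edges,) demand per edge, d: (num_edges,) dense travel time per edge
        feats = torch.cat([x, delta.unsqueeze(-1), d.unsqueeze(-1)], dim=-1)
        h_edges = self.edge_proj(feats)
        h_delim = self.delimiter_proj(torch.zeros(1, 1))  # the index-0 delimiter edge
        return torch.cat([h_delim, h_edges], dim=0)       # (num_edges + 1, embed_dim)
```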

2) Decoder Context: For our problem, additional values are added to the context vector, most of which were mentioned in the Assumptions and Hyperparameters section. The last decoded edge is added to the context vector; for the first decoding step, that would be the delimiter edge, since we assume that a route starts there. This piece of information and the other values are concatenated to the embeddings in the following fashion:

$$h_{(c)}^{(N)} = \begin{cases} \left[\bar{h}^{(N)}, h_0^{(N)}, r, n_{min}, n_{max}, len, U\right] & \text{if } t = 1 \\ \left[\bar{h}^{(N)}, h_{\pi_{t-1}}^{(N)}, r, n_{min}, n_{max}, len, U\right] & \text{if } t > 1 \end{cases} \quad (5)$$

where:
• n_min and n_max represent the remaining number of edges left to reach the predefined minimum and maximum number of edges per route, respectively.
• len is the travel time needed to traverse the route being decoded up to the last decoding step.
• r represents the number of routes remaining to reach the pre-defined value.
• U is the number of unique nodes in all decoded routes so far. This is needed to satisfy the first assumption.

The values n_min, n_max, and len are reset every time the delimiter edge is decoded, so as to start a new route.

3) The Masking Function: The main purpose of the masking function is to ensure that the constraints mentioned in Assumptions and Hyperparameters are met, so that the decoded output is valid. In addition, we need to ensure that the chosen edges are consecutive and actually exist in the original road network. This is done by keeping track of all visited nodes and edges, as well as a count of finished routes and the number of nodes accumulated so far in the current route.

4) Output Pre-processing and Reward Calculation: After the decoding process is finished and the solutions are formed, the costs of the decoded solutions, which correspond to the scalar values U and O in the problem formulation, must be calculated.

However, these calculations are not straightforward, and require a lot of additional information to be inferred from the solutions. Information about travel times, or rather distances, between the different pairs of stops in the solution network is required. In addition, knowledge of the exact paths taken to achieve a trip between any pair of stops needs to be gathered; this covers the number of transfers needed in a trip and the lines used. To be able to calculate the reward value of the decoded solution, we therefore require each of the following:

1) A travel-time matrix for the decoded network, NTT, whose values include the time needed to transfer from one line to another.
2) A matrix representing the needed number of transfers for each OD pair, TPP.
3) A version of the output graph where the demand is assigned to the edges, DAG. In other words, the amount of demand passing through each edge in the graph is calculated, in order to compute the peak load of each of the lines and, in turn, their respective headways.

For an accurate computation of requirements 1 and 2, we expand the graph, making copies of any node which lies on different lines, and adding extra transfer edges between them. The travel time value of those transfer edges is the pre-defined transfer penalty of the network. The edges' weights are also modified with an extra "number of transfers" attribute, which is 1 only for the transfer edges. An example of this operation, which we denote the Graph Extender, is shown in Fig. 1.

Fig. 1. Modification of an example graph. Each colour represents a line, while blue represents the transfer edges. As explained, the weight tuple changes from (travel time) to (travel time, number of transfers). The transfer penalty for this example is 5.

Once the graph is modified, we run a modified version of Dijkstra's Algorithm [20], which outputs not only the shortest paths and their lengths, but also the number of needed transfers and the IDs of the lines corresponding to each shortest path. The output IDs of the used lines are required later in the reward calculation.

The algorithm is run for each node in the modified graph, and the output of each run is accumulated in two (n × n) square matrices, where n is the original number of stops: one for the travel time values and the other for the number of transfers. In cases where the algorithm is run on one of the copied nodes, we always take the best performing copy in terms of travel time.

To fulfil the third aforementioned requirement, an exact copy of the modified graph is created, where all edge weights are initialised to zero. Afterwards, for each edge in the output paths, the demand value of the given pair is added to the weights. Sometimes, it is equally efficient to go from point A to point B over the same set of edges but through different lines; in this case, we split the demand among the parallel lines, assuming that they would be used uniformly by the passengers.

5) Reward Calculation: The following equation is used to calculate the reward value:

$$R(\pi) = W_1 \cdot ATT(\pi) + W_2 \cdot PSD(\pi) + W_3 \cdot PUD(\pi) + W_4 \cdot NFS(\pi) \quad (6)$$

The Wi values represent the priorities of each of the metrics in the reward function: a higher weight increases the priority of the corresponding metric, and if all weights are equal, all metrics are given the same priority. Every addend is normalised before being multiplied by its coefficient, in order to reduce variance during training.
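The sketch below illustrates how the scalar reward of Eq. (6) can be assembled once the four metrics are available; the function names, the dictionary-based interface, and the normalisation constants are illustrative assumptions rather than our exact implementation.

```python
from typing import Callable, Dict

def reward(network, weights: Dict[str, float],
           metrics: Dict[str, Callable], norms: Dict[str, float]) -> float:
    """Weighted sum of Eq. (6): R = W1*ATT + W2*PSD + W3*PUD + W4*NFS.

    `metrics` maps each term name to a function evaluating it on the decoded
    network, and `norms` holds the normalisation constants applied to each
    addend before weighting (illustrative; any variance-reducing scheme works).
    """
    total = 0.0
    for name in ("ATT", "PSD", "PUD", "NFS"):
        value = metrics[name](network) / norms[name]  # normalise the addend
        total += weights[name] * value                # then apply its priority W_i
    return total

# Example wiring with the priorities used later in Section IV:
weights = {"ATT": 2.0, "PSD": 1.5, "PUD": 5.0, "NFS": 2.0}
```

Keeping the normalisation separate from the weights makes it easy to re-prioritise the objectives, as is done with the W and ω values listed in Section IV, without retuning the scale of each term.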

The first addend represents the Average Travel Time for all satisfied passengers, incorporating the pre-defined transfer penalty, P:

$$ATT(\pi) = \frac{TIVT(\pi) + P \times \sum TPP(\pi)}{\sum OD} \quad (7)$$

where:

$$TIVT(\pi) = \sum OD \circ NTT$$

The second addend, the Percentage of Satisfied Demand, is a measure of the percentage of satisfied trips. It is calculated from the proportions of trips completed with 0, 1, and 2 transfers. To always maximise direct trips, indirect trips are added to the measure as a penalty. This function contains weighting coefficients, ωi, similar to the ones in equation 6:

$$PSD(\pi) = \omega_1 \times \left[-P\left(TPP_0(\pi)\right)\right] + \omega_2 \times P\left(TPP_1(\pi)\right) + \omega_3 \times P\left(TPP_2(\pi)\right) \quad (8)$$

where:

$$P(TPP_x) = \frac{\sum OD * TPP_x}{\sum OD}$$

The function TPP_x converts a matrix into a Boolean mask where any element equal to any of the elements in the list x is set to one, and every other element is set to 0.

The third addend is simply the Percentage of Unsatisfied Demand, P(TPP_{x>2}).

The fourth and last addend is the Needed Fleet Size for our network, which depends on, and can be inferred from, the chosen waiting times per line. The waiting time is assumed to be half the headway, which is the reciprocal of the frequency of buses in a given time period. This frequency can be calculated for a given line using the equation proposed by Ceder [1]:

$$f_l = \frac{LOAD_l^{max}}{LF \cdot CAP} \quad (9)$$

LOAD_l^max is the maximum load of a given line, i.e., the highest number of passengers a line has to satisfy in a given period of time. LF is the pre-defined load factor of the vehicles' seats. CAP represents the pre-defined capacity of the operator's vehicles.

The maximum load can easily be determined by examining the graph DAG, where the edge with the highest demand in each line is identified. Once we have each line's frequency, we sum all the values to determine the value NFS and inject it into the reward equation.

For trips which involve one or two transfers, we assume that the transfers divide the trip into separate trips, each with its own waiting time, according to the IDs of the used lines computed by Dijkstra's algorithm. We then sum up the waiting times of each segment.
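As a concrete illustration of the headway and fleet-size computation just described, the following minimal sketch applies Eq. (9) per line and sums the resulting frequencies into NFS; the per-line peak-load dictionary and the example numbers are illustrative assumptions only.

```python
def line_frequency(max_load: float, load_factor: float, capacity: float) -> float:
    """Eq. (9): f_l = LOAD_l^max / (LF * CAP), in buses per time period."""
    return max_load / (load_factor * capacity)

def needed_fleet_size(peak_loads: dict, load_factor: float = 1.25,
                      capacity: float = 40.0) -> float:
    """NFS term: sum of the per-line frequencies.

    `peak_loads` maps a line ID to the highest demand assigned to any of its
    edges in the demand-assigned graph (DAG). The default LF and CAP match
    the hyperparameters listed in Section IV.
    """
    return sum(line_frequency(load, load_factor, capacity)
               for load in peak_loads.values())

# Example with made-up peak loads for three lines:
peak = {"L1": 620.0, "L2": 480.0, "L3": 355.0}
nfs = needed_fleet_size(peak)  # ~29.1 buses per hour in total
headways = {lid: 60.0 / line_frequency(v, 1.25, 40.0) for lid, v in peak.items()}
# The waiting time per line is then assumed to be half of its headway.
```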

IV. EXPERIMENTS AND RESULTS

A. Data and Hardware Description

Our system was implemented using PyTorch (https://pytorch.org/). We ran our code on Google Colaboratory (https://colab.research.google.com/), which provides users with an Intel Xeon processor, a Tesla T4 GPU with 16 GB of memory, and ∼12 GB of RAM.

Mandl's network [4], along with its demand matrix, was used to train our model. As mentioned in the introduction, this network is used as a benchmark in papers proposing transit network planning algorithms. The network comprises 15 different Swiss cities connected by a road network of 21 edges, each having the travel time as a weight. The network is shown in Fig. 2. The passenger demand is represented as a matrix for the peak hour, summing up to a total of 15,570 passengers. For simplicity, the OD matrix is symmetric and the travel time in each direction of an edge is equal.

Fig. 2. Mandl's Benchmark Network. Edge weights represent travel times.

B. Model Training

Regarding the model, we utilise the same hyperparameters as those used by Kool et al. [16] for learning the Travelling Salesman Problem:

• Weights initialised uniformly in [−1/√d, 1/√d], where d is the input dimension
• 3 encoder layers, with input and hidden sizes of 128
• 8 attention heads
• Learning rate of η = 10^−4
• The optimiser used is Adam [21], along with Batch Normalisation

As to TNDFSP-specific hyperparameters, we set the following values, which are similar to those found in the literature, to make comparisons more valid:

• Number of routes in the network = {4, 5, 6}
• Minimum number of nodes per route = 3
• Maximum number of nodes per route = 11
• Maximum route length in terms of travel time units = 35
• Transfer Penalty = 25
• Maximum load factor of vehicles = 1.25
• Vehicle Capacity = 40
• Coefficients of the reward function (it should be noted that these weights prioritise directness and operator costs):
  W1 = 2, W2 = 1.5, W3 = 5, W4 = 2
  ω1 = 2, ω2 = 1.5, ω3 = 1

For the first experiment, the model was trained on Mandl's network for 100 epochs, each consisting of 30 batches containing 256 input instances each. The number of routes was 4. The batch size could not be made bigger due to memory constraints, although increasing it would have contributed positively to the efficiency of model training. On average, a batch took 25 seconds for 4-route training, and 90 seconds for 6-route training. The progress of training the 4-route model is shown in Fig. 3. For evaluation, we decoded 100 instances using a sampling decoding strategy and picked the best network.

Fig. 3. Change of the loss value during training the model for 4 routes.

After training, we experimented with using the same model to output networks of 5 and 6 routes, to get a feel for how well our model generalises. However, the model kept outputting copies of the original four routes.

Following the experiments with the 4-route model, we decided to continue its training, setting the number of routes to 6. After this training, we tried outputting networks with a smaller number of routes. The results were more promising, and are discussed in the following subsection.

C. Results and Comparison to State-of-the-Art Networks

As shown in Table I, the 4-route model was able to produce results very close to those of Arbex and da Cunha. When outputting 4 routes, the model was able to save operator costs by 14.3%, at an increase of only 11.6% in the average headway and 0.87% in travel time. In a practical sense, the increase in the average headway is only 25.8 seconds, which seems trivial when compared to the 12 buses the operator did not have to use to meet almost the same directness and travel time. The 6-route model produced almost the same numbers.

As for networks of 5 routes, the 6-route model was able to produce a network of superior directness that requires 5 fewer buses. Other metrics were insignificantly worse, with the average travel time being 9.6 seconds longer and the average headway being more than a minute longer. Again, practically, this is negligible in contrast to the reduction in operator expenditure and the gain in directness.

Finally, the 6-route model generated 6-route networks of better directness. It also reduced the average travel time by about 35 seconds, and the average headway by about 20 seconds. On the other hand, the model costs the operator 4 more buses than the state-of-the-art network.

V. CHALLENGES

One of the most important factors that would make our deep learning approach more sustainable and quicker to train on different architectures and reward functions is the concurrency of the reward function. However, due to the difficulty of parallelising specific parts of our reward function, such as the Dijkstra routine and the Graph Extender, training was slowed down by these sequential bottlenecks. This prevented us from trying out enough variations of the weights, or from having the model train on outputting a diverse number of routes. What also necessitates the parallelisation of our reward function is the need to optimise bigger networks that have a number of stations in the order of hundreds.

VI. CONCLUSION

In this paper, we proposed using Deep Reinforcement Learning to solve one of the oldest well-established problems in public transportation, the TNDFSP. This approach had not been used before in the literature, and it obviates the need for heuristics of any kind for reaching a good solution. The proposed system showed great potential for finding optimised solutions that satisfy both the passengers' and the operator's objectives. The networks produced by our model were very competitive in comparison with state-of-the-art algorithms when applied to Mandl's benchmark network.

In the future, we can improve the training process by finding parallel alternatives to Dijkstra's algorithm and the Graph Extender, to alleviate the sequential bottleneck. This would make training the model to generate larger networks more feasible. In addition, several Origin-Destination matrices can be fed to the model during training to allow it to generalise better to network demand patterns over longer periods. Consequently, the model would be able to produce high-quality networks given unseen demand matrices.

TABLE I
COMPARISON BETWEEN OUR PROPOSED ARCHITECTURE (DRL) AND THE STATE-OF-THE-ART

| # of Lines | Source | # of Buses/hour | Direct Trips % | 1-transfer % | 2-transfer % | Maximum Headway (min) | Average Headway (min) | Average In-vehicle Travel Time (min) |
|---|---|---|---|---|---|---|---|---|
| 4 | Chakroborty & Wivedi [6] | 105 | 89.98 | 10.02 | 0.00 | 4.06 | 4.06 | 13.10 |
| 4 | Nikolić & Teodorović [22] | 94 | 91.91 | 8.09 | 0.00 | 4.32 | 3.14 | 11.32 |
| 4 | Arbex & Cunha [9] | 84 | 98.27 | 1.73 | 0.00 | 7.50 | 3.70 | 11.45 |
| 4 | Jha et al. [10] | - | 97.17 | 2.83 | 0.00 | - | - | 13.41 |
| 4 | DRL (Trained for 4 routes) | 72 | 98.01 | 1.99 | 0.00 | 8.06 | 4.13 | 11.55 |
| 4 | DRL (Trained for 6 routes) | 72 | 98.01 | 1.99 | 0.00 | 8.06 | 4.13 | 11.50 |
| 5 | Arbex & Cunha [9] | 81 | 98.20 | 1.80 | 0.00 | 7.60 | 4.25 | 11.08 |
| 5 | DRL (Trained for 6 routes) | 76 | 99.04 | 0.96 | 0.00 | 11.3 | 5.58 | 11.24 |
| 6 | Arbex & Cunha [9] | 79 | 98.20 | 1.80 | 0.00 | 8.22 | 5.94 | 11.55 |
| 6 | Jha et al. [10] | - | 98.46 | 1.54 | 0.00 | - | - | 12.61 |
| 6 | DRL (Trained for 6 routes) | 83 | 99.04 | 0.96 | 0.00 | 11.32 | 5.61 | 10.98 |

REFERENCES

[1] Ceder, A., & Wilson, N. H. M. (1986). Bus network design. Transportation Research Part B: Methodological, 20(4), 331–344.
[2] Magnanti, T. L., & Wong, R. T. (1984). Network Design and Transportation Planning: Models and Algorithms. Transportation Science, 18(1), 1–55.
[3] Guihaire, V., & Hao, J.-K. (2008). Transit network design and scheduling: A global review. Transportation Research Part A: Policy and Practice, 42(10), 1251–1273.
[4] Mandl, C. E. (1980). Evaluation and optimization of urban public transportation networks. European Journal of Operational Research, 5(6), 396–404.
[5] Baaj, M. H., & Mahmassani, H. S. (1991). An AI-based approach for transit route system planning and design. Journal of Advanced Transportation, 25(2), 187–209.
[6] Chakroborty, P., & Wivedi, T. (2002). Optimal route network design for transit systems using genetic algorithms. Engineering Optimization, 34(1), 83–100.
[7] Kidwai, F. A., Deb, K., Marwah, B. R., & Karim, M. R. (2005). A genetic algorithm based bus scheduling model for transit network.
[8] Zhao, F., & Zeng, X. (2006). Optimization of transit network layout and headway with a combined genetic algorithm and simulated annealing method. Engineering Optimization, 38(6), 701–722.
[9] Arbex, R. O., & Cunha, C. B. da. (2015). Efficient transit network design and frequencies setting multi-objective optimization by alternating objective genetic algorithm. Transportation Research Part B: Methodological, 81, 355–376.
[10] Jha, S. B., Jha, J. K., & Tiwari, M. K. (2019). A multi-objective meta-heuristic approach for transit network design and frequency setting problem in a bus transit system. Computers & Industrial Engineering, 130, 166–186.
[11] Lin, Q., Li, J., Du, Z., Chen, J., & Ming, Z. (2015). A novel multi-objective particle swarm optimization with multiple search strategies. European Journal of Operational Research, 247(3), 732–744.
[12] Bengio, Y., Lodi, A., & Prouvost, A. (2018). Machine Learning for Combinatorial Optimization: a Methodological Tour d'Horizon. arXiv preprint arXiv:1811.06128.
[13] Bello, I., Pham, H., Le, Q. V., Norouzi, M., & Bengio, S. (2017). Neural Combinatorial Optimization with Reinforcement Learning. In ICLR (Workshop).
[14] Dai, H., Khalil, E. B., Zhang, Y., Dilkina, B., & Song, L. (2017). Learning combinatorial optimization algorithms over graphs. In NIPS'17: Proceedings of the 31st International Conference on Neural Information Processing Systems (pp. 6351–6361).
[15] Nazari, M., Oroojlooy, A., Snyder, L., & Takac, M. (2018). Reinforcement Learning for Solving the Vehicle Routing Problem. In NIPS 2018: The 32nd Annual Conference on Neural Information Processing Systems (pp. 9861–9871).
[16] Kool, W., van Hoof, H., & Welling, M. (2019). Attention, Learn to Solve Routing Problems! In ICLR 2019: 7th International Conference on Learning Representations.
[17] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., ... Polosukhin, I. (2017). Attention Is All You Need. arXiv preprint arXiv:1706.03762.
[18] Floyd, R. W. (1962). Algorithm 97: Shortest Path. Communications of the ACM, 5(6), 345.
[19] Warshall, S. (1962). A Theorem on Boolean Matrices. Journal of the ACM, 9(1), 11–12.
[20] Dijkstra, E. W. (1959). A note on two problems in connexion with graphs. Numerische Mathematik, 1(1), 269–271.
[21] Kingma, D. P., & Ba, J. L. (2015). Adam: A Method for Stochastic Optimization. In ICLR 2015: International Conference on Learning Representations.
[22] Nikolić, M., & Teodorović, D. (2013). Transit network design by Bee Colony Optimization. Expert Systems with Applications, 40(15), 5945–5955.

