
sustainability

Article
GPS Data Analytics for the Assessment of Public City Bus
Transportation Service Quality in Bangkok
Rathachai Chawuthai 1 , Agachai Sumalee 2 and Thanunchai Threepak 1, *

1 School of Engineering, King Mongkut’s Institute of Technology Ladkrabang, Bangkok 10520, Thailand;
rathachai.ch@kmitl.ac.th
2 School of Integrated Innovation, Chulalongkorn University, Bangkok 10330, Thailand; agachai.s@chula.ac.th
* Correspondence: thanunchai.th@kmitl.ac.th; Tel.: +66-2329-8341 (ext. 114)

Abstract: Evaluation of the quality of service (QoS) of public city buses is generally performed using
surveys that assess attributes such as accessibility, availability, comfort, convenience, reliability,
safety, and security. Each survey attribute is assessed from the subjective viewpoint of the service
users. This is reliable and straightforward because the consumer is the one who accesses the bus
service. However, in addition to summarizing personal feedback from humans, using data analytics
has become another useful method for assessing the QoS of bus transportation. This work aims to use
global positioning system (GPS) data to measure the reliability, accessibility, and availability of bus
transportation services. There are three QoS scoring functions for tracking complete trips, on-path
driving, and on-schedule operation. In the analytical process, GPS coordinates rounding is adopted
and applied for detecting trips on each route path. After assessing the three QoS scores, it has been
found that most bus routes have good operations with high scores, while some bus routes show room
for improvement. Future work could use our data to create recommendations for policy makers in
terms of how to improve a city’s smart mobility.

Keywords: bus transportation; GPS data analytics; quality of service; smart city; smart mobility;
urban informatics

Citation: Chawuthai, R.; Sumalee, A.; Threepak, T. GPS Data Analytics for the Assessment of Public
City Bus Transportation Service Quality in Bangkok. Sustainability 2023, 15, 5618.
https://doi.org/10.3390/su15075618
Academic Editor: Juneyoung Park
Received: 30 January 2023
Revised: 18 March 2023
Accepted: 20 March 2023
Published: 23 March 2023
Copyright: © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access
article distributed under the terms and conditions of the Creative Commons Attribution (CC BY)
license (https://creativecommons.org/licenses/by/4.0/).

1. Introduction
City bus transportation is a public transportation option that is commonly used in
many countries as it supports the growing transportation demand and takes into account
affordability for passengers [1]. Thus, having qualified bus services becomes a key factor
for smart life in a city. In this case, before enhancing the service quality, we need to
understand the current quality of service (QoS) of bus transportation, then improve it
point by point. The QoS of city bus transportation is generally measured by user surveys:
e.g., Wethyavivorn and Sukwattanakorn [2], Ueasangkomsate [3], Chan et al. [4], Page and
Yue [5], and Goyal et al. [6]. These studies found that the common issues are accessibility,
availability, reliability, security, and comfortability. As to research from Thailand, the
authors of [2,3] stated that passengers in particular areas of Bangkok had serious concerns
about the physical facilities and service reliability. The results of [3] were reported to
the government to help it plan policies for enhancing the efficiency of public buses. The
relevant works are reviewed in Section 2 and summarized in Table 1.
As can be seen, survey results help a city to explore issues from the viewpoints of
users in order to improve bus services. It is well known that survey results depend on the
individual. This means that obtaining feedback from a large number of people can reflect
most of the problems and needs of citizens. However, in the age of data technology, using
data to measure the quality of service of city bus transportation has become another way to
understand the issues. Thus, this work aims to contribute data for measuring the QoS of
bus transportation by focusing on the aspects of accessibility, availability, and reliability,
which can benefit directly from data analytics.

Sustainability 2023, 15, 5618. https://doi.org/10.3390/su15075618 https://www.mdpi.com/journal/sustainability



To conduct data analytics in the transportation domain, global positioning system
(GPS) data are the key to exploring useful results, e.g., travel time variability analytics [7],
GPS data processing methods [8], the transfer time and waiting time of bus passengers [9],
rural–urban migrant analytics during the COVID-19 pandemic [10], traffic monitoring [11],
predictive transportation [12,13], road defect monitoring [14], etc. Thanks to a joint project
between King Mongkut’s Institute of Technology Ladkrabang and the Department of Land
Transport of Thailand, global positioning system (GPS) data from buses and city data
from Bangkok have been beneficial to our research study. This work aims to use GPS
data analytics and spatial data analytics to improve the quality of bus transportation in
Bangkok, Thailand. As there are several previous works using buses’ GPS data to examine
aspects such as travel time, transfer time, waiting time, and number of transfers [7,9], this
work focuses on other issues under the criteria of reliability, accessibility, and availability.
First, in terms of reliability, we assess whether every bus route provides the number of
completed trips promised. Second, to gauge accessibility, we assess whether every driving
route covers the whole route path. Last, to demonstrate the availability, we assess the
frequency of every bus sticking to its timetable. In this way, we aim to measure the quality
of services (QoS) of bus transportation by the following objectives:
1. To track complete bus trips based on commitments.
2. To track on-path driving following operated routes.
3. To track on-schedule operation according to schedule conditions.

Table 1. Summary of literature studies on the uses of GPS technology for transportation and the
quality of service of bus transportation.

Reference | Year | Summary | Lesson Learned
Shen and Stopher [8] | 2014 | Reviewed the uses of the GPS technology of mobile phones and applications for travel survey from many works. | Survey purposes, samples, and techniques of each work; details of five steps for GPS data processing.
Mazloumi et al. [7] | 2010 | Used GPS data to analyze travel time variability against other traffic-related factors. | Some significant factors, such as segment length, off-peak time, etc., contributing to the travel time variability.
Gschwendar et al. [9] | 2016 | Used data from smart cards and GPS for analyzing the uses of buses. | The results of travel time, transfer time, number of transfers, and waiting time are beneficial for policy makers.
Chan et al. [4] | 2020 | Conducted user surveys before and after installing the GPS system for monitoring buses. | The criteria of accessibility, reliability, comfort, safety, customer satisfaction, and customer loyalty were higher after having the GPS system.
Page and Yue [5] | 2009 | Studied the public transportation quality matrix for tourism. | The public transportation matrix, including availability, accessibility, information, time, customer care, comfort, security, and environment, was reviewed.
Goyal et al. [6] | 2022 | Proposed multicriteria decision making for finding significant criteria for evaluating the quality of a bus depot. | It results in some important criteria, such as total number of vehicles, scheduled vehicles, operated vehicles, off-road vehicles, etc.
Wethyavivorn and Sukwattanakorn [2] | 2019 | Conducted user surveys about travel patterns, modes, and ratings. | There needs to be an improvement in bus frequency, and precise schedules.
Ueasangkomsate [3] | 2019 | Conducted user survey about the QoS of public transportations. | Five dimensions, tangibility, reliability, responsiveness, assurance, and access, are analyzed; the improvement of some attributes under these dimensions is required.

Our approach defines three scoring levels, QoS-1, QoS-2, and QoS-3, to describe all
objectives. Taking a closer look at the situation of the management of public city bus
transportation in Bangkok, there are four challenges that our work faces. First, there is
no wireless sensor detecting a bus at a bus stop; as some works have mentioned [15,16],
the analytics of GPS transactions with route polylines are adopted to detect trips of buses.
Second, a bus route in Bangkok could take several different courses depending on the
demand from passengers and the strategies of the bus operators. There must be main routes
on a bus route, but it is also possible to have subpaths, which are shorter versions of the
main path, and split paths, which diverge from the main path to go to other destinations.
Third, a bus can choose any paths in a day following the schedule conditions from a
bus route provider, so we need to use data analytics to detect the path that a bus drove
through. Last, there is no executable timetable to show the departure time. In fact, schedule
conditions only provide the number of trips in any time period, while bus providers
manage the departure time by themselves.
Due to these issues, data analytics on GPS data and other datasets is mainly employed
to determine the QoS scores. In this case, our method provides four phases, input,
preprocessing, scoring, and output, as depicted in Figure 1. Input data are the GPS transactions
of buses, the polyline of every bus route, and the schedule conditions of all bus routes.
To work with GPS data, the technique of GPS coordinates rounding is adopted at
the preprocessing phase. Then, bus trips and metadata are calculated in order to measure
three QoS scoring functions. Our work resulted in the QoS score of each bus route for the
three months of the last quarter of 2021, and found that there was room for improvement
in the sustainability of bus transportation services.

[Figure 1: flowchart in four phases (input, pre-processing, scoring, output). Input: bus GPS
transactions (3.2.1), bus route polylines (3.2.2), bus schedule conditions (3.2.3). Pre-processing:
path bounding box calculating (3.3.1), trajectory-route matching (3.3.2). Scoring: detecting
modules (3.4.1, 3.4.2) and tracking modules (3.5, 3.6, 3.7). Output: QoS-1, QoS-2, and QoS-3 scores.]

Figure 1. Our overall approach. The details of each module are described by the number of
subsections in parentheses.

This manuscript contains five sections. The first provides an overall introduction to
our work. Second, we review the uses of GPS in transportation, the quality of service of
bus transportation, and the technical methods of GPS data processing. The third section
explains the data and proposed methods for calculating the three QoS scoring
functions. The fourth section demonstrates the results of our analytical methods in the
form of tables and charts, together with a discussion. In the last section, a summary and
recommended future work based on our approach are provided.

2. Literature Review
This section studies the uses of GPS technology for transportation and the QoS of
bus transportation in several works, which are summarized in Table 1. In addition, the
technique of GPS coordinates rounding, which is used to analyze spatial data, is reviewed.

2.1. The Uses of GPS Technology for Transportation


GPS technology has been used in the transportation domain for decades [8]. Shen
and Stopher [8] found that there were many attempts to use GPS technology in addition to
traditional survey methods, for example, to monitor travel behavior changes, route choice,
residential selection, etc. Based on the coordinates data gathered from smartphones and
GPS devices, they analyzed spatial data to assess trips, travel time, activities, etc. This
work also summarized the processing steps of GPS data: preprocessing, trip identification,
mode detection, purpose imputation, and analytical results. GPS data analytics can give
insight into public transportation, as studied by Mazloumi et al. [7]. This work used GPS
transactions from buses in Melbourne, Australia to determine the travel time variability.
The standard deviation of travel time was explored with a period of four hours per day.
Since a high value leads to poor performance in transportation, they found that the factors
of section length (km), number of signalized intersections per km, and number of stops
per km contributed to the increase in this value; while off-peak time and industrial area
provided a lower value. This result can assist bus operators with planning their bus
schedules so that the arrival time corresponds to the actual situation. In addition, combining GPS
data with other data helps to gather more useful results; for example, Gschwendar et al. used
smart card and GPS data [9]. The analytics of using smart cards as payment for bus services
resulted in data on travel time, transfer time, number of transfers, and waiting time as well
as the passenger demands. Based on the analytical results of these indicators under the
dimensions of time and space, the public transport authority and bus operators could work
together to improve policies and transportation plans to truly meet the needs of users.

2.2. Quality of Service of Bus Transportation


As urban bus services are readily available as an affordable, accessible, and sustain-
able mode of transportation, they are crucial to the movement of people inside cities [1].
However, the QoS of urban bus systems is often inadequate, which can negatively impact
ridership and lead to a decline in the overall performance of the system. There has been
research devoted to the QoS of public transportation, especially bus transportation.
Chan et al. [4] used real-time GPS tracking to improve the quality of bus services. Their
work implemented an application for collecting passengers’ feedback via surveys before
and after installing a real-time GPS tracking system. There were six criteria for assessing
the quality of service: accessibility, reliability, comfort, safety, customer satisfaction, and
customer loyalty. The results showed that all scores after GPS tracking were significantly
higher than before having it. This work also noted that when passengers knew the bus
schedule and actual situation, they were willing to preplan their trip, and were pleased
with the safe and comfortable transit. Thus, this work demonstrated the feasibility of using
GPS tracking for enhancing the quality of service, although it did not use GPS data analytics
to measure the QoS.
To measure the public transportation quality, a tourism matrix was studied in [5]. There
were eight factors considered: availability, accessibility, information, time, customer care,
comfort, security, and environment. The travel modes, such as coach and bus transportation,
cycling, rail travel, cruising, ferries, air transportation, etc. were studied in order to
highlight points of policy and planning issues. All of these aspects can be evaluated using
user surveys; however, to be data-driven as part of smart mobility, some of them such as
availability, accessibility, and time can take advantage of GPS data.
Goyal et al. [6] provided summary statistics of bus quality in Rajasthan State during
2018 and 2019. The major categories are operational service, passenger service, cost effects,
and quality. This work introduced multicriteria decision making for assisting decision
makers with selecting significant criteria for assessing the performance of a bus depot. The
criteria of the operational service are feasible to evaluate by GPS data. These are the total
number of vehicles, number of scheduled vehicles, number of operating vehicles, number
of off-road vehicles, number of scheduled trips, number of operating trips, number of
extra trips, number of curtailed trips, total number of employees, number of routes, and
route distance.
Other works from Thailand [2,3] surveyed the QoS of public transportation based on
five dimensions: tangibility, reliability, responsiveness, assurance, and access. The authors
analyzed the results and concluded that the perceived quality of service in the Bangkok
metropolitan area and the East region was similarly poor and improvement is required
on some attributes, such as the number of buses, availability, precise bus schedules, buses’
current locations, safety, driver ability, interconnection of the transport system, etc.

2.3. GPS Coordinates Rounding


GPS coordinates are used to precisely identify the location of a point on the Earth’s
surface. However, in some cases, it may be necessary to round the coordinates of a GPS
location to a fixed number of decimal places, in order to obscure the exact location or protect
the privacy of individuals. This process is known as GPS coordinates rounding [17–20]. One
approach to GPS coordinates rounding is to use a “rounding box.” A rounding box is a
geographic area within which the GPS coordinates of a location will be rounded to the
same value [17]. For example, a rounding box of size 2 rounds the GPS coordinates of all
locations within the box to the same two-decimal location, so the coordinate
(13.34213, 100.42345) becomes (13.34, 100.42). Several works have employed the technique of
GPS coordinates rounding. Huang et al. [17] used rounding boxes of a route to find the
intersecting parts of two routes. Elevelt et al. [18] used locations from surveys to summarize
citizens’ activities by area in the Netherlands, and also applied three-digit rounding boxes
that bound the spatial precision to about 100 m. Ciociola et al. [19] employed rounding
boxes at three decimals of GPS location for analyzing trips made by electronic scooters in
the USA. Payyanadan et al. [20] introduced a method to measure the risks of routes for older
drivers. This research used different rounding decimals, four-digit rounded latitudes and
three-digit rounded longitudes, because of the curvature of the Earth at the focus area.
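As a minimal sketch of the rounding idea reviewed above, the Python snippet below snaps a coordinate to its rounding box and tests whether two coordinates share a box. The function names and the constant of roughly 111 km per degree of latitude are illustrative assumptions, not taken from the cited works.

```python
from typing import Tuple

Point = Tuple[float, float]  # (latitude, longitude) in decimal degrees

# Rough approximation (assumption): one degree of latitude is about 111 km.
METERS_PER_DEGREE_LAT = 111_320


def round_coordinate(point: Point, decimals: int = 2) -> Point:
    """Snap a coordinate to its rounding box at the given number of decimals."""
    lat, lon = point
    return (round(lat, decimals), round(lon, decimals))


def same_box(a: Point, b: Point, decimals: int = 2) -> bool:
    """Return True when both coordinates fall into the same rounding box."""
    return round_coordinate(a, decimals) == round_coordinate(b, decimals)


def box_height_m(decimals: int) -> float:
    """Approximate north-south extent of one rounding box, in meters."""
    return METERS_PER_DEGREE_LAT * 10 ** -decimals


print(round_coordinate((13.34213, 100.42345)))  # the example from the text
print(box_height_m(3))  # roughly 100 m, in line with three-digit boxes
```

The east-west extent of a box shrinks with latitude, which is one reason a study may choose different decimals for latitude and longitude.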

3. Materials and Methods


As seen from the review in Section 2 and the summary in Table 1, there is a high
possibility of using GPS data to measure the QoS of city bus transportation. Some aspects,
such as travel time, transfer time, number of transfers, waiting time, road conditions, and
time periods, were analyzed by GPS technology [7–9]. In addition, many criteria, such
as accessibility, availability, reliability, comfort, safety, customer satisfaction, customer
loyalty, bus frequency, precise schedules, responsiveness, assurance, etc., were evaluated
by the survey method [2–6]. Based on previous studies, our work aims to further support
the concept of using GPS data for measuring the QoS of city bus transportation. In our
work, due to the datasets available and some issues raised in [2,3], the criteria of reliability,
accessibility, and availability are underlined in terms of complete trips (QoS-1), on-path
driving (QoS-2), and on-schedule operation (QoS-3).
To achieve our objectives, QoS-1, 2, and 3 were evaluated by step-by-step processing
of the input data; our overall work is displayed in Figure 1. There are four main steps:
input, preprocessing, scoring, and output.
First, the input datasets are (1) bus GPS transactions containing bus identifiers, route
numbers, coordinates, speeds, and timestamps; (2) bus route polylines, which are sequence
sets of coordinates of fixed route paths; and (3) bus schedule containing conditions of each
bus route path. Details are given in Section 3.2.
Second, preprocessing cleans and prepares the input data for the scoring phase. This
involves path bounding box calculation and trajectory route matching. The path bounding
box calculation converts the polyline of each bus route path into a set of rounding
boxes, which are used for route matching in the next step. Trajectory route
matching then verifies that the location of a bus is along its route path. Further explanation is
given in Section 3.3.
Third, bus trips are analyzed in order to provide input data for calculating the three QoS scores.
The scores are for complete trip tracking, bus-driving route tracking, and bus schedule
tracking. This is discussed in Sections 3.4–3.7.
Last, the QoS-1, QoS-2, and QoS-3 scores are the output of the previous three steps.
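The path bounding box calculation in the preprocessing step can be pictured with the following Python sketch. It is only an illustration under stated assumptions: inner points are generated by linear interpolation, the sampling interval is half a box width, and consecutive repeated boxes are dropped. The names inner_points and path_boxes are hypothetical, and the method detailed in Section 3.3 may differ.

```python
import math
from typing import Iterator, List, Sequence, Tuple

Point = Tuple[float, float]  # (latitude, longitude)


def inner_points(a: Point, b: Point, spacing: float) -> Iterator[Point]:
    """Yield a, then evenly spaced inner points on the segment a -> b (b excluded)."""
    span = max(abs(b[0] - a[0]), abs(b[1] - a[1]))
    steps = max(1, math.ceil(span / spacing))
    for i in range(steps):
        t = i / steps
        yield (a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1]))


def path_boxes(polyline: Sequence[Point], decimals: int = 2) -> List[Point]:
    """Convert a polyline into the ordered rounding boxes sampled along it."""
    if not polyline:
        return []
    spacing = 10 ** -decimals / 2  # assumption: sample twice per box width
    samples: List[Point] = []
    for a, b in zip(polyline, polyline[1:]):
        samples.extend(inner_points(a, b, spacing))
    samples.append(polyline[-1])
    boxes: List[Point] = []
    for lat, lon in samples:
        box = (round(lat, decimals), round(lon, decimals))
        if not boxes or boxes[-1] != box:  # skip consecutive repeats
            boxes.append(box)
    return boxes


# Hypothetical two-point polyline running east along one latitude.
example = path_boxes([(13.700, 100.640), (13.700, 100.680)])
print(example)
```

Sampling between the original points matters because two consecutive polyline points may lie several boxes apart, so rounding the original points alone would leave gaps in the path.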

3.1. Definitions
Our method introduces various terms, defined as follows:
- p (e.g., p1): An original coordinate point, that is, a pair comprising a latitude
and a longitude.
- p with a dot (e.g., p1.1): An inner point between original coordinate points.
- p* (e.g., p*1, p*1.1): A rounding box of a coordinate point p.
- p*(x,y) (e.g., p*1(+1,+2) ): A neighbor of a p*. For example, if the 2-decimal rounding
box p*1 is (13.00, 100.00), the neighbor p*1(+1,+2) is (13.00 + 1 × 0.01, 100.00 + 2 × 0.01)
being (13.01, 100.02).
- P (e.g., P1): A path that is a sequence set of p.
- P* (e.g., P1*): A path P whose points are rounded.
- P** (e.g., P1**): A path that contains all neighbors of all coordinate points from P*.
- POR (b*, P**): A function to detect a point of bus (b*) on a path (P**).
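The definitions above can be read as simple set operations. The Python sketch below is one possible reading, not the authors' implementation: the neighbor offsets are taken from -radius to +radius in both directions (the range is an assumption, since the list above does not fix it), and the function names are hypothetical.

```python
from typing import Iterable, List, Set, Tuple

Point = Tuple[float, float]  # (latitude, longitude)


def rounding_box(p: Point, decimals: int = 2) -> Point:
    """p*: the rounding box of a coordinate point p."""
    return (round(p[0], decimals), round(p[1], decimals))


def neighbor(box: Point, dx: int, dy: int, decimals: int = 2) -> Point:
    """p*(x,y): the box shifted by dx and dy box widths."""
    step = 10 ** -decimals
    return (round(box[0] + dx * step, decimals), round(box[1] + dy * step, decimals))


def rounded_path(path: Iterable[Point], decimals: int = 2) -> List[Point]:
    """P*: the path P with every point rounded."""
    return [rounding_box(p, decimals) for p in path]


def expanded_path(p_star: Iterable[Point], radius: int = 1, decimals: int = 2) -> Set[Point]:
    """P**: all neighbors of all boxes in P* (radius is an assumed parameter)."""
    offsets = range(-radius, radius + 1)
    return {neighbor(b, dx, dy, decimals) for b in p_star for dx in offsets for dy in offsets}


def point_on_route(b_star: Point, p_double_star: Set[Point]) -> bool:
    """POR(b*, P**): True when the bus box lies on the expanded path."""
    return b_star in p_double_star


# Hypothetical two-point path and two bus positions.
p2 = expanded_path(rounded_path([(13.7292, 100.6416), (13.7215, 100.6421)]))
print(point_on_route(rounding_box((13.7134, 100.6437)), p2))
print(point_on_route(rounding_box((13.8000, 100.7000)), p2))
```

Storing P** as a set makes the POR check a constant-time membership test, which is convenient when many GPS transactions are matched against the same path.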

3.2. Data Preparation


There are three main input datasets: (1) GPS transaction data, (2) bus route polylines,
and (3) bus schedule conditions. It is noted that some sensitive data such as bus identifiers
and route numbers are transformed into alternative labels in order to preserve the privacy
of data.

3.2.1. GPS Transaction Data


A GPS transaction dataset stores GPS data from all buses every minute. There is a GPS
box in every bus, and it sends current data to a server. Each entry includes the bid (bus
identifier), route (route number), ts (timestamp), lat (latitude), lon (longitude), and speed
(speed in km/h). Example data are presented in Table 2. These are GPS transaction entries
of a bus with the route number R7234. As we mentioned, the route number is an alias and
does not exist in Thailand.

Table 2. Example GPS data of a bus on route R7234. In this table, bid is a bus identifier, route is a
route number, ts is a timestamp, lat is a latitude, lon is a longitude, and speed is a speed in kilometers
per hour.

Bid Route Ts Lat Lon Speed


8ead83c5 R7234 2021-10-20 09:39:21 13.729222 100.641610 33
8ead83c5 R7234 2021-10-20 09:40:36 13.721500 100.642138 31
8ead83c5 R7234 2021-10-20 09:41:36 13.713388 100.643667 63
8ead83c5 R7234 2021-10-20 09:42:21 13.709083 100.644722 57
8ead83c5 R7234 2021-10-20 09:42:36 13.706860 100.645250 51
... ... ... ... ...
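To make the record layout concrete, the following Python sketch parses the rows of Table 2 into typed records and measures the time between consecutive reports. The whitespace-separated input format and the names GpsRecord, parse_record, and reporting_gaps are assumptions for illustration; the actual storage format of the dataset is not described here.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List


@dataclass(frozen=True)
class GpsRecord:
    bid: str      # bus identifier (aliased)
    route: str    # route number (aliased)
    ts: datetime  # timestamp of the report
    lat: float    # latitude in decimal degrees
    lon: float    # longitude in decimal degrees
    speed: float  # speed in km/h


def parse_record(line: str) -> GpsRecord:
    """Parse one whitespace-separated row laid out as in Table 2."""
    bid, route, date, clock, lat, lon, speed = line.split()
    ts = datetime.strptime(f"{date} {clock}", "%Y-%m-%d %H:%M:%S")
    return GpsRecord(bid, route, ts, float(lat), float(lon), float(speed))


def reporting_gaps(records: Iterable[GpsRecord]) -> List[float]:
    """Seconds between consecutive reports, after sorting by timestamp."""
    ordered = sorted(records, key=lambda r: r.ts)
    return [(b.ts - a.ts).total_seconds() for a, b in zip(ordered, ordered[1:])]


ROWS = [
    "8ead83c5 R7234 2021-10-20 09:39:21 13.729222 100.641610 33",
    "8ead83c5 R7234 2021-10-20 09:40:36 13.721500 100.642138 31",
    "8ead83c5 R7234 2021-10-20 09:41:36 13.713388 100.643667 63",
    "8ead83c5 R7234 2021-10-20 09:42:21 13.709083 100.644722 57",
    "8ead83c5 R7234 2021-10-20 09:42:36 13.706860 100.645250 51",
]
records = [parse_record(row) for row in ROWS]
print(reporting_gaps(records))
```

For these five rows the gaps are 75, 60, 45, and 15 s, so the reporting interval in this sample is not exactly one minute.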

3.2.2. Bus Routes Polylines


This dataset contains information on the path polylines of each bus route. In Thailand,
one route number might have more than one path. These are analyzed into four cases,
as depicted in Figure 2. First, as in Figure 2(1), there is one main path with only the go
direction. This case is generally a loop transit. Second, as in Figure 2(2), there is a beginning
point and an end point having a main path with go and back directions. Third, as in
Figure 2(3), there is a subpath from the main path. This is if a bus provider considers
shortening a path due to the demand of passengers during rush hour. The end point of this
case is still in the main path. Any subpaths must be reported to the government authority.
Last, as in Figure 2(4), some bus providers have a split path to another end point. For
example, when there is a new point of interest such as a new department store, a bus
provider considers having a split path to that new place.

[Figure 2: schematic of four route cases. (1) A main path with the go direction only, starting
from a begin point. (2) A main path with go and back directions between a begin point and an end
point. (3) A main path plus a subpath with go and back directions, ending at a sub end point.
(4) A main path plus a split path with go and back directions, ending at a split end point.]

Figure 2. Behaviors of bus routes and paths in Thailand. (1) A loop path. (2) A two-direction path.
(3) A main path and subpath. (4) A main path and split path.

Due to the details of routes and paths described in the previous paragraph, an example
of a bus route polylines dataset is presented in Table 3, with route, path_id, path_type,
direction, and polyline. Each entry in this table is a single path, where one route can have
many paths due to the type and direction of the path. In addition, one route must have a
main path with only direction, go or back, but may have many split paths and subpaths.
- route: a route number.
- path_id: a unique identifier of a path.
- path_type: the type of path, that can be main, split, and sub.
- direction: the bus direction of a path, that can be go and back.
- begin_point: the begin point of the polyline.
- end_point: the ending point of the polyline.
- polyline: the sequence set (array) of coordinates.

Table 3. Example of bus route polyline data.

route | path_id | path_type | direction | begin_point | end_point | polyline
R7234 | R7234.00 | Main | go | (13.81196, 100.54976) | (13.59013, 100.59738) | [(13.81196, 100.54976), (13.81106, 100.54943),...
R7234 | R7234.01 | Split | go | (13.76977, 100.64184) | (13.60081, 100.74983) | [(13.76977, 100.64184), (13.76865, 100.64196),...
R7234 | R7234.02 | Split | back | (13.60081, 100.74983) | (13.76977, 100.64184) | [(13.60081, 100.74983), (13.60068, 100.74984),...
R7234 | R7234.03 | Sub | go | (13.76977, 100.64184) | (13.59004, 100.59742) | [(13.76977, 100.64184), (13.76946, 100.64187),...
R7234 | R7234.04 | Sub | back | (13.59004, 100.59742) | (13.76977, 100.64184) | [(13.59004, 100.59742), (13.59013, 100.59738),...
R8190 | R8190.00 | Main | go | (13.74004, 100.49846) | (13.82723, 100.73943) | [(13.74004, 100.49846), (13.74012, 100.49822),...
R8190 | R8190.01 | Main | back | (13.82723, 100.73943) | (13.74004, 100.49846) | [(13.82723, 100.73943), (13.82581, 100.74775),...
... | ... | ... | ... | ... | ... | ...
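The path entries of Table 3 could be modeled as typed records and grouped by route, as in the Python sketch below. The class and function names are hypothetical, path types are written in lowercase as in the field list, and the polyline field is omitted because the table shows it only in truncated form.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple

Point = Tuple[float, float]  # (latitude, longitude)


@dataclass(frozen=True)
class RoutePath:
    route: str
    path_id: str
    path_type: str   # "main", "split", or "sub"
    direction: str   # "go" or "back"
    begin_point: Point
    end_point: Point


PATHS = [
    RoutePath("R7234", "R7234.00", "main", "go", (13.81196, 100.54976), (13.59013, 100.59738)),
    RoutePath("R7234", "R7234.01", "split", "go", (13.76977, 100.64184), (13.60081, 100.74983)),
    RoutePath("R7234", "R7234.02", "split", "back", (13.60081, 100.74983), (13.76977, 100.64184)),
    RoutePath("R7234", "R7234.03", "sub", "go", (13.76977, 100.64184), (13.59004, 100.59742)),
    RoutePath("R7234", "R7234.04", "sub", "back", (13.59004, 100.59742), (13.76977, 100.64184)),
    RoutePath("R8190", "R8190.00", "main", "go", (13.74004, 100.49846), (13.82723, 100.73943)),
    RoutePath("R8190", "R8190.01", "main", "back", (13.82723, 100.73943), (13.74004, 100.49846)),
]


def group_by_route(paths: Iterable[RoutePath]) -> Dict[str, List[RoutePath]]:
    """Collect the paths of each route number."""
    grouped: Dict[str, List[RoutePath]] = defaultdict(list)
    for path in paths:
        grouped[path.route].append(path)
    return dict(grouped)


def count_types(paths: Iterable[RoutePath]) -> Dict[str, int]:
    """Count paths per path_type."""
    return dict(Counter(path.path_type for path in paths))


def has_main_path(paths: Iterable[RoutePath]) -> bool:
    """Check the rule that a route must have a main path."""
    return any(path.path_type == "main" for path in paths)


routes = group_by_route(PATHS)
for route, paths in routes.items():
    print(route, len(paths), count_types(paths), has_main_path(paths))
```

The same grouping, applied to a full dataset, gives per-route averages of paths, split paths, and subpaths.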

The updated dataset of bus route polyline data from 2021 for Bangkok and its
metropolitan area has 1085 entries, including 454 routes, as shown in Figure 3; each route
has 2.4 paths, 0.7 split paths, and 0.2 subpaths on average.

Figure 3. City bus route network in Bangkok and metropolitan area.

3.2.3. Bus Schedule Conditions


The bus schedule conditions dataset is a proposed timetable of each bus route. Every
bus provider has to inform the Department of Land Transport about conditions. Since the
original documents are paper-based, our work has collected them into a relational database
as presented in Table 4. Each entry is the condition of a path, and one path can have many
conditions. The fields of this table are in the following list.
- con_id: a condition identifier.
- route: a route number.
- path_id: a path id.
- begin_time: the beginning time of that condition.
- end_time: the ending time of that condition.
- con_type: a condition type, which can be all-trips, count, or headway.
- param: a parameter of that condition.

Table 4. Example bus schedule conditions.

con_id route path_id begin_time end_time con_type param


C0001 R7234 R7234.00 05:00 21:00 all-trips 50
C0002 R7234 R7234.00 05:00 21:00 count 50
C0003 R7234 R7234.00 06:00 09:00 headway 18
C0004 R7234 R7234.00 15:00 18:00 headway 10
C0005 R7234 R7234.01 05:00 21:00 all-trips 15
C0006 R7234 R7234.01 05:00 21:00 count 15
C0007 R7234 R7234.02 05:00 21:00 all-trips 15
C0008 R7234 R7234.02 05:00 21:00 count 15
C0009 R7234 R7234.03 06:00 10:00 all-trips 60
C0010 R7234 R7234.03 06:00 10:00 headway 10
C0011 R7234 R7234.04 06:00 10:00 all-trips 60
C0012 R7234 R7234.04 06:00 10:00 headway 10
C0013 R8190 R8190.00 11:00 18:00 all-trips 10
C0014 R8190 R8190.00 11:00 12:00 count 5
C0015 R8190 R8190.00 16:00 18:00 headway 30
... ... ... ... ... ... ...

The value of the field param depends on the con_type. First, each path must have
one condition with con_type “all-trips” in order to check the minimum number of
trips. As in the first entry (con_id = C0001), the path_id R7234.00 must have 50 trips. Second, if
the con_type is “count,” the parameter (param) is the required number of bus trips. If the con_type is
“headway,” the parameter is the bus headway in minutes. In this case, the second condition
(con_id = C0002) means that the number of bus trips on the path “R7234.00” of the
route “R7234” between 05:00 and 21:00 must be at least 50. Last, the third condition
(con_id = C0003) means that, between 06:00 and 09:00, the start times of consecutive trips must be
no more than 18 min apart, as listed in Table 4. Conditions C0013, C0014, and C0015 are used as example cases in
the next section.

3.3. Path Rounding Boxes Calculating


To match GPS data to a path, vector techniques, such as the perpendicular distance
from a point to the path and path similarity, generally perform well but are
computationally expensive. Several studies, such
as [17–20], recommended the rasterization of the vector for working with a large amount
of data. Thus, we applied the concept of rounding boxes from [17] in order to detect bus
trips. In this section, GPS coordinates, path rounding boxes, and trajectory route matching
are described.

3.3.1. GPS Coordinates and Path Rounding Boxes


Since GPS coordinates are floating-point numbers, finding a nearby location consumes
processing time. According to [17], a rounding box of a coordinate can be used
as the reference of the same location. For example, the three-digit rounding boxes of
(13.65495, 100.22424) and (13.65477, 100.22410) are (13.655, 100.224) and (13.655, 100.224),
which are considered as approximately the same location. Thus, a path, which is a polyline,
can be structured by rounding boxes using the following four steps, together with the
demonstration in Figure 4.

Figure 4. Steps to construct GPS rounding boxes. (1) An original polyline. (2) Inner points between
corner points. (3) The construction of a rounding box grid. (4) Mapping a point into its rounding
box. (5) The representation of the rounding box of each point with a star symbol. (6) A guideline for
creating the first-layer neighbors of a given rounding box. (7) The neighbors of the first rounding
box. (8) All neighbors of all rounding boxes.

Step 1, Figure 4(1): P represents a bus path, which is a sequence of points p from the
begin point to the end point. For example,

P = {p1, p2, p3}. (1)

Step 2, Figure 4(2): Since most points on polylines are corner points, the distance between
adjacent points might be large in the case of a long straight line. Thus, we need to find inner points
between corner points. The distance between nearby inner points can be adjusted by
developers, such as 10 m. For example, as with path P in step (1), the inner points between
p1 and p2 might be p1.1 and p1.2. Thus, P can be written as follows:

P = {p1, p1.1, p1.2, p2, p2.1, p3}. (2)
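
Step 2 can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the function name `densify`, the constant of about 111,320 m per degree, and the locally flat approximation of distances are assumptions made for the example.

```python
import math

METERS_PER_DEGREE = 111_320.0  # assumed rough length of one degree of latitude


def densify(polyline, spacing_m=10.0):
    """Insert evenly spaced inner points between consecutive corner points.

    polyline is a list of (lat, lon) tuples; spacing_m is the assumed maximum
    distance between neighbouring points of the result.
    """
    if not polyline:
        return []
    dense = [polyline[0]]
    for (lat1, lon1), (lat2, lon2) in zip(polyline, polyline[1:]):
        # Approximate the segment length in metres on a locally flat surface.
        dy = (lat2 - lat1) * METERS_PER_DEGREE
        mid_lat = math.radians((lat1 + lat2) / 2)
        dx = (lon2 - lon1) * METERS_PER_DEGREE * math.cos(mid_lat)
        pieces = max(1, math.ceil(math.hypot(dx, dy) / spacing_m))
        # Linear interpolation; the last piece lands on the next corner point.
        for k in range(1, pieces + 1):
            t = k / pieces
            dense.append((lat1 + t * (lat2 - lat1), lon1 + t * (lon2 - lon1)))
    return dense


# A straight segment of roughly 33 m is split into four pieces.
print(len(densify([(13.0, 100.0), (13.0003, 100.0)])))
```

Linear interpolation in latitude and longitude is adequate here because the segments are short and only the rounding box of each inner point matters afterwards.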

Step 3, Figure 4(3–5): All points of P are rounded into rounding boxes. The rounding
digit is customizable by developers. In an area close to the equator such as Thailand, the
size of 0, 1, 3, 4, and 5-digit rounding boxes is approximately 100 km, 10 km, 100 m, 10 m,
and 1 m, respectively. For example, if the coordinates of p are p = (13.13243, 100.47386), the
3-digit rounding box of p will be p* = (13.132, 100.474). According to step (2), the rounding
boxes of the path P are P* in the following line:

P* = {p*1, p*1.1, p*1.2, p*2, p*2.1, p*3}. (3)
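
The rounding in Step 3 amounts to rounding both coordinates to a fixed number of decimal digits. The following sketch shows this with Python's built-in `round`; the function names are hypothetical, and storing P* as a set (which discards the order of the points) is an assumption that suits the later membership tests.

```python
def rounding_box(point, digits=3):
    """Return the rounding box of a (lat, lon) point as rounded coordinates."""
    lat, lon = point
    return (round(lat, digits), round(lon, digits))


def path_rounding_boxes(points, digits=3):
    """Round every point of a path; points in the same box collapse into one."""
    return {rounding_box(p, digits) for p in points}


print(rounding_box((13.13243, 100.47386)))  # (13.132, 100.474)

# Two nearby coordinates share one three-digit rounding box.
print(rounding_box((13.65495, 100.22424)) == rounding_box((13.65477, 100.22410)))  # True
```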
Step 4, Figure 4(6–8): The rounding boxes of P* in the previous steps cannot create a
continuous route path. In our work, we have to create neighbors of a rounding box in order
to connect all rounding boxes and expand the area of a path. The neighbors are created
around a box in all directions. A neighbor is defined by p*(x,y), where subscripts x and y
are the shifting direction of the current p*. For example, if the three-digit rounding box of p
is p* = (13.132, 100.474), then p*(–1,–1) is (13.132 – 0.001, 100.474 – 0.001), which becomes
(13.131, 100.473). In this case, the original p* is represented by p*(0,0). It means that one-layer
neighbors are nine boxes, including the original one. If a developer chooses two-layer
neighbors, there will be 25 boxes. Thus, the number of neighbors including the original
one is (2n + 1)^2, where n is the number of surrounding layers.
As demonstrated in Figure 4(6,7), the neighbors of the point p*1, including itself, are
p*1(–1,–1), p*1(0,–1), p*1(1,–1), p*1(–1,0), p*1(0,0), p*1(1,0), p*1(–1,1), p*1(0,1), and p*1(1,1). Thus,
P**, which is the set of neighbors of the elements of P*, as shown in Figure 4(8), is as follows:

P** = {p*1(–1,–1), p*1(0,–1), p*1(1,–1), p*1(–1,0), p*1(0,0), p*1(1,0), p*1(–1,1), p*1(0,1), p*1(1,1), ..., p*3(0,1), p*3(1,1)}. (4)
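
A possible way to build the neighbors and the set P** is sketched below. It is an illustration under assumptions, not the published code: boxes are stored as integer grid indices (the coordinate multiplied by 10 to the power of the rounding digit) so that shifting by one box is exact integer arithmetic, and all function names are hypothetical.

```python
def cell(point, digits=3):
    """Integer grid index of the rounding box that contains a (lat, lon) point."""
    scale = 10 ** digits
    return (round(point[0] * scale), round(point[1] * scale))


def neighbors(c, layers=1):
    """The box c plus its surrounding layers: (2 * layers + 1) ** 2 boxes."""
    span = range(-layers, layers + 1)
    return {(c[0] + dx, c[1] + dy) for dx in span for dy in span}


def path_area(points, digits=3, layers=1):
    """P**: the union of the neighbors of every rounding box of a path."""
    area = set()
    for p in points:
        area |= neighbors(cell(p, digits), layers)
    return area


p_star = cell((13.13243, 100.47386))
print(p_star)                        # (13132, 100474), i.e. (13.132, 100.474)
print(len(neighbors(p_star)))        # 9
print(len(neighbors(p_star, 2)))     # 25
print((13131, 100473) in neighbors(p_star))  # True, the box p*(-1,-1)
```

Because P** is a set, overlapping neighbors of adjacent points are stored only once, which keeps the area compact for densely sampled paths.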
An example of the P** of a route is demonstrated in Figure 5(1), where Figure 5(2)
shows rounding points in a zoom-in of the selected rectangle area in Figure 5(1).

Figure 5. Example rounding boxes of a bus route path: (1) a route path with a selected area; (2)
rounding boxes of the selected area in (1).

Thus, the begin point, end point, and polyline of each path in Table 3 are calculated
via the rounding boxes and presented in Table 5. In this table, rounding boxes’ data
are presented by variables. For clarity, the begin point and the end point refer to the
path_id with the suffixes “.B” and “.E.” For example, in the first entry, R7234.00.B**, R7234.00.E**,
and R7234.00** are the sets of rounding boxes of the begin point, the end point, and the
polyline, respectively.
Table 5. Example of bus route polyline data with rounding boxes (a point name ending with
two-star symbols).

route path_id path_type direction begin_point (Rounding Boxes) end_point (Rounding Boxes) polyline (Rounding Boxes)

R7234 R7234.00 main go R7234.00.B** R7234.00.E** R7234.00**
R7234 R7234.01 split go R7234.01.B** R7234.01.E** R7234.01**
R7234 R7234.02 split back R7234.02.B** R7234.02.E** R7234.02**
R7234 R7234.03 sub go R7234.03.B** R7234.03.E** R7234.03**
R7234 R7234.04 sub back R7234.04.B** R7234.04.E** R7234.04**
R8190 R8190.00 main go R8190.00.B** R8190.00.E** R8190.00**
R8190 R8190.01 main back R8190.01.B** R8190.01.E** R8190.01**
... ... ... ... ... ... ...

3.3.2. Trajectory Route Matching

The trajectory route matching is a method to check whether a GPS point is on a path.
Since it is unlikely that a coordinate point will be exactly on a path, the distance from the
point to the perpendicular line on the path surface is generally considered, as shown in
Figure 6(1,2). For this vector technique, a maximum distance should be defined, and it
consumes calculation time that is not appropriate with a large amount of data. Thus, we
decided to use the rounding boxes of a path for the trajectory route matching. In this figure,
b1 is a coordinate of a bus, where a path is a bus route path. Figure 6(3) shows that b1 is
rounded into b*1. This location is on a path P if b*1 is an element of P**. The function to
detect a point on a route path (POR) is defined in the following equation, where b* is any
point and P** is a set of rounding boxes in any path.

\[
\mathrm{POR}(b^{*}, P^{**}) :=
\begin{cases}
1, & b^{*} \in P^{**} \\
0, & \text{otherwise}
\end{cases}
\tag{5}
\]
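
Equation (5) reduces to a set-membership test, which is what makes the raster approach cheap. The sketch below assumes that P** is stored as a Python set of integer grid indices; this representation and the function names are assumptions for illustration, not details given in the text.

```python
def to_cell(point, digits=3):
    """Integer grid index of the rounding box of a (lat, lon) point."""
    scale = 10 ** digits
    return (round(point[0] * scale), round(point[1] * scale))


def por(bus_point, path_area, digits=3):
    """Equation (5): 1 if the rounding box of the bus point is in P**, else 0."""
    return 1 if to_cell(bus_point, digits) in path_area else 0


# Hypothetical P**: one rounding box at (13.132, 100.474) with one neighbor layer.
area = {(13132 + dx, 100474 + dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)}

print(por((13.13243, 100.47386), area))  # 1, inside the central box
print(por((13.1312, 100.4748), area))    # 1, inside a neighbor box
print(por((13.14, 100.474), area))       # 0, about 800 m away
```

Each lookup takes constant time on average, independent of the number of polyline segments, in contrast to computing a perpendicular distance to every segment.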

Figure 6. Steps of bus-route matching using GPS rounding boxes. (1) A location of a bus b1 close to
a polyline of a bus route. (2) The distance between the bus b1 and the polyline. (3) The representation
of the rounding box of b1, which is b*1, on the neighbors of the rounding boxes of the polyline.
In addition, to detect a bus driving on a bus route path, we need to verify that most
of the GPS coordinates of a bus belong to the route path. The concept of trajectory route
matching is key to finding the QoS scores in the next sections.

3.4. Bus Trip Calculating

Once the rounding boxes of all paths are constructed, the next step is to detect bus
trips and on-path driving. These concepts are described in the following subsections.

3.4.1. Bus Trip Detection

The concept is to detect when an individual bus transits from the begin point to the
end point. The size of the rounding boxes area of a point is about 100 × 100 m, as shown
in Figure 7(1). The begin point and end point are defined as follows:
- The begin point is detected when a bus starts moving out of the rounding boxes area
of the begin point, as shown in Figure 7(2). At timestamp t1, a bus is inside the rounding
boxes area, while it moves out of the area at the timestamp t2. In this case, t1 is
stamped as the time of a bus at the begin point R8190.00.B.
- The end point is detected when a bus starts moving into the rounding boxes area of
the end point, as shown in Figure 7(3). At timestamp t9, a bus is entering the rounding
boxes area, and it is inside the area at timestamp t10. In this case, t10 is stamped
as the time of a bus at the end point R8190.00.E.
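
The two detection rules can be sketched as scans over consecutive GPS samples. The code is a simplified illustration: the track format (time-ordered pairs of a timestamp and an integer box index), the function names, and the choice to return only the first matching event are assumptions.

```python
def departure_time(track, begin_area):
    """Last timestamp inside the begin area before the bus moves out of it."""
    for (t_prev, c_prev), (_, c_next) in zip(track, track[1:]):
        if c_prev in begin_area and c_next not in begin_area:
            return t_prev
    return None


def arrival_time(track, end_area):
    """First timestamp inside the end area after the bus was outside of it."""
    for (_, c_prev), (t_next, c_next) in zip(track, track[1:]):
        if c_prev not in end_area and c_next in end_area:
            return t_next
    return None


# Hypothetical track: the bus leaves box (0, 0) after t1 and reaches (0, 9) at t10.
track = [("t%d" % (i + 1), (0, i)) for i in range(10)]
print(departure_time(track, {(0, 0)}))  # t1
print(arrival_time(track, {(0, 9)}))    # t10
```

A full pipeline would continue scanning after each event to collect every departure and arrival of a bus during the day.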

Figure 7. A method to detect a bus at a begin point and an end point. (1) The rounding boxes of a
beginning point and an end point of a bus route path. (2) A timestamp t1 when a bus starts moving
out of a beginning rounding boxes area, which is represented by two-star symbols. (3) A timestamp
t10 when a bus enters an end rounding boxes area.

If the sequence of the begin and end points of a bus, as shown in Figure 8(1), is
[R8190.00.B, R8190.00.E, R8190.00.B, R8190.00.B, R8190.00.E], the trips become [(R8190.00.B,
R8190.00.E), (R8190.00.B, ?), (R8190.00.B, R8190.00.E)]. The first pair and the last pair
contain the begin point and the end point of path R8190.00, so they are considered full trips.
However, the case (R8190.00.B, ?), which does not have an end point, is not considered a
full trip.
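
The pairing of begin and end events for a single path can be sketched as follows. The event encoding ("B" and "E") and the function name are assumptions; a trip whose end is `None` corresponds to the pair (R8190.00.B, ?) in the text.

```python
def pair_trips(events):
    """Pair begin ("B") and end ("E") events of one path into trips.

    Returns (begin_index, end_index) tuples; end_index is None for a failed
    trip whose end point was never reached.
    """
    trips = []
    open_begin = None
    for i, kind in enumerate(events):
        if kind == "B":
            if open_begin is not None:
                # A new begin point before any end point: the open trip failed.
                trips.append((open_begin, None))
            open_begin = i
        elif kind == "E" and open_begin is not None:
            trips.append((open_begin, i))
            open_begin = None
    if open_begin is not None:
        trips.append((open_begin, None))
    return trips


print(pair_trips(["B", "E", "B", "B", "E"]))  # [(0, 1), (2, None), (3, 4)]
```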

Figure 8. Example trip detection from the sequence of begin points and end points. (1) A chain of
trips of an individual bus including full trips and a failed trip. (2) A chain of trips of an individual
bus having a sub trip in a trip.

In a case where a route has main paths, split paths, and subpaths, the main path is
considered the highest priority, while the split path and the subpath are in descending
order of importance. As shown in Figure 8(2), P.0, P.1, and P.2 are a main path, a split path,
and a subpath, and the sequence of a bus is [P.0.B, P.2.B, P.2.E, P.0.E, P.2.B, P.2.E, P.1.B, P.1.E].
The trips are considered [(P.0.B, (P.2.B, P.2.E), P.0.E), (P.2.B, P.2.E), (P.1.B, P.1.E)], where the
first subpath trip (P.2.B, P.2.E) is inside the main path trip, so it is ignored due to the main
path having higher priority than the subpath. In this case, there are three trips, (P.0.B, P.0.E),
(P.2.B, P.2.E), and (P.1.B, P.1.E).
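
One way to realize this priority rule is a stack of open trips, where a trip that closes inside a still-open trip of higher priority is dropped. This is a sketch under assumptions: the text does not specify the algorithm, and the `PRIORITY` ranks, the event tuples, and the `path_type` mapping are hypothetical.

```python
PRIORITY = {"main": 0, "split": 1, "sub": 2}  # smaller rank = higher priority


def detect_trips(events, path_type):
    """Return the path ids of detected trips in the order they are completed.

    events is a list of (path_id, "B" or "E") tuples of one bus; path_type
    maps each path id to "main", "split", or "sub".
    """
    open_paths = []  # stack of paths whose begin point has been seen
    trips = []
    for path_id, kind in events:
        if kind == "B":
            open_paths.append(path_id)
        elif open_paths and open_paths[-1] == path_id:
            open_paths.pop()
            enclosing = open_paths[-1] if open_paths else None
            # Keep the trip unless it lies inside an open higher-priority trip.
            if enclosing is None or (
                PRIORITY[path_type[path_id]] <= PRIORITY[path_type[enclosing]]
            ):
                trips.append(path_id)
    return trips


types = {"P.0": "main", "P.1": "split", "P.2": "sub"}
sequence = [("P.0", "B"), ("P.2", "B"), ("P.2", "E"), ("P.0", "E"),
            ("P.2", "B"), ("P.2", "E"), ("P.1", "B"), ("P.1", "E")]
print(detect_trips(sequence, types))  # ['P.0', 'P.2', 'P.1']
```

End events that do not match the most recently opened path are skipped in this sketch; handling of failed trips is left out to keep the example short.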
The trip calculation results are given in Table 6. In the table, the columns are as follows:
- index: an index, which is a running number, of each entry.
- bid: a bus identifier.
- path_id: a path identifier.
- begin_ts: a begin timestamp when a bus starts moving out from a begin point’s
rounding boxes area.
- end_ts: an end timestamp when a bus starts moving into an end point’s rounding
boxes area.
- is_full_trip: to check if a trip is a full trip, where 1 is a full trip, otherwise 0.
- on_path: a measurement of a bus driving on a route path. It uses a Jaccard index,
which will be described in the next subsection.

Table 6. Example trips from the method trip detection.

index bid path_id begin_ts end_ts is_full_trip on_path

1 4d43e028 R8190.00 2021-10-01 10:10:00 2021-10-01 12:12:00 1 0.85
2 f03235d3 R8190.00 2021-10-01 10:40:00 2021-10-01 12:41:00 1 0.85
3 12ec22a7 R8190.00 2021-10-01 11:05:00 2021-10-01 13:07:00 0 0.50
4 23731bd3 R8190.00 2021-10-01 11:20:00 2021-10-01 13:22:00 1 0.95
5 512e06ff R8190.00 2021-10-01 11:25:00 2021-10-01 13:28:00 1 0.90
6 0a4fd2f5 R8190.00 2021-10-01 11:50:00 2021-10-01 13:52:00 0 0.60
7 1b43575e R8190.00 2021-10-01 12:15:00 2021-10-01 14:14:00 1 0.85
8 512e06ff R8190.00 2021-10-01 13:50:00 2021-10-01 15:56:00 1 0.70
9 076fde6b R8190.00 2021-10-01 15:40:00 2021-10-01 17:49:00 1 0.90
10 12ec22a7 R8190.00 2021-10-01 16:05:00 2021-10-01 18:06:00 1 0.95
11 23731bd3 R8190.00 2021-10-01 16:35:00 2021-10-01 18:33:00 0 0.70
12 4d43e028 R8190.00 2021-10-01 17:20:00 2021-10-01 19:21:00 1 0.95
13 23731bd3 R8190.00 2021-10-01 17:50:00 2021-10-01 19:52:00 1 0.90
14 f03235d3 R8190.00 2021-10-01 18:20:00 2021-10-01 20:27:00 1 0.85


The first row in the table indicates that the trip was made by bus “4d43e028” on path
R8190.00, which is the main path of route R8190, between 10:10 and 12:12 on 1 October
2021, and was a full trip. In addition, some trips, such as 3, 6, and 11, were considered
failed trips, because they did not pass through the end points of their paths.

3.4.2. On-Path Driving Detection

When a trip is detected, on-path driving is also evaluated. The calculation
needs to follow the GPS data of each trip point by point to check the distance on
a route path and the distance outside of the route path. To do this, the true-positive,
false-positive, and false-negative distances are measured, as demonstrated in Figure 9, and the
Jaccard index is determined.
- True-positive (TP): the distance of a bus driving on a route path.
- False-positive (FP): the distance of a bus driving outside of a route path.
- False-negative (FN): the distance of a route path without a bus driving on it.

Figure 9. Example GPS tracks of a bus on a bus route path, where A–D are points of its polyline.

After that, the Jaccard index is calculated as in the following equation. As shown in
Figure 9, TP is 10 (from 5 + 5), FP is 8, and FN is 5, so the Jaccard index, calculated by
10/(10 + 8 + 5), is 0.43 or 43%. The maximum is 1 and the minimum is 0. An example result of the Jaccard
calculation is shown in the column on_path of Table 6.

\[
\mathrm{Jaccard} = \frac{TP}{TP + FP + FN}
\tag{6}
\]
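
As a quick check of Equation (6), the worked example of Figure 9 can be reproduced in a few lines. The function name and the guard for an empty denominator are assumptions added for the example.

```python
def jaccard(tp_km, fp_km, fn_km):
    """Jaccard index of on-path driving from the three distances in km."""
    total = tp_km + fp_km + fn_km
    return tp_km / total if total else 0.0


# Figure 9: two on-path sections of 5 km, 8 km off the path, 5 km of path skipped.
print(round(jaccard(5 + 5, 8, 5), 2))  # 0.43
```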
This step is also used to support the data validation. The attributes on_path and travel
time, which is the difference between end_ts and begin_ts calculated from Table 6, are
used to define outlier data. For a small value of on_path, such as a number lower than 0.3, it is
assumed that the bus trip was not performing its normal duties, so that trip is eliminated
from the evaluation of QoS. In addition, the outliers of the travel time are detected using
the interquartile range (IQR) method [21,22]. Thus, any trip whose travel time differs from
the normal travel time of a given route path is also excluded from the
assessment of QoS.
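
The validation step could look like the sketch below. The 0.3 threshold comes from the text, whereas the fence multiplier of 1.5 (the conventional choice for the IQR method), the dictionary keys, and the function names are assumptions; the sample travel times are invented for the demonstration.

```python
import statistics


def iqr_bounds(values, k=1.5):
    """Lower and upper fences of the IQR method for a list of numbers."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def valid_trips(trips, min_on_path=0.3, k=1.5):
    """Drop trips with a low on_path value or an outlying travel time."""
    low, high = iqr_bounds([t["travel_min"] for t in trips], k)
    return [
        t for t in trips
        if t["on_path"] >= min_on_path and low <= t["travel_min"] <= high
    ]


# Invented sample: one trip barely follows the path, one takes far too long.
minutes = [120, 121, 122, 122, 123, 125, 300]
scores = [0.9, 0.9, 0.2, 0.9, 0.9, 0.9, 0.9]
trips = [{"travel_min": m, "on_path": s} for m, s in zip(minutes, scores)]
print(len(valid_trips(trips)))  # 5
```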

3.5. QoS-1 Score: Tracking Complete Trips


QoS-1 is the score that evaluates complete trips; in this case, the conditions in
Table 4 are applied to the trip data in Table 6. Table 6 includes trips of the path R8190.00, so
the condition of type “all-trips” of this path, C0013, is applied. This means that the number of
trips of path R8190.00 should be 12. QoS-1 is calculated via Equation (7). As the full trips of
the path R8190.00 on 1 October 2021 are counted as 11, the QoS-1 score of the path R8190.00
is min(11, 12)/12, which is 0.92.

\[
\text{QoS-1 Score} = \frac{\min(\mathit{num\_full\_trips},\ \mathit{all\_trips})}{\mathit{all\_trips}}
\tag{7}
\]
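
The QoS-1 calculation for the example path can be reproduced with the is_full_trip column of Table 6. The sketch caps the ratio at 1 by taking the smaller of the two counts, which is the reading that yields the worked result 11/12 = 0.92; the function name is an assumption.

```python
def qos1_score(num_full_trips, all_trips):
    """Share of required trips that were completed, capped at 1."""
    return min(num_full_trips, all_trips) / all_trips


# is_full_trip column of Table 6 for path R8190.00 on 1 October 2021.
is_full_trip = [1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1]
print(sum(is_full_trip))                            # 11
print(round(qos1_score(sum(is_full_trip), 12), 2))  # 0.92
```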

After all paths are calculated, the QoS-1 scores of each route are the weighted average
of all paths of that route. For example, the QoS-1 of the route R8190 on 1 October 2021 is
shown in Table 7.

Table 7. Example of three QoS scores of the route R8190 on 1 October 2021.

Route Date QoS_1 QoS_2 QoS_3


R8190 2021-10-01 0.92 0.83 0.70

3.6. QoS-2 Score: Bus On-Path Driving Tracking


Next, the QoS-2 score is calculated by finding the ratio between the number of on-path
trips and all trips. An on-path trip is a trip whose on_path value is at least a
specific criterion. Our work chooses 0.85 as the criterion, so there are 10 on-path trips in
Table 6. As in the QoS-1 score, all trips is the parameter of the “all-trips” condition of a path,
so all trips of the path R8190.00 is 12. The equation to calculate
the QoS-2 score is as follows, where num_on_path_trips is the number of on-path trips:

\[
\text{QoS-2 Score} = \frac{\min(\mathit{num\_on\_path\_trips},\ \mathit{all\_trips})}{\mathit{all\_trips}}
\tag{8}
\]

In this case, the QoS-2 score of R8190.00 from the example data in Tables 4 and 6 is
min(10, 12)/12, or 0.83. This score of a given day is recorded in Table 7.
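
The same check for QoS-2 uses the on_path column of Table 6. The sketch counts a trip as on-path when its value is at least 0.85, because this reading gives the 10 on-path trips of the example, and it caps the ratio at 1; the function name is an assumption.

```python
def qos2_score(on_path_values, all_trips, criterion=0.85):
    """Share of required trips that stayed on the path, capped at 1."""
    num_on_path_trips = sum(1 for v in on_path_values if v >= criterion)
    return min(num_on_path_trips, all_trips) / all_trips


# on_path column of Table 6 for path R8190.00 on 1 October 2021.
on_path = [0.85, 0.85, 0.50, 0.95, 0.90, 0.60, 0.85,
           0.70, 0.90, 0.95, 0.70, 0.95, 0.90, 0.85]
print(round(qos2_score(on_path, 12), 2))  # 0.83
```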

3.7. QoS-3 Score: Bus On-Schedule Operation Tracking


Lastly, the QoS-3 score is evaluated using condition data in Table 4 and trip data in
Table 6. The first step is to select trips from a path and begin time that satisfy the given
conditions. Next, the conditions “count” and “headway” are used, and for each condition
the steps in the flowchart in Figure 10 are performed.
In the case of a condition type being “count,” the ratio between min(n, N) and N is
calculated, where n is the number of selected trips, and N is the number of trips
required by the condition. According to condition C0014 in Table 4, five trips are needed
between 11:00 and 12:00, so N is 5. To apply this condition, indices 3–6 of Table 6 are
selected, and the number of trips is 4, so n is 4. Thus, the score of the condition C0014 is
4/5, or 0.8.

begin
select trips of the given route
keep each trip whose begin_ts is between the condition’s begin_time and end_time
check con_type
- con_type = “count”:
  N = the number of trips from the condition
  n = the number of selected trips
- con_type = “headway”:
  N = the number of possible trips according to the headway from the condition
  trip = first trip
  n = 1 if the difference between the trip’s begin_ts and the condition’s begin_time is acceptable
  ex_begin_time = trip’s begin_ts
  repeat until the last trip:
    trip = next trip
    n = n + 1 if the difference between the trip’s begin_ts and ex_begin_time is acceptable
score = min(n, N) / N
store the score value
end

Figure 10. Flowchart for calculating the QoS-3 score.

In addition, when the condition type is “headway,” the ratio score is calculated in the
same way as for the previous condition. However, n is the number of trips satisfying the
headway condition. According to condition C0015 in Table 4, the headway between 16:00 and
18:00 is 30 min, so the first trip must be at 16:00 and each next trip must follow 30 min later, until
18:00. This means that this condition requires five trips, so N is 5. In this case, a developer
can allow some error, such as ±5 min. Based on the time of this condition, indices 10–13 of
Table 6 are selected.
- At index 10, the begin_time is 16:05, which satisfies the condition including the error
  time. Thus, n is 1, and ex_begin_time is 16:05.
- At index 11, the begin_time is 16:35. It differs from the ex_begin_time by about 30 min,
  so n becomes 2, and ex_begin_time becomes 16:35.
- At index 12, the begin_time is 17:20. It differs from the ex_begin_time by about 45 min,
  so this trip fails. In this case, n is still 2, and ex_begin_time changes to 17:20.
- At index 13, the begin_time is 17:50, and it differs from the previous one by about 30
  min. Thus, n becomes 3.
Since n is 3 and N is 5, the score of this condition is 3/5, or 0.6. At the end, the average
score of all conditions, C0014 and C0015, is 0.7. Thus, the QoS-3 score of 0.7 is recorded
in Table 7.
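The “count” and “headway” conditions described above can be sketched as follows. This is an illustrative reading of the walkthrough, not the authors' implementation: the function names, the inclusive time window, and the interpretation of “acceptable” as a deviation of at most the tolerance from the expected time are assumptions. The four trip times used for the count condition are hypothetical, since the begin times of indices 3–6 are not listed here; the headway trip times are those of indices 10–13.

```python
from datetime import datetime, timedelta


def at(hour: int, minute: int) -> datetime:
    """Helper for building timestamps on the example day."""
    return datetime(2021, 10, 1, hour, minute)


def count_score(trip_times, begin, end, required):
    """Score of a 'count' condition: selected trips over required trips."""
    n = sum(1 for t in trip_times if begin <= t <= end)
    return min(n, required) / required


def headway_score(trip_times, begin, end, headway, tolerance):
    """Score of a 'headway' condition, following the walkthrough above."""
    required = (end - begin) // headway + 1  # e.g. 16:00..18:00 every 30 min -> 5
    selected = sorted(t for t in trip_times if begin <= t <= end)
    n = 0
    ex_begin_time = None
    for t in selected:
        if ex_begin_time is None:
            # First trip: compare with the start of the condition window.
            deviation = abs(t - begin)
        else:
            # Later trips: compare the gap to the previous trip with the headway.
            deviation = abs((t - ex_begin_time) - headway)
        if deviation <= tolerance:
            n += 1
        ex_begin_time = t  # updated whether or not the trip was accepted
    return min(n, required) / required


# Hypothetical begin times for the four trips matched by condition C0014.
c0014 = count_score([at(11, 0), at(11, 15), at(11, 30), at(11, 50)],
                    at(11, 0), at(12, 0), required=5)
# Begin times of indices 10-13 for condition C0015.
c0015 = headway_score([at(16, 5), at(16, 35), at(17, 20), at(17, 50)],
                      at(16, 0), at(18, 0),
                      headway=timedelta(minutes=30),
                      tolerance=timedelta(minutes=5))
print(c0014, c0015, round((c0014 + c0015) / 2, 2))  # 0.8 0.6 0.7
```

Updating `ex_begin_time` even after a failed trip mirrors the index 12 step, where the trip at 17:20 is rejected but still becomes the reference for the trip at 17:50.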

4. Results
4.1. Result of Bus QoS scores
The GPS transaction dataset of buses between 1 October 2021 and 31 December 2021
was analyzed. There were 709,182,747 transactions in total, covering 454 bus routes and
4418 buses. The route numbers were masked due to privacy constraints (for example,
R7234, R7731, R8196, and R8630). After applying the approach from the previous
section, the daily results of QoS-1, QoS-2, and QoS-3 were as given in Table 8. The table
shows 12 example entries out of the 92 daily entries of route R7234. After that,
the QoS scores of each route were grouped by month and reported in Table 9. In addition,
the report from Table 9 can be visualized as charts, as in Figure 11. There are three charts
reporting QoS-1, QoS-2, and QoS-3; each chart is grouped by bus route, and every group displays
the QoS score ordered by month.

Table 8. Daily QoS scores of the route R7234 in the 4th quarter of 2021.

Route    Date         QoS_1   QoS_2   QoS_3
R7234    2021-10-01   0.83    0.85    0.72
R7234    2021-10-02   0.78    0.73    0.76
R7234    2021-10-03   0.77    0.80    0.83
R7234    2021-10-04   0.83    0.86    0.68
R7234    2021-10-05   0.81    0.74    0.73
R7234    2021-10-06   0.92    0.75    0.82
R7234    2021-10-07   0.84    0.87    0.68
R7234    2021-10-08   0.83    0.76    0.80
...      ...          ...     ...     ...
R7234    2021-12-28   0.77    0.91    0.81
R7234    2021-12-29   0.75    0.83    0.67
R7234    2021-12-30   0.76    0.74    0.82
R7234    2021-12-31   0.64    0.82    0.83

Table 9. Monthly QoS scores of various routes for the 4th quarter of 2021.

Route    Month     QoS_1   QoS_2   QoS_3
R7234    2021-10   0.86    0.83    0.73
         2021-11   0.75    0.94    0.81
         2021-12   0.69    0.87    0.75
R7731    2021-10   0.71    0.89    0.72
         2021-11   0.65    0.93    0.85
         2021-12   0.73    0.91    0.74
R8196    2021-10   0.69    0.88    0.76
         2021-11   0.87    0.90    0.91
         2021-12   0.68    0.84    0.67
R8630    2021-10   0.73    0.87    0.75
         2021-11   0.75    0.83    0.89
         2021-12   0.71    0.83    0.72
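The grouping of daily scores by month can be sketched as below. The aggregation function is not stated in the text, so taking the arithmetic mean of the daily scores is an assumption, and the function name is hypothetical. Only three rows of Table 8 are used as input, so the output is not expected to reproduce Table 9.

```python
from collections import defaultdict
from statistics import mean


def monthly_scores(daily_rows):
    """Group daily (route, 'YYYY-MM-DD', qos1, qos2, qos3) rows by month.

    Returns {(route, 'YYYY-MM'): (mean_qos1, mean_qos2, mean_qos3)}.
    Using the mean is an assumption made for this sketch.
    """
    groups = defaultdict(list)
    for route, date, q1, q2, q3 in daily_rows:
        groups[(route, date[:7])].append((q1, q2, q3))
    return {
        key: tuple(mean(column) for column in zip(*rows))
        for key, rows in sorted(groups.items())
    }


rows = [
    ("R7234", "2021-10-01", 0.83, 0.85, 0.72),
    ("R7234", "2021-10-02", 0.78, 0.73, 0.76),
    ("R7234", "2021-12-31", 0.64, 0.82, 0.83),
]
for (route, month), scores in monthly_scores(rows).items():
    print(route, month, [round(s, 3) for s in scores])
```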

In addition, histograms have been generated to summarize the QoS scores in detail, as
depicted in Figure 12. The x axis is the QoS score from 0 to 100, and the y axis is the number
of city bus routes having a particular score. As shown in the figure, most bus routes have scores
close to 100, while a small number of routes have lower scores. To make the data
easier to interpret, we graded each route by level: high, medium, low, and lower, as
reported in Table 10. The table contains the rating labels, the score range of each rating, and the number of city
bus routes at each rating level for the three QoS scores.

Figure 11. Charts of monthly QoS scores. [Three charts, one each for the QoS-1, QoS-2, and QoS-3 scores; the y axis runs from 0 to 1, and the x axis shows the months 2021-10, 2021-11, and 2021-12 grouped by route (R7234, R7731, R8196, R8630).]

Figure 12. Histograms of QoS scores. Each column is the QoS score; the first row shows histograms
of all scores, and the second row displays histograms of scores below 80.

Table 10. Number of city bus routes having each rating level of QoS scores.

Rating Label   Score Range   Number of City Bus Routes
                             QoS_1   QoS_2   QoS_3
High           90–100        313     315     417
Medium         80–90         10      37      14
Low            60–80         40      66      9
Lower          0–60          91      36      14
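The grading in Table 10 can be sketched as a simple mapping from a score on the 0 to 100 scale to a rating label. The ranges in the table share their end points, so treating each lower bound as inclusive (for example, a score of exactly 90 is “High”) is an assumption of this sketch; the function name and the example scores are hypothetical.

```python
from collections import Counter


def rating_label(score: float) -> str:
    """Map a QoS score in [0, 100] to a rating label of Table 10."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score >= 90:
        return "High"
    if score >= 80:
        return "Medium"
    if score >= 60:
        return "Low"
    return "Lower"


# Hypothetical route scores, only to show how the counts are produced.
example_scores = {"R0001": 96.0, "R0002": 84.5, "R0003": 72.0, "R0004": 41.0, "R0005": 90.0}
print(Counter(rating_label(s) for s in example_scores.values()))
```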

4.2. Discussion
The measurement of QoS of public city bus transportation is an early step in the
improvement of smart mobility since it helps one to understand the current situation. There
are many factors involved in the assessment, such as accessibility, availability, comfort,
customer satisfaction, reliability, safety, security, etc. [2–5]. These metrics are generally
evaluated by the user survey method [2–4], because users are the direct service consumers
and this method can reflect user expectations in a straightforward way. As we are in the
era of data utilization, data analytics supports the analysis of certain factors, in addition
to the survey method [6,8]. Some studies have attempted to use GPS data analytics for
transportation, e.g., for assessing the travel time, travel time variability, waiting time, or
transfer time of buses [7,9]. These studies show that data can be used to determine
the QoS of transportation, especially bus services. Building on them, this study extends the analysis
of GPS data to measure the efficiency of bus services in terms of accessibility, availability,
and reliability. Thus, we aimed to measure the QoS of public city bus transportation in
Bangkok by analyzing the GPS data of buses, route data, and schedule conditions. We used
three QoS scoring functions to determine complete trips, on-path driving, and on-schedule
operations, tracking the conditions of each bus route. The results are reported in Section 4.1;
we found that most of the bus routes received high scores. In this discussion, we organize
our contribution into two parts: our approach, and smart city management.
First, the contribution of the proposed approach is to derive the quality of service
of bus transportation by data analytics. As mentioned in the introduction, it would be
convenient if there were data from wireless sensors at each bus stop to detect the bus arrival
time [15,16]. However, without wireless sensor data, it was necessary to use GPS and spatial
data. In our datasets, we found four challenging issues: there were no arrival
data at any bus stops, one bus route had many paths, a bus could choose any path under
the same route, and there was no exact departure time in the timetables. Therefore, the GPS
coordinates rounding box was adopted for path matching [17–20]. It rasterizes the vector of a
polyline into a set of grid cells, which serve as the indices of a path. Although this technique requires
some memory, it involves little computational processing and is capable of working with a
large amount of data, such as voluminous GPS transaction coordinates. Path matching
finds the trips of a bus together with a path type and a direction, so we could detect incomplete trips,
as demonstrated in Figures 7 and 8. Another advantage of using rounding boxes is that it
is simple to detect whether a bus is driving along a route, as shown in Figure 9. Moreover, working
with a condition table and the algorithm in Figure 10, we could verify the frequency
and headway of each bus route path. In all of these steps, the rounding box technique
plays a key role: it preprocesses the raw data into bus trips and serves all QoS scoring
functions. The results of our work demonstrate the use of data analytics to monitor QoS,
in addition to surveys, as other works have demonstrated. There are more criteria that
data analytics can support, such as driving safety, travel time, bus stop proximity, other
mode connections, etc.; however, this requires much more data, such as bus stop locations
and the coordinates of other modes, which are useful for future research. In addition, the
survey method from [2,3,5,6] is still needed because some qualitative results, such as user
satisfaction, on-board safety, appropriate fare, driver’s ability, and ticket availability are
difficult to measure by data analytics.
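The idea of rasterizing a path polyline into grid cells by rounding coordinates can be illustrated with a short sketch. It is a simplified reading of the technique, not the procedure used in this work: the three-decimal precision, the half-cell sampling step along each segment, all function names, and the example coordinates are assumptions chosen for illustration.

```python
import math

Cell = tuple[int, int]


def to_cell(lat: float, lon: float, decimals: int = 3) -> Cell:
    """Round a coordinate to a grid cell, stored as scaled integers."""
    scale = 10 ** decimals
    return (round(lat * scale), round(lon * scale))


def rasterize_polyline(polyline: list[tuple[float, float]],
                       decimals: int = 3) -> set[Cell]:
    """Sample each segment at half-cell steps and collect the rounded cells."""
    step = 0.5 * 10 ** (-decimals)
    cells: set[Cell] = set()
    for (lat1, lon1), (lat2, lon2) in zip(polyline, polyline[1:]):
        length = math.hypot(lat2 - lat1, lon2 - lon1)
        pieces = max(1, math.ceil(length / step))
        for i in range(pieces + 1):
            t = i / pieces
            cells.add(to_cell(lat1 + t * (lat2 - lat1),
                              lon1 + t * (lon2 - lon1), decimals))
    return cells


def on_path_ratio(gps_points: list[tuple[float, float]],
                  path_cells: set[Cell], decimals: int = 3) -> float:
    """Fraction of GPS points whose rounded cell belongs to the path."""
    if not gps_points:
        return 0.0
    hits = sum(1 for lat, lon in gps_points
               if to_cell(lat, lon, decimals) in path_cells)
    return hits / len(gps_points)


# Hypothetical east-west path segment and two GPS points (one off the path).
path_cells = rasterize_polyline([(13.700, 100.500), (13.700, 100.510)])
print(on_path_ratio([(13.7001, 100.5052), (13.710, 100.505)], path_cells))  # 0.5
```

Matching a GPS point then costs one rounding and one set lookup, which is consistent with the low computational load noted above; the set of cells is the memory cost. Sampling along a diagonal segment may skip some corner cells, so a production version would need a more careful traversal.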
Second, our contribution to smart city management was to use data to improve the
QoS. Our work focused on public city bus transportation because buses are commonly
used in any city, such as Bangkok, Thailand. Our data analytics contributes to the research
on transport quality in terms of reliability, accessibility, and availability.
Reliability. Reliability is one aspect contributing to user satisfaction [23]. This
factor can refer to the ability to carry passengers from a starting point to an end point [24].
The reliability assessed in this work is the ability of buses to perform their intended trip
from an origin to a destination along a route path under specified conditions for a given
period without failure. This factor is measured by QoS-1, which is for complete trip tracking.
This metric ensures that bus providers supply enough buses to offer the number of
complete trips that they have committed to. A low score means that the bus operator
cannot provide enough buses to complete the agreed number of trips, so the operator must
prepare more vehicles; otherwise, it may negatively affect the use of this bus route in the
future. The results in Table 10 show that more than 300 bus routes achieved a high rating,
while about 130 needed significant improvement.
Accessibility. The term “accessibility” generally refers to the ability to transfer people
from an origin to a destination [25]. This measurement approach is primarily from the
perspective of user demand and can be viewed as the coverage of the transportation system
relative to the needs of people and user satisfaction [26]. A user-centric evaluation
is possible through the user survey method [2–4] and through data analytics on individual trip data,
such as inferring the mobility of people from their bus smart card payment transactions to
evaluate the supply of public bus transport. In our work, there are data from the supply
side only. The data contain the routes that operators hold as concessions from
the government authority and the conditions for running buses on each route path that the
operators have committed to. In this work, we did not examine how well the routes meet user
demand; nevertheless, we were able to evaluate how buses drive along the promised route
paths. Since QoS-1 measures complete trips, a bus may go off route to achieve the fastest
trip between a begin point and an end point in order to increase the QoS-1 score. This
results in a bus not stopping at every location on the route, and is considered a violation of
the regulations of the city bus transportation. Thus, QoS-2, for bus on-path driving tracking,
was introduced to confirm that a bus driver follows the whole route path. A high score
means that a trip had less off-route time and covered the whole path. As per our analysis,
about 300 bus routes were rated highly, whereas for about 100 routes the operator must
enforce stricter guidelines with the drivers in order to increase the QoS-2 score.
Availability. The availability of public transportation refers to the ability to provide
services that cover the travel demands of passengers. Having
a bus service that runs in accordance with the schedule can be viewed as a part of availability [27–29].
This work interprets availability in terms of the regularity of bus operation, measured by
QoS-3, which is for bus on-schedule operation tracking. Even if a bus line has completed
the number of trips specified and did not go off route, it cannot be guaranteed that all
buses will operate regularly. According to the frequency and headway of the bus operation
agreed upon by the operator, each bus line must operate as promised. A failed condition
leads to a lower QoS-3 score. A high score allows users the confidence to use the bus
according to their demands. The results in Table 10 indicate that most bus routes were
reliable in terms of on-schedule operation. Compared to the previous QoS scores, not many
bus routes needed improvement in QoS-3. If we take a closer look at the analytical results,
we see that many bus routes operated more trips than promised. This situation is beneficial
for users, and causes a higher QoS-3 score as a by-product. However, this metric can be
enhanced to evaluate the waiting time at each bus stop. In this case, an individual timetable
is required for every bus stop.
Our proposed method for scoring the QoS of bus transportation is evidence in support
of having policies to enhance smart mobility. Policy makers need to consider the data
carefully, because policies that benefit some service consumers may adversely affect other
groups of people [10]. We have primarily presented the analysis of GPS data from the
supply side, without taking demand-side data into consideration. In the future, when there
are data on people’s need for trips in Bangkok, not just acquired through the survey method,
such as transactions from all-in-one smart cards for public transportation [9], location data
from smartphones [25], etc., we may be able to glean more insights from both the demand
side and the supply side to optimize bus route networks [30] and schedules [31]. In this
event, policies about smart card and privacy data must be put into place.
To this end, our work demonstrates the power of having quality GPS data and spatial
data that enable policy makers to bring about positive changes in a city. We can say that our
contribution encourages the sustainability of public city bus transportation and, as such,
can be a part of better living in the future.

5. Conclusions
This work introduces an approach to the measurement of the quality of service (QoS) of
public city bus transportation in Bangkok in terms of reliability, accessibility, and availability,
using global positioning system (GPS) data analytics. There were three QoS scoring
functions: QoS-1 for complete trip tracking, QoS-2 for bus on-path driving tracking, and
QoS-3 for bus on-schedule operation tracking. The analytical process had four phases:
input, preprocessing, scoring, and output. Input data were GPS transactions of buses from
the last quarter of 2021; route data containing polylines of all route paths of city buses
in Bangkok and its metropolitan area; and schedule conditions of each route path. The
challenges involved in this study were no bus arrival timestamp at each bus stop, one
route having many paths, no fixed path of buses on the same route, and no departure time
being given in the schedule. Thus, we had to detect the trips on each route by analyzing
GPS trajectory data and path polylines. In this case, GPS coordinates rounding became
an important technique of the preprocessing phase. In the next phase, scoring, when trips
and their metadata were detected, the three QoS scoring functions were executed and gave
results as scores in the output phase. The analytical results of all routes showed that most
bus routes have high scores; however, some bus routes need to be improved due to low
scores. Thus, the contribution of our work was to demonstrate the feasibility of using data
analytics to measure the QoS of bus transportation, in addition to using a survey method.
This is one of the tasks that can contribute to the sustainability of smart cities.
Because this work focuses on the analytics of bus tracking data from the supply side,
future work needs more data, such as individual payment transactions for public
transportation and individual journey data from smartphones, to extend the QoS methods
to the demand side.

Author Contributions: Conceptualization, R.C., A.S. and T.T.; Methodology, R.C. and T.T.; Formal
analysis, R.C.; Resources, A.S.; Data curation, T.T.; Writing—original draft, R.C.; Writing—review &
editing, T.T.; Visualization, R.C.; Supervision, A.S.; Project administration, A.S. All authors have read
and agreed to the published version of the manuscript.
Funding: This research received no external funding.
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: Not applicable.
Conflicts of Interest: The authors declare no conflict of interest.

References
1. Hansson, J.; Pettersson, F.; Svensson, H.; Wretstrand, A. Preferences in regional public transport: A literature review. Eur. Transp.
Res. Rev. 2019, 11, 1–16.
2. Wethyavivorn, P.; Sukwattanakorn, N. Problems and barriers affecting sustainable commuting: Case study of people’s daily
commute to Kasetsart University, Bangkok, Thailand. IOP Conf. Ser. Earth Environ. Sci. 2019, 329, 012011.
3. Ueasangkomsate, P. Service quality of public road passenger transport in Thailand. Kasetsart J. Soc. Sci. 2019, 40, 74–81.
4. Chan, W.; Ibrahim, W.W.; Lo, M.; Suaidi, M.; Ha, S. Sustainability of public transportation: An examination of user behavior to
real-time GPS tracking application. Sustainability 2020, 12, 9541.
5. Page, S.; Yue, G.G. Transportation and tourism: A symbiotic relationship? In The SAGE Handbook of Tourism Studies; Sage
Publications: Thousand Oaks, CA, USA, 2009; pp. 371–395.
6. Goyal, S.; Agarwal, S.; Singh, N.S.S.; Mathur, T.; Mathur, N. Analysis of Hybrid MCDM Methods for the Performance Assessment
and Ranking Public Transport Sector: A Case Study. Sustainability 2022, 14, 15110.
7. Mazloumi, E.; Currie, G.; Rose, G. Using GPS data to gain insight into public transport travel time variability. J. Transp. Eng. 2010,
136, 623–631.
8. Shen, L.; Stopher, P.R. Review of GPS travel survey and GPS data-processing methods. Transp. Rev. 2014, 34, 316–334.
9. Gschwender, A.; Munizaga, M.; Simonetti, C. Using smart card and GPS data for policy and planning: The case of Transantiago.
Res. Transp. Econ. 2016, 59, 242–249.
10. Liu, Q.; Liu, Z.; Kang, T.; Zhu, L.; Zhao, P. Transport inequities through the lens of environmental racism: Rural-urban migrants
under Covid-19. Transp. Policy 2022, 122, 26–38.
11. Chawuthai, R.; Pruekwangkhao, K.; Threepak, T. Spatial-Temporal Traffic Speed Prediction on Thailand Roads. In Proceedings of
the 7th International Conference on Engineering, Applied Sciences and Technology, Pattaya, Thailand, 1–3 April 2021; pp. 58–62.
12. Chawuthai, R.; Chankaew, N.; Threepak, T. A Hybrid Method for Predicting a Potential Next Rest Stop of Commercial Vehicles.
Transp. Res. Procedia 2018, 34, 36–43.
13. Chawuthai, R.; Ainthong, N.; Intarawart, S.; Boonyanaet, N.; Sumalee, A. Travel Time Prediction on Long-Distance Road
Segments in Thailand. Appl. Sci. 2022, 12, 5681.
14. Chawuthai, R. Monitoring roadway lights and pavement defects for nighttime street safety assessment by sensor data analysis
and visualization. Sens. Mater. 2018, 30, 2267–2279.
15. SL, A.H.; Samsudeen, S.N. Real time bus tracking and scheduling system using wireless sensor and mobile technology. J. Inf. Syst.
Inf. Technol. 2016, 1, 18–23.
16. Kamble, P.A.; Vatti, R.A. Bus tracking and monitoring using RFID. In Proceedings of the 2017 Fourth International Conference on
Image Information Processing, Shimla, India, 21–23 December 2017; pp. 1–6.
17. Huang, S.-H.; Lin, C.-S. Rapid Route Comparison Based on GPS Coordinates and Bounding Boxes. J. Traffic Logist. Eng. 2019, 7,
5–9.
18. Elevelt, A.; Bernasco, W.; Lugtig, P.; Ruiter, S.; Toepoel, V.; Ruiter, B.M.S. Where you at? Using GPS locations in an electronic time
use diary study to derive functional locations. Soc. Sci. Comput. Rev. 2021, 39, 509–526.
19. Ciociola, A.; Cocca, M.; Giordano, D.; Vassio, L.; Mellia, M. E-scooter sharing: Leveraging open data for system design. In
Proceedings of the 2020 IEEE/ACM 24th International Symposium on Distributed Simulation and Real Time Applications
(DS-RT), Prague, Czech Republic, 14–16 September 2020; pp. 1–8.
20. Payyanadan, R.P.; Sanchez, F.A.L.; Lee, J.D. Assessing route choice to mitigate older driver risk. IEEE Trans. Intell. Transp. Syst.
2016, 18, 527–536.
21. Yang, J.; Rahardja, S.; Fränti, P. Outlier detection: How to threshold outlier scores? In Proceedings of the International Conference
on Artificial Intelligence, Information Processing and Cloud Computing, Sanya, China, 19–21 December 2019; pp. 1–6.
22. Rilett, L.R.; Tufuor, E.; Murphy, S. Arterial roadway travel time reliability and the COVID-19 pandemic. J. Transp. Eng. Part A Syst.
2021, 147, 04021034.
23. Soza-Parra, J.; Raveau, S.; Muñoz, J.C.; Cats, O. The underlying effect of public transport reliability on users’ satisfaction. Transp.
Res. Part A Policy Pract. 2019, 126, 83–93.
24. Xiaoliang, Z.; Limin, J. Analysis of Bus Line Operation Reliability Based on Copula Function. Sustainability 2021, 13, 8419.
25. Liu, Q.; An, Z.; Liu, Y.; Ying, W.; Zhao, P. Smartphone-based services, perceived accessibility, and transport inequity during the
COVID-19 pandemic: A cross-lagged panel study. Transp. Res. Part D Transp. Environ. 2021, 97, 102941.
26. Curl, A.; Nelson, J.D.; Anable, J. Does accessibility planning address what matters? A review of current practice and practitioner
perspectives. Res. Transp. Bus. Manag. 2011, 2, 3–11.
27. Leng, N.; Corman, F. The role of information availability to passengers in public transport disruptions: An agent-based simulation
approach. Transp. Res. Part A Policy Pract. 2020, 133, 214–236.
28. Vdovychenko, V.; Ivanov, I.; Pidlubnyi, S. Assessment of the impact of traffic conditions on the availability of transport services
of the city bus route. Technol. Audit. Prod. Reserves 2022, 3, 45–50.
29. L’upták, V.; Droździel, P.; Stopka, O.; Stopková, M.; Rybicka, I. Approach methodology for comprehensive assessing the public
passenger transport timetable performances at a regional scale. Sustainability 2019, 11, 3532.
30. Zhang, H.; Cui, H.; Shi, B. A data-driven analysis for operational vehicle performance of public transport network. IEEE Access
2019, 7, 96404–96413.
31. Zhu, H.; Wu, Y.; Wang, Y. Algorithm for Headway of Fixed Route Buses in Bus Stations Based on Bus Big Data. In Proceedings of
the 6th International Conference on Transportation Information and Safety, Wuhan, China, 22–24 October 2021; pp. 28–33.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.
