
Received: 9 September 2020 Revised: 13 November 2020 Accepted: 8 December 2020 IET Intelligent Transport Systems
DOI: 10.1049/itr2.12018
ORIGINAL RESEARCH PAPER
A two-stage method for bus passenger load prediction using automatic passenger counting data
Pengfei Wang1, Xuewu Chen1, Jingxu Chen1, Mingzhuang Hua1, Ziyuan Pu2

1 Jiangsu Key Laboratory of Urban ITS, Jiangsu Province Collaborative Innovation Center of Modern Urban Traffic Technologies, School of Transportation, Southeast University, Nanjing, China
2 School of Engineering, Monash University, Jalan Lagoon Selatan, Bandar Sunway 47500, Malaysia

Correspondence
Xuewu Chen, Jiangsu Key Laboratory of Urban ITS, Jiangsu Province Collaborative Innovation Center of Modern Urban Traffic Technologies, School of Transportation, Southeast University, Nanjing, 211189, China.
Email: chenxuewu@seu.edu.cn

Funding information
National Key Research and Development Program of China, Grant/Award Number: 2018YFB1601300; Joint Funds of the National Natural Science Foundation of China, Grant/Award Number: U20A20330; National Natural Science Foundation of China, Grant/Award Number: 71901059; Natural Science Foundation of Jiangsu Province in China, Grant/Award Number: BK20180402; Postgraduate Research and Practice Innovation Program of Jiangsu Province, Grant/Award Number: KYCX18_0143

Abstract
In high-frequency transit, providing real-time crowding information (RTCI) is a potential way to promote passenger satisfaction and reduce negative crowding externalities, by assisting passengers in choosing less crowded vehicles. To make RTCI convincing and reliable, it is necessary to provide predictive RTCI, in which bus passenger load (BPL) prediction is the primary problem. This paper proposes a novel two-stage BPL prediction method using automatic passenger counting (APC) data. The first stage is to predict short-term passenger flows at stops based on an adaptive Kalman filter approach. Using the outputs from the first stage as well as other variables directly from APC data, the second stage is to predict BPL based on a support vector regression algorithm. Several methods from the existing literature are used as benchmarks to test the relative performance of the proposed method. An empirical study on bus line 1 in Suzhou, China shows that the proposed method outperforms all the benchmarks, and shows significant superiority over other methods for stops with sharp increases in BPL and for multi-step ahead prediction. This study contributes to the limited literature on BPL prediction and lays the foundation for providing accurate and reliable predictive RTCI in the future.
This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.
© 2021 The Authors. IET Intelligent Transport Systems published by John Wiley & Sons Ltd on behalf of The Institution of Engineering and Technology
IET Intell. Transp. Syst. 2021;15:248–260. wileyonlinelibrary.com/iet-its

1 INTRODUCTION

1.1 Motivation

Comfort is an important aspect of the bus riding experience. As populations age in many countries around the world, the proportion of elderly people among transit users is increasing [1]. Elderly passengers are more concerned about comfort and less concerned about trip time than young people. In this sense, travel comfort on buses is becoming increasingly important. In-vehicle crowding has negative effects on traveller satisfaction and wellbeing that may inhibit the transition from private to public transport [2, 3].

With the development of intelligent transportation systems, providing real-time crowding information (RTCI) has become a feasible and effective way to improve the passenger travel experience in high-frequency transit. With the help of RTCI, passengers can make better-informed decisions about whether to board a bus based on their tolerance for crowding, waiting time, etc. Our previous study [4] indicates that if RTCI is provided for crowded bus routes, it would not only equalize crowding among bus trips but also help to prevent bus bunching, as the passengers transferring from crowded vehicles to less crowded vehicles act as a kind of holding tool.

From this point of view, RTCI is considered welcome and helpful for both passengers and transit agencies. However, RTCI that only presents current bus crowding conditions is sometimes not enough to support passengers in making correct decisions. For instance, a bus may currently have available seats but may become full or crowded when it arrives

at the target stop after serving several stops; passengers at the target stop may then wait longer for this bus and still not get a seat. In this case, passengers would feel that RTCI is not trustworthy, and its effect would be greatly reduced. Predictive RTCI that forecasts the bus crowding condition at the target stop is therefore more reliable for passengers. Providing predictive RTCI generally requires that passenger loads be predicted several stops ahead of the current bus locations. This paper focuses on the bus passenger load (BPL) prediction problem (illustrated in Figure 1).

FIGURE 1 An illustration of the bus passenger load prediction problem in high-frequency transit

1.2 Literature review

Short-term passenger flow prediction has long been a classical and popular research topic. Generally, the approaches can be classified into parametric and non-parametric methods [5]. Among parametric methods, regression analysis [6], the autoregressive integrated moving average (ARIMA) model [7], and the generalized autoregressive conditional heteroscedasticity (GARCH) model [8] have been applied successfully. Non-parametric methods include the artificial neural network (ANN) [9], support vector machine (SVM) [10], Kalman filter [11], and random forest [12]. The basis of these models is to construct a non-linear relationship between input and output variables without a priori knowledge. In recent years, multi-pattern models combined with time series or neural networks have attracted the attention of many scholars [13–15]. Gong, Fei [16] proposed a framework to predict the number of waiting passengers at a bus stop, which has similarities with BPL prediction. In the framework, the numbers of arriving and departing passengers are predicted separately. The number of waiting passengers is then calculated, and a Kalman filter approach is developed to minimize the estimation inaccuracies.

There are far fewer studies regarding passenger load prediction in public transit, and most of them focus on rail transit. For instance, Noursalehi [17] proposed a framework for urban rail crowding prediction and information provision based on automatic fare collection (AFC) data. The framework conducted data-driven prediction of origin-destination (O-D) passenger flows using random forests and boosted regression trees, with on-line simulation of supply and demand interactions. Khomchuk, Tuladhar [18] proposed a Bayesian approach to predicting the passenger loads in individual train cars downstream of their current locations based on APC data, in which passenger O-D patterns are assumed to follow a Poisson prior distribution with parameters estimated from historical data. Pasini, Khouadjia [19] and Hu, Chiu [20] both applied a long short-term memory (LSTM) neural network to predict train loads based on temporal features (e.g., day-of-week, time-of-day) and load measurements from the previous train in the current day. Jenelius [21] proposed a prediction framework consisting of historical and real-time APC data to predict individual train car loads and crowding levels based on several regression models (stepwise linear regression, lasso regression and boosted regression tree ensembles). An empirical study on a metro line in Stockholm, Sweden shows that accurate RTCI can be provided long before the trains arrive, and that the three regression models perform similarly.

Passenger load prediction in bus systems is considered more difficult than in rail transit for two reasons. First, the ridership flows in bus systems, including boarding, alighting, and onboard passenger numbers, are generally much smaller than those in rail transit and thus more volatile, which makes it difficult to capture the flow patterns [11]. Second, in contrast to the regular headways in rail transit, the running conditions for buses are unstable and easily affected by the external environment [22].

A few studies have focused on the BPL prediction problem in recent years. Initially, Zhang, Shen [23] built a framework to predict passenger load using AFC data, where the first step involves identifying the historical day most similar to the current day and predicting downstream loads based on observed loads from the historical day. In the second step, the passenger loads are updated by combining real-time and historical data in an extended Kalman filter. Since the AFC data only record passengers' boarding stops, the prediction was based on estimated passenger loads, and no ground truth data validated the prediction performance. He, Yan [24] conducted BPL prediction to realize predictive air-conditioner control on electric buses. Using manually collected load data, they employed a relatively simple prediction scheme that assumes a relationship between loads at target stops and loads at previous stops. Three methods were adopted for this scheme: Monte Carlo, radial basis function neural network (RBF-NN), and Markov chain. Simulation results show that all three methods, integrated within a predictive air-conditioner control framework, achieve more stable temperature performance and lower energy consumption. Jenelius [25] applied a framework similar to that of Jenelius [21] to BPL prediction using a lasso regression model. Although the framework in Jenelius [25] was extended to incorporate real-time automatic vehicle location (AVL) data, the results show that the prediction accuracy for bus systems is far behind that for trains, indicating that
predicting passenger loads in bus systems is a more challenging task.

1.3 Summary

Predicting passenger loads in high-frequency bus systems is a challenging task due to the significant variability of passenger flows at stops and of bus running conditions. Considering the passenger load prediction problems for both rail transit and bus transit, previous studies provide valuable knowledge about the ability to predict passenger loads with different prediction techniques. However, several research gaps can be identified. First, most existing studies turn passenger load prediction into a time series problem and only use historical or current load data as predictors [19, 20, 23, 24]. More automatically collected data in bus systems, such as AVL data and APC data, and deeper mining methods based on these datasets are considered useful for improving the BPL prediction performance. Second, few studies have tackled the BPL prediction problem in an empirical context; thus there is no way to test the relative performance of existing methods convincingly.

In this paper, we propose a two-stage BPL prediction method using APC data (including bus arrival time information as well). The first stage is to predict short-term passenger flows at stops based on an adaptive Kalman filter approach. Using the outputs from the first stage as well as other variables, the second stage is to predict BPL based on a support vector regression (SVR) algorithm. The main contributions of this paper include:

1. A novel two-stage BPL prediction method using APC data is proposed. By dividing the prediction task into two stages, the proposed method can better extract passenger flow patterns from APC data and thus build a more effective model to predict BPL.
2. A performance comparison between the proposed method and several state-of-the-art methods is conducted on a real bus line. The results show that the proposed two-stage method outperforms all the benchmarks, particularly for multi-step ahead prediction.
3. The performance of the adaptive Kalman filter approach on passenger flow prediction at bus stops is presented, and the effectiveness of the SVR algorithm in the proposed method is analysed.

The remainder of this paper is organized as follows. Section 2 presents the two-stage BPL prediction method in detail. Section 3 introduces the case study, including the information regarding the studied bus line and APC data, and the benchmarks for comparison. Section 4 presents the performance of different methods in both single-step and multi-step ahead predictions. Section 5 concludes the work and elaborates on future directions.

2 METHODOLOGY

The framework of the proposed two-stage BPL prediction method is shown in Figure 2. The method is based on real-time APC data. In the preliminary stage, the APC data are turned into boarding, alighting, and section passenger flows within fixed time intervals at each stop through a homogenization step. In stage 1, we forecast boarding, alighting and section passenger flows at subsequent time steps by an adaptive Kalman filter approach. In stage 2, for every single trip, the predictive passenger flow predictors and real-time APC predictors are combined to predict the passenger loads at the subsequent stops using support vector regression. In summary, stage 1 is passenger flow prediction at stop level and stage 2 is BPL prediction at vehicle level.

FIGURE 2 The framework of the proposed two-stage BPL prediction method

In the following text, we use the term boarding/alighting number/flow to represent boarding/alighting passenger number/flow for simplicity.
2.1 Preliminary

Consider a set of bus trips $M = \{1, 2, \dots, m, \dots\}$ running on a line with a set of stops $S = \{1, 2, \dots, s, \dots\}$ in one direction. The APC systems record the number of boarding passengers $b_{m,s}$, the number of alighting passengers $a_{m,s}$, and the arrival time $T_{m,s}$ for bus trip $m$ at every stop $s$. Then, the number of onboard passengers (passenger load) $l_{m,s}$ of bus $m$ after loading at stop $s$ is given by:

$$l_{m,s} = \sum_{j=1}^{s} \left(b_{m,j} - a_{m,j}\right) \tag{1}$$

To turn the series of numbers of boarding passengers $\{b_{m,s}\}$ of buses $m \in M$ into passenger flow within fixed time intervals at stop $s$, a homogenization step is conducted as follows. First, the passenger number $b_{m,s}$ is uniformly allocated across its headway from the previous bus $m-1$; thus, the virtual timestamp $\omega^{i}_{m,s}$ of passenger $i$ among $b_{m,s}$ is expressed as:

$$\omega^{i}_{m,s} = T_{m-1,s} + \frac{i\,(T_{m,s} - T_{m-1,s})}{b_{m,s} + 1} \quad (i = 1, 2, \dots, b_{m,s}) \tag{2}$$

The boarding flow at stop $s$ is calculated by aggregating the virtual timestamps into fixed time intervals. An interval $I_0 = 15$ min is adopted in this study. Let $\Omega_s$ be the set of virtual timestamps $\omega^{i}_{m,s}$ for all passengers $i$ and all buses $m \in M$ at stop $s$. The boarding flow $f^{b}_{s,t}$ of stop $s$ at time step $t$ is expressed as:

$$f^{b}_{s,t} = \#\{\omega \in \Omega_s \mid (t-1) I_0 \le \omega < t I_0\}, \quad t \in \mathbb{N}^{*} \tag{3}$$

Alighting flow $f^{a}_{s}$ and section flow $f^{l}_{s}$ at any stop $s$ are obtained in the same way by replacing $b_{m,s}$ in Equation (2) with $a_{m,s}$ and $l_{m,s}$, respectively. Figure 3 illustrates the homogenization step in a more intuitive way.

FIGURE 3 An illustration of the homogenization step for calculating boarding passenger flow at stop s

2.2 Passenger flow prediction based on adaptive Kalman filter

Let $f_{s,t}$ generally denote the (boarding, alighting, or section) flow of stop $s$ at time step $t$. This subsection aims to predict the passenger flows in the subsequent time steps $\{f_{s,t}, f_{s,t+1}, \dots\}$ given the current flow series $\{f_{s,t-1}, f_{s,t-2}, \dots\}$ and historical flow patterns based on an adaptive Kalman filter. The adaptive Kalman filter approach has been successfully applied in traffic forecasting and shows improved adaptability when traffic or passenger flow is highly volatile [11]. The boarding, alighting, and section flows at bus stops fluctuate greatly, so the adaptive Kalman filter approach is considered appropriate for predicting them. In this subsection, we fix the target stop $s$ and omit the index for simplicity of notation.

Consider the following state-space model. We assume the state transition equation that maps the passenger flow $f$ from $t-1$ to $t$ is given by:

$$f_t = f_{t-1} + \Delta_t + w_t \tag{4}$$

The observation equation is given by:

$$y_t = f_t + \upsilon_t \tag{5}$$

where

$\Delta_t$: a priori knowledge that can be obtained from historical flow patterns;
$y_t$: the measurement value of $f_t$;
$w_t \sim N(q_t, Q_t)$, $\upsilon_t \sim N(0, R_t)$: process noise and observation noise.

The conventional Kalman filter algorithm can be divided into two phases, as described in Equations (6)–(10):

Predict phase: State propagation and prior state estimation error covariance estimation.

$$f_{t|t-1} = f_{t-1|t-1} + \Delta_t + q_{t-1} \tag{6}$$

$$P_{t|t-1} = P_{t-1|t-1} + Q_{t-1} \tag{7}$$

Update phase: Kalman gain computation, posterior state estimation, and posterior state estimation error covariance estimation.

$$K_t = P_{t|t-1}\left(P_{t|t-1} + R_t\right)^{-1} \tag{8}$$
$$f_{t|t} = f_{t|t-1} + K_t\left(y_t - f_{t|t-1}\right) \tag{9}$$

$$P_{t|t} = P_{t|t-1}\left(1 - K_t\right) \tag{10}$$

where

$f_{t|t-1}$, $P_{t|t-1}$: prior (predicted) estimation of passenger flow at time step $t$ and its error covariance;
$f_{t|t}$, $P_{t|t}$: posterior (updated) estimation of passenger flow at time step $t$ and its error covariance;
$K_t$: Kalman gain at time step $t$.

For heteroscedastic passenger flow series, an adaptation mechanism for updating the parameters of the process and observation noises is preferred; the resulting filter is termed the adaptive Kalman filter [26]. In this study, the variance $R_t$ of the observation noise and the mean $q_t$ and variance $Q_t$ of the process noise are estimated by using a memory of observation errors and state estimation errors, as described in Equation (11).

$$\upsilon_t = y_t - f_{t|t-1} \tag{11a}$$

$$\hat{r} = \frac{1}{\eta}\sum_{j=1}^{\eta} \upsilon_{t-j+1} \tag{11b}$$

$$R_t = \frac{1}{\eta - 1}\sum_{j=1}^{\eta}\left[\left(\upsilon_{t-j+1} - \hat{r}\right)^2 - \frac{N-1}{N}\,P_{t-j+1|t-j}\right] \tag{11c}$$

$$w_t = f_{t|t} - f_{t-1|t-1} - \Delta_t \tag{11d}$$

$$q_t = \hat{q} = \frac{1}{\eta}\sum_{j=1}^{\eta} w_{t-j+1} \tag{11e}$$

$$Q_t = \frac{1}{\eta - 1}\sum_{j=1}^{\eta}\left[\left(w_{t-j+1} - \hat{q}\right)^2 - \frac{N-1}{N}\left(P_{t-j|t-j} - P_{t-j+1|t-j+1}\right)\right] \tag{11f}$$

where

$\upsilon_t$, $w_t$: observation error and state estimation error at time step $t$;
$\hat{r}$, $\hat{q}$: unbiased estimations of the mean of observation errors and state estimation errors;
$\eta$: prescribed memory size of the adaptive Kalman filter recursion.

For multi-step ahead prediction, since there is no measurement to update the posterior state, we assume the posterior estimate of the flow equals the prior estimate in the prediction time steps, while the parameters of the process and observation noises stop updating. Using the adaptive Kalman filter approach, we can obtain the predicted boarding, alighting and section flows in the subsequent time steps for any stop.

2.3 Bus passenger load prediction based on support vector regression

2.3.1 Predictors

Let $h_{m,s}$ denote the headway of bus $m$ from the preceding bus at stop $s$; that is:

$$h_{m,s} = T_{m,s} - T_{m-1,s} \tag{12}$$

The corresponding time window is expressed as $\tau_{m,s} = (T_{m-1,s}, T_{m,s})$. When predicting the passenger load of bus $m$ at stop $s$ from stop $s-1$, we assume the headway remains unchanged from stop $s-1$ to stop $s$; thus, the estimated time window $\tau_{m,s|s-1}$ is given by:

$$\tau_{m,s|s-1} = \left(T_{m-1,s},\; T_{m-1,s} + h_{m,s-1}\right) \tag{13}$$

If $\tau_{m,s|s-1}$ falls entirely within a 15-min interval $(I_0(t^*-1), I_0 t^*)$, the predicted passenger flow $\hat{f}_s(\tau_{m,s|s-1})$ on $\tau_{m,s|s-1}$ takes the exact value of $\hat{f}_{s,t^*}$. Otherwise, $\tau_{m,s|s-1}$ falls in two consecutive intervals, and $\hat{f}_s(\tau_{m,s|s-1})$ takes the weighted average of the two consecutive predicted passenger flows according to their shares in $\tau_{m,s|s-1}$, i.e.:

$$\hat{f}_s(\tau_{m,s|s-1}) = \begin{cases} \hat{f}_{s,t^*}, & \tau_{m,s|s-1} \subseteq \left(I_0(t^*-1),\, I_0 t^*\right) \\ (1-\alpha)\,\hat{f}_{s,t^*-1} + \alpha\,\hat{f}_{s,t^*}, & \text{otherwise} \end{cases} \tag{14}$$

where $t^*$ is the upper bound time step of $\tau_{m,s|s-1}$, and $\alpha$ is the proportion that time step $t^*$ occupies in $\tau_{m,s|s-1}$.

Through Equations (13) and (14), we can obtain the predicted boarding, alighting and section flows on certain time windows. The predictive passenger flow predictor $F_{m,s|s-1}$ used for BPL prediction is expressed as:

$$F_{m,s|s-1} = \left(\hat{f}^{b}_{s}(\tau_{m,s|s-1}),\; \hat{f}^{a}_{s}(\tau_{m,s|s-1}),\; \hat{f}^{l}_{s}(\tau_{m,s|s-1})\right) \tag{15}$$

The real-time APC data are also important to BPL prediction. The current load $l_{m,s-1}$ and current headway $h_{m,s-1}$ are directly related to the passenger load at the next stop. Moreover, the passenger loads and headways of the target bus at the previous two stops may imply some variation trends in passenger loads or headways, and are selected as explanatory variables as well. The real-time APC predictor $APC_{m,s|s-1}$ is expressed as:

$$APC_{m,s|s-1} = \left(l_{m,s-1},\, h_{m,s-1},\, l_{m,s-2},\, l_{m,s-3},\, h_{m,s-2},\, h_{m,s-3}\right) \tag{16}$$

Combining $F_{m,s|s-1}$ and $APC_{m,s|s-1}$ and omitting the bus index $m$ for simplicity of notation, the single-step ahead BPL prediction model aims to generalize a relationship of the following form:

$$\hat{l}_{s|s-1} = F\left(\hat{f}^{b}_{s}(\tau_{s|s-1}),\; \hat{f}^{a}_{s}(\tau_{s|s-1}),\; \hat{f}^{l}_{s}(\tau_{s|s-1}),\; l_{s-1},\, h_{s-1},\, l_{s-2},\, l_{s-3},\, h_{s-2},\, h_{s-3}\right) \tag{17}$$

where $\hat{l}_{s|s-1}$ is the predicted passenger load at stop $s$ based on stop $s-1$.

For multi-step ahead prediction, namely predicting stop $s$ from stop $s_0$ ($s_0 = s-2, s-3, \dots$), the predicted stops (i.e., stops $s_0+1, \dots, s$) are regarded as one whole stop. The predicted passenger flow of the whole stop is an average, denoted as $\bar{f}_{s_0 s}$, across the predicted flows of these stops, i.e.:

$$\bar{f}_{s_0 s} = \frac{1}{s - s_0}\sum_{j=s_0+1}^{s} \hat{f}_j(\tau_{j|s_0}) \tag{18}$$

Therefore, the form of the multi-step ahead BPL prediction model is expressed as:

$$\hat{l}_{s|s_0} = F\left(\bar{f}^{b}_{s_0 s},\; \bar{f}^{a}_{s_0 s},\; \bar{f}^{l}_{s_0 s},\; l_{s_0},\, h_{s_0},\, l_{s_0-1},\, l_{s_0-2},\, h_{s_0-1},\, h_{s_0-2}\right) \tag{19}$$
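As an illustration only, the conventional Kalman recursion of Equations (6)–(10) and the window weighting of Equation (14) can be sketched in Python. This is a minimal sketch under stated assumptions, not the implementation used in the study: the names `FlowState`, `kalman_step` and `window_flow` and the dictionary-based flow lookup are hypothetical, the noise parameters q, Q and R are held fixed rather than re-estimated through Equation (11), and the time window is assumed to span at most two consecutive intervals.

```python
import math
from dataclasses import dataclass

I0 = 15.0  # interval length in minutes (value adopted in the paper)


@dataclass
class FlowState:
    f: float  # posterior flow estimate f_{t|t}
    P: float  # posterior error covariance P_{t|t}


def kalman_step(state, delta, y, q=0.0, Q=1.0, R=1.0):
    """One predict/update cycle of Eqs. (6)-(10) for a scalar flow.

    q, Q, R are held fixed here (an assumption); the adaptive filter
    would re-estimate them from recent errors via Eq. (11).
    Returns the prior prediction f_{t|t-1} and the updated state.
    """
    f_prior = state.f + delta + q           # Eq. (6)
    P_prior = state.P + Q                   # Eq. (7)
    K = P_prior / (P_prior + R)             # Eq. (8)
    f_post = f_prior + K * (y - f_prior)    # Eq. (9)
    P_post = P_prior * (1.0 - K)            # Eq. (10)
    return f_prior, FlowState(f_post, P_post)


def window_flow(flow_by_step, start, end, interval=I0):
    """Eq. (14): predicted flow for the time window (start, end) in minutes.

    flow_by_step maps time step t, covering [(t-1)*interval, t*interval),
    to its predicted flow. The window is assumed to span at most two
    consecutive intervals.
    """
    t_star = math.ceil(end / interval)      # upper-bound time step t*
    lower = (t_star - 1) * interval         # start of interval t*
    if start >= lower:                      # window lies inside step t*
        return flow_by_step[t_star]
    alpha = (end - lower) / (end - start)   # share of the window in step t*
    return (1.0 - alpha) * flow_by_step[t_star - 1] + alpha * flow_by_step[t_star]
```

For instance, with a window from minute 40 to minute 50 and predicted flows of 8 and 12 for time steps 3 and 4, half of the window lies in each interval, so `window_flow({3: 8.0, 4: 12.0}, 40.0, 50.0)` returns 10.0.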
2.4 Support vector regression

Support vector regression (SVR) is applied to develop the BPL prediction model. SVR is the counterpart of SVM for regression problems. SVR can capture the complex input-output relationship in non-linear systems by mapping the input vectors into a high-dimensional space. The objective function of SVR is to minimize the L2-norm of the coefficient vector with slack variables involved, i.e.:

$$\min\; \frac{1}{2}\lVert \boldsymbol{\omega} \rVert^2 + C\sum_i \left(\xi_i + \xi_i^*\right) \tag{20}$$

subject to:

$$\begin{aligned} y_i - \boldsymbol{\omega}\cdot x_i - b &\le \varepsilon + \xi_i \\ \boldsymbol{\omega}\cdot x_i + b - y_i &\le \varepsilon + \xi_i^* \\ \xi_i,\ \xi_i^* &\ge 0 \end{aligned} \tag{21}$$

where $\boldsymbol{\omega}$ is the coefficient vector, $b$ is the bias term, $C$ is the regularization constant, $\varepsilon$ is the tube size, $\xi_i$, $\xi_i^*$ are slack variables, $y_i$ is the dependent variable, i.e. the passenger load $l_s$ at the target stop, and $x_i$ is the vector of explanatory variables in Equation (17).

By solving the dual of Equation (20) and introducing the kernel function, the primal problem is turned into a linearly constrained quadratic programming problem, indicating that the solution of SVR is always unique and globally optimal [27]. A main drawback of SVR is its long training time when dealing with large datasets and samples with many features. However, this issue has little influence on our prediction framework, since the dataset and the number of sample features are relatively small. These features motivate us to adopt the SVR algorithm to develop the BPL prediction model.

3 CASE STUDY

3.1 Bus line and APC data

The proposed two-stage BPL prediction framework is applied to bus line 1 in Suzhou, China. The geographic distribution of line 1 is shown in Figure 4. The studied direction is from north to south. Line 1 is 11.2 km long and consists of 22 stops in the studied direction. The typical run time from start to end in one direction is around 45 min. Line 1 provides a quite frequent service; the departure interval is 5 min in peak hours and 8 min in off-peak hours. The fleet serving line 1 consists of medium-sized buses with 22 seats.

FIGURE 4 Geographic distribution of bus line 1 in Suzhou, China

The whole fleet of line 1 is equipped with the APC system. The APC system records the number of boarding passengers, the number of alighting passengers, and the bus arrival time for each bus trip at each stop. APC data from weekdays between 30 July and 14 August 2018 are used in this study. After data cleaning and preprocessing, we obtained 1804 bus trips in 12 days for the studied direction. Figure 5(a) plots the average passenger load, number of boarding passengers, and number of alighting passengers among all bus trips at each stop. As shown in Figure 5(a), the segments from stop 5 to stop 7 (denoted as segment 1) and from stop 13 to stop 15 (denoted as segment 2) have more boarding or alighting passengers than the other stops, except for the two terminal stops; the variations in passenger load on these two segments are obvious as well. Therefore, segment 1 and segment 2 are
selected as the studied segments in this study since it is more challenging to predict BPL on these two segments, where segment 1 stands for a boarding-dominant segment, and segment 2 stands for an alighting-dominant segment. Figure 5(b) plots the passenger load variation with time at stop 7. The 10th percentile, median, and 90th percentile passenger loads are calculated among all bus trips in the dataset within each 30-min interval. Figure 5(b) indicates that there is significant variability in passenger loads between different bus trips. It also shows that there is an obvious evening peak in passenger load on line 1.

FIGURE 5 The spatial and temporal variations of passenger load in line 1. (a) Average passenger loads variation with stops. Studied segment 1 stands for a boarding-dominant segment, and segment 2 for an alighting-dominant segment. (b) Passenger loads variation with time at stop 7

3.2 Model fitting

Data from the first two weeks (30 July–10 August, 10 days) are used as the historical/training set to derive passenger flow patterns and to train the prediction model. Data from the last two days (13, 14 August) are used as the test set.

In the stage of passenger flow prediction, we calculate the average passenger flows $\bar{f}_{s,t}$ based on the historical set. Then, $\Delta_t$ in Equation (4) is obtained by calculating the first-order difference of the flow series $\{\dots, \bar{f}_{s,t-1}, \bar{f}_{s,t}, \dots\}$ for each stop $s$. We apply the adaptive Kalman filter approach to predict the boarding, alighting and section flows for stops 3–7 and stops 11–15 in both the historical set and the test set. The memory size $\eta$ in the adaptive Kalman filter recursion is set to 4.

In the stage of BPL prediction, for any target stop, each bus trip is turned into a sample. The samples in the first two weeks are used as training data, and the samples in the last two days are used as test data. Thus, there are 1801 samples in the training set and 303 samples in the test set for each target stop. The RBF kernel function is selected for the SVR algorithm in this study. A fivefold cross-validation approach is used to train the model. Three parameters need to be determined when using RBF kernels in SVR, namely the regularization constant $C$, the tube size $\varepsilon$, and the scale parameter $\gamma$. A grid search is used to select the optimal $C$, $\varepsilon$ and $\gamma$.

3.3 Benchmarks

Four benchmarks are used to test the relative performance of the proposed framework. These methods include a linear model based on predicted flows, a one-step forecast, and two BPL prediction models from the existing literature [23, 25].

3.3.1 Linear model based on predicted flows

In contrast to the SVR algorithm, the linear model in this study seeks a specific functional form between the dependent variable and the explanatory variables in Equation (17). The predicted number of boarding passengers $\hat{b}_{s|s-1}$ is estimated by:

$$\hat{b}_{s|s-1} = \hat{f}^{b}_{s}(\tau_{s|s-1})\,\frac{h_s}{I_0} \approx \hat{f}^{b}_{s}(\tau_{s|s-1})\,\frac{h_{s-1}}{I_0} \tag{22}$$

where $I_0 = 15$ min is the length of the time intervals. For the predicted number of alighting passengers $\hat{a}_{s|s-1}$, it is assumed that the proportional relationship between $a_s$ and $l_{s-1}$ is
approximately equal to that between $\hat{f}^{a}_{s,t}$ and $\hat{f}^{l}_{s-1,t}$; thus, $\hat{a}_{s|s-1}$ is given by:

$$\hat{a}_{s|s-1} = \frac{\hat{f}^{a}_{s}(\tau_{s|s-1})}{\hat{f}^{l}_{s}(\tau_{s|s-1})}\, l_{s-1} \tag{23}$$

In total, the predicted passenger load given by the linear model is the current load plus the predicted boardings of Equation (22) minus the predicted alightings of Equation (23):

$$\hat{l}_{s|s-1} = l_{s-1} + \hat{f}^{b}_{s}(\tau_{s|s-1})\,\frac{h_{s-1}}{I_0} - \frac{\hat{f}^{a}_{s}(\tau_{s|s-1})}{\hat{f}^{l}_{s}(\tau_{s|s-1})}\, l_{s-1} \tag{24}$$

The reason for introducing the linear model as a benchmark is to investigate the effects of the SVR algorithm and the auxiliary variables $(l_{s-2}, l_{s-3}, h_{s-2}, h_{s-3})$ in Equation (17).

3.3.2 One-step forecast

The one-step forecast takes the value $l_{s-1}$ observed at the previous step as the predicted load $\hat{l}_s$. It is one of the simplest prediction approaches and serves as a baseline method for comparison.

3.3.3 Two-step extended Kalman filter (2S-EKF) model

Zhang, Shen [23] proposed a 2S-EKF model for the BPL prediction problem. The first step is to search the historical data to find the passenger load matrix $L^{hist}$ that is most similar to the current load matrix $L$. The similarity $S$ is defined as Equation (25).

$$S = \frac{L \odot L^{hist}}{\lVert L \rVert \cdot \lVert L^{hist} \rVert} \tag{25}$$

where $\odot$ is the element-wise product operator. Then, we obtain a passenger load sequence $\{u_1^*, u_2^*, \dots, u_s^*\}$ from the most similar historical matrix $L^{hist,*}$. The second step is to predict the passenger load using an extended Kalman filter, in which the state transition function is given by:

$$\hat{l}_{s|s-1} = l_{s-1} + \frac{u_s^* - u_{s-1}^*}{u_{s-1}^* - u_{s-2}^*}\left(l_{s-1} - l_{s-2}\right) + w_s \tag{26}$$

where $w_s$ is Gaussian white noise.

3.3.4 Lasso regression

Jenelius [25] proposed a BPL prediction model based on both real-time and historical APC data using lasso regression. The vector of potential predictors $x_{m,s|s-1}$ consists of three predictors, which are based on historical load data, real-time AVL data, and real-time APC data, respectively, i.e.:

$$x_{m,s|s-1} = \left(x^{hist}_{s},\; x^{avl}_{m,s-1},\; x^{run}_{m,s-1}\right) \tag{27}$$

The prediction model is assumed linear in coefficients. Then, a lasso regression is applied for variable selection and parameter estimation by minimizing:

$$\frac{1}{2}\sum_m \left(l_{m,s} - \beta_0 - \sum_i x_{m,s|s-1,i}\,\beta_i\right)^2 + \lambda \sum_i \lvert \beta_i \rvert \tag{28}$$

where $\beta_0$ and $\beta_i$ are parameters to be estimated, and $\lambda$ is a regularization coefficient that penalizes large parameter values. The predicted passenger load is calculated based on the estimation results of $\beta_0$ and $\beta_i$.

The linear model, one-step forecast, and lasso regression can easily be extended to forms for multi-step ahead prediction.

3.4 Performance evaluation

The prediction results are evaluated using two indices: mean absolute error (MAE) and root mean square error (RMSE). RMSE gives a relatively high weight to large errors and is never smaller than MAE. The greater the difference between them, the greater the variance in the individual errors in the sample set [28]. The MAE and RMSE of $N$ samples are computed as follows:

$$\text{MAE} = \frac{1}{N}\sum_n \left\lvert \hat{l}_n - l_n \right\rvert \tag{29}$$

$$\text{RMSE} = \sqrt{\frac{1}{N}\sum_n \left(\hat{l}_n - l_n\right)^2} \tag{30}$$

where $\hat{l}_n$ and $l_n$ are the predicted and actual passenger loads of sample $n$, respectively.

4 RESULTS

4.1 Passenger flow prediction

TABLE 1 Performance of passenger flow prediction on test set

Passenger flow type   Segment 1 MAE   Segment 1 RMSE   Segment 2 MAE   Segment 2 RMSE
Boarding flow         2.45            3.23             1.81            2.49
Alighting flow        1.61            2.24             2.50            3.38
Section flow          5.89            7.76             5.90            7.66

Table 1 presents the performance of passenger flow prediction on the test set (13, 14 August). The prediction performance on
segment 1 is calculated by comparing the predicted flow values at stops 5, 6, and 7 with their ground truth data; the prediction performance on segment 2 is obtained similarly for stops 13, 14, and 15. As shown in Table 1, the prediction performance for alighting flow is better than that for boarding flow on segment 1, but the opposite holds on segment 2, since segment 1 has more boarding passengers while segment 2 has more alighting passengers. The prediction performance for section flow is not as good as that for boarding and alighting flow, partially because the section flow has much larger numbers. Figure 6 also plots the predicted flow values versus ground truth data for stop 7 on 14 August as an example. It shows that the passenger flows are quite unstable over time, especially for section flow. Even so, the predicted values fit the ground truth well, indicating the effectiveness of the adaptive Kalman filter-based prediction model.

FIGURE 6 Predicted flow values versus ground truth data for stop 7 on 14 August 2018. (a) Boarding flow; (b) Alighting flow; (c) Section flow

4.2 Single-step ahead passenger load prediction

In single-step ahead passenger load prediction, we forecast the load of the target stop from the last preceding stop. The performances of the proposed method and four benchmarks on the two studied segments are displayed in Table 2. The detailed prediction results for each stop are given in the Appendix.

TABLE 2 Performance comparison between the proposed method and benchmarks

| Method | Segment 1 MAE | Segment 1 RMSE | Segment 2 MAE | Segment 2 RMSE |
|---|---|---|---|---|
| Proposed method | 2.08 | 2.70 | 1.76 | 2.40 |
| Linear model | 2.13 | 2.83 | 1.80 | 2.43 |
| Lasso regression | 2.19 | 2.96 | 1.81 | 2.44 |
| 2S-EKF model | 3.26 | 4.40 | 2.74 | 3.65 |
| One-step forecast | 2.41 | 3.52 | 2.02 | 3.00 |

Table 2 shows that the proposed method outperforms all the benchmarks on both segments 1 and 2. The linear model performs close to the proposed method, and the lasso regression is also acceptable for BPL prediction. However, the 2S-EKF model performs poorly (MAE of 3.26 and 2.74 on segments 1 and 2), worse even than the one-step forecast (2.41 and 2.02). A larger historical dataset may help improve the performance of the 2S-EKF model, since the first step of the model is to find the most similar load sequence in history. Nonetheless, in essence, the 2S-EKF model is based only on load data and does not consider other important explanatory variables (e.g. headway, boarding, and alighting passenger numbers). Thus, it is more likely that the 2S-EKF model is not capable of predicting BPL in high-frequency transit; it may be more feasible for low-frequency routes, on which the headways have much less variability.

Comparing the proposed method with the lasso regression shows that the advantages of the proposed method are more evident on segment 1. Specifically, the proposed method shows its superiority on stops with many boarding passengers, as shown in Table A.1 in the Appendix. The reason is probably that boarding number prediction relies more on the parameter of predicted boarding flow in Equation (17), while the parameter of predicted alighting flow is not that important for alighting number prediction.

Figure 7 plots the loads predicted by the proposed method versus actual loads on the two segments. Red dashed lines in Figure 7 indicate the number of seats, which is 22 for the studied line. Figure 7 shows that most actual loads are smaller than 22, indicating that seats are available on line 1 in most cases. Generally, the predicted loads fit the actual loads well. For loads larger than 22, the deviations are still in an acceptable range.

4.3 Multi-step ahead passenger load prediction

In multi-step ahead prediction, we forecast the load of target stop $s$ from the preceding stop $s - \phi$, where $\phi$ is the step number. The performances of the proposed method and three
benchmarks on the two studied segments are displayed in Table 3. Here, we consider two-step ahead and three-step ahead prediction. The 2S-EKF model is omitted since it cannot provide competitive results. The detailed prediction results for each stop are given in the Appendix as well.

FIGURE 7 Predicted loads by the proposed method versus actual loads on: (a) segment 1, and (b) segment 2. Red dashed lines indicate the upper limit of available seats

TABLE 3 Multi-step ahead prediction performance of the proposed method and benchmarks

| Prediction step | Method | Segment 1 MAE | Segment 1 RMSE | Segment 2 MAE | Segment 2 RMSE |
|---|---|---|---|---|---|
| Two-step ahead | Proposed method | 2.89 | 3.68 | 2.21 | 2.94 |
| | Linear model | 3.00 | 3.89 | 2.37 | 3.14 |
| | Lasso regression | 3.13 | 4.05 | 2.24 | 3.00 |
| | One-step forecast | 3.73 | 5.08 | 2.98 | 4.16 |
| Three-step ahead | Proposed method | 3.35 | 4.29 | 2.51 | 3.31 |
| | Linear model | 3.55 | 4.54 | 2.77 | 3.65 |
| | Lasso regression | 3.73 | 4.74 | 2.58 | 3.40 |
| | One-step forecast | 4.82 | 6.27 | 3.44 | 4.63 |

Overall, the proposed method still outperforms all the benchmarks on both segments. The differences between the performances of the proposed method and the linear model grow as the prediction step increases. That indicates that the auxiliary variables ($l_{s-2}$, $l_{s-3}$, $h_{s-2}$, $h_{s-3}$) and the non-linear relationship captured by SVR are beneficial to BPL prediction, especially for multi-step ahead prediction, in which the variation trends of loads or headways are important information. In addition, the superiority of the proposed method over the lasso regression also becomes more obvious as the prediction step increases. That is probably because the lasso regression does not include an explanatory variable regarding future passenger flow and assumes a linear relationship with the explanatory variables, leading to a poor ability to predict sharp increases in passenger load. As before, the advantages of the proposed method on segment 2 are not as evident as those on segment 1.

Figure 8 plots three-step ahead predicted loads by the proposed method versus actual loads on the two segments. Generally, Figure 8(b) shows a lower scatter around the diagonal than Figure 8(a), which is consistent with the results in Table 3. Interestingly, there is an overall trend that the load on crowded runs is slightly underestimated while the load on the least crowded runs is slightly overestimated on both segments 1 and 2. A similar phenomenon also appears in the experiments in Jenelius [25], indicating a strong differentiation in BPL variation trends among different bus trips. Even so, Figure 8 shows that the proposed method can effectively predict bus loads several stops ahead of the current locations in most cases.

5 CONCLUSION

In high-frequency transit, providing real-time crowding information (RTCI) can help passengers choose less crowded vehicles and reduce negative crowding externalities, which would be attractive for both passengers and transit agencies. Providing timely and effective RTCI generally requires that the bus passenger load (BPL) be predicted several stops ahead from the current bus locations. This paper contributes to the limited literature on the BPL prediction problem by formulating a
two-stage prediction method based on APC data. The first stage is short-term passenger flow prediction at stop level, in which boarding, alighting, and section flows at stops are predicted by applying the adaptive Kalman filter approach. The second stage is BPL prediction at vehicle level, in which the predicted flows obtained from the first stage, as well as other predictors directly from APC data, act as the explanatory variables to predict final passenger loads using the SVR algorithm.

FIGURE 8 Three-step ahead predicted loads by the proposed method versus actual loads on: (a) segment 1, and (b) segment 2. Red dashed lines indicate the upper limit of available seats

Four benchmarks are used to test the relative performance of the proposed method, including a linear model based on predicted flows, one-step forecast, and two methods in the existing literature (2S-EKF model and lasso regression). Two segments on bus line 1 in Suzhou, China are selected to test the performances of these prediction methods. The results show that the proposed method generally achieves the best performance among all the methods. The comparison with the linear model indicates that the SVR algorithm in the proposed method can effectively capture the non-linear relationship between bus loads and explanatory variables, particularly improving the performance in multi-step ahead prediction. The lasso regression also performs well in general, but the proposed method observably outperforms it for stops with sharp increases in BPL and for multi-step ahead prediction. In contrast, another existing method, the 2S-EKF model, may not be appropriate for BPL prediction in high-frequency transit, due to its poor performance shown in the results.

It should be noted that the prediction model in this paper is trained using APC data without RTCI. The variation patterns of BPL may be a little different if RTCI is truly provided, since the boarding behaviours of passengers may be affected by RTCI, but the prediction method proposed in this paper is considered still applicable by training the model using APC data with RTCI. Since RTCI may influence passengers' boarding choices between two consecutive buses on the same route or different routes, adding the passenger loads of the preceding and following buses on the same route and other routes as variables in the current prediction framework seems reasonable and worth trying to improve the prediction performance.

For future work, we will move forward to bus crowding level prediction based on the findings in this study. Another interesting topic is the release strategy of RTCI to disseminate the information to customers in a reliable and useful way.

ACKNOWLEDGMENTS
This work was supported by the National Key Research and Development Program of China (No. 2018YFB1601300), Joint Funds of the National Natural Science Foundation of China (No. U20A20330), the National Natural Science Foundation of China (No. 71901059), the Natural Science Foundation of Jiangsu Province in China (No. BK20180402), and the Postgraduate Research and Practice Innovation Program of Jiangsu Province (KYCX18_0143). The authors would like to thank Suzhou Transportation Authority for kindly providing the APC data used in the study.

CONFLICT OF INTEREST
The authors declare that they have no conflict of interest.

DATA AVAILABILITY STATEMENT
The data that support the findings of this study are available from Suzhou Transportation Authority. Restrictions apply to the availability of these data, which were used under license for this study. Data are available from the authors with the permission of Suzhou Transportation Authority.
ORCID
Xuewu Chen https://orcid.org/0000-0002-2334-9702

REFERENCES
1. Cheng, L., et al.: Active travel for active ageing in China: The role of built environment. J. Transp. Geogr. 76, 142–152 (2019)
2. Tirachini, A., et al.: Crowding in public transport systems: Effects on users, operation and implications for the estimation of demand. Transp. Res. Part A Policy Pract. 53, 36–52 (2013)
3. Kim, K.M., et al.: Does crowding affect the path choice of metro passengers? Transp. Res. Part A Policy Pract. 77, 292–304 (2015)
4. Wang, P., et al.: Provision of bus real-time information: Turning passengers from being contributors of headway irregularity to controllers. Transp. Res. Rec. J. Transp. Res. Board 2672(8), 143–151 (2018)
5. Vlahogianni, E.I., et al.: Short-term traffic forecasting: Overview of objectives and methods. Transp. Rev. 24(5), 533–557 (2004)
6. Smith, B.L., et al.: Comparison of parametric and nonparametric models for traffic flow forecasting. Transp. Res. C, Emerg. 10(4), 303–321 (2002)
7. Gan, M., et al.: Seasonal and trend time series forecasting based on a quasi-linear autoregressive model. Appl. Soft. Comp. 24, 13–18 (2014)
8. Ding, C., et al.: Using an ARIMA-GARCH modeling approach to improve subway short-term ridership forecasting accounting for dynamic volatility. IEEE Trans. Intell. Transp. Syst. 19(4), 1054–1064 (2018)
9. Tsai, T.H., et al.: Neural network based temporal feature models for short-term railway passenger demand forecasting. Expert Syst. Appl. 36(2), 3728–3736 (2009)
10. Zhang, Y., Liu, Y.C.: Traffic forecasting using least squares support vector machines. Transportmetrica 5(3), 193–213 (2009)
11. Guo, J.H., et al.: Adaptive Kalman filter approach for stochastic short-term traffic flow rate prediction and uncertainty quantification. Transp. Res. C, Emerg. 43, 50–64 (2014)
12. Cheng, L., et al.: Applying a random forest method approach to model travel mode choice behavior. Travel Behav. Soc. 14, 1–10 (2019)
13. Bai, Y., et al.: A multi-pattern deep fusion model for short-term bus passenger flow forecasting. Appl. Soft. Comp. 58, 669–680 (2017)
14. Chen, X., et al.: Multi-model ensemble for short-term traffic flow prediction under normal and abnormal conditions. IET Intell. Transp. Syst. 13(2), 260–268 (2019)
15. Jia, F.F., et al.: Deep learning-based hybrid model for short-term subway passenger flow prediction using automatic fare collection data. IET Intell. Transp. Syst. 13(11), 1708–1716 (2019)
16. Gong, M., et al.: Sequential framework for short-term passenger flow prediction at bus stop. Transp. Res. Rec., J. Transp. Res. Board 2417(2417), 58–66 (2014)
17. Noursalehi, P.: Decision Support Platform for Urban Rail Systems: Real-time Crowding Prediction and Information Generation. Northeastern University, Boston, MA (2017)
18. Khomchuk, P., et al.: Predicting passenger loading level on a train car: A Bayesian approach. arXiv preprint arXiv:180806962 (2018)
19. Pasini, K., et al.: LSTM encoder-predictor for short-term train load forecasting. In: Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer, Würzburg, Germany (2019)
20. Hu, R., et al.: Crowding prediction on mass rapid transit systems using a weighted bidirectional recurrent neural network. IET Intell. Transp. Syst. 14(3), 196–203 (2020)
21. Jenelius, E.: Data-driven metro train crowding prediction based on real-time load data. IEEE Trans. Intell. Transp. Syst. 21(6), 2254–2265 (2019)
22. Yu, H., et al.: Headway-based bus bunching prediction using transit smart card data. Transp. Res. C. Emerg. 72, 45–59 (2016)
23. Zhang, J., et al.: A real-time passenger flow estimation and prediction method for urban bus transit systems. IEEE Trans. Intell. Transp. Syst. 18(11), 3168–3178 (2017)
24. He, H.W., et al.: Predictive air-conditioner control for electric buses with passenger amount variation forecast. Appl. Energy 227, 249–261 (2018)
25. Jenelius, E.: Data-driven bus crowding prediction based on real-time passenger counts and vehicle locations. In: 6th International Conference on Models and Technologies for Intelligent Transportation Systems (MTITS2019). Cracow, Poland (2019)
26. Myers, K., Tapley, B.: Adaptive sequential estimation with unknown noise statistics. IEEE Trans. Automat. Contr. 21(4), 520–523 (1976)
27. Yu, B., Yang, Z., Yao, B.: Bus arrival time prediction using support vector machines. J. Intell. Transp. Syst. 10(4), 151–158 (2006)
28. Yu, B., et al.: Prediction of bus travel time using random forests based on near neighbors. Comput. Aided Civ. Infrastruct. Eng. 33(4), 333–350 (2018)

How to cite this article: Wang P, Chen X, Chen J, Hua M, Pu Z. A two-stage method for bus passenger load prediction using automatic passenger counting data. IET Intell Transp Syst. 2021;15:248–260. https://doi.org/10.1049/itr2.12018
APPENDIX

TABLE A.1 Prediction results of different methods at each stop in terms of MAE

| Prediction step | Method | Stop 5 (Seg. 1) | Stop 6 (Seg. 1) | Stop 7 (Seg. 1) | Stop 13 (Seg. 2) | Stop 14 (Seg. 2) | Stop 15 (Seg. 2) |
|---|---|---|---|---|---|---|---|
| Single-step ahead | Proposed method | 1.72 | 2.26 | 2.30 | 1.44 | 2.34 | 1.52 |
| | Linear model | 1.76 | 2.22 | 2.41 | 1.44 | 2.39 | 1.57 |
| | Lasso regression | 1.68 | 2.35 | 2.53 | 1.42 | 2.38 | 1.63 |
| | One-step forecast | 1.67 | 2.40 | 3.17 | 1.41 | 2.80 | 1.85 |
| Two-step ahead | Proposed method | 2.74 | 2.76 | 3.20 | 1.80 | 2.54 | 2.30 |
| | Linear model | 2.81 | 2.69 | 3.50 | 1.90 | 2.67 | 2.55 |
| Three-step ahead | Proposed method | 3.01 | 3.43 | 3.59 | 2.53 | 2.66 | 2.34 |
| | Linear model | 3.11 | 3.55 | 4.01 | 2.66 | 2.82 | 2.83 |
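As a worked check on Table A.1, the sketch below computes, for each stop, how much lower the MAE of the proposed method is than that of the linear model at each prediction step. The MAE values are copied from the table; the dictionary layout and the names `MAE_BY_STEP`, `mae_gap` and `mean_gap` are illustrative assumptions and not part of the original study.

```python
STOPS = [5, 6, 7, 13, 14, 15]

# Per-stop MAE values copied from Table A.1 (order follows STOPS).
MAE_BY_STEP = {
    "single-step": {
        "proposed": [1.72, 2.26, 2.30, 1.44, 2.34, 1.52],
        "linear": [1.76, 2.22, 2.41, 1.44, 2.39, 1.57],
    },
    "two-step": {
        "proposed": [2.74, 2.76, 3.20, 1.80, 2.54, 2.30],
        "linear": [2.81, 2.69, 3.50, 1.90, 2.67, 2.55],
    },
    "three-step": {
        "proposed": [3.01, 3.43, 3.59, 2.53, 2.66, 2.34],
        "linear": [3.11, 3.55, 4.01, 2.66, 2.82, 2.83],
    },
}


def mae_gap(step):
    """Linear-model MAE minus proposed-method MAE for each stop."""
    rows = MAE_BY_STEP[step]
    return [round(lin - prop, 2)
            for prop, lin in zip(rows["proposed"], rows["linear"])]


def mean_gap(step):
    """Unweighted average of the per-stop gaps for one prediction step."""
    gaps = mae_gap(step)
    return sum(gaps) / len(gaps)


for step in MAE_BY_STEP:
    per_stop = dict(zip(STOPS, mae_gap(step)))
    print(f"{step}: per-stop gap {per_stop}, mean gap {mean_gap(step):.3f}")
```

On these numbers the unweighted mean gap rises from about 0.035 for single-step to about 0.24 for three-step prediction, in line with the observation in Section 4.3 that the advantage over the linear model widens with the prediction step. This is an average over stops, not the segment-level MAE reported in Tables 2 and 3.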
