ETRI Journal - 2022 - Gangrade - Taxi Demand Forecasting Using Dynamic Spatiotemporal Analysis
DOI: 10.4218/etrij.2021-0123
ORIGINAL ARTICLE
KEYWORDS
combined covariates model, ensemble regression models, linear regression, spatiotemporal
analysis, taxi demand forecasting
This is an Open Access article distributed under the terms of Korea Open Government License (KOGL) Type 4: Source Indication + Commercial Use Prohibition + Change Prohibition (http://www.kogl.or.kr/info/licenseTypeEn.do).
1225-6463/$ © 2022 ETRI
(number of taxi-booking requests) generated by passengers in each second, which can then be projected into a time-series model to predict the variation of demand as a function of time [1]. Thus, the demand for taxis varies with both space and time. However, the balance between the supply of taxis and the demand from passengers depends not only on spatiotemporal features but also on other factors, such as sociodemographics (e.g., per-capita income, hardship index, unemployment, and age), the influence of neighborhood areas, and point-of-interest (POI) locations (e.g., restaurants, hospitals, colleges, pubs, and shopping malls). Various investigations [2–5] have explored the impact of these factors on taxi demand. Some have made use of sociodemographic data alone to predict the taxi flow between community areas. However, such data may not convey enough information about those areas, and they also remain static over a considerable period of time (e.g., census information is collected only once every decade). Other studies have employed only neighborhood influences to discern the pattern of taxi flow. This factor, however, may be inadequate, in the sense that nearby areas are likely to share similar sociodemographics, which limits the benefit of adding neighborhood influence in predicting taxi demand. In addition, factors that reflect the dynamics of the city, like crowd-generated POI data, may also play important roles in determining the demand. All these points lead to the conclusion that a robust framework can be devised only if we consider all these factors simultaneously in the spatiotemporal analysis of taxi demand [2].

The ability to predict taxi demand in advance can help significantly to alleviate the problem of inadequate taxi supply. By using an accurate time-series forecasting framework, demand for the next time interval can be predicted for a given area. If the predicted demand in that area decreases while the supply is high, we can safely conclude that many taxis will be running vacant because they have not been reallocated to appropriate areas (the supply can be predicted from pings generated by taxis during their journeys). These idle taxis should be redirected immediately to areas with high unmet demand, balancing the flow between demand and supply and reducing the overall idle driving time and avoidable fuel consumption. Conversely, oversupply can lead to traffic congestion. Hence, it is necessary to understand the variable needs of the population in both time and space, which can be achieved only through a robust demand-prediction model.

In this paper, we utilize the taxi-trip records for the city of Chicago to determine the spatiotemporal variations of taxi demand in three different stages, considering the variability of demands on weekdays and weekends separately in each stage. In the first stage, we perform a spatiotemporal analysis for each community area on the basis of influence propagated by demands in nearby areas. In the next stage, we include sociodemographic factors and analyze their impact on taxi demand. In the last stage, we utilize a framework that incorporates the previous two stages along with POI to study the combined influence of these factors on the spatiotemporal variations of taxi demand. In all three of these stages, we consider various state-of-the-art linear and ensemble models as base regressors. Ultimately, the outcome of this study can be utilized to identify hotspots across the city.

The main contributions of this paper include the following: (a) We have performed a systematic analysis of taxi-demand variation in different hours of the day for weekdays and weekends across all the months. (b) We have investigated the influence of covariates like sociodemographic factors and neighborhood influence. (c) We have proposed a combined model that considers the overall influence of all the factors together (neighborhood influence, sociodemographics, and POI). (d) We have proposed various data-transformation algorithms to make the taxi-demand data suitable for regression models.

The rest of this paper is organized as follows: Section 2 reviews previous work in the field of time-series forecasting and taxi-demand forecasting. Section 3 contains a description of the datasets used to train and test the models proposed in this work. We provide an overview of the models proposed for taxi-demand forecasting in Section 4. Section 5 contains the algorithm for transforming the raw taxi-booking dataset into an appropriate time-series taxi-demand dataset. Section 6 contains all the models proposed in this work to select a set of independent variables corresponding to each target. We compare the results from the proposed models and explain them in detail in Section 7. Conclusions drawn from this research are included in Section 8.

2 | LITERATURE REVIEW

With increasing travel demands, various approaches have been proposed to predict the transportation demand. Conventional forecasting approaches focus mainly on the temporal variation of the taxi demand. As these approaches depend on the time-series characteristics of taxi demand, they can be considered to be standard time-series problems that can be solved using traditional statistical and machine-learning algorithms. For instance, Yang and Gonzales [3] aggregated the raw data by census tract and hours of the day to extract valuable insights, and they employed count-regression models (a Poisson model, a quasi-Poisson model, and a negative binomial
22337326, 2022, 4, Downloaded from https://onlinelibrary.wiley.com/doi/10.4218/etrij.2021-0123 by Sri Lanka National Access, Wiley Online Library on [14/03/2024]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
626 GANGRADE ET AL.
model) to identify spatiotemporal differences between the demand and availability of taxi services. Faghih and others [6] analyzed taxi demand in New York City using the demand for other modes of transportation as well as weather conditions, and they presented a model for predicting taxi demand by combining a linear-regression model with a time-series model, which they termed linear regression with autoregressive moving-average errors. The combined model helped to reduce the number of variables, reducing the computational expense of the linear-regression model and achieving better R² values. Liu and others [7] identified hotspots and then predicted taxi demand in these hotspots using GPS data and environmental data based on three models: random forest (RF), ridge regression, and a combination forecasting model. Markou and others [8] pooled time-series data for taxi records from New York City with textual data (comprising event information for the city) extracted from the web by screen scraping using application programming interfaces (APIs), and they predicted taxi demand from the combined data using linear-regression and Gaussian models. Antoniades and others [9] employed linear regression with model selection, the least absolute shrinkage and selection operator (LASSO), and RF to predict taxi fares and trip durations using New York City taxi-trip data. Safikhani and others [10] presented a generalized version of the space-time autoregressive moving-average model (which reduces the number of parameters compared with conventional time-series models), and they utilized the autoregressive part of the vector autoregressive (VAR) model to produce a generalized STAR model for forecasting the spatiotemporal variation of taxi demand in New York City. They also introduced a penalty function, which penalizes prediction parameters that are temporally or spatially distant. The proposed model, including the penalty function, outperformed conventional models such as STAR and VAR. However, these methods considered only the temporal features of the demand and did not focus on other potential factors, such as the dependence on neighborhood demand. In addition, these approaches fail to capture the nonlinear interdependence between spatial and temporal features.

To address the aforementioned drawbacks, various neural-network-based approaches have gained significant attention in recent years, as they consider both spatial and temporal features along with the nonlinear behavior of the demand. For instance, Xu and others [11] divided New York City into small areas and predicted the demand in each area by using a long short-term memory (LSTM) and a recurrent neural network with a mixture density network layered on top of it. Liu and others [12] proposed a convolutional recurrent network model for granulated taxi-demand prediction that combined a convolutional neural network (CNN) with a gated recurrent unit (GRU) to handle complex nonlinear spatiotemporal correlations. Shu and others [13] proposed a hybrid model that integrated a CNN and an LSTM to predict short-term taxi demand across different areas. Luo and others [4] proposed a multi-task deep-learning (MTDL) model using LSTM as a neural unit to predict the need for taxis at the multisite level. The main goal of that paper was to improve the performance of the proposed model by using multiple hyperparameter-optimization methods, including a random search, a grid search, and Bayesian optimization. Ye and others [14] proposed a CoST-Net model to correlate spatial and temporal demand using a CNN and a heterogeneous LSTM, and they incorporated environmental features to predict multiple demands simultaneously. Vanichrujee and others [15] proposed an ensemble model based on the characteristics of LSTM, GRU, and XGBoost to predict taxi demand in Bangkok City. They implemented the model in seven area functions, which denote prediction functions for each area, and they confirmed the prediction results by mapping the POIs with predicted demand in these areas. Liu and others [16] proposed several models, combining a backpropagation neural network with extreme gradient boosting to investigate the correlation between online taxi-hailing demand and overall taxi demand. They also introduced a data-driven forecasting approach to analyze the real-time prediction of online taxi-hailing demand. Chen and others [17] predicted taxi demand at a finer spatial level; that is, at the road-section level. To achieve this, they devised a prediction network that considered the local and global relationships between road sections. They established these two spatial relations using a graph CNN, whereas they mapped the temporal characteristics using an LSTM network. Quy and others [18] augmented an LSTM with demand knowledge from neighboring taxi stands, along with historical taxi-demand counts, to forecast the pickup demand for a given taxi stand. Guo [19] proposed a hybrid model, combining a CNN with a bidirectional LSTM and the attention mechanism in order to predict taxi demand. This model was termed a CNN–BiLSTM–Attention model.

Some approaches used a two-level machine-learning framework to forecast taxi demand. Kim and others [1] combined multivariate linear regression with an LSTM, enabling it to assess a quota system aimed at balancing the volumes of demand for regular taxis and other for-hire vehicles in New York City. Rodrigues and others [20] presented an analysis of spatiotemporal variations in short-term taxi demand in Lisbon city and studied how they are affected by weather conditions and POI. They selected a linear statistical model (an autoregressive integrated moving average [ARIMA] model) and a machine-learning model (an artificial neural network, or ANN) to forecast the taxi
demand. Liu and others [21] utilized POI and GPS-trajectory information to model the spatial variation of taxi demand in Qingdao City using a geographically weighted regression model. They also studied how taxi demand is influenced by factors like socioeconomic, traffic, and land-use data. Zhou and others [22] proposed a method called ST-Vec in which they predicted taxi demand at vital destinations for a given region of New York City. The ST-Vec method maps regions to dense, low-dimensional vectors such that the vectors of more-likely destination regions will be nearer, and hence, the spatiotemporal relationships among zones can be obtained from the similarities between these vectors. Hu and others [5] initially studied the spatiotemporal distribution of job–housing–travel and the traveling characteristics of inhabitants. On this basis, they introduced a metric system for evaluating jobs–housing–taxi demand and a regional development-level index. Next, they constructed a coupling coordination degree model that makes use of the entropy-weight method to examine the coupling relationship between regional taxi demand and socioeconomic development. Faial and others [23] presented a data-stream mining framework to predict the taxi demand by adopting an approach to handle continuous data using batch and stream machine-learning algorithms. Moreover, Davis and others [24] approached the prediction of taxi demand as a clustering problem, and they proposed a multilevel clustering method to model taxi-demand density at various locations in Bengaluru city. Each location was addressed by six alphanumeric characters called geohash, enclosing an area of 0.72 km². They first achieved clustering by analyzing the correlation between the nearest geohashes, and they expressed the demand for every geohash as a fraction of its total cluster demand to form percentage time-series data. They multiplied the prediction of the percentage time-series data by the predicted demand for the whole cluster to obtain the final prediction.

3 | DATASET DESCRIPTION

We utilized three different data sources for our study: Chicago taxi-trip records, demographic data, and POI data.

1. Chicago taxi-trip records: We collected this dataset from the official data portal managed by the Department of Business Affairs & Consumer Protection of the city of Chicago.² It contains information about taxi flow between different community areas, consisting of around 195 million rows of pickup and dropoff locations every 15 min for the past 4 years (2016–2019). The average number of records per year is around 49 million.
2. Sociodemographic data: We collected this dataset from the U.S. Census Bureau, which collects data once every 10 years. We used the census data for the

² https://data.cityofchicago.org/Transportation/Taxi-Trips/wrvz-psew#column-menu.
4 | PROPOSED WORK

In the city of Chicago, there are 77 community areas (shown in Figure 1). The taxi demand in these areas is heterogeneous; that is, the demand is considerably high in some areas while it is quite low in others. This fluctuation in demand requires prompt predictions of the spatial characteristics of taxi demand to identify hotspots and thus redirect taxis into areas where the demand is high. Another problem is the dynamic nature of taxi demand in the temporal dimension. For example, demand may be high during working hours but comparatively low during the early morning and late at night (see Figure 2A). In addition, demand may vary between weekdays and weekends and across different months of the year (see Figure 2B,C). These figures illustrate the problem of predicting the temporal characteristics of taxi demand. To address this spatiotemporal heterogeneity, we therefore consider spatial resolution at the level of the 77 community areas. Furthermore, the prediction for each area is done on an hourly basis and then aggregated monthly, which defines the temporal resolution of the demand prediction (see Figure 2B). We consider monthly predictions, because a yearly aggregation excludes seasonality effects; that is, the variation of demand in different seasons. Also, to capture the seasonality effect due to differences in demand between weekdays and weekends, we analyzed those demand patterns separately. We expect this spatiotemporal resolution of the prediction results to aid in identifying hotspots.

FIGURE 1 Map of Chicago illustrating community areas

Because the demand varies in both space and time, it induces dynamism in the pattern of hotspots; that is, hotspots vary from month to month and year to year. In this paper, we propose various dynamic demand-prediction models based on applications of various statistical and ensemble regression models to forecast the taxi demand. These models can be used to study the demand pattern further, including both the temporal features and spatial characteristics, eventually identifying the dynamic pattern of the hotspots. A detailed analysis of each model is given in Section 6.

For a given community area, the taxi demand may also be affected by the distance of the dropoff destination from the pickup location. If the destination is distant, a passenger may prefer to travel by another convenient mode of transport, like the metro or buses. Thus, neighborhood proximity may play an essential role in determining the demand in a given community area. To incorporate the concept of neighborhood influence, we propose a proximity-based prediction model, which is discussed in Section 6.2.

It is also possible that the taxi-demand pattern may be influenced by socioeconomic factors such as income, household status, employment status, and so forth. For instance, residents of areas with high per-capita income may prefer taxis as a primary mode of public transportation more than residents of areas with lower per-capita income. Conversely, residents of areas where the average household status is poor may prefer cheaper modes, such as buses or the metro. To address such socioeconomic interdependence, we propose a sociodemographic-based prediction model, which is discussed in Section 6.3.

Another consideration is the impact of frequently visited venues on taxi demand. The flow of taxis may be greater toward areas where there are more POI locations. If a given community area has a relatively extensive collection of educational institutes, medical facilities, or recreational centers, the demand may be very large in this particular area. Therefore, we have

³ https://developer.foursquare.com/.
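The temporal resolution described in this section (hourly counts per community area, kept separate for weekdays and weekends and grouped by month) can be illustrated with a short sketch. It is a minimal illustration under stated assumptions, not the transformation algorithm of Section 5: the record layout (a pickup timestamp plus a community-area number), the toy records, and the function name `aggregate_demand` are all hypothetical.

```python
from collections import Counter
from datetime import datetime

def aggregate_demand(trips):
    """Count pickups per (community area, month, day type, hour of day).

    `trips` is an iterable of (pickup_time, community_area) pairs; this
    layout is a hypothetical simplification of the raw trip records.
    """
    counts = Counter()
    for pickup_time, area in trips:
        # weekday() is 0..4 for Monday..Friday and 5, 6 for Saturday, Sunday.
        day_type = "weekend" if pickup_time.weekday() >= 5 else "weekday"
        counts[(area, pickup_time.month, day_type, pickup_time.hour)] += 1
    return counts

# Toy records, invented for illustration only.
trips = [
    (datetime(2019, 3, 4, 9, 15), 8),    # Monday
    (datetime(2019, 3, 4, 9, 45), 8),    # Monday, same hour
    (datetime(2019, 3, 9, 23, 5), 8),    # Saturday
    (datetime(2019, 3, 5, 9, 30), 32),   # Tuesday
]
demand = aggregate_demand(trips)
```

Keying the counts by day type keeps the weekday and weekend patterns apart, so each can be analyzed on its own as the text describes.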
5 | DATA PREPROCESSING

the sociodemographic data obtained were already in the desired format, we performed no transformation operations on them.

sociodemographic parameters as independent variables for the target area, the demand can be predicted more effectively. We elucidate this model in Section 6.3. We term the third model a POI-based model. Its set of independent variables is determined by the number of POI venues in the CAs. This model also can be used to identify hotspots by mapping the POI locations in CAs together with the average daily demand in those areas. We discuss this model in Section 6.4. We term the fourth model the combined covariates model (CCM); in it, the independent variables are selected by taking into account the characteristics of all the previous models. We discuss this combined-framework model in Section 6.5. In a broader sense, the algorithm for a taxi-demand forecasting model works by taking each community area once as a target variable and applying the models mentioned above to determine the vector of independent variables for it. We ultimately predict the taxi-demand count in a target community area by employing various state-of-the-art statistical and ensemble regression models as base regressors.
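The selection step outlined above can be sketched as follows. This is an illustration under assumptions, not the authors' algorithm: the names `NEIGHBORS`, `SOCIO_FEATURES`, `POI_FEATURES`, and `select_independent_variables` are hypothetical, the toy adjacency and feature names are invented, and the sketch assumes that the sociodemographic model (SIBM) uses sociodemographic parameters only, which the text here does not confirm.

```python
# Hypothetical toy inputs: adjacency between community areas (CAs) and
# the names of the covariate columns assumed to be available.
NEIGHBORS = {"CA8": ["CA7", "CA32"], "CA7": ["CA8"], "CA32": ["CA8"]}
SOCIO_FEATURES = ["per_capita_income", "hardship_index", "pct_unemployed"]
POI_FEATURES = ["n_restaurants", "n_hospitals", "n_colleges"]

def select_independent_variables(model, target_ca):
    """Return the names of the independent variables for one target CA."""
    neighbor_demand = [f"demand_{ca}" for ca in NEIGHBORS[target_ca]]
    if model == "NPBM":   # neighborhood proximity only
        return neighbor_demand
    if model == "SIBM":   # sociodemographic parameters only (assumption)
        return list(SOCIO_FEATURES)
    if model == "POI":    # counts of POI venues only
        return list(POI_FEATURES)
    if model == "CCM":    # combined covariates: all groups together
        return neighbor_demand + SOCIO_FEATURES + POI_FEATURES
    raise ValueError(f"unknown model: {model}")

# Each community area is taken once as the target variable.
plan = {ca: select_independent_variables("CCM", ca) for ca in NEIGHBORS}
```

A base regressor would then be fitted once per target area on the columns listed in `plan`; the choice of regressor is independent of the selection step.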
where θ is a vector of coefficients and ϵ represents the error term.

2. LASSO: The main objective of LASSO is to estimate sparse coefficients by performing variable selection and regularization:

$$\min_{\theta} \sum_{n=1}^{N} \epsilon_n^2 + \lambda \sum_{i=1}^{k} |\theta_i|, \tag{3}$$

where θ is a vector of coefficients, ϵ represents the error term, and λ represents the regularization parameter.

3. LARS: LARS adds a penalty to the loss function during the training phase itself; thus, it does not require any hyperparameters, and it is therefore an efficient way of fitting a LASSO model.

4. Bayesian ridge (BR): The aim of BR is to find the posterior distribution of model parameters. The mathematical expression on which BR works is given by a Gaussian formula, which can be written as

$$p(\omega \mid \lambda) = \mathcal{N}(\omega \mid 0, \lambda^{-1} I_p), \tag{4}$$

where ω is the weight vector and λ is the precision of the weights, which is given a Gamma prior.

5. Ridge: In ridge-regression algorithms, an additional term is added to the OLS equation to minimize the penalized residual sum of squares:

$$\min_{\theta} \sum_{n=1}^{N} \epsilon_n^2 + \lambda \sum_{i=1}^{k} \theta_i^2, \tag{5}$$

where θ is a vector of coefficients, ϵ represents the error term, and λ represents the regularization parameter.

6.1.2 | Ensemble models

random subset of input features (the second layer of randomness).

3. Bagging method: The bagging method chooses random subsets from the training set of data and builds instances of black-box estimators on these subsets. The predictions of these individual estimators are then aggregated to form the final prediction.

4. Stacking: Stacking determines the best combination of predicted outputs from two or more base machine-learning algorithms by using a meta-learning algorithm. It stacks the outputs determined by all individual estimators, and this stacked output is then used as input for the final estimator.

5. Voting: A voting ensemble determines the final prediction on the basis of the average of individual predictions calculated from various regression models.

6.2 | NPBM

As discussed in Section 1, it is necessary to consider neighborhood influence in forecasting the taxi demand for a given community area. To achieve this goal, we propose a neighborhood-proximity-based regression model in which the independent variables corresponding to a target variable are selected directly based on its immediate neighbors (see Algorithm 3). Let t_CA be the target variable representing the community area for which the demand is to be predicted temporally. Then, i_CA will be a vector of independent variables consisting of community areas that share boundaries with t_CA. We plotted the heatmap shown in Figure 3 to find the correlation of a given community area with other areas. This plot shows that the areas located in the immediate neighborhood of a given community area are more correlated with it than those far away, which strengthens the use of such a model for selecting the independent variables.
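The neighbor-selection idea behind the NPBM, and the correlation check that motivates it, can be sketched as follows. This is an illustration only: the adjacency list, the demand series, and the helper names `pearson` and `npbm_inputs` are invented for the example, and Algorithm 3 itself is not reproduced here.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def npbm_inputs(t_ca, adjacency):
    """i_CA: the community areas that share a boundary with t_CA."""
    return list(adjacency[t_ca])

# Hypothetical hourly demand for a target area, a neighbor, and a distant area.
demand = {
    "A": [10, 20, 40, 35, 15],
    "B": [12, 22, 38, 36, 14],   # adjacent to A, similar pattern
    "Z": [30, 5, 12, 28, 9],     # far from A, unrelated pattern
}
adjacency = {"A": ["B"], "B": ["A"], "Z": []}

i_ca = npbm_inputs("A", adjacency)
corr_neighbor = pearson(demand["A"], demand["B"])
corr_distant = pearson(demand["A"], demand["Z"])
```

In this toy data the adjacent area tracks the target closely while the distant one does not, which is the pattern the heatmap is said to show for the real community areas.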
TABLE 2 R² values for the NPBM for the year 2019 (column groups: Linear, Ensemble)
FIGURE 5 Polygonal density maps representing hotspots for the indicated sociodemographic parameters: (A) Hardship index, (B) Percentage of unemployed people (above 16 years of age), (C) Dependency (people over 64 years and under 18 years of age), (D) Percentage of community areas below the poverty line, (E) Percentage of crowded housing units, (F) Percentage of people without high-school diploma (above 25 years of age), and (G) Per-capita income
6.5 | CCM
TABLE 4 R² values for the SIBM for the year 2019 (column groups: Linear, Ensemble)
TABLE 6 R² values for the CCM for the year 2019 (column groups: Linear, Ensemble)
FIGURE 9 Graph comparing the average R² values for (A) all linear models and (B) all ensemble models
The predictions for the year 2019 are made on the basis of taxi-demand data for the preceding three consecutive years.

We also analyzed the predictions on the basis of MAE scores for the years 2018 and 2019. The results obtained are summarized in box plots (see Figure 10),
FIGURE 10 Box plot comparing MAE values for linear and ensemble regressors in (A) the NPBM for the year 2018, (B) the NPBM for the year 2019, (C) the SIBM for the year 2018, (D) the SIBM for the year 2019, (E) the CCM for the year 2018, and (F) the CCM for the year 2019
FIGURE 11 Graph comparing the average R² values for (A) the indicated linear models and (B) the indicated ensemble models
which show that the upper bound and interquartile range corresponding to ensemble models are smaller than they are for the linear models. Also, there are significantly more outliers for the linear models, and they vary sporadically, which indicates that the predictions from the linear models vary more than they do for the ensemble models.
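The two evaluation criteria used here, together with the interquartile range that a box plot displays, can be written out in a few lines. The sketch below is only an illustration: the function names are chosen for this example, and every number in it is a made-up toy value, not a result from this study.

```python
from statistics import quantiles

def mae(y_true, y_pred):
    """Mean absolute error between observed and predicted demand."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def iqr(values):
    """Interquartile range (Q3 - Q1), the height of the box in a box plot."""
    q1, _, q3 = quantiles(values, n=4)
    return q3 - q1

# Toy observed and predicted demand for one area (invented values).
y_true = [10, 20, 30, 40]
y_pred = [12, 18, 33, 37]
area_mae = mae(y_true, y_pred)
area_r2 = r2_score(y_true, y_pred)

# Toy per-area MAE scores for two regressor families (invented values).
linear_mae = [2.0, 3.5, 4.0, 6.5, 9.0]
ensemble_mae = [1.5, 2.0, 2.5, 3.0, 3.5]
spread_linear = iqr(linear_mae)
spread_ensemble = iqr(ensemble_mae)
```

A smaller interquartile range means the per-area errors are more tightly clustered, which is the sense in which one family of regressors is said to vary less than the other.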
Moreover, comparing the NPBM and SIBM with the CCM shows that the latter (CCM) outperforms the others on the basis of the MAE score. Thus, the observation derived from the R² score is corroborated by the MAE score.

In addition, we tested the proposed models by performing sensitivity analyses of the R² score in three different scenarios. In the first scenario, the base regressors are compared using the temporal variations of each of the proposed models (see Figures 4, 6, and 8). Here, the ensemble regressors performed better than did the linear regressors. In the second scenario, we compared the proposed models for weekdays and weekends separately, keeping the temporal features constant for both the linear and ensemble models. This shows that the CCM performs better than either the NPBM or the SIBM (see Figure 9). In the third scenario, we compared the individual base regressors in the ensemble and linear models, keeping the other parameters constant (see Figure 11). This shows that ET performs best among the ensemble regressors, whereas ridge regression performs best among the linear regressors.

8 | CONCLUSIONS

Taxi-demand forecasting is a challenging task, as taxi demands have variable spatiotemporal patterns. In this work, we performed a meticulous spatiotemporal analysis of taxi demands using Chicago data. We found that approaches that consider only a single covariate at a time (i.e., the NPBM and SIBM) did not convey enough information about the flow of taxis. Consequently, they can lead to inadequate forecasting of demand. In contrast, we found that combining the individual covariates improves the performance drastically. We also incorporated dynamic POI data in this CCM to make it more robust. The robustness of the CCM relative to the NPBM and SIBM is demonstrated by the experimental results, in which we tested all the models against the 2018 and 2019 datasets using the R² and MAE scores as criteria to compare performances. We note that the performance of taxi-demand forecasting depends upon the availability of historical data for taxi demand. If ambient and residential populations are low in some areas, it will be challenging to predict taxi demand accurately there. Demographics and POI data are other factors that can influence taxi demand, but they are useful only when combined with historical taxi-demand data. In future, we plan to extend this work to predict hotspots and analyze the accuracy and coverage of the model.

CONFLICT OF INTEREST
The authors declare no potential conflicts of interest.

ORCID
Gaurav Hajela https://orcid.org/0000-0002-9835-205X

REFERENCES
1. T. Kim, S. Sharda, X. Zhou, and R. M. Pendyala, A stepwise interpretable machine learning framework using linear regression (LR) and long short-term memory (LSTM): City-wide demand-side prediction of yellow taxi and for-hire vehicle (FHV) service, Transp. Res. C: Emerg. Technol. 120 (2020), 102786.
2. Q. Liu, C. Ding, and P. Chen, A panel analysis of the effect of the urban environment on the spatiotemporal pattern of taxi demand, Travel Behav. Soc. 18 (2020), 29–36.
3. C. Yang and E. J. Gonzales, Modeling taxi demand and supply in New York City using large-scale taxi GPS data, Seeing cities through big data: Research, methods and applications in urban informatics, Springer International Publishing, Cham, 2017, pp. 405–425.
4. H. Luo, J. Cai, K. Zhang, R. Xie, and L. Zheng, A multi-task deep learning model for short-term taxi demand forecasting considering spatiotemporal dependences, J. Transp. Eng. (English Edition) 8 (2021), no. 1, 83–94.
5. B. Hu, S. Zhang, Y. Ding, M. Zhang, X. Dong, and H. Sun, Research on the coupling degree of regional taxi demand and social development from the perspective of job-housing travels, Phys. A: Stat. Mech. Appl. 564 (2021), 125493.
6. S. Faghih, A. Shah, Z. Wang, A. Safikhani, and C. Kamga, Taxi and mobility: Modeling taxi demand using ARMA and linear regression, Procedia Comput. Sci. 177 (2020), 186–195.
7. Z. Liu, H. Chen, Y. Li, and Q. Zhang, Taxi demand prediction based on a combination forecasting model in hotspots, J. Adv. Transp. 2020 (2020), 13.
8. I. Markou, F. Rodrigues, and F. C. Pereira, Multi-step ahead prediction of taxi demand using time-series and textual data, Transportation Research Procedia 41 (2019), 540–544.
9. C. Antoniades, D. Fadavi, and A. F. Amon, Fare and duration prediction: A study of New York City taxi rides, 2016.
10. A. Safikhani, C. Kamga, S. Mudigonda, S. S. Faghih, and B. Moghimi, Spatio-temporal modeling of yellow taxi demands in New York City using generalized STAR models, Int. J. Forecasting 36 (2020), no. 3, 1138–1148.
11. J. Xu, R. Rahmatizadeh, L. Boloni, and D. Turgut, Real-time prediction of taxi demand using recurrent neural networks, IEEE Trans. Intell. Transp. Syst. 19 (2018), no. 8, 2572–2581.
12. T. Liu, W. Wu, Y. Zhu, and W. Tong, Predicting taxi demands via an attention-based convolutional recurrent neural network, Knowl. Based Syst. 206 (2020), 106294.
13. P. Shu, Y. Sun, Y. Zhao, and G. Xu, Spatial-temporal taxi demand prediction using LSTM-CNN, (IEEE 16th International Conference on Automation Science and Engineering, Hong Kong, China), Aug. 2020, pp. 1226–1230.
14. J. Ye, L. Sun, B. Du, Y. Fu, X. Tong, and H. Xiong, Co-prediction of multiple transportation demands based on deep spatio-temporal neural network, (Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Association for Computing Machinery, Anchorage, AK, USA), 2019, pp. 305–313.
15. U. Vanichrujee, T. Horanont, W. Pattara-atikom, T. Theeramunkong, and T. Shinozaki, Taxi demand prediction using ensemble model based on RNNs and xgboost, (International Conference on Embedded Systems and Intelligent