Transportation Research Part A: Sciencedirect

Transportation Research Part A 136 (2020) 262–281
Predicting the use frequency of ride-sourcing by off-campus university students through random forest and Bayesian network techniques

Mahdi Aghaabbasi a,⁎, Zohreh Asadi Shekari a, Muhammad Zaly Shah a, Oloruntobi Olakunle b, Danial Jahed Armaghani c,⁎, Mehdi Moeinaddini d

a Centre for Innovative Planning and Development, Department of Urban and Regional Planning, Faculty of Built Environment and Surveying, Universiti Teknologi Malaysia, 81310 Skudai, Malaysia
b Department of Urban and Regional Planning, Faculty of Built Environment and Surveying, Universiti Teknologi Malaysia, 81310 Skudai, Malaysia
c Institute of Research and Development, Duy Tan University, Da Nang 550000, Viet Nam
d Local Environment & Management Analysis (LEMA), Urban and Environmental Engineering (UEE), University of Liège, Allée de la Découverte, Quartier Polytech, 4000 Liège, Belgium
ARTICLE INFO

Keywords: Random Forest; Bayesian Network; Ride-sourcing use frequency; Off-campus university students

ABSTRACT

This study used a survey technique to investigate the factors that motivate the adoption and usage frequency of ride-sourcing among students at a Malaysian public university. Two of the most broadly used machine learning techniques, the Random Forest technique and Bayesian network analysis, were applied in this study. Random Forest was employed to establish the relationship between ride-sourcing usage frequency and students' sociodemographic factors, built environment considerations, and attitudes towards ride-sourcing-specific factors. Random Forest identified the 10 most important factors influencing university students' use of ride-sourcing for different travel purposes, including study-related, shopping, and leisure travel. These important predictors were found to be indicators of the target variables (i.e., ride-sourcing usage frequency) in Bayesian network analysis. Bayesian network analysis identified the students' age (0.15), safety perception (0.32), and neighbourhood facilities within a walkable distance (0.21) as the most important predictors of the use of ride-sourcing among students to get to school, shopping, and leisure, respectively.
⁎ Corresponding authors.
E-mail addresses: amahdi@utm.my, aghaabbasi.mahdi@gmail.com (M. Aghaabbasi), danialjahedarmaghani@duytan.edu.vn (D.J. Armaghani).
https://doi.org/10.1016/j.tra.2020.04.013
Received 17 August 2019; Received in revised form 6 February 2020; Accepted 14 April 2020
0965-8564/ © 2020 Elsevier Ltd. All rights reserved.

1. Introduction
Ride-sourcing is a new mobility option that influences passengers' travel behaviour, and its use has increased noticeably in recent years. Ride-sourcing services, also known as Transportation Network Companies (TNCs), use smartphones and mobile applications to match travel supply and demand and to dynamically connect drivers and passengers (Cohen and Shaheen, 2016; Hughes and MacKenzie, 2016; Rayle et al., 2016; Shaheen et al., 2017b). According to Shaheen and Chan (2016), ride-sourcing emerged from "on-demand ride services" and "shared mobility". Several successful TNCs operate worldwide, such as Grab and MyCar in Malaysia, Didi Express in China, and Uber and Lyft in the US. Jin et al. (2018) indicated that ride-sourcing improves urban transport efficiency in several ways: (1) it improves accessibility to poor neighbourhoods with insufficient taxicabs, (2) it complements public transport services during weekends and late-night hours, and (3) it improves trust between passengers and drivers. While these benefits have attracted the attention of researchers in transport planning, urban planning, and economics (Arteaga-Sánchez et al., 2018; Cohen and Shaheen, 2016; Jiao, 2018; Kima et al., 2018; Wang and Mu, 2018), the body of literature on this mode of travel is still in its early stages of development. Besides, very few studies have investigated the effects of passengers' attributes on the use of this mode (Rayle et al., 2016; Zhanga et al., 2016). These studies only considered the general population to determine why people use ride-sourcing and how this travel mode has affected public transport and taxicab usage. Furthermore, existing studies on ride-sourcing usage were mostly conducted in China and the US, and to the best of the authors' knowledge, no scientific studies on ride-sourcing are available for Southeast Asian cities with deficient transport systems.
Although a considerable amount of literature exists on the travel behaviour of the general population towards ride-sourcing, only a few studies on university students' travel patterns are available. Khattak et al. (2011) explained that while university students make up a substantial proportion of a region's population, their travel patterns have been underrepresented or not well understood. The existing studies on the travel behaviour of university students suggest that these students have different calendars, class schedules, lifestyles, and socioeconomic status (Garikapati et al., 2016; Namgung and Akar, 2015). These differences lead university students to travel differently from the general population (Garikapati et al., 2016; Khattak et al., 2011; Namgung and Akar, 2015; Zhou, 2012). Therefore, investigating the travel behaviour of university students reveals valuable information on all aspects of that behaviour, particularly their travel mode choices. It is instructive to understand students' perceptions of their travel mode preferences and their motivations for choosing a certain transport mode. Urban and transport planners, transport companies, and city transport managers can use this information to develop transport policies that accommodate the travel needs of all population groups. The existing body of literature seldom focuses on the travel behaviour of university students from the perspective of ride-sourcing as a newly emerged mode.
Several universities across the world provide on-campus accommodation for students to ensure equal and satisfactory education opportunities. However, many students prefer to reside off-campus since it offers them more flexibility, the opportunity to choose their preferred roommates, freer lifestyle decisions, and, in some cases, cheaper living than on campus. Off-campus students often face several mobility challenges with regard to trips to/from campus and non-study-related trips. For example, off-campus students experience longer travel times to/from campus than on-campus students, and this extra travel time could otherwise be used for studying, building networks, and creating social relationships. Besides, off-campus students also encounter challenges in finding appropriate travel options (if private cars are not available) to attend classes scheduled for early mornings, late evenings, or weekends. To date, and consistent with Mbara and Celliers (2013), only a limited number of studies have independently assessed the travel behaviour of off-campus university students and investigated the transport-related challenges they face.
Several transport studies have successfully analysed the predictors of university students' preferred mode choices using different methods, such as the binary logit model, mixed logit model, and multinomial logit model (Akar et al., 2012; Namgung and Akar, 2015; Nguyen-Phuoc et al., 2018; Proulx et al., 2014; Rotaris and Danielis, 2014; Rotaris et al., 2019; Rybarczyk and Gallagher, 2014; Tezcan, 2016; Wang et al., 2013; Zhou, 2012, 2016). However, it is challenging to use regression models to examine the predictors of mode choice and travel frequency because mode choice data are large and complicated. Regression models rely on several strong statistical assumptions, including linearity of the modelled relationship and the absence of outliers (Rashidi et al., 2014; Stylianou et al., 2019), which rarely hold for mode choice data. Furthermore, using cross-product terms to detect the predictors is a daunting task since interactions could occur in complex forms (Yan et al., 2010). Besides, these models often cannot effectively handle numerous discrete variables or variables with many categories (Karlaftis and Golias, 2002; Washington et al., 1997; Washington and Wolf, 1997; Yan et al., 2010).
A machine learning technique is an algorithm that comprises pre-processing, feature selection and extraction, and classification processes (Acharya et al., 2019). Machine learning techniques such as the Random Forest (RF) technique and Bayesian network (BN) analysis have the following advantages: (1) they handle outliers; (2) they make no assumptions about variable distributions and do not require a priori probabilistic knowledge of the students' mode choice and mode usage frequency; (3) they handle numerous discrete variables or variables with many categories better than regression models; and (4) they skilfully extract information from large amounts of data (Ahmad et al., 2018; Breiman et al., 1984; Friedman et al., 1997; Khatami et al., 2017; Utkin et al., 2019). The Random Forest technique and Bayesian network analysis have been successfully applied in many transport-related studies for estimating the likelihood of secondary crashes and monitoring safety and real-time traffic operations (Prati et al., 2017; Shi and Abdel-Aty, 2015). However, to the best of the authors' knowledge, no study has used both the RF technique and BN analysis to study students' travel behaviour and their preferred mode choices.
Mode choice and transport mode use frequency datasets typically comprise many parameters, and each parameter may have several categories. In addition, these datasets may be updated continuously when new information is available or required. Furthermore, mode choice datasets commonly lack complete data or have missing values. A Bayesian Network model can successfully handle such multivariate and under-sampled data and deal with incomplete, inaccurate, or uncertain knowledge or information (Liu et al., 2018; Susanti and Azizah, 2017; Wu et al., 2018). A Bayesian network is also deemed suitable for learning changeable behaviours (e.g. mode usage frequency), as it can efficiently adjust its network according to the data presented or entered into it (Tareeq and Inamura, 2009). Considering the complexity of mode choice and mode usage frequency data, several past studies acknowledged the suitability of the Random Forest method for exploring complex and multivariate datasets (Gao et al., 2018; Kowshalya and Nandhini, 2018; Strobl et al., 2009; Yadav and Ravi, 2018). Thus, a combination of these two techniques is potentially suitable for modelling mode choice and usage frequency data. In the subsequent sections, the authors demonstrate that the hybridization of the Random Forest and Bayesian Network techniques tends to produce more accurate predictions than the linear and other hybrid models.
The present study develops a prediction model to identify significant factors influencing the use of ride-sourcing by off-campus university students. The paper is organised as follows. The next section presents the literature on factors that influence university students' mode choices, as well as on ride-sourcing. The literature section is followed by the study aims, modelling methodology, case study, discussion of research results, and limitations. The paper concludes with some recommendations for ride-sourcing companies.
2. Literature
2.1. Factors influencing the mode choices of university students and the general population
Several existing studies have identified factors influencing individuals' travel mode choices. According to Zhou (2012, 2014), sociodemographic factors, trip characteristics and purposes, mode-specific dynamics, and built environment factors are the factors most frequently used to investigate the mode choice of the general population. Related studies (Bernetti et al., 2008; Cheng et al., 2016; Kim and Ulfarsson, 2004; Lind et al., 2015; Yang et al., 2010) pointed out that the sociodemographic characteristics of individuals influence their mode choices both directly and indirectly. Sociodemographic factors, such as age, race, education, gender, vehicle ownership, occupation, and income, are regarded as direct influences on travel mode choices. Additionally, these factors also affect destination choices (Yang et al., 2010). An example of the influence of sociodemographic factors on destination choices was reported by Yang et al. (2010), who found that Chinese women were more inclined than men to choose a workplace near their homes because they had to take care of family members and devoted much less time to work than men. The destination, in turn, defines the characteristics of activities and trips. Several trip characteristics, such as travel distance, travel time, and travel cost, influence the mode choices of individuals. Besides the sociodemographic factors that define trip characteristics, built environment attributes also influence them.
Built environment attributes, including density and land use mixture, are considered important determinants of travel distance and time (Scheiner, 2010). Appropriate implementation of these attributes may reduce the need to travel and discourage the use of private cars. These attributes are also associated with providing access to facilities within a certain distance radius. Short distances within neighbourhoods tend to increase the use of active modes, such as walking and cycling, as acknowledged in several studies (Du et al., 2019; Ermagun and Levinson, 2017; Kim and Ulfarsson, 2008; Loo and Siiba, 2019; Manaugh and El-Geneidy, 2013). Besides the built environment and sociodemographic attributes, several studies have confirmed the significant impact of commuters' attitudes towards the flexibility, comfort, convenience, cost, and safety of transport modes on their preferred mode choices (Johansson et al., 2006; Kamargianni and Polydoropoulou, 2013; Kamruzzaman et al., 2015; Paulssen et al., 2014; Stark and Hössinger, 2018). These attitudes directly influence individual mode choices.
Numerous studies across the world have examined the factors influencing students' travel behaviour, especially their mode choices and usage frequency. These studies considered a wide range of sociodemographic, built environment, and trip-specific factors, as well as students' attitudes towards different modes. The factors that different authors used to investigate students' mode choices and usage frequencies are presented in Appendix A. While several studies investigated students' mode choices and frequencies of usage, very few investigated the mode choice and usage frequency of students with respect to ride-sourcing (Marten, 2015; Tarabay and Abou-Zeid, 2019; Yan et al., 2018). The possible impacts of different influential factors on mode usage frequencies are shown in Fig. 1.

Fig. 1. Schematic diagram of different types of factors influencing university students' transport mode choices and usage frequency.
2.2. Ride-sourcing
The body of literature on ride-sourcing is still limited due to the relatively recent emergence of this travel mode, and the majority of scientific studies on it were conducted in the US and China. The existing studies focused on several issues related to ride-sourcing: pricing (Jiao, 2018), legality (Flores and Rayle, 2017; Zha et al., 2016), user variability (Rayle et al., 2016), vehicle automation (Shaheen and Cohen, 2018), impacts on public transport and conventional taxicab usage (Nie, 2017; Rayle et al., 2016), ride-splitting behaviour (Chen et al., 2017; Chen et al., 2018), equity provision (Shaheen et al., 2017a), and spatial variability of ride-sourcing waiting time (Hughes and MacKenzie, 2016). A recent review by Jin et al. (2018) gave a comprehensive overview of the impacts of ride-sourcing on urban development, covering the positive, negative, and uncertain impacts. The positive impacts of ride-sourcing include the accessibility of ride-sourcing drivers to poor neighbourhoods with insufficient taxicabs and the complementarity of ride-sourcing to public transport during weekends and late-night hours. Ride-sourcing also improves trust between passengers and drivers. The negative impacts of ride-sourcing comprise its competition with public transportation, insufficient driver training, lack of vehicle insurance, and lack of internet coverage in certain areas, which makes ride-sourcing difficult to access. In addition, several questions regarding ride-sourcing remain unanswered, including its impact on congestion, energy consumption, and emissions.
Given the above, existing studies mostly focused on university students' traditional travel patterns and modes and neglected recently emerged modes such as ride-sourcing. Very few studies have investigated the use of ride-sourcing by the university population (Marten, 2015; Tarabay and Abou-Zeid, 2019; Yan et al., 2018). Marten (2015) assessed the demand for Uber versus the Chicago Transit Authority (CTA) public transportation system at Northwestern University using a multinomial logit model. This study found that students' gender and age do not influence the demand for these modes. The study also found that the cost of transportation, the duration of transportation, and past travel behaviour (money spent on transportation and more Uber trips) were important in determining this demand. Tarabay and Abou-Zeid (2019) modelled the choice of switching from traditional transport modes to ride-sourcing for social/recreational trips at the American University of Beirut, Lebanon, using a hybrid choice model and a discrete choice model. They found that travel time, waiting time, and cost were the most influential factors in switching between the current mode and ride-sourcing. Yan et al. (2018) investigated switching from a conventional bus system to an integrated system of ride-sourcing services and bus lines at the University of Michigan using a mixed logit framework. They found that shorter waiting and in-vehicle times are the advantages of ride-sourcing over the bus system. Furthermore, they found that ride-sourcing can complement public transit by enhancing last-mile transit access.
Recent studies and reports in the US showed a significant modal shift of the general population from public transport and conventional taxicabs to ride-sourcing (Fischer-Baum and Bialik, 2015; Rayle et al., 2016; San Francisco Municipal Transportation Agency, 2014). Therefore, it is important for academia, practitioners, and policymakers to analyse the travel behaviour of university students with respect to ride-sourcing. The outcomes of such investigations can help to meaningfully compare the travel behaviour of students and the general population regarding ride-sourcing.
3. Study aims
Several ride-sourcing companies operate and offer mobility options in Malaysia. These companies include Grab, MyCar, Dacsee, JomRides, EzCab, Mula, Riding Pink (for women), PICKnGO, and Diffride (Diff). Currently, an E-Hailing Regulation in Malaysia mandates that drivers obtain a PSV license. To obtain this license, drivers undergo background checks, training, car inspections, and health check-ups (Grab, 2019). This license may improve the quality of ride-sourcing services in Malaysia. Although the regulation may be effective, extensive academic studies are necessary to help ride-sourcing companies improve their service quality for a wide range of commuters. While several academic studies on ride-sourcing were conducted in the US and China, there are no such studies in Malaysia. Thus, this study attempts to identify the factors that affect how frequently students travel by ride-sourcing. This study was conducted among university students, who typically are from lower-income households and cannot afford a private vehicle. This study is expected to help ride-sourcing companies improve their service quality not only for university students but also for the general public.
4. Methodology
4.1. Data and survey
The present study conducted a survey among the students of the main campus of Universiti Teknologi Malaysia (UTM) in Skudai between July and November 2018. The main campus of UTM is the second-largest public university campus in Malaysia. Nearly 24,000 undergraduate and postgraduate students studied at UTM at the time of data collection. While the campus is relatively close to the Skudai town centre, the road between the campus and the town centre is not suitable for walking and cycling due to a lack of sidewalks and bicycle lanes. The road is suitable for cars and motorcycles. The city bus system also connects the campus to the town centre; however, the level of service is very low, and it is free only for local students. Students can access other areas of Skudai and even the state capital (Johor Bahru) from the town centre using the city and inter/intra-city bus services, or directly from the campus using private vehicles and taxicabs. Within the campus, students have free access to the university buses that operate between residential colleges and school faculties/services. Students can also access rental bicycles, which help them to contribute to more sustainable mobility. While the campus lacks separate bike lanes, it provides some storage lockers and bicycle parking. Concerning the walking facilities, the sidewalk network on the UTM campus is poorly connected, incomplete, and narrow, which indicates that the campus cannot efficiently encourage walking. The UTM campus map and its location within Skudai and Johor Bahru are presented in Fig. 2.

Fig. 2. Location of UTM campus and campus map. Source: UTM (2018).
To conduct this study, the research team initially identified the key locations expected to have a high concentration of ride-sourcing customers. To this end, we identified six potential locations based on drivers' feedback and the research team's observations. These locations included (1) Perpustakaan Sultanah Zanariah (library 1), (2) Perpustakaan Raja Zarith Sofiah (library 2), (3) Center point (parking and bus stops), (4) Arked Meranti UTM (food court), (5) UTM Health Centre, and (6) Dato Onn Jafar College. The research team then used an intercept survey to collect data at these locations, with survey enumerators engaging students who had used ride-sourcing during the last two weeks. The first author instructed the enumerators to intercept every third student encountered at the identified locations. The enumerators asked the students whether they had taken a ride-sourcing trip during the past two weeks. If not, they were not eligible to complete the survey. If so, the enumerators asked them to recall their travel patterns. Eventually, of the 652 students approached to participate in the survey, 358 completed the questionnaire. At a margin of error of 6% and a confidence level of 95%, this sample size adequately represents the UTM student population.
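The sample-size statement above can be checked with a short calculation. The sketch below is only an illustration: the source does not say which formula was used, so it assumes Cochran's formula with a finite population correction, a worst-case proportion of p = 0.5, and z = 1.96 for 95% confidence. It also treats the intercept sample as if it were a simple random sample, which is an approximation. The function names are hypothetical.

```python
import math


def required_sample_size(population, margin_of_error, z=1.96, p=0.5):
    """Cochran's sample size with finite population correction (assumed formula)."""
    n0 = (z ** 2) * p * (1 - p) / margin_of_error ** 2
    return math.ceil(n0 / (1 + (n0 - 1) / population))


def achieved_margin_of_error(sample, population, z=1.96, p=0.5):
    """Margin of error reached by a given sample, with finite population correction."""
    fpc = math.sqrt((population - sample) / (population - 1))
    return z * math.sqrt(p * (1 - p) / sample) * fpc


# Figures taken from the text: about 24,000 students, 358 completed questionnaires.
needed = required_sample_size(24_000, 0.06)
moe = achieved_margin_of_error(358, 24_000)
print(f"required n at 6% margin: {needed}")
print(f"margin of error with n = 358: {moe:.3f}")
```

Under these assumptions, the 358 completed questionnaires exceed the number required for a 6% margin of error, which is consistent with the claim in the text.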
The present study used a questionnaire that included four main parts: (1) students' attributes, (2) travel information, (3) ride-sourcing-related information, and (4) attitudes of students towards ride-sourcing. The first part contained questions about students' age, gender, race, education level, study mode, household income, and vehicle ownership. Furthermore, this part asked three questions regarding the built environment in which students live. These questions were about the students' residential type and neighbourhood type, as well as neighbourhood facilities within a walkable distance from home. The second part contained 12 questions about the students' daily travel time, distance, and cost, as well as their usual travel mode for different trip purposes, including school/work, shopping, and leisure. The research team asked the respondents to recall their average daily travel distance, time, and cost to get to these destinations during the two weeks before the survey. They were then asked to choose the appropriate response for each item. The third part included five questions on the use of ride-sourcing by the students. The most important question, which was used in our analysis as the target variable, was the ride-sourcing use frequency. The target variable had three response levels: infrequent (once a week), frequent (2–4 times per week), and regular (more than 5 times per week). The fourth part contained six statements that assessed the overall perception of respondents towards each ride-sourcing feature, including time-effectiveness, cost-effectiveness, comfort, safety, and accessibility/availability of smartphone-application-based services, as well as the complementary use of these services with other transport alternatives. Table 1 provides detailed information on the variables used in this study.
Table 1
Variables used in this study.

Variable | Description | Type | Value

Sociodemographic
AGE | Age | Ordinal | (1) 18–29; (2) 30–49; (3) 50–64; (4) 65 years or more
GEN | Gender | Nominal | (1) male; (2) female
EDU | Education | Ordinal | (1) primary; (2) secondary; (3) diploma; (4) bachelor's degree; (5) master's degree; (6) doctorate degree
SMO | Study mode | Nominal | (1) part-time; (2) full-time
INC | Household income | Ordinal | (1) less than RM 1000; (2) between RM 1000 and RM 2000; (3) between RM 2000 and RM 3000; (4) between RM 3000 and RM 6000; (5) between RM 6000 and RM 13,000; (6) more than RM 13,000
RAC | Race | Nominal | (1) Malay; (2) Chinese; (3) Indian; (4) others
VOW | Vehicle ownership | Nominal | (1) car; (2) motorcycle; (3) both car and motorcycle; (4) no

Built environment
RTY | Residential type | Nominal | (1) bungalow; (2) detached/semi-detached; (3) shop houses; (4) flat (non-gated); (5) apartment (gated); (6) condominium (high rises)
NET | Neighbourhood type | Nominal | (1) residential only; (2) residential with some commercial buildings; (3) residential with some industrial facilities; (4) a commercial area with some residential; (5) an industrial area with some residential; (6) mixed residential and commercial
NFW | Neighbourhood facilities in a walkable distance | Nominal | (1) schools; (2) childcare facilities; (3) public transport; (4) taxi station; (5) shops; (6) banks; (7) healthcare facilities; (8) leisure facilities; (9) parks and other open spaces; (10) place of worship; (11) two of facilities; (12) three or more facilities

Trip characteristics
DTD (school/work; leisure, shopping) | Daily travel distance | Ordinal | (1) D ≤ 1 km; (2) 1 km < D ≤ 5 km; (3) 5 km < D ≤ 10 km; (4) D > 10 km
DTT (school/work; leisure, shopping) | Daily travel time | | (1) T ≤ 10 min; (2) 10 min < T ≤ 15 min; (3) 15 min < T ≤ 30 min; (4) T > 30 min
DTC (school/work; leisure, shopping) | Daily travel cost | Ordinal | (1) C ≤ RM 5; (2) RM 5 < C ≤ RM 15; (3) RM 15 < C ≤ RM 25; (4) C > RM 25

Attitudinal factors
TEP | Overall, using application-based taxi services reduces the total travel time (wait + in-vehicle). | Ordinal | (1) strongly disagree; (2) disagree; (3) neutral; (4) agree; (5) strongly agree
CEP | Overall, using application-based taxi services is cost-effective. | Ordinal | (1) strongly disagree; (2) disagree; (3) neutral; (4) agree; (5) strongly agree
COP | Overall, using application-based taxi services is convenient. | Ordinal | (1) strongly disagree; (2) disagree; (3) neutral; (4) agree; (5) strongly agree
SCP | Overall, using application-based taxi services is safe. | Ordinal | (1) strongly disagree; (2) disagree; (3) neutral; (4) agree; (5) strongly agree
AAP | Overall, it is easy to find an application-based taxi service anytime and anywhere. | Ordinal | (1) strongly disagree; (2) disagree; (3) neutral; (4) agree; (5) strongly agree
CUP | Overall, it is easy to complement the application-based taxi services to other transport alternatives. | Ordinal | (1) strongly disagree; (2) disagree; (3) neutral; (4) agree; (5) strongly agree
FR* | Ride-sourcing use frequency | Ordinal | (1) once a week; (2) 2–4 times per week; (3) more than 5 times per week

* Target variable.
4.2. Statistical analysis
The present study analysed the 2018 ride-sourcing use frequency data of UTM students using the Random Forest (RF) technique and Bayesian Network (BN) analysis. To avoid overfitting, the data were divided into a training dataset (70%), a test dataset (20%), and a validation dataset (10%). The training dataset is used to estimate the models' parameters and build the models, and the test dataset is used to determine the models' ability to generalise and their applicability to independent data. The validation dataset provides an unbiased evaluation of the fit of the models built on the training dataset while their hyperparameters are tuned. This study employed the RF and BN techniques using IBM SPSS Modeler version 18. Breiman (2001) developed the RF model, a non-parametric statistical method based on decision trees. The RF aggregates many binary decision trees built with two random perturbation mechanisms: the random choice of a subset of explanatory variables at each node and the use of bootstrap samples. In this study, we employed RF because a single tree structure (or even the final tree structure) may not reveal the importance ranking of variables, which could be entirely masked by other associated inputs (Harb et al., 2009). Recently, transport studies have increasingly employed the RF approach to select the important input variables before applying other statistical models (Harb et al., 2009; Jahangiri et al., 2016; Kitali et al., 2018; Siddiqui et al., 2012; Zhu et al., 2018).
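The partition and the two random perturbation mechanisms described above can be illustrated with a small standard-library sketch. It is not the authors' workflow, which ran in IBM SPSS Modeler. The function names, the random seed, and the square-root rule for the size of the variable subset are assumptions made for illustration, and the variable codes are those of Table 1.

```python
import math
import random

# Predictor codes from Table 1 (trip characteristics omitted for brevity).
FEATURES = ["AGE", "GEN", "EDU", "SMO", "INC", "RAC", "VOW", "RTY",
            "NET", "NFW", "TEP", "CEP", "COP", "SCP", "AAP", "CUP"]


def split_indices(n, train=0.7, test=0.2, seed=42):
    """Shuffle record indices and cut them into train/test/validation parts."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_train = round(n * train)
    n_test = round(n * test)
    return (indices[:n_train],
            indices[n_train:n_train + n_test],
            indices[n_train + n_test:])  # remainder is the validation part


def bootstrap_sample(indices, rng):
    """Mechanism 1: draw as many records as available, with replacement."""
    return [rng.choice(indices) for _ in indices]


def random_feature_subset(features, rng):
    """Mechanism 2: candidate variables for one node (sqrt rule is an assumption)."""
    k = max(1, int(math.sqrt(len(features))))
    return rng.sample(features, k)


rng = random.Random(0)
train_idx, test_idx, val_idx = split_indices(358)
tree_rows = bootstrap_sample(train_idx, rng)          # rows used to grow one tree
node_vars = random_feature_subset(FEATURES, rng)      # variables tried at one node
print(len(train_idx), len(test_idx), len(val_idx), node_vars)
```

Each tree in the forest would repeat the last two steps with fresh random draws, so that no single tree sees the same records or the same candidate variables; averaging over the trees is what yields the importance ranking.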
Since the BN works best with a small set of input variables, the RF was used to reduce the set of predictors. BN analysis is grounded in Bayesian probability theory, which uses joint distributions and the prior distribution of each variable to estimate a posterior distribution for the variables of interest. The present study used the tree augmented Naïve Bayesian network since it effectively models interactions and allows each predictor to depend on another predictor.
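The prior-to-posterior update described above can be shown with a minimal worked example. This is plain Bayes' rule with a single predictor, not the tree augmented Naïve Bayesian structure used in the study. The priors are the class shares reported in the Results section (45.3%, 42.5%, 12.3%); the likelihood values are hypothetical numbers chosen only for illustration.

```python
# Prior shares of the three use-frequency classes, as reported in the Results.
PRIOR = {"infrequent": 0.453, "frequent": 0.425, "regular": 0.123}

# HYPOTHETICAL likelihoods: P(student agrees that ride-sourcing is safe | class).
LIKELIHOOD = {"infrequent": 0.40, "frequent": 0.60, "regular": 0.80}


def posterior(prior, likelihood):
    """Bayes' rule: posterior is proportional to prior times likelihood."""
    joint = {c: prior[c] * likelihood[c] for c in prior}
    evidence = sum(joint.values())  # normalising constant
    return {c: joint[c] / evidence for c in joint}


post = posterior(PRIOR, LIKELIHOOD)
for cls, prob in post.items():
    print(f"P({cls} | agrees on safety) = {prob:.3f}")
```

With these made-up likelihoods, observing agreement on safety shifts probability away from the infrequent class and towards the regular class. A tree augmented network would additionally let each predictor condition on one other predictor rather than on the class alone.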
4.3. Models’ assessment
The proposed RF-BN models were assessed in several ways. First, traditional parametric models, namely ordinal logistic regression models, which are the usual modelling technique for ordinal data, were developed. These ordinal logistic regression models serve as benchmarks for assessing the proposed RF-BN models. However, these types of models rely on assumptions and pre-defined functions, and if these assumptions are violated, the model power can be negatively affected (Chang and Wang, 2006).
Concerning the hybrid modelling approach, both Random Forest and Feature Selection techniques are frequently used for input
selection in hybrid models. Comparing the two helps to identify which of them produces more accurate results
when hybridized with the BN model to predict the ride-sourcing usage frequency. Feature Selection (FS) is one of the most
frequently used techniques in machine learning for input selection (Liu, 2010). This technique aims to reduce the dimensionality of
the data and remove irrelevant inputs. The FS improves the predictive accuracy of machine learning by enhancing the
efficiency of learning and the effectiveness of data collection (Sammut and Webb, 2011). The FS assesses the quality of each variable using the
correlation between the input variables and the target variable and selects the variables with the highest correlations. Typically, the FS
ranks the input variables according to the intrinsic properties of the data and chooses the top k variables according to thresholds.
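The ranking step described above can be sketched as a simple filter. The paper does not state which correlation measure the FS procedure used, so the Pearson coefficient here is an assumption, and the column names and values are hypothetical toy data rather than survey records.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either series is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def select_top_k(columns, target, k):
    """Rank inputs by absolute correlation with the target and keep the top k."""
    scores = {name: abs(pearson(values, target)) for name, values in columns.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k]


# Hypothetical toy inputs and target (not data from the study).
toy_columns = {
    "x_strong": [1, 2, 3, 4, 5],
    "x_weak": [2, 1, 2, 1, 3],
    "x_constant": [3, 3, 3, 3, 3],
}
toy_target = [1, 1, 2, 3, 3]

selected = select_top_k(toy_columns, toy_target, k=2)
print(selected)
```

A filter of this kind scores each input on its own, which is one reason it may select a different set of variables from the RF importance ranking.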
Second, the RF-BN models were tested for robustness through a sensitivity analysis. Finally, the computational cost of the proposed
models was assessed through a twofold comparison between the RF-BN models and models developed by the BN method alone,
with respect to: (1) the time required to run these models and (2) the achieved accuracy of these two types of models.
5. Results
The present study created a dataset comprising the travel information of 358 university students, of which 251
samples were used for training, 72 for testing, and 36 for validation. The dataset contains only participants who had used ride-sourcing
within the two weeks prior to the survey. The target variable of this study is the ride-sourcing usage frequency.
Regarding ride-sourcing travel frequencies, 45.3% of students used ride-sourcing infrequently, 42.5% used it frequently, and
12.3% used it regularly.
Students of all age groups are represented in the participants' sociodemographic profile. However, there were fewer
students in the age group of 50–64 than in the other age groups. A plausible explanation is
that older students may own personal cars or motorcycles and were therefore less likely to be present at the survey locations during the data
collection. Besides, Doctorate students were underrepresented in this survey. One reason for their underrepresentation is
that the majority of these students owned private cars. Additionally, Doctorate students do not regularly attend classes at the
university campus. Finally, many of the Doctorate students prefer to study in places other than the selected locations where the
survey was conducted (e.g. Postgraduate Workspaces at different faculties). Overall, the magnitude and diversity of this survey may
allow the authors to extrapolate the survey trends to other university students. The sociodemographic profile of the participants is
presented in Appendix B.
5.1. Variable selection through the Random Forest (RF) technique
As indicated earlier, Random Forest is an ensemble of unpruned decision trees (CART decision trees) that uses bagging and
bootstrap techniques to predict the target variable, identify the importance of variables, and produce the associated rules.
Using RF, the overall classification accuracies on the training sets for school/work, shopping, and leisure were 95.51%, 98.37%, and
96.73%, respectively. Fig. 3 shows the relative importance of the selected input variables using the RF for the trip purposes of school/
work, leisure, and shopping. The RF models selected the 10 most important variables for each trip purpose. For the school/work purpose,
the most important factors were students’ household income and neighbourhood facilities in a walkable distance (0.13 each).
Neighbourhood type was the most important variable for leisure trips (0.16). For the shopping trip purpose, the usual travel mode for shopping
had the highest importance score (0.17).
5.2. Modelling the ride-sourcing use frequency through the Bayesian network analysis
The ten predictors of the use of ride-sourcing that were selected using the RF algorithm (Fig. 3) for each travel purpose (a total of
30 predictors) were included as predictors of the target (i.e., ride-sourcing usage frequency) in the Bayesian network analysis. The
accuracies of the BN models of school/work, shopping, and leisure are 71.43%, 74.69%, and 85.31%, respectively (for the training
set), which are acceptable values. Generally, a greater number of training samples increases the performance of machine learning
techniques. In the present study, 358 samples were used to run the RF and RF-BN models. The accuracies achieved showed that this
number of samples can produce models with quite acceptable accuracies. A BN is a technique that graphically models
probabilities and represents variables (known as nodes) in an acyclic network. The acyclic network denotes the conditional or probabilistic
relationships among nodes, expressed by the links within the network (also referred to as arcs). Technically, a BN model includes an
[Fig. 3: bar chart. Vertical axis: importance (0 to 0.18). Horizontal axis: predictors AGE, CEP, COP, DTD, DTT, EDU, GEN, INC, NET, NFW, RAC, RTY, SCP, TEP, UTM. Ten scores are listed per series. Work/school: 0.08, 0.1, 0.08, 0.08, 0.13, 0.09, 0.13, 0.09, 0.1, 0.11. Leisure: 0.09, 0.07, 0.09, 0.16, 0.14, 0.07, 0, 0.07, 0.08, 0.13. Shopping: 0.11, 0.08, 0.07, 0.06, 0.13, 0.1, 0.1, 0.11, 0.08, 0.17.]
Fig. 3. Normalised score of predictors’ importance for getting to work/school, shop, and leisure by ride-sourcing.
acyclic directed graph made up of nodes and edges, along with a table for each node that gives its conditional probability for every
combination of values of its parent nodes. The resulting network graph is displayed in Fig. 4. The graph indicates the association between the
target variable and its predictors. Three BN models were developed, one for each travel purpose of the students: school/work, leisure,
and shopping. In the tree augmented naïve Bayesian network, each predictor (e.g. age, education level, and race) has the target variable (i.e.
ride-sourcing use frequency) as a parent and can have one other predictor as a parent. Each network comprises eleven nodes, one for the
target and ten for the predictors. The network also displays the relationships between the predictors. The predictors' importance is
highlighted in the graphical models: the darker a node, the closer its relationship to the ride-sourcing use frequency of university
students. The most important predictors of ride-sourcing usage frequency, shown in the darkest colours, were safety
perception towards ride-sourcing (0.32) and daily travel time (0.2) for leisure trips and neighbourhood facilities in a walkable distance
(0.21) for shopping-related trips. Additionally, students' age (0.15) was the most important predictor for
school/work trips. As these four predictors were identified as the key determinants (importance of about 0.2 or greater for the shopping and
leisure purposes) of ride-sourcing usage frequency, the four related relationships are discussed further. As indicated earlier, the
BN model produces a conditional probability table for each node. For all nodes, the BN model computes the joint probability distribution
as the product of these conditional probabilities, each conditioned on the values of the node's parents. The table's columns
correspond to the values of a predictor, while each row corresponds to a combination of values of the target variable and the parent predictor.
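The layout of such a conditional probability table can be illustrated with a short sketch. It estimates the table by plain relative frequencies, which is an assumption: the paper does not say whether the software applied any smoothing. The records are hypothetical toy values that reuse the variable codes FR, RAC, and DTT.

```python
from collections import Counter, defaultdict


def conditional_probability_table(records, child, parents):
    """Estimate P(child | parents) as relative frequencies within each parent combination."""
    counts = defaultdict(Counter)
    for record in records:
        parent_values = tuple(record[p] for p in parents)
        counts[parent_values][record[child]] += 1
    table = {}
    for parent_values, counter in counts.items():
        total = sum(counter.values())
        table[parent_values] = {value: n / total for value, n in counter.items()}
    return table


# Hypothetical toy records (not survey data): FR = target, RAC = parent predictor.
records = [
    {"FR": 1, "RAC": 1, "DTT": 1},
    {"FR": 1, "RAC": 1, "DTT": 1},
    {"FR": 1, "RAC": 1, "DTT": 2},
    {"FR": 1, "RAC": 1, "DTT": 3},
    {"FR": 2, "RAC": 1, "DTT": 1},
    {"FR": 2, "RAC": 1, "DTT": 3},
]

# One row per (RAC, FR) combination, one column per DTT value.
cpt = conditional_probability_table(records, child="DTT", parents=("RAC", "FR"))
for row, probabilities in sorted(cpt.items()):
    print(row, probabilities)
```

Under this estimator, a row built from only a few records can contain entries of exactly 0 or 1, which is worth keeping in mind when reading sparse rows of such tables.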
[Fig. 4: network graphs in three panels (school/work, leisure, shopping); node shading encodes importance on a scale from 1.0 to 0.0.]
Fig. 4. The Bayesian network model for getting to school/work, shopping, and leisure by ride-sourcing.
Table 2
Students’ age/ride-sourcing use frequency for school/work conditional probabilities.
FR    Probability (students’ age)
      1       2       3
1     0.52    0.45    0.04
2     0.58    0.42    0
3     0.6     0.4     0
FR = 1: infrequent; 2: frequent; 3: regular.
Table 2 shows the conditional probability of each value of students' age across all ride-sourcing usage frequencies. The conditional
probabilities of students’ age suggest that, for the age group between 18 and 29, regular use of ride-sourcing to get to school/work
was more probable than infrequent and frequent use. On the other hand, regular use of ride-sourcing was the least
likely for the age ranges of 30–49 and 50–64.
Table 3 summarizes the conditional probabilities of neighbourhood facilities in a walkable distance, taking into consideration the
influence of the students' gender. The infrequent and frequent use of ride-sourcing for shopping trips had the same probability for
male and female students, especially for those with more than three facilities within a walkable distance of their
home. Besides, among students with more than three facilities within a walkable distance of their home, regular use of ride-sourcing
for shopping trips was more likely among male students than among female students.
Table 4 displays the conditional probabilities of students' safety perception towards ride-sourcing, considering the influence of
students' residential type. While the regular use of ride-sourcing was not probable for students who lived in condominiums
(value = 6), the infrequent and frequent use of ride-sourcing was likely regardless of the students' residential type. Overall, and as
expected, students who had a positive attitude towards the safety of ride-sourcing were more likely to use ride-sourcing.
Table 5 shows the conditional probabilities of students’ daily travel time for leisure, taking into consideration the influence of
students’ race. The infrequent use of ride-sourcing had the highest probability (1.00) for Indian students with a daily travel
time of less than 10 min. For students of the same race and daily travel time, the frequent use of ride-sourcing had a likelihood
of 0.92. If the student's race is Indian or Chinese and the student's daily travel time is less than 10 min, the model's prediction of the
frequency of use is likely to be regular (probability = 1.00).
5.3. Cause and effect relationship
In the previous sections, the authors developed multiple BN models using the Tree Augmented Naïve Bayes (TAN) classifier, which
has better classification accuracy than Naïve Bayes (NB) and the general Bayesian network (GBN). However, the graph
created using the TAN shows only the inputs that are suitable for predicting the target and is not sufficient for understanding cause
and effect relationships. Although the main aim of this study is to predict the usage frequency of ride-sourcing and identify its most
important predictors, the Markov Blanket structure was applied to conduct a cause and effect analysis. Many previous studies have confirmed
the suitability of this structure for analyzing cause and effect relationships (Bui and Jun, 2012; Gao and Ji, 2015; Pearl, 2000;
Pellet and Elisseeff, 2008).
Fig. 5 shows the cause and effect diagrams for the work/school, leisure, and shopping trips of students. For school/work and shopping
purposes, neighbourhood facilities in a walkable distance (NFW) affected the ride-sourcing usage frequency of university students, while the
attitudes of students towards the safety of ride-sourcing (SCP) and neighbourhood facilities in a walkable distance (NFW) influenced
the frequency of ride-sourcing usage for leisure. The effect of NFW on the usage frequency of ride-sourcing makes sense because the
availability of facilities near the students’ residential area may encourage them to use non-motorized modes such as walking and
cycling rather than private vehicles and ride-sourcing. Attitudes towards the safety of any transport mode strongly influence the
Table 3
Neighbourhood facilities in a walkable distance/students’ gender for shopping conditional probabilities.
GEN  FR   Probability (neighbourhood facilities in a walkable distance)
          1     2     3     4     5     6     7     8     9     10    11    12
1    1    0     0.1   0.2   0     0     0     0     0     0     0     0.1   0.6
1    2    0.1   0     0     0     0.1   0.1   0     0     0     0     0.1   0.6
1    3    0.1   0.1   0     0     0     0.1   0     0.1   0.2   0     0.1   0.3
2    1    0.2   0.1   0.1   0     0.1   0     0     0     0     0     0.1   0.4
2    2    0.2   0     0     0     0.2   0.1   0     0     0     0     0     0.4
Table 4
Safety perception towards ride-sourcing/students’ residential type for leisure conditional probabilities.
RTY  FR   Probability (“Overall, using application-based taxi services is safe”)
          1      2      3      4      5
1    1    0      0.25   0.38   0.38   0
1    2    0.11   0      0.47   0.42   0
1    3    0      0      0      1      0
2    1    0      0.06   0.38   0.47   0.09
2    2    0      0.07   0.23   0.65   0.05
2    3    0.22   0.06   0.67   0.06   0
3    1    0      0      1      0      0
3    2    0      0      0.5    0.25   0.25
3    3    0      0      1      0      0
4    1    0      0      0.54   0.38   0.08
4    2    0      0      0.3    0.44   0.26
4    3    0      0.33   0.17   0.5    0
5    1    0      0      0.1    0.9    0
5    2    0      0      0.6    0.4    0
5    3    0      0      0      1      0
6    1    0      0      0      1      0
6    2    0      0      0      0      1
Table 5
Daily travel time/students’ race for leisure conditional probabilities.
RAC  FR   Probability (daily travel time)
          1      2      3      4
1    1    0.79   0.11   0.11   0
1    2    0.66   0.09   0.25   0
1    3    0.56   0.19   0.25   0
2    1    0.73   0.14   0.09   0.05
2    2    0.7    0.03   0.27   0
2    3    1      0      0      0
3    1    1      0      0      0
3    2    0.92   0.08   0      0
3    3    1      0      0      0
4    1    0.18   0.09   0.45   0.27
4    2    0.5    0      0.5    0
4    3    0.33   0.33   0      0.33
[Fig. 5: three panels: a. School/work; b. Leisure; c. Shopping.]
Fig. 5. Cause and effect structure based on the BN model and Markov Blanket structure.
Table 6
Fitting information for ordinal logistic regression models.
Model               −2 Log Likelihood   Chi-Square   df   Sig.
Leisure
  Intercept Only    701.812
  Final             662.595             39.217       20   0.006
Shopping
  Final             652.312             49.500       20   0.000
Work/school
  Final             649.654             52.158       20   0.000
Link function: Logit.
probability of that mode being used. Thus, the effect of SCP on the ride-sourcing usage frequency seems sensible.
6. Characteristics of RF-BN models for the probabilistic prediction of ride-sourcing usage frequency
6.1. Comparison of the proposed RF-BN models with other models
First, ordinal logistic regression models, which are the usual modelling technique for ordinal data, were developed. Tables 6 and 7 show
that all ordinal logistic regression models are significant, and Table 8 shows the Pseudo R-Square values for these models. However,
the parameter estimates in Table 9 show that few independent variables are significant. In addition, the test of parallel lines in Table 10
shows that the assumption of proportional odds must be rejected for these models because the null hypothesis is rejected. Most
traditional parametric modelling techniques, such as ordinal logistic regression, describe the relationship between the independent and
dependent variables through assumptions and pre-defined functions, and the model power
can be affected negatively if these assumptions are violated. Therefore, assumption-free models like the proposed RF-BN models can
be used to avoid this limitation.
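The fit statistics in Tables 6 and 8 can be reproduced from the reported −2 log likelihood values using the standard definitions of the likelihood-ratio chi-square and the Cox and Snell, Nagelkerke, and McFadden pseudo R-squares. The sketch below is a check under two assumptions: that the intercept-only value listed under Leisure in Table 6 applies to all three models (the reported chi-squares are consistent with this), and that the sample size is the 358 respondents stated in the Results section. The function and variable names are illustrative.

```python
from math import exp

N = 358                 # sample size reported in the Results section
NEG2LL_NULL = 701.812   # intercept-only -2 log likelihood (Table 6)
NEG2LL_FINAL = {        # final-model -2 log likelihood (Table 6)
    "leisure": 662.595,
    "shopping": 652.312,
    "work_school": 649.654,
}


def fit_statistics(neg2ll_null, neg2ll_final, n):
    """Likelihood-ratio chi-square and three pseudo R-squares from -2LL values."""
    chi_square = neg2ll_null - neg2ll_final
    cox_snell = 1 - exp(-chi_square / n)
    nagelkerke = cox_snell / (1 - exp(-neg2ll_null / n))
    mcfadden = 1 - neg2ll_final / neg2ll_null
    return {
        "chi_square": round(chi_square, 3),
        "cox_snell": round(cox_snell, 3),
        "nagelkerke": round(nagelkerke, 3),
        "mcfadden": round(mcfadden, 3),
    }


results = {name: fit_statistics(NEG2LL_NULL, value, N)
           for name, value in NEG2LL_FINAL.items()}
for name, stats in results.items():
    print(name, stats)
```

The rounded outputs match the chi-square column of Table 6 and the entries of Table 8, which all indicate that the regression models explain only a small share of the variation in usage frequency.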
As pointed out earlier, the authors compared the accuracy of the proposed RF-BN models with that of hybrid models
developed by the Feature Selection and Bayesian Network (FS-BN) methods. To develop the FS-BN models, 18 input variables were used.
The FS selected five variables (NFW, SCP, CUP, VO, and INC) for the school/work model, six variables (NFW, SCP, CUP, VO, INC, and
DTDSHOPPING) for the shopping model, and six variables (NFW, SCP, CUP, VO, UTMLEISURE, and INC) for the leisure model. These
selected variables were used as the inputs to develop the BN models for each trip purpose. Table 11 shows the accuracies achieved for
each FS-BN model and compares these accuracies with those of the models developed by RF-BN. It can be seen that the RF-BN models
performed slightly better than the FS-BN models for school/work and leisure.
6.2. RF-BN models’ robustness
The present study conducted a sensitivity analysis to assess the proposed models’ robustness. Two approaches were used to
analyze the models’ sensitivity. The first approach excluded the most important factor in each model and then reran
the model. The second approach substituted the most important factor in each model with another factor
that was excluded from the main analysis and then reran the model. It is important to note that this substitution was done among
variables of the same type. For instance, for the school/work model, age was substituted one-by-one with gender,
vehicle ownership, and employment mode; for leisure, the overall attitude towards the safety of ride-sourcing (SCP) was substituted
one-by-one with the overall attitude towards the cost-effectiveness of ride-sourcing (CEP), the overall attitude towards the
Table 7
Goodness-of-fit tests for ordinal logistic regression model.
               Chi-Square   df    Sig.
Leisure
  Pearson      695.870      322   0.000
  Deviance     662.595      322   0.000
Shopping
  Pearson      708.595      324   0.000
  Deviance     652.312      324   0.000
Work/school
  Pearson      706.583      324   0.000
  Deviance     649.654      324   0.000
Table 8
Pseudo R-Squares for ordinal logistic regression model.
                 Leisure   Shopping   Work/school
Cox and Snell    0.104     0.129      0.136
Nagelkerke       0.121     0.150      0.158
McFadden         0.056     0.071      0.074
Table 9
Significant parameters for ordinal logistic regression models.
                             Estimate   Std. Error   Wald     df   Sig.    95% Confidence Interval
                                                                           Lower Bound   Upper Bound
Leisure
  Threshold  [FR = 1.00]     −4.188     1.407        8.866    1    0.003   −6.945        −1.431
             [FR = 2.00]     −1.846     1.391        1.762    1    0.184   −4.572        0.879
  Location   VO              −0.822     0.320        6.613    1    0.010   −1.448        −0.195
             SCP             −0.438     0.157        7.787    1    0.005   −0.746        −0.130
             CUP             −0.463     0.156        8.771    1    0.003   −0.769        −0.157
Shopping
  Threshold  [FR = 1.00]     −5.197     1.484        12.259   1    0.000   −8.106        −2.288
             [FR = 2.00]     −2.800     1.463        3.664    1    0.056   −5.668        0.067
  Location   I1              0.260      0.113        5.349    1    0.021   0.040         0.481
             VO              −0.957     0.313        9.365    1    0.002   −1.570        −0.344
             SCP             −0.370     0.153        5.798    1    0.016   −0.670        −0.069
             CUP             −0.558     0.159        12.276   1    0.000   −0.870        −0.246
             DTCSHOPPING     −0.372     0.145        6.548    1    0.011   −0.657        −0.087
Work/school
  Threshold  [FR = 1.00]     −5.191     1.420        13.357   1    0.000   −7.975        −2.407
             [FR = 2.00]     −2.783     1.399        3.957    1    0.047   −5.524        −0.041
  Location   VO              −1.065     0.330        10.431   1    0.001   −1.711        −0.419
             SCP             −0.379     0.158        5.777    1    0.016   −0.688        −0.070
             CUP             −0.534     0.157        11.500   1    0.001   −0.842        −0.225
             DTTSCHOOL       0.427      0.152        7.867    1    0.005   0.129         0.726
             DTCSCHOOL       −0.611     0.174        12.378   1    0.000   −0.951        −0.270
Table 10
Test of Parallel Lines for ordinal logistic regression models.
Model               −2 Log Likelihood   Chi-Square   df   Sig.
Leisure
  Null Hypothesis   662.595
  General           593.135             69.460       20   0.000
Shopping
  General           573.528             78.783       20   0.000
Work/school
  General           593.029             56.626       20   0.000
accessibility and availability of ride-sourcing (AAP), and the overall attitude towards the complementarity of ride-sourcing to other
transport options (CUP); for shopping, the substitution approach could not be used because all built environment variables had already
been used in the main analysis. For the exclusion approach, age was excluded in the school/work analysis, the overall attitude towards the
safety of ride-sourcing (SCP) in the leisure analysis, and neighbourhood facilities in a walkable distance (NFW) in the shopping analysis. The
results of the modified models changed only slightly when these variables were excluded or substituted (Table 12).
6.3. RF-BN models’ computational cost
To assess the computational cost of the proposed models, the authors compared the time required to run the RF-BN models and
Table 11
Prediction capability of the proposed RF-BN models compared with the models developed by FS-BN.
                            RF-BN models                               FS-BN models
Trip purpose   Partition    Correct         Wrong          Total N     Correct         Wrong          Total N
                            N      %        N      %                   N      %        N      %
School/Work    Training     175    71.43    70     28.57   245         171    69.8     74     30.2    245
               Testing      34     55.74    27     44.26   61          33     54.1     28     45.9    61
               Validation   29     55.77    23     44.23   52          26     50       26     50      52
Shopping       Training     183    74.69    62     25.31   245         180    73.47    65     26.53   245
               Testing      37     60.66    24     39.34   61          39     63.93    22     36.07   61
               Validation   28     53.85    24     46.15   52          31     59.62    21     40.38   52
Leisure        Training     209    85.31    36     14.69   245         209    85.31    36     14.69   245
               Testing      46     75.41    15     24.59   61          44     72.13    17     27.87   61
               Validation   37     71.15    15     28.85   52          33     63.46    19     36.54   52
Table 12
Correct predictions of main models and modified models.
Model                                               Number of input variables   Correct predictions (%)
                                                                                Train    Test     Validation
Main school/work model                              10                          71.43    55.74    55.77
School/Work model-AGE was excluded                  9                           78.78    59.02    55.77
School/work model-AGE was substituted with GEN      10                          74.29    67.21    59.62
School/work model-AGE was substituted with VO       10                          77.55    62.30    57.69
School/work model-AGE was substituted with EM       10                          76.33    67.21    61.54
Main shopping model                                 10                          74.69    60.66    53.85
Shopping model-NFW was excluded                     9                           72.24    59.02    59.62
Main leisure model                                  10                          85.31    75.41    71.15
Leisure model-SCP was excluded                      9                           79.18    70.49    63.46
Leisure model-SCP was substituted with CEP          10                          86.94    75.41    75.00
Leisure model-SCP was substituted with AAP          10                          86.12    73.77    65.38
Leisure model-SCP was substituted with CUP          10                          84.90    75.41    69.23
the models developed by the BN method alone. The accuracies of the two types of models were also compared. The authors observed that
the total time required to run the RF-BN approach for all models was almost 28 s (26 s for RF + 2 s for BN). As pointed out earlier, the training
accuracies of these models were 71.43%, 74.69%, and 85.31% for school/work, shopping, and leisure, respectively. For the models
developed by the BN method alone, the authors did not use the RF technique for input selection and ran the BN models using 18 input
variables. The time required to run each BN model was three (3) seconds; however, the accuracies of
the models on the training data set decreased to 56.45% for school/work, 58.63% for shopping, and 69.03% for leisure. In
cost-benefit terms, the proposed approach (RF-BN) increased the time per model almost nine-fold but increased the accuracy by
over 21% on the school/work model, 22% on the shopping model, and 19% on the leisure model.
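The time and accuracy figures quoted above can be set side by side with a few lines of arithmetic. The sketch below reports the time ratio and the accuracy gains in percentage points; the text does not state the basis on which its percentage increases were computed, so the percentage-point view is an assumption and the two sets of figures need not coincide. The variable names are illustrative.

```python
# Training accuracies (%) quoted in the text for each trip purpose.
RF_BN_ACCURACY = {"school_work": 71.43, "shopping": 74.69, "leisure": 85.31}
BN_ONLY_ACCURACY = {"school_work": 56.45, "shopping": 58.63, "leisure": 69.03}

RF_BN_SECONDS = 26 + 2   # 26 s for RF plus 2 s for BN, as quoted
BN_ONLY_SECONDS = 3      # time quoted for each BN-only model

# How many times longer the hybrid approach takes.
time_ratio = RF_BN_SECONDS / BN_ONLY_SECONDS

# Accuracy gain of RF-BN over BN alone, in percentage points.
gain_points = {
    purpose: round(RF_BN_ACCURACY[purpose] - BN_ONLY_ACCURACY[purpose], 2)
    for purpose in RF_BN_ACCURACY
}

print(round(time_ratio, 2), gain_points)
```

On these figures the hybrid approach takes a little over nine times as long and gains roughly 15 to 16 percentage points of training accuracy for each trip purpose.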
7. Discussion
The main objective of this study is to identify the student attributes, built environment attributes, and attitudes towards
ride-sourcing that influence ride-sourcing usage frequency for different trip purposes. The BN analysis showed that students’ age (AGE) for school/work trips;
safety perception towards ride-sourcing (SCP) and daily travel time (DTT) for leisure trips; and neighbourhood facilities in a walkable
distance (NFW) for shopping trips were the most important factors in the use of ride-sourcing by university students.
As expected and consistent with previous studies, age is one of the most influential factors in mode choice and travel
frequency for both students and the general population (Davison et al., 2015; Delmelle and Delmelle, 2012; Wang et al., 2013; Zhou,
2012). It must be noted that age was identified as the most important factor exclusively for the school/work trips of students. According
to the BN model, the probabilities of the infrequent, frequent, and regular use of ride-sourcing were higher for students
in the age range of 18–29 years than for those in the age clusters of 30–49 and 50–64. This implies that this age range is the
most influential for the use of ride-sourcing. One notable explanation for this result is that the majority of our respondents (55.9%)
are in the age cluster of 18–29 years, which is consistent with the fact that the majority of UTM students belong to this age range.
On the other hand, students in the age range of 50–64 were underrepresented in our dataset, which may explain the
low probabilities of ride-sourcing use for this age range. Typically, older students may have a better socioeconomic position than young
students, can afford private vehicles, and do not need to rely on other modes of transport (Davison et al., 2015; Habib et al.,
2018; Kamruzzaman et al., 2011; Klöckner and Friedrichsmeier, 2011; Mitra and Nash, 2018; Zhou, 2016). This
likely explains the higher probability of ride-sourcing usage among the younger students.
The BN model for leisure usage showed that safety perception towards ride-sourcing (SCP) is the most important factor for using
this mode. This finding is consistent with the literature (Jomnonkwao et al., 2016; Mbara and Celliers, 2013; Mitra and Nash, 2018;
Namgung and Akar, 2015; Nguyen-Phuoc et al., 2018; Rybarczyk and Gallagher, 2014; Sam et al., 2014; Wang and Liu, 2015), which
showed the importance of students' safety perception of different modes. The model also showed that two associated
factors are essential for predicting the ride-sourcing usage frequency for leisure: safety perception towards ride-sourcing and
students’ residential type. To our knowledge, this association has not been reported in the literature. Regular use of
ride-sourcing occurs with a probability of 1.00 (perfect probability) when a student lives in a bungalow or apartment and has a positive
attitude towards the safety of ride-sourcing. This also occurs when a student lives in a shop-house and has a neutral perception
of ride-sourcing. If a student lives in a condominium and has a positive attitude towards the safety of ride-sourcing, the BN
model predicts the likelihood of usage frequency of ride-sourcing to be perfect (1.00). Infrequent use of ride-sourcing occurs with a
probability of 1.00 when a student lives in a shop-house or condominium and has a positive or neutral attitude towards the safety of
ride-sourcing, respectively. The findings of this model suggest that, to increase the probability of ride-sourcing use,
students should have a positive attitude towards ride-sourcing. Besides, living in a bungalow or a gated apartment may
increase ride-sourcing usage frequency. Gated apartments typically have limited parking
spaces, and large families cannot own or keep more than one car per apartment at the same time. Thus, the students have to use
alternative modes. However, only a few studies have investigated the relationship between the dwelling unit and mode choices (Asgari
et al., 2017; Cai et al., 2019; Heinen and Chatterjee, 2015; Kaplan et al., 2016; Ledsham et al., 2017). Consequently, it will be
valuable for future researchers to examine the link between dwelling type and the travel mode choice of both the general population and
students, particularly for ride-sourcing, to confirm the transferability of this study.
Daily travel time to leisure destinations was the second most important factor identified via the BN model for leisure. This
is consistent with the findings of Wang and Liu (2015) and Shannon et al. (2006), who identified daily travel time as an influential
factor for travel mode choices among the university population. To predict the ride-sourcing usage frequency for leisure, the model
identified an association between two factors: daily travel time to leisure destinations and students’ race. Although several studies have
examined the influence of students’ race and of their daily travel time on their mode choices, none has investigated the effect
of these two factors on students’ mode choices at the same time. This association revealed that students who spend 10 min or less
daily to get to leisure places and whose race is Chinese or Indian are likely to be regular users of ride-sourcing (with perfect
probability). Besides, if the students are Indian and spend 10 min or less daily to get to leisure destinations, the BN model predicts that the
likelihood of infrequent ride-sourcing use will be perfect (1.00). Thus, it could be inferred that a daily travel time of 10 min or less
and Indian race are the most influential in the use of ride-sourcing for leisure trips. A possible explanation for these
findings is that most of the Indian students who participated in the survey are from low-income families and may not own a car.
Thus, they have to use alternative modes such as ride-sourcing. Besides, for a leisure trip that takes less than 10 min,
students may prefer to use ride-sourcing, as this may help them save fuel and parking costs for a short trip. On the other hand,
for the return from a distant leisure destination, ride-sourcing might not offer an available and flexible option. Thus, for long trips,
students may prefer to use other options such as a bus, a normal taxi, a friend’s vehicle, or even their own vehicle.
Neighbourhood facilities in a walkable distance were identified as the most important factor influencing the ride-sourcing usage
frequency for students’ shopping trips. This finding is consistent with previous studies (Cao et al., 2009; Chatman, 2003; Du et al., 2019; Ermagun and
Levinson, 2017; Kim and Ulfarsson, 2008; Loo and Siiba, 2019; Manaugh and El-Geneidy, 2013). These studies highlighted the
importance of mixed land use for mode choices, especially walking and cycling. The model also showed that
neighbourhood facilities in a walkable distance and students’ gender are associated, and this association was useful for predicting the
ride-sourcing usage frequency for shopping. To the authors’ knowledge, no previous study has investigated the simultaneous effect of these two
factors on transport mode choices or on the usage frequency of a specific mode. This study did not observe a perfect conditional probability
between these factors and ride-sourcing usage frequency. However, the results showed that the availability of three or
more facilities within a walkable distance of students’ homes was the most influential on the use of ride-sourcing by the students.
With respect to students’ gender, if a student is male and lives in a neighbourhood that has three or more facilities within a walkable
distance of his home, the likelihood of regular ride-sourcing usage is much higher than for a female student.
Among other studies that investigated the use of ride-sourcing by university students, Tarabay and
Abou-Zeid (2019) found that the most important factors motivating students to switch from traditional modes to ride-sourcing for social/
recreational trips were door-to-door travel time, waiting time for pick-up, and one-way fares. In Marten (2015), the cost of transportation,
the duration of transportation, and previous travel behaviour were influential in determining the demand for Uber. However, the
effect of the sociodemographic profile and built environment attributes on the use of ride-sourcing was not investigated in Tarabay and
Abou-Zeid (2019). In Marten (2015), the effect of built environment attributes on the use of ride-sourcing was similarly not
investigated. Thus, investigating the effect of a wide range of sociodemographic, built environment, and mode-specific
factors on the use of ride-sourcing among university students also contributes to the body of ride-sourcing knowledge.
7.1. Implications for policy-making
To recommend policy formulations, it is crucially important to answer both causal and predictive questions. For instance, an
urban transportation policy-maker who wishes to take action against traffic congestion may need to know whether the use of active
modes, such as walking, cycling, and public transportation, will yield the desired results. This is an example of a purely causal
question. But other active transportation-related decisions, including whether it is necessary to construct infrastructure that supports
the active modes’ usage in those areas where it does not exist, whether it is necessary to improve the existing infrastructure, and
how to design the infrastructure that needs to be created, only need a reliable prediction of the probability of active modes’ usage. This
simple example reflects the notion of a “prediction policy problem” reported by Kleinberg et al. (2015).
Transportation researchers can use machine learning techniques to discover hidden patterns in urban trips and travel behavior, as
well as to make predictions. These findings and predictions could help decision-makers find optimal solutions that make
transportation systems more reliable and efficient. Machine learning algorithms possess the capability to read individuals' travel
records, identify their travel patterns, and recommend future travel behaviors. Despite the considerable advantages of machine
learning techniques, urban transportation researchers and policy-makers must overcome the multi-faceted challenges they encounter.
For instance, data availability is one of the main challenges in developing machine learning models. Policy-makers
can help researchers surmount this difficulty by promoting open-data initiatives. This study also acknowledges that
ride-sourcing companies possess large amounts of data, and that access to these data could help identify the right
policy options for these companies. Therefore, policy-makers should strive to encourage efficient information exchange, as this could
yield more reliable results, and companies could set their policies and strategies in light of robust findings and
predictions, for business prosperity and survival. The proposed RF-BN model has several advantages that make it
useful to transport and ride-sourcing policy- and decision-makers. Firstly, the BN model can be interpreted by
policy-makers, as the relationships between variables are clearly denoted by a graph and a number of tables. Secondly, the proposed
model is sufficiently robust against changes in the dataset. Thus, decision-makers can easily modify any input variable to
observe the possible changes in the results. Thirdly, the model can be reused later, and new knowledge can be added simply by
updating the frequency tables of each input variable. Lastly, the RF-BN model provides ride-sourcing decision-makers with the most
important factors for each type of trip. This ensures that decision- and policy-makers consider, analyse, and evaluate all relevant and
important factors.
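The idea of adding new knowledge by updating frequency tables can be illustrated with a minimal sketch. It assumes a single discrete target node (use frequency) with one discrete parent (age group), stores raw counts, and derives conditional probabilities as relative frequencies. The class name `FrequencyTable`, the state labels, and the example counts are hypothetical; this is not the authors' software or the exact procedure used in the study.

```python
class FrequencyTable:
    """Count-based conditional table for one discrete node (hypothetical sketch).

    counts[parent_state][child_state] holds the number of observations.
    """

    def __init__(self, child_states):
        self.child_states = list(child_states)
        self.counts = {}

    def update(self, observations):
        """Add new (parent_state, child_state) observations to the counts."""
        for parent_state, child_state in observations:
            if child_state not in self.child_states:
                raise ValueError(f"unknown child state: {child_state!r}")
            # Create an all-zero row the first time a parent state is seen.
            row = self.counts.setdefault(
                parent_state, {s: 0 for s in self.child_states}
            )
            row[child_state] += 1

    def probabilities(self, parent_state):
        """Return P(child | parent_state) as relative frequencies."""
        row = self.counts.get(parent_state)
        if row is None:
            raise KeyError(f"no observations for parent state {parent_state!r}")
        total = sum(row.values())
        return {state: count / total for state, count in row.items()}


# Illustrative (made-up) observations: age group -> use frequency.
table = FrequencyTable(["infrequent", "frequent", "regular"])
table.update([("18-29", "infrequent")] * 3 + [("18-29", "frequent")])
before = table.probabilities("18-29")

# New survey responses arrive; only the counts need to change.
table.update([("18-29", "frequent")] * 4)
after = table.probabilities("18-29")
```

Keeping counts rather than probabilities is the design choice that makes the update cheap: new responses increment the relevant cells, and the probabilities are recomputed on demand.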
7.2. Limitations
This study has several limitations. Firstly, this study used an intercept data collection
method; as with all intercept surveys, the sample was not entirely representative of either ride-sourcing commuters or market share.
Secondly, feedback from drivers revealed that many students use ride-sourcing for their trips to bus terminals and the airport. However,
our survey did not capture these trips. Thirdly, students did not represent all ride-sourcing users in Johor Bahru or Malaysia. Fourthly,
our survey oversampled students who were likely to be at the potential survey locations in the mornings. Fifthly, this study used
self-reported data on ride-sourcing usage. Further studies can complement self-reports with trip observations.
The sixth limitation is the authors' selection of the six survey locations. The research
team selected these locations based only on observations and ride-sourcing drivers’ feedback; if ride-sourcing data had been easily
accessible during the data collection, the selection of the potential locations could have been more robust and the team could have identified more
potential locations. In addition, large universities like UTM may have many faculties and schools that have the potential to be
identified as pick-up or drop-off points. Thus, further studies may consider more locations by integrating data from
observations, personal feedback, and ride-sourcing companies. Seventhly, while this study achieved desirable accuracies for the RF and
RF-BN models, future studies may run these models using a larger sample size to achieve better
accuracies. Finally, our survey was conducted among university students in a developing country where the majority of
people have their own private vehicles and the overall condition of public transport and active transport infrastructure is not
desirable. Thus, caution is needed when applying the results to make inferences in developed countries.
8. Conclusion and recommendations
The recent emergence of ride-sourcing services triggered a large debate on their role in urban transport. This study employed the
Random Forest technique and Bayesian network analysis to identify the predictors of the ride-sourcing usage frequency. From the RF
analysis results, the most important predictors for school/work trips were age, residential type, attitudes towards the cost-effec-
tiveness of ride-sourcing, usual travel mode to get to shopping/work, household income, neighbourhood type, attitudes towards the
comfort of ride-sourcing, race, neighbourhood facilities in a walkable distance, and education level. The most important predictors
for leisure trips were attitudes towards the safety of ride-sourcing, daily travel time to leisure places, neighbourhood facilities
in a walkable distance, race, neighbourhood type, income level, attitudes towards the comfort of ride-sourcing, usual travel mode to
leisure, attitudes towards the time-effectiveness of ride-sourcing, and residential type. The most important predictors for shopping
trips were neighbourhood facilities in a walkable distance, attitudes towards the comfort of ride-sourcing, education level, neigh-
bourhood type, household income, race, residential type, usual travel mode to shopping, and gender. These 30 input variables were
included as predictors of the target (i.e., the ride-sourcing use frequency for school/work, leisure, and shopping trips) in BN analysis.
By applying BN to these 30 predictors, students’ age emerged as the most important predictor of ride-sourcing use frequency for
school/work trips; attitudes towards the safety of ride-sourcing and daily travel time to get to leisure emerged as the most important
predictors of ride-sourcing use frequency for leisure trips; and neighbourhood facilities in a walkable distance emerged as the most
important predictor of ride-sourcing use frequency for shopping trips.
Although it is challenging for ride-sourcing companies to consider all of the students’ trip purposes, the following
recommendations might help them to effectively meet the students’ usage needs. Taking cognizance of the students’ age, the companies
might offer more attractive options to older students to motivate them to use ride-sourcing. Mode safety is among the most
important factors that directly influence people’s mode choices. Thus, it is of paramount importance that ride-sourcing companies
ensure that all ride-sourcing users feel safe while using this mode. This can be achieved by allowing passengers to assess whether
the drivers treat different populations, such as women, properly. The results of these assessments can be shared with other
passengers before they hail a ride. Besides improving the service quality of ride-sourcing services to attract more students, the strategy
of mixed land use should be continued to reduce the need to travel and discourage the use of private cars.
CRediT authorship contribution statement
Mahdi Aghaabbasi: Conceptualization, Methodology, Software, Formal analysis, Investigation, Resources, Writing - original
draft, Supervision. Zohreh Asadi Shekari: Conceptualization, Software, Investigation. Muhammad Zaly Shah: Investigation,
Writing - review & editing. Oloruntobi Olakunle: Conceptualization, Resources. Danial
Jahed Armaghani: Conceptualization, Software. Mehdi Moeinaddini: Methodology, Software, Formal analysis.
Acknowledgement
The authors wish to thank all of those who have supported this research for their useful comments during its completion. In
particular, we would like to acknowledge the Universiti Teknologi Malaysia Research Management Centre (RMC) and Centre for
Innovative Planning and Development (CIPD). The funding for this project is made possible through the research grant obtained from
the Ministry of Education, Malaysia under the Professional Development Research University (PDRU) (Grant Reference no: PY/2018/
02906).
Appendix A. Factors used for investigating the university students’ mode choice and usage frequency
| Factor | Study | Comment |
| --- | --- | --- |
| Students’ sociodemographic | | |
| Age | Davison et al., 2015; Delmelle and Delmelle, 2012; Wang et al., 2013; Zhou, 2012 | These studies showed the inclination of older students to use private vehicles. |
| | Daisy et al., 2018; Khattak et al., 2011; Sims et al., 2018; Wang et al., 2013 | These studies showed the inclination of younger students to use active modes. |
| | Rotaris et al., 2019 | This study showed the inclination of younger students to use carsharing. |
| Gender | Daisy et al., 2018; Delmelle and Delmelle, 2012; Jomnonkwao et al., 2016; Tezcan, 2016 | These studies showed the inclination of male students to use private vehicles. |
| | Davison et al., 2015 | This study showed the inclination of female students to use private vehicles. |
| | Danaf et al., 2014; Davison et al., 2015; Delmelle and Delmelle, 2012; Rybarczyk and Gallagher, 2014; Zhan et al., 2016 | These studies showed the inclination of male students to use active modes. |
| | Nguyen-Phuoc et al., 2018; Zhou, 2012 | These studies showed the inclination of female students to use active modes. |
| | Tezcan, 2016 | This study showed the inclination of female students to use carsharing. |
| Education level | Danaf et al., 2014; Namgung and Akar, 2015; Sims et al., 2018; Zhou, 2012, 2016 | These studies showed the inclination of undergraduate students to use active modes. |
| | Delmelle and Delmelle, 2012 | This study showed the inclination of postgraduate students to use active modes. |
| Study mode | Davison et al., 2015 | This study showed the inclination of part-time students to use private vehicles. |
| | Wang et al., 2013 | This study showed the inclination of full-time students to use active modes. |
| Household income | Danaf et al., 2014; Nguyen-Phuoc et al., 2018; Wang et al., 2013 | These studies showed the inclination of students with higher household income to use private vehicles. |
| | Molina-Garcia et al., 2014 | This study showed the inclination of students with lower household income to use active modes. |
| Race | Cervero and Tsai, 2004 | This study showed the effects of race/ethnicity on general population mode choice. |
| Vehicle ownership | Kamruzzaman et al., 2011; Limanond et al., 2011; Nguyen-Phuoc et al., 2018 | These studies showed the inclination of students who owned a car or motorcycle to use private vehicles. |
| | Jomnonkwao et al., 2016 | This study showed the inclination of students who did not own a private vehicle to use the bus. |
| Built environment | | |
| Residential type | Nurul Habib, 2018 | This study stated that apartments/condominiums are mostly located in the core of the cities and the accessibility of transit services is also high. |
| Mixed land use | Namgung and Akar, 2015 | This study showed that students who lived in mixed-use and low-density neighbourhoods were more eager to use public transport. |
| | Mitra and Nash, 2018 | This study showed that neighbourhoods with higher land use mix increase the odds of students cycling. |
| | Nash and Mitra, 2019 | This study showed that the type of neighbourhood of residence was an important indicator of students’ transportation life-style. |
| Trip characteristics | | |
| Travel distance | Wang and Liu, 2015 | This study showed that travel distance is an influencing factor on the use of public transport by students. |
| Travel time | Wang and Liu, 2015 | This study showed that travel time is an influencing factor on the use of public transport by students. |
| | Shannon et al., 2006 | This study showed that travel time is an influencing factor on the use of transport by students. |
| Travel cost | Mohammed and Shakir, 2013 | This study showed that students preferred using the public bus if there was a 30% reduction in travel cost. |
| | Danaf et al., 2014; Whalen et al., 2013 | These studies showed that travel cost is an influencing factor on the students’ mode choice. |
| Mode-specific | | |
| Travel time | Abdul Sukora and Hassan, 2014; Lundberg and Weber, 2014; Whalen et al., 2013 | These studies showed that students preferred to use their own private vehicles because of their lower travel time. |
| | Shannon et al., 2006 | This study showed that the travel time of active transport modes was the most important barrier to using these modes. |
| Travel cost | Akar et al., 2012; Wang and Liu, 2015 | These studies showed that students who cited travel cost as an important factor in their travel mode choice preferred not to drive alone and instead chose alternative modes such as public transport. |
| | Danaf et al., 2014; Salon and Aligula, 2012 | These studies showed that students from higher socio-economic levels did not regard travel cost as an important factor for their primary mode choice. |
| Comfort | Nguyen-Phuoc et al., 2018 | This study showed that when the students were offered a new bus system with more comfort and easier access, more than 40% of the respondents (mostly motorcyclists) were willing to shift from their current modes to the new bus system. |
| | Mbara and Celliers, 2013; Wang and Liu, 2015; Whalen et al., 2013 | These studies showed that the students’ attitudes towards the comfort of different modes, including taxis, public transport, and cars, influence their choice. |
| Safety | Nguyen-Phuoc et al., 2018 | This study showed that the majority of respondents choose to walk because of its safety. |
| | Jomnonkwao et al., 2016; Sam et al., 2014; Wang and Liu, 2015 | These studies showed that students’ perceived personal safety was a criterion for using public transport. |
| Accessibility and availability | Nguyen-Phuoc et al., 2018; Whalen et al., 2013; Zhou, 2012, 2014 | These studies showed that students’ access to active modes such as buses, trains, and bikes affects their willingness to use these modes and reduces the usage of private vehicles. |
| | Molina-Garcia et al., 2010; Nguyen-Phuoc et al., 2018; Sims et al., 2018 | These studies showed that the availability of private vehicles may reduce the active mode usage among the students. |
| Complementarity to other modes | Balsas, 2003; Muromachi, 2017 | These studies showed that installing bicycle racks on the buses that serve campus locations and implementing certain TDM strategies within the campuses increase the complementarity between different modes. |
Appendix B. Sociodemographic of participants
Characteristic School/Work Shopping Leisure
Infrequent Frequent Regular Infrequent Frequent Regular Infrequent Frequent Regular
N % N % N % N % N % N % N % N % N %
Age
18–29 84 42 90 45 26 13 84 42 90 45 26 13.0 84 42.0 90 45.0 26 13.0
30–49 72 47.4 62 40.8 18 11.8 72 47.4 62 40.8 18 11.8 72 47.4 62 40.8 18 11.8
50–64 6 100 0 0 0 0 6 100 0 0 0 0 6 100 0 0 0 0
Gender
Male 84 47.2 70 39.3 24 13.5 84 47.2 70 39.3 24 13.5 84 47.2 70 39.3 24 13.5
Female 78 43.3 82 45.6 20 11.1 78 43.3 82 45.6 20 11.1 78 43.3 82 45.6 20 11.1
Education
Diploma 58 48.3 46 38.3 16 13.3 58 48.3 46 38.3 16 13.3 58 48.3 46 38.3 16 13.3
Bachelor 80 42.1 82 43.2 28 14.7 80 42.1 82 43.2 28 14.7 80 42.1 82 43.2 28 14.7
Masters 22 47.8 24 52.2 0 0 22 47.8 24 52.2 0 0 22 47.8 24 52.2 0 0
Doctorate 2 100 0 0 0 0 2 100 0 0 0 0 2 100 0 0 0 0
Study mode
Part-time 16 40 16 40 8 20 16 40 16 40 8 20 16 40.0 16 40.0 8 20.0
Full-time 146 45.9 136 42.8 36 11.3 146 45.9 136 42.8 36 11.3 146 45.9 136 42.8 36 11.3
Monthly household income

Less than RM 1000 14 33.3 18 42.9 10 23.8 14 33.3 18 42.9 10 23.8 14 33.3 18 42.9 10 23.8
RM 1000–2000 48 54.5 36 40.9 4 4.5 48 54.5 36 40.9 4 4.5 48 54.5 36 40.9 4 4.5
RM 2000–3000 50 43.9 46 40.4 18 15.8 50 43.9 46 40.4 18 15.8 50 43.9 46 40.4 18 15.8
RM 3000–6000 38 45.2 36 42.9 10 11.9 38 45.2 36 42.9 10 11.9 38 45.2 36 42.9 10 11.9
RM 6000–13,000 12 46.2 14 53.8 0 0 12 46.2 14 53.8 0 0 12 46.2 14 53.8 0 0
More than RM 13,000 0 0 2 50 2 50 0 0 2 50 2 50 0 0 2 50 2 50
Race
Malay 100 47.6 82 39.0 28 13.3 100 47.6 82 39.0 28 13.3 100 47.6 82 39.0 28 13.3
Chinese 30 36.6 46 56.1 6 7.3 30 36.6 46 56.1 6 7.3 30 36.6 46 56.1 6 7.3
Indian 16 44.4 16 44.4 4 11.1 16 44.4 16 44.4 4 11.1 16 44.4 16 44.4 4 11.1
Others 16 53.3 8 26.7 6 20.0 16 53.3 8 26.7 6 20.0 16 53.3 8 26.7 6 20.0
Vehicle ownership
No 22 28.9 42 55.3 12 15.8 22 28.9 42 55.3 12 15.8 22 28.9 42 55.3 12 15.8
Yes 140 49.6 110 39.0 32 11.3 140 49.6 110 39.0 32 11.3 140 49.6 110 39.0 32 11.3
Infrequent: once a week; frequent: 2–4 times per week; regular: more than 5 times per week.
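The percentages in this table appear to be row percentages: each count divided by the row total across the three frequency classes of a trip purpose. A minimal sketch of that arithmetic, using the Bachelor row counts reported above (80, 82, 28); the function name `row_percentages` is hypothetical and introduced only for illustration:

```python
def row_percentages(counts, ndigits=1):
    """Convert a row of class counts into percentages of the row total."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("row has no respondents")
    return {label: round(100 * n / total, ndigits) for label, n in counts.items()}


# Counts for the Bachelor row (one trip purpose), taken from the table.
bachelor = {"infrequent": 80, "frequent": 82, "regular": 28}
shares = row_percentages(bachelor)
# shares == {"infrequent": 42.1, "frequent": 43.2, "regular": 14.7}
```

Because each value is rounded independently, the three percentages in a row may not sum to exactly 100.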
References
Abdul Sukora, N.S., Hassan, S.A., 2014. En route to a sustainable campus–an analysis of university students’ travel patterns via 7 day travel diary. Jurnal Teknologi 70,
9–16.
Acharya, U.R., Fujita, H., Oh, S.L., Hagiwara, Y., Tan, J.H., Adam, M., Tan, R.S., 2019. Deep convolutional neural network for the automated diagnosis of congestive
heart failure using ECG signals. Appl. Intell. 49, 16–27.
Ahmad, I., Basheri, M., Iqbal, M.J., Rahim, A., 2018. Performance comparison of support vector machine, random forest, and extreme learning machine for intrusion
detection. IEEE Access 6, 33789–33795.
Akar, G., Flynn, C., Namgung, M., 2012. Travel choices and links to transportation demand management. J. Transp. Res. Board 2319, 77–85.
Arteaga-Sánchez, R., Belda-Ruiz, M., Ros-Galvez, A., Rosa-Garcia, A., 2018. Why continue sharing: determinants of behavior in ridesharing services. Int. J. Mark. Res.
1–18.
Asgari, H., Zaman, N., Jin, X., 2017. Understanding Immigrants’ Mode Choice behavior in Florida: Analysis of Neighborhood Effects and Cultural Assimilation. Transp.
Res. Procedia 25, 3079–3095.
Balsas, C.J.L., 2003. Sustainable transportation planning on college campuses. Transp. Policy 10, 35–49.
Bernetti, G., Longo, G., Tomasella, L., Violin, A., 2008. Sociodemographic groups and mode choice in a middle-sized European City. Transp. Res. Rec.: J. Transp. Res.
Board 2067, 17–25.
Breiman, L., 2001. Random forests. Mach. Learn. 45, 5–32.
Breiman, L., Friedman, J., Olshen, R., Stone, C., 1984. Classification and regression trees. Wadsworth Int. Group 37, 237–251.
Bui, A.T., Jun, C.H., 2012. Learning Bayesian network structure using Markov blanket decomposition. Pattern Recogn. Lett. 33 (16), 2134–2140.
Cai, Y., Wang, H., Ong, G.P., Meng, Q., Lee, D.-H., 2019. Investigating user perception on autonomous vehicle (AV) based mobility-on-demand (MOD) services in
Singapore using the logit kernel approach. Transportation.
Cao, X., Mokhtarian, P.L., Handy, S.L., 2009. The relationship between the built environment and nonwork travel: A case study of Northern California. Transp. Res.
Part A: Policy Pract. 43, 548–559.
Cervero, R., Tsai, Y., 2004. City CarShare in San Francisco, California: second-year travel demand and car ownership impacts. Transp. Res. Rec.: J. Transp. Res. Board
117–127.
Chang, L.Y., Wang, H.W., 2006. Analysis of traffic injury severity: an application of non-parametric classification tree techniques. Accid. Anal. Prev. 38, 1019–1027.
Chatman, D.G., 2003. How density and mixed uses at the workplace affect personal commercial travel and commute mode choice. Transp. Res. Rec. 1831, 193–201.
Chen, X., Zahiri, M., Zhang, S., 2017. Understanding ridesplitting behavior of on-demand ride services: An ensemble learning approach. Transp. Res. Part C: Emerg.
Technol. 76, 51–70.
Chen, X., Zheng, H., Wang, Z., Chen, X., 2018. Exploring impacts of on-demand ridesplitting on mobility via real-world ridesourcing data and questionnaires.
Transportation 11–2121. https://doi.org/10.1007/s11116-018-9916-1.
Cheng, L., Chen, X., Yang, S., Wang, H., Wu, J., 2016. Modeling mode choice of low-income commuters with sociodemographics, activity attributes, and latent
attitudinal variables: case study in Fushun, China. Transp. Res. Rec.: J. Transp. Res. Board 2581, 27–36.
Cohen, A.P., Shaheen, S.A., 2016. Planning for shared mobility.
Daisy, N.S., Hafezi, M.H., Liu, L., Millward, H., 2018. Understanding and modeling the activity-travel behavior of university commuters at a large Canadian university.
J. Urban Plann. Dev. 144, 1–10.
Danaf, M., Abou-Zeid, M., Kaysi, I., 2014. Modeling travel choices of students at a private, urban university: Insights and policy implications. Case Stud. Transp. Policy
2, 142–152.
Davison, L., Ahern, A., Hine, J., 2015. Travel, transport and energy implications of university-related student travel: a case study approach. Transp. Res. Part D: Transp.
Environ. 38, 27–40.
Delmelle, E.M., Delmelle, E.C., 2012. Exploring spatio-temporal commuting patterns in a university environment. Transp. Policy 21, 1–9.
Du, Y., Deng, F., Liao, F., 2019. A model framework for discovering the spatio-temporal usage patterns of public free-floating bike-sharing system. Transp. Res. Part C:
Emerg. Technol. 103, 39–55.
Ermagun, A., Levinson, D., 2017. “Transit makes you short”: On health impact assessment of transportation and the built environment. J. Transp. Health 4, 373–387.
Fischer-Baum, R., Bialik, C., 2015. Uber is taking millions of Manhattan rides away from taxis. In: FiveThirtyEight.
Flores, O., Rayle, L., 2017. How cities use regulation for innovation: the case of Uber, Lyft and Sidecar in San Francisco. In: Proceedings of World Conference on
Transport Research - WCTR 2016 Shanghai, Shanghai, pp. 3760–3772.
Friedman, N., Geiger, D., Goldszmidt, M., 1997. Bayesian Network Classifiers. Mach. Learn. 29, 131–163.
Gao, T., Ji, Q., 2015. Local causal discovery of direct causes and effects. In: Advances in Neural Information Processing Systems, pp. 2512–2520.
Gao, M., Li, P., Chen, C., Jiang, Y., 2018. Research on software multiple fault localization method based on machine learning. In: Proceedings of MATEC Web of
Conferences, p. 01060.
Garikapati, V.M., You, D., Pendyala, R.M., Patel, T., Kottommannil, J., Sussman, A., 2016. Design, development, and implementation of a university travel demand
modeling framework. J. Transp. Res. Board 105–113.
Grab, 2019. E-Hailing Regulations. Malaysia.
Habib, K.N., Weiss, A., Hasnine, S., 2018. On the heterogeneity and substitution patterns in mobility tool ownership choices of post-secondary students: The case of
Toronto. Transp. Res. Part A: Policy Pract. 116, 650–665.
Harb, R., Yan, X., Radwan, E., Su, X., 2009. Exploring precrash maneuvers using classification trees and random forests. Accid. Anal. Prev. 41, 98–107.
Heinen, E., Chatterjee, K., 2015. The same mode again? An exploration of mode choice variability in Great Britain using the National Travel Survey. Transp. Res. Part
A: Policy Pract. 78, 266–282.
Hughes, R., MacKenzie, D., 2016. Transportation network company wait times in Greater Seattle, and relationship to socioeconomic indicators. J. Transp. Geogr. 56,
36–44.
Jahangiri, A., Rakha, H., Dingus, T.A., 2016. Red-light running violation prediction using observational and simulator data. Accid. Anal. Prev. 96, 316–328.
Jiao, J., 2018. Investigating Uber price surges during a special event in Austin, TX. Res. Transp. Bus. Manage. https://doi.org/10.1016/j.rtbm.2018.02.008.
Jin, S.T., Kong, H., Wu, R., Sui, D.Z., 2018. Ridesourcing, the sharing economy, and the future of cities. Cities 76, 96–104.
Johansson, M.V., Heldt, T., Johansson, P., 2006. The effects of attitudes and personality traits on mode choice. Transp. Res. Part A: Policy Pract. 40, 507–525.
Jomnonkwao, S., Sangphong, O., Khampirat, B., Siridhara, S., Ratanavaraha, V., 2016. Public transport promotion policy on campus: evidence from Suranaree
University in Thailand. Public Transport 8, 185–203.
Kamargianni, M., Polydoropoulou, A., 2013. Hybrid choice model to investigate effects of teenagers' attitudes toward walking and cycling on mode choice behavior.
Transp. Res. Rec. 2382, 151–161.
Kamruzzaman, M., Hine, J., Gunay, B., Blair, N., 2011. Using GIS to visualise and evaluate student travel behaviour. J. Transp. Geogr. 19, 13–32.
Kamruzzaman, M., Shatu, F.M., Hine, J., Turrell, G., 2015. Commuting mode choice in transit oriented development: Disentangling the effects of competitive
neighbourhoods, travel attitudes, and self-selection. Transp. Policy 42, 187–196.
Kaplan, S., Nielsen, T.A.S., Prato, C.G., 2016. Walking, cycling and the urban form: A Heckman selection model of active travel mode and distance by young
adolescents. Transp. Res. Part D: Transp. Environ. 44, 55–65.
Karlaftis, M.G., Golias, I., 2002. Effects of road geometry and traffic volumes on rural roadway accident rates. Accid. Anal. Prev.
Khatami, A., Khosravi, A., Nguyen, T., Lim, C.P., Nahavandi, S., 2017. Medical image analysis using wavelet transform and deep belief networks. Expert Syst. Appl. 86,
190–198.
Khattak, A., Wang, X., Son, S., Agnello, P., 2011. Travel by university students in Virginia. J. Transp. Res. Board 2255, 137–145.
Kim, S., Ulfarsson, G.F., 2004. Travel mode choice of the elderly effects of personal, household, neighborhood, and trip characteristics. Transp. Res. Rec.: J. Transp.
Res. Board 1894, 117–126.
Kim, S., Ulfarsson, G.F., 2008. Curbing automobile use for sustainable transportation: analysis of mode choice on short home-based trips. Transportation 35, 723–737.
Kim, K., Baek, C., Lee, J.-D., 2018. Creative destruction of the sharing economy in action: The case of Uber. Transp. Res. Part A: Gen. 110, 118–127.
Kitali, A.E., Alluri, P., Sando, T., Haule, H., Kidando, E., Lentz, R., 2018. Likelihood estimation of secondary crashes using Bayesian complementary log-log model.
Accid. Anal. Prev. 119, 58–67.
Kleinberg, J., Ludwig, J., Mullainathan, S., Obermeyer, Z., 2015. Prediction policy problems. Am. Econ. Rev. 105, 491–495.
Klöckner, C.A., Friedrichsmeier, T., 2011. A multi-level approach to travel mode choice – How person characteristics and situation specific aspects determine car use in
a student sample. Transp. Res. Part F: Traff. Psychol. Behav. 14, 261–277.
Kowshalya, G., Nandhini, M., 2018. Predicting fraudulent claims in automobile insurance. In: Proceedings of 2018 Second International Conference on Inventive
Communication and Computational Technologies (ICICCT), pp. 1338–1343.
Ledsham, T., Farber, S., Wessel, N., 2017. Dwelling type matters: untangling the paradox of intensification and bicycle mode choice. Transp. Res. Rec. 2662, 67–74.
Limanond, T., Butsingkorn, T., Chermkhunthod, C., 2011. Travel behavior of university students who live on campus: a case study of a rural university in Asia. Transp.
Policy 18, 163–171.
Lind, H.B., Nordfjærn, T., Jørgensen, S.H., Rundmo, T., 2015. The value-belief-norm theory, personal norms and sustainable travel mode choice in urban areas. J.
Environ. Psychol. 44, 119–125.
Liu, H., 2010. In: Feature Selection. Encyclopedia of Machine Learning. Springer US, Boston, MA, pp. 402–406.
Liu, T., Zhang, Y., Chen, J., Shen, H., 2018. Discovery of association rule of learning action based on Bayesian network. In: Proceedings of 2018 9th International
Conference on Information Technology in Medicine and Education (ITME), pp. 466–470.
Loo, B.P.Y., Siiba, A., 2019. Active transport in Africa and beyond: towards a strategic framework. Transp. Rev. 39, 181–203.
Lundberg, B., Weber, J., 2014. Non-motorized transport and university populations: an analysis of connectivity and network perceptions. J. Transp. Geogr. 39,
165–178.
Manaugh, K., El-Geneidy, A.M., 2013. Does distance matter? Exploring the links among values, motivations, home location, and satisfaction in walking trips. Transp.
Res. Part A: Policy Pract. 50, 198–208.
Marten, L., 2015. Assessing the Demand for Uber. Northwestern University.
Mbara, T.C., Celliers, C., 2013. Travel patterns and challenges experienced by University of Johannesburg off-campus students. J. Transp. Supply Chain Manage.
7, 1–8.
Mitra, R., Nash, S., 2018. Can the built environment explain gender gap in cycling? An exploration of university students' travel behavior in Toronto, Canada. Int. J.
Sustain. Transp. 1–10. https://doi.org/10.1080/15568318.2018.1449919.
Mohammed, A.A., Shakir, A.A., 2013. Factors that affect transport mode preference for graduate students in the national university of Malaysia by logit method. J. Eng.
Sci. Technol. 8, 352–363.
Molina-Garcia, J., Castillo, I., Sallis, J.F., 2010. Psychosocial and environmental correlates of active commuting for university students. Prev. Med. 51, 136–138.
Molina-Garcia, J., Sallis, J.F., Castillo, I., 2014. Active commuting and sociodemographic factors among university students in Spain. J. Phys. Act. Health 11, 359–363.
Muromachi, Y., 2017. Experiences of past school travel modes by university students and their intention of future car purchase. Transp. Res. Part A: Policy Pract. 104,
209–220.
Namgung, M., Akar, G., 2015. Influences of neighborhood characteristics and personal attitudes on university commuters’ public transit use. J. Transp. Res. Board
2500, 93–101.
Nash, S., Mitra, R., 2019. University students' transportation patterns, and the role of neighbourhood types and attitudes. J. Transp. Geogr. 76, 200–211.
Nguyen-Phuoc, D.Q., Amoh-Gyimah, R., Tran, A.T.P., Phan, C.T., 2018. Mode choice among university students to school in Danang, Vietnam. Travel Behav. Soc. 13,
1–10.
Nie, Y., 2017. How can the taxi industry survive the tide of ridesourcing? evidence from Shenzhen, China. Transp. Res. Part C: Emerg. Technol. 79, 242–256.
Nurul Habib, K., 2018. Modelling the choice and timing of acquiring a driver’s license: Revelations from a hazard model applied to the University students in Toronto.
Transp. Res. Part A: Policy Pract. 118, 374–386.
Paulssen, M., Temme, D., Vij, A., Walker, J.L., 2014. Values, attitudes and travel behavior: a hierarchical latent variable mixed logit model of travel mode choice.
Transportation 41, 873–888.
Pearl, J., 2000. The art and science of cause and effect. Causal.: Models, Reason. Inference 331, 358.
Pellet, J.P., Elisseeff, A., 2008. Using Markov blankets for causal structure learning. J. Mach. Learn. Res. 9 (Jul), 1295–1342.
Prati, G., Pietrantoni, L., Fraboni, F., 2017. Using data mining techniques to predict the severity of bicycle crashes. Accid. Anal. Prev. 101, 44–54.
Proulx, F., Cavagnolo, B., Torres-Montoya, M., 2014. Impact of parking prices and transit fares on mode choice at the University of California, Berkeley. J. Transp. Res.
Board 2469, 41–48.
Rashidi, S., Ranjitkar, P., Hadas, Y., 2014. Modeling bus dwell time with decision tree-based methods. Transp. Res. Rec.: J. Transp. Res. Board 2418, 74–83.
Rayle, L., Dai, D., Chan, N., Cervero, R., Shaheen, S., 2016. Just a better taxi? a survey-based comparison of taxis, transit, and ridesourcing services in San Francisco.
Transp. Policy 45, 168–178.
Rotaris, L., Danielis, R., 2014. The impact of transportation demand management policies on commuting to college facilities: A case study at the University of Trieste,
Italy. Transp. Res. Part A: Policy Pract. 67, 127–140.
Rotaris, L., Danielis, R., Maltese, I., 2019. Carsharing use by college students: the case of Milan and Rome. Transp. Res. Part A: Policy Pract. 120, 239–251.
Rybarczyk, G., Gallagher, L., 2014. Measuring the potential for bicycling and walking at a metropolitan commuter university. J. Transp. Geogr. 39, 1–10.
Salon, D., Aligula, E.M., 2012. Urban travel in Nairobi, Kenya: analysis, insights, and opportunities. J. Transp. Geogr. 22, 65–76.
Sam, E.F., Adu-Boahen, K., Kissah-Korsah, K., 2014. Assessing the factors that influence public transport mode preference and patronage: Perspectives of students of
University of Cape Coast (UCC), Ghana. Int. J. Dev. Sustain. 3.
Sammut, C., Webb, G.I., 2011. Encyclopedia of Machine Learning. Springer Science & Business Media.
San Francisco Municipal Transportation Agency Board Meeting, 2014. Taxis and Accessible Services Division: Status of Taxi Industry. San Francisco, U.S.
Scheiner, J., 2010. Interrelations between travel mode choice and trip distance: trends in Germany 1976–2002. J. Transp. Geogr. 18, 75–84.
Shaheen et al., 2017a. Travel Behavior: Shared mobility and Transportation Equity. Washington, DC.
Shaheen, S., Chan, N., 2016. Mobility and the sharing economy: potential to facilitate the first- and last-mile public transit connections. Built Environment 42,
573–588.
Shaheen, S., Cohen, A., 2018. Shared ride services in North America: definitions, impacts, and the future of pooling. Transp. Rev. 39, 427–442.
Shaheen et al., 2017b. Mobility on Demand Operational Concept Report. Department of Transportation. Intelligent Transportation, United States.
Shannon, T., Giles-Corti, B., Pikora, T., Bulsara, M., Shilton, T., Bull, F., 2006. Active commuting in a university setting: Assessing commuting habits and potential for
modal change. Transp. Policy 13, 240–253.
Shi, Q., Abdel-Aty, M., 2015. Big Data applications in real-time traffic operation and safety monitoring and improvement on urban expressways. Transp. Res. Part C:
Emerg. Technol. 58, 380–394.
Siddiqui, C., Abdel-Aty, M., Huang, H., 2012. Aggregate nonparametric safety analysis of traffic zones. Accid. Anal. Prev. 45, 317–325.
Sims, D., Bopp, M., Wilson, O.W.A., 2018. Examining influences on active travel by sex among college students. J. Transp. Health 9, 73–82.
Stark, J., Hössinger, R., 2018. Attitudes and mode choice: Measurement and evaluation of interrelation. Transp. Res. Procedia 32, 501–512.
Strobl, C., Malley, J., Tutz, G., 2009. An introduction to recursive partitioning: Rationale, application, and characteristics of classification and regression trees,
bagging, and random forests. Psychol. Methods 14, 323–348.
Stylianou, K., Dimitriou, L., Abdel-Aty, M., 2019. Big data and road safety: a comprehensive review. Mobil. Patterns Big Data Transp. Anal. 297–343.
Susanti, S.P., Azizah, F.N., 2017. Imputation of missing value using dynamic Bayesian network for multivariate time series data. In: Proceedings of 2017 International
Conference on Data and Software Engineering (ICoDSE), pp. 1–5.
Tarabay, R., Abou-Zeid, M., 2019. Modeling the choice to switch from traditional modes to ridesourcing services for social/recreational trips in Lebanon.
Transportation.
Tareeq, S.M., Inamura, T., 2009. A sample discarding strategy for rapid adaptation to new situation based on Bayesian behavior learning. In: Proceedings of 2008 IEEE
International Conference on Robotics and Biomimetics, pp. 1950–1955.
Tezcan, H.O., 2016. Potential of carpooling among unfamiliar users: case of undergraduate students at Istanbul Technical University. J. Urban Plann. Dev. 142, 1–11.
Utkin, L.V., Konstantinov, A.V., Chukanov, V.S., Kots, M.V., Ryabinin, M.A., Meldo, A.A., 2019. A weighted random survival forest. Knowl.-Based Syst. 177, 136–144.
UTM, 2018. UTM Map. Universiti Teknologi Malaysia, Skudai, Malaysia.
Wang, D., Liu, Y., 2015. Factors influencing public transport use: a study of university commuters’ travel and mode choice behaviours. State Austr. Cities Conf.
Wang, M., Mu, L., 2018. Spatial disparities of Uber accessibility: an exploratory analysis in Atlanta, USA. Comput. Environ. Urban Syst. 67, 169–175.
Wang, X., Khattak, A.J., Son, S., 2013. What can be learned from analyzing university student travel demand? Transp. Res. Rec.: J. Transp. Res. Board 2322, 129–137.
Washington, S., Wolf, J., Guensler, R., 1997. Binary recursive partitioning method for modeling hot-stabilized emissions from motor vehicles. J. Transp. Res. Board
96–105.
Washington, S., Wolf, J., 1997. Hierarchical tree-based versus ordinary least squares linear regression models theory and example applied to trip generation. J. Transp.
Res. Board 82–88.
Whalen, K.E., Páez, A., Carrasco, J.A., 2013. Mode choice of university students commuting to school and the role of active travel. J. Transp. Geogr. 31, 132–142.
Wu, Q., Yang, C., Gao, X., He, P., Chen, G., 2018. EPAB: Early pattern aware Bayesian model for social content popularity prediction. In: Proceedings of 2018 IEEE
International Conference on Data Mining (ICDM), pp. 1296–1301.
Yadav, M., Ravi, V., 2018. Quantile Regression random forest hybrids based data imputation. In: Proceedings of 2018 IEEE 17th International Conference on Cognitive
Informatics & Cognitive Computing (ICCI*CC), pp. 195–201.
Yan, X., Levine, J., Zhao, X., 2018. Integrating ridesourcing services with public transit: An evaluation of traveler responses combining revealed and stated preference
data. Transp. Res. Part C: Emerg. Technol.
Yan, X., Richards, S., Su, X., 2010. Using hierarchical tree-based regression model to predict train-vehicle crashes at passive highway-rail grade crossings. Accid. Anal.
Prev. 42, 64–74.
Yang, M., Wang, W., Chen, X., Wang, W., Xu, R., Gu, T., 2010. Modeling destination choice behavior incorporating spatial factors, individual sociodemographics, and
travel mode. J. Transp. Eng. 136, 800–810.
Zha, L., Yin, Y., Yang, H., 2016. Economic analysis of ride-sourcing markets. Transp. Res. Part C: Emerg. Technol. 71, 249–266.
Zhan, G., Yan, X., Zhu, S., Wang, Y., 2016. Using hierarchical tree-based regression model to examine university student travel frequency and mode choice patterns in
China. Transp. Policy 45, 55–65.
Zhang, Y., Guo, H., Li, C., Wang, W., Jiang, X., Liu, Y., 2016. Which one is more attractive to traveler, taxi or tailored taxi? An empirical study in China. Proc.
GITSS2015 867–875.
Zhou, J., 2012. Sustainable commute in a car-dominant city: Factors affecting alternative mode choices among university students. Transp. Res. Part A: Policy Pract.
46, 1013–1029.
Zhou, J., 2014. From better understandings to proactive actions: Housing location and commuting mode choices among university students. Transp. Policy 33,
166–175.
Zhou, J., 2016. Proactive sustainable university transportation: marginal effects, intrinsic values, and university students' mode choice. Int. J. Sustain. Transp. 10,
815–824.
Zhu, M., Li, Y., Wang, Y., 2018. Design and experiment verification of a novel analysis framework for recognition of driver injury patterns: From a multi-class
classification perspective. Accid. Anal. Prev. 120, 152–164.