
JOURNAL OF INTELLIGENT TRANSPORTATION SYSTEMS

https://doi.org/10.1080/15472450.2021.1926246

Estimating pedestrian delay at signalized intersections using high-resolution event-based data: a finite mixture modeling method
Abolfazl Karimpour (a), Jason C. Anderson (b), Sirisha Kothuri (b), and Yao-Jan Wu (a)
(a) Department of Civil and Architectural Engineering and Mechanics, University of Arizona, Tucson, Arizona, USA; (b) Department of Civil and Environmental Engineering, Portland State University, Portland, Oregon, USA

ABSTRACT
It has been widely shown that pedestrians’ level of frustration grows with the increase of pedestrian delay, and may cause pedestrians to violate the signals. However, for agencies seeking to use multimodal signal performance measures for signal operations, pedestrian delay is not always readily available. To tackle this issue, this study proposed a finite mixture modeling method to estimate pedestrian delay using high-resolution event-based data collected from smart sensors. The proposed method was used to estimate pedestrian delay at four signalized intersections on a major arterial corridor in Pima County, Arizona. The results showed the proposed method was able to capture and track the actual pedestrian delay fluctuations during the day at all the study intersections, with average errors of 10 s and 13 s for mean-absolute-error and root-mean-square-error, respectively. In addition, the proposed model was compared with three conventional methods (HCM 2010, Virkler, Dunn), and the comparison showed that the proposed method outperforms all the other methods in terms of both mean-absolute-error and root-mean-square-error. Furthermore, it was found that the proposed method is transferable and can be used as a network-wide delay estimation model for intersections with similar traffic patterns. The proposed method could provide agencies with a more reliable, robust, and accurate approach for estimating pedestrian delay at signalized intersections where pedestrian data are not readily available. In addition, it will allow system operators to quantitatively assess existing delays and enact changes to better serve pedestrian needs.

ARTICLE HISTORY
Received 7 September 2020; Revised 30 March 2021; Accepted 3 May 2021

KEYWORDS
Finite mixture modeling; network-wide estimation; pedestrian delay; transferability test

CONTACT Abolfazl Karimpour, karimpour@email.arizona.edu, Department of Civil & Architectural Engineering & Mechanics, The University of Arizona, 1209 E. Second St., P.O. Box 210072, Tucson, AZ 85721.
© 2021 Taylor & Francis Group, LLC

Introduction

Walking is a critical component in the development of healthy and sustainable communities. Although more than one out of four trips are a 20-min walk or less, the majority of these short trips are completed using the automobile (FHWA, 2019). To improve physical activity and reduce congestion and emissions, many cities want to shift these short trips to active transportation modes such as walking. Many of these walking trips are in urban areas and require street crossings (Zegeer, 2002). Though generally viewed as the preferred place to cross the street for safety reasons, intersections can be a deterrent for walking if their design and operation heavily favor motor vehicle movement. When vehicle volumes are high, a traffic signal is used to separate conflicting users in time. Traditionally, in the U.S., signal timing objectives at intersections have prioritized vehicle movements, sometimes resulting in large and unnecessary delays for pedestrians (Smaglik, 2018). This unnecessary pedestrian delay is especially seen along arterial corridors, since arterials serve as hubs for pedestrian activity to varying land uses (Habibian & Hosseinzadeh, 2018). Since most walking trips are short, the delays imposed by the transportation system on pedestrians affect them disproportionately when compared to other users. Therefore, transportation agencies need to measure, analyze, and monitor pedestrian and vehicle delays quantitatively and qualitatively.

Delay, in general, is one of the most significant measures that quantify the operational level of service of intersections and is the most frequent measure used for intersection mobility (Lattimer, 2020). For the last decade, many studies have focused on various analytical approaches for estimating vehicle delays (Qiao et al., 2002; Wei et al., 2015). However, less effort has been expended on analytical methods to estimate pedestrian delay. Most of the existing studies only focus on a modified formulation of the HCM method. Pedestrian delay is an important measure that is used to describe pedestrian travel and to evaluate the level of service for pedestrians at intersections (HCM, 2010). It has been shown that pedestrians become impatient when they experience delays greater than 30–50 s per pedestrian (Guo & Zhang, 2014; Vallyon et al., 2011). Conversely, they are likely to show high degrees of compliance when delays are less than 10 s per pedestrian. The commonly used method in the HCM 2010 and HCM 6th edition to estimate pedestrian delay was formulated as a function of cycle length and effective walk time (Elefteriadou, 2016). The HCM method was initially developed by Pretty (1979) by assuming uniform arrival rates and fixed pedestrian timing. Recent studies are proposing more advanced analytical models for enhancing the accuracy of the HCM method. For instance, Wang and Tian (2010) proposed an enhanced pedestrian delay estimation model based on the HCM method for a two-stage pedestrian crossing. The authors validated their model under various simulation scenarios and showed that the relationship between their proposed model and simulation results is statistically significant. In another study, Li et al. (2005) developed a relationship between average pedestrian delay and pedestrian arrival on each sub-phase. Unlike the HCM method, the authors assumed that pedestrian arrival rates are not uniform throughout cycles (Li et al., 2005). Marisamynathan and Vedagiri (2013) proposed a pedestrian delay estimation model based on three components: (a) pedestrian average waiting time estimated from the HCM method, (b) average crossing time delay, and (c) average vehicular pedestrian interaction time.

Many researchers still question the accuracy of the pedestrian delay estimated using the HCM method (Chilukuri & Virkler, 2005; Hubbard et al., 2008; Kothuri et al., 2012). For instance, Chilukuri and Virkler conducted field studies of pedestrian delay and showed that pedestrian arrivals at signalized intersections in a coordinated system were not random and that the observed pedestrian delays were significantly different from the pedestrian delay estimated using the HCM method (Chilukuri & Virkler, 2005). To tackle this problem, recent technological advances in smart sensors have allowed cities to collect a large amount of multimodal data to estimate pedestrian delay. Automated Traffic Signal Performance Measures (ATSPMs) are one of the most recent technological advancements; they utilize high-resolution event-based data, signal phasing and overlap states, and data analysis techniques to generate metrics that can be used to provide insights into multimodal system operation (Day et al., 2016; Li et al., 2012; Zheng et al., 2013). The performance metrics obtained using ATSPMs could be used to monitor and evaluate the operation of multimodal transportation, such as pedestrian delay. While signal performance measures are a great tool for system operations, they have been primarily focused on vehicular measures. Signal performance measures focused on multimodal users are a pressing need for future research (Huang et al., 2018). However, transportation agencies are still facing challenges on how to incorporate high-resolution event-based data for obtaining multimodal signal performance measures, such as pedestrian delay.

In this study, a pedestrian delay estimation method with higher accuracy than the conventional deterministic methods, such as HCM, is proposed. The proposed method uses high-resolution event-based data to estimate pedestrian delay. Previous studies showed that the pedestrian delay distribution highly depends on the pedestrian arrival rate (Fi & Igazvölgyi, 2014). Further, it has been widely shown that the pedestrian arrival rate is not uniformly distributed (Li et al., 2005; Zheng & Elefteriadou, 2017). Therefore, this study proposed a method to estimate pedestrian delay based on a mixture distribution. A mixture or finite mixture distribution is the probability distribution of a random response variable that can be characterized as a function of other random variables (Lao et al., 2012). Due to the adaptability and applicability of mixture models, they have been widely applied across different fields within traffic and transportation engineering, such as vehicle classification and speed estimation (Lao et al., 2012), truck weight distribution estimation (Hernandez, 2017; Hernandez & Hyun, 2020; Regehr et al., 2020), arterial travel time estimation and imputation (Karimpour, 2020; Karimpour et al., 2019; Yang et al., 2018), freeway travel time reliability (Yang & Cooke, 2018), and traffic safety (Bakhshi & Ahmed, 2021; Mansourkhaki et al., 2017a, 2017b). However, to the best knowledge of the authors, no previous studies were identified that used finite mixture modeling to estimate pedestrian delay. In this study, by using a finite mixture model we were able to indirectly estimate pedestrian delay using different metrics, such as vehicular-related variables (e.g., traffic volume), and integrate different variables with non-Gaussian underlying distributions. Also, the advantage of using finite mixture modeling compared to other traditional approaches is that it does not assume a uniform pedestrian arrival rate.

The proposed method was applied to estimate pedestrian delay at four signalized intersections on a major arterial corridor in Pima County, Arizona. The objectives of this study were threefold: (1) develop a pedestrian delay estimation model using finite mixture modeling, (2) identify specific vehicular variables that can best characterize the pedestrian delay distribution function, and (3) develop a transferable network-wide pedestrian delay estimation model.

The remainder of this paper is structured as follows. In the methodology section, the statistical approaches used for delay estimation are thoroughly discussed. The study sites and data collection section provides detailed information regarding the study intersections and data used for model development. In the results section, the proposed method is compared with some of the most conventional delay estimation methods and model transferability is explored. In the last section, the application and policy implications of the proposed method are articulated and, finally, conclusions and recommendations for future research are provided.

Methodology

The framework of the proposed pedestrian delay estimation method is illustrated in Figure 1. Initially, potential variables and the appropriate number of mixture components that can explain pedestrian delay functions are selected, and then the mixtures are generated. At this stage, three model-selection criteria are used to select the best fit. Next, the performance of the developed method is examined and compared with several conventional pedestrian delay estimation methods. Finally, the model transferability is tested. In the following subsections, each stage is discussed in detail.

Figure 1. Study framework.

Generating mixture models

Based on the central limit theorem, when the sample size is large, the mean of the response variable is approximately normally distributed. Based on this theorem, in this study, a Gaussian mixture model (GMM), one of the common finite mixture models, is utilized to analytically characterize pedestrian delay distributions as a function of specific variables (e.g., traffic flow, cycle length, pedestrian effective green duration). A GMM is a weighted sum, or superposition, of $K$ component Gaussian densities (also called a mixture of Gaussians):

$$p(x) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k) \tag{1}$$

where $x$ represents the observable pedestrian delay, $\pi_k$ represents the mixture weights, and $\mathcal{N}(x \mid \mu_k, \Sigma_k)$ represents the component Gaussian densities. The component Gaussian density is a $D$-variate Gaussian function such that:

$$\mathcal{N}(x \mid \mu_k, \Sigma_k) = \frac{1}{(2\pi)^{D/2} \, |\Sigma_k|^{1/2}} \exp\left[-\frac{1}{2}(x - \mu_k)' \, \Sigma_k^{-1} (x - \mu_k)\right] \tag{2}$$

where $\mu_k$ is a component-specific mean vector, $\Sigma_k$ is a component-specific covariance matrix, and $\pi_k$ must satisfy:

$$\sum_{k=1}^{K} \pi_k = 1 \tag{3}$$

For the mixing weights to satisfy the conditions of being probabilities, $p(x) \geq 0$ and $\mathcal{N}(x \mid \mu_k, \Sigma_k) \geq 0$ imply that $\pi_k \geq 0 \;\forall\, k$. Therefore, values for $\pi_k$ are constrained to be between zero and one. Based on the presented formulae, the form of the Gaussian mixture is controlled by $\mu_k$, $\pi_k$, and $\Sigma_k$, which are estimated via maximum likelihood:

$$LL(\pi_k, \mu_k, \Sigma_k) = \sum_{n=1}^{N} \ln\left[\sum_{k=1}^{K} \pi_k \, \mathcal{N}(x_n \mid \mu_k, \Sigma_k)\right] \tag{4}$$

Assuming pedestrian delay has a Gaussian distribution, such that:

$$x \sim \mathcal{N}_k(\mu_k, \Sigma_k) \tag{5}$$

the component-specific mean ($\mu_k$) and the component-specific covariance matrix ($\Sigma_k$) are a function of covariates as below:

$$\mu_k = \beta_{0,k} + \beta_{j,k} X + \varepsilon \tag{6}$$

$$\Sigma_k = \Sigma_k(X) \tag{7}$$

where $X$ is a vector of random variables and $\beta_{j,k}$ are the coefficients of the variables for the $k$th Gaussian mixture component, estimated using maximum likelihood. For this study, cycle length, pedestrian effective green duration, and traffic flow were used as the variables, and based on the above formulation the pedestrian delay model was estimated. In the next subsection, the conventional pedestrian delay estimation models used for performance evaluation and comparison are summarized.

GMMs performance evaluation and comparison

In order to compare the accuracy of the proposed method with existing literature, three conventional pedestrian delay estimation approaches were considered.

1. HCM 2010 (similar to HCM 6th edition) (Elefteriadou, 2016; Wang & Tian, 2010):

$$d_p = \frac{0.5\,(C - g)^2}{C} \tag{8}$$

2. Virkler method (Virkler, 1998):

$$d_p = \frac{\left(C - (g + 0.69A)\right)^2}{2C} \tag{9}$$

3. Dunn method (Dunn & Pretty, 1984):

$$d_p = \frac{(g + 10)^2}{2\,(g + 15)} \tag{10}$$

where $d_p$ denotes the average delay per pedestrian (s), $C$ denotes the cycle length (s), $g$ is the pedestrian effective walk time (s), and $A$ is the clearance time (s).

Training estimation models is usually a costly, complex, and time-consuming procedure. Therefore, it is important to train a few models that can be transferred to other intersections. The succeeding subsection discusses the approach used for testing the transferability of the proposed method.

Transferability test

The transferability of a model is defined as the application of a formulated and trained model from one context to another context. For effective model transferability, theoretical and practical conditions should be met. The theoretical condition describes the underlying behavioral process of the model in the application and in the context where the model was estimated. The practical condition describes the availability of similar data sources in both the application and the context where the model was estimated.

In the case of the current study, transferability is defined as developing and training a general model that can be used to estimate pedestrian delay at all intersections with similar traffic patterns in a network. Transferability in this study is premised on parameter transferability using a log-likelihood ratio test from a statistical or econometric model, or any model where previously estimated parameters from one scenario are used to estimate data from another (Karasmaa, 2001; Sheela & Mannering, 2020; Wang et al., 2021; Yasmin et al., 2015). The underlying idea is to conduct a log-likelihood ratio test, assess predictions, and/or assess parameter estimates. The current study applies this idea by using previously estimated parameters in a new context, in this case, alternate intersections. The proposed approach most closely resembles one of the techniques presented by Wang et al. (2021), in which various prediction metrics are assessed by using previously estimated parameters to estimate data from an alternate intersection. To evaluate the accuracy of the transferred model, multiple measures of effectiveness (MOEs) are selected, and the accuracy of the transferred model is evaluated with actual data. In addition, the performance of the transferred model is evaluated using a baseline method. A detailed description of the transferability test is provided in the “Transferability Test” subsection.

Study sites and data collection

The Ina Road corridor in Pima County, Arizona was selected as the study corridor. This is a west-east corridor connecting the east side of Tucson to Interstate 10, with a speed limit of 45 mi/h. This corridor is a multimodal arterial with high volumes of passenger cars, transit, and pedestrian activity. Four major signalized intersections on this corridor, (1) W Ina Rd. & N Camino De La Tierra, (2) W Ina Rd. & N Shannon Rd., (3) W Ina Rd. & N La Cholla Blvd., and (4) W Ina Rd. & N La Cañada Dr., as illustrated in Figure 2a, were selected as study locations. All the intersections have four legs, with two through movement lanes for the major streets and dedicated left-turn lane(s) that separate left-turning vehicles from through movements. A ring barrier diagram for the signal timing plan deployed on these intersections is illustrated in Figure 2b. Pima County Department of Transportation (PCDOT) oversees the operation of this corridor. The selected intersections follow actuated-coordinated timing, with the major approaches in coordination (Phases 2 and 6) while the minor approaches are actuated (Phases 4 and 8) only during peak periods.

Figure 2. (a) Ina Road corridor and case study intersections; (b) sample ring barrier diagram.

These intersections were specifically chosen as study locations as they are all equipped with Miovision’s SmartView 360 video-based sensors. These video-based sensors provide both real-time and historical monitoring. The SmartView sensors provide vehicle detection, traffic classification, and high-resolution event-based data (Miovision Team, 2020), which are accessible through an application programming interface (API). The raw data include cycle length, vehicle and pedestrian delay, and pedestrian effective green duration. For this study, data from January and February 2020 were obtained from the API for the four selected intersections.

Data preprocessing

Before conducting any data analysis, an outlier filtering algorithm was applied to the raw data to remove any erroneous observations. The outlier filtering algorithm used in this study was the moving interquartile range (IQR). The IQR is the difference between the 25th percentile ($Q_1$) and the 75th percentile ($Q_3$) of the data ($\mathrm{IQR} = Q_3 - Q_1$). Based on the definition by Tukey (1977), all data that fall outside of the inner fences $f_1 = Q_1 - 1.5\,\mathrm{IQR}$ and $f_2 = Q_3 + 1.5\,\mathrm{IQR}$ are considered outliers (Schwertman & de Silva, 2007; Tukey, 1977). The event-based data used in this study (cycle length, vehicle and pedestrian delay, and pedestrian effective green duration) are not all normally distributed. Therefore, it is important to use an outlier filtering algorithm that is able to handle non-normally distributed data. Since IQR outlier filtering uses the first and third quartiles as the filter range, it is known to be more robust than other outlier filtering approaches. In addition, traffic patterns and signal timing parameters (e.g., cycle length) frequently change during the course of the day. Therefore, in this study, a moving IQR outlier filtering algorithm was used, which grouped data into 15-min blocks and calculated the inner fences ($f_1$ and $f_2$) for each block. Within each block, all the data samples outside of the inner fences are considered outliers. Previous studies also showed that IQR outlier filtering is a strong filtering algorithm for time-series data and has been applied frequently for removing outliers in many traffic-related studies (Li et al., 2014; Zang et al., 2018).

After outlier filtering, there was over a 4% reduction in the vehicle-related data and a 16% reduction in the pedestrian-related data. The outlier distribution showed that over 45% of all outliers for the vehicle-related data were during off-peak hours and 55% during peak hours. This is intuitive because a larger amount of data was recorded during the peak hours. A similar trend was also observed for pedestrian-related data. A more in-depth analysis was carried out on the data to illustrate the outlier distribution throughout the day. As expected, the density of the outliers was primarily concentrated during peak periods. Figure 3 illustrates the outlier and non-outlier observations for cycle length, traffic flow, pedestrian delay, and pedestrian effective green duration (dark points represent outliers); Figure 3a shows the plots related to major approaches, and Figure 3b shows the plots related to minor approaches.

All of the intersections follow an actuated-coordinated timing plan during peak periods (6:00–9:00 a.m. and 3:00–6:00 p.m.) with a maximum cycle length of 150 s, and during off-peak periods operate under a free, non-coordinated timing plan. Therefore, as illustrated in Figure 3, the cycle length might be extended to 1,000 s during off-peak hours. It is worth mentioning that some of the outliers are not shown in the figure simply because they fall outside the y-axis limits.

The results of the outlier analysis for cycle length, vehicle and pedestrian delay, and pedestrian effective green duration are tabulated in Table 1. For all the variables, the pattern associated with the percent change showed a similar trend for all the intersections. It is worth mentioning that the pedestrian delay values in Table 1 are calculated based on the high-resolution event-based data collected using Miovision’s SmartView 360 sensors at each intersection. Pedestrian delay in this table is the difference in time between pedestrian pushbutton actuation during the Don’t Walk phase and when the phase turns to Walk.

The magnitude of percent change among various variables and intersections clearly outlines the importance of outlier filtering. Among the variables, only traffic flow does not significantly change before and after outlier filtering. Based on the results from Table 1, pedestrians at W Ina Rd. & N La Cholla Blvd. and W Ina Rd. & N La Cañada Dr. experience higher pedestrian delay while crossing the major streets compared to the other two intersections. This could be explained by the traffic patterns at these two intersections. W Ina Rd. & N La Cholla Blvd. and W Ina Rd. & N La Cañada Dr. have a much higher traffic volume on their minor streets compared to W Ina Rd. & N Camino De La Tierra and W Ina Rd. & N Shannon Rd. To shed additional light on the traffic patterns on minor and major streets, they are visualized at each intersection for different approaches in Figure 4.

Based on the traffic patterns shown in Figure 4, on the major streets (EBT and WBT), the traffic pattern is similar for all the intersections. For all the intersections, the peak traffic starts forming around 6:00 a.m., with the highest volume of 1,288 vehicles per hour observed at 7:30 a.m. and 1,600 vehicles per hour observed at 3:00 p.m. The traffic volume on minor streets (NBT and SBT) is higher for W Ina Rd. & N La Cholla Blvd. and W Ina Rd. & N La Cañada Dr. compared to the other two intersections; this is due to the presence of local businesses and surrounding land use. Among all the intersections, W Ina Rd. & N Camino De La Tierra experiences the lowest amount of traffic on the minor streets, with the highest observed peak of 84 vehicles per hour. To temporally align the pedestrian data with the vehicle data, all the data were aggregated into 15-min bins and the vehicle data were aligned with the pedestrian data, as appropriate, at each intersection. Upon aligning the two datasets, a finite mixture model was generated for each intersection under consideration.
JOURNAL OF INTELLIGENT TRANSPORTATION SYSTEMS 7

Figure 3. Distribution of outliers vs. non-outliers by time of day: (a) major approaches, (b) minor approaches.
Table 1. Descriptive statistics before and after outlier analysis (minor streets).
Intersections: (1) W Ina Rd. & N Camino De La Tierra; (2) W Ina Rd. & N Shannon Rd.; (3) W Ina Rd. & N La Cholla Blvd.; (4) W Ina Rd. & N La Cañada Dr.
For each intersection the three columns are: with outliers, no outliers, % change.

Statistic          (1) With  (1) No  (1) %    (2) With  (2) No  (2) %    (3) With  (3) No  (3) %    (4) With  (4) No  (4) %

Cycle Length (Sec)
Sample Size        42,490  40,807  -3.96    47,559  46,310  -2.63    53,643  51,397  -4.19    56,350  53,949  -4.26
Mean               127.18  120.71  -5.09    110.93  108.07  -2.58    111.96  110.16  -1.61    103.95  102.32  -1.57
Median             124.03  122.58  -1.17    107.99  106.91  -1.00    115.98  113.30  -2.32    104.08  101.48  -2.50
St. Dev.           82.12  53.79  -34.50    60.74  47.58  -21.67    44.49  41.42  -6.90    46.70  43.01  -7.89
95th Percentile    189.49  167.17  -11.78    167.01  163.05  -2.37    167.74  165.83  -1.14    160.34  158.73  -1.00

Traffic Flow (Veh/Hr)
Sample Size        9,955  9,597  -3.60    8,485  8,319  -1.96    9,063  8,960  -1.14    8,827  8,727  -1.13
Mean               22.28  22.23  -0.21    124.08  123.93  -0.12    300.66  302.50  0.61    340.00  345.63  1.66
Median             20.00  20.00  0.00    132.00  132.00  0.00    336.00  336.00  0.00    344.00  344.00  0.00
St. Dev.           14.77  14.21  -3.79    91.77  90.67  -1.20    231.44  231.20  0.11    274.19  274.28  0.03
95th Percentile    48.00  48.00  0.00    276.00  272.00  -1.45    692.00  692.00  0.00    800.00  804.00  0.50

Pedestrian Delay (Sec)
Sample Size        1,133  1,082  -4.50    2,564  2,491  -2.85    1,745  1,683  -3.55    346  299  -13.58
Mean               74.97  47.82  -36.21    51.06  49.32  -3.41    64.27  57.80  -10.06    76.42  66.63  -12.81
Median             37.74  36.05  -4.49    39.56  39.27  -0.73    49.36  48.36  -2.02    57.69  59.16  2.54
St. Dev.           372.34  42.21  -88.66    59.49  38.63  -35.07    125.25  38.78  -69.03    22.59  39.44  74.60
95th Percentile    137.88  133.40  -3.25    127.92  125.80  -1.66    138.68  136.64  -1.47    142.85  140.17  -1.87

Effective Ped. Green Duration (Sec)
Sample Size        1,133  977  -13.77    2,564  2,071  -19.23    1,745  1,477  -15.36    346  276  -20.23
Mean               5.00  5.00  0.12    6.98  7.00  0.29    4.99  5.00  0.20    4.99  5.00  0.26
Median             5.00  5.00  0.00    7.01  7.01  0.00    5.01  5.01  0.00    5.01  5.01  0.00
St. Dev.           0.03  0.01  -68.40    0.24  0.01  -96.04    0.03  0.01  -61.87    0.16  0.022  -86.03
95th Percentile    5.01  5.01  -0.02    7.02  7.02  -0.03    5.01  5.01  -0.04    5.01  5.01  -0.02
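
The pedestrian delay reported in Table 1 is measured from event data as the time between a pushbutton actuation during Don't Walk and the onset of Walk. A minimal sketch of that pairing is shown below. The event labels `PED_CALL`, `WALK_ON`, and `WALK_OFF`, the tuple layout, the sample events, and the choice to time the delay from the first actuation in each waiting interval are illustrative assumptions, not the sensor vendor's schema or the authors' code.

```python
def pedestrian_delays(events):
    """Return delays (s) from the first pushbutton call to the next Walk onset.

    `events` is an iterable of (timestamp_seconds, label) tuples. Calls placed
    while Walk is already displayed are ignored (an assumption).
    """
    delays = []
    first_call = None  # time of the first unserved call, if any
    walking = False    # True while the Walk indication is displayed
    for ts, label in sorted(events):
        if label == "PED_CALL":
            if not walking and first_call is None:
                first_call = ts
        elif label == "WALK_ON":
            walking = True
            if first_call is not None:
                delays.append(ts - first_call)
                first_call = None
        elif label == "WALK_OFF":
            walking = False
    return delays


# Hypothetical event log covering two served pedestrian calls.
events = [
    (0.0, "PED_CALL"), (5.0, "PED_CALL"), (40.0, "WALK_ON"),
    (42.0, "PED_CALL"), (47.0, "WALK_OFF"),
    (130.0, "PED_CALL"), (175.0, "WALK_ON"), (182.0, "WALK_OFF"),
]
delays = pedestrian_delays(events)
print(delays, sum(delays) / len(delays))
```

Repeated presses while waiting are collapsed onto the first press, so each Walk onset yields at most one delay observation in this sketch.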

Figure 4. Traffic patterns during day by intersection and approach.

Estimation results

Generating mixture model

Identifying the specific variables that can explain the pedestrian delay distribution function is beneficial when the measured pedestrian delay information is not available. In this study, three variables (cycle length, pedestrian effective green duration, and traffic flow) were used as potential candidates to model pedestrian delay. To avoid multicollinearity issues during the modeling process, the correlation between these variables was calculated using Pearson and Spearman correlation tests (Pearson, 1895; Wissler, 1905). Figure 5 depicts the correlation among these variables.

Figure 5. Correlation analysis among the variables.

The correlation results indicate that there is a negative correlation between traffic flow and pedestrian effective green duration and between cycle length and pedestrian effective green duration, and a positive correlation between cycle length and traffic flow. However, for all three cases, the magnitude of the correlation is lower than 0.30, and therefore no strong correlation could be observed.

Next, to select the number of mixture components, a range of K = 1 to 20 components was considered. Based on the Akaike information criterion (AIC) and Bayesian information criterion (BIC) values, a graph relating the number of components to the corresponding AIC and BIC was developed. Based on this graph, it was observed that after four components the AIC/BIC values diverged. Therefore, further model development was only done for two, three, and four mixture components.

Seven models with various combinations of the variables and with two, three, and four mixture components were trained for each intersection. To identify the most appropriate combination of variables and the best number of mixture components, the AIC, BIC, and log-likelihood values were estimated for the individual models (Tables 2–5). The AIC value denotes the relative distance between the true and the estimated likelihood function of the observed data (Hirotugu, 1974), the BIC value denotes an estimate of a function of the posterior probability of the model being accurate (Stone, 1979), and the log-likelihood measures the goodness of fit of a statistical model. Therefore, a lower AIC and BIC mean a model closer to the real data, and a higher log-likelihood (i.e., a value closer to zero) indicates a model with a better fit.

Because maximum likelihood estimation was used to estimate the Gaussian mixture component parameters ($\mu_k$, $\pi_k$, and $\Sigma_k$), the log-likelihood criterion was selected as the measure for identifying the most appropriate model. Therefore, the models with the highest log-likelihood values (closest to zero) were selected as the final model representing pedestrian delay at each intersection (shaded in gray in Tables 2–5). Based on the log-likelihood values for W Ina Rd. and N Shannon Rd., W Ina Rd. and N La Cholla

Table 2. Gaussian mixture model results for W Ina Rd. & N Camino De La Tierra.
Columns: AIC, BIC, and log-likelihood (LL) for the two-, three-, and four-component models.
Independent variables  AIC (2)  BIC (2)  LL (2)  AIC (3)  BIC (3)  LL (3)  AIC (4)  BIC (4)  LL (4)
Traffic flow  9,445.95  9,480.05  -4,715.98  9,376.60  9,430.19  -4,677.30  9,384.40  9,457.48  -4,677.20
Cycle length  9,389.37  9,423.48  -4,687.68  9,329.96  9,383.55  -4,653.98  9,309.87  9,382.96  -4,631.67
Ped. green duration  9,514.77  9,548.87  -4,750.38  9,388.69  9,442.29  -4,683.35  9,358.46  9,431.54  -4,664.23
Traffic flow + cycle length  9,392.75  9,436.60  -4,687.38  9,323.38  9,391.59  -4,647.69  9,323.53  9,391.74  -4,647.76
Traffic flow + Ped. green duration  9,448.99  9,492.84  -4,715.49  9,380.13  9,448.31  -4,676.06  9,392.34  9,484.91  -4,677.17
Cycle length + Ped. green duration  9,392.77  9,436.62  -4,687.38  9,334.26  9,402.47  -4,653.13  9,301.35  9,393.92  -4,639.94
Cycle length + Ped. green duration + traffic flow  9,453.77  9,507.36  -4,715.88  9,325.09  9,407.92  -4,645.54  9,399.99  9,482.82  -4,682.99

Table 3. Gaussian mixture model results for W Ina Rd. & N Shannon Rd.
Columns: AIC, BIC, and log-likelihood (LL) for the two-, three-, and four-component models.
Independent variables  AIC (2)  BIC (2)  LL (2)  AIC (3)  BIC (3)  LL (3)  AIC (4)  BIC (4)  LL (4)
Traffic flow  17,742.68  17,781.13  -8,864.34  17,666.70  17,727.11  -8,822.35  17,603.46  17,685.84  -8,786.73
Cycle length  17,661.51  17,699.95  -8,823.75  17,567.87  17,628.28  -8,772.94  17,541.51  17,623.89  -8,755.76
Ped. green duration  17,753.78  17,792.22  -8,869.89  17,675.22  17,735.62  -8,826.61  17,682.78  17,765.15  -8,826.39
Traffic flow + cycle length  17,664.38  17,713.81  -8,823.19  17,572.36  17,649.24  -8,772.18  17,536.46  17,640.81  -8,749.23
Traffic flow + Ped. green duration  17,746.18  17,795.61  -8,864.09  17,670.05  17,746.93  -8,821.02  17,679.78  17,784.12  -8,820.89
Cycle length + Ped. green duration  17,664.88  17,714.31  -8,823.44  17,570.38  17,647.27  -8,771.19  17,541.87  17,646.22  -8,751.94
Cycle length + Ped. green duration + traffic flow  17,667.72  17,728.13  -8,822.86  17,574.62  17,667.98  -8,770.31  17,545.22  17,671.53  -8,749.61

Table 4. Gaussian mixture model results for W Ina Rd. & N La Cholla Blvd.
Two-component Three-component Four-component
Independent variables AIC BIC Log-likelihood AIC BIC Log-likelihood AIC BIC Log-likelihood
Traffic flow 12,366.95 12,402.76 –6,176.47 12,357.53 12,413.80 –6,167.77 12,335.95 12,412.69 –6,152.98
Cycle length 12,310.50 12,346.31 –6,148.25 12,294.42 12,350.69 –6,136.21 12,270.81 12,347.54 –6,120.40
Ped. green duration 12,380.55 12,416.36 –6,183.28 12,370.76 12,427.03 –6,174.38 12,367.06 12,443.79 –6,168.53
Traffic flow + cycle length 12,310.45 12,356.49 –6,146.23 12,298.41 12,370.03 –6,135.20 12,276.85 12,374.05 –6,119.43
Traffic flow + Ped. green duration 12,370.39 12,416.43 –6,176.19 12,361.03 12,432.65 –6,166.52 12,355.84 12,453.04 –6,158.92
Cycle length + Ped. green duration 12,314.45 12,360.49 –6,148.22 12,299.28 12,370.89 –6,135.64 12,290.43 12,387.63 –6,126.22
Cycle length + Ped. green duration + traffic flow 12,314.44 12,370.71 –6,146.22 12,300.48 12,387.44 –6,133.24 12,302.04 12,419.69 –6,128.02

Table 5. Gaussian mixture model results for W Ina Rd. & N La Cañada Dr.
Two-component Three-component Four-component
Independent variables AIC BIC Log-likelihood AIC BIC Log-likelihood AIC BIC Log-likelihood
Traffic flow 2,357.04 2,381.20 –1,171.52 2,359.11 2,397.07 –1,168.56 2,367.56 2,419.33 –1,168.78
Cycle length 2,332.41 2,356.57 –1,159.21 2,327.60 2,365.56 –1,152.80 2,333.08 2,384.84 –1,151.54
Ped. green duration 2,354.68 2,378.84 –1,170.34 2,361.58 2,399.54 –1,169.79 2,369.90 2,421.66 –1,169.95
Traffic flow + cycle length 2,336.42 2,367.48 –1,159.21 2,331.39 2,379.71 –1,151.70 2,332.87 2,398.44 –1,147.44
Traffic flow + Ped. green duration 2,356.15 2,387.21 –1,169.07 2,363.79 2,412.11 –1,167.90 2,356.91 2,405.23 –1,164.46
Cycle length + Ped. green duration 2,329.40 2,360.46 –1,155.70 2,324.92 2,373.24 –1,148.46 2,324.93 2,373.25 –1,148.47
Cycle length + Ped. green duration + Traffic flow 2,333.40 2,371.36 –1,155.70 2,330.69 2,389.35 –1,148.34 2,342.75 2,422.12 –1,148.37
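
As a cross-check on how the AIC and BIC columns in Tables 2–5 relate to the log-likelihood column, the Python sketch below recomputes AIC from tabulated log-likelihoods. It rests on an assumption that is not stated in the text: a K-component model with p covariates is taken to have K(p + 2) + (K − 1) free parameters (an intercept, p slopes, and a variance per component, plus K − 1 free mixing weights). Under that assumption the "Traffic flow" row of Table 5 is reproduced to within rounding. The function names are illustrative, and because the sample size is not reported in this section, BIC takes the number of observations as an argument.

```python
import math


def n_params(n_components: int, n_covariates: int) -> int:
    """Assumed free-parameter count for a Gaussian mixture of linear regressions."""
    per_component = n_covariates + 2  # intercept + slopes + variance
    return n_components * per_component + (n_components - 1)  # + free mixing weights


def aic(log_lik: float, k: int) -> float:
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * k - 2 * log_lik


def bic(log_lik: float, k: int, n_obs: int) -> float:
    """Bayesian information criterion: k ln n - 2 ln L."""
    return k * math.log(n_obs) - 2 * log_lik


# Table 5, "Traffic flow" row: (components, log-likelihood, tabulated AIC)
rows = [(2, -1171.52, 2357.04), (3, -1168.56, 2359.11), (4, -1168.78, 2367.56)]
for n_components, log_lik, aic_tabulated in rows:
    k = n_params(n_components, n_covariates=1)
    print(n_components, k, round(aic(log_lik, k), 2), aic_tabulated)
```

Because AIC and BIC penalize the same log-likelihood with 2k and k ln n respectively, BIC grows faster than AIC as components are added, which is why the two criteria can rank the two-, three-, and four-component models differently.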

Table 6. Detailed information on the selected models.


Estimated coefficient
Intersection Variables GMM #1 GMM #2 GMM #3 GMM #4
W Ina Rd. & N Camino De La Tierra Intercept 1,544.84 –38.17 –299.74 –1,907.37
Cycle length 0.50 0.02 0.13 1.47
Traffic flow – – – –
Effective pedestrian green duration –305.62 8.98 62.63 365.25
Variance (Σk) 30.99 5.17 14.56 5.48
Prior probability (πk) 0.41 0.32 0.20 0.06
W Ina Rd. & N Shannon Rd. Intercept 10.46 10.12 6.83 11.63
Cycle length 0.11 0.74 0.02 0.31
Traffic flow 0.01 0.01 0.00 0.03
Effective pedestrian green duration – – – –
Variance (Σk) 10 22.93 5.37 19.51
Prior probability (πk) 0.28 0.23 0.17 0.32
W Ina Rd. & N La Cholla Blvd. Intercept 0.39 –4.16 –1.22 8.06
Cycle length 0.23 0.58 0.42 0.84
Traffic flow –0.01 0.03 0.00 0.00
Effective pedestrian green duration – – – –
Variance (Σk) 13.98 11.35 19.9 16.1
Prior probability (πk) 0.40 0.14 0.31 0.15
W Ina Rd. & N La Cañada Dr. Intercept –5.57 42.68 –29.69 –1.08
Cycle length 1.02 –0.30 0.52 0.66
Traffic flow –0.01 0.02 0.00 0.00
Effective pedestrian green duration – – – –
Variance (Σk) 10.65 4.93 14.07 16.33
Prior probability (πk) 0.16 0.06 0.48 0.30
GMM = Gaussian Mixture Model.
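
To make the role of the Table 6 parameters concrete, here is a minimal Python sketch that evaluates the mixture density and a tail probability for W Ina Rd. & N Shannon Rd. It assumes that each component mean is a linear function of the covariates (intercept + cycle-length coefficient × cycle length + traffic-flow coefficient × traffic flow) and that the tabulated Σk is a variance, so the standard deviation is its square root; neither detail is spelled out in this section. The input values (cycle length 120, traffic flow 1,000, in whatever units the model was fitted with) are hypothetical, and all function names are illustrative.

```python
import math

# Table 6, W Ina Rd. & N Shannon Rd. (GMM #1 to GMM #4)
INTERCEPT = [10.46, 10.12, 6.83, 11.63]
B_CYCLE = [0.11, 0.74, 0.02, 0.31]
B_FLOW = [0.01, 0.01, 0.00, 0.03]
VARIANCE = [10.0, 22.93, 5.37, 19.51]  # assumed to be variances, not std. devs.
PRIOR = [0.28, 0.23, 0.17, 0.32]


def component_means(cycle, flow):
    """Center of each Gaussian component (assumed linear in the covariates)."""
    return [a + bc * cycle + bf * flow
            for a, bc, bf in zip(INTERCEPT, B_CYCLE, B_FLOW)]


def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def normal_cdf(x, mean, var):
    return 0.5 * (1 + math.erf((x - mean) / math.sqrt(2 * var)))


def delay_pdf(d, cycle, flow):
    """Mixture density: prior-weighted sum of the component densities."""
    mus = component_means(cycle, flow)
    return sum(p * normal_pdf(d, m, v) for p, m, v in zip(PRIOR, mus, VARIANCE))


def prob_delay_exceeds(threshold, cycle, flow):
    """P(delay > threshold) under the mixture."""
    mus = component_means(cycle, flow)
    return sum(p * (1 - normal_cdf(threshold, m, v))
               for p, m, v in zip(PRIOR, mus, VARIANCE))


cycle, flow = 120.0, 1000.0  # hypothetical inputs, for illustration only
print(component_means(cycle, flow))
print(prob_delay_exceeds(30.0, cycle, flow))
```

The tail-probability function is the same calculation an operator could use to estimate how often delay exceeds a chosen threshold such as 30 s or 50 s; the numbers printed here depend entirely on the hypothetical inputs and should not be read as results of the study.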

Blvd., and W Ina Rd. and N La Cañada Dr., a four-component mixture model with cycle length and traffic flow as the variables was the best model that represents pedestrian delay distribution. For W Ina Rd. and N Camino De La Tierra, a four-component mixture model with cycle length and pedestrian green duration as variables was the best model. However, note that a four-component mixture model with cycle length and traffic flow showed almost a similar log-likelihood value for this intersection. The difference between the log-likelihood of these two models is only 7.82.

The reason behind the difference in the variables that represent the pedestrian delay function for W Ina Rd. and N Camino De La Tierra compared to other intersections could be the traffic patterns on this intersection. Relatively, this intersection is experiencing a much lower traffic volume on its minor streets (high volume of 84 vehicles per hour) compared to

Figure 6. Probability distribution function of pedestrian delay.

the other intersections. Therefore, traffic flow might not be the best variable for representing the pedestrian delay function at this intersection.

GMMs performance evaluation and comparison

After identifying the final variables and number of components for the mixture model of each intersection, the estimated coefficients of these variables for each Gaussian mixture component (βi,k), the component-specific covariance (Σk), and the prior probability of each component (πk) are tabulated in Table 6. The βi,k from this table were inserted into Equation 6 to identify the center of each mixture component, the component-specific covariance was used to define the width of each component, and the prior probability (πk) was used to define the relative weight (height) of each Gaussian function.

The premise behind selecting certain characteristics (e.g., signal schedule and traffic volume) stems from previous analytical methods determining that these characteristics have a vital role in estimating pedestrian delay (Elefteriadou, 2016; Qiao et al., 2002; Wei et al., 2015). The estimated coefficients in Table 6 can shed additional light on how these characteristics contribute to the likelihood of observing pedestrian delay. Intuitively, the coefficient for effective pedestrian green duration is expected to be negative (it is associated with a decreased likelihood of observing pedestrian delay), as it is for GMM #1 at W Ina Rd. & N Camino De La Tierra (–305.62). In the remaining components, however, positive estimates are observed for effective pedestrian green duration and relatively large negative estimates are observed for the estimated constants. At first glance, this may appear counterintuitive, but it is likely a sign that the data are subject to a large amount of heterogeneity (unobservables that influence the outcome of interest), which the constants are inherently capturing. That is to say, the effects of effective pedestrian green duration may be heterogeneous within the remaining Gaussian components due to a variety of unobservables related to spatial characteristics, operational characteristics, pedestrian-specific behavior, etc.

In regard to the additional characteristics that were considered (cycle length and traffic flow), most estimates are positive; the exceptions in Table 6 are one cycle-length coefficient of –0.30 and two traffic-flow coefficients of –0.01. These findings are plausible, as increases in cycle length and traffic flow can lead to increased delay at intersection crossings.

To provide a side-by-side accuracy evaluation, the empirical histograms of the actual pedestrian delay were plotted against the PDFs of the pedestrian delay estimated by the mixture models, as shown in Figure 6; the blue line shows the PDF of the estimated delay

Figure 7. Model evaluation-average delay.

using the mixture models and the gray boxes are the empirical histograms of the pedestrian delay.

The density plots in Figure 6 show that the estimated PDFs using four mixture components are able to capture and track the trend of the empirical histogram. The mixture models were more robust when estimating higher pedestrian delay; that is, the estimated PDF was much closer to the empirical histogram of the data. The performance of the proposed method was also compared to three conventional pedestrian delay estimation models. Figure 7 compares the average pedestrian delay at the study intersections with the estimated values from four different models. The following findings were observed from this figure.

1. Based on Equation 10, in the Dunn method, pedestrian effective green duration (g) is the only variable used for estimating the pedestrian delay function. The average effective pedestrian green duration for all the intersections is 5.68 s, with a variance of 0.94 s. Therefore, it is expected to see a constant flat curve when estimating pedestrian delay.
2. The Virkler method is able to capture most of the fluctuation of pedestrian delay throughout the day, but it heavily overestimates the actual delay. Compared with the proposed method, this method uses cycle length, effective green duration, and clearance time as variables for estimating the pedestrian delay function. The Virkler method overestimates the pedestrian delay on average by 28.06, 22.27, 32.05, and 14.26 s on W Ina Rd. & N Camino De La Tierra, W Ina Rd. & N Shannon Rd., W Ina Rd. & N La Cholla Blvd., and W Ina Rd. & N La Cañada Dr., respectively.
3. Compared with Dunn and Virkler, HCM is the most accurate method. The HCM method is also able to capture all the fluctuation of pedestrian delay throughout the day. The HCM method only uses cycle length as a variable for defining the pedestrian delay function. However, since the HCM method does not integrate the impact of traffic patterns during the day, it sometimes heavily overestimates or underestimates the actual value. For instance, HCM overestimates pedestrian delay throughout the day for W Ina Rd. & N Camino De La Tierra and W Ina Rd. & N La Cholla Blvd. on average by 4.74 and 4.08 s, respectively; for W Ina Rd. & N Shannon Rd., it overestimates pedestrian delay during peak hours on average by 3.82 s and underestimates it during off-peak hours on average by 2.43 s; for W Ina Rd. & N La Cañada Dr., it underestimates pedestrian delay during peak hours on average by 12.07 s and overestimates it during off-peak hours on average by 2.97 s.

Table 7. Comparison results (average delay).


Methods | MOEs | W Ina Rd. & N Camino De La Tierra | W Ina Rd. & N Shannon Rd. | W Ina Rd. & N La Cholla Blvd. | W Ina Rd. & N La Cañada Dr.
Proposed method RMSE (s) 14.69 12.32 11.63 15.69
MAE (s) 11.13 9.19 9.52 11.26
HCM 2010 RMSE (s) 38.98 40.24 39.46 42.92
MAE (s) 32.75 34.25 33.47 37.31
Virkler method RMSE (s) 52.77 44.15 49.62 41.95
MAE (s) 44.5 38.36 42.91 36.49
Dunn method RMSE (s) 61.02 60.12 69.86 76.74
MAE (s) 43.99 45.63 56.75 63.34

4. The proposed method is able to capture and track all the fluctuation of pedestrian delay during the day. In addition, based on the results from Figure 7, the proposed method is robust toward the spikes happening during the day. This is because the proposed method uses information from both cycle length and traffic flow as variables representing the pedestrian delay function (for one intersection, cycle length and effective pedestrian green duration).

To quantitatively compare the accuracy of each method, root-mean-square error (RMSE) and mean absolute error (MAE) were selected as measures of effectiveness (MOEs). RMSE is related to the standard deviation of the estimation error and measures the weighted average of estimation error, and MAE is related to the absolute value of the estimation error. Previous studies also suggested RMSE and MAE as good indicators of estimation accuracy (Hosseinpour et al., 2013; Murat, 2006). RMSE and MAE can be calculated based on the following equations:

1. Root-mean-square error (RMSE): this measure is the weighted average of fitting error. RMSE gives higher weights to larger errors.

\[
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{t=1}^{n}\left(D_{F,t} - D_{A,t}\right)^{2}} \tag{11}
\]

2. Mean absolute error (MAE): this measure shows the average prediction error (the average distance of the fitted value from the actual value).

\[
\mathrm{MAE} = \frac{\sum_{t=1}^{n}\left|D_{F,t} - D_{A,t}\right|}{n} \tag{12}
\]

where n denotes the sample size, and $D_{F,t}$ and $D_{A,t}$ are the estimated and observed (actual) pedestrian delay for observation t, respectively.

Table 7 quantifies the results of the method comparison. The overall results of the MOEs clearly showed that the proposed method outperforms all the conventional methods irrespective of the intersection. The values in this table are based on the average delay over the data collection period.

RMSE gives a relatively high weight to larger errors and is useful in cases where large magnitude errors are particularly undesirable (Jao, 2011). The low RMSE of the proposed method compared to other methods shows the absence of large errors in the proposed method. MAE shows the absolute difference between the estimated and actual value. MAE does not consider the direction of error, and all errors have equal weights. Comparing the MAE among the methods, the proposed method also estimates pedestrian delay with better accuracy. Overall, the results of the method comparison show the superiority of the proposed method and how it is able to provide a more reliable, robust, and accurate estimate of pedestrian delay at signalized intersections.

Transferability test

Training estimation models is usually a costly, complex, and time-consuming procedure. In addition, it is not always feasible for agencies to collect sufficient traffic data at each intersection on a network to perform variable selection and training. Therefore, it is important to train a few models that can be transferred to other intersections. In this study, the transferability test was used to analyze whether the trained model for one intersection could be used to predict the pedestrian delay at another intersection. To do so, the disaggregated prediction transferability test was conducted. The following steps describe the transferability test conducted in this study:

Step 1: Develop a pedestrian delay estimation model (PDE) for each intersection using the data from that specific intersection. At this step, four PDEs will be developed. PDE-1 describes the PDE trained with data from intersection 1 (intersections 1–4 are described in Table 8 using abbreviation

Table 8. Results of the disaggregated prediction transferability test.


Trained based on | Predictions for Int.#1: W Ina Rd. & N Camino De La Tierra | Int.#2: W Ina Rd. & N Shannon Rd. | Int.#3: W Ina Rd. & N La Cholla Blvd. | Int.#4: W Ina Rd. & N La Cañada Dr.
RMSE (s)
Int.#1: W Ina Rd. & N Camino De La Tierra (PDE-1) – 17.64 (40.24) 19.17 (39.46) 22.35 (42.92)
Int.#2: W Ina Rd. & N Shannon Rd. (PDE-2) 10.91 (38.98) – 12.34 (39.46) 14.78 (42.92)
Int.#3: W Ina Rd. & N La Cholla Blvd. (PDE-3) 15.40 (38.98) 14.70 (40.24) – 13.06 (42.92)
Int.#4: W Ina Rd. & N La Cañada Dr. (PDE-4) 19.56 (38.98) 18.10 (40.24) 17.91 (39.46) –
MAE (s)
Int.#1: W Ina Rd. & N Camino De La Tierra (PDE-1) – 13.96 (34.25) 15.52 (33.47) 18.65 (37.31)
Int.#2: W Ina Rd. & N Shannon Rd. (PDE-2) 8.95 (32.75) – 9.86 (33.47) 12.33 (37.31)
Int.#3: W Ina Rd. & N La Cholla Blvd. (PDE-3) 12.65 (32.75) 11.77 (34.25) – 9.23 (37.31)
Int.#4: W Ina Rd. & N La Cañada Dr. (PDE-4) 14.75 (32.75) 12.94 (34.25) 12.47 (33.47) –
Values in parentheses: model accuracy using HCM 2010.
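
The train-on-one, predict-on-another evaluation behind Table 8, scored with RMSE and MAE, can be sketched in Python as follows. This illustrates the procedure only: the data are synthetic, and an ordinary least-squares line stands in for the paper's finite mixture model (fitting a mixture of regressions would need an EM routine, which is omitted here). The HCM baseline comparison is also left out. All names (`fit`, `predict`, `transferability_table`) are hypothetical.

```python
import math
import random


def rmse(pred, actual):
    """Root-mean-square error between estimated and observed delays."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(pred, actual)) / len(actual))


def mae(pred, actual):
    """Mean absolute error between estimated and observed delays."""
    return sum(abs(p - a) for p, a in zip(pred, actual)) / len(actual)


def fit(cycle, delay):
    """Stand-in model (NOT the paper's GMM): least-squares line delay = a + b * cycle."""
    n = len(cycle)
    mx, my = sum(cycle) / n, sum(delay) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(cycle, delay))
         / sum((x - mx) ** 2 for x in cycle))
    return my - b * mx, b


def predict(model, cycle):
    a, b = model
    return [a + b * x for x in cycle]


def transferability_table(data):
    """data: {intersection: (cycle_lengths, observed_delays)}.

    Returns {(trained_on, predicted_for): (RMSE, MAE)} for every off-diagonal pair.
    """
    models = {name: fit(*xy) for name, xy in data.items()}  # one model per intersection
    table = {}
    for train, model in models.items():
        for test, (cycle, delay) in data.items():
            if test == train:
                continue  # the diagonal of Table 8 is left blank
            pred = predict(model, cycle)  # apply the model to another intersection
            table[(train, test)] = (rmse(pred, delay), mae(pred, delay))
    return table


# Synthetic data for four hypothetical intersections
random.seed(0)
data = {}
for i in range(1, 5):
    cycle = [random.uniform(60, 150) for _ in range(200)]
    delay = [0.4 * c + random.gauss(0, 5) for c in cycle]
    data[f"Int.#{i}"] = (cycle, delay)

for (train, test), (r, m) in sorted(transferability_table(data).items()):
    print(f"trained on {train}, applied to {test}: RMSE={r:.2f} s, MAE={m:.2f} s")
```

For any set of errors RMSE is at least as large as MAE, and the gap widens when a few errors are much larger than the rest, which is why the two measures are reported side by side.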

Int.#). Similarly, we developed PDE-2, PDE-3, and PDE-4 (refer to Table 8).
Step 2: Each PDE is used to estimate the pedestrian delay at other intersections. For instance, PDE-12 means the model trained using data from intersection 1 was used to estimate the pedestrian delay at intersection 2. Similarly, we developed PDE-13, PDE-14, etc.
Step 3: In order to evaluate the accuracy of the prediction, RMSE and MAE were calculated.
Step 4: To compare the performance of the transferred model, the HCM method was also used as the base comparison method. It is worth mentioning that for the HCM method, the data of each intersection were used to estimate pedestrian delay at that intersection; therefore, the HCM values of RMSE and MAE (in parentheses) are identical within each column of Table 8.

The results of the disaggregated prediction transferability test are summarized in Table 8 (the values in Table 8 are based on the average delay over the data collection period). The rows in Table 8 show the intersection used for training the model, and the columns show the intersection used for prediction. The values in parentheses are the prediction accuracy using the HCM method. Below are two examples of how to read this table:

1. “17.64 (40.24)” in Table 8 shows that the RMSE for predicting pedestrian delay on W Ina Rd. & N Shannon Rd. using a mixture model trained for W Ina Rd. & N Camino De La Tierra is 17.64 s, and the RMSE of using the HCM method to estimate pedestrian delay on W Ina Rd. & N Shannon Rd. is 40.24 s.
2. “14.75 (32.75)” in Table 8 shows that the MAE of predicting pedestrian delay on W Ina Rd. & N Camino De La Tierra using a mixture model trained for W Ina Rd. & N La Cañada Dr. is 14.75 s, and the MAE of using the HCM method to estimate pedestrian delay on W Ina Rd. & N Camino De La Tierra is 32.75 s.

The results provide convincing evidence that the model trained for one intersection could be used to accurately predict pedestrian delay at other intersections. For all the intersections, the RMSE and MAE of the transferred model were substantially lower than those of the HCM method. This means that even if transportation agencies estimate only one PDE for one intersection and use it to predict the pedestrian delay at other intersections, they are able to achieve a more reliable and accurate estimation.

The results of the disaggregated prediction transferability test showed that the proposed method is transferable to other intersections with similar specifications. Therefore, to save time and money, transportation agencies could use the trained model for one intersection to estimate delay at other intersections with similar specifications.

Application and policy implications

Walking and bicycling are critical components in the development of healthy and sustainable communities.

These two groups of users should have the highest priority because they are the most vulnerable users in the transportation system. One of the major deterrents is the inordinate delay they experience while crossing multilane arterials at signalized intersections in major cities. High delays at intersections, particularly when vehicle traffic volume is low, can lead to frustration and risky behaviors.

Traditionally, transportation strategies have been primarily focused on improving mobility and maximizing the speed and flow of motorists on the roadway, rather than considering the value of time for nonmotorists. It has been postulated that pedestrians’ level of frustration grows when the delay exceeds 30–50 s at a signalized intersection (Guo et al., 2012; Vallyon et al., 2011). Using the proposed method, system operators can easily determine the proportion of time when pedestrians experience delays larger than a predefined threshold. For example, pedestrians crossing the major street at the intersection of W Ina Rd. & N Camino De La Tierra experience delays greater than 30 and 50 s, 70% and 51% of the time, respectively. Similarly, at W Ina Rd. & N Shannon Rd., W Ina Rd. & N La Cholla Blvd., and W Ina Rd. & N La Cañada Dr., pedestrians experience delays greater than 30 s, 73%, 85%, and 86% of the time, respectively. Furthermore, pedestrians at these intersections experience delays greater than 50 s, 51%, 65%, and 66% of the time, respectively. The high proportions of pedestrians experiencing delays greater than existing thresholds defined in the literature indicate that these intersections may be prone to higher risk-taking behaviors. The methods proposed in this study allow system operators to quantitatively assess existing delays and enact changes to better serve pedestrian needs.

Conclusion

Delay is one of the most widely used indices for quantifying the operational level of service of signalized intersections. Therefore, it is important to estimate and optimize delay for all modes of transportation, including pedestrian delay at signalized intersections. Pedestrian delay is important to safety specialists because it is closely linked to pedestrian signal compliance. A higher amount of pedestrian delay will increase pedestrian frustration and noncompliance.

In this study, a novel method based on finite mixture modeling was proposed to indirectly estimate pedestrian delay using high-resolution event-based data collected at the intersections. Initially, the proposed method was used to identify the specific variables that can best explain the pedestrian delay function. Then, Gaussian mixture modeling was applied to characterize pedestrian delay distributions as a function of these specific variables. Four intersections on Ina Road were selected as the study locations and individual models were trained for each intersection. It was identified that cycle length and traffic flow were the most appropriate variables for explaining the pedestrian delay function at three of the four intersections, and that cycle length and effective pedestrian green duration were the most appropriate variables at the other intersection. The results of estimating the pedestrian delay using the trained model at each intersection showed the proposed method was able to capture and track the actual delay fluctuation during the day with an average mean-absolute-error of about 10 s. Further, the result of the disaggregated prediction test showed that the proposed method was transferable to other intersections with similar specifications.

The application of the proposed method can be beneficial to transportation agencies in three capacities: (1) providing a more reliable, robust, and accurate approach for estimating pedestrian delay at signalized intersections where sensors are not available to collect pedestrian delay, (2) offering a tool to develop pedestrian delay PDFs for analyzing the risk of pedestrians violating the signal, and (3) training a network-wide model for estimating pedestrian delay at all intersections without the need to use additional resources. Furthermore, the presented transferability results can be advantageous to transportation agencies within Arizona and urban areas with similar characteristics by providing insight into what model specifications may provide the best pedestrian delay prediction accuracy.

Some of the potential limitations of this study are as follows. Due to the data limitation, the transferability of the model was only tested on intersections in Pima County, Arizona. Additionally, only cycle length, pedestrian effective green duration, and traffic flow were used as variables in the pedestrian delay function based on data available from the sensors. While the limitations discussed here should not have a significant impact on the results of this study, additional research should be conducted to extend the study’s findings. More studies would be needed to comprehensively evaluate if the proposed model could be transferred to other jurisdictions and counties. In addition, other variables could be considered as potential variables for estimating pedestrian delay. Furthermore, the behavior of pedestrian delay while accounting for heterogeneity (pedestrian-delay-specific, component-specific, etc.) should be considered in future endeavors. For the

transferability section, due to data availability, transferability could only be assessed within-county. Future research using data from other MPOs, states, etc., is recommended to further validate the transferability results presented in the current study.

Acknowledgments

The authors would also like to thank the Pima County Department of Transportation for data support.

Funding

This project was funded by the National Institute for Transportation and Communities (NITC; grant number 1298) a U.S. DOT University Transportation Center.

ORCID

Abolfazl Karimpour http://orcid.org/0000-0002-8707-6408
Jason C. Anderson http://orcid.org/0000-0001-9189-5345
Sirisha Kothuri http://orcid.org/0000-0002-2952-169X
Yao-Jan Wu http://orcid.org/0000-0002-0456-7915

References

Bakhshi, A. K., & Ahmed, M. M. (2021). Practical advantage of crossed random intercepts under Bayesian hierarchical modeling to tackle unobserved heterogeneity in clustering critical versus non-critical crashes. Accident Analysis & Prevention, 149, 105855. https://doi.org/10.1016/j.aap.2020.105855
Chilukuri, V., & Virkler, M. R. (2005). Validation of HCM pedestrian delay model for interrupted facilities. Journal of Transportation Engineering, 131(12), 939–945. https://doi.org/10.1061/(ASCE)0733-947X(2005)131:12(939)
Day, C. M., Taylor, M., Mackey, J., Clayton, R., Patel, S. K., Xie, G., Li, H., Sturdevant, J. R., & Bullock, D. M. (2016). Implementation of automated traffic signal performance measures. Institute of Transportation Engineers (ITE), 86(8), 26–34.
Dunn, R., & Pretty, R. (1984). Mid-block pedestrian crossings-an examination of delay. Australian Road Research, 12(4), 118–127.
Elefteriadou, L. A. (2016). The highway capacity manual 6th edition: A guide for multimodal mobility analysis. ITE Journal, 86(4), 14–18.
FHWA (2019). 2017 NHTS data user guide.
Fi, I., & Igazvölgyi, Z. K. (2014). Travel time delay at pedestrian crossings based on microsimulations. Periodica Polytechnica Civil Engineering, 58(1), 47–53. https://doi.org/10.3311/PPci.7406
Guo, H., Wang, W., Guo, W., Jiang, X., & Bubb, H. (2012). Reliability analysis of pedestrian safety crossing in urban traffic environment. Safety Science, 50(4), 968–973. https://doi.org/10.1016/j.ssci.2011.12.027
Guo, R., & Zhang, Y. (2014). Identifying time-of-day breakpoints based on nonintrusive data collection platforms. Journal of Intelligent Transportation Systems, 18(2), 164–174. https://doi.org/10.1080/15472450.2013.802151
Habibian, M., & Hosseinzadeh, A. (2018). Walkability index across trip purposes. Sustainable Cities and Society, 42, 216–225. https://doi.org/10.1016/j.scs.2018.07.005
HCM (2010). Transportation Research Board of the National Academies. Google Scholar.
Hernandez, S. (2017). Estimation of average payloads from weigh-in-motion data. Transportation Research Record: Journal of the Transportation Research Board, 2644(1), 39–47. https://doi.org/10.3141/2644-05
Hernandez, S., & Hyun, K. (2020). Fusion of weigh-in-motion and global positioning system data to estimate truck weight distributions at traffic count sites. Journal of Intelligent Transportation Systems, 24(2), 201–215. https://doi.org/10.1080/15472450.2019.1659793
Hirotugu, A. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19(6), 716–723.
Hosseinpour, M., Yahaya, A. S., Ghadiri, S. M., & Prasetijo, J. (2013). Application of adaptive neuro-fuzzy inference system for road accident prediction. KSCE Journal of Civil Engineering, 17(7), 1761–1772. https://doi.org/10.1007/s12205-013-0036-3
Huang, T., Poddar, S., Aguilar, C., Sharma, A., Smaglik, E., Kothuri, S., & Koonce, P. (2018). Building intelligence in automated traffic signal performance measures with advanced data analytics. Transportation Research Record: Journal of the Transportation Research Board, 2672(18), 154–166. https://doi.org/10.1177/0361198118791380
Hubbard, S. M., Bullock, D. M., & Day, C. M. (2008). Integration of real-time pedestrian performance measures into existing infrastructure of traffic signal system. Transportation Research Record: Journal of the Transportation Research Board, 2080(1), 37–47. https://doi.org/10.3141/2080-05
Jao, C. (2011). Efficient decision support systems: Practice and challenges from current to future. Intech. ISBN: 978-953-307-326-2. https://doi.org/10.1016/10.5772/682
Karasmaa, N. (2001). The spatial transferability of the Helsinki metropolitan area mode choice models. In Selected Proceedings of the 9th World Conference on Transport Research World Conference on Transport Research Society, Seoul, Korea (27 pp.).
Karimpour, A. (2020). Data-Driven Approaches for Assessing the Impact of Speed Management Strategies for Arterial Mobility and Safety. The University of Arizona.
Karimpour, A., Ariannezhad, A., & Wu, Y.-J. (2019). Hybrid data-driven approach for truck travel time imputation. IET Intelligent Transport Systems, 13(10), 1518–1524. https://doi.org/10.1049/iet-its.2018.5469
Kothuri, S. M., Reynolds, T., Monsere, C. M., & Koonce, P. (2012). Preliminary development of methods to automatically gather bicycle counts and pedestrian delay at signalized intersections. In 91st Annual Meeting of the Transportation Research Board, Washington DC, United States (14 pp.).
Lao, Y., Zhang, G., Corey, J., & Wang, Y. (2012). Gaussian mixture model-based speed estimation and vehicle classification using single-loop measurements. Journal of Intelligent Transportation Systems, 16(4), 184–196. https://doi.org/10.1080/15472450.2012.706196

Lattimer, C. R., & Atkins North America. (2020). Automated traffic signals performance measures (No. FHWA-HOP-20-002). United States Federal Highway Administration.
Li, J.-Q., Zhou, K., Zhang, L., & Zhang, W.-B. (2012). A multimodal trip planning system with real-time traffic and transit information. Journal of Intelligent Transportation Systems, 16(2), 60–69. https://doi.org/10.1080/15472450.2012.671708
Li, Q., Wang, Z., Yang, J., & Wang, J. (2005). Pedestrian delay estimation at signalized intersections in developing cities. Transportation Research Part A: Policy and Practice, 39(1), 61–73.
Li, Y., Li, Z., & Li, L. (2014). Missing traffic data: comparison of imputation methods. IET Intelligent Transport Systems, 8(1), 51–57. https://doi.org/10.1049/iet-its.2013.0052
Mansourkhaki, A., Karimpour, A., & Sadoghi Yazdi, H. (2017a). Introducing prior knowledge for a hybrid accident prediction model. KSCE Journal of Civil Engineering, 21(5), 1912–1918. https://doi.org/10.1007/s12205-016-0495-4
Mansourkhaki, A., Karimpour, A., & Sadoghi Yazdi, H. (2017b). Non-stationary concept of accident prediction. Proceedings of the Institution of Civil Engineers-Transport, 170(3), 140–151. https://doi.org/10.1680/jtran.15.00053
Marisamynathan, S., & Vedagiri, P. (2013). Modeling pedestrian delay at signalized intersection crosswalks under mixed traffic condition. Procedia – Social and Behavioral Sciences, 104, 708–717. https://doi.org/10.1016/j.sbspro.2013.11.165
Murat, Y. S. (2006). Comparison of fuzzy logic and artificial neural networks approaches in vehicle delay modeling. Transportation Research Part C: Emerging Technologies, 14(5), 316–334.
Pearson, K. (1895). VII. Note on regression and inheritance in the case of two parents. Proceedings of the Royal Society of London, 58(347–352), 240–242.
Pretty, R. (1979). The delay to pedestrians and vehicles at signalized intersections. ITE Journal, 49(5), 20–23.
Qiao, F., Yi, P., Yang, H., & Devarakonda, S. (2002). Fuzzy logic based intersection delay estimation. Mathematical and Computer Modelling, 36(11–13), 1425–1434. https://doi.org/10.1016/S0895-7177(02)00298-4
Regehr, J. D., Maranchuk, K., Vanderwees, J., & Hernandez, S. (2020). Gaussian mixture model to characterize payload distributions for predominant truck configurations and body types. Journal of Transportation Engineering, Part B: Pavements, 146(2), 04020017. https://doi.org/10.1061/JPEODX.0000174
Society Series B (Methodological), 41(2), 276–278. https://doi.org/10.1111/j.2517-6161.1979.tb01084.x
Team, M. (2020). Miovision SmartView 360. http://miovision.com/wp-content/uploads/Miovision_SMARTVIEW360_Oct23_2017_RGB.pdf
Tukey, J. W. (1977). Exploratory Data Analysis. Addison-Wesley Publishing Company, Reading, Mass.
Vallyon, C., Turner, S., & Hodgson, S. (2011). Reducing pedestrian delay at traffic signals. NZ Transport Agency.
Virkler, M. R. (1998). Pedestrian compliance effects on signal delay. Transportation Research Record: Journal of the Transportation Research Board, 1636(1), 88–91. https://doi.org/10.3141/1636-14
Wang, X., Chen, Z., Guo, Q., Tarko, A., Lizarazo, C., & Wang, X. (2021). Transferability analysis of the freeway continuous speed model. Accident Analysis & Prevention, 151, 105944. https://doi.org/10.1016/j.aap.2020.105944
Wang, X., & Tian, Z. (2010). Pedestrian delay at signalized intersections with a two-stage crossing design. Transportation Research Record: Journal of the Transportation Research Board, 2173(1), 133–138. https://doi.org/10.3141/2173-16
Wei, D., Liu, H., & Tian, Z. (2015). Vehicle delay estimation at unsignalised pedestrian crosswalks with probabilistic yielding behaviour. Transportmetrica A: Transport Science, 11(2), 103–118. https://doi.org/10.1080/23249935.2014.928758
Wissler, C. (1905). The Spearman correlation formula. Science, 22(558), 309–311. https://doi.org/10.1126/science.22.558.309
Yang, Q., Wu, G., Boriboonsomsin, K., & Barth, M. (2018). A novel arterial travel time distribution estimation model and its application to energy/emissions estimation. Journal of Intelligent Transportation Systems, 22(4), 325–337. https://doi.org/10.1080/15472450.2017.1365606
Yang, S., & Cooke, P. (2018). How accurate is your travel time reliability? – Measuring accuracy using bootstrapping and lognormal mixture models. Journal of Intelligent Transportation Systems, 22(6), 463–477. https://doi.org/10.1080/15472450.2017.1421075
Yasmin, F., Morency, C., & Roorda, M. J. (2015). Assessment of spatial transferability of an activity-based model, TASHA. Transportation Research Part A: Policy and Practice, 78, 200–213.
Zang, Z., Xu, X., Yang, C., & Chen, A. (2018). A closed-form estimation of the travel time percentile function for
Schwertman, N. C., & de Silva, R. (2007). Identifying out-
liers with sequential fences. Computational Statistics & characterizing travel time reliability. Transportation
Data Analysis, 51(8), 3800–3810. https://doi.org/10.1016/j. Research Part B: methodological, 118, 228–247. https://
csda.2006.01.019 doi.org/10.1016/j.trb.2018.10.012
Sheela, P. V., & Mannering, F. (2020). The effect of infor- Zegeer, C. V. (2002). Pedestrian facilities users guide:
mation on changing opinions toward autonomous vehicle Providing safety and mobility. Diane Publishing.
adoption: An exploratory analysis. International Journal Zheng, J., Ma, X., Wu, Y.-J., & Wang, Y. (2013). Measuring sig-
of Sustainable Transportation, 14(6), 475–487. https://doi. nalized intersection performance in real-time with traffic sen-
org/10.1080/15568318.2019.1573389 sors. Journal of Intelligent Transportation Systems, 17(4),
Smaglik, E. (2018). Guidance on signal control strategies for 304–316. https://doi.org/10.1080/15472450.2013.771105
pedestrians to improve walkability. Institute of Zheng, Y., & Elefteriadou, L. (2017). A model of pedestrian
Transportation Engineers. ITE Journal, 88(5), 35–39. delay at unsignalized intersections in urban networks.
Stone, M. (1979). Comments on model selection criteria of Transportation Research Part B: methodological, 100,
Akaike and Schwarz. Journal of the Royal Statistical 138–155. https://doi.org/10.1016/j.trb.2017.01.018
