
Getis–Ord Spatial Statistics to Identify Hot Spots by Using Incident Management Data

Article  in  Transportation Research Record Journal of the Transportation Research Board · December 2010
DOI: 10.3141/2165-05

Praprut Songchitruksa and Xiaosi Zeng

Traditionally, data have been collected to measure and improve the performance of incident management (IM). While these data are less detailed than crash records, they are timelier and contain useful attributes typically not reported in the crash database. This paper proposes the use of Getis–Ord (Gi*) spatial statistics to identify hot spots on freeways from an IM database while selected impact attributes are incorporated into the analysis. The Gi* spatial statistics jointly evaluate the spatial dependency effect of the frequency and attribute values within the framework of the conceptualized spatial relationship. The application of the method was demonstrated through a case study by using the incident database from the Houston, Texas, Transportation Management Center (TranStar). The method successfully identified the clusters of high-impact accidents from more than 30,000 accident records from 2006 to 2008. The accident duration was used as a proxy measure of its impact. The proposed method could be modified, however, to identify the locations with high-valued impacts by using any other attributes, provided that they were either continuous or categorical in nature and could provide meaningful implications. With improved intelligent transportation system infrastructure and communication technology, hot spot analyses performed with IM data of freeway network and arterials in the vicinity have become a much more promising alternative. Freeway management agencies can use the results of hot spot analysis to provide visualized information to aid the decision-making process in the design, evaluation, and management of IM strategies and resources. The limitations of the method and possible future research are discussed in the closing section of the paper.

Over the past decade, more than 37,000 fatal crashes occurred every year in the United States, which resulted in at least 1.3 highway fatalities per 100 million vehicle miles traveled (1). A comprehensive evaluation of every single entity to search for safety deficiencies is impractical and economically indefensible. This motivates safety researchers to find an alternative approach to identify unsafe locations and properly set their priorities.

Studies have demonstrated that crash history can be used to construct the spatial distribution of crashes, which can then be used to identify the locations with a tendency to exhibit an unusually high degree of collision hazard (2–5). Various hot spot identification (HSID) methods (also known as methods to identify black spots, sites with promise, accident-prone areas, and hazardous locations) have been employed to distinguish potentially problematic sites from those sites that simply happened to have experienced a higher than normal number of accidents during the course of observation. HSID can be viewed as a sieve to minimize the number of locations that necessitate further safety inspections, and help efficiently allocate safety investment resources to initiate proper remedial actions. In other words, priorities can be set on the basis of results obtained from the use of this technique. HSID is a legislative requirement that reflects the Intermodal Surface Transportation Efficiency Act of 1991. The act dictated that each state develop strategies to identify safety deficiencies within jurisdictional regions (2), as did amendments to the act in 1998 and 2005 (SAFETEA-LU).

Studies of HSID techniques have resulted in a wide range of methods to identify hot spots in a more efficient and mathematically sound manner. One commonality across these methods is that they rely almost exclusively on the crash database. Such reliance, however, has undesirable shortcomings, primarily because crash reports lack timeliness. Alternatively, incident management (IM) data collected at major transportation management centers (TMCs) offer a more timely data source for hot spot analysis on the freeway network. These incident data are routinely logged and archived on a near real-time basis, given that surveillance camera coverage is available. From all indications, no studies have yet been conducted to identify unusual spatial patterns for the purpose of HSID through the use of an IM database.

In the study reported on here, the potential of HSID application was examined by using the data attributes typically archived in the IM database. The analyst no longer needs to wait several months to a year for crash data to become available for analysis. The incident data attributes that could be useful for HSID were also examined. A technique is proposed to explore and identify the spatial heterogeneity of these incident data attributes. The proposed methodology is demonstrated through a case study to identify high-risk locations on a freeway network with the use of the Houston, Texas, TranStar IM database.
HOT SPOT IDENTIFICATION

Attention to HSID in highway safety started as early as the 1950s, when Norden et al. used statistical quality control techniques to analyze highway accident data (3). Over the past half century, safety researchers have continued to mine crash databases to estimate the safety performance of various types of transportation entities. Correspondingly, a number of issues related to safety estimation have been recognized and addressed in the HSID procedures.

TransLink Research Center, Texas Transportation Institute, Texas A&M University, 3135 TAMU, College Station, TX 77843-3135. Corresponding author: P. Songchitruksa, praprut@tamu.edu.

Transportation Research Record: Journal of the Transportation Research Board, No. 2165, Transportation Research Board of the National Academies, Washington, D.C., 2010, pp. 42–51. DOI: 10.3141/2165-05


One issue is how hot spots are defined. The literature suggests two paradigms: Hot spots may be referred to as (a) fixed entities or as (b) clusters of individual events. In the first paradigm, hot spot locations are predetermined by fixed transportation entities (e.g., intersections, roadway segments) and analysis of the aggregated crash data associated with those locations determines if they are hot spots. The latter paradigm evaluates each crash individually and defines the hot spots as those where unusual clusters of crash patterns are located. This subtle difference has an important implication on how HSID methods have been developed.

Overview of HSID Methods

Most HSID studies have focused on “fixed entities” and aggregated the crash data for analyses. Several studies proposed that sites with significantly higher crash frequencies than expected be identified as hot spots (3–5). Hakkert and Mahalel and later McGuigan suggested that sites with promise be designated as sites with potential accident reduction (PAR), which is the difference between the observed crash counts and the expected number of crashes (4–5). Both HSID methods rank crash sites with respect to either site crash frequencies (or crash rates, or both) or PAR. These definitions imply that HSID can be based entirely on accident counts at the locations (6). An alternative approach that relies on spatial analysis considers hot spot regions as a set of contiguous spatial units characterized by a high number of accidents (7).

One simple HSID method is to rank accident sites with respect to site accident frequencies, accident rates, or both. The method is straightforward but cannot address the regression-to-mean (RTM) effect, and thus increases the likelihood of false positives in HSID (6–8). One commonly accepted approach to address the RTM effect is the empirical Bayes (EB) safety estimation method (6–7). The EB method relies on the knowledge of two clues, which are the known history of crashes at the site of concern and the expectation about the safety of its “reference population” (6, 8–10). The EB method is mathematically sound but can be complicated by the difficulty to establish proper reference entities (11). Building on the EB method, the full Bayes (FB) method is rapidly being developed and is considered a generalized version of the EB method. The FB method can address the imperfections in the EB method, such as the double-use of crash data and inadequacy of accounting for the “uncertainty” of associations of covariate and safety (12).

The growing recognition of spatial information associated with the transportation data has introduced another critical element into the HSID analysis. This leads to an alternative spatial analysis approach, which identifies hot spot regions as a set of “fixed” spatial units characterized by a high number of crashes (7). One important variable in the HSID with spatial information is how to define the geographical boundaries (e.g., lengths/radii and centers of spatial units) within which crash data can be aggregated. The definitions of boundaries can vary, ranging from the resolutions of point in space (i.e., intersection), road segment, corridor, ward, zip code to city, to the resolution of county (13–16). Safety researchers recognized the need to define proper spatial resolutions, as varied analytical results might result otherwise (10, 16–18). Varied aggregation schemes may affect, to a certain extent, the spatial arrangement of crashes, and thus may be inappropriate or even misleading when used to explain the data that are spatially continuous in nature (19). When spatial information of individual crash events is available (e.g., geographic location, milepost), spatial units are no longer required to be referenced to any “fixed entities.” Geocoding of these crash events to their locations will preserve the spatial distribution of crash data and thus enable the use of point pattern analysis for safety evaluation.

Point Pattern Analysis

Point pattern analysis determines whether an observed distribution of point events results from a random pattern, or whether it follows some systematic process so as to form a clustered or regular pattern (20). This is similar to the concept defined for HSID in the context of vehicle crashes. A number of methods have been developed for point pattern analysis, such as nearest-neighbor distances, kernel density estimation, and K-function (21). These methods focus on the distributional phenomena of the point events; for example, population analysis (22) and the distribution pattern of cancer patients (20). Studies in the context of crash analysis include the comparison of the distribution of crash event locations with population concentrations through use of the nearest-neighbor index (14) and of K-function analysis in the constraint network environment (21).

These point pattern analyses account for spatial information but still treat all sites equally regardless of their characteristics; in other words, each point is equally weighted. The spatial autocorrelation method is an advanced point-pattern-detection method, which incorporates not only the locations of point events but also their associated values if appropriately defined (23). Moran’s I, a popular measure of spatial autocorrelation (24), and Geary’s C have been employed to evaluate the existence of clusters in the spatial arrangement of motor vehicle crashes. In effect, these indices indicate whether the apparent similarity (or dissimilarity) in values (e.g., the number of crashes) at a given site and its neighbors is greater than expected in a random distribution. While the Moran’s I either confirms the site of interest as part of its surrounding sites (in cluster) or distinguishes the site from the cluster, it cannot discriminate between patterns that are high-value dominant or low-value dominant (19, 24, 25). This problem can be considered in the context of the spatial distribution of total crash costs. The Moran’s I index value can distinguish the clusters of crashes with extremely low and extremely high total costs from the rest. But it cannot determine which specific clusters are the high-value ones. This inability to discriminate between hot spots and cold spots has led to consideration of a relatively new spatial autocorrelation method known as Getis–Ord (Gi*) spatial statistics.

The Gi* spatial statistics method was introduced by Getis and Ord. It can identify a tendency for positive spatial clustering and can distinguish between the locations of high and low spatial associations (24, 25). The study reported here describes the use of Gi* spatial statistics to identify hot spots from an IM database, where selected incident attribute values were incorporated as proxies for incident impacts. The proposed statistics were able to simultaneously capture the frequency of the events, the associated values, and spatial correlation. The Gi* spatial statistics are described in the subsequent sections. The application of statistics was demonstrated through a case study that made use of Houston’s TranStar IM data.

IM Data Versus Crash Reports

HSID studies to date have relied almost exclusively on a crash database, largely because of the lack of valid alternative data sources. Crash data, nonetheless, are not timely and, in Texas, can take from 6 months to a year to become available. Crash data from police reports are not always geographically coded. An examination of the Texas Department of Transportation (DOT) Crash Records Information System (CRIS) database from 2006 to 2008 indicated that 15% of all records did not have coordinate information. The accident durations were not part of the CRIS database, and no documented study has assessed the validity of coordinate data recorded in the CRIS database.

An alternative data source for HSID on freeways is the IM database. In Texas, all of the TMCs in metropolitan areas (i.e., Houston, Austin, Dallas, Fort Worth, and San Antonio) archive incident data on a regular basis (26). These data are used primarily to monitor freeway and IM performance (27). In addition to basic data attributes, such as incident type, incident detection, and clearance times, the IM database contains descriptive location information and geographic coordinates, which are critical to the spatial analyses. Table 1 shows an example of selected incident attributes collected at Houston’s TranStar. In the IM database, accidents are one type of incident. Incidents are defined as those events that may cause disruption to the traffic flow or may require service attention (e.g., broken-down vehicles in the shoulder lane). No distinction is made between the terms “crash” and “accident” in this paper.

TABLE 1 Example of Incident Management Database (Houston, Texas)

ID      Roadway Name                 Cross Street Name                Direction    Latitude   Longitude   Severity
683XX   IH-610 North Loop            US-59, Eastex                    Westbound    29.8082    −95.3361    Major
683XX   IH-610 North Loop            Shepherd Dr.                     Westbound    29.8128    −95.4103    Major
683XX   South Sam Houston Tollway    Blackhawk                        Eastbound    29.6007    −95.2474    Major
683XX   IH-10 Katy                   Baker Cypress Rd.                Eastbound    29.7845    −95.6883    Major
683XX   IH-610 East Loop             SH-225                           Southbound   29.7099    −95.267     Major
683XX   West Sam Houston Tollway     IH-10 Katy                       Northbound   29.7844    −95.5692    Minor
683XX   IH-45                        FM-1764/Johnny Palmer Highway    Southbound   29.40838   −95.0369    Major
683XX   SH-288                       OREM                             Northbound   29.6276    −95.3872    Major
:       :                            :                                :            :          :           :

TABLE 1 (continued)

Type         Weather     Response     No. of Vehicles Involved   No. of Mainlanes Blocked   Detected Time    Cleared Time     ...
Accident     Rain        City         1                          2                          1/1/2008 0:47    1/1/2008 1:27    ...
Stall        Ice         Coroner      2                          1                          1/1/2008 2:40    1/1/2008 3:08    ...
High water   Hail        County       2                          1                          1/1/2008 3:11    1/1/2008 5:01    ...
Lost load    Fog         EMS          2                          0                          1/1/2008 3:14    1/1/2008 4:33    ...
Fire         High wind   Fire dept.   1                          2                          1/1/2008 4:21    1/1/2008 4:26    ...
Hazmat       Dust        Metro        1                          0                          1/1/2008 4:34    1/1/2008 4:38    ...
Accident     Smoke       Map          1                          0                          1/1/2008 4:48    1/1/2008 5:13    ...
Stall        Other       Police       2                          1                          1/1/2008 5:23    1/1/2008 6:03    ...
:            :           :            :                          :                          :                :

NOTE: The table shows selected incident attributes that are consistently recorded and potentially useful for the HSID.

TranStar detects most of Houston’s incidents on the basis of 24/7 closed-circuit television scanning. The remainder is detected through police dispatch monitoring, motorist assistance program calls, and commercial traffic services (26). Operators generally log all incidents as soon as they detect them from the scanning camera, regardless of cost or damages. Police reports are available if police are dispatched to the scene; such reports are required in Texas only when crash costs exceed $1,000 in damages. As redundant channels of information feed the incident database, the likelihood is strong that IM data are more thorough than crash reports, particularly for crashes cleared rapidly and for those whose costs were below the mandatory report threshold.

The use of IM data for HSID has certain shortcomings. Since the data are primarily intended to measure IM performance, some attributes, such as specific crash types or crash contributing factors, are not usually recorded. Overreporting of incidents may also be an issue, if control operators duplicate entries or make errors that need to be corrected. Fortunately, operational errors are discernible and can be filtered out with an appropriate algorithm. By contrast, the timely HSID on freeways is easier to achieve through the IM database, given the time lag in the availability of crash reports and the dynamics of traffic conditions that ebb and flow with changing demand and supply conditions (e.g., capacity restriction from work zones).

SPATIAL STATISTICS

Instead of the application of nonspatial statistical models to the data with spatial information, spatial statistics integrate spatial relationships directly into their mathematics (19).

Spatial Autocorrelation

Tobler’s first law of geography stated that spatial units share more similarities with nearby units than with units that are far apart (28). This notion is known as spatial dependency. To study spatial dependency, two types of analyses are done: (a) analysis of the spatial structure of the locations (i.e., geographical positions) among a set of spatial units, and (b) analysis of the spatial structure of the values associated with a set of spatial units (29). Spatial autocorrelation can be classified as the second type, since it investigates the covariations for the properties of observations within a two-dimensional geosurface. Spatial autocorrelation analysis is particularly useful to address the problems associated with HSID because every spatial unit bears some quantifiable attributes (e.g., crash severity, crash cost, impact duration) and not just location information.

Defining Spatial Units

To study the spatial structure of a subject requires an understanding of the spatial arrangement of its smallest constituents, the spatial units. The definition of spatial units may vary, depending on the objectives of a given analysis. Consider the spatial dependency among contiguous roadway elements $l_i$ (e.g., intersection, road segment): Incidents that occurred in proximity have to be aggregated to form an integral representation of $l_i$. On the contrary, when the emphasis is on the spatial distribution of individual incidents $x_i$, the incidents should remain disaggregated. In the former circumstance, the spatial units are defined as roadway segments (Type 1 definition), and the spatial statistics consequently return collective values that characterize the distribution of location variable L. In the latter case, individual incidents correlate with each other rather than the roadway segments and thus constitute the spatial distribution (Type 2 definition). The spatial statistics method then calculates the significance of spatial dependency of incident variable X.

The Type 2 definition of spatial units leaves the spatial distribution of incidents intact, does not require any aggregation scheme, and treats each individual incident as a unit of analysis where it is located. The Type 2 definition was used in this study.

Gi* Spatial Statistic

A family of G statistics, originally developed by Getis and Ord (24–25), is used to study the evidence of identifiable spatial patterns. Similar to Moran’s I and Geary’s C, the general G statistic is global in that the overall degree of spatial interdependency is studied, which results in a single index for the entire study area (19). These global statistics are usually too general in a way that local patterns are likely to be neutralized over a vast area and become undetected. The fact that the level of spatial dependency may vary significantly across the space suggests that the capacity to detect and pinpoint spatial heterogeneity is more desirable. Local Moran’s I was then developed by decomposing Global Moran’s I to compensate for such limitations and frequently used in many hot spot analyses of motor vehicle safety (7, 15, 23). The family of Moran indices, however, does not discriminate between hot spots and cold spots. The Gi* index is therefore more suitable because it can locate unsafe regions on a global scale and discern cluster structures of high- or low-value concentration among local observations.

Consider the study area subdivided into n regions of indefinite size, where each region is identified with a central point i (1, 2, . . . , n), and incidents are labeled with exact Cartesian coordinates. The goal was to examine the existence of a spatial pattern for a random variable X, a selected incident attribute, whose values $x_i$ were associated with every region individually; in turn, if $x_i$ exhibited similarities between contiguous regions, it could be claimed that the spatial autocorrelation of variable X existed over region i. Spatial dependency was assumed in the process. A simple form of the Gi* statistic as defined by Getis and Ord (25) is

$$G_i^* = \frac{\sum_{j=1}^{n} w_{ij} x_j}{\sum_{j=1}^{n} x_j} \tag{1}$$

where

$G_i^*$ = statistic that describes the spatial dependency of incident i over all n events,
$x_j$ = magnitude of variable X at incident location j over all n (j may equal i), and
$w_{ij}$ = weight value between event i and j that represents their spatial interrelationship.

Usually, $w_{ij}$ is calculated on the basis of the conceptualized spatial relationship and in reference to d. Hence it is often written as $w_{ij}(d)$. The value of d (e.g., Cartesian distance) is a user-specified threshold. However, the argument d may be omitted depending on the specification of spatial relationship. Gi* statistics may vary according to the selection of d.

For Gi* in its simplest form, $w_{ij}$ is binary, where 1 includes and 0 excludes the relationship between incidents i and j completely. Since $\sum_{j=1}^{n} x_j$ remains constant for all i, to include an increased number of high-valued incident events j would result in a higher Gi* index. This implies that high frequencies and high values are both contributing factors. The $w_{ij}$ can be extended to nonbinary values as they were in this study.

The sum of the weights ($W_i$) is defined as

$$W_i = \sum_{j=1}^{n} w_{ij} \tag{2}$$

It can be shown that the expectation (E) of Gi* is

$$E(G_i^*) = \frac{W_i}{n} \tag{3}$$

where n is the number of incidents, and the variance of Gi* is

$$\operatorname{Var}(G_i^*) = \frac{W_i (n - W_i)}{n^2 (n - 1)} \left( \frac{s}{\bar{x}} \right)^2 \tag{4}$$
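As a rough numerical check of Equations 1 through 4, the following minimal sketch computes the raw Gi* value, its expectation, and its variance for each event with binary weights. It is an illustration under stated assumptions, not the authors' implementation: the function name `gi_star_raw`, the planar (x, y) coordinates, and the toy data are hypothetical, and the variance is assumed to take the standard Getis–Ord form with $n^2(n-1)$ in the denominator and $s^2 = \sum x_j^2 / n - \bar{x}^2$.

```python
import math

def gi_star_raw(points, values, d):
    """Return lists (G, E, Var), one entry per event, with binary weights w_ij(d)."""
    n = len(values)
    total = sum(values)                              # sum_j x_j, constant for all i
    xbar = total / n
    s2 = sum(v * v for v in values) / n - xbar ** 2  # s^2 with divisor n (assumed form)
    G, E, Var = [], [], []
    for xi, yi in points:
        # w_ij = 1 when event j lies within d of event i (j = i included), else 0
        w = [1.0 if math.hypot(xi - xj, yi - yj) <= d else 0.0 for xj, yj in points]
        W = sum(w)                                                      # Equation 2
        G.append(sum(wj * v for wj, v in zip(w, values)) / total)       # Equation 1
        E.append(W / n)                                                 # Equation 3
        Var.append(W * (n - W) / (n ** 2 * (n - 1)) * s2 / xbar ** 2)   # Equation 4
    return G, E, Var

# Hypothetical toy data: two nearby high-valued events and two nearby low-valued events.
points = [(0, 0), (1, 0), (10, 0), (11, 0)]
values = [4.0, 4.0, 1.0, 1.0]
G, E, Var = gi_star_raw(points, values, d=2.0)
```

In this toy case each event has two neighbors within d (itself included), so the expectation is 0.5 everywhere, while the raw statistic is 0.8 for the high-valued pair and 0.2 for the low-valued pair. This is the behavior the text describes: both the count and the magnitude of neighboring values move the index.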

The sample mean (x–) and the sample variance (s2) of variable X are mathematical models conceptualize spatial relationship, such as
defined as impedance, sphere of influence, zone of indifference, and K nearest
neighbors (19, 30). The zone of indifference method is more appro-
∑ ∑
n n
xj x 2j priate for HSID applications as it assigns high weights of influence
j =1 j =1
x= ;s 2
= − x2 (5) to those incidents within a specified zone (e.g., intersection radius,
n n freeway segment) and much less weight to those outside the zone.

The distribution of the Gi* statistic is normal when the normality is


also observed in the underlying distribution of variable X. However, CASE STUDY OF HOUSTON’S IM DATA
when the underlying distribution is nonnormal (e.g., heavily skewed
incident duration), the test statistic becomes nonnormal correspond- The Houston TranStar is a partnership of four public agencies—
ingly. In such cases, an increase in the number of spatial units in Texas DOT, Harris County, the Metropolitan Transit Authority of
the clusters analyzed will help the distribution of the Gi* statistic Harris County, and the City of Houston. TranStar operates 24 hours
approach normality. One common method is to raise the value of d to a day 7 days a week. The center has collected and archived freeway
include more xj. Under the exact or asymptotical normal conditions, incident data since 1996. The TranStar’s incident data from 2006 to
Gi*is usually standardized based on its sample mean and variance: 2008 were used in this study. Only accident type was considered in
the analysis. After duplicate and erroneous entries were removed
∑ wij x j − x ∑ j =1 wij2
n n
from the database, 33,192 accident records remained for analysis.
Z ( Gi*) = j =1
(6)
( )
2
n∑ j =1 wij2 − ∑ j =1 wij
n n

s Data Requirement
n −1
The first step was to check if the incident data attributes were sufficient
The standardized Gi* is essentially a Z score and therefore can be
for analysis. As noted, characterization of incident data entails evalu-
attached to the statistical significance. A close-to-zero Gi* value
ation of temporal and spatial patterns in the distribution of incidents.
implies random distribution of the observed spatial events. Con-
Two types of data attributes are generally required for HSID:
versely, positive and negative Gi* statistics with high absolute values
correspond to the clusters of high- and low-valued events, respec-
• Temporal attributes. Typically collected as time logs for vari-
tively. The negative Gi*, however, indicates a tendency of clusters of
ous events in an incident timeline. The most critical temporal element
events with short incident durations. In summary, if the calculated
is the incident occurrence time. The incident detection or notification
index values are greater than a threshold associated with statistical
time is often used to signify the incident starting time since the actual
significance, the location of a cluster is identified as a hot spot. There-
occurrence time can be difficult to obtain.
fore, any roadway entities that are nearby or that encompass such a
• Spatial attributes. Used to identify incident locations on a free-
cluster are identified as hot spots.
way. For TranStar, each incident record had descriptive location
information referenced by a freeway intersection and the nearest
Bonferroni Correction cross street, a geographic coordinate for the intersection (i.e., latitude
and longitude), and a location identifier (before, at, and after) for the
The calculation of a Gi* index is equivalent to a null hypothesis test actual location of the incident with respect to the nearest cross street.
where a rejection indicates a hot spot. The Type 2 spatial definition Location identifiers were not used in this study as they were not
examines all n events, and each event requires one independent test. always consistently recorded. As a result, the spatial distribution of
As n becomes large, the number of tests increases significantly. Even incidents may be slightly displaced from the actual occurrences.
with a 99% confidence level, one out of 100 events could be falsely
classified as a hot spot or vice versa. Furthermore, the increase in n A sufficiently large displacement may cause false positive or
can make the study area so dense that the tests for nearby xi may false negative clustering in the hot spot results. This study defined
include common neighbors. This violates the independence of the test the spatial relationship by using the zone of indifference. Therefore,
(19). Ord and Getis have suggested that the Bonferroni correction be the impacts from the displacement were minimal as long as the dis-
applied when n is large and the study area is dense (24). placement was not greater than the defined radius of the zone of
The Bonferroni test minimizes the likelihood of misidentification indifference.
of spatial dependency. The simplest form is to divide the confidence
level, α, by the number of tests. However, such a practice becomes
unduly conservative when n becomes too large. The coefficient of Selection of Impact Attributes
mean correlation can be factored into the Bonferroni adjustment
to allow correction of the α level to avoid it becoming either too For the purpose of HSID, the focus was on areas where high-impact
conservative or too aggressive. accidents were clustered. Figure 1a and 1b illustrate the correla-
tions between incident durations and the number of vehicles involved
and the number of main lanes blocked, respectively. These relation-
Conceptualization of Spatial Relationship ships imply that the duration can serve as a proxy of incident impact.
Therefore, duration was selected as an impact attribute in this case
Conceptualization of spatial relationship summarizes how spatial units study.
interact with one another. For instance, spatial relationship can be Figure 2 exhibits the distributions of accident durations and their
defined as the closer the more influential or, even counterintuitively, natural-logarithm-transformed counterparts. A log-transformed dura-
the closer the less important. Several commonly used homogeneous tion accounted for the scaling effect of duration data. To illustrate,
Songchitruksa and Zeng 47

55 180
50 Incident (All Types) 160 Incidents (All Types)
Average Duration (min)

Average Duration (min)


Accident Accidents
45 140 Stall
Stall
120
40 * Data Period: 2006 to 2008
100 * Stall Type Generally Blocks No More
35 Than 3 Main Lanes
80
30
60
25 * Data Period: 2006 to 2008
* Only Consider Incidents Causing 40
20 Main Lane Blocking
20
15 0
1 2 3 4 5 ≥6 1 2 3 4 5 and more All
Number of Vehicles Involved Number of Main Lanes Blocked
(a) (b)

FIGURE 1 Median incident duration: (a) number of vehicles involved and (b) number of main lanes blocked.

consider a 5-min increment for 10-min accidents versus 100-min accidents. Without using the transformation, both increments would be weighted equally in the analytical process when in fact the same increment at a higher duration should have much less influence in the analysis. Second, extreme input values may bias the calculation of the Gi* statistic, and log-transformation may reduce the impact of this issue. Strict normality of the underlying distribution of the attribute was desirable but not required to calculate a proper standardized Gi* statistic. The logarithm transformed the nonnormal duration data into a bell-shaped form, which was closer to the normality assumption. With asymptotic normality, analysts can statistically control the number of hot spots to be identified based on a specified confidence level. The spatial distribution of the data set was maintained in the transformation of the data attribute (24–25). Hence, the duration data were log-transformed for the analysis in this study.

Calculation of Spatial Statistics

The coefficients of local spatial autocorrelation, Gi*, were calculated for each individual accident by using the natural logarithm of accident durations. In this study, a total of 33,192 calculations for 3 years of data were performed by using the ArcGIS software package developed by the Environmental Systems Research Institute. The radius threshold of 2,500 ft was selected to ensure that each accident had at least six neighbors, as suggested by Getis and Aldstadt (30). This conceptualization gave equal weight to all accidents within this critical distance. The weight dropped off significantly for those accidents beyond this threshold on the basis of the inverse of their relative distances.

For the purpose of HSID, a high positive standardized Gi* statistic implied that the accidents were located within the clusters of those with high accident durations at a high level of statistical confidence.

Results

For comparison purposes, a map of hot spots was prepared by using frequency ranking and median-duration-based methods. The frequency ranking method combined all accidents by location within the study period. Frequency may be normalized by appropriate exposure

[Figure 2 plot: overlaid frequency distributions of duration (0 to 200 min) and log of duration (-4 to 6), with separate frequency axes (0 to 6,000 and 0 to 20,000).]

FIGURE 2 Empirical distributions of actual and logarithmic-transformed accident durations.
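The scaling argument for the log transformation can be checked with a few lines of arithmetic. The sketch below is only an illustration of the 5-min example in the text; the function name `log_increment` is hypothetical and is not part of the study's workflow.

```python
import math

def log_increment(duration_min, increment_min=5.0):
    """Change in ln(duration) when a duration grows by increment_min minutes."""
    return math.log(duration_min + increment_min) - math.log(duration_min)

# The same 5-min increment applied to a short and a long accident.
short_change = log_increment(10.0)   # 10 min -> 15 min
long_change = log_increment(100.0)   # 100 min -> 105 min

print(f"10 -> 15 min:   change in ln(duration) = {short_change:.3f}")
print(f"100 -> 105 min: change in ln(duration) = {long_change:.3f}")
```

On the log scale the increment applied to the 10-min accident is several times larger than the same increment applied to the 100-min accident, which is the behavior the transformation is meant to provide.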



(e.g., duration of data period, traffic volume, length of segment). The results were then ranked and the hot spots identified as those that exceeded the specified threshold. Although the method has theoretical flaws (i.e., RTM effect), it has been frequently used because of its simplicity. Moran’s I index is a commonly used variation of frequency-based spatial statistics. The median-duration-based method considers the duration as an impact measure and treats those locations with high median duration as hot spots.

Figure 3 displays a geographic information system map with the hot spots identified by simple frequency ranking, median duration ranking, and spatial statistic (Gi*) methods. The top 20 locations from the first two methods were defined as hot spots. The hot spot threshold for Gi* statistics was defined based on a two-tailed Z-score at a Bonferroni corrected confidence level of 95%, which was equivalent to 3.06 (a mean correlation of 0.70). This threshold resulted in 2,602 accidents distributed densely among 25 clusters of high-duration accidents. The roadway entities that encompassed these clusters as hot spot locations were then defined. In this study, these clusters were referenced to the nearest cross streets on the freeway. By adjusting the significance level, the analysis could produce either a higher or a lower number of high-duration accidents and correspondingly more or fewer clusters of hot spots.

Discussion of Results

Frequency-based hot spots were located along IH-45 between Beltway 8 North and IH-610 North Loop as well as at the junction between US-59 and IH-610 West Loop. This was to be expected because these locations generally experience heavy traffic volume


FIGURE 3 Hot spot results.
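To make the Gi* calculation concrete, the following is a minimal sketch rather than the ArcGIS procedure used in the study. It applies the commonly cited standardized form of the Gi* statistic to log-transformed durations and labels points with the 3.06 threshold reported in the text. The weight function (full weight within 2,500 ft, then `threshold / distance` beyond it) is an assumed form of the inverse-distance drop-off, and the coordinates and durations are synthetic values invented for illustration.

```python
import math

# Threshold reported in the paper for the Bonferroni-corrected 95% level.
HOT_SPOT_Z = 3.06

def weight(distance_ft, threshold_ft=2500.0):
    """Assumed weight: 1 inside the threshold, inverse-distance decay beyond it."""
    if distance_ft <= threshold_ft:
        return 1.0
    return threshold_ft / distance_ft  # hypothetical decay form, for illustration

def gi_star(points, values, threshold_ft=2500.0):
    """Standardized Gi* for every point; each point counts as its own neighbor."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(max(0.0, sum(v * v for v in values) / n - mean * mean))
    scores = []
    for xi, yi in points:
        w = [weight(math.hypot(xi - xj, yi - yj), threshold_ft) for xj, yj in points]
        sum_w = sum(w)
        sum_w2 = sum(wj * wj for wj in w)
        numerator = sum(wj * v for wj, v in zip(w, values)) - mean * sum_w
        spread = max(0.0, (n * sum_w2 - sum_w * sum_w) / (n - 1))
        denominator = std * math.sqrt(spread)
        # A zero denominator means no spatial contrast; report 0 in that case.
        scores.append(numerator / denominator if denominator > 0 else 0.0)
    return scores

# Synthetic data (not from the paper): coordinates in feet, durations in minutes.
hot_durations = [95, 110, 120, 130, 150, 180]
cold_durations = [8, 9, 10, 10, 11, 12, 12, 13, 14, 15, 15, 16, 18, 20]
points = [(150.0 * i, 0.0) for i in range(len(hot_durations))]
points += [(100000.0 + 150.0 * i, 0.0) for i in range(len(cold_durations))]
log_durations = [math.log(d) for d in hot_durations + cold_durations]

scores = gi_star(points, log_durations)
for (x, _), z in zip(points, scores):
    if z > HOT_SPOT_Z:
        label = "hot spot"
    elif z < -HOT_SPOT_Z:
        label = "cold spot"
    else:
        label = "not significant"
    print(f"x = {x:>8.0f} ft  Gi* = {z:6.2f}  {label}")
```

Every point is compared with every other point, so the cost of this sketch grows quadratically with the number of incidents; a production analysis would typically rely on a spatial index or a GIS package.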



(traffic exposure) and frequent congestion and thus contribute to the high frequency of accidents.

By contrast, duration-based hot spots were located primarily outside Beltway 8. Several possibilities may explain this finding. First, these locations might have limited surveillance coverage, which in turn might limit the ability of operators to closely monitor events visually. These locations also may be farther away from responder locations. As a result, it could take longer for operators to verify an incident, coordinate with appropriate responders, and monitor as to when the incident has been cleared, resulting in an increase in incident duration. Finally, these locations generally have higher prevailing traffic speed, therefore increasing the possibility of more severe crashes. Such crashes take longer to clear and thus lead to longer durations.

The Gi* spatial statistics method also accounts for frequency and duration attributes simultaneously, but in addition it takes into account the spatial information of each accident. The hot spots were indicated by those with Gi* statistics greater than 3.06 (Bonferroni-adjusted 95% confidence level). These locations exhibited an unusually high degree of clustering of accidents with long accident durations. The likelihood that the clustering of long-duration accidents occurred by chance was less than 5%. The map shows that Gi* hot spots partially coincided with those identified through frequency-based (25%) and duration-based methods (25%).

In addition to the Gi* statistics, simpler alternatives were evaluated by combining frequency and duration for HSID, which were the product of frequency and median duration (FM method) and of frequency and average duration (FA method). The results were sorted, and the top 20 locations from both methods were compared with the existing hot spot results. It was found that 75% and 80% of the top 20 sites respectively identified as hot spots from the FM and FA methods coincided with the hot spot results identified from the frequency-based method. Only 25% matched, however, when the same comparison was made between results from the frequency-based and the Gi* statistics methods. This implies that the FM and FA methods do not provide any advantage over the frequency-based method.

To illustrate how the Gi* spatial statistics work to identify hot spots, Table 2 shows a comparison of selected accident hot spots identified from three different approaches along with their corresponding frequencies and summary statistics of accident durations.

As the table shows, the frequency and duration were respectively dominant in the frequency-based and duration-based methods. In the frequency-based method, all of the top-ranked locations had relatively lower median and average durations. This indicates that accidents of short duration tend to occur frequently. These sites are likely to experience frequent minor accidents from heavy traffic exposure rather than severe accidents. Therefore, the Gi* value was relatively low for these sites. In some extreme cases, such as at the intersection of West Sam Houston Toll Way and South Sam Plaza, the frequency was as high as 188, and the median duration was as low as 10.11 min. Such a high-frequency–low-duration situation yielded a Gi* index of −9.98. Technically, these locations are “cold spots,” where low-impact accidents tend to cluster. In contrast, in the duration-based method, many sites had high-duration statistics

TABLE 2 Comparison of Hot Spot Results

                                                 No. of Accidents  Mean      Standard Deviation  Median    Average
                                                 Within a          Duration  of Duration         Duration  Gi*
Roadway            Cross Street                  2,500-ft Radius   (min)     (min)               (min)     Indices

High Frequency Hot Spots
US-59 southwest    IH-610 west loop              530               28.49     30.04               20.56     1.61
IH-45 north        IH-610 north loop             414               28.69     30.06               19.03     1.25
IH-610 west loop   US-59 southwest               372               29.66     24.60               22.68     1.61
IH-45, Gulf        Broadway St.–Park Place       291               28.96     35.51               18.05     −0.58
US-59 southwest    Chimney Rock Rd.              283               29.43     26.47               21.12     0.99
IH-45 north        Gulf Bank Rd.                 266               29.57     27.43               23.68     0.66

High Median Duration Hot Spots
IH-45              FM-3083–Teas Nursery          7                 117.99    109.42              120.72    2.40
BELTWAY 8-north    Fairbanks North Houston Rd.   6                 110.74    132.92              77.83     0.30
IH-45              Creighton Rd.                 11                67.45     61.73               70.17     2.12
SH-225             Bearle St.                    13                45.16     32.18               54.62     1.48
BELTWAY 8-south    SH-288                        31                47.89     46.32               44.02     2.54
US-59              Sweetwater Blvd.              23                53.14     51.88               40.01     1.79

High Gi* Index Hot Spots
IH-10 east         Jensen Dr.                    47                50.49     53.09               29.73     6.44
US-59, EASTEX      IH-10 east                    82                86.35     174.35              32.00     6.43
SH-288             Almeda–Genoa Rd.              95                46.82     35.41               42.20     4.82
IH-610 south loop  SH-288                        121               44.67     69.26               33.12     4.70
IH-45              SH-242                        64                47.48     45.28               34.12     3.71
IH-45 north        Hardy Toll Rd.                96                42.37     41.44               32.97     3.60

Average of all locations (2006–2008)             57.00             39.04     43.04               25.65     n/a

NOTE: The study defined accidents with Gi* larger than 3.06 as hot spots (95% Bonferroni-corrected confidence level).
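The FM and FA scores described in the text are simple products, so they can be reproduced directly from Table 2. The sketch below uses five rows copied from the table; the function names `fm_score` and `fa_score` are hypothetical labels for the two products, and the ranking covers only these sample rows, not the study's full top-20 comparison.

```python
# Rows copied from Table 2:
# (roadway, cross street, no. of accidents, mean duration, median duration).
sites = [
    ("US-59 southwest", "IH-610 west loop", 530, 28.49, 20.56),
    ("IH-45 north", "IH-610 north loop", 414, 28.69, 19.03),
    ("IH-45", "FM-3083–Teas Nursery", 7, 117.99, 120.72),
    ("IH-10 east", "Jensen Dr.", 47, 50.49, 29.73),
    ("US-59, EASTEX", "IH-10 east", 82, 86.35, 32.00),
]

def fm_score(count, median_min):
    """FM method: frequency multiplied by median duration."""
    return count * median_min

def fa_score(count, mean_min):
    """FA method: frequency multiplied by average duration."""
    return count * mean_min

# Rank the sample sites by FM score, highest first.
ranked = sorted(sites, key=lambda s: fm_score(s[2], s[4]), reverse=True)

for roadway, cross, count, mean_min, median_min in ranked:
    fm = fm_score(count, median_min)
    fa = fa_score(count, mean_min)
    print(f"{roadway} at {cross}: FM = {fm:.1f}, FA = {fa:.1f}")
```

Among these rows the two highest-frequency sites lead the FM ordering even though their Gi* values in Table 2 are below 3.06, which is consistent with the observation that the FM and FA methods largely reproduce the frequency-based ranking.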

but experienced only a handful of accidents. They were identified as hot spots with the duration-based method but not with the Gi* method.

The Gi* statistic, however, balances these two attributes. That is, both frequency and duration have to be high for a site to be a hot spot candidate. Of course, this is not the only condition. For the Gi* method to label hot spots, the cluster of accidents has to comprise high-duration events in a more statistically consistent fashion as compared with all other clusters. The appropriate level of statistical significance can also be specified and adjusted to balance the number of hot spots identified with available IM resources. In this manner, the results are more consistent and reliable.

SUMMARY

Traditionally, incident management data have been collected to measure and improve the performance of IM procedure. While these data are less detailed than crash records, they are timelier and frequently contain useful data elements for freeway safety evaluation typically unavailable in the crash database. This paper proposes the use of a spatial statistics method to identify hot spots or high-risk locations from an IM database and to incorporate selected incident attributes into the analysis. The selected attributes should be representative of the impacts of the units investigated. The units of analysis and impact attributes can be tailored to suit the study objectives (e.g., hot spots for truck-related accidents; accidents with slow response times).

Depending on the analysis objectives, the analyst may find different uses of the results from different methods. Frequency-based hot spots may be suitable if the goal is to develop strategies to improve accident detection time. In such cases, the greatest gain can be achieved through improved detection capabilities at these locations. If the objective is to reduce accidents with long durations, the analyst may shift the focus to duration-based hot spots, which may require further investigation into the crash contributing factors. The solution may be to increase surveillance camera coverage or to adjust the routes and the frequencies of courtesy patrols, or both. When a single objective no longer meets the analysis requirement, the proposed spatial autocorrelation method, which uses a local statistic (Gi*), is an effective and practical solution. This study has demonstrated its utility and how it can be applied to identify freeway locations with high-duration accidents using Houston’s IM data archive with emphasis on identification of the clusters of high-impact accidents.

The Gi* spatial statistic has an advantage over the more popular Moran’s I index because it can distinguish high-valued from low-valued local spatial structures; thus it is more suitable for HSID applications. Although the results of Gi* analysis consider the joint effects of high frequency and high duration in the detection of hot spots, extremely large accident durations can strongly influence the hot spot results even though the accident frequencies may be low at those locations. This undesirable consequence was mitigated in this study when the level of contiguity in the analysis was increased and thus more neighboring accidents (the sample size effect) were included.

The accident duration attribute used in this study served as a proxy of accident impacts. Application of the Gi* method is not necessarily limited to the duration, however. In fact, any attributes that are either continuous or categorical in nature may be candidates as long as the spatial clusters of high or low values can provide meaningful implications. For example, clusters of freeway accidents with long response times can be identified by using this technique. A total accident cost is another valid impact measure if it can be reliably quantified. The measure can be calculated such that it also includes the incident-induced traffic congestion cost.

Freeway agencies can use the hot spot results from IM data in various ways. Hot spots are very useful for providing visual information to aid the decision-making process in the design, evaluation, and management of IM strategies and resources. Some of the strategies that can be used to improve incident detection and response times include the improvement of the schedules of roving courtesy patrols, installment and adjustment of the rotation of surveillance cameras, and the planning of locations to improve traffic sensor coverage.

LIMITATIONS AND FUTURE RESEARCH

As with any HSID method, the proposed methodology has certain limitations. Because the local statistic for each incident is calculated in relation to all other incidents, the computational resource required can be demanding when the sample size gets large. The Gi* statistics method can tell if the accident of reference lies in the cluster of high/low values, but it cannot distinguish on a global scale which hot spot clusters have the higher values compared with other hot spot clusters. In other words, the method cannot prioritize hot spots within hot spots (31).

Incident duration may not be an ideal proxy of incident impacts, particularly when the traffic volume is light (e.g., at nighttime). Future research should attempt to identify appropriate proxy measures from the crash database and conduct comparative evaluations of the proposed method by using different data sources.

The alternative conceptualization of spatial relationship is worth exploring. Spatial weights were defined between locations i and j (wij) as a constant radius in this study. The level of contiguity may best be varied depending on the level of heterogeneity (e.g., connectivity, accessibility) observed in the types and geometries of the transportation infrastructure studied. For example, a system of basic coordinated intersections or a long stretch of homogeneous freeway segments should be treated as one unit of spatial cluster. Although several models of spatial conceptualization have commonly relied on a user-specified cut-off distance, application of such a homogeneous model to a heterogeneous traffic environment is an oversimplification. Furthermore, it has been pointed out that the planar spatial statistics have not recognized the constraints from complex highway networks (21). Construction of a network weight matrix is therefore recommended to address heterogeneous major roadway properties in future research.

ACKNOWLEDGMENT

This research was performed in cooperation with Texas DOT.

REFERENCES

1. National Statistics in Fatality Analysis Reporting System. http://www-fars.nhtsa.dot.gov/Main/index.aspx. Accessed June 5, 2009.
2. Depue, L. NCHRP Synthesis of Highway Practice 322: Safety Management Systems. Transportation Research Board of the National Academies, Washington, D.C., 2003.
3. Norden, M., J. Orlansky, and H. Jacobs. Application of Statistical Quality-Control Techniques to Analysis of Highway-Accident Data. Highway Research Board Bulletin 120, National Research Council, Washington, D.C., 1956, pp. 17–31.

4. Hakkert, A. S., and D. Mahalel. Estimating the Number of Accidents at Intersections from a Known Traffic Flow on the Approaches. Accident Analysis & Prevention, Vol. 10, No. 1, 1978, pp. 69–79.
5. McGuigan, D. R. D. The Use of Relationships between Road Accidents and Traffic Flow in “Black-Spot” Identification. Traffic Engineering and Control, 1981, pp. 448–453.
6. Hauer, E., D. W. Harwood, F. M. Council, and M. S. Griffith. Estimating Safety by the Empirical Bayes Method: A Tutorial. In Transportation Research Record: Journal of the Transportation Research Board, No. 1784, Transportation Research Board of the National Academies, Washington, D.C., 2002, pp. 126–131.
7. Flahaut, B., M. Mouchart, E. San Martin, and I. Thomas. The Local Spatial AutoCorrelation and the Kernel Method for Identifying Black Zones: A Comparative Approach. Accident Analysis & Prevention, Vol. 35, No. 6, 2003, pp. 991–1004.
8. Cheng, W., and S. P. Washington. Experimental Evaluation of Hotspot Identification Methods. Accident Analysis & Prevention, Vol. 37, No. 5, 2005, pp. 870–881.
9. Hauer, E. Observational Before-After Studies in Road Safety. Pergamon, 2002.
10. Washington, S., and W. Cheng. High Risk Crash Analysis. Report FHWA-AZ-05-558. FHWA, U.S. Department of Transportation, 2005.
11. Elvik, R. Some Difficulties in Defining Populations of “Entities” for Estimating the Expected Number of Accidents. Accident Analysis & Prevention, Vol. 20, No. 4, 1988, pp. 261–275.
12. Huang, H., H. C. Chin, and M. M. Haque. Empirical Evaluation of Alternative Approaches in Identifying Crash Hot Spots: Naive Ranking, Empirical Bayes, and Full Bayes Methods. In Transportation Research Record: Journal of the Transportation Research Board, No. 2013, Transportation Research Board of the National Academies, Washington, D.C., 2009, pp. 32–41.
13. Aguero-Valverde, J., and P. P. Jovanis. Spatial Analysis of Fatal and Injury Crashes in Pennsylvania. Accident Analysis & Prevention, Vol. 38, No. 3, 2006, pp. 618–625.
14. Levine, N., K. E. Kim, and L. H. Nitz. Spatial Analysis of Honolulu Motor Vehicle Crashes: I. Spatial Patterns. Accident Analysis & Prevention, Vol. 27, No. 5, 1995, pp. 663–674.
15. Quddus, M. A. Modelling Area-Wide Count Outcomes with Spatial Correlation and Heterogeneity: An Analysis of London Crash Data. Accident Analysis & Prevention, Vol. 40, No. 4, 2008, pp. 1486–1497.
16. Thomas, I. Spatial Data Aggregation: Exploratory Analysis of Road Accidents. Accident Analysis & Prevention, Vol. 28, No. 2, 1996, pp. 251–264.
17. Steenberghen, T., T. Dufays, I. Thomas, and B. Flahaut. Intra-Urban Location and Clustering of Road Accidents Using GIS: A Belgian Example. International Journal of Geographical Information Science, Vol. 18, No. 2, 2004, pp. 169–181.
18. Renshaw, D. L., and E. C. Carter. Identification of High-Hazard Locations in the Baltimore County Road-Rating Project. In Transportation Research Record 753, TRB, National Research Council, Washington, D.C., 1980, pp. 1–8.
19. Mitchell, A. ESRI Guide to GIS Analysis, Volume 2: Spatial Measurements & Statistics. ESRI Press, Redlands, California, 2005.
20. Bailey, T., and T. Gatrell. Interactive Spatial Data Analysis. Longman Scientific & Technical, Harlow, Essex, England, 1995.
21. Yamada, I., and J.-C. Thill. Comparison of Planar and Network K-Functions in Traffic Accident Analysis. Journal of Transport Geography, Vol. 12, No. 2, 2003, pp. 149–158.
22. Getis, A. Second-Order Analysis of Point Patterns: The Case of Chicago as a Multi-Center Urban Region. The Professional Geographer, Vol. 35, No. 1, 1983, pp. 73–80.
23. Mitra, S. Spatial Autocorrelation and Bayesian Spatial Statistical Method for Analyzing Fatal- and Injury-Crash-Prone Intersections. Presented at 88th Annual Meeting of the Transportation Research Board, 2009.
24. Ord, J. K., and A. Getis. Local Spatial Autocorrelation Statistics: Distributional Issues and an Application. Geographical Analysis, Vol. 27, No. 4, 1995, pp. 286–306.
25. Getis, A., and J. K. Ord. The Analysis of Spatial Association by Use of Distance Statistics. Geographical Analysis, Vol. 24, No. 3, 1992, pp. 189–206.
26. Songchitruksa, P., K. Balke, X. Zeng, C.-L. Chu, and Y. Zhang. A Guidebook for Effective Use of Incident Data at Texas Transportation Management Centers. FHWA/TX-09/0-5485-P2. Texas Transportation Institute, College Station, Feb. 2009.
27. Margiotta, R., T. Lomax, M. Hallenbeck, S. Turner, A. Skabardonis, C. Ferrell, and B. Eisele. Guide to Effective Freeway Performance Measurement: Final Report and Guidebook. NCHRP Web-Only Document 97. National Cooperative Highway Research Program, TRB, Washington, D.C., 2006. http://onlinepubs.trb.org/onlinepubs/nchrp/nchrp_w97.pdf.
28. Tobler, W. A Computer Movie Simulating Urban Growth in the Detroit Region. Economic Geography, Vol. 46, 1970, pp. 234–240.
29. Haggett, P., A. D. Cliff, and A. Frey. Locational Analysis in Human Geography. John Wiley & Sons, London, 1997.
30. Getis, A., and J. Aldstadt. Constructing the Spatial Weights Matrix Using a Local Statistic. Geographical Analysis, Vol. 36, No. 2, 2004, pp. 91–104.
31. Ord, J. K., and A. Getis. Testing for Local Spatial Autocorrelation in the Presence of Global Autocorrelation. Journal of Regional Science, Vol. 41, No. 3, 2001, pp. 411–432.

The contents of this paper reflect the views of the authors and do not necessarily reflect the official views or policies of TxDOT.

The Statistical Methods Committee peer-reviewed this paper.
