
Load-Capacity Rating of Bridge Populations through Machine Learning: Application of Decision Trees and Random Forests


Downloaded from ascelibrary.org by The State University of New York at Buffalo (SUNY - Buffalo) on 06/09/24. Copyright ASCE. For personal use only; all rights reserved.

Mohamad Alipour, S.M.ASCE1; Devin K. Harris, Ph.D., A.M.ASCE2; Laura E. Barnes, Ph.D.3;
Osman E. Ozbulut, Ph.D., A.M.ASCE4; and Julia Carroll5

Abstract: The functionality of the U.S. transportation infrastructure system is dependent upon the health of an aging network of over 600,000 bridges, and agencies responsible for maintaining these bridges rely on the process of load rating to assess the adequacy of individual structures. This paper presents a new approach for safety screening and load-capacity evaluation of large bridge populations that seeks to uncover heretofore unseen patterns within the National Bridge Inventory database and establish relationships between select bridge attributes and their load-capacity status. Decision-tree and random-forest classification models were trained on the national concrete slab bridge data set of over 40,000 structures. The resulting models were validated on an independent data set and then compared with a number of existing judgment-based schemes found in an extensive survey of the current state of practice in the United States. The proposed approach offers a method that provides guidance for improved allocation of resources by informing maintenance decisions through rapid identification of candidate bridges that require further scrutiny for either possible load restriction or restriction removal. DOI: 10.1061/(ASCE)BE.1943-5592.0001103. © 2017 American Society of Civil Engineers.
Author keywords: National Bridge Inventory (NBI); Data-driven; Load rating; Load posting; Decision trees; Random forests.

1 Graduate Research Assistant, Dept. of Civil and Environmental Engineering, Univ. of Virginia, 351 McCormick Rd., Charlottesville, VA 22904-4742 (corresponding author). ORCID: https://orcid.org/0000-0003-2018-134X. E-mail: ma4cp@virginia.edu
2 Associate Professor, Dept. of Civil and Environmental Engineering, Univ. of Virginia, 351 McCormick Rd., Charlottesville, VA 22904-4742. E-mail: dharris@virginia.edu
3 Assistant Professor, Dept. of Systems and Information Engineering, Univ. of Virginia, 151 Engineer's Way, Charlottesville, VA 22904-4747. E-mail: lbarnes@virginia.edu
4 Assistant Professor, Dept. of Civil and Environmental Engineering, Univ. of Virginia, 351 McCormick Rd., Charlottesville, VA 22904-4742. E-mail: ozbulut@virginia.edu
5 Graduate Research Assistant, Dept. of Civil and Environmental Engineering, Univ. of Virginia, 351 McCormick Rd., Charlottesville, VA 22904-4742. E-mail: jdc3rb@virginia.edu

Note. This manuscript was submitted on September 8, 2016; approved on April 21, 2017; published online on August 9, 2017. Discussion period open until January 9, 2018; separate discussions must be submitted for individual papers. This paper is part of the Journal of Bridge Engineering, © ASCE, ISSN 1084-0702.

Introduction

The health and functionality of the national infrastructure system is so vital to the United States that its incapacitation would have debilitating effects on the nation's economy, security, and integrity (DHS 2013). At the core of this infrastructure system is an aging transportation network, which has been well documented to be in a state of disrepair (ASCE 2013). This includes a national inventory of over 600,000 bridges, carrying 4.5 billion cars and trucks per day. As of 2014, the average age of these structures is 42 years, approaching the typical 50-year design service life. Consequently, 24% of these bridges are classified as either structurally deficient or functionally obsolete, and 10% have posted weight limits restricting the flow of traffic (FHWA 2014). Considering the size and age of this population, the need for strategies and resources for maintenance continues to grow as a challenge for federal, state, and local governments (AASHTO 2012).

Under federal regulations, in-service bridges are subjected to periodic inspections to identify visible deterioration, locate internal degradation, and ultimately assign condition ratings to each bridge (NBIS 2004). These condition ratings and observed deterioration attributes are typically used together with structural bridge plans to determine the safe load-carrying capacity of bridges (AASHTO 2011). In turn, this calculated capacity is compared to the maximum live-load effects anticipated under state law to calculate a rating factor and determine if all traffic may pass freely, or if restrictive load posting is required. This process is performed at three different levels (namely, the design, legal, and permit load ratings) to reflect different vehicle types and the corresponding uncertainty. States re-evaluate load ratings of their bridges in the event of major changes to loads, traffic, material properties, or structural conditions, such as widenings, repairs/rehabilitation, and changes in conditions as revealed in inspections (cracks, section loss, excessive deflections, collision damage, etc.). Moreover, in-service bridges may also need to be rated to ensure their safety under the passage of permit or irregular loads (AASHTO 2011). Fig. 1 depicts the steps in the load-rating process and the possible resulting postings. Further details about the rating process can be found in The Manual for Bridge Evaluation (MBE) (AASHTO 2011).

The practice of load rating has evolved over the years to align with improved engineering knowledge and available analytical methods, yet it still involves significant levels of engineering judgment in determining the rating values. For instance, there is a lack of knowledge of actual system-level behavior and load-distribution mechanisms in bridge systems, especially in the presence of damage and deteriorating conditions (Gheitasi and Harris 2015). In addition to the simplified assumptions in the analysis, the uncertainties in the assessment of structural deterioration highlight the subjectivity associated with current rating practices. This subjectivity has the potential to impact

© ASCE 04017076-1 J. Bridge Eng.

J. Bridge Eng., 2017, 22(10): 04017076


Fig. 1. (a) Bridge load-rating and posting process; (b) load-posting signage (image by author). [Flowchart, not reproduced here: starting from structural plans and field conditions, a design load rating is performed at the inventory level; RF ≥ 1 means posting is not required (see AASHTO 2011 for exclusions) and the bridge may be rated for permit loads; RF < 1 leads to a legal load rating and, optionally, refined methods or load testing; a final RF < 1 requires posting or repair/rehab, with no permit rating. RF denotes the rating factor.]
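In code form, the decision flow of Fig. 1(a) might be sketched as below. The function name, the three summary outcomes, and the omission of refined methods, load testing, and MBE exclusions are simplifying assumptions for illustration, not part of the MBE itself.

```python
def posting_decision(design_rf, legal_rf=None):
    """Illustrative sketch of the Fig. 1(a) decision flow.

    design_rf: rating factor from the inventory-level design load rating.
    legal_rf:  rating factor from the legal load rating (only needed
               when the design-level check fails).
    Returns a short label summarizing the outcome; refined methods,
    load testing, and MBE exclusions are omitted for brevity.
    """
    if design_rf >= 1.0:
        # Design (inventory-level) rating passes: no posting required,
        # and the bridge may be rated for permit loads.
        return "no posting; may be rated for permit"
    if legal_rf is None:
        raise ValueError("legal load rating required when design RF < 1")
    if legal_rf >= 1.0:
        # Legal rating passes: posting is not required.
        return "no posting required"
    # Legal rating fails: post the bridge or repair/rehabilitate.
    return "posting or repair/rehab; no permit rating"

print(posting_decision(1.2))                 # design-level check passes
print(posting_decision(0.8, legal_rf=1.05))  # passes at the legal level
print(posting_decision(0.8, legal_rf=0.7))   # requires posting
```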

management strategies and safety-related decisions ranging from permitting and load posting to rehabilitation, or an even more conservative approach of terminating operations and replacing the structure.

At the national level, it is of primary interest to the Federal Highway Administration (FHWA), the agency responsible for establishing and overseeing bridge safety requirements, to improve the fidelity of the current rating and performance evaluation procedures to optimize the administration of federal funds and advance the level of safety in the national highway system (OIG 2006). Considering the shortcomings of the current procedures, responsible authorities are concerned with the level of errors involved in the current load-posting status of bridges within the National Bridge Inventory (NBI) database (OIG 2006). In addition to safety-related concerns, bridge owners are hesitant to impede commerce or increase operational costs and travel time by imposing unnecessary load limits. To overcome these issues, it has been recommended that state DOTs and local agencies develop data-driven, risk-based approaches for oversight and updating the load-posting status of their in-service inventory (OIG 2006).

Within this context, this study leverages emerging machine-learning techniques to correlate select attributes of the in-service structures available within the NBI database to predict the load-posting status of each structure. This process serves to reveal hidden patterns, uncover possibly causal relationships, and identify structures that may be misclassified. The proposed approach is expected to inform maintenance and management decisions for in-service bridges, with the potential to optimize the allocation of resources based on the structural vulnerability and/or opportunity for improved traffic flow. The two major outcomes of the proposed system if implemented in practice are as follows:
1. Providing data-driven load postings for bridges with missing or incomplete design information and structural plans based on objective data analysis rather than subjective judgment.
2. Rapid screening of the entire inventory to find candidate bridges for further analysis and possible required load posting or posting removal, where misclassification is suspected.

Note that, for bridges with available design information, the safe load-carrying capacities are calculated using structural analysis, with the same principles assumed in the design process (NCHRP 2014); however, the subjectivity of the process is amplified dramatically for structures with missing or incomplete design information, which are currently rated based on engineering judgment. Furthermore, although the study presented herein has been limited to a specific class of structures (RC slab bridges), the proposed approach is generic, allowing for extrapolation across other bridge types, and is expected to promote further discussion on the topic within the national and international bridge community.

Review of Current Practice in the United States

An examination of the statistics on the methods used in the United States to load rate highway bridges shows the primary approach to be analytical methods. In 2014, among bridges longer than 6 m (20 ft), over 80% of the bridges in the NBI were load rated using analytical methods, 0.6% were load tested, and the remainder were load rated using engineering judgment or not rated at all (Harris et al. 2015). In many of these cases, information such as design or construction plans is not available for load ratings, preventing the use of analytical methods. Article 6.1.4 of the MBE states that, for bridges lacking sufficient design details, a physical inspection of the bridge by a qualified inspector and evaluation by a qualified engineer may be sufficient to establish an approximate load rating; load tests may also be considered (AASHTO 2011). Further investigation of the NBI data shows that those bridges lacking design plans are usually short-span, local, and low-traffic bridges, for which resource-intensive load tests are usually deemed impractical considering the relative importance of the bridge. Without the benefit of a load test, little direction is given as to how to implement a judgment-based rating. The MBE states that a safe load capacity can be estimated based on design live load, current condition of the structure, and live-load history; additionally, a concrete bridge with missing structural details need not be posted for restricted loading if it has been carrying normal traffic for an appreciable period of time and shows no distress (AASHTO 2011). Although the fundamental premise of these MBE prescriptions is rationally sound, their application is challenged by the lack of objective and quantitative criteria and by the intrinsic dependence on


personal experience and judgment of individual bridge engineers. Furthermore, the accuracy and precision of the results cannot be quantitatively evaluated.

As a baseline method for comparison and evaluation of the approach proposed in this work, an extensive review of the state DOT bridge manuals was performed to reveal how each state currently carries out a judgment-based rating. Many of the states only refer to the clause in the MBE and list a number of variables that should be taken into consideration during the evaluation. A few examples are UDOT (2014), CODOT (2011), and FDOT (2014). These include year of construction; design vehicle; live loads (past, present, and future); measurable structural dimensions; condition of load-carrying components; redundancy of load path; changes since original construction; comparable structures of known design; traffic characteristics; and performance of bridge under current traffic, such as evidence of distress or excessive movement under load. These manuals do not explicitly explain the process of using these variables to arrive at a posting load. It is notable that the Massachusetts bridge manual states that engineering judgment alone is not acceptable as a rating method and requires that comprehensive field measurements, nondestructive testing, and a material testing program be performed for structures with unknown structural details (MassDOT 2013).

A number of the state DOTs have outlined systematic procedures, usually in the form of flowcharts or tables, to help engineers carry out judgment-based load postings. Oregon, Washington, Idaho, and Kentucky provide either a table or clauses for load rating bridges without plans based solely on condition ratings, in which a structure with a condition rating of 4 or less is posted (ODOT 2015; WSDOT 2015; ITD 2014; KYTC 2015). The Nebraska Department of Roads (NDOR) uses the same method, but indicates a condition rating of 3 to be used as the load-posting threshold (NDOR 2010). Pennsylvania provides a tabulated rating method that is based on condition ratings, average daily traffic (ADT), and a specific description of signs of distress on the structure (PennDOT 2010). The Texas DOT manual presents a flowchart to help rate concrete bridges without plans, which is based on the observation of signs of distress in the inspection reports, as well as structure age and condition ratings (TxDOT 2013) (Fig. 2). Note that IR and OR refer to inventory and operational level load ratings, respectively. Although these methods provide an approach for engineers to follow, they are still subjective, not rooted in analysis, and not consistent across the country.

Data-Driven Approach

Machine learning seeks to mimic and automate the human learning process, finding patterns in data without explicitly being programmed (Witten et al. 2011). These algorithms have been increasingly used in a wide range of disciplines (Gullien et al. 2015; Hergenroeder et al. 2014), including civil engineering (Amiri et al. 2016; Saitta et al. 2009; Melhem and Cheng 2003; Farrar and Worden 2012; Jootoo and Lattanzi 2016). Specifically, the area of infrastructure maintenance and management, which naturally involves large data sets, has welcomed the use of these methods (Melhem and Cheng 2003; Li and Burgueño 2010; Kim and Yoon 2010; Melhem et al. 2003; Bektas et al. 2013; Morcous 2005). Li and Burgueño (2010) built models on Michigan's database of bridges to predict abutment condition ratings, whereas Huang (2010) used artificial neural network classification on Wisconsin's concrete bridges to predict deck condition ratings based on geometrical, functional, and environmental descriptors. Similarly, decision-tree classification algorithms have been used by a number of researchers to model infrastructure performance (Melhem and Cheng 2003). In a conceptually similar approach, the authors of this paper used multiple linear regression and neural networks on the Virginia database of RC slab bridges to estimate load ratings (Harris et al. 2015) and decision trees and random forests on the national database to study the feasibility of predicting load postings (Alipour et al. 2016). The present investigation contributes to the existing push toward data-driven performance assessment by reformulating the problem of predicting bridge load postings on a national level into a data-driven framework. Although possible discrepancies and inconsistencies in load-posting policies and practices in different states are expected to affect the performance of the framework, the collective study of the national inventory is expected to supply a data set with sufficient size and diversity for efficient knowledge extraction.

Fig. 2. Flowchart for load rating concrete bridges without plans (adapted from TxDOT 2013)

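For illustration, the condition-rating cutoffs surveyed above, and threshold-style checks like those charted in Fig. 2, reduce to a one-line rule on the worst component condition rating. The function below is a hypothetical encoding for illustration only, not an official DOT procedure.

```python
def needs_posting_by_condition(deck, superstructure, substructure, threshold=4):
    """Judgment-based screening rule of the kind used by several states:
    post the bridge when the lowest NBI condition rating among the deck,
    superstructure, and substructure is at or below the threshold.
    Oregon, Washington, Idaho, and Kentucky use a threshold of 4;
    Nebraska (NDOR) uses 3."""
    return min(deck, superstructure, substructure) <= threshold

# A bridge whose worst component rates 4 is posted under the four-state
# rule but not under Nebraska's stricter threshold.
print(needs_posting_by_condition(6, 5, 4))               # True with threshold 4
print(needs_posting_by_condition(6, 5, 4, threshold=3))  # False with threshold 3
```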


Method of Study

Research Outline

This paper proposes a prediction tool for load-posting status of RC slab bridges based on eliciting the knowledge hidden within the existing bridge database. The approach is illustrated schematically in Fig. 3 and is generally described by the following steps:
1. Data were collected predominantly from the NBI in 2014 and underwent a careful preprocessing step to obtain a format suitable for modeling.
2. Models were trained on the data using two popular classification algorithms (decision trees and random forests), which learn the patterns and relationships between the bridge descriptors and their load-posting status.
3. These models were then evaluated in terms of accuracy performance.
4. The developed models were tuned to maximize their performance.
5. Once finalized, the models were evaluated on an unseen validation set to report a realistic estimate of the expected future performance. The performance was also compared with the available DOT practice as a baseline.
6. Finally, the models were applied to the data set for bridges with missing plans, and the current (judgment-based) and predicted (using the models) load postings were compared.

Data Collection and Preprocessing

The data set used for this investigation was derived from the NBI database for the year 2014, which includes 610,749 structures (FHWA 2014). This study focused on highway bridges; hence, a preliminary filter was used to exclude all types of structures other than highway bridges (e.g., culverts, railroad bridges, tunnels, etc.). Also excluded were temporary structures, those currently closed, and newly constructed and unopened bridges, which are structures in unusual service conditions and therefore known to have outdated or missing information. The resulting database included 456,219 observations. By filtering the data based on NBI Item 43A (kind of material and/or design = concrete or concrete continuous) and 43B (type of design and/or construction = slab), this study focused on RC slab bridges, which resulted in a total of 64,134 bridges (14.1%). Furthermore, 6,413 bridges reconstructed during their service life (thus their condition is not consistent with their age) and another 9,002 of those with precast decks were also excluded to form a more consistent data set (using Item 107: deck structure type = concrete cast-in-place). Finally, the remaining data underwent a careful validity check to find and discard 1,334 (2.7%) missing or obviously erroneous values. These include missing, rare, or impossible values (such as zero or negative for length and width, age, etc.); extreme values, such as deck widths smaller than 10 ft, maximum spans shorter than 3 m (10 ft), and ADT of zero or less than five vehicles, which reflect rare and unusual conditions if not typos and entry mistakes. Other examples are inconsistencies between interrelated variables, such as deck width and number of lanes or span length and total length. In the absence of access to documentation on individual bridges, an investigation of the causes of the aforementioned inconsistencies was not possible. Therefore, the basic process of validation using domain expertise and metadata and elimination of suspicious and inexplicable instances was followed in this paper. For details on the use of domain expertise to ensure acceptable data quality as well as a variety of systematic data-cleansing techniques, the reader is referred to Dasu and Johnson (2003). Once data cleansing was finished, all of these filters accounted for a further reduction of the population to 47,385 structures.

Of these remaining structures, 5,151 bridges are currently either load rated based on judgment or are not load rated at all. Based on a previous study (Harris et al. 2015), evidence suggested that the majority of these bridges likely lack sufficient as-built structural drawings and documents necessary for a proper load rating. In these cases, load rating and posting decisions are typically based on subjective engineering judgment rather than objective analytical capacity calculations. This data set, referred to as bridges without plans, is not used in modeling, but is later used in an application of the constructed models. From the remaining data, which are bridges with plans (and thus have analytical calculation-based load ratings and postings), an independent validation set of identical size (5,151) was randomly chosen and kept aside. This validation data set was never seen in any of the modeling or analysis steps and was exclusively used to assess the performance of the finalized models. The rest of the data (37,083 instances), referred to as the training and test set in Table 1, are later split into a training set to create models and a test set to evaluate them (see the "Performance Evaluation" section). The number of observations used in each step is summarized in Table 1.

The NBI database includes 116 items, some with multiple entries and descriptors for each bridge; however, many of the descriptors describe nonstructural properties of a bridge, such as bridge name and identification numbers, geographical information, hydrological features, properties of the wearing surface, safety

Fig. 3. Proposed investigation approach. [Diagram, not reproduced here: data collection and preprocessing split the National Bridge Inventory (NBI) into bridges with and without design plans; bridges with plans provide the training/test and validation sets used for model development and tuning (decision trees) and for performance assessment (ROC: sensitivity versus 1 − specificity); bridges without plans are held out for a later application of the models.]
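The partitioning shown in Fig. 3 and Table 1 can be sketched with plain Python, using lists of dicts in place of the real NBI tables; the rating_method field and its values are assumed names, and the toy record counts are illustrative.

```python
import random

def partition_bridges(records, validation_size, seed=0):
    """Split cleaned RC slab bridge records into the three sets of Table 1:
    bridges without plans (judgment-rated or not rated), an independent
    validation set, and the remaining training/test pool. Each record is
    a dict; the 'rating_method' field name is an assumption."""
    without_plans = [r for r in records
                     if r["rating_method"] in ("judgment", "not rated")]
    with_plans = [r for r in records if r["rating_method"] == "analytical"]
    rng = random.Random(seed)  # fixed seed so the holdout is reproducible
    rng.shuffle(with_plans)
    validation = with_plans[:validation_size]   # kept aside, never modeled on
    train_test = with_plans[validation_size:]   # later split into train/test
    return without_plans, validation, train_test

# Toy stand-in for the 47,385 cleaned structures.
records = ([{"id": i, "rating_method": "analytical"} for i in range(90)]
           + [{"id": 100 + i, "rating_method": "judgment"} for i in range(10)])
without_plans, validation, train_test = partition_bridges(records, validation_size=10)
print(len(without_plans), len(validation), len(train_test))  # 10 10 80
```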


features, and vertical and horizontal clearances. Those of little to no relevance to load ratings and posting were immediately discarded from the model, as were those features that were obviously interdependent and/or correlated, and those with many missing values. Eleven of the remaining features were directly used as modeling attributes. Deck, superstructure, and substructure condition ratings and deck geometry evaluation ratings, which were initially on a scale of 1-9, were recategorized as poor (rating <5), fair (5), and good (>5), following the recommendation of the MBE (AASHTO 2011). Design loads were also recategorized into three groups of heavy, light, and other vehicles based on their equivalent truck weight. Three new attributes were also derived using the information in the NBI: superstructure continuity, existence of water (over water versus over a highway/railroad), and urban (yes or no). Two more attributes, sine(skew) and log(ADT), were created using mathematical transformations of the current attributes skew angle and ADT, because the alternative representations were found to perform better than the original attributes. Finally, two more attributes were imported from external sources to represent climatic and economic conditions of the locations of the selected bridges. Each bridge instance was assigned to one of nine climatic zones as classified by the National Centers for Environmental Information: northwest, west, southwest, south, southeast, northeast, central, east north central, and west north central (Karl and Koss 1984). The economic condition for each bridge instance was characterized by the ratio of the state's gross domestic product (GDP) to its respective number of bridges (categorized into three groups as high, low, and moderate), thus attempting to represent a possible disparity in the maintenance and management of bridges. Although this attribute is expected to provide an indirect indication of the relative resources available to the states for maintenance and upkeep, future study is required to design more accurate economic attributes based on different infrastructure funding mechanisms. Table 2 summarizes the 24 resulting attributes (including the class) used in modeling and the basic statistics for each attribute.

Table 1. Size of Data Sets Used for Modeling and Validation

Data set               | Total size | Posted bridges
Training and test set  | 37,083     | 3,291
Validation set         | 5,151      | 456
Bridges without plans  | 5,151      | 645
Total                  | 47,385     | 4,392

Modeling Techniques

Decision Trees

Decision trees are a popular family of classification models that have been used in various domains and applications to approximate discrete target functions. The learned functions are composed of a series of successive decisions, represented by branches, which ultimately terminate in a target class, represented by leaves. Each node in the tree represents an attribute of the observation, and each branch of the tree represents one of the possible values of the attribute. A particular observation is classified by beginning at the root node of the tree and traversing the entire tree, testing the attribute specified by the corresponding node of the tree with the process being repeated for each subtree (Mitchell 1997).

In this work, the C4.5 algorithm was used for tree induction, which is a top-down recursive splitting algorithm (Quinlan 1993). Assume a collection of observations from the training data, S = {s_1, s_2, ..., s_n}, where each observation s_i is described by p attributes A_j (for j = 1, ..., p), with attribute values s_i = {a_{1,i}, a_{2,i}, ..., a_{p,i}}, and carries a class label from {C_1, ..., C_m}. In this algorithm, to construct a decision tree, an attribute A is selected for a collection S, and the observations are split into a number of subsets according to their values for A. The selection of the attribute for splitting the data is based on a measure called information gain, which is a measure of entropy reduction. For a collection of observations S, entropy is defined as in Eq. (1), where p_i is the probability that an observation in S belongs to class C_i. With this definition, entropy is a measure of impurity of the collection and is zero for a perfectly pure collection of observations all belonging to the same class:

E(S) = -\sum_{i=1}^{m} p_i \log_2 (p_i)    (1)

Information gain G is then defined as the reduction in entropy from the original S using the attribute A. This can be seen in Eq. (2), where the first term is the entropy of S and the second term is the sum of the entropies of the subsets S_v made by A, weighted by the fraction of instances of S that belong to S_v. In this equation, Values(A) is the set of all possible values of attribute A, and S_v is the subset of S where attribute A has the value v:

G(S, A) = E(S) - \sum_{v \in Values(A)} \frac{|S_v|}{|S|} E(S_v)    (2)

The tree will thus be formed by recursively splitting the data at the nodes using the attribute with the maximum information gain. The stopping criterion is the minimum number of instances per leaf (final node), which can determine the size of the tree; a larger minimum leaf size results in a smaller tree, whereas a smaller minimum leaf size allows for more complexity. Further details on the theory and construction of decision trees, such as alternative impurity measures and pruning methods, can be found in the works by Witten et al. (2011) and Mitchell (1997).

Decision trees offer a number of advantages that make them a suitable choice for this investigation. They can be visualized and are relatively transparent and easy to construct and understand. They are also very easy to interpret and can be translated into simple if-then rules. These characteristics make them a suitable choice for applications involving non-machine-learning experts, such as bridge owners, infrastructure managers, and maintenance officials. In addition, decision-tree learning is naturally able to handle both numerical and categorical data and is not sensitive to outliers. In this work, binary splits were used, and the minimum leaf size was varied to change the size of the trees constructed (Witten et al. 1999).

Random Forests

To provide further comparison, C4.5 decision-tree learning was also evaluated against an ensemble learning technique called random forests (Breiman 2001). This method has gained significant traction because it is invariant under scaling and various other feature transformations, and it is robust to the inclusion of irrelevant features (Hastie et al. 2009). In random forests, a large number of decision trees are constructed on randomized samples, drawn with replacement, of the same size as the training data, each with a few randomly selected attributes, and an instance is classified by taking the majority vote among all the trees. The decision trees are usually grown to the fullest, and the optimal number of trees and attributes is usually problem-dependent; for a typical classification problem with p attributes, √p has been recommended as the
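Eqs. (1) and (2) translate into a few lines of code. The sketch below is stdlib-only; the toy records and the field names (condition, posted) are illustrative assumptions.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Eq. (1): E(S) = -sum_i p_i * log2(p_i) over the class labels in S."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(records, attribute, class_key="posted"):
    """Eq. (2): entropy of S minus the size-weighted entropies of the
    subsets S_v induced by each value v of the splitting attribute."""
    labels = [r[class_key] for r in records]
    weighted = 0.0
    for v in {r[attribute] for r in records}:
        subset = [r[class_key] for r in records if r[attribute] == v]
        weighted += len(subset) / len(records) * entropy(subset)
    return entropy(labels) - weighted

# Toy data: the condition field perfectly separates posted from unposted,
# so splitting on it recovers all of the entropy of the class label.
toy = [
    {"condition": "poor", "posted": True},
    {"condition": "poor", "posted": True},
    {"condition": "good", "posted": False},
    {"condition": "good", "posted": False},
]
print(entropy([r["posted"] for r in toy]))  # 1.0 for a 50/50 class split
print(information_gain(toy, "condition"))   # 1.0 for a perfect split
```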


Table 2. Attributes Used in Modeling and Their Statistics

| Item | No. | Attribute name–NBI item number | Unit | Type | Range/value | Mean | Standard deviation |
|---|---|---|---|---|---|---|---|
| Input attributes | 1 | Age–27^a | Year | Numeric | [1, 138] | 42.5 | 22.7 |
| | 2 | Deck geometry evaluation–68 | — | Nominal | Good (54.1%), fair (24.4%), poor (21.6%) | — | — |
| | 3 | Maintenance responsibility–21^a | — | Nominal | State (43.2%), county (41.2%), city (7.7%), town (6.8%), other (1.1%) | — | — |
| | 4 | Substructure condition rating–60 | — | Nominal | Good (87.7%), fair (8.8%), poor (3.5%) | — | — |
| | 5 | Deck condition rating–58 | — | Nominal | Good (86.0%), fair (10.7%), poor (3.2%) | — | — |
| | 6 | Superstructure condition rating–59 | — | Nominal | Good (89.5%), fair (8.3%), poor (2.3%) | — | — |
| | 7 | Urban or rural–26^a | — | Nominal | Rural (81.8%), urban (18.2%) | — | — |
| | 8 | Kind of highway–58 | — | Nominal | Roads (60.6%), state (31.6%), interstate (7.8%) | — | — |
| | 9 | Number of lanes–28 | — | Numeric | 1 (3.0%), 2 (91.2%), 3 (1.6%), 4 (2.9%), 5 (0.6%), 6 (0.4%), 7–14 (0.3%) | 2.1 | 0.6 |
| | 10 | ADT–29 | — | Numeric | [5, 283,000] | 3,918.0 | 10,100.3 |
| | 11 | Skew angle–34 | Degrees | Numeric | [0, 90] | 9.9 | 15.1 |
| | 12 | Number of spans–45 | — | Numeric | [1, 105] | 3.3 | 2.8 |
| | 13 | Max span length–48 | Meter | Numeric | [2.5, 122.5] | 9.6 | 3.9 |
| | 14 | Structure length–49 | Meter | Numeric | [6.1, 1,119] | 29.0 | 24.5 |
| | 15 | Deck width–52 | Meter | Numeric | [3, 121.2] | 10.8 | 4.4 |
| | 16 | Truck percentage of ADT–109 | % | Numeric | [0, 98] | 8.8 | 9.4 |
| | 17 | Span continuity–43^a | — | Nominal | Simple (43.0%), continuous (57.0%) | — | — |
| | 18 | Design load–31 | — | Nominal | Heavy (57.0%), light (28.7%), other (14.3%) | — | — |
| | 19 | Existence of water–71^a | — | Nominal | Water (92.2%), no water (7.8%) | — | — |
| | 20 | Climate zone–N/A | — | Nominal | S (30.7%), ENC (20.6%), C (17.1%), SE (9.2%), W (8.6%), WNC (6.3%), NE (2.9%), NW (2.3%), SW (2.2%) | — | — |
| | 21 | Economic index–N/A | — | Nominal | Limited (37.0%), moderate (33.5%), high (29.4%) | — | — |
| | 22 | Log(ADT)–29^a | — | Numeric | [0.778, 5.307] | 2.622 | 0.828 |
| | 23 | Sine(skew angle)–34^a | — | Numeric | [0, 1] | 0.195 | 0.26 |
| Class | — | Load-posting status–41 | — | Nominal | Unposted (91.1%), posted for load (Code P) (8.9%) | — | — |

Note: C = central; ENC = east north central; NE = northeast; NW = northwest; S = south; SE = southeast; SW = southwest; W = west; WNC = west north central.
^a The feature is not originally in the NBI data and was defined based on one or more NBI items.
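Several of the Table 2 attributes marked with note a are derived rather than raw NBI fields. A minimal sketch of how such features can be computed is shown below; the function and argument names are illustrative, not from the paper, and the inspection year is an assumed reference point.

```python
import math

def derived_attributes(year_built, adt, skew_deg, inspection_year=2014):
    """Illustrative computation of three derived features from Table 2:
    Age-27a, Log(ADT)-29a, and Sine(skew angle)-34a."""
    return {
        "age": inspection_year - year_built,            # years since construction
        "log_adt": math.log10(adt),                     # compresses ADT's wide [5, 283,000] range
        "sine_skew": math.sin(math.radians(skew_deg)),  # maps skew of [0, 90] degrees onto [0, 1]
    }

print(derived_attributes(1974, 1000, 30))
```

The logarithmic and sine transforms reduce the skewness of the raw numeric items before training, which is why both the raw and transformed versions appear in Table 2.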

number of random attributes (Hastie et al. 2009). As a result, all attributes will have the opportunity to be used in a number of trees and contribute to the model, thus adding to its accuracy and stability. In this study, random forests were trained on the data while varying the number of attributes to be used in each tree and the number of trees in the forest.

Performance Evaluation

In each classification task, there is a training and testing phase to build and tune the model and a validation phase to evaluate the model's performance on unseen data. In this research, a holdout testing method was used in which two-thirds of the data were randomly selected as the training set and the rest as the test set. Furthermore, as previously described, a separate validation set was also randomly selected and kept intact until modeling was finalized. The models were constructed on the training set, and optimum values of the parameters in the models were found by evaluating performance on the test set (Table 1). Once the models were fully tuned and finalized, they were tested on this validation set to determine a reliable estimate of the future error.

When evaluating a classifier on the test set, the model provides a confusion matrix as its output, which summarizes the number of correctly and incorrectly classified instances in each of the classes involved. Consistent with the regular convention in binary-class problems, such as the present problem, posted bridges were assumed positive and unposted bridges negative. Hence, the confusion matrix includes the true positives (TPs) and true negatives (TNs) as the total number of instances correctly classified as posted and unposted, respectively. In contrast, it also includes false positives (FPs), which are the unposted bridges incorrectly classified as posted, and false negatives (FNs), which are posted bridges incorrectly classified as unposted (Fig. 4).

Accuracy is defined as the ratio of all correctly classified instances to all data (Fig. 4) and is generally the most widely used criterion to evaluate a learner. It should be noted, however, that this is an insufficient measure of accuracy for the class-imbalanced problem at hand (see the next section); by simply guessing the class of all instances as not posted, an accuracy of 91.1% will be achieved, which is equal to the ratio of all unposted bridges to the total number of instances under study. In other words, the misclassification of the minority class (posted), which is actually the main focus of this study, will not contribute to more than 8.9% of this measure.
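The splitting protocol described under Performance Evaluation can be sketched as follows; this is a dependency-free illustration (the paper performed the split in Weka), with the separate validation set assumed to have been removed beforehand.

```python
import random

def holdout_split(records, test_fraction=1/3, seed=42):
    """Holdout protocol: shuffle, keep two-thirds for training and one-third
    for testing. A separate validation set is assumed to have been set aside
    beforehand and is never touched until the models are finalized."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)

train, test = holdout_split(range(37083))  # size of the modeling set in Table 3 (R0)
print(len(train), len(test))  # 24722 12361
```

Tuning decisions (leaf size, number of trees, resampling scenario) are made by comparing performance on the test split only; the held-back validation set is scored once, after all tuning is complete.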



Therefore, two additional measures, the false positive rate (FPR) and the false negative rate (FNR) errors, were utilized in conjunction with overall accuracy. In the present problem, FPR shows the percentage of unposted bridges incorrectly classified as posted, whereas FNR shows the percentage of posted bridges incorrectly classified as unposted (Fig. 4). Therefore, it can be concluded that the FNR error will lead to potentially insufficient bridges erroneously predicted as sufficient, which will sacrifice safety, whereas an FPR error produces the opposite effect, resulting in unnecessary conservatism and loss of resources.

Although both the FPR and FNR errors are used to assess the models, it is recognized that neither can be the sole criterion for determining the best model, and a single-figure criterion is needed for that purpose. The literature on classification, and class-imbalanced problems in particular, suggests that the area under the receiver operating characteristic (ROC) curve, denoted as AUC, can be used as an efficient single-figure estimate of a classifier's overall accuracy over the full range of FPR–FNR trade-offs in the test set. A ROC curve is defined as a two-dimensional curve plotted in a coordinate system with the ratio of TPs over all positives (TP + FN) on the y-axis and the FPR on the x-axis. For a classifier, the curve is constructed by plotting points corresponding to classification results with different decision thresholds in this coordinate system. Once the curve is plotted for a classifier, the AUC can be measured and reported as a unified performance measure (Fawcett 2006). Further details in this regard can be found in the literature (Huang and Ling 2005; Chawla 2005).

Class Imbalance Treatment

One of the challenges frequently encountered in classification tasks is that the class attribute may have an imbalanced distribution, meaning that one label (minority) of the class attribute is much less frequent than the other (majority). Often, the detection and prediction of the minority group is more of interest. Nevertheless, classification algorithms generally tend to focus on maximizing the correct classifications (accuracy) rather than on the minority group. Hence, when a model is trained on a class-imbalanced data set, good accuracy (and thus very low FPR) is achieved, whereas most of the minority instances are incorrectly classified as majority (unacceptably high FNR). As the data set under study has only 8.9% posted instances, it was expected that, for models constructed on the data in its originally unbalanced form, a high FNR and a small FPR would be achieved. However, this investigation was more focused on detecting the posted (positive) instances; hence, reducing FNR while maintaining an acceptable FPR was a primary goal.

Two main approaches to deal with class imbalance can be identified in the literature: (1) resampling methods and (2) cost-sensitive classification (Visa and Ralescu 2005; Chawla et al. 2002). In resampling methods, the ratio of the minority and majority is adjusted toward balance by randomly undersampling the majority (removing a number of majority instances), randomly oversampling the minority [adding artificial minority instances using the synthetic minority oversampling technique (SMOTE) (Chawla et al. 2002)], or a combination thereof. In this investigation, four different resampling scenarios were tested: two of them resampled the data to a 1:1 class ratio, and the other two to a 3:1 class ratio. However, one of the 1:1 scenarios was done by undersampling the majority equal to the minority (R1), whereas the other one first undersampled the majority to 3 times the minority and then added enough SMOTE samples to make the classes equal (R4). One of the 3:1 scenarios used undersampling (R2), whereas the other one used SMOTE to achieve this ratio (R3). A summary of the four resampling scenarios together with the resulting number of observations used in each scenario is given in Table 3. It should be noted that, in all of these scenarios, the resampling is only performed on the training set, and the test set was not resampled, to achieve a more reliable evaluation.

A second option in dealing with class-imbalanced data is the use of cost-sensitive classifiers. This methodology is based on the idea that the relative costs of FP and FN errors are introduced in model training by internal instance reweighting, such that the model is constructed by minimizing the misclassification cost rather than raw accuracy, thus making the base classifier cost-sensitive (Witten et al. 2011). In this investigation, a cost-sensitive meta classifier was used, which takes as input a cost matrix and the base classifier to be used, which in this case was either a decision tree or a random forest. At this stage of the investigation, the true costs of the problem are unknown and would need to be derived from detailed risk analysis considering bridge owners' feedback. However, in this investigation, the approach was tested using three cases of relative

ACC = (TP + TN) / (TP + TN + FP + FN)
FPR = FP / (FP + TN)
FNR = FN / (TP + FN)

Fig. 4. (a) Confusion matrix for the classification task, with predicted classes (posted/unposted) as rows and actual classes as columns, containing TP, FP, FN, and TN; (b) performance criteria equations (ACC denotes accuracy)
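The criteria in Fig. 4(b) translate directly into code; a minimal sketch with hypothetical confusion-matrix counts:

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, false positive rate, and false negative rate as defined in Fig. 4."""
    return {
        "ACC": (tp + tn) / (tp + tn + fp + fn),
        "FPR": fp / (fp + tn),
        "FNR": fn / (tp + fn),
    }

# Hypothetical counts for a posted/unposted classifier:
m = classification_metrics(tp=80, fp=50, fn=20, tn=850)
print(m)  # ACC = 0.93, FPR ~ 0.056, FNR = 0.20
```

Note that the example reproduces the imbalance problem discussed above: accuracy is high even though one in five posted bridges is missed.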

Table 3. Different Implemented Resampling Scenarios

| Number | Resampling scenario | Minority [posted (%)] | Majority [unposted (%)] | Number of instances |
|---|---|---|---|---|
| R0 | Original data | 3,291 (8.9) | 33,792 (91.1) | 37,083 |
| R1 | Majority undersampling 1:1 | 3,291 (50) | 3,291 (50) | 6,582 |
| R2 | Majority undersampling 3:1 | 3,291 (25) | 9,873 (75) | 13,164 |
| R3 | SMOTE minority oversampling 242.3% | 11,264 (25) | 33,792 (75) | 45,056 |
| R4 | Majority undersampling 3:1 + SMOTE minority oversampling 200% | 9,873 (50) | 9,873 (50) | 19,746 |
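The scenarios in Table 3 combine majority undersampling with SMOTE-style oversampling. The sketch below is dependency-free: the oversampler interpolates between random pairs of minority points, a deliberate simplification of the k-nearest-neighbor interpolation in the original SMOTE of Chawla et al. (2002).

```python
import random

def undersample_majority(majority, minority, ratio, rng):
    """Randomly keep ratio * len(minority) majority instances (scenarios R1, R2)."""
    return rng.sample(majority, int(ratio * len(minority)))

def smote_like(minority, n_new, rng):
    """Create n_new synthetic minority points by interpolating between random
    pairs of real minority points (simplified stand-in for SMOTE)."""
    synthetic = []
    for _ in range(n_new):
        a, b = rng.sample(minority, 2)
        t = rng.random()
        synthetic.append(tuple(x + t * (y - x) for x, y in zip(a, b)))
    return synthetic

# Scenario R4: undersample the majority to 3x the minority, then oversample
# the minority up to parity (applied to the training set only).
rng = random.Random(0)
minority = [(rng.random(), rng.random()) for _ in range(100)]
majority = [(rng.random(), rng.random()) for _ in range(1000)]
kept = undersample_majority(majority, minority, ratio=3, rng=rng)
balanced = minority + smote_like(minority, len(kept) - len(minority), rng)
print(len(kept), len(balanced))  # 300 300
```

As in the paper, such resampling would be applied only to the training split; the test and validation sets keep the original 8.9% posted share.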



costs, as summarized in Table 4. The values in this table show the relative cost of the misclassification of a positive instance versus a negative one. For example, C2 is a scenario in which misclassifying a load-posted bridge as an unposted one (with possibly severe outcomes, such as loss of life) is assumed to be 10 times costlier than vice versa (which is only an overly conservative and wasteful decision).

Table 4. Relative Misclassification Costs Used in Cost-Sensitive Classifiers

| Number | Cost-sensitive scenario | FPR cost | FNR cost | Number of instances |
|---|---|---|---|---|
| C0 | Original data | 1 | 1 | 37,083 |
| C1 | Low FNR cost | 1 | 5 | 37,083 |
| C2 | Moderate FNR cost | 1 | 10 | 37,083 |
| C3 | High FNR cost | 1 | 15 | 37,083 |

Results

This section presents the results of training, tuning, and testing several viable models. All modeling and analysis reported in this section were done using Weka 3.6. Weka is an open-source machine-learning software package developed at the University of Waikato, New Zealand, which offers efficient implementations of many classification algorithms, including the decision trees and random forests that were used in this work (Witten et al. 1999).

Decision-Tree Results

To find the best model and optimize performance, decision-tree models were trained on the data set considering different class-imbalance treatments as discussed earlier, while also varying the minimum leaf-size parameter. Figs. 5(a–c) show that the R0 scenario, which includes the intact imbalanced data, provides the lowest predictive quality across all leaf sizes with respect to the positive (posted) class, with FN errors of more than 50%. This means that over 50% of all posted bridges have been predicted as not posted. In contrast, R1 is the strongest scenario with FNRs between 10 and 20%, with R4 as the second best. However, R1 has the highest FPR, and thus the worst performance in detecting the unposted bridges. Therefore, a question arises as to which model should be selected. The answer is given by the AUC criterion [Fig. 5(c)]. Interestingly, R4 possesses the highest AUC and thus the best overall predictive performance. Intuitively, both R2 and R3 have a 3:1 ratio of unposted to posted instances, thus giving a higher weight to unposted predictions and yielding a higher FNR. In addition, R3 is slightly better than R2 according to the AUC, which is in agreement with Chawla et al. (2002), who stated that SMOTE is expected to perform better than simple subsampling. Also, although R4 and R1 both have a 1:1 class ratio, R4 is derived from a 3-times larger sample size, thus constructing a more powerful model, and as such is selected as the preferred model. Another conclusion from Fig. 5(c) is that the minimum leaf size of 25 provides the best AUC and will thus be considered the optimum leaf size for the models developed in this work. Figs. 5(d–f) also present the results of cost-sensitive cases for decision-tree induction. In a manner similar to the

Fig. 5. Decision-tree results for different leaf sizes: (a and d) FNR; (b and e) FPR; (c and f) AUC. Panels (a–c) compare resampling scenarios R0–R4 and panels (d–f) compare cost-sensitive scenarios C0–C3; the x-axis of each panel is the minimum leaf size (5–150)



resampling scenarios, Case C0 (the intact original imbalanced data) shows unacceptably higher FNR and significantly lower FPR compared with the three cost-sensitive cases. This highlights the effect of class imbalance and confirms the need to use proper class-imbalance treatment methods. Also, although C2 and C3 have very close FNRs, the higher FPR of C3 results in its overall lower AUC. Therefore, C2 is chosen as the best cost-sensitive decision-tree model. Fig. 5(f) also confirms a minimum leaf size of 25. Values of performance measures for the best resampled (R4) and the best cost-sensitive (C2) tree models are presented in Table 5.

The resulting structure of a developed decision-tree model can shed light on the classification problem and the interaction of attributes within the problem. For schematic purposes, Fig. 6 illustrates a sample decision tree using R1 resampling and a minimum leaf size of 100, including a few of the short decision paths, showing how the tree can be used to predict the load-posting status of a bridge under question. For example, this tree indicates that, for older bridges, maintenance responsibility is a very useful predictor, whereas younger bridges require a more complex network of predictors. Thus, the tree not only offers a simple and understandable tool to the owners as a preliminary screening step or an alternative to judgment-based rating for bridges with missing as-built plans, but can also serve to illuminate trends and relationships previously undetected.

Random-Forest Results

Random-forest models were constructed for the problem by varying the number of trees in each forest and the number of attributes in each

Table 5. Results of Testing the Models on the Unseen Validation Set of 5,151 Bridges

| Number | Model | FNR (%) Test | FNR (%) Val. | FPR (%) Test | FPR (%) Val. | Accuracy (%) Test | Accuracy (%) Val. | AUC Test | AUC Val. |
|---|---|---|---|---|---|---|---|---|---|
| M1 | DT, C2, leaf size = 25 | 12.9 | 12.1 | 16.0 | 15.6 | 84.3 | 84.7 | 0.917 | 0.925 |
| M2 | DT, R4, leaf size = 25 | 16.7 | 16.7 | 13.6 | 12.8 | 86.1 | 86.9 | 0.896 | 0.909 |
| M3 | RF, 200 trees and R1 | 9.8 | 9.9 | 15.2 | 13.5 | 85.3 | 86.8 | 0.941 | 0.947 |
| M4 | RF, 200 trees and R2 | 23.9 | 22.8 | 6.4 | 6.6 | 92.0 | 92.0 | 0.945 | 0.951 |
| M5 | RF, 200 trees and R4 | 19.7 | 16.2 | 8.9 | 8.9 | 90.1 | 90.4 | 0.942 | 0.951 |

Note: DT = decision tree; RF = random forest.
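As a consistency check, the accuracies in Table 5 follow from the reported error rates and the class proportions; for example, Model M3's validation accuracy is implied by its FNR, FPR, and the 8.9% posted share:

```python
def expected_accuracy(fnr, fpr, pos_rate):
    """Overall accuracy implied by the FNR, FPR, and positive-class share:
    correctly kept positives plus correctly kept negatives."""
    return (1.0 - fnr) * pos_rate + (1.0 - fpr) * (1.0 - pos_rate)

acc = expected_accuracy(fnr=0.099, fpr=0.135, pos_rate=0.089)
print(round(100 * acc, 1))  # 86.8, matching Model M3's reported validation accuracy
```

The same identity explains why the low-FPR models M4 and M5 post the highest overall accuracies despite much weaker detection of the posted class.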

Fig. 6. Sample decision tree with R1 resampling and minimum leaf size of 100; SE denotes southeast; "POSTED" denotes a bridge that requires load posting. The tree splits on age at the root (≤54 versus >54 years) and then on substructure condition, deck width, design load, climate zone, economic index, kind of highway, maintenance responsibility, and span continuity.

Sample decision paths:
- If age is less than 54 years, substructure condition is other than poor, deck width is greater than 7.2 m, and design load is heavy, then the structure is not to be posted.
- If age is greater than 54 years, maintenance responsibility is other than state, economic index is high, and the span is simple, then the structure is to be posted.
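The two sample decision paths called out with Fig. 6 read directly as rules. A sketch covering only those published paths follows; argument names are illustrative, and every branch not shown in the figure's sample paths falls through to None.

```python
def sample_tree_classify(age_years, substructure_poor, deck_width_m,
                         design_load, maintained_by_state,
                         economic_index, simple_span):
    """Encodes only the two sample decision paths published with Fig. 6."""
    if age_years <= 54:
        # Younger bridges: substructure condition, deck width, and design load.
        if not substructure_poor and deck_width_m > 7.2 and design_load == "heavy":
            return "not posted"
    else:
        # Older bridges: maintenance responsibility is the key predictor.
        if not maintained_by_state and economic_index == "high" and simple_span:
            return "posted"
    return None  # branch not covered by the published sample paths

print(sample_tree_classify(40, False, 8.0, "heavy", True, "moderate", False))  # not posted
print(sample_tree_classify(60, False, 6.0, "light", False, "high", True))      # posted
```

This is the flowchart-style use the paper envisions for owners: a bridge record is walked down the tree attribute by attribute until a leaf decides its posting status.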



tree. The number of trees was varied from 50 to 250; all performance measures improved in larger forests, with 200 trees providing a balanced choice between performance and computational complexity. Fig. 7 depicts the relationship between accuracy and the number of features in each tree, compared across multiple resampling scenarios. Without resampling, the R0 model has an FNR of approximately 50% and an FPR of 2%, indicating that this model fails in detecting the positive (posted) class but is very successful in detecting the negative class. This result is to be expected considering the class imbalance and is consistent with the decision-tree results. The lowest FNR and highest FPR belong to R1 and R4, respectively; however, the FPRs are all below 15% for R1 and 10% for R4. The AUC scores show a general decreasing trend as the number of attributes increases, which is expected behavior for random forests; 4, 5, and 6 attributes worked best, again in line with the value of √p recommended by Breiman (2001). However, Fig. 7(c) shows that all scenarios are comparable in terms of overall performance because the largest difference between the AUCs of any two scenarios is approximately 0.5%. Although R2 and R0 show a negligible advantage in AUC, Models R1, R2, and R4 were selected as the most efficient models because they also have satisfactory FNR and FPR values. Cost-sensitive random-forest models exhibited similar trends, with AUC values comparable to the R2 and R0 models; but for these models, the FNRs ranged from 27 to 36%, whereas the FPRs ranged from 3.5 to 6%; consequently, none of those models was deemed suitable as the preferred model.

Final Evaluation of Validation Set

Two sets of performance metrics are reported in Table 5 for the five final best models. The first values (Test) represent the performance calculated on the test set as part of the tuning process. Moreover, each model is tested on the independent validation set (Val.), which has never been seen or used in the modeling process and thus can more effectively predict the performance on any future unseen data. These performance metrics show that all of the models perform very well, producing validation results in close agreement with the performance expected from the test results, thus confirming the models' generalizability. Also, considering the ranges of FNR, FPR, and accuracy, it is expected that these models can successfully predict nearly 77–90% of the posted bridges, 84–93% of the unposted bridges, and 85–92% of the entire population.

Comparison with Current Practice

It was noted in the "Introduction" that the majority of state DOTs prescribe engineering judgment by qualified engineers for rating bridges without plans. Because of the subjective nature of this approach, the accuracy of the results depends on the knowledge and experience of the engineer and, therefore, cannot be easily quantified. However, in the case of the few states where auxiliary tables or flowcharts are provided in the load-rating manuals, the accuracy can be assessed. To demonstrate the advantage of the proposed approach over the current practice, both methods were applied on the validation set. Note that this is a set of 5,151 randomly selected bridges with plans for which actual quantitative load ratings exist (ground truth). The proposed approach (Model M3 as an example) is compared with the approach shared by Oregon, Washington, Idaho, and Kentucky, in which any bridge with a condition rating of 4 or below is to be posted (ODOT 2015; WSDOT 2015; ITD 2014; KYTC 2015). The methods in the Texas and Pennsylvania load-rating manuals require knowledge of the appearance of the member and signs of distress, and thus cannot be directly used for comparison. As an alternative, assuming no signs of structural distress, the condition ratings in the Texas DOT flowchart were used to create another baseline (TxDOT 2013). Table 6 summarizes the performance comparison and shows that the proposed approach provides superior performance, despite an overall lower accuracy, with a significantly lower FNR while maintaining an acceptable FPR. As stated before, the FNR error is a matter of safety, as an insufficient bridge is allowed to remain open, whereas the FPR error corresponds to conservative misclassifications that lead to unnecessary restrictions.

Table 6. Comparison of Performance of the Proposed Approach with Current Practice

| Method | FNR (%) | FPR (%) | Accuracy (%) |
|---|---|---|---|
| Proposed method (M3) | 9.9 | 13.5 | 86.8 |
| Oregon, Washington, Idaho, Kentucky DOT methods | 89.5 | 2.3 | 90.0 |
| Using condition ratings of Texas DOT flowchart | 45.8 | 14.1 | 83.1 |

Applications and Discussion

The data-driven approach introduced in this paper can be readily used for two main applications. First, the models can be used on the population of bridges without plans to complement and verify the judgment-based rating. Second, the approach can be used to screen a population of bridges and identify which may be misclassified and warrant further investigation.

Fig. 7. Comparison of different resampling scenarios for random forests with 200 trees: (a) FNR; (b) FPR; (c) AUC. The x-axis of each panel is the number of attributes in each tree (3–25)
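The √p guideline referenced in the random-forest discussion is easy to check against the attribute set of Table 2 (p = 23 input attributes):

```python
import math

p = 23  # number of input attributes in Table 2
print(round(math.sqrt(p), 1))  # 4.8: consistent with 4-6 attributes per tree performing best
```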



Fig. 8. Application of proposed approach for inventory screening
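The screening flow in Fig. 8 reduces to comparing each bridge's current posting status with the model's prediction; a minimal sketch follows, with the flag wording being illustrative rather than from the paper.

```python
def screening_flag(currently_posted, predicted_posted):
    """Disagreements between current status and model prediction mark
    candidates for further evaluation (Fig. 8); agreements need no action."""
    if currently_posted and not predicted_posted:
        return "candidate for load-posting removal"     # possible unused reserve capacity
    if not currently_posted and predicted_posted:
        return "candidate for load rating and posting"  # possible hidden deficiency
    return "no action"

print(screening_flag(currently_posted=False, predicted_posted=True))
```

Applied inventory-wide, the two disagreement bins give owners a short, prioritized list of structures for quantitative rating rather than an indiscriminate load-testing program.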

Model Deployment on Bridges without Plans

As an initial application of the proposed approach, Model M3 (lowest FNR) and Model M4 (lowest FPR) were applied to the population of 5,151 bridges without plans in an effort to assess the quality of current judgment-based load postings. Using M3 results, 8.1% of all bridges currently posted by judgment (52 cases) may not need load posting and should be considered for further evaluation and possible removal of load postings. In contrast, according to M4, 18.0% of bridges currently unposted (812) may actually need to be posted and should be considered for more careful rating analysis and possible posting. This is consistent with the findings of Table 6 resulting from the application of the current practice models on bridges with plans, whose true load postings are known from analytical calculations, wherein the current practice resulted in a significantly higher FNR than FPR error. This in turn demonstrates that current judgment-based ratings are biased toward economy and away from safety, with a higher tendency to misclassify a deficient structure that seems sufficient than its seemingly insufficient counterpart. One plausible explanation is that human judgment is not fully capable of detecting hidden deterioration mechanisms taking place within a structure. This suggests that more effective load-rating and load-posting analyses are required for structures without plans to reach a desirable level of safety in the national infrastructure network. Because load-testing and nondestructive evaluation techniques are cost prohibitive to implement indiscriminately, the approach presented in this paper is therefore proven to be a useful tool for prioritizing those bridges most in need of a more quantitative analysis.

Inventory Screening

To aid in the allocation of resources for improving safety and traffic flow, these models can be used to highlight bridges that would likely benefit from a more thorough evaluation. Unposted bridges that the model predicts to be posted may have hidden load-carrying deficiencies. Conversely, posted bridges predicted to be unposted may have sufficient reserve capacity to safely carry unrestricted traffic. Such bridges may have been subject to overly conservative analysis and can be candidates for further evaluation and possible load-posting removal, thereby increasing the flow of commercial and emergency vehicles. The flowchart in Fig. 8 illustrates the process of applying the data-driven models for the aforementioned tasks.

Conclusions

Machine learning was leveraged to construct models that learn the underlying relationships between a number of bridge properties and the load-capacity status of the structure. Decision-tree models were used to provide a flowchart-type tool to guide bridge owners and maintenance officials while their performance was compared with random forests. Both sets of models, with the required treatment of class imbalance, were found to yield satisfactory results in terms of accuracy in both posted and unposted classes. A number of selected models were then validated on an independent set of over 5,000 bridges, and the observed performance was compared with a number of existing judgment-based ratings found through a survey of the current state of practice in the United States. The superior performance observed in this comparison proves the power of the trained models while also highlighting the inherent bias of the current practice toward economy over safety. Finally, the application of the models on bridges without plans resulted in a significantly higher number of unposted cases that may need to be posted than vice versa, reaffirming the bias found in the current judgment-based ratings. The misclassified structures in this set may also serve to identify candidate bridges for closer examination, as they may possess yet undetected deterioration. Considering these results and the prohibitive cost and resource demand associated with performing load tests on bridge populations, the proposed approach is shown to be of value for prioritizing maintenance operations and optimizing the allocation of resources in infrastructure management practice. Future work will extend the proposed approach to other classes of structures and other bridge safety and functionality measures, providing a step toward data-driven infrastructure management.

Acknowledgments

This work was part of a project sponsored by the Virginia Department of Transportation (VDOT) on load rating bridges with missing or insufficient as-built information. The authors thank Dr. Bernie Kassner from the Virginia Transportation Research Council (VTRC), Jonathan Mallard from the VDOT, and Dr. Amir Gheitasi for their assistance during the study and manuscript preparation. The contents of this paper reflect the views of the authors, who are responsible for the facts and accuracy of the presented data, but do not necessarily reflect the official views of the VDOT.



Supplemental Data

Figs. S1–S22, showing histograms of the attributes of Table 2, and Table S1, showing a subset of the data, are available online in the ASCE Library (www.ascelibrary.org).

References

AASHTO. (2011). The manual for bridge evaluation, 2nd Ed., Washington, DC.
AASHTO. (2012). LRFD bridge design specifications, 6th Ed., Washington, DC.
Alipour, M., Gheitasi, A., Harris, D. K., Ozbulut, O. E., and Barnes, L. E. (2016). "A data-driven approach for automated operational safety evaluation of the national inventory of reinforced concrete slab bridges." Transportation Research Board (TRB) 95th Annual Meeting, Transportation Research Board, Washington, DC.
Amiri, M., Ardeshir, A., Fazel Zarandi, M. H., and Soltanaghaei, E. (2016). "Pattern extraction for high-risk accidents in the construction industry: A data-mining approach." Int. J. Inj. Control Saf. Promotion, 23(3), 264–276.
ASCE. (2013). "Report card for America's infrastructure." <http://www.infrastructurereportcard.org/> (Jul. 25, 2015).
Bektas, B. A., Carriquiry, A., and Smadi, O. (2013). "Using classification trees for predicting national bridge inventory condition ratings." J. Infrastruct. Syst., 10.1061/(ASCE)IS.1943-555X.0000143, 425–433.
Breiman, L. (2001). "Random forests." Mach. Learn., 45(1), 5–32.
Chawla, N. V. (2005). "Data mining for imbalanced datasets: An overview." Data mining and knowledge discovery handbook, Springer, New York, 853–867.
Chawla, N. V., Bowyer, K. W., Hall, L. O., and Kegelmeyer, W. P. (2002). "SMOTE: Synthetic minority over-sampling technique." J. Artif. Intell. Res., 16, 321–357.
CODOT (Colorado Dept. of Transportation). (2011). Bridge rating manual, Staff Bridge Branch, Denver, CO.
Dasu, T., and Johnson, T. (2003). Exploratory data mining and data cleaning, Vol. 479, John Wiley and Sons, Hoboken, NJ.
DHS (Dept. of Homeland Security). (2013). "What is critical infrastructure?" <http://www.dhs.gov/what-critical-infrastructure> (Jul. 25, 2015).
Farrar, C. R., and Worden, K. (2012). Structural health monitoring: A machine learning perspective, John Wiley and Sons, Chichester, U.K.
Fawcett, T. (2006). "An introduction to ROC analysis." Pattern Recognit. Lett., 27(8), 861–874.
FDOT (Florida Dept. of Transportation). (2014). Bridge load rating manual, Office of Maintenance, Tallahassee, FL.
FHWA (Federal Highway Administration). (2014). "National Bridge Inventory Database." <https://www.fhwa.dot.gov/bridge/nbi.cfm> (Jul. 25, 2015).
Gheitasi, A., and Harris, D. K. (2015). "Implications of overload distribution behavior on load rating practices in steel stringer bridges." Transp. Res. Rec., 2522, 47–56.
Gullien, J., et al. (2015). "Predictive models for severe sepsis in adult ICU patients." Proc., Systems and Information Engineering Design Symp. (SIEDS), IEEE, New York.
Harris, D. K., Ozbulut, O. E., Alipour, M., Usmani, S., and Kassner, B. L.
ITD (Idaho Transportation Dept.). (2014). Idaho manual for bridge evaluation, Boise, ID.
Jootoo, A., and Lattanzi, D. (2016). "A machine learning approach to bridge prototyping." Proc., Int. Conf. on Sustainable Design, Engineering and Construction, ASCE Architectural Engineering Institute and Arizona State Univ.
Karl, T., and Koss, W. J. (1984). "Regional and national monthly, seasonal, and annual temperature weighted by area (Historical climatology series)," National Climatic Data Center, Asheville, NC.
Kim, Y. J., and Yoon, D. K. (2010). "Identifying critical sources of bridge deterioration in cold regions through the constructed bridges in North Dakota." J. Bridge Eng., 10.1061/(ASCE)BE.1943-5592.0000087, 542–552.
KYTC (Kentucky Transportation Cabinet). (2015). Kentucky bridge inspection procedures manual, Division of Maintenance, Bridge Preservation Branch, Frankfort, KY.
Li, Z., and Burgueño, R. (2010). "Using soft computing to analyze inspection results for bridge evaluation and management." J. Bridge Eng., 10.1061/(ASCE)BE.1943-5592.0000072, 430–438.
MassDOT (Massachusetts Dept. of Transportation). (2013). LRFD bridge manual: Part I, Boston.
Melhem, H. G., and Cheng, Y. (2003). "Prediction of remaining service life of bridge decks using machine learning." J. Comput. Civ. Eng., 10.1061/(ASCE)0887-3801(2003)17:1(1), 1–9.
Melhem, H. G., Cheng, Y., Kossler, D., and Scherschligt, D. (2003). "Wrapper methods for inductive learning: Example application to bridge decks." J. Comput. Civ. Eng., 10.1061/(ASCE)0887-3801(2003)17:1(46), 46–57.
Mitchell, T. M. (1997). Machine learning, McGraw-Hill, Columbus, OH.
Morcous, G. (2005). "Modeling bridge deck deterioration by using decision tree algorithms." Transp. Res. Rec., 11s, 509–516.
NBIS (National Bridge Inspection Standards). (2004). Code of Federal Regulations, No. 23CFR650, U.S. Government Printing Office, Washington, DC.
NCHRP (National Cooperative Highway Research Program). (2014). "State bridge load posting processes and practices—A synthesis of highway practice." Synthesis 453, Washington, DC.
NDOR (Nebraska Dept. of Roads). (2010). Bridge inspection program manual, Bridge Division, Lincoln, NE.
ODOT (Oregon Dept. of Transportation). (2015). LRFR manual, Salem, OR.
OIG (Office of Inspector General). (2006). "Audit of oversight of load ratings and postings on structurally deficient bridges on the national highway system." Rep. No. MH-2006-043, U.S. Dept. of Transportation, Washington, DC.
PennDOT (Pennsylvania Dept. of Transportation). (2010). Bridge safety inspection manual, Harrisburg, PA.
Quinlan, J. R. (1993). C4.5: Programs for machine learning, Morgan Kaufmann, San Mateo, CA.
Saitta, S., Benny, R., and Smith, I. (2009). Data mining: Applications in civil engineering, VDM Verlag, Saarbrücken, Germany.
TxDOT (Texas Dept. of Transportation). (2013). Bridge inspection manual, Austin, TX.
UDOT (Utah Dept. of Transportation). (2014). Bridge management manual, Salt Lake City.
Visa, S., and Ralescu, A. (2005). "Issues in mining imbalanced data sets-a
(2015). “Implications of load rating bridges in Virginia with limited review paper.” Proc., 16th Midwest Artificial Intelligence and Cognitive
design or as-built details.” Proc., 7th Int. Conf. on Structural Health Science Conf., Univ. of Dayton, Dayton, OH.
Monitoring of Intelligent Infrastructure, Curran Associates, Inc., Red Weka 3.6 [Computer software]. University of Waikato, Hamilton, New
Hook, NY. Zealand.
Hastie, T., Tibshirani, R., and Friedman, J. (2009). The elements of statisti- Witten, I. H., Frank, E., and Hall, M. A. (2011). Data mining: Practical
cal learning: Data mining, inference, and prediction (Springer series in machine learning tools and techniques, 3rd Ed., Morgan Kaufmann,
statistics), 2nd Ed., Springer, New York. Burlington, MA.
Hergenroeder, K., et al. (2014). “Automated prediction of adverse post- Witten, I. H., Frank, E., Trigg, L. E., Hall, M. A., Holmes, G., and Cunningham,
surgical outcomes.” Proc., Systems and Information Engineering Design S. J. (1999). “Weka: Practical machine learning tools and techniques with
Symp. (SIEDS), IEEE, New York. Java implementations.” Proc., ICONIP/ANZIIS/ANNES99 Workshop
Huang, J., and Ling, C. X. (2005). “Using AUC and accuracy in evaluating Future Directions for Intelligent Systems and Information Sciences, Asia-
learning algorithms.” IEEE Trans. Knowl. Data Eng., 17(3), 299–310. Pacific Neural Network Assembly, Dept. of Industry, Science and Tourism,
Huang, Y. H. (2010). “Artificial neural network model of bridge deteriora- Murdoch Univ., and Univ. of New South Wales.
tion.” J. Perform. Constr. Facil., 10.1061/(ASCE)CF.1943-5509.0000124, WSDOT (Washington State Dept. of Transportation). (2015). Bridge
597–602. inspection manual, Bridge Preservation Office, Olympia, WA.
© ASCE 04017076-12 J. Bridge Eng., 2017, 22(10): 04017076