Three-Dimensional Vision-Based Structural Damage Detection and Loss Estimation - Towards More Rapid and Comprehensive Assessment
by
Xiao Pan
DOCTOR OF PHILOSOPHY
in
THE FACULTY OF GRADUATE AND POSTDOCTORAL STUDIES
(Civil Engineering)
December 2022
Examining Committee:
Tony T.Y. Yang, Professor, Department of Civil Engineering, UBC
Supervisor
Carlos Ventura, Professor, Department of Civil Engineering, UBC
Supervisory Committee Member
Abstract
Civil engineering structures such as buildings and bridges inevitably experience damage due
to aging effects and natural disasters such as earthquakes. Damage inspection of these structures
is of vital importance to maintain their functionalities. Early damage identification can greatly
alleviate or prevent catastrophic failure in the event of natural disasters.
Traditional manual inspection is inefficient and highly reliant on the proper training and experience
of inspectors, which may result in false conclusions and erroneous evaluation reports. In recent
decades, structural health monitoring (SHM) methods such as vibration-based SHM and non-
destructive testing and evaluation (NDTE) methods have been developed to automate the
inspection process. These methods generally require relatively complicated and expensive
instrumentation to evaluate the conditions of structures. More recently, computer vision-based (or
vision-based) SHM has been established as a convenient, economical and efficient complementary
approach to the other SHM methods for civil structures. In comparison to contact-type vibration
sensors, vision-based methods use low-cost, non-contact sensors with easy installation and
operation. However, most existing vision-based SHM methods are built on 2D computer vision
where the evaluation outcomes are sensitive to camera locations and poses. Besides, these 2D
vision methods are limited to the evaluation of in-plane damages, while not directly capable of
quantifying damages in 3D space. In short, existing 2D vision methods may not provide a reliable
and comprehensive assessment of civil structures.
To address these limitations, this dissertation proposes a 3D vision-based SHM and loss
estimation framework, which aims to provide a more rapid and comprehensive damage evaluation
and loss assessment of civil structures. Within the framework, the dissertation is strongly focused
on the development and application of advanced 2D and 3D vision-based SHM methods for civil
structures. Experiments with the developed vision algorithms have been conducted on three prevalent
structural types including reinforced concrete structures, steel structures and structural bolted
components. Results show that the proposed 3D vision-based damage evaluation and loss
quantification framework can achieve high accuracy and low cost in damage recognition,
localization and quantification, and provide more comprehensive assessment results which can be
easily conveyed to owners, stakeholders and decision-makers.
Lay Summary
The civil and structural engineering industries in many countries are pursuing faster
construction, more efficient and automated operation and maintenance of civil structures which
are smarter, more sustainable and resilient. This requires advanced and efficient automated
technologies to be developed and validated in this field. This dissertation proposes a 3D computer
vision-based structural damage evaluation and loss quantification framework, which aims to
provide a more rapid and comprehensive automated inspection methodology for civil structures.
Successful implementation of these automated technologies will greatly enhance the consistency
and efficiency in civil infrastructure inspection and maintenance, thus making the cities and civil
structures smarter, more resilient and sustainable. Moreover, it will greatly help alleviate the labor
shortage in the infrastructure inspection industry.
Preface
Most of this dissertation has been published in, or is under review by, peer-reviewed venues. The
following summarizes the publications and manuscripts under review, where the contributions of
the author of this dissertation are identified.
Published/Accepted contributions
The following publications were prepared by the primary author, Xiao Pan, whereas the
coauthors provided technical and editorial comments. The author of this dissertation is responsible
for the literature review, experimental tests, data collection, formulation development,
computational analysis, data processing, and results presentation of the publications as detailed
below:
1. Pan, X., & Yang, T. Y. (2020). Postdisaster image‐based damage detection and repair cost
5.
2. Pan, X., & Yang, T. Y. (2021). Image-based monitoring of bolt loosening through deep-
steel plate structures using structure from motion, deep learning and point cloud
vi
4. Pan, X., Vaze, S., Xiao, Y., Tavasoli, S., Yang T.Y. “Structural damage detection of steel
corrugated panels using computer vision and deep learning.” Canadian Society for Civil
Engineering Annual Conference. – Chapter 4 and Appendix B
In the following manuscript, the author of this dissertation is responsible for the experimental
testing, data collection and data preprocessing for training and validation of algorithms in Chapter
4 of the dissertation.
5. Yang, T. Y., Li, T., Tobber, L., & Pan, X. (2020). Experimental and numerical study of
honeycomb structural fuses. Engineering Structures.
Under-review contributions
The following under-review manuscripts were prepared by the primary author, Xiao Pan,
whereas the coauthors provided technical and editorial comments. The author of this dissertation
is responsible for the literature review, experimental tests, data collection, formulation
development, computational analysis, data processing, and results presentation of the manuscripts
as detailed below:
6. Pan, X., Yang, T. Y., Xiao, Y., Yao, H., & Adeli, H. (2022). Vision-based real-time
structural vibration measurement through deep-learning-based detection and tracking
methods. Under review.
7. Pan, X., Yang, T. Y. (2022). 3D vision-based structural bolt loosening quantification using
deep learning and point cloud processing. Under review.
In the following manuscript, the author of this dissertation is responsible for the implementation
of the vision-based damage assessment methods.
8. Tavasoli, S., Pan, X., Yang, T. Y. (2022). Autonomous indoor navigation and vision-based
damage assessment of reinforced concrete structures using low-cost drones. Under review
– Chapter 4.
Table of Contents
Preface
2.4 TLS-based SHM methods
2.5.3.2 Developments and applications of DL-based vision methods for SHM of civil structures
Chapter 3: 3D vision-based SHM and loss estimation framework for civil structures – towards more rapid and comprehensive assessment
4.2.2.4 Component-level damage state determination
4.3.2.3 Localization of points of interest
4.4 Vision-based SHM methods for structural bolted components
4.4.2 Overview and application scenarios of the proposed bolt loosening quantification
4.4.3.1 Overview of real-time integrated detection and tracking framework
4.4.4.2 Multi-view structural bolted device detection
4.4.5.1 Training and testing results of RCNN, YOLOv3 and YOLOv3-tiny
5.4 Discussion of the case study
6.3.1 Development of new 3D vision methods for structural damage assessment
6.3.3 Integration of advanced robotic technologies into the proposed framework
Bibliography
Appendices
List of Tables
Table 4-3 System-level and component-level training parameters and performance of transfer learning
Table 4-5 Summary of the steel exposure quantification of the RC columns
Table 4-7 Comparison of the estimated out-of-plane displacements with the ground truth
Table 4-8 Parameters examined for KLT tracking algorithms
Table 4-10 Speed comparison of RCNN, YOLOv3 and YOLOv3-tiny
Table 4-11 Comparison of the estimated rotation and ground truth rotation for the six bolts in the experiment
Table 4-12 Complete results of parameter studies, expressed by the accuracy of rotation estimation
Table 5-1 Fragility data for a sample concrete column (component ID: B1041.031a)
Table 5-2 Description of the damage states for the sample component
List of Figures
Figure 3.6 Vision-based damage localization: steel reinforcement exposure localization
Figure 3.8 3D vision-based damage quantification pipeline for structural components
Figure 4.3 The schematic architecture of YOLOv2 built on ResNet-50 for steel bar detection
Figure 4.4 Relationship between Mean IoU and number of dimension priors
Figure 4.6 Geometric properties of the predicted bounding box and the anchor prior
Figure 4.7 Sample images of RC columns that the classification model wrongly identifies as DS
Figure 4.9 System-level collapse identification for training and validation sets using ResNet-50
Figure 4.10 System-level collapse versus no collapse: confusion matrices of (a) training set and (b) validation set
Figure 4.11 Sample testing images of the building with predicted probability for each class
Figure 4.12 Component-level DS classification for training and validation sets using ResNet-50
Figure 4.13 Component-level damage state identification: confusion matrices of (a) training set and (b) validation set
Figure 4.14 True prediction of sample testing images of the building with predicted probability
Figure 4.15 False prediction of sample testing images with ground truth of “Severe Damage”
Figure 4.16 Recall-precision curve of training (upper) and testing (lower)
Figure 4.17 Detection of steel bars highlighted by yellow rectangular bounding boxes: (a) sample testing images, (b) detection of exposed steel bars (lower) in testing images wrongly classified
Figure 4.18 Confusion matrix with consideration of only the classification model (left); the combined classification and detection models (right)
Figure 4.27 Sample testing results of YOLOv3-tiny on the rendered scenes and real-world images
Figure 4.28 Plane segmentation for an I-shaped steel plate damper
Figure 4.31 Out-of-plane displacement measurements for the steel plate damper. The units are in [mm]
Figure 4.33 Vision-based 3D reconstruction procedures for the steel corrugated plate wall
Figure 4.34 One iteration of plane segmentation for the steel corrugated panel. The units are in [mm]
Figure 4.35 Illustration of the reference plane for the steel corrugated plate wall. The units are in [mm]
Figure 4.36 Quantification of out-of-plane displacement distribution for the steel corrugated plate wall
Figure 4.41 Hough transformation of the original image, smoothed image, and sharpened image, using the Canny method, Prewitt method and Log method, respectively
Figure 4.43 Vision-based 3D reconstruction of the structural bolted device
Figure 4.46 Illustration of loosened bolts and tight bolts
Figure 4.49 Precision-recall curve for (a) training, and (b) testing
Figure 4.50 Sample results of YOLOv3-tiny detection of steel bolts
Figure 4.51 Montage of videos processed by the RTDT-bolt method: original video frame with the illustration of the changing light conditions, and a highlight of the bolt under investigation by the rectangular box; close-up video frame, with the illustration of detection, tracking, and re-detection. (Note: the frame index is shown at the top-left corner of each thumbnail image. Frame rate: 30 frames per second)
Figure 4.52 Montage of videos processed by the RTDT-bolt method: close-up video frame, with the illustration of detection, tracking, and re-detection. (Note: the frame index is shown at the top-left corner of each thumbnail image. Frame rate: 30 frames per second)
Figure 4.54 Time-history rotation estimation of the bolt in the short video with the base model
Figure 4.55 Montage of sample close-up frames of the short video processed by the base model in parameter studies (Note: each thumbnail image is labeled with its frame index)
Figure 4.56 Sensitivity study of rotation estimation to the selected parameters, with a control parameter of (a1-a3) number of pyramid levels, (b1-b3) maximum bidirectional error, (c1-c3) search block size, and (d1-d3) maximum number of iterations
Figure 4.58 Plane segmentation of the 3D point cloud of the friction damper (units are in mm)
Figure 5.3 Case study of an RC building with sample results: (a) system-level identification: non-collapse and (b) component-level damage evaluation: severe damage with detection of steel bars
Figure 5.4 Repair cost distribution corresponding to the hypothetical case
List of Abbreviations
AE=Acoustic emissions
AI=Artificial intelligence
Conv=Convolution
DBPPC=Distance-based point-projection-clustering
DL=Deep learning
DM=Damage measure
DS=Damage state
DV=Decision variable
FC=Fully-connected
FP=Feature points
HT=Hough transform
IM=Intensity measure
IT=Infrared thermography
KLT=Kanade-Lucas-Tomasi
PZT=Piezoelectric
RC=Reinforced concrete
RGB=Red-green-blue
ROI=Region of interest
RT=Radiographic testing
RTDT=Real-time detection and tracking
SfM=Structure-from-motion
UT=Ultrasonic testing
Acknowledgements
This endeavor would not have been possible without the support of numerous people. I will
be forever thankful for all the relationships and connections formed during my undergraduate and
graduate studies. Life is a perpetual struggle to maintain a balance between various opposing
forces. I would have never got to this point without the endless support from my beloved family
members, friends, colleagues, and academic advisors during the days of prosperity and adversity,
particularly during the 8 years of my overseas studies.
Words cannot express my gratitude to my Ph.D. supervisor, Prof. Tony Yang, for his
invaluable feedback, encouragement, and scrutiny of my work during this incredible journey. I
have been deeply influenced by his vision, insights, and passion about research and developments
in the field of civil and structural engineering.
I also could not have undertaken this journey without my Ph.D. comprehensive exam
committee and the supervisory committee, who generously provided a great deal of constructive
feedback. Special thanks to Prof. Carlos Ventura and Prof. Hojjat Adeli for giving critical inputs
to help me advance to my Ph.D. candidacy and articulate my dissertation from a diverse
perspective of both academia and industry. I am very grateful to my master’s supervisor, Dr.
Christian Malaga-Chuquitaype at Imperial College London, who continuously interacted with and
supported me in research during this journey. Besides, I would like to express my sincere gratitude
to my undergraduate advisors, Prof. Xinmin Zhan, Prof. Padraic O'Donoghue, Dr. Bryan McCabe,
Prof. Chaosheng Zhang, at the National University of Ireland, Galway, and Prof. Yun Zou at
Jiangnan University for their thoughtful comments and recommendations in pursuit of my career
as a student and structural engineering researcher. Moreover, many thanks to Prof. Yunxin Pan at
the Hong Kong University of Science and Technology, and Prof. Qipei Mei at the University of
Alberta, for giving me important suggestions as an early-career researcher.
Over the past few years of my Ph.D. program, I have had several great opportunities to
conduct various experimental tests including steel honeycomb damper tests, friction damper tests,
steel corrugated panel tests, nonlinear shake table control tests, vision-based shake table tests,
reinforced concrete column tests, timber bridge tests, and masonry prism tests. Therefore, I would
like to extend my sincere thanks to my fellow students, T. Li, T. Qiao, X. Zuo, S. Vaze, Y. Xiao,
Y. Hsu, M. Dou, for the opportunities to work with them on these tests, which have greatly
enriched my hands-on experience and expertise. Besides, the experimental data collected in some
of these tests are crucial to validate the essential algorithms and methods developed in this
dissertation. Furthermore, I had the pleasure of working with my colleagues, S. Tavasoli, Y. Xiao,
M. Azimi, and E. Faraji, with whom I discussed many novel ideas in programming, robotic
control and sensing technologies; this remains active, ongoing work during and beyond my
Ph.D. research.
I offer my enduring gratitude to the civil engineering department, the university, and all the
funding agencies. The research and teaching opportunities granted, and the professional seminars
and workshops organized by them have greatly inspired me to continue my research and teaching
in academia.
I would also like to express my profound appreciation to many of my supportive friends
outside of research and work, H. Wu, Y. Xiao, S. Tavasoli, S. Vaze, P. Kakoty, T. Li, H. Zhang,
S. Zhuo, X. Xie, H. Xu, D. Tung, M. Muazzam, R. Fu, W. Li, J. Li, to name some of them, for
sharing their values, helping me enjoy a fruitful life and build a sense of identity, resilience and
perseverance during this hard journey.
Last but not least, I am deeply indebted to my loving parents and other family members
for their unconditional financial and moral support throughout the years. Thank you so much for
being a source of love, advice and friendship.
Dedication
I dedicate this dissertation to my family: my loving parents, and the ones to come
Chapter 1: Introduction
1.1 Background
Civil engineering structures such as buildings and bridges deteriorate continuously due to
aging and environmental impacts. The Canadian Infrastructure Report notes that 40% of the
infrastructure is rated in poor condition, with estimated repair or replacement costs of $141
billion (The Globe and Mail, 2016). For example, the British Columbia province of Canada is
situated in a high seismic zone, where a significant earthquake can result in $75 billion in financial
losses and thousands of casualties and injuries. Failure to understand the impact of deteriorating
infrastructure can significantly hinder the region’s, as well as Canada’s ability to recover, while
further impacting the social and economic progress. Infrastructure investments in Canada are
estimated to reach $11 trillion by 2067, of which 60-70% will be used to replace the existing
infrastructure (Future Cities Canada, 2018). Currently, there is a lack of efficient and cost-effective
inspection technologies. The traditional inspection practice consists of regular human inspections
on site, which are slow and heavily dependent on the proper
training and experience of inspectors. Data obtained by visual inspection are manually analyzed
and documented by the inspectors or engineers, which is inherently biased and can be inconsistent
from time to time. This may result in false conclusions and generate erroneous evaluation reports.
On the other hand, manual inspection procedures are usually unsafe and inefficient, because the
civil structures are relatively large and can be constructed in a harsh environment. Besides, there
is a shortage of skilled workers in the field of infrastructure inspection in Canada. These issues
result in very high long-term maintenance costs and become even more challenging when a rapid
assessment of infrastructure on a regional scale is demanded by decision-makers right after the
event of natural disasters.
Within this context, there is a compelling need to provide efficient, accurate and economical
automated structural health monitoring (SHM) methods (e.g., Salawu, 1997; An et al., 2019) to
replace traditional manual inspection methods, using novel sensing technologies and data
processing algorithms, which are more reliable, consistent, and less dependent on environmental
conditions. Prior to natural disasters, engineers and researchers can use these SHM methods in
long-term monitoring of civil structures for earlier damage detection, followed by necessary repair
actions to avoid catastrophic failure of the civil structures. During and immediately after natural
disasters, the health condition of civil structures can be rapidly obtained to aid decision-makers in
allocating critical resources (such as fire trucks, ambulances and police) to prevent the risks from
further escalation. SHM methods such as vibration-based SHM and NDTE-based methods
have been well established in recent decades to enhance the efficiency of damage inspections. In
general, vibration-based SHM methods typically rely on contact-type sensors to measure global
structural response to identify damage based on structural properties (e.g., stiffness and
damping), which are reflected in the natural frequencies and mode shapes. On the other hand, NDTE-
based methods are widely applied to damage detection with more focus on the local component
level. While the vibration-based and NDTE-based methods have shown promising results, there
exist several limitations, such as the high sensitivity of sensors to environmental effects,
complicated instrumentation setup, and high cost, to name a few. A more detailed discussion of
these methods and their limitations is provided in Chapter 2.
In recent years, computer vision-based (or vision-based) SHM has been established as an
economical and efficient complement to the other SHM methods for structural damage detection
in various civil engineering applications (Spencer
Jr, Hoskere, & Narazaki, 2019). In comparison to contact-type sensors, vision-based methods use
non-contact detection and have relatively low costs for the sensors and their installation and operation.
In addition, due to the nature of cameras, the damage features in the image are immune to some
environmental effects such as temperature or humidity within the operating range. Despite the
achievements of existing vision-based SHM methods, there exist several limitations such as
insufficient algorithm speed for real-time deployment, algorithm robustness and constraints which
hamper their real-world applications, insufficient level of damage evaluation which does not
provide accurate and comprehensive damage assessment, etc. A detailed discussion about these
limitations is presented in Chapter 2.
1.2 Main goals and contributions
To address these limitations, this dissertation proposes a 3D vision-based
SHM and loss estimation framework to provide a more rapid and comprehensive performance
assessment of civil structures. Within the framework, advanced 2D and 3D vision methods, and
3D point cloud processing techniques have been developed to provide more comprehensive
damage evaluation. In addition, the vision-based damage evaluation pipeline is combined with a
loss estimation scheme to provide additional evaluation metrics. The main goals and contributions
of this dissertation are summarized as follows:
1) Propose a 3D vision-based damage detection and loss estimation framework for civil
structures: The research has proposed a damage detection and loss evaluation framework
which provides a more rapid and comprehensive solution to evaluate structural damages
and the associated loss information, which can be more easily conveyed to owners,
stakeholders, and decision-makers to aid their decision making.
2) Enhance the existing 2D vision-based SHM methods: Within the framework, the research
has proposed enhanced 2D vision methods from several aspects as follows. a) It has
improved the robustness of the algorithms against environmental effects (e.g., background
noise and illumination changes). b) It has made efforts in improving the speed of the local
damage evaluation algorithms towards real-time performance, which provides a firm
foundation for future deployments in rapid real-world applications (e.g., structural
component and damage localization). c) It has eliminated the need for extensive manual
image preprocessing required by many existing methods.
3) Develop 3D vision-based methods for more comprehensive damage evaluation of civil
structures: Within the framework, the research has developed a 3D vision-based
methodology that moves beyond the existing vision-based research that is built on 2D
computer vision methods. The research first expands the scope of most existing 2D
vision-based damage detection methods from damage recognition and localization,
towards more detailed quantification in 3D space. The methodology combines a
deep-learning-based structural component detection method, and newly proposed point
cloud processing methods to recognize, localize and quantify the structural damages. The
effectiveness of the rapid 2D vision-based assessment methods has been examined on the
system level and component level, while the 3D vision-based methods have been validated
on three examples of structural components, where each one of them comes from three
prevalent types including reinforced concrete (RC) structures, steel structures, and
structural bolted components, respectively. Compared to the majority of the existing
methods, the proposed methodology provides more detailed and comprehensive evaluation
outcomes.
4) Bridge the gap between vibration-based SHM and vision-based SHM through vision-based
vibration measurement: The dissertation is strongly focused on the development and
application of computer vision techniques for visual damage evaluation of civil structures.
Further, as ongoing and future research, the dissertation has discussed an economical and
efficient vision-based vibration measurement approach, which enables structural health
evaluation using vibration theories. Besides, the outcomes allow the vibration-based
evaluation results to be integrated into the proposed framework in the near future. Once
achieved, this will provide even more comprehensive assessment outcomes.
1.3 Objectives and methodology
This dissertation first proposes a 3D vision-based structural damage detection and loss
estimation framework for civil structures. The development and implementation of the framework
consist of structural damage data collection and preparation, training and validation of the
methods, implementation of the methods on three common types of civil structures, as well as a
case study to illustrate the loss estimation. The dissertation is first dedicated to the development,
optimization and validation of the new 2D vision methods to enhance the existing 2D vision-based
structural damage detection methods. Further, to achieve a more comprehensive structural damage
evaluation, the research has proposed more advanced 3D computer vision-based methods to detect
and quantify structural damages. The methodology consists of vision-based 3D dense point cloud
reconstruction, point cloud processing, and damage quantification procedures. The 2D vision,
3D vision and deep learning algorithms are developed and implemented in OpenCV, PyTorch,
and MATLAB. The 3D point cloud reconstruction pipeline is implemented in Meshroom,
Metashape and Open3D. The point cloud processing and damage quantification procedures are
implemented in Open3D and MATLAB.
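For illustration, the following is a minimal sketch of the plane-segmentation step that underpins the out-of-plane damage quantification in Chapter 4. It is not the dissertation's exact implementation; the input file name and distance threshold are assumptions.

```python
# A minimal sketch (not the dissertation's exact code) of RANSAC plane
# segmentation with Open3D; the input file and threshold are illustrative.
import numpy as np
import open3d as o3d

pcd = o3d.io.read_point_cloud("damper_scan.ply")  # hypothetical reconstructed cloud

# Fit the dominant plane: returns (a, b, c, d) of the plane ax + by + cz + d = 0
plane_model, inlier_idx = pcd.segment_plane(distance_threshold=1.0,  # cloud units
                                            ransac_n=3,
                                            num_iterations=1000)
a, b, c, d = plane_model
plane = pcd.select_by_index(inlier_idx)              # reference-plane points
rest = pcd.select_by_index(inlier_idx, invert=True)  # remaining points

# Signed point-to-plane distances of the remaining points: a simple proxy
# for out-of-plane displacement (e.g., plate buckling)
pts = np.asarray(rest.points)
dist = (pts @ np.array([a, b, c]) + d) / np.linalg.norm([a, b, c])
print(f"max out-of-plane offset: {np.max(np.abs(dist)):.2f} units")
```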
The development and application of deep learning-based computer vision algorithms for SHM
requires structural damage data to train, validate and test the computer vision and deep learning
algorithms. The structural damage data in this research were collected from available online
datasets (which will be presented in detail in Chapter 4), and experimental tests conducted in the
UBC structural laboratory.
The collected data are utilized to train the deep learning algorithms to perform classification
and object detection tasks on images of civil structures. Once trained, the algorithms are expected
to identify the structural condition at the system level (e.g., collapse or non-collapse), structural
damage states at the component level (e.g., light damage or severe damage), and localization of
critical structural components (e.g., structural columns, structural bolts) and critical damage
features (e.g., concrete spalling, exposure of steel reinforcement in damaged reinforced concrete
components). Meanwhile, this research also examines 2D and 3D vision-based methods in damage
quantification of three common structures. Further, these trained deep learning and computer
vision algorithms are validated through experimental test specimens in the UBC structural
laboratory, and on on-site data collected from web sources.
Once the damage information has been determined, the corresponding repair costs for the
components can be estimated according to the ATC-58 (2007) fragility database and the associated
guidelines. This fragility database is one of the essential products of the ATC-58 project
established by the Applied Technology Council (ATC) in contract with the Federal Emergency
Management Agency (FEMA), under the methodology of Performance-Based Earthquake
Engineering (PBEE). The total financial loss of the building can be determined by adding the total repair
quantities of all damaged structural and nonstructural components taking into account their
suitable unit cost distribution. The loss information can be easily conveyed to decision-makers and
stakeholders who lack engineering knowledge to perform early repair actions, or for risk
management purposes.
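To make the aggregation concrete, the following sketch illustrates the component-level summation described above. The quantities, damage states, and unit repair costs are hypothetical placeholders; real FEMA P-58 consequence functions are distributions, not fixed values.

```python
# A minimal sketch of FEMA P-58 style loss aggregation; all numbers are
# hypothetical. Component ID B1041.031a matches the sample in Table 5-1.
component_inventory = [
    # (fragility component ID, quantity, vision-identified damage state)
    ("B1041.031a", 4, "DS2"),
    ("B1041.031a", 2, "DS1"),
]

# Hypothetical mean unit repair costs (USD); real data are cost distributions
unit_repair_cost = {
    ("B1041.031a", "DS1"): 10_000,
    ("B1041.031a", "DS2"): 25_000,
}

total_loss = sum(qty * unit_repair_cost[(cid, ds)]
                 for cid, qty, ds in component_inventory)
print(f"Estimated total repair cost: ${total_loss:,}")   # -> $120,000
```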
The proposed framework and SHM methods provide guidance to engineers and researchers
in this field. Successful implementation of these methods will reduce the cost, enhance efficiency and consistency during infrastructure
inspection, and shorten the post-disaster recovery process after the event of natural hazards such
as earthquakes. In addition, it can help alleviate the skilled worker shortage issue in Canada.
1.4 Thesis organization
This thesis is organized into six chapters with the following content.
Chapter 1: Introduction presents the background and
motivation, main goals and contribution, objectives and methodology, and the thesis organization.
Chapter 2: Literature review provides a review of the various SHM methods, including
vibration-based, NDTE-based, TLS-based and vision-based SHM
methods for civil structures. The review discusses the advancements and limitations of the existing
SHM methods. More efforts have been made to highlight the limitations of existing vision-based
SHM methods, and demonstrate the need in proposing a more systematic 3D vision-based SHM
and loss estimation framework to provide a more rapid and comprehensive evaluation.
Chapter 3: 3D Vision-based SHM and loss estimation framework for civil structures –
towards more rapid and comprehensive assessment proposes a more comprehensive 3D vision-
based SHM and loss estimation framework. The main objectives of the framework are stated in
this chapter. Additionally, the detailed methodology is presented, including the system-level
and component-level evaluation procedures.
Chapter 4: Development and application of vision-based SHM methods provides a
detailed description of the development and application of 2D and 3D vision-based SHM methods
for RC structures, steel structures and structural bolted components. Data collection and
preparation, development and training of the algorithms, experimental validations, as well as real-
world implementation for the proposed vision-based SHM methods are presented in detail. The
main contributions of the proposed methods to the existing methods are highlighted, while the
limitations are also discussed.
Chapter 5: Combined vision-based SHM and loss estimation framework presents the
general background of the FEMA P-58 performance-based loss estimation methodology, the
description of the fragility database, and a case study to illustrate the implementation of the
combined framework.
Chapter 6: Conclusions provides a summary of this dissertation. The key contributions and
limitations are presented. Discussions for the ongoing and future work are provided.
Chapter 2: Literature review
2.1 Overview
Civil infrastructure deteriorates continuously due to aging problems, human activities, and
natural hazards. Accumulation or growth of these damages may result in catastrophic failure if no
remedial action is applied promptly. SHM methods are therefore needed to assess structural
integrity, provide early warnings, and prevent catastrophic events. Traditional infrastructure
maintenance consists of regular human inspections, which are laborious, and heavily dependent
on the experience and expertise of inspectors. While this procedure is usually unsafe and
inefficient, there has been a compelling need for more advanced and automated
inspection approaches.
To address these issues, in the past decades, a wide variety of SHM methods such as vibration-
based and non-destructive testing and evaluation (NDTE)-based methods have been successfully
developed and applied to evaluate structural damages of many types of infrastructure at both the
system level and component level. The vibration-based methods are generally intended to detect
damage patterns by analyzing structural vibration response data (Wu & Jahanshahi, 2018). These
SHM methods have been proven to make the monitoring process more feasible, efficient and
reliable. The NDTE methods rely on special tools to estimate material properties, or to indicate
the presence of other defects or discontinuities. They have been shown to provide effective and
more detailed results in localized applications (Kumar & Mahto, 2013). In addition, research using
terrestrial laser scanners (TLS) has been attempted for the evaluation of different damage types,
such as concrete spalling quantification. The results indicate that the high-end TLS can reach a
millimeter-level accuracy.
On the other hand, in recent years, computer vision-based (or vision-based) SHM methods
have been developed as complementary approaches to the other SHM methods. These
methods have shown very promising results in detecting damages that are visually observable by
cameras or human beings. This indicates their great potential to significantly improve damage
inspection practices. Besides, vision-based methods offer
a more economical and feasible way to perform damage localization and quantification in some
application scenarios.
The chapter is organized as follows. It starts with a brief review of common SHM methods,
including vibration-based SHM methods, general non-destructive testing and evaluation (NDTE)-
based SHM methods, and terrestrial laser scanning (TLS)-based SHM methods. Further, the
subsequent sections of the chapter are focused on a review of vision-based SHM methods. This
consists of advancements in computer vision and deep learning in recent years, followed by the
developments and applications of vision-based methods for structural damage
detection. The review highlights the advancements and limitations of these existing vision-based
SHM methods. Further, the chapter is concluded with the need for a more advanced
framework to perform vision-based structural damage detection and loss estimation, leveraging
both 2D and 3D computer vision techniques to provide more rapid and
comprehensive evaluation outcomes for engineers and decision-makers, thus benefiting the
broader community.
2.2 Vibration-based SHM methods
Vibration-based structural damage detection (SDD) methods are generally applied to civil structures at the global level
where the vibration response of the structures is utilized to understand the global state of the
structures (Peeters, Maeck, & De Roeck, 2001; Li, Deng, & Dai, 2007; Bayissa, Haritos, &
Thelandersson, 2008; Brincker, & Ventura, 2015; Amezquita-Sanchez, & Adeli, 2016). The idea
of vibration-based methods originated from the assessment of train wheels by hammer tapping the
wheels and analyzing the resulting sound (Turner & Pretlove, 1991). With the advancement of
sensing techniques and computational hardware, vibration-based SDD theories have received
enormous developments and applications in both model-driven methods (e.g., Kopsaftopoulos &
Fassois, 2013; Chang, & Kim, 2016; Vamvoudakis-Stefanou, Sakellariou, & Fassois, 2018), and
data-driven methods (e.g., Yuen, & Lam, 2006; Betti, Facchini, & Biagini, 2015; Hakim, Razak,
& Ravanfar, 2015; Ghiasi, Torkzadeh, & Noori, 2016; Abdeljaber, & Avci, 2016; Gulgec, Takáč,
& Pakzad, 2020; Ni, Zhang, & Noori, 2020; Jiang et al., 2021; Sajedi, & Liang, 2021; Eltouny, &
Liang, 2021). The vibration response is typically measured and recorded by a sensor network
deployed at predefined locations. The response will be processed in time, frequency, or modal
domains by advanced software algorithms to be translated to damage indicators which will be used
to recognize the existence of damages, and their locations and severity if damages are identified
(Figure 2.1).
Figure 2.1 Vibration-based structural health monitoring setup
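As a simple illustration of the frequency-domain processing mentioned above (a sketch with a synthetic signal, not an algorithm from the cited studies), a natural frequency can be estimated from a measured acceleration record by FFT peak picking; a shift in this frequency between inspections is one common damage indicator:

```python
# A minimal sketch: estimate a dominant natural frequency from an
# acceleration record by FFT peak picking. The signal here is synthetic.
import numpy as np

fs = 100.0                                    # sampling rate (Hz), assumed
t = np.arange(0.0, 60.0, 1.0 / fs)
# Decaying 2.2 Hz mode plus noise, standing in for recorded sensor data
acc = np.sin(2 * np.pi * 2.2 * t) * np.exp(-0.05 * t) + 0.1 * np.random.randn(t.size)

spec = np.abs(np.fft.rfft(acc))
freq = np.fft.rfftfreq(acc.size, d=1.0 / fs)
f_n = freq[np.argmax(spec[1:]) + 1]           # skip the DC bin
print(f"identified natural frequency: {f_n:.2f} Hz")
# A drop in f_n relative to the undamaged baseline suggests stiffness loss.
```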
2.3 NDTE-based SHM methods
NDTE-based methods are widely applied to damage detection with more focus on the local
component level. In general, dedicated sensing techniques are required such as ultrasonic testing
(UT), acoustic emissions (AE), infrared thermography (IT), radiographic testing (RT), magnetic
particle testing (MT), laser testing methods (LM), ground penetrating radar (GPR), and
piezoelectric (PZT) sensing (Ph Papaelias, Roberts, & Davis, 2008; Dwivedi, Vishwakarma, &
Soni, 2018). For example, electro-mechanical impedance-based SDD approaches are widely used
for component-level damage detection of small structural members. Piezoelectric units are
installed on the structural members to be monitored which will act as actuators and sensors
simultaneously. The units are excited at a relatively high frequency. The impedance signals across
the piezoelectric units are recorded and analyzed to assess the damage condition of the adjacent
area of the units. On the other hand, radiographic testing (RT) utilizes the x-ray that is able to
penetrate specimens and generates a radiograph showing any changes in thickness, or defects. If
an object has internal voids, more x-rays will pass in the void area and the part beneath that area
will have more exposure than that under the non-void area. This method can be considered to
detect internal defects that are not visible on the surface.
2.4 TLS-based SHM methods
Terrestrial laser scanning (TLS) creates a 3D
representation of structures by projecting laser beams onto the surfaces of the structures (Van
Genechten, 2008). The word “laser” is an abbreviation which stands for Light Amplification by
Stimulated Emission of Radiation. A laser typically emits light in a narrow, low-divergence beam
with a well-defined wavelength, and has a large propagation distance. It propagates mostly in a
well-defined direction, at a constant speed in a certain medium. Due to these properties, laser is
well suited for measurement purposes for civil structures. The measurement methods using laser
light can be classified as triangulation-based methods and time-based methods. In the former type
of methods, a laser emitter and a camera are placed at a certain distance away from each other, at
a constant angle facing towards the object to be scanned. The laser emitter is used to project laser
beams onto the surface of an object to create a pattern (e.g., a set of points), which will be captured
by a camera which is placed at a certain distance from the laser emitter. Using the triangulation
principle based on the matching point pairs, the 3D shape of the structure can be constructed. The
triangulation scanners generally have a relatively short range (typically less than 10 meters) but
can reach very high accuracy. The latter type of methods using laser light, the
time-based methods, consists of two scanning principles: pulse-based (time-of-flight) and
phase-based. A time-of-flight scanner has a transmitter which sends out lights, a receiver which
receives the lights reflecting from the surface of an object to be scanned, and a clocking device
which measures the light travel time. The time-of-flight scanners employ the simple fact that light
travels at a constant velocity in a certain medium. Once the time delay from the light source to the
scanning surface, and back to the source is measured, the distance between the light source and the
scanning surface can be computed. The surface can be constructed accordingly. The measurement
accuracy of time-of-flight scanners is highly dependent on the clocking mechanism, desired time
resolution, the counting rate, etc. The accuracy is also affected by signal strength, noise, time jitter,
and sensitivity of the threshold detector, etc. Phase-based scanners do not rely on high-precision
clocking mechanisms. The scanner modulates a light before projecting it onto an object's surface.
The reflected light wave from the surface is collected by a receiver. The phase difference between
the original wave and the reflected wave can be calculated, which can be used to determine the
distance between the light source to the object's surface. In general, compared to time-of-flight
scanners, phase-based scanners have higher speeds and resolution, but less precision. The accuracy
of the phase-based scanners is affected by signal strength, noise, stability of the modulation
oscillator, etc.
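For reference, the two time-based principles reduce to simple distance relations (a sketch in generic notation, not a particular scanner's specification): with c the speed of light in the medium, Δt the measured round-trip time, f_m the modulation frequency, and Δφ the measured phase difference,

```latex
d_{\text{ToF}} = \frac{c\,\Delta t}{2}, \qquad
d_{\text{phase}} = \frac{c}{4\pi f_{m}}\,\Delta\varphi
```

The phase-based distance is only resolved up to the ambiguity interval c/(2 f_m), which is one reason phase-based scanners trade unambiguous range for speed and resolution.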
In recent years, research using TLS devices has been conducted to identify structural damages
in 3D space. The scanning results are point cloud coordinates. For example, Mizoguchi
et al. (2013) quantified the scaling damage of a concrete bridge pier using a TLS. Kim et al. (2015)
employed a TLS to localize and quantify concrete spalling. Kim et al. (2021) proposed a damage
quantification framework for concrete bridge piers with more complicated shapes by processing
the point cloud obtained using a TLS. The accuracy of the TLS as reported in many of these studies
generally ranges from 5 to 15 mm.
2.5 Vision-based SHM methods
2.5.1 Overview
On the other hand, vision-based methods have been proposed as complementary damage
detection approaches to the aforementioned approaches. Vision-based methods have been proven
to be very effective in detecting structural surface damages that can be captured by modern
cameras. A typical vision-based SHM setup consists of the following elements:
• Cameras, which can be mounted on unmanned aerial vehicles (UAVs) and unmanned
ground vehicles (UGVs) to facilitate more efficient data collection.
• Vision algorithms, which can be developed and deployed with ease on a wide range of
open-source platforms, thanks to the open-source nature of the computer vision
community.
• A computational unit typically equipped with a dedicated GPU for efficient parallel
computing. The cost of such hardware has benefited significantly from the rapid
developments of the graphics, gaming, computer and mobile device industry, leading to a
fairly low cost when processing image data of structural damages. For example, many
newly developed lightweight vision algorithms can even run on smartphones.
From the algorithm’s perspective, vision-based SHM methods can be classified into non-deep-
learning (DL)-based methods and DL-based methods. These methods will be discussed as follows.
2.5.2 Non-DL-based vision methods for SHM
Prior to the year of 2012, most computer vision methods developed for SDD were based on
traditional image processing techniques (IPT) such as color thresholding, image filtering, template
matching, histogram transform, and texture pattern recognition (Jahanshahi et al., 2009; Koch et al., 2014;
Koch et al., 2015). For example, Sinha et al. (2003) investigated different types of filtering
techniques to detect pipeline defects such as cracks and holes. It was concluded that the background lighting condition
and the complicated texture on the pipe surface impose a big challenge for the damage detection
process. Choi and Kim (2005) used color, texture, and shape to characterize corrosion images and
classify the images into six categories (non-corroded specimen, crevice corrosion, intergranular,
pitting, fretting, and uniform corrosion). Although the accuracy was reported at ~85%, the
algorithms were validated on a relatively small dataset. German, Brilakis, & DesRoches (2012)
proposed an entropy-based thresholding method to localize spalled regions on RC columns.
Further, the authors employed a connected image pixel labelling algorithm, a global adaptive
thresholding algorithm, and a template matching algorithm, to estimate the concrete spalling length
and depth. The spalling localization accuracy was reported at ~80%, while the average errors of the
estimated length and depth were also reported.
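As a concrete, deliberately simple illustration of such IPT pipelines, the sketch below chains filtering, Canny edge detection, and a probabilistic Hough transform to extract line features. The image path and thresholds are assumptions, and this is not the cited authors' code.

```python
# A minimal IPT sketch: smoothing, Canny edges, and a probabilistic Hough
# transform to pick out line-like features (e.g., crack edges).
import cv2

img = cv2.imread("column.jpg", cv2.IMREAD_GRAYSCALE)   # hypothetical image
blur = cv2.GaussianBlur(img, (5, 5), 0)                # suppress texture noise
edges = cv2.Canny(blur, 50, 150)                       # binary edge map

lines = cv2.HoughLinesP(edges, rho=1, theta=3.1416 / 180.0,
                        threshold=80, minLineLength=30, maxLineGap=5)
print(0 if lines is None else len(lines), "line segments found")
# The hand-tuned thresholds above are exactly why such methods are sensitive
# to lighting, image quality, and background noise (see Section 2.6.3).
```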
2.5.3 DL-based vision methods for SHM
The transition from non-DL-based methods to DL-based methods occurred when the
breakthrough was achieved by AlexNet (Krizhevsky, Sutskever, and Hinton, 2012), which is built
upon convolutional neural networks (CNNs, a type of artificial neural networks) and has shown
substantial accuracy increase and robustness in image classification, object detection, and semantic
segmentation.
2.5.3.1 Advancements in deep learning
The development of artificial neural networks can be generally divided into 3 phases. The first
phase can be dated back to the 1940s–1960s, when the theories of biological learning (McCulloch
and Pitts, 1943) and the first artificial neural network such as Perceptron (Rosenblatt, 1958) were
implemented. The second period happened during the 1980–1995 period, when the back-
propagation technique (Rumelhart, Hinton, & Williams, 1986) was developed to train a neural
network with one or two hidden layers. During the 1990s, the artificial neural network evolved
into deep neural networks (DNNs), where multiple layers can be trained through the back-
propagation algorithm. One such application was the work done by LeCun, Bottou, Bengio, &
Haffner (1998) for document recognition. The third phase of neural networks (also named deep
learning) began with the breakthrough in 2006 when Hinton, Osindero, and Teh (2006)
demonstrated that a so-called deep belief network could be efficiently trained using a greedy layer-
wise pretraining strategy. With the fast growth and optimization of deep learning algorithms,
the increasing size of training data, as well as enhanced computational power, Convolutional
Neural Network (CNN, or ConvNet), which is a class of DNNs, or deep learning (DL), has been
advancing rapidly. Unlike traditional neural networks that utilize multiple fully-connected (FC)
layers, the hidden layers of a CNN typically include a series of convolutional layers that convolve
the input with learnable filters. In recent years, CNNs have
dominated the fields of computer vision, speech recognition, and natural language processing.
Within the field of computer vision, CNN such as the AlexNet, developed by Krizhevsky,
Sutskever, and Hinton (2012), has shown a substantial increase in accuracy and efficiency over
any other previous algorithms. With the success of AlexNet, CNNs have been successfully applied
in computer vision for classification, object detection, semantic segmentation, and visual object
tracking. In addition to AlexNet, other deeper CNN networks such as VGG Net (Simonyan &
Zisserman, 2014), Google Net (Szegedy et al., 2015), Deep Residual Net (He, Zhang, Ren, & Sun,
2016), DenseNet (Huang, Liu, Van Der Maaten, & Weinberger, 2017) and MobileNet (Sandler et
al., 2018) have been developed and widely adopted.
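To make the architectural contrast concrete, the following is a minimal PyTorch sketch of a CNN with convolutional feature layers followed by a fully-connected head; the layer sizes are illustrative and are not those of the networks cited above.

```python
# A minimal sketch of a CNN: convolutional layers (learnable filters)
# followed by a fully-connected (FC) classification head.
import torch
import torch.nn as nn

class TinyCNN(nn.Module):
    def __init__(self, num_classes: int = 2):   # e.g., collapse / non-collapse
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),   # learnable filters
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),                      # global pooling
        )
        self.classifier = nn.Linear(32, num_classes)      # FC head

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x).flatten(1))

logits = TinyCNN()(torch.randn(1, 3, 224, 224))   # one RGB image
print(logits.shape)                               # torch.Size([1, 2])
```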
2.5.3.2 Developments and applications of DL-based vision methods for SHM of civil
structures
Ever since the great achievement by AlexNet, CNN-based vision methods have been widely
developed and applied in damage identification and localization of various types of infrastructure.
CNN-based vision methods have been proven effective for structural damage classification. These
include metal surface defects detection (Soukup & Huber-Mörk, 2014), post-disaster collapse
classification (Yeum, Dyke, Ramirez, & Benes, 2016), joint damage detection through a one-
dimensional CNN (Abdeljaber, Avci, Kiranyaz, Gabbouj, & Inman, 2017), concrete crack
detection using a sliding window technique (Cha, Choi, & Büyüköztürk, 2017), pavement crack
detection (Zhang et al., 2017; Vetrivel et al., 2018), damage detection of masonry structures (Wang
et al., 2018; Wang et al., 2019), structural damage classification with the proposal of Structural
ImageNet (Gao & Mosalam, 2018), regional post-disaster damage assessment based on time-
frequency distribution plots of ground motions (Lu et al., 2021), and post-hurricane preliminary
damage assessment of buildings using aerial imagery (Cheng, Behzadan, & Noshadravan, 2021).
In addition to classification, CNNs can be used in the field of object detection which involves
classification and localization of an object (e.g., damage area). Prior to the use of CNNs, object
detection was dominated by the use of histogram of oriented gradients (HOG) (Dalal, & Triggs,
2005) and scale-invariant feature transform (SIFT) (Lowe, 2004). In 2014, Girshick, Donahue,
Darrell and Malik (2014) proposed the region-based CNNs (R-CNNs), which utilizes the Region
Proposal Function (RPF) in order to localize and segment objects. It significantly improved the
global performance compared to the previous best result on PASCAL Visual Object Classes
(VOC) challenge 2012. The PASCAL VOC challenge ran each year from 2005 to 2012,
which provides a benchmark in visual object category recognition and detection with a standard
dataset of images and annotation, and standard evaluation procedures. Discussion of object
detection proposal methods can be found in Hosang, Benenson, and Schiele (2014), Hosang,
Benenson, Dollár, and Schiele (2015). Further, the Fast R-CNN (Girshick et al., 2015) and the
Faster R-CNN (Ren, He, Girshick, & Sun, 2017) were developed to improve the speed and
accuracy of the R-CNN. Region-based CNN methods (e.g. RCNN, Fast-RCNN, and Faster-
RCNN) have been successfully implemented in civil engineering applications. In recent years, the
development and application of vision-based detectors have been well demonstrated for visual
damage localization of different types of structural systems (Spencer Jr, Hoskere, & Narazaki,
2019), such as reinforced concrete (RC) structures (e.g., Li et al., 2018; Liang et al., 2019; Zhang,
Chang & Jamshidi, 2020; Zhang & Yuen, 2021; Deng, Lu, & Lee, 2020; Liu et al., 2020; Chun,
Izumi, & Yamane, 2021; Miao et al., 2021; Maeda et al., 2021; Jiang et al., 2021), masonry
structures (e.g., Wang et al., 2018; Wang et al., 2019), steel structures (e.g., Yeum & Dyke, 2015;
Yun et al., 2017; Kong & Li, 2018; Xu, Gui, & Han, 2020; Luo et al., 2021; Pan & Yang, 2022),
roads and pavements (e.g., Tong et al., 2020), tunnels (e.g., Xue & Li, 2018), or multiple damage
detection of different types of structures (Cha, Choi, Suh, Mahmoudkhani & Büyüköztürk, 2018).
2.6 Limitations of existing SHM methods
2.6.1 Limitations of vibration-based and NDTE-based methods
While the vibration-based and NDTE-based methods offer great advantages in their specific
application domains, there exist several limitations as follows. a) Vibration-based
methods require careful design of sensor placements, where numerical modelling of the target
structure is typically required to identify suitable sensor locations. Although this can be practically
achieved, there exist uncertainties in modelling assumptions made. Besides, the process requires
many efforts and becomes more difficult for structures with irregular shapes. b) Although NDTE-
based methods provide relatively detailed results at the local component level, generalization of
these detailed evaluation methods at the global infrastructure level becomes practically very
challenging and expensive. c) Both vibration-based and NDTE-based methods require relatively
complicated setup protocols and dedicated data processing software. This demands more
specialized training of the personnel and increases the overall cost.
2.6.2 Limitations of TLS-based methods
The accuracy of the TLS as reported in many studies generally ranges from 5 to 15mm, and
is suitable for applications where an error of 5-15mm in quantifying local damage areas does not
greatly affect the global health inspection of the entire structures. Although the results are
promising in those studies, the TLS devices may not be accurate enough to quantify structural
damages at a relatively small magnitude such as crack width of less than 3 mm, or the out-of-plane
displacements of steel plate structures due to buckling less than 10 mm. Besides, these TLS are
generally expensive and may not be readily available to many researchers and engineers. Although
the low-end and mid-tier laser scanners cost less, they typically have much lower detection range
and accuracy. Moreover, these TLS devices can only be operated by qualified and well-trained
SHM methods. In this section, the main limitations of existing vision-based SHM methods are
• Most non-DL-based vision methods require careful preprocessing of the images such that
the background information is sufficiently removed to ensure the algorithm is applied to a
relatively small region of interest. More importantly, all the non-DL methods are not robust
against background noises. The evaluation outcomes are sensitive to the image noise,
image quality and lighting conditions. Hence, the real-world applicability of these methods
is limited.
• The current trend of computer vision developments and applications is strongly focused on
DL-based methods due to their high accuracy and robustness in real-world applications.
However, the speed of the CNN-based vision algorithms (i.e., DL-based methods) achieved
in many damage detection applications is not sufficiently high, which may hamper their
deployment as rapid or real-time evaluation tools.
• A vast majority of existing vision methods (i.e., both non-DL-based and DL-based
methods) were developed in the area of 2D computer vision, which assumes the camera to
be placed at appropriate locations and poses to obtain informative photos or videos. The
evaluation outcomes are relatively sensitive to the locations and poses where the photos or
videos are taken, which limits the comprehensiveness of the evaluation of civil
structures. For example, considering a structural component is damaged on one side while
undamaged on the other side, if only the photos of the undamaged side are collected, there
will be no damage reported for this structural component, leading to a false evaluation.
• Most of these vision methods were developed for qualitative structural state identification,
rather than quantitative damage evaluation.
• Some 2D vision methods show promising results in quantifying in-plane damage extent
(e.g., concrete crack width, concrete spalling area, bolt rotation). However, they are
incapable of quantifying out-of-plane damage features such as concrete spalling volume or
spalling depth, and out-of-plane structural deformations. This is due to the limitation that
depth information is lost when the 3D scene is projected onto the 2D image plane.
• Lastly, the vision-based methods provide structural damages as output, such as concrete
cracks, concrete spalling and steel corrosion, of specific structural components. Damage
information of this kind can help engineers infer the
structural residual stiffness and capacity. However, local damage information of specific
structural components may not be useful enough, and can be difficult to understand for
owners or decision-makers who likely lack engineering knowledge, but instead pay more
attention to the global condition of the structures, cost to repair the damaged structures or
the associated downtime. In such
situations, it is necessary to convert such damage information to other metrics (e.g., repair
cost and repair time).
To address these limitations and challenges, this dissertation proposes a more comprehensive
3D vision-based SHM and loss estimation framework, in which 3D vision methods are
proposed to identify, localize, and quantify structural damages in 3D space. At the time of writing,
there exist almost no attempts at 3D vision methods in structural damage evaluation, and their
developments and applications are currently in the infancy stage (Bao, & Li, 2021). Within the
proposed framework, both enhanced 2D vision methods and new 3D vision methods have been
investigated. The 2D vision research is intended to optimize the speed of the existing 2D vision
algorithms for real-time local applications, while still maintaining high accuracy compared to the
existing methods. The 3D vision research is aimed at providing more detailed damage
quantification of structural components. The last part of the research is intended to combine the
damage evaluation pipeline with the performance quantification procedures to provide additional
evaluation metrics such as repair cost.
Chapter 3: 3D vision-based SHM and loss estimation framework for civil structures – towards more rapid and comprehensive assessment
3.1 Overview
In this chapter, a 3D vision-based SHM and loss estimation framework is proposed, which
aims to provide a more rapid and comprehensive performance assessment of civil structures.
Within the framework, the vision-based damage detection pipeline consists of system-level failure
identification and component-level damage evaluation of civil
structures. Further, the damage detection pipeline is combined with loss quantification procedures
to provide additional metrics to aid decision-making. The methodology presented in this chapter
will be implemented on three prevalent types of civil structures, as will be detailed in Chapter 4.
3.2 Introduction
A rapid vision-based SHM and loss estimation framework is proposed in this study to identify
damages and quantify the associated loss of civil structures (Figure 3.1). First, system-level and
component-level images for the structures are collected, which can be achieved by manual site
inspection, unmanned aerial vehicles (UAVs), or preinstalled cameras. The system-level images
are assessed by system-level classification CNNs to confirm if the structure has collapsed. If the
system-level collapse is identified, the replacement loss of the structure should be estimated
considering various factors such as the current market value of the structure, cost of demolition,
and potential cost inflation due to disruptions in the entire local supply chain in a natural disaster.
If the building is identified as non-collapse, the component-level images are input into component-
level classification and localization CNNs. Once the component damage states are identified, the
corresponding repair costs for the components can be identified and the total loss of the structure
can be computed accordingly through the PBEE methodology and FEMA P-58 fragility database.
The dissertation is focused on the development and application of CNN-based vision methods to
identify structural damage states at the system level and component level. The loss estimation is
achieved by adopting existing well-established literature from FEMA P-58 Seismic Performance
Assessment of Buildings, Methodology, and Implementation. The total loss is a general decision variable, which can be defined as repair cost, repair time, etc. Within this context, the
framework proposed in this dissertation makes the first attempt to integrate the proposed CNN-
based computer vision methods with the PBEE methodology to facilitate loss evaluation of civil
structures.
3.3 Image data collection

This section briefly discusses possible image data collection methods. The methods of data
collection can be determined based on the application scenarios of structural damage detection,
which is typically classified into long-term continuous structural health monitoring, scheduled
periodic site inspections, and post-disaster site inspections, etc. In the long-term monitoring
scenario, preinstalled sensing devices such as RGB or thermal cameras, and other structural
vibration measurement sensors can be considered. In the periodic and post-disaster site inspection scenarios, portable sensing devices such as handheld cameras, smartphone cameras, and camera-equipped UAVs and unmanned ground vehicles (UGVs) can be considered. In these two scenarios, although manual inspection is a common practice at the time of writing, the deployment of UAV fleets and UGVs on site is very likely to
be a future trend for autonomous, efficient and safe data collection, especially for applications at
large scale (Ham et al., 2016). In the post-disaster scenario, rapid data collection and analysis right
after the disaster is particularly important to provide valuable inputs to decision-makers to make
informed risk management decisions. The current manual inspection practices are time-consuming, inherently subjective, highly dependent on the proper training of the inspectors, and pose a threat to the life safety of the inspectors. Therefore, it is of prime interest to researchers to develop
more intelligent and efficient autonomous data collection methods for next-generation vision-
based structural damage detection. It should be noted that this dissertation is not focused on the
development of autonomous data collection methods. However, for completeness, a more detailed
discussion about the use of advanced robotic technologies (e.g., UAVs and UGVs) for image data collection is also included in this dissertation.

During data collection, to facilitate a more accurate and comprehensive structural damage
assessment, images or videos of the structural systems and components should be taken from
multiple views at multiple locations. This is because first, civil structures generally have a large
scale where structural components are constructed at different spatial locations, and second, some
portions of structural components may be damaged while the remaining portions may stay intact.
3.4 System-level collapse identification

Civil structures deteriorate continuously. If the structures are not properly maintained, the accumulation of local damage may eventually cause a global failure, i.e., structural collapse. On the other hand, in extreme natural events such as major earthquakes, structures can experience natural forces exceeding their designed resisting capacity. In this case, structural collapse may occur within a very short period without any prewarning. Rapid identification of structural collapse is therefore of vital importance to engineers and rescue workers. On one hand, structural collapse greatly contributes to the total loss of a structure; a collapsed structure in particular needs immediate attention during the post-disaster recovery process, and repair or reconstruction actions need to be applied promptly. On the other hand, chances are survivors may be hidden or stuck under the debris due to structural collapse. Rapid identification of structural collapse in a large region can greatly help deploy rescue workers efficiently to the key locations.
Figure 3.2 Building collapse in Wenchuan earthquake, 2008
Therefore, the framework incorporates the identification of structural collapse in the first
place. As structural collapse manifests in a wide variety of ways, traditional computer vision methods built upon hand-crafted features do not work well. This research adopted an advanced
CNN architecture to build the classification vision methods for collapse recognition. The structure
is classified by CNN-based vision algorithms into collapse or non-collapse (Figure 3.3). If the
structure is identified as collapse, the total loss of the structure is taken as the output. In this case,
no further evaluation of structural components is required. If the structure does not experience
collapse, the damages to critical structural components (e.g., columns) will be evaluated.
Figure 3.3 Vision-based collapse identification
It should be noted that the above classification procedures by the CNN model only conduct a
rapid assessment at the system level to identify whether collapse has occurred visually. In some
cases, a building may not collapse after an earthquake, but can have unacceptable residual drift
(e.g., exceeding the code drift limit, or not expected to achieve satisfactory performance even after
repair). The owners may decide to demolish the building. Therefore, the building is still considered
the same as “collapse” and the replacement cost should be evaluated. The residual drift may be
obtained using distance measurement tools during the post-disaster inspection, or using sensors
preinstalled on each floor of the building. In brief, while the system-level CNN model can provide
a rapid assessment, the CNN model and the drift measurement should both be considered for more reliable decision-making.

3.5 Component-level damage state recognition

If the structures are identified as non-collapse, component-level damage state recognition will be performed. The damage states of structural components can be defined following established performance evaluation codes. One of the noticeable structural performance evaluation guidelines, FEMA P-58
(ATC-58, 2007), can be considered. The guideline provides the ATC-58 database where damage
states of structural and non-structural components are defined. For example, the definition of
damage states for reinforced concrete beam-column joints is shown in Table 3-1. Figure 3.4
presents an example of damage state recognition results for structural columns. If the structural
components are identified as having no damage, there will be zero associated loss for these components. Otherwise, further damage localization and quantification will be performed, as shown in the subsequent sections. Such evaluation procedures have been practically accepted and implemented manually for many years, as demonstrated by Nakano, Maeda, Kuramoto, & Murakami (2004) and Maeda, Matsukawa, & Ito (2014).
Table 3-1 Damage state definitions for reinforced concrete components (adapted from ATC-58)

DS index   Description
0          No damage
1          Light damage: visible narrow cracks and/or very limited spalling of concrete
2          Moderate damage: cracks, large area of spalling concrete cover without exposure of steel bars
3          Severe damage: crushing of core concrete, and/or exposed reinforcement buckling or fracture
However, a single damage classification network may not perform well when multiple complex damage patterns are present on the structural components at the same time. In this case, a second network dedicated to identifying local damage features can be additionally implemented
to enhance the accuracy of damage state recognition. Details of an example of such cases will be
demonstrated in Section 4.2.

3.6 Component-level damage localization

Identifying the locations of damage within the structural components is also important to understand the residual performance of the structures. Figure 3.5, Figure 3.6, and Figure 3.7 present examples of damage localization for reinforced concrete columns, including concrete cracks, steel reinforcement exposure, and concrete spalling,
respectively.
Figure 3.6 Vision-based damage localization: steel reinforcement exposure localization
3.7 3D vision-based damage quantification

3.7.1 Overview
While the damage state classification and damage feature localization methods provide the
preliminary information on damage severity and location, they do not offer detailed damage
quantifications. Quantification of damages, such as the concrete spalling volume or spalling depth
of concrete structures, provides important metrics in damage inspection scenarios (Beckman,
Polyzois, & Cha, 2019; Kim, Yoon, Hong, & Sim, 2021). In such cases, as described in Section
2.5, existing 2D vision-based methods cannot determine such metrics. Most of the existing vision methods were developed for qualitative structural state identification (e.g., building collapse) and are not capable of quantifying damage in 3D space in many situations. Hence, 3D computer vision methods are considered in this research.
A 3D vision-based damage quantification pipeline is developed, which consists of vision-based 3D reconstruction to generate dense 3D point cloud data (i.e., RGB
+ depth) from 2D images, multi-view CNN-based 3D object detection, and point cloud processing
for damage quantification, as shown in Figure 3.8. Image sources for 3D reconstruction can be
obtained using common consumer-grade cameras such as smartphone cameras, unmanned aerial
vehicles (UAVs) or unmanned ground vehicles (UGVs) with appropriate camera specs. These
procedures are described in the following sections. More details of the damage quantification
procedures with respect to specific structural components investigated in this research are
presented in Chapter 4.
It should be noted that in many situations, 3D reconstruction of the entire structural system
can be very computationally expensive and is generally not needed. In this dissertation, 2D vision-
based methods are first used to identify system-level structural collapse. If collapse is identified, no further component-level evaluation is required. Otherwise, 2D vision-based methods are first applied to recognize and localize the damaged areas to provide qualitative damage
measures. Further, 3D reconstruction is only applied to the damaged structural components
identified, or the damaged area localized. In some situations, depending on the application
scenarios and the requirements of the damage evaluation outcomes, 3D reconstruction may not be necessary at all. For example, if a concrete column is identified as the most severe damage state, 3D reconstruction is not needed because the column should be replaced, rather than repaired. On the other hand, if a concrete column has light cracks and concrete spalling, quantification of crack width and spalling volume will be useful to assess its residual performance, guide the repair action, and estimate the associated repair cost.

3.7.2 Vision-based 3D reconstruction

In this section, the 3D reconstruction procedures are described. As shown in Figure 3.9, in general, the pipeline consists of data association which establishes correspondences between overlapping images,
camera pose estimation using the geometrically validated image pairs, structure-from-motion
which determines sparse point cloud from the image scene graph, multi-view stereo which
generates dense point cloud based on the sparse point cloud and the input RGB images, and dense
point cloud preprocessing which cleans the reconstructed dense cloud using outlier removal
methods.
Figure 3.9 Vision-based 3D reconstruction procedures of an RC column
3.7.2.1 Data association
First, a sequence of image frames is collected to form an unstructured image database. It
should be noted that the resolution of the images should be maintained reasonably high (e.g.,
1080p, 4K, or higher) for better reconstruction accuracy. Second, the feature points extraction
algorithm, scale-invariant feature transform (SIFT) proposed by Lowe (2004), is applied to all the
images to detect feature points. Next, the image correspondence is established between image pairs
that share similar features. This can be achieved using a naïve search that finds the most similar features between one image and another, iterating through the entire database. However, this is
very computationally expensive. In recent years, many efficient algorithms have been proposed to
find the image correspondence as summarized by Schonberger & Frahm (2016). In this research,
the vocabulary tree (VocTree) algorithm (Nister & Stewenius, 2006), is selected for this purpose.
Finally, the output of the above steps is a scene graph, which contains many image pairs within
the database.
3.7.2.2 Structure-from-motion
Once the image correspondence is completed, feature matching is conducted to find the closest
matched features in the corresponding image pairs using the KD-tree-based approach (Muja &
Lowe, 2009). The two-view camera geometry (i.e., the fundamental matrix for uncalibrated cameras, or the essential matrix for calibrated cameras) for each image pair can be estimated using epipolar geometry (Hartley & Zisserman, 2003). The corresponding point pairs satisfy Equation (3.1):
$$\mathbf{x}_i^{T}\,\mathbf{F}\,\mathbf{x}_j = 0, \quad \text{where } \mathbf{x}_i \in P_i \text{ and } \mathbf{x}_j \in P_j \tag{3.1}$$

$$\mathbf{F} = \mathbf{K}_i^{-T}\,[\mathbf{t}]_{\times}\,\mathbf{R}\,\mathbf{K}_j^{-1}$$

$$P_i = \{(x_a, y_a) \mid a = 1, 2, \ldots, N\}, \qquad P_j = \{(x_b, y_b) \mid b = 1, 2, \ldots, M\}$$
where 𝒙𝒊 refers to the coordinates of a point from the feature set, 𝑃𝑖, identified in the 𝑖-th image, and 𝒙𝒋 is the corresponding point from the feature set, 𝑃𝑗, identified in the 𝑗-th image. The fundamental matrix is denoted as 𝑭, which is to be estimated. 𝑲𝒊 and 𝑲𝒋 are the camera intrinsic matrices. The parameters 𝑹 and 𝒕 define the relative rotation and translation, respectively, between the two views, where [𝒕]× is the anti-symmetric matrix of 𝒕. During this process, if a fundamental matrix is found that maps a sufficient number of feature points from one image to the other, these points will be considered inliers; otherwise, they are treated as outliers. The random sample consensus (RANSAC) algorithm (Fischler & Bolles, 1981) is required to reduce the effects of outliers. The image that is determined
to have insufficient shared features with any other images will be considered an outlier and
excluded from the computations that follow. The estimated camera matrices are used to generate
a sparse point cloud of the scene using the triangulation methods developed by Hartley, Gupta, &
Chang (1992), or Hartley & Sturm (1997), or more recently by Kang, Wu, & Yang (2014). Finally,
the bundle adjustment (BA) algorithm (Triggs et al., 1999) is applied to adjust the camera
parameters and points estimated previously such that the reprojection error is minimized, to further improve the reconstruction accuracy.
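To make these steps concrete, the sketch below illustrates SIFT feature extraction, KD-tree-based (FLANN) matching with Lowe's ratio test, and RANSAC-based fundamental-matrix estimation, assuming OpenCV's Python bindings; the image paths, ratio threshold, and reprojection tolerance are illustrative rather than the exact settings used in this research.

```python
# A condensed sketch of feature matching and two-view geometry (OpenCV
# assumed); "view1.jpg"/"view2.jpg" and the thresholds are illustrative.
import cv2
import numpy as np

img1 = cv2.imread("view1.jpg", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("view2.jpg", cv2.IMREAD_GRAYSCALE)

# SIFT keypoints and descriptors (Lowe, 2004)
sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

# KD-tree-based approximate matching (FLANN) with Lowe's ratio test
flann = cv2.FlannBasedMatcher({"algorithm": 1, "trees": 5}, {"checks": 50})
good = []
for pair in flann.knnMatch(des1, des2, k=2):
    if len(pair) == 2 and pair[0].distance < 0.7 * pair[1].distance:
        good.append(pair[0])

pts1 = np.float32([kp1[m.queryIdx].pt for m in good])
pts2 = np.float32([kp2[m.trainIdx].pt for m in good])

# Fundamental matrix with RANSAC (Fischler & Bolles, 1981) to reject outliers
F, mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC, 1.0)
print(F)
print(int(mask.sum()), "inlier correspondences")
```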
3.7.2.3 Multi-view stereo
The estimated sparse point cloud and the images from the scene graph are used to generate a
dense 3D point cloud of the scene. This step aims to estimate the depth value for every
single pixel of the images, using the semi-global matching (SGM) method (Hirschmuller, 2007).
During this process, it is recommended to use CUDA-enabled GPU to speed up the computation.
3.7.2.4 Dense point cloud preprocessing

Prior to any point cloud post-processing steps, it is important to denoise the original point
cloud data. In this study, statistical outlier removal (Carrilho, Galo, & Santos, 2018) and radius
outlier removal methods are implemented on the reconstructed dense cloud. The statistical outlier
removal assumes that the distance between a point and its neighboring points is normally
distributed. The algorithm checks for every point in the point cloud, and calculates the mean
distance between the point and its K nearest neighbors. The points will be considered outliers if
they are not within N standard deviations from the mean. The values of K and N are manually
specified, typically chosen as 20 and 2, respectively. The radius outlier removal method eliminates
points that have a small number of neighboring points in a sphere with a given radius around them.
The procedures are performed in Open3D (Python API) which is an open-source library for 3D
data processing. It should be noted that in the situation of highly noisy dense clouds (which may
be attributed to the source images collected in poor lighting conditions, insufficient image
resolution, or low-configuration cameras), additional manual point cloud cleaning can also be
considered.
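As a concrete illustration, the following minimal sketch applies both removal methods through Open3D's Python API with the K = 20 and N = 2 values mentioned above; the file names and the radius-removal parameters are illustrative.

```python
# A minimal denoising sketch with Open3D's Python API, using the K = 20
# and N = 2 values mentioned above; file names and the radius-removal
# parameters are illustrative.
import open3d as o3d

pcd = o3d.io.read_point_cloud("dense_cloud.ply")

# Statistical outlier removal: drop points whose mean distance to their
# 20 nearest neighbors deviates more than 2 sigma from the global mean
pcd_stat, _ = pcd.remove_statistical_outlier(nb_neighbors=20, std_ratio=2.0)

# Radius outlier removal: drop points with too few neighbors in a sphere
pcd_clean, _ = pcd_stat.remove_radius_outlier(nb_points=16, radius=0.05)

o3d.io.write_point_cloud("dense_cloud_clean.ply", pcd_clean)
```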
3.7.3 Multi-view structural components localization
The proposed 3D vision reconstruction method generates the 3D coordinates as well as the
color information for the point cloud. Therefore, it can be efficiently processed using image-based methods. However, existing 3D object detection networks, such as those by Martin, Stefan, & Karl (2018) and PointPillars (Lang et al., 2019), are designed to process organized LiDAR point clouds only. Hence, they cannot be directly applied to the unorganized point clouds generated by the proposed 3D reconstruction. Instead, a multi-view localization method is proposed to detect structural components from an unorganized 3D scene cloud. This method builds upon a CNN-based 2D object detector. Depending on the application criteria, the CNN-based detector will be trained to detect various types of civil structural components with available training images. Depending on the selection of the CNN-based object detector, the above procedures can achieve real-time or near-real-time speed. In addition, one advantage of CNN-based object detection algorithms is their robustness in localizing objects
of interest in relatively complicated external environments, where many irrelevant objects and
background noise are present. In order to extract the object of interest from the 3D scene, a
minimum of two camera viewpoints are required to remove background objects sufficiently, as
illustrated in Figure 3.10. In this research, for convenience, the point cloud of the 3D scene is
automatically rendered onto the XZ and YZ plane views, which will then be processed by the CNN-based object detector to localize the components. The generated bounding boxes on the two view planes will be used to crop out the point cloud of the structural component from the original 3D scene cloud.
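The following simplified sketch (plain NumPy) illustrates the underlying idea: project the scene cloud orthographically onto the XZ and YZ planes and keep only the points whose projections fall inside the bounding boxes a 2D detector would return on the two renderings; the scene points and the two boxes here are hard-coded stand-ins for real data and detector output.

```python
# A simplified sketch of multi-view cropping with plain NumPy; the scene
# points and the two detector boxes are stand-ins for real outputs.
import numpy as np

rng = np.random.default_rng(0)
points = rng.random((10_000, 3))  # stand-in for a reconstructed scene cloud

def inside(uv, box):
    """Check which 2D projections fall inside a (u0, v0, u1, v1) box."""
    return ((uv[:, 0] >= box[0]) & (uv[:, 1] >= box[1]) &
            (uv[:, 0] <= box[2]) & (uv[:, 1] <= box[3]))

# Bounding boxes a trained 2D detector would return on the XZ and YZ
# renderings; hard-coded here for illustration.
box_xz = (0.2, 0.0, 0.4, 1.0)
box_yz = (0.3, 0.0, 0.5, 1.0)

mask = inside(points[:, [0, 2]], box_xz) & inside(points[:, [1, 2]], box_yz)
component = points[mask]  # cropped structural-component cloud
print(component.shape)
```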
It should be noted that the above object detection procedures to extract the structural
components are very important before further dense point cloud postprocessing. This is because,
a) processing of the entire original cloud is very inefficient and computationally demanding, and
b) once the structural components are extracted, algorithms can be strategically designed to process
the point cloud of structural components only, without dealing with any other irrelevant
surrounding objects. In fact, without removing the surrounding irrelevant objects, many standard
dense point cloud processing algorithms, such as line fitting and plane fitting, cannot be effectively applied.
Figure 3.10 Multi-view structural component localization in a 3D point cloud
3.7.4 Dense point cloud postprocessing
Once the structural components are localized, the damage regions of the components should
be identified and quantified. Various types of point-cloud processing algorithms, such as line/arc fitting and plane segmentation, can be considered for different types of structures such as steel or
concrete components. For example, Figure 3.11 shows a spalling quantification approach applied
to a reinforced concrete column. This can be achieved by plane fitting and point projection
methods. Detailed algorithmic implementation of this will be presented in Chapter 4. In the end,
critical damage quantities, such as total concrete spalling volume and location, as well as the steel reinforcement exposure length, can be determined.

It should be noted that damage quantification methods vary when processing different types
of structures. This research investigates three examples of structural components, each drawn from one of three prevalent structural types: reinforced concrete structures, steel structures, and structural bolted components. Details of the development and experiments of damage quantification
for the structural components investigated in this research are presented in Chapter 4.
Figure 3.11 Point cloud processing: concrete spalling quantification
3.8 Loss estimation

The structural damage evaluation methods presented in Sections 3.4 to 3.7 provide the outcomes of
damage recognition, localization and quantification. In some situations, such outcomes are
sufficient for engineers and researchers to assess the damage condition and residual capacity of
the structure. However, as stated in Section 2.6, local damage information of specific structural
components may not be useful enough and can be difficult to understand for owners or decision-makers who are likely to lack engineering knowledge, but instead pay more attention to the global condition of the structures and the cost to repair the structures if damaged. In such situations, it is particularly useful to convert such damage information to other metrics (e.g., repair cost) that are easier to interpret, so that owners, stakeholders, and decision-makers can make more informed decisions.
The proposed framework therefore incorporates the quantification of financial metrics associated with the damage outcomes. For this purpose, the 3D vision-based damage evaluation methodology presented previously is combined with the loss (e.g., repair cost) quantification methodology. Notably, one of the well-established loss quantification methodologies is the performance-based earthquake engineering (PBEE) methodology. An essential product of the PBEE methodology is the development of the fragility database and its implementation guidelines. The fragility database comes from the ATC-58 project established by the Applied Technology Council in contracts with the Federal Emergency Management Agency, documented in FEMA P-58 Seismic Performance Assessment of Buildings, Methodology and Implementation. Traditionally, the loss quantification step within the PBEE framework is based on damage states estimated from structural response quantities (e.g., floor displacements, accelerations, and storey drifts). Applications of the PBEE framework for cost evaluation of buildings have been widely attempted (Goulet et al., 2007; Yang, Moehle, Stojadinovic, & Der Kiureghian, 2009; Mitrani-Resier, Wu, & Beck, 2016).
In this research, rather than estimating damages from response quantities, the damage states are identified directly using the vision-based methods documented from Section 3.4 to Section 3.7. Once the damage states are determined, the loss quantification is adopted from the PBEE procedures. The total loss of the structures can be calculated using Monte Carlo simulations with the repair cost/time functions from the PBEE database as follows:

$$\lambda(DV) = \int G(DV \mid DM)\,\mathrm{d}\lambda(DM) \tag{3.2}$$

where 𝐷𝑀 stands for damage measures. The parameter, 𝐷𝑉, stands for decision variables such as repair cost or repair time. The result, 𝜆(𝐷𝑉), is typically represented as a cumulative loss distribution curve (i.e., a summary of the probability of repair cost exceeding a certain value), which can be easily conveyed to owners or decision-makers. To illustrate the loss quantification procedures, detailed examples will be presented in Chapter 5.
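As an illustration of this step, the sketch below runs a simple Monte Carlo aggregation of component repair costs, assuming lognormal consequence functions in the style of the FEMA P-58 database; the component list and the cost parameters are hypothetical, not taken from the actual database.

```python
# A Monte Carlo sketch of loss aggregation with lognormal repair-cost
# functions in the style of the PBEE/FEMA P-58 database; the component
# list and cost parameters below are hypothetical.
import numpy as np

rng = np.random.default_rng(0)

# (median repair cost [$], lognormal dispersion) per (component, DS)
consequence = {
    ("RC column", 1): (6_000, 0.4),
    ("RC column", 2): (25_000, 0.4),
    ("RC column", 3): (48_000, 0.4),
}

# Damage states identified by the vision pipeline for three columns
identified = [("RC column", 1), ("RC column", 3), ("RC column", 2)]

n_sims = 10_000
total = np.zeros(n_sims)
for comp, ds in identified:
    median, beta = consequence[(comp, ds)]
    total += rng.lognormal(mean=np.log(median), sigma=beta, size=n_sims)

# The empirical distribution of `total` summarizes the loss curve
print(f"Median total repair cost: ${np.median(total):,.0f}")
print(f"P(cost > $100,000) = {np.mean(total > 100_000):.2f}")
```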
Chapter 4: Development and application of vision-based SHM methods
4.1 Overview
This dissertation is strongly focused on the development and application of advanced vision-based structural damage detection methods for civil structures. Therefore, this chapter presents the most
significant portion of the dissertation. The chapter consists of detailed studies on damage detection
of three examples of structural components, where each example comes from one of three common and widely used structural categories: RC structures, steel structures, and structural bolted connections, respectively. Section 4.1 depicts the general workflow of the CNN-based 3D vision
methods, from damage recognition to localization and quantification. Section 4.2 is focused on the
damage evaluation of structural RC columns, where the majority of the findings have been
published in Pan & Yang (2020). Section 4.3 proposes an out-of-plane damage evaluation pipeline
of steel plate structures, where the research outcomes have been reported in Pan & Yang (2022).
Section 4.4 investigates vision methods for bolt loosening evaluation of a structural bolted
component, where part of the research outcomes has been published in Pan & Yang (2021).
Comparative studies and parameter studies have been conducted to demonstrate the
effectiveness and efficiency of the proposed damage detection methods over the existing methods.
The results in this chapter indicate that the proposed 3D vision and CNN-based methodology can
achieve high accuracy and efficiency, and more comprehensive evaluation, compared to many of
the existing methods. This provides a more solid foundation when these damage detection methods
are combined with the PBEE-based loss estimation scheme as will be shown in Chapter 5.
4.2 Vision-based SHM methods for RC structures
4.2.1 Introduction
Reinforced concrete (RC) structures are one of the most prevalent structural systems
constructed worldwide. With many of these structures built in high seismic zones, the performance
of RC buildings after strong earthquake shaking is becoming a significant concern for many
building owners. When an earthquake happens, decision-makers such as city planners and government agencies need to prioritize resources to restore the damaged infrastructure. This requires rapid performance assessments of the facilities.
Traditional post-earthquake inspections were performed manually, and the results may be biased
and highly reliant on the proper training of the inspectors and qualitative engineering judgments.
The processing time may also be very long, due to the large amount of data processing required.
These deficiencies can be overcome if the current manual evaluation processes are fully automated
(Zhu & Brilakis, 2010). On the other hand, although conventional image processing techniques
(IPTs) have been applied in the past, these methods are relatively time-consuming and not robust in complex real-world scenes.

In recent years, CNN-based vision methods have been investigated for damage detection of
reinforced concrete structures, such as concrete crack detection using a sliding window technique
(Cha, Choi, & Büyüköztürk, 2017), structural damage classification of reinforced concrete
structures aided by transfer learning (Gao & Mosalam, 2018), structural multiple damage
localization using Faster-RCNN (Cha, Choi, Suh, Mahmoudkhani & Büyüköztürk, 2018), near-
real-time concrete defect detection with geolocalization using a unified vision-based methodology
(Li, Yuan, Zhang, & Yuan, 2018), and RC bridge column recognition and spalling localization
using deep learning with Bayesian optimization (Liang, 2019), concrete crack detection using
robotic technologies (Liu et al., 2020; Jiang & Zhang, 2020), and crack growth quantification.

While many of these CNN-based methods can provide reasonable accuracy, they are still
relatively slow for real-time practical applications where images are recorded at high frame rates (frames per second, FPS). To address this deficiency, Redmon, Divvala, Girshick, &
Farhadi, (2016) presented YOLO (i.e. You Only Look Once) for real-time object detection. While
YOLO is extremely fast, it makes more localization errors and achieves relatively low recall
compared to region-based CNN methods. To further improve recall and localization accuracy,
Redmon & Farhadi (2017) developed the YOLOv2 algorithm. They have shown that YOLOv2
significantly improves the recall and localization accuracy while still maintaining real-time speed.

CNN-based classification for civil engineering applications has been hampered by limited
training data (Gao & Mosalam, 2018). In general, a single classification model can provide
reasonable accuracy if the training data covers a wide range of hidden features. However, even
when the size of the training data is sufficiently large, the classification model may still not perform
well if the training data is not properly pre-processed to identify the localized damage (Gao &
Mosalam, 2018). For example, an image may contain multiple damage states where a portion of
the structure has fractured, while the other part of the structure remains undamaged.
At the time of the publication of Pan & Yang (2020), although region-based CNN methods had been widely applied in civil engineering, there had been little to no attempt at regression-based detection methods, such as YOLOv2, for structural damage detection. Moreover, existing
developments and applications of 3D vision methods remain very limited in structural damage
detection and are currently in the infancy stage (Bao & Li, 2021). For example, Xiong et al. (2015) developed an approach whose algorithm was validated on both LiDAR point clouds and photogrammetric point clouds. Hu et al. studied the layouts of long-span steel bridges. These studies were focused on the investigation of algorithm performance; attempts at 3D vision methods remain almost none in structural damage quantification, which is currently in the infancy stage.

To address the aforementioned limitations, this research proposed a hierarchical approach which
evaluates the damage of RC structures at both the system level and component level. At the system level, CNN-based classification is implemented to identify structural collapse. At the component level, CNN-based classification is then implemented to identify the damage state of RC components (i.e., RC columns in this study). To enhance the accuracy in recognizing the damage state of the RC
column, a second CNN is added to focus on detecting steel reinforcement exposure. Steel
reinforcement exposure is a critical damage feature of the most severe damage state, which poses a significant life-safety threat and requires high repair costs. Furthermore, 3D vision-based damage
quantification procedures are proposed for estimating concrete spalling volume and steel
longitudinal reinforcement exposure length. Therefore, the main contributions of the research are
summarized as follows: (a) it established component training data that follow the codified damage
state classification of the RC columns; (b) it effectively examined the applicability of the real-time
detector, YOLOv2, to identify the critical damage feature of RC columns; and (c) it proposed and successfully implemented the dual CNN methods which incorporate the classification network and the YOLOv2 object detection network to improve the accuracy achieved by a single classification model. While the methods are demonstrated on RC columns, the concept can also be generalized to other RC components such as RC beams and walls.
4.2.2 Methodology
4.2.2.1 Overview
The proposed methodology consists of system-level and component-level evaluation. Section 4.2.2.2 presents the CNN-based classification methods
for system-level collapse identification and component-level damage state recognition. This step
provides a preliminary rapid damage assessment of the RC structures, which can be considered if
efficiency is the priority for decision-making, such as in post-disaster scenarios. Section 4.2.2.3
presents the CNN-based object detection algorithm which intends to localize critical damage
features of RC structures. Section 4.2.2.4 demonstrates how the combined use of classification and object detection results improves the reliability and accuracy in determining structural damage states.
Section 4.2.2.5 presents detailed 3D vision-based damage quantification procedures for two common shapes of RC columns.

4.2.2.2 CNN-based classification

In this research, CNNs were used to identify the damage states of the building systems and
components. Typical CNNs involve multiple types of layers, including Convolution (Conv) layers,
Rectified Linear Unit (ReLU) layers, Pooling layers, Fully-connected (FC) layers and Loss layer
(e.g. Softmax layer). The Conv layer combined with the subsequent ReLU layer constitutes the essential computational block for CNNs. This is the feature that distinguishes CNNs from the traditional fully connected deep learning network. One of the advantages of CNNs is that they drastically improve the computational efficiency over the traditional neural network, because the number of training parameters enclosed in the filters of CNNs is significantly less than the number of weights utilized by fully connected layers, which are the only layers present in the traditional
feed-forward neural network. Besides, CNNs preserve the spatial locality of pixel dependencies
and enforce the learnable filters to achieve the strongest response to a local input pattern.
During the forward pass, the output from the previous layer is convolved with each one of the
learnable filters, which yields a stack of two-dimensional arrays. Applying the desired nonlinear
activation function (such as ReLU) to these two-dimensional arrays leads to a volume of two-
dimensional activation maps. After a single or multiple Conv-ReLU blocks, the pooling layer is
introduced which is a form of non-linear down-sampling. The objective of the pooling layer is to
reduce the number of parameters to improve the computation efficiency. During the pooling
process, the input image is partitioned into sub-regions which may or may not overlap with each
other. If max pooling is used, the maximum value of each sub-region is taken.
Following several Conv-ReLU blocks and pooling layers, the resulting feature maps are flattened and passed to
an FC layer. The output can be computed as matrix multiplication followed by a bias offset, which
then substitutes into an activation function. For example, in VGG-19, the CNNs end up with 3 FC
layers with the dimension of 4096, 4096, and 1000, respectively. In addition, due to the fact that
FC layers occupy most of the parameters in the entire CNN, they are prone to overfitting which
can be alleviated by incorporating dropout layers. The idea of dropout is to randomly deactivate a fraction of the neurons in the FC layers during training, which can improve the computation efficiency and has been proven to alleviate the
concern for overfitting (Srivastava, Hinton, Krizhevsky, Sutskever, & Salakhutdinov, 2014). In
VGG-19, 50% dropout is applied to the FC layers. Finally, the output of the last FC layer is passed
to a Loss layer (i.e., Softmax in this study) which determines the probability of each class (i.e., how confident the model is that the input image belongs to each class). The classification result is taken as the class with the highest predicted probability.
Based on the CNNs presented, the status of the reinforced concrete building can be classified
as collapse or non-collapse. Multiple pre-trained models can be used to facilitate the training
process. In this study, transfer learning from the three pretrained models including AlexNet
(Krizhevsky, Sutskever, and Hinton, 2012), VGG-19 (Simonyan & Zisserman, 2014) and ResNet-
50 (He, Zhang, Ren, & Sun, 2016) is applied for the binary classification task. Transfer learning is a machine learning technique that takes advantage of models pretrained in a source domain and fine-tunes part of their parameters with a few labeled data in the target domain, which can greatly promote the training process in the situation of data scarcity.
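A minimal transfer-learning sketch in the spirit of this procedure is shown below, assuming PyTorch/torchvision (the original experiments may have used a different framework); the data paths and hyperparameters are illustrative, not the study's settings.

```python
# A minimal transfer-learning sketch (PyTorch/torchvision assumed); data
# paths and hyperparameters are illustrative, not the study's settings.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),  # ResNet-50 input size
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])
train_set = datasets.ImageFolder("data/train", transform=preprocess)
loader = DataLoader(train_set, batch_size=16, shuffle=True)

# Start from ImageNet weights; replace the classifier head with 2 classes
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, 2)  # collapse vs. non-collapse

optimizer = torch.optim.SGD(model.parameters(), lr=1e-4, momentum=0.9)
criterion = nn.CrossEntropyLoss()

model.train()
for images, labels in loader:  # one epoch shown for brevity
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
```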
In deep learning, there is a trend to develop deeper and deeper networks which aim at solving more complex tasks and improving performance. However, research has shown that training deep neural networks becomes difficult and the accuracy can reach a plateau or even degrade (He,
Zhang, Ren, & Sun, 2016). ResNets, in which shortcut connections were introduced, were developed by He, Zhang, Ren, & Sun (2016) to solve these problems. It has been demonstrated that training this form of network is easier than training plain deep convolutional neural networks, and the
problem of accuracy deterioration is resolved. The complete architecture of ResNet-50 adopted in
this study is shown in Figure 4.1. The ResNet-50 contains a sequence of Convolution-Batch Normalization-ReLU (Conv-BN-ReLU) blocks, where batch normalization (BN) is applied after convolution and before ReLU activation to stabilize training, speed up convergence, and regularize the model. After a series of Conv-BN-ReLU blocks, global average pooling (GAP) is performed
to reduce the dimensionality, which is then followed by the FC layer associated with the softmax
function.
Due to the limited number of images for civil engineering applications, 686 images are
collected from datacenterhub.org at Purdue University and Google Images, of which 240 images are related to the collapse of buildings, while 446 images show non-collapsed buildings. The image
preprocessing is conducted to reduce the inconsistency in image classification following the same
approach adopted by Gao & Mosalam (2018). The preprocessed images will be resized
appropriately to 224x224 or 227x227 pixels (depending on what network is chosen) before being
fed into the CNNs for state and damage classification. The performance of the model is verified
through the training and validation process. In this case, 80% of the collected images are chosen
as the training data and the rest is chosen as the testing data. Further, within the training set, 20%
of the images are set as the validation data and the remaining images are used to train the model.
Therefore, 686 × 0.8 × 0.8 ≈ 439, 686 × 0.8 × 0.2 ≈ 110, and 686 × 0.2 ≈ 137 images are used for training, validation, and testing, respectively.

As per the proposed evaluation scheme depicted in Figure 4.2, if the RC building is identified
as non-collapse, the subsequent step is to determine the damage state of the structural components.
In this study, the definition of several damage states for RC structural columns is shown in Table
4-1, which is adopted from the ATC-58 damage state (DS) definitions for reinforced concrete
columns. These procedures have been practically accepted and implemented for many years, as
demonstrated by Nakano, Maeda, Kuramoto, & Murakami (2004), and Maeda, Matsukawa, & Ito
(2014).
The RC columns, the critical load-bearing components of the RC buildings, are selected to
demonstrate the component-level classification. In total, there are 2260 images collected from the
damage survey conducted by Sim, Laughery, Chiou, & Weng (2018), EERI Learning from
Earthquake Reconnaissance Archive, and Google Images. The number of images for DS 0, DS 1,
DS 2 and DS 3 is 496, 404, 580 and 780, respectively. Similar to before, image preprocessing and
resizing are applied before training. Also, 80% of the acquired images for each damage class are
chosen as its training set and 20% as the testing set. The validation set is chosen as 20% of the
training set and the rest of the training set is used to train the model.
Table 4-1 Damage state definitions for RC columns (adapted from ATC-58)

DS index   Description
0          No damage
1          Light damage: visible narrow cracks and/or very limited spalling of concrete
2          Moderate damage: cracks, large area of spalling concrete cover without exposure of steel bars
3          Severe damage: crushing of core concrete, and/or exposed reinforcement buckling or fracture
Similar to the system-level classification, the pretrained AlexNet, VGG nets, and ResNet-50 are selected for transfer learning.
The trained model with the highest test accuracy is adopted to demonstrate the applicability of the
classification of multiple damage states. The construction of the network is similar to the previous one, except that the last three layers (a fully connected layer, a Softmax layer, and a classification output layer) are updated with new labels and the new number of classes (i.e., 4 damage states in this case).
4.2.2.3 CNN-based object detection

In addition to the classification networks, a CNN-based object detection network is introduced in this study to identify steel reinforcement exposed due to concrete spalling. In
comparison to image classification, object detection goes one step further by localizing the object within an image and predicting its class label. The output of object detection would be
different bounding boxes with their labels in the image. While R-CNN methods (i.e., R-CNN, Fast
R-CNN, Faster R-CNN) have been widely attempted in civil engineering applications, they are
still relatively slow for real-time applications. This study designed and applied a specific YOLOv2
object detection network for the identification of reinforcement exposure. Compared to R-CNN methods, YOLOv2 achieves competitive accuracy at a much higher detection speed. A detailed comparison of object detection networks is presented in Redmon & Farhadi (2017).
The YOLOv2 architecture consists of a series of Conv-BN-ReLU blocks and pooling layers, followed by localization and classification layers which predict the bounding box location and the class score, respectively. In this study, YOLOv2
built on ResNet-50 is adopted for steel reinforcement detection. First, the layers after the third
Conv-BN-ReLU block of ResNet-50 (as shown in Figure 4.3) are removed such that the remaining
layers can work as a feature extractor. Second, a detection subnetwork is added which comprises
groups of serially connected Conv-BN-ReLU blocks. Details of layer properties within the
detection subnetwork are illustrated in Figure 4.3. In conclusion, the detection is modelled as a
regression problem. The output of the network contains S × S grid cells, each of which predicts B bounding boxes. Each bounding box includes 4 parameters for the position, 1 box confidence score (objectness), and C class probabilities. The final prediction is expressed as a tensor of size 𝑆 × 𝑆 × 𝐵 × (4 + 1 + 𝐶).
Figure 4.3 The schematic architecture of YOLOv2 built on ResNet-50 for steel reinforcement detection
The objective of training the neural network is to minimize the multi-part loss function shown in Equation (4.1), where $I^{obj}_{i,j} = 1$ if the $j$-th bounding box in cell $i$ is responsible for detecting the object, and 0 otherwise. Similarly, $I^{obj}_{i} = 1$ if an object appears in cell $i$, otherwise 0, and $I^{noobj}_{i,j}$ is the complement of $I^{obj}_{i,j}$. The parameters $x_i$ and $y_i$ are the predicted bounding box position, $\hat{x}_i$ and $\hat{y}_i$ refer to the ground truth position, and $w_i$ and $h_i$ are the width and height of the predicted bounding box, while the associated ground truth is denoted as $\hat{w}_i$ and $\hat{h}_i$. The term $C_i$ is the confidence score and $\hat{C}_i$ is the intersection over union (IoU) of the predicted bounding box with the ground truth. The multiplier $\lambda_{coord}$ is the weight for the loss in the bounding box coordinates and $\lambda_{noobj}$ is the weight for the loss in the background. As most of the generated boxes do not contain any objects, the model detects background more frequently than objects; to put more emphasis on bounding box accuracy, $\lambda_{coord}$ is set to 5 by default and $\lambda_{noobj}$ is chosen as 0.5 by default.
$$
\begin{aligned}
\mathcal{L} ={}& \lambda_{coord}\sum_{i=0}^{S^2}\sum_{j=0}^{B} I^{obj}_{i,j}\left[(x_i-\hat{x}_i)^2+(y_i-\hat{y}_i)^2\right] \\
&+ \lambda_{coord}\sum_{i=0}^{S^2}\sum_{j=0}^{B} I^{obj}_{i,j}\left[\left(\sqrt{w_i}-\sqrt{\hat{w}_i}\right)^2+\left(\sqrt{h_i}-\sqrt{\hat{h}_i}\right)^2\right] \\
&+ \sum_{i=0}^{S^2}\sum_{j=0}^{B} I^{obj}_{i,j}\left(C_i-\hat{C}_i\right)^2 + \lambda_{noobj}\sum_{i=0}^{S^2}\sum_{j=0}^{B} I^{noobj}_{i,j}\left(C_i-\hat{C}_i\right)^2 \\
&+ \sum_{i=0}^{S^2} I^{obj}_{i}\sum_{c\,\in\,classes}\left(p_i(c)-\hat{p}_i(c)\right)^2
\end{aligned}
\tag{4.1}
$$
The network learns to adapt the predicted boxes appropriately with regard to the ground truth data
during training. However, it would be much easier for the network to learn if better anchor priors
are selected. Therefore, to facilitate the training process, K-means clustering as suggested by
Redmon & Farhadi (2017) is implemented to search for the tradeoff between the complexity of
the model and the number of bounding boxes required to achieve the desired performance. Once
the number of anchors is specified, the K-means clustering algorithm takes as input the dimensions
of ground truth boxes labelled in the training data, and outputs the desired dimensions of anchor
boxes and the mean IoU with the ground truth data. Clearly, the selection of more anchor boxes
provides a higher mean IoU, but also causes more computational cost. Through the parametric
study on the number of anchors, the relationship between the mean Intersection-over-Union (IoU)
and the number of anchors is established in Figure 4.4, which shows that 10 anchors is
a reasonable choice, where the mean IoU can reach about 0.8. It should be noted that unlike the
original work by Redmon & Farhadi (2017) where the network utilizes 5 box priors to classify and
localize 20 different classes, this study only focuses on the detection of one class (i.e. steel
reinforcement), indicating more anchors can be used without losing too much computational
efficiency. Figure 4.5 depicts the dimensional properties of each ground truth box as well as its cluster assignment. The anchor dimensions obtained from the K-means clustering approach are reported in Table 4-2. These anchors will be utilized to determine the bounding box properties as shown in Equations (4.2) to (4.6). In summary, B is chosen as 10. C is
equal to 1 which corresponds to steel exposure. As a result, the predicted tensor has a size of
26×26×60.
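The following sketch implements K-means clustering of ground-truth box dimensions with the 1 − IoU distance suggested by Redmon & Farhadi (2017); the sample (width, height) pairs are hypothetical stand-ins for the labelled training boxes.

```python
# A sketch of anchor-prior selection via K-means with the 1 - IoU distance
# (Redmon & Farhadi, 2017); the (width, height) samples are hypothetical.
import numpy as np

def iou_wh(boxes, centroids):
    """IoU between (w, h) pairs, assuming boxes share a common corner."""
    inter = (np.minimum(boxes[:, None, 0], centroids[None, :, 0]) *
             np.minimum(boxes[:, None, 1], centroids[None, :, 1]))
    union = ((boxes[:, 0] * boxes[:, 1])[:, None] +
             (centroids[:, 0] * centroids[:, 1])[None, :] - inter)
    return inter / union

def kmeans_anchors(boxes, k, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    centroids = boxes[rng.choice(len(boxes), size=k, replace=False)].astype(float)
    for _ in range(iters):
        # Maximizing IoU is equivalent to minimizing the 1 - IoU distance
        assign = np.argmax(iou_wh(boxes, centroids), axis=1)
        for i in range(k):  # update cluster means, keeping empty clusters
            members = boxes[assign == i]
            if len(members):
                centroids[i] = members.mean(axis=0)
    mean_iou = iou_wh(boxes, centroids).max(axis=1).mean()
    return centroids, mean_iou

gt_boxes = np.array([[104, 98], [174, 309], [107, 285], [67, 206],
                     [274, 338], [208, 213], [138, 199], [54, 77]])
anchors, mean_iou = kmeans_anchors(gt_boxes, k=3)
print(anchors.round(1), round(float(mean_iou), 2))
```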
Figure 4.4 Relationship between Mean IoU and number of dimension priors
Figure 4.5 Illustration of K-means clustering results with 10 anchor sizes
Table 4-2 Anchor box dimensions obtained from K-means clustering

Group             1    2    3    4    5    6    7    8    9   10
Width [pixels]  104  174  174  107  105   67  274  208  138   54
Height [pixels]  98  309  132  285  167  206  338  213  199   77
The network predicts 10 bounding boxes at each cell in the output feature map. For each box, the network predicts offsets relative to the grid cell and the anchor prior. As shown in Figure 4.6, the bold solid box is the predicted bounding box and the dotted rectangle is the anchor prior. Assuming the cell is offset from the top left corner of the image by (𝑐𝑥, 𝑐𝑦) and the anchor box prior has a width of 𝑏𝑎𝑛𝑐ℎ𝑜𝑟 and height of ℎ𝑎𝑛𝑐ℎ𝑜𝑟, then Equations (4.2) to (4.6) can be derived.
Equations (4.2)-(4.3) predict the location of the bounding box and Equations (4.4)-(4.5) predict the dimensions of the bounding box based on the anchor box dimensions. Equation (4.6) is related to the objectness prediction, which involves the IoU between the proposed box and the ground truth.

$$b_x = \sigma(t_x) + c_x \tag{4.2}$$
$$b_y = \sigma(t_y) + c_y \tag{4.3}$$
$$b_w = b_{anchor}\, e^{t_w} \tag{4.4}$$
$$b_h = h_{anchor}\, e^{t_h} \tag{4.5}$$
$$\Pr(\text{object}) \cdot \text{IoU}(b, \text{object}) = \sigma(t_o) \tag{4.6}$$
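For clarity, the short sketch below decodes a raw prediction vector into a bounding box following Equations (4.2) to (4.6); the cell offset, anchor dimensions, and raw outputs are illustrative values, not real network output.

```python
# A short decode of raw offsets into a box following Eqs. (4.2)-(4.6);
# the cell offset, anchor size, and raw outputs are illustrative.
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def decode(raw, cell, anchor):
    tx, ty, tw, th, to = raw
    cx, cy = cell                      # grid-cell offset from top-left
    aw, ah = anchor                    # anchor prior width and height
    bx = sigmoid(tx) + cx              # Eq. (4.2)
    by = sigmoid(ty) + cy              # Eq. (4.3)
    bw = aw * np.exp(tw)               # Eq. (4.4)
    bh = ah * np.exp(th)               # Eq. (4.5)
    objectness = sigmoid(to)           # Eq. (4.6)
    return bx, by, bw, bh, objectness

print(decode(np.array([0.2, -0.1, 0.3, 0.1, 1.5]), (5, 7), (104, 98)))
```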
Similar to other CNN models, the YOLOv2 is trained by back-propagation and stochastic
gradient descent (SGD). The learning rate is constant and set to 10⁻⁴, and the mini-batch size is set to
16. The input image size is 416 x 416, which is identical to what has been adopted by Redmon &
Farhadi (2017) for finetuning detection subnetwork. The training and testing images of YOLOv2
are taken separately from the DS 3 images which have been used in the training and testing of the
DS classification model. Data augmentation such as cropping, flipping, and small rotation is
applied such that the augmented images still contain the object that needs to be detected. The experiments are conducted on a desktop computer (an Intel Core i7-9700K @ 3.60 GHz, 16 GB DDR4 memory and an 8 GB memory GeForce RTX 2070 GPU) and a Lenovo Legion Y740 laptop (a Core i7-8750H @ 2.20 GHz, 16 GB DDR4 memory and an 8 GB memory GPU).
Figure 4.6 Geometric properties of the predicted bounding box and the anchor prior
4.2.2.4 Dual CNN-based damage state identification

Combining the classification and object detection results can improve the reliability and accuracy of the damage state determination. A single classification model generally performs well if trained on
a large dataset which covers a wide range of hidden features. Besides, it is required that the image scene be properly pre-processed such that the targeted region (i.e., the RC column with/without damage in this study) dominates the entire image. Moreover, the classification model may not perform well
if different classes have obvious shared features. In case of multiple irrelevant objects in the
background or a column with multiple damage features present in a single image (i.e., the crack feature is shared by DS 1, DS 2 and DS 3; the spalling feature is shared by DS 2 and DS 3), the
classification model may fail to identify the damage state class correctly. For example, the column image shown in Figure 4.7 contains many small cracks and limited spalling, and at the same time presents evident steel bar exposure concentrated in relatively small regions; the classification model fails to identify it as DS 3 (i.e., the ground truth label) in our experiments. This is interpretable because, in this case, the damage features of DS 1, DS 2 and DS 3 are all included in one single image, so the predicted probability for DS 3 is not the highest. This is where the detection of steel bars is needed to reinforce the severe damage state identification.
Figure 4.7 Sample images of RC columns that the classification model wrongly identifies as DS 2 while the ground truth is DS 3
In this regard, a novel dual CNN-based inspection approach (Figure 4.8) is proposed to
facilitate the process. The classification model is trained across all the damage states as defined in
Table 4-1. On the other hand, the localization of steel bars is implemented using the YOLOv2
object detection approach. The advantage of the object detection approach is its ability to focus on
damage-sensitive features (i.e., exposed reinforcement bars in this case) which distinguish DS 3
from DS 0, DS 1 and DS 2. It is noted that the detection of exposed reinforcement is crucial because most of the tensile stiffness and strength of the reinforced concrete components are contributed by the steel reinforcement. The detection network provides redundancy to identify DS 3 in case the classification model fails to classify it. In fact, from the safety point of view,
these components are prone to fail completely in the aftershocks, which may lead to a partial or
complete collapse of the building and consequently significant increase of repair cost, injuries and
death rate. In other words, it is more conservative to maintain the second object detection network, even if in some rare cases the final damage state is identified as DS 3 while the ground truth label is a lower damage state.
The results of the classification model, including the label and its associated probability, are obtained. Meanwhile, the reinforcement detection model checks for the existence of exposed steel bars. The final decision on the damage state takes advantage of both outcomes from the
classification model and object detection model. As shown in Figure 4.8, each image is evaluated
by the two models in parallel. For a given image, it is first resized to fit the size of input layers of
the classification networks and object detection networks, respectively. The classification model
predicts the probability of each damage state and takes the one with the highest probability as the
result. Meanwhile, the object detection model aims at detecting the exposed steel bars. If the steel
bars are not detected, the classification result is directly output as the final inspection outcome. If
the steel bars are captured by the detection model, then DS 3 should be returned as the final
decision. The proposed dual CNN-based framework builds on the traditional damage classification
model, and extends the evaluation scope by analyzing the local details (i.e., steel bars in this case).
The object detection model does not change the fundamental diagnosis logic of the classification
model but reinforces the identification of DS 3 through the localization of exposed bars on top of
the classification model. This comes from three bases. First, object detection is a more complex
task than classification because it involves both localizing and classifying the target in the scene.
Solely relying on object detection results is more likely to lower the overall evaluation accuracy
(potentially caused by insufficient recall). Second, there are some other damage features of DS 3
such as the crushing of concrete, and a substantial shear failure mechanism while the proposed
object detection model is trained to detect exposed steel bars only. The identification of such
multiple features still partly relies on the classification model. Third, steel reinforcement is one of
the most essential load-carrying components in RC columns. The proposed detection network of
steel bars introduces one redundancy to reinforce the identification of the most severe damage case
which is more likely to cause an unexpected system failure in response to aftershocks, and
consequently higher chances of injuries and death, as well as substantial repair cost and longer
repair time.
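The decision rule of the dual CNN approach can be summarized by the sketch below, where `classifier` and `detector` stand in for the trained classification and YOLOv2 networks; note the rule only escalates the outcome to DS 3 and never downgrades the classifier's result.

```python
# A sketch of the dual-CNN decision rule; `classifier` and `detector`
# stand in for the trained classification and YOLOv2 networks.

def evaluate_damage_state(image, classifier, detector):
    """Return the final damage state index (0-3) for an RC column image."""
    ds_probs = classifier(image)                   # probabilities over DS 0-3
    ds = max(range(4), key=lambda k: ds_probs[k])  # classifier's damage state
    rebar_boxes = detector(image)                  # exposed-rebar boxes, if any
    if rebar_boxes:                                # exposed steel implies DS 3
        return 3
    return ds

# Demo with stand-in models: classifier favors DS 2, detector finds rebar
fake_classifier = lambda img: [0.1, 0.2, 0.6, 0.1]
fake_detector = lambda img: [(12, 40, 55, 180)]    # one bounding box found
print(evaluate_damage_state(None, fake_classifier, fake_detector))  # -> 3
```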
4.2.2.5 3D vision-based damage quantification

In this section, component-level damage quantification procedures for two common shapes of
concrete columns are developed including cuboid-shape and cylinder-shape concrete columns.
Two damage features are considered to be quantified including concrete spalling volume and the
exposed length of steel reinforcements. The damaged columns to be quantified are assumed to
have some remaining portions of top, bottom, and side surfaces. In other words, those severely
damaged columns which have any of the top, bottom or side surfaces completely missing will not be considered for quantification. This assumption should be reasonable given
that these severely damaged columns should be immediately replaced, rather than repaired.
Therefore, quantifications of concrete spalling and steel rebar exposure in such scenarios are rather
unnecessary. For these columns, the CNN-based classification methods described in Section 4.2.2.2 to Section 4.2.2.4 should suffice to provide sufficient damage evaluation results, without further 3D quantification.

Quantification of the concrete spalling volume is achieved by two main processes: a) use
detected planes or surfaces to recover the undamaged geometry of the structural component, b)
subtract the volume of the damaged point cloud from the undamaged geometry to determine the spalling volume. Detailed procedures are presented in the sections below.
4.2.2.5.1 Cuboid-shape concrete columns
In this section, the damage evaluation methodology for the concrete spalling and steel reinforcement exposure of cuboid-shape concrete columns is presented. As noted previously, quantification of concrete spalling is not feasible using 2D vision methods. This section proposes the following 3D point cloud processing procedures.
• First, plane segmentation is applied to detect the major planes of the RC columns. The plane segmentation is achieved by the M-estimator SAmple Consensus (MSAC) algorithm (Torr & Zisserman, 2000), which is a variant similar to RANSAC.
• Next, these planes are used to automatically recover the undamaged configuration of the
RC columns. Plane normal vector will be checked with respect to that of the ground plane
which is assumed to be a known metric. It should be noted that the prior knowledge of
ground plane vectors is reasonable as many existing algorithms (e.g., [ref]) can be used to
identify ground from images and point clouds. In fact, in the situation of large-scale 3D
scene reconstruction, the ground plane can be easily recognized using MSAC or
RANSAC. This is because the first principal plane detected by these algorithms is always
the ground plane due to the fact that the ground plane contains the greatest number of
points. To recover the original undamaged configuration of the RC columns, the top, bottom, and side surface planes need to be identified.
• The side surface planes are typically easy to segment. Using the intersection lines of the
side surfaces and the ground plane, the cross-sectional area of the RC column can be
determined. In order to determine the volume of the RC column, the column height is
required. The column height can be assumed as prior knowledge of the storey height, or
realized by an automated algorithm. In the former case, while human interventions are
needed, such information should be easy to obtain. In the latter case, a plane segmentation
algorithm can be applied to identify the ceiling plane, which is typically parallel to the
ground plane. Then, the distance between the ground plane and the ceiling plane can be taken as the column height.
• Finally, the concrete spalling volume can be calculated as the difference between the undamaged configuration and the damaged configuration of the RC component. Calculation of the volume of the undamaged configuration is straightforward, as it is typically in cuboid shape or cylinder shape. The volume of the damaged point cloud can be determined using the alpha shape methodology first presented by Edelsbrunner, Kirkpatrick, & Seidel
(1983). The alpha shape defines piecewise linear curves that create a boundary for the
point cloud. To implement the algorithm, a shrink factor should be provided where a
shrink factor of 0 will lead to the convex hull of the points, while a shrink factor of 1 will
result in the most compact boundary for the point cloud. Therefore, it is generally recommended to tune the shrink factor to suit the specific point cloud (a sketch of the plane segmentation and volume estimation steps is given below).
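A minimal sketch of these two steps is given below, assuming Open3D's Python API: RANSAC-based plane segmentation is used as a stand-in for MSAC, and an alpha-shape mesh approximates the boundary of the damaged cloud (Open3D exposes an alpha parameter rather than the shrink factor described above); the file name, cross-section, and storey height are illustrative.

```python
# A sketch of plane segmentation and volume estimation with Open3D.
# RANSAC plane fitting stands in for MSAC; Open3D's alpha parameter
# stands in for the shrink factor. All values are illustrative.
import open3d as o3d

pcd = o3d.io.read_point_cloud("column.ply")

# The first (largest) plane is typically the ground in large scenes
plane_model, inliers = pcd.segment_plane(distance_threshold=0.01,
                                         ransac_n=3, num_iterations=1000)
column = pcd.select_by_index(inliers, invert=True)  # points off the ground

# Undamaged volume from the recovered cross-section and storey height
cross_section_area = 0.45 * 0.45   # m^2, from intersected side planes
storey_height = 3.0                # m, assumed known
undamaged_volume = cross_section_area * storey_height

# Damaged volume from an alpha-shape mesh of the column cloud
mesh = o3d.geometry.TriangleMesh.create_from_point_cloud_alpha_shape(
    column, alpha=0.05)
if mesh.is_watertight():
    spalling_volume = undamaged_volume - mesh.get_volume()
    print(f"Estimated spalling volume: {spalling_volume:.4f} m^3")
else:
    print("Alpha-shape mesh is not watertight; adjust alpha and retry.")
```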
On the other hand, quantification of the steel reinforcement exposure is also presented. The
pretrained YOLOv2 is deployed to localize steel reinforcement in the rendered images of the 3D
reconstructed point cloud. In this study, for simplicity and the purpose of demonstrating the
concept, the vertical reinforcement exposure length is considered. The vertical exposure length is
the projected exposure length along the longitudinal direction of the RC column, regardless of
whether the steel rebar is buckled (or bent) or not. Within this context, the exposure length can be
estimated as the height of the bounding box detected for the steel rebars. In the situation of multiple
bounding boxes detected on the same RC column, the steel exposure length of that column is taken
as the union of the projected length from all the bounding boxes.
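The union of projected lengths can be computed with a simple interval-merging helper, sketched below; the box extents are hypothetical detector outputs expressed along the column's longitudinal axis.

```python
# An interval-merging helper for the union of projected rebar extents;
# the (y_min, y_max) pairs are hypothetical detector outputs in metres.
def exposed_length(extents):
    merged = []
    for lo, hi in sorted(extents):
        if merged and lo <= merged[-1][1]:  # overlaps the previous interval
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    return sum(hi - lo for lo, hi in merged)

print(exposed_length([(0.2, 0.6), (0.5, 0.9), (1.2, 1.4)]))  # ≈ 0.9
```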
4.2.2.5.2 Cylinder-shape concrete columns

In the situation of circular-shaped columns, most processing steps remain similar to Section 4.2.2.5.1, except for the following. a) Plane segmentation can still be used to identify the top and
bottom surfaces of the columns. However, since the cylinder columns do not have flattened side
surfaces, rather than using plane segmentation to detect the side surfaces, a circle fitting algorithm
can be performed on the cross-section of the columns multiple times along the longitudinal
direction of the columns. The fitted circle, together with the distance between the top and bottom
surfaces, are used to establish the original undeformed circular column. Detailed algorithmic implementation is presented in Section 4.2.3. b) Quantification of the steel reinforcement exposure for the cylinder-shape RC columns can be done in the same way as described in Section 4.2.2.5.1.
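One possible implementation of the cross-section fitting step is an algebraic least-squares circle fit (the Kåsa method), sketched below with synthetic points; this is an assumption for illustration rather than the exact algorithm used in this research.

```python
# A least-squares circle fit (Kasa method) as one possible way to recover
# the cross-section of a cylindrical column from a horizontal slice of
# points (NumPy assumed); the sample points are synthetic.
import numpy as np

def fit_circle(xy):
    """Fit x^2 + y^2 + D*x + E*y + F = 0; return center (a, b) and radius r."""
    x, y = xy[:, 0], xy[:, 1]
    A = np.column_stack([x, y, np.ones_like(x)])
    rhs = -(x**2 + y**2)
    D, E, F = np.linalg.lstsq(A, rhs, rcond=None)[0]
    a, b = -D / 2, -E / 2
    r = np.sqrt(a**2 + b**2 - F)
    return (a, b), r

theta = np.linspace(0, 2 * np.pi, 50)
pts = np.column_stack([0.3 * np.cos(theta) + 1.0,
                       0.3 * np.sin(theta) + 2.0])
print(fit_circle(pts))  # center ≈ (1.0, 2.0), radius ≈ 0.3
```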
4.2.3 Experiments and results
This subsection presents the results for system-level collapse recognition, and component-
level damage detection and localization. The training parameter settings and the overall testing
accuracy of the CNN classification models have been summarized in Table 4-3.
Table 4-3 System-level and component-level training parameters and performance of transfer learning
Although the testing accuracy among all three pretrained models is relatively close, the
ResNet-50 which has the deepest architecture, yields slightly higher accuracy than the other two
CNN models. The loss and accuracy of ResNet-50 during training are presented in Figure 4.9. Both the training and validation accuracies exceed 90% after around 80 epochs of training and approach 100% at the end. It is generally acknowledged that the validation dataset can provide
an unbiased assessment of a model fit on the training dataset (Krizhevsky, Sutskever, and Hinton,
2012; He, Zhang, Ren, & Sun, 2016). An increase in the error on the validation dataset is a sign of
overfitting to the training dataset. A high and stable validation accuracy is
observed in Figure 4.9 which demonstrates the applicability of the system-level classification
model for collapse identification. Figure 4.10 compares the confusion matrices (Kohavi & Provost,
1998) between training and testing results. For instance, 95% of the testing images which have
ground-truth labels as collapse are successfully predicted while only 5% of these images are
misclassified as “non-collapse”. Moreover, sample testing images with probability for their
associated class are shown in Figure 4.11. More sample results are presented in Appendix C. The
trained model can predict the correct class for the images with high probability.
Figure 4.9 System-level collapse identification for training and validation sets using ResNet-50
Figure 4.10 System-level collapse versus no collapse: confusion matrices of (a) training set and (b) testing
set
Figure 4.11 Sample testing images of the building with predicted probability for each class
Similar to the system-level failure classification, Table 4-3 presents the training
parameters and performance comparison of the three pretrained models (AlexNet, VGG-19,
ResNet-50) for the classification of the component damage states. In general, all three models achieve
high accuracy, while ResNet-50 is slightly more accurate than AlexNet and VGG-19. The
loss and accuracy of ResNet-50 during the training process are presented in Figure 4.12, which
shows that both the training and validation accuracies approach 100% at the end. The
performance of the trained model is confirmed by the confusion matrix for training and testing as
shown in Figure 4.13. Figure 4.14 shows the classification of a few sample images with a correct
prediction. More sample results are presented in Appendix C.1. The results show that the trained
model can classify different damage states with reasonably high accuracy, although the
classification accuracy with respect to moderate damage (i.e., DS 2) and severe damage (i.e., DS
3) is not as high as that regarding the class of no damage (i.e., DS 0) and light damage (i.e., DS 1).
The results reflect the increasing difficulty of detecting damage features from DS 0 to DS
3. In DS 0, there are no damage features because the RC columns are in almost perfect condition,
so the trained CNN model only needs to identify the column profile. Similarly, DS 1 involves
only cracks and very limited spalling, introducing slightly more damage features than DS 0.
More features are usually observed in DS 2, such as light or severe cracks and a large area of
spalling. In the case of DS 3, the model performs reasonably well in detecting its characteristic
damage features, which include exposure of a significant length of steel bars, crushing of
concrete, and buckling or fracture of reinforcement. However, it occasionally misclassifies the
damage state as DS 2 when the ground truth is DS 3 (Figure 4.15). There are two potential
reasons. First, DS 2 and DS 3 share many common damage features, such as cracks and a large
amount of concrete spalling. Second, exposure of steel reinforcement is not evident in some cases,
while cracks or significant spalling may dominate the entire image (Figure 4.7). To overcome this
deficiency, a novel object detection algorithm is implemented and combined with the
classification algorithm to identify the damage states more accurately. Experiments and results of
the dual CNN-based method will be discussed in Section 4.2.3.4.
Figure 4.12 Component-level DS classification for training and validation sets using ResNet-50
Figure 4.13 Component-level damage state identification: confusion matrices of (a) training set and (b) testing set
Figure 4.14 True prediction of sample testing images with the predicted probability for each class
Figure 4.15 False prediction of sample testing images with ground truth of “Severe Damage”
This subsection presents the results regarding the detection of exposed longitudinal steel
reinforcement using YOLOv2. The detector performance is evaluated with the precision and
recall metrics (Everingham, Van Gool, Williams, Winn, & Zisserman, 2010). A low false positive
rate leads to high precision
and low false negative rate results in a high recall. In other words, a large area under the recall-
precision curve indicates the high performance of the detector with both high recall and precision.
A detector with high recall but low precision retrieves many results, but most of its predicted labels
are incorrect (e.g., incorrect bounding box locations, low IoU). A detector with high precision but
low recall can localize the object very accurately once the object is successfully recalled, but only
very few results can be recalled. The average precision (AP) is often used to quantify the
performance of an object detector (Girshick, 2015; Ren, He, Girshick, & Sun, 2017), which is
determined as the area under the precision-recall curve. Mean AP (mAP) is defined as the mean
of calculated APs for all classes. Figure 4.16 presents the precision-recall curve for training and
testing. The mAP for both training and testing has demonstrated the applicability of YOLOv2 for
detecting steel bars. The testing results in Figure 4.16 (b) indicate that there is still room to
improve the mAP with a larger training set, particularly by improving the recall. It should be
noted that the detection of steel bars is more difficult than detecting objects with a more regular
pattern because the steel bars may buckle or fracture in a very complex way in different situations.
Figure 4.17 provides sample detection images where the steel bars are localized by rectangular
bounding boxes. Figure 4.17 (b) (upper) shows the images which were wrongly classified by the
traditional classification method, while Figure 4.17 (b) (lower) shows the same images in which
the exposed steel bars are localized by YOLOv2. More sample testing results are presented in
Appendix C.1.
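For reference, the sketch below computes AP as the area under a precision-recall curve using the common "all-points" interpolation; the input arrays are illustrative detector outputs rather than the study's data, and mAP would simply be the mean of such APs over all classes.

```python
import numpy as np

# Minimal sketch of the average precision (AP) metric: the area under the
# precision-recall curve, computed with a simple all-points interpolation.

def average_precision(recall, precision):
    """recall, precision: arrays ordered by descending detection confidence."""
    # Pad the curve so integration starts at recall 0 and ends at recall 1.
    r = np.concatenate([[0.0], recall, [1.0]])
    p = np.concatenate([[1.0], precision, [0.0]])
    # Make precision monotonically non-increasing (standard PR smoothing).
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    # Sum the rectangular areas under the smoothed curve.
    return float(np.sum((r[1:] - r[:-1]) * p[1:]))

recall = np.array([0.2, 0.4, 0.6, 0.8])
precision = np.array([1.0, 0.9, 0.7, 0.6])
print(average_precision(recall, precision))  # 0.64 for this toy curve
```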
Figure 4.16 Precision-recall curves of (a) training and (b) testing
As illustrated in Figure 4.8, the proposed evaluation scheme combines the classification model
and the object detection model to reinforce the identification of the most severe damage state. Figure
4.18 compares the identification performance of the single classification model and the dual-CNN
model using confusion matrices. Overall, the employment of the YOLOv2 object detector
improves the identification accuracy of DS 3 by 7.5%. It should be noted that the classification
accuracy can potentially be improved by training on a larger dataset; however, as aforementioned,
training data for civil engineering applications are relatively limited.
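One plausible reading of the combination logic in Figure 4.8 is sketched below, assuming the detector's output overrides the classifier whenever exposed rebar (the signature of DS 3) is found with sufficient confidence; the function names and threshold are illustrative, not the dissertation's exact rule.

```python
# Minimal sketch of a dual-CNN decision rule (names illustrative): the
# rebar detector overrides the classifier whenever exposed steel
# reinforcement is found, since rebar exposure characterizes DS 3.

def combined_damage_state(classifier_ds, rebar_boxes, score_threshold=0.5):
    """classifier_ds: damage state 0-3 predicted by the classification CNN.
    rebar_boxes: list of (box, confidence) pairs from the rebar detector."""
    confident = [b for b, s in rebar_boxes if s >= score_threshold]
    if confident:
        return 3          # exposed rebar detected -> most severe state
    return classifier_ds  # otherwise keep the classifier's prediction

# Classifier says DS 2, but the detector finds one confident rebar box:
print(combined_damage_state(2, [((10, 20, 50, 80), 0.87)]))  # 3
```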
Figure 4.17 Detection of steel bars highlighted by yellow rectangular bounding boxes: (a) sample testing images; (b) testing images wrongly predicted by the classification model (upper) and the same images with exposed steel bars localized by YOLOv2 (lower)
Figure 4.18 Confusion matrices of the classification model alone (left) and the combined classification and YOLOv2 detection scheme (right)
4.2.3.5 Component-level damage quantification
This section presents the implementation procedures and results for the RC components. Due
to the limited experimental specimens available at the UBC structural laboratory, only three
cuboid-shaped columns are considered in the experiments.
The 3D point clouds of the three RC column specimens are reconstructed first, using a series
of high-resolution images captured by a consumer-grade camera. The image resolution provides
an image-to-object ratio of about 9 pixels/mm. Given that the
3D reconstruction pipeline used in this study can achieve sub-pixel accuracy, the reconstructed
point cloud using these high-resolution images can theoretically achieve very high accuracy (i.e.,
well below 1mm). To validate the accuracy, for each column, several benchmark points are marked
on different surfaces of the specimens before quantification procedures. After the validation, the
3D point clouds are processed by plane segmentation algorithms. In this study, the distance
threshold to identify planes is chosen as 0.25 mm. Figure 4.19 depicts the plane segmentation
results for a sample RC column investigated. It is shown that the ground plane, and all the side
planes of the RC column are successfully identified after 6 iterations of the plane segmentation
algorithm. Full sample results of all three RC columns are presented in Appendix C.1.
In order to automatically find the top surface plane, the following procedures are considered.
First, the furthest point with respect to the ground plane is determined, which is used as a reference
to establish a plane parallel to the ground plane. Then, the number of points (i.e., plane inliers) that
fit this plane is counted. The inlier points are defined as the points having a point-to-plane distance
less than a certain distance threshold. If the number of points fitting this plane is greater than a
predefined threshold, this plane is considered the top surface plane of the RC column. Otherwise,
the point is considered noise. The above procedure is repeated on the next furthest point
until the number of points fitting the respective plane exceeds the predefined threshold. The
first plane that meets this requirement is regarded as the top surface plane. Further, using all
the detected planes, a 3D cuboid shape can be reconstructed. The spalling volume can be quantified
by simply subtracting the volume of the damaged RC column from the volume of the reconstructed
3D cuboid.
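The top-surface search can be summarized in a few lines, as in the sketch below; it assumes the ground plane normal is already known and considers only planes parallel to the ground, and the threshold values are illustrative rather than the study's exact settings.

```python
import numpy as np

# Minimal sketch of the top-surface search described above: walk down from
# the point farthest above the ground plane until a parallel plane gathers
# enough inliers to count as the column's top surface (names illustrative).

def find_top_surface(points, ground_normal, dist_thresh=0.25, min_inliers=500):
    """points: (N, 3) array in mm; ground_normal: unit normal of the ground
    plane. Returns the signed height of the detected top surface plane."""
    heights = points @ ground_normal            # signed distance to ground
    for h in np.sort(heights)[::-1]:            # farthest candidates first
        # For a plane parallel to the ground, point-to-plane distance is
        # just the difference in heights along the ground normal.
        inliers = np.abs(heights - h) < dist_thresh
        if inliers.sum() >= min_inliers:
            return h                            # first plane dense enough
        # otherwise the candidate point is treated as noise; continue
    raise RuntimeError("no plane met the inlier threshold")
```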
The reconstruction by structure from motion algorithms can achieve high accuracy in both
field applications and laboratory settings. This has been well demonstrated in the existing
literature (e.g., Hirschmuller, 2007; Westoby et al., 2012; Micheletti, Chandler, & Lane, 2015;
Morgan et al., 2017). To further demonstrate the effective and accurate reconstruction of the RC
components using structure from motion, this section presents the quantification results of concrete
spalling.
About 450 images were collected for each RC column to reconstruct the 3D point cloud. The
reconstructed point cloud is then calibrated to the real-world scale. There are typically three ways
to achieve this: a) calibration of the camera at each viewpoint to explicitly determine the intrinsic
and extrinsic parameters, which requires substantial effort and is generally not recommended;
b) integration of distance measurement devices with the camera so that the reconstructed cloud
can be scaled directly; c) the use of ground control points with known real-world locations
(Hartley, Gupta, & Chang, 1992), which is a method widely used in the surveying industry. In
this study, as only a single uncalibrated camera is used, the method of ground control points is
adopted.
Table 4-4 shows a summary of the concrete spalling quantification results, together with the
estimation error with respect to the ground truth values, which is in the range of 5-10% for the
three columns examined. The estimation errors mainly stem from two sources.
First, the alpha shape estimated based on the point cloud may not represent the true concrete
continuum, due to the suboptimal shrink factor used. It should be noted that this study is not
focused on searching for the optimal shrink factor that best fits the concrete column scenario. This
is because such an optimal shrink factor may not hold consistently for different types of concrete
columns, nor for other types of civil structures. Hence, it is recommended to use a rational,
generally applicable value for the shrink factor.
Second, in this study, the concrete spalled debris is collected and measured by a laboratory
weight scale. The concrete spalling volume is then calculated by dividing the measured mass by
the manufacturer-claimed density of the concrete. Such processes inevitably contain errors, such
as losses during the concrete debris collection process and the difference between the actual
concrete density and the manufacturer-claimed density.
Overall, the estimation errors should be deemed reasonably small and acceptable for the
damage quantification application considered.
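The volume-difference computation itself is straightforward, as the sketch below shows; note that it substitutes a convex hull for the alpha shape used in the study (a simplification that ignores the shrink factor and overestimates concave regions), and all inputs are illustrative.

```python
import numpy as np
from scipy.spatial import ConvexHull

# Minimal sketch of the volume-difference idea: spalling volume equals the
# reconstructed undamaged volume minus the damaged-column volume. A convex
# hull stands in here for the study's alpha shape, purely for illustration.

def spalling_volume(cuboid_dims, damaged_points):
    """cuboid_dims: (w, d, h) of the reconstructed undamaged cuboid, in mm.
    damaged_points: (N, 3) point cloud of the damaged column."""
    v_undamaged = float(np.prod(cuboid_dims))
    v_damaged = ConvexHull(damaged_points).volume  # alpha shape in the study
    return v_undamaged - v_damaged

# Synthetic check: a 300 x 300 x 1000 mm column sampled without damage
# should give a spalling volume near zero.
rng = np.random.default_rng(0)
pts = rng.uniform([0, 0, 0], [300, 300, 1000], size=(20000, 3))
print(spalling_volume((300, 300, 1000), pts))
```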
Figure 4.19 Plane segmentation (units are in mm)
Table 4-4 Summary of the spalling quantification of the RC columns
Table 4-5 summarizes the steel reinforcement exposure estimation results. The estimation
error with respect to the ground truth values is below 0.05m. These errors are mainly due to the
bounding box jitter (i.e., the effect formally defined as the inconsistency in aspect ratio and minor
random translations of the predicted bounding boxes). In other words, the predicted bounding
boxes may not always encompass the objects inside tightly. Figure 4.20 shows sample steel
reinforcement exposure detection performed by the pretrained YOLOv2. The steel reinforcement
exposure is successfully localized in all viewpoints. On the other hand, it is
also observed that the predicted boxes are not guaranteed to be the minimum boxes that fit the
objects inside. Nevertheless, such accuracy is deemed acceptable for the inspection of these
components.
Table 4-5 Summary of the steel exposure quantification of the RC columns
4.2.4 Conclusions
Rapid post-disaster damage estimation and cost evaluation of RC structures are becoming a
crucial need for owners and decision-makers for risk management and resource allocation. In this
section, a 3D vision-based damage evaluation framework has been developed and validated for
reinforced concrete structures. Within the framework, a dual CNN scheme is proposed to enhance
the accuracy and reliability of a single CNN classification scheme. The main conclusions are
summarized herein: 1) Both system-level and component-level classification models were trained
and deployed successfully, which follows post-disaster damage state classification guidelines of
RC structures; 2) a novel real-time object detector, YOLOv2 built on ResNet-50, was
implemented to demonstrate its applicability for detecting exposure of steel bars, with 98.2% and
84.5% mAP in the training and testing processes, respectively; 3) in comparison to a single CNN
classification scheme, the combined YOLOv2 and classification scheme improves the
classification accuracy by 7.5% in identifying the most severe damage state; 4) the 3D
vision-based quantification pipeline yields reasonably high accuracy in both concrete spalling
quantification (error below 10%) and steel reinforcement exposure estimation (below 5 mm).
Overall, this section demonstrates the
applicability of the proposed framework on RC structures at both the system level and component
level. The concept of the 3D vision-based pipeline and the dual CNN scheme can be generalized
to other types of structures to provide a more comprehensive assessment and enhance the
efficiency of damage evaluation.
4.3 Vision-based SHM methods for steel structures
4.3.1 Introduction
Steel structures are widely constructed worldwide. Common surface damage such as
corrosion, delamination, cracking, and fracture can be observed in steel structures. With the
growing need for faster manufacturing and more economical construction, steel construction has
gained more traction in recent decades. To date, steel thin-walled structures have been widely used
worldwide as load-bearing elements in resisting both gravity forces and lateral loads from natural
events such as earthquakes and winds. Earlier research has shown one of the critical failure modes,
shear buckling, was observed in both hot-rolled and cold-formed steel plate thin-walled elements
(Sabouri-Ghomi, Ventura, & Kharrazi, 2005; Park et al., 2007; Yi et al., 2008; Dou, Pi, & Gao,
2018). More recently, shear buckling has also been observed in steel plate damping devices
developed for energy dissipation in high seismic activity zones, taking advantage of the high
ductility of the steel material (Zhang, Zhang, & Shi, 2012; Deng et al., 2015; Sahoo et al., 2015;
Etebarian, Yang, & Tung, 2019; Yang et al., 2019).
The out-of-plane displacements due to buckling of these steel plate components must be
identified and quantified to evaluate their residual performance after major earthquakes so that
repair or replacement actions can be executed accordingly. When performing buckling and
post-buckling experiments, accurate measurement of the buckled shape has long been a concern for
researchers and engineers (Singer, Arbocz, & Weller, 2002). Measuring the out-of-plane
displacements and determination of buckling and post-buckling shapes is crucial for further
analyses and interpretation of results (Singer, Arbocz, & Weller, 2002). In addition, measuring the
deformed geometry (i.e., both in-plane and out-of-plane deformations) of structures provides the
necessary information to perform geometry model updating for the structures in the field after
natural disasters (Zhang & Lin, 2022). Existing methods to measure out-of-plane displacements
due to buckling can be typically achieved by displacement sensors such as potentiometers (Singer,
Arbocz, & Weller, 2002), line laser device-based measurement systems (Zhao, Tootkaboni, &
Schafer, 2015), motion capture systems (Park et al., 2015), and fringe projection systems (Liu et
al., 2019). When using contact-type displacement sensors, preliminary numerical analyses are
typically required to identify critical buckling regions to determine appropriate sensor placements.
The measurement results may not be accurate if insufficient sensors are used, or if sensor locations
are not well identified. These methods are also relatively expensive, requiring careful sensor
placement design and relatively complicated installation processes. On the other hand, although the line
laser-based methods can provide higher accuracy, they require a dedicated and complicated
supporting setup, and the total cost is relatively high, which hampers their wide applications in the
structural engineering field. Motion capture systems can provide a time history of structural
conditions. Besides, motion capture systems generally require specific markers to be attached to
the test structures, where the installation can be very difficult for large-scale civil structures. Fringe
projection systems are generally limited to small-scale structures where the projector can capture
the entire object being scanned. Hence, it is difficult to apply fringe projection systems to full-
scale civil structures. Moreover, the cost of both motion capture systems and fringe projection
systems is relatively high.
On the other hand, in recent years, research using TLS has also been conducted to identify
structural damages in 3D space. The scanning results are point cloud coordinates in 3D space. For
example, Mizoguchi et al. (2013) quantified the scaling damage of a concrete bridge pier using a
TLS. Kim et al. (2015) employed a TLS to localize and quantify concrete spalling. Kim et al.
(2021) proposed a damage quantification framework for concrete bridge piers with more
complicated shapes by processing the point cloud obtained using a TLS. The accuracy of the TLS
as reported in many studies generally ranges from 3 to 15mm, and is suitable for applications
where an error of 3-15mm in quantifying local damage areas does not greatly affect the global
health inspection of the entire structures. Although the results are promising in those studies, the
TLS devices may not be accurate enough to quantify structural damages at a relatively small
magnitude such as steel plate deformation (due to buckling) of less than 10 mm. Moreover, these
TLS devices are generally expensive and may not be readily available to many researchers and
engineers. Furthermore, although low-end and mid-tier laser scanners cost less, they typically
have much lower accuracy.
In recent years, the effectiveness of the vision-based methods has been well demonstrated for
visual damage detection of different types of structural systems such as reinforced concrete (RC)
structures, masonry structures, and steel structures (Spencer Jr, Hoskere, & Narazaki, 2019). In the
case of steel structures, vision-based damage detection methods have been proposed with a focus
on identifying and localizing steel surface damages. For example, Yeum & Dyke (2015) employed
an integral channel-based sliding window method to localize bolts and utilized the Hessian matrix-
based edge detector to detect cracks near bolts on a steel beam. Yun et al. (2017) applied a
combined Gabor filter and double-thresholding binarization method to identify the shape of the
steel surface stains. Kong & Li (2018) examined a video tracking method to detect and quantify
the fatigue cracks of steel structures under repetitive loads. In their study, feature point detection
and tracking are applied to track the motion of the structure. The crack regions are detected by
searching for discontinuities in the motion during tracking. Finally, the crack opening is quantified
based on the tracked location of the two small windows deployed to the crack regions identified.
These methods were built on traditional vision algorithms, which are generally not robust against
background noise. Cha et al. (2018) employed a deep learning-based method, Faster RCNN, to
detect steel corrosion and delamination. The results indicated the CNN-based method can achieve
high accuracy and robustness. Despite the achievements made in these studies, several limitations
can be identified: (a) These methods were developed in 2D computer vision. The assessment
results using 2D vision are sensitive to the camera locations and angles; (b) these methods are
primarily designed to detect in-plane damage and are not directly capable of accurately
quantifying out-of-plane damage such as buckling deformation or fracture. This is due to the
limitation that the processing of 2D RGB or grayscale images does not directly recover depth
information in 3D space.
Since the majority of the existing studies on vision-based damage detection of steel structures
are limited to in-plane surface damages such as steel corrosion, delamination, cracks, and other
surface defects, to address these limitations and challenges, in this research, a 3D computer vision-
based framework is proposed to detect and quantify out-of-plane displacement of steel structures
at high accuracy and low cost. The framework is briefly described herein. First, a sequence of
image frames is used to reconstruct the 3D dense point cloud (i.e., RGB-D data) of the scene which
contains the steel components of interest using image association, structure-from-motion, and
multi-view stereo algorithms. Second, a multi-view object detection method is proposed to localize
the steel structures in the 3D scene. In this step, a CNN-based object detector is trained to perform
object detection for rendered images of the 3D scene at multiple camera views. The bounding
boxes generated at different views will be used to extract the 3D point cloud of the steel
components in the scene. This step is crucial to remove point clouds of irrelevant objects in the
background and allow only the object(s) of interest to remain in the scene. Further, the buckling
region of the steel components is isolated by using plane fitting algorithms to remove adjacent
irrelevant surfaces. Finally, a point cloud clustering method (DBPPC) is proposed to cluster the
points for quantification of the out-of-plane amplitude. The proposed framework has been
examined on a steel plate damping device (Yang et al., 2021) and
a full-scale steel corrugated plate wall. The results indicate the proposed framework can
successfully localize the steel components in a 3D scene, and accurately quantify the out-of-plane
damage with an error of about 1mm at the benchmark points of the experimental specimen. This
shows the proposed framework can achieve high accuracy at a significantly lower cost compared
to traditional methods of measuring the out-of-plane damage extent.
4.3.2 Methodology
As discussed in Section 4.3.1, while many in-plane damage types of steel structures (e.g., steel
surface cracks, surface corrosion) have been addressed by existing studies, the quantification of
out-of-plane buckling damage cannot be addressed by the existing 2D vision-based research. The
proposed methodology consists of
vision-based 3D scene reconstruction, multi-view CNN-based steel component detection and out-
of-plane displacement quantification, as shown in Figure 4.21. As both undamaged and damaged
structural components may be present on-site, prior to the implementation of the proposed
methodology, a classification model is first applied to identify whether the structural component
is damaged. The proposed methodology will only be applied
when the structural component is identified as damaged. This will eliminate the need to reconstruct
and assess the undamaged structural components. As the demonstration of the system-level
classification method has been presented in Section 4.2, to maintain the focus of the dissertation,
the detailed implementation of a similar system-level classification method on the steel corrugated
panels is not repeated in this section. Instead, sample system-level classification results are
provided for reference.
Image sources for 3D reconstruction can be obtained using common consumer-grade cameras
such as smartphone cameras, or unmanned aerial vehicles (UAVs) with appropriate camera specs.
In this study, two types of steel plate structures in the structural laboratory at the UBC are selected
to demonstrate the effectiveness of the proposed framework, including a damaged steel plate
damping device (Yang et al., 2021) and a full-scale damaged steel corrugated plate wall. Plate
dampers of similar types are used as energy-dissipating devices in the event of earthquakes and
have received extensive investigation (Zhang, Zhang, & Shi, 2012; Deng et al., 2015; Sahoo et
al., 2015; Etebarian, Yang, & Tung, 2019; Yang et al., 2019). Meanwhile, buckling behavior
investigation and performance evaluation of steel corrugated panels were also widely conducted
(Vigh et al., 2013; Bahrebar et al., 2016; Mansouri, & Hu, 2018; Tong, Guo, & Zuo, 2018). As
these components represent common steel plate structures and are prone to out-of-plane
deformations due to buckling in the event of earthquakes, they are considered good candidates to
validate the proposed framework.
In this section, the 3D reconstruction procedures for steel plate structures are briefly described. As
shown in Figure 4.22, the 3D reconstruction of steel plate structures consists of: data association,
which determines the image-to-image connections between pairs of unstructured images of the
steel plate structures that share similar features; structure-from-motion, which determines a sparse
point cloud of the plate structure from the estimated camera poses; and multi-view stereo, which
generates a dense point cloud of the plate structure based on the sparse point cloud and the input
RGB images.
Figure 4.22 Vision-based 3D reconstruction of steel plate structures
The concept of a multi-view vision-based 3D object detection method has been presented in
Section 3.7. Similarly, the proposed 3D object detection method is applied to detect structural
components from an unorganized 3D scene cloud. In this section, the detector is trained to detect
the steel plate damper and the steel corrugated plate wall. In order to extract the object of interest
from the 3D scene, a minimum of two camera viewpoints are required to remove background
objects sufficiently, as illustrated in Figure 4.24. In this section, the point cloud of the 3D scene is
automatically rendered onto the XZ and YZ plane views, which are then processed by a
CNN-based object detector to localize the steel components. In the presence of multiple steel
plate components, different 3D cuboid boxes will be generated for different components, and
displacement quantification will be performed for each component separately.
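A minimal sketch of the two-view extraction step follows, assuming orthographic renderings so that the detector's bounding boxes can be expressed directly in world units; in the actual pipeline the pixel-to-world mapping of each rendered view would be applied first, and all names here are illustrative.

```python
import numpy as np

# Minimal sketch of the multi-view extraction step: a 3D point is kept only
# if its orthographic projections onto the XZ and YZ rendering planes both
# fall inside the detector's 2D bounding boxes.

def crop_by_two_views(points, box_xz, box_yz):
    """points: (N, 3) array with columns (x, y, z), in mm.
    box_xz / box_yz: (min_u, min_v, max_u, max_v) bounding boxes predicted
    on the XZ and YZ rendered views, in the same world units."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]

    def inside(u, v, box):
        u0, v0, u1, v1 = box
        return (u >= u0) & (u <= u1) & (v >= v0) & (v <= v1)

    keep = inside(x, z, box_xz) & inside(y, z, box_yz)
    return points[keep]

# Keep points whose (x, z) lies in one box and (y, z) in the other:
pts = np.array([[0.5, 0.5, 0.5], [5.0, 0.5, 0.5]])
print(crop_by_two_views(pts, (0, 0, 1, 1), (0, 0, 1, 1)))  # first point only
```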
In this section, the architecture and novelty of the proposed YOLOv3-tiny detector are
described. The architecture of YOLOv3 has been adopted and modified to create a new version,
YOLOv3-tiny, for the localization of the steel plate components. Compared to the original
YOLOv3 built on Darknet-53, which has 247 layers in total, the developed YOLOv3-tiny has a
total of only 44 layers. This is achieved by reducing the depth of the convolutional layers of
YOLOv3. The main advantage of YOLOv3-tiny is that it achieves over 10 times higher speed
than the original YOLOv3, while still maintaining high enough precision for component
localization. The architecture consists of Convolution, Batch Normalization, Leaky ReLU
(Conv-BN-Leaky ReLU) blocks, and max pooling layers. The input image will be resized to 416
x 416 in width and height before entering the networks for training. During training, YOLOv3-tiny
divides the image into 26x26 grid cells. Each grid cell has 6 anchor boxes, of which each has an
object score, multiple class scores (depending on the number of classes being detected), and 4
bounding box coordinates. In this case, the number of classes is two, for steel plate dampers and
steel corrugated plate walls. Consequently, the output of YOLOv3-tiny has a dimension of 26 x
26 x 42, where ‘26’ represents the number of grid cells along each image dimension, and ‘42’
aggregates the per-anchor class scores, object score, and bounding box values (i.e., 2 class scores
+ 4 bounding box values + 1 object score = 7 values; 7 values x 6 anchor boxes = 42 values).
Figure 4.23 Architecture of the YOLOv3-tiny object detector
Image data for training and testing of the YOLOv3-tiny algorithm were collected at the UBC
structural laboratory. Standard data augmentation techniques are applied, such as horizontal flipping,
small translations, rotation, small cropping, and scaling. As a result, 3248 images were generated
for the steel damper, and 2702 images were generated for the steel corrugated panel, where 70%
of the images are randomly selected from each component type for training, and the rest is used
for testing. The YOLOv3-tiny detector was trained by back-propagation and stochastic gradient
descent with momentum (SGDM). The initial learning rate, mini-batch size, and number of
training epochs were set to 0.001, 6, and 65, respectively. Training of the YOLOv3-tiny
was done in MATLAB R2021a, with the hardware configuration of a Core i7-9700K CPU and
an RTX 2070 GPU. To train the YOLOv3-tiny networks, anchor box selection is required. Table 4-6
shows the anchor boxes estimated based on the training data, which the YOLOv3-tiny uses to
predict bounding box locations. The detection performance is evaluated using the precision and
recall metrics (Everingham et al., 2010). A low false-positive rate corresponds to a high precision
value, and a
low false-negative rate reflects a high recall value. If a detector attains high precision but low
recall, it can detect objects in only a few images, but the localization accuracy is high once the
object is recalled. If a detector has low precision but high recall, it can retrieve objects in many
images, but the localization precision is not high. The overall performance of the algorithm is
quantified as the average precision (AP), which is computed from the precision-recall plot as the
weighted average of precision at each recall value. The precision-recall curves of the YOLOv3-
tiny during training and testing are reported in Figure 4.25 and Figure 4.26, respectively, where
the APs for the steel plate damper and steel corrugated plate wall are above 0.9. This shows the
trained YOLOv3-tiny is capable of detecting the steel plate damper and corrugated plate wall at
high precision.
Figure 4.27 shows the deployment of the trained YOLOv3-tiny on sample testing rendered
scenes and real-world images. Both the steel plate damper and the steel corrugated wall are
successfully localized with a high probability. The trained YOLOv3-tiny model will be used in the
proposed multi-view 3D object detection method to extract the steel components from the rendered
images of the 3D scene. The predicted bounding box locations from multiple camera views are
used to extract the 3D point clouds of the steel components.
Figure 4.24 Steel components identification
Table 4-6 Estimation of anchor box dimensions

Anchor index      1    2    3    4    5    6
Width [pixels]    115  90   196  222  147  146
Height [pixels]   107  85   187  96   72   141
Figure 4.25 Precision-recall curve of training
Figure 4.26 Precision-recall curve of testing
Figure 4.27 Sample testing results of YOLOv3-tiny on the rendered scenes and real-world images
In most cases, the structural components identified may be composed of multiple parts. In the
example shown in Figure 4.28, parts A, B, and C are not the region of interest and need to be
removed manually, or automatically using surface fitting strategies such as plane segmentation.
The M-estimator SAmple Consensus (MSAC) algorithm (Torr & Zisserman, 2000), which is a
variant of the RANSAC algorithm, can be used for plane segmentation, where a distance
threshold should be provided. The distance threshold defines the maximum distance from a point
to the plane. If the distance between a point and the fitted plane is less than the threshold, the point
will be considered an inlier to the fitted plane. Such distance thresholds should be chosen according
to the resolution of the point cloud (which is dependent on the input image resolution), the point
cloud down-sampling rate if any, and also the computational power of the hardware (which should
be considered particularly for very large point clouds). The plane fitting algorithm is first applied
to the original point cloud data, where the first principal plane (i.e., the plane that contains the
maximum number of inliers) and the corresponding inliers will be identified. Next, these inlier
points are removed, and the fitting algorithm is applied to the remaining point cloud. The process
is repeated until the irrelevant planar surfaces are removed.
When dealing with structural assemblies with various shapes of parts, plane segmentation and
other types of surface fitting strategies (e.g., Kim et al., 2021) can be used to remove most of the
irrelevant surfaces, while minor manual interventions can be used to further refine the remaining
point cloud.
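The iterative removal loop is sketched below using Open3D's RANSAC-based segment_plane as a stand-in for MSAC; the library choice, thresholds, and iteration count are illustrative rather than the study's exact configuration.

```python
import open3d as o3d

# Minimal sketch of the iterative plane removal described above: repeatedly
# fit the plane containing the most inliers, then drop those inliers.

def remove_principal_planes(cloud, dist_thresh=1.0, iterations=3):
    """cloud: o3d.geometry.PointCloud (units consistent with dist_thresh)."""
    remaining = cloud
    for _ in range(iterations):
        _, inlier_idx = remaining.segment_plane(
            distance_threshold=dist_thresh,  # max point-to-plane distance
            ransac_n=3,                      # points per candidate plane
            num_iterations=1000)
        # Drop the inliers of the current principal plane and continue.
        remaining = remaining.select_by_index(inlier_idx, invert=True)
    return remaining

# Usage sketch: isolate the buckled plate once ground/side planes are gone.
# cloud = o3d.io.read_point_cloud("scene.ply")  # hypothetical file
# plate = remove_principal_planes(cloud, dist_thresh=1.0, iterations=3)
```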
Figure 4.28 Plane segmentation for an I-shaped steel plate damper
Once the region to be measured is isolated, the out-of-plane displacements can be quantified
with respect to a predefined reference plane. The results can provide detailed and accurate
out-of-plane displacement distributions, which supply the necessary information for
geometry model updating in field applications, as shown in a recent study by Zhang & Lin (2022).
To limit the scope of this study, the framework proposed in this paper is dedicated to quantifying
the out-of-plane displacement of steel plate structures, while model updating is not further
implemented.
To quantify the out-of-plane displacements of the steel plate structures, users only need to
take photos from one side of the structures (i.e., less than 180 degrees field of view). In this case,
data points will only appear on one surface of the isolated plate. However, if the data collection is
done using 360 degrees field of view, dual layers will exist corresponding to the two surfaces of
the plate. In such a situation, it is required to separate these two layers of points. There are three
options. 1) manually separate the points from two surfaces. 2) adjust the video prior to the 3D
reconstruction such that the extracted frames only contain views from one side of the structure. 3)
use a point cloud clustering method to separate the points from two surfaces. In this study, a
new clustering method, termed DBPPC, is proposed to separate the two layers of points, as shown
in Figure 4.29. The detailed implementation of the DBPPC method is summarized as follows:
• The points on the top-most and bottom-most surfaces of the buckled plate are eliminated
such that these points are not considered in the clustering process.
• Starting from one corner of the remaining point cloud, a sub-cloud is sampled by taking a
relatively small grid step along the vertical direction. The sampled sub-cloud is then
projected onto a 2D plane.
• The DBPPC method is applied to the projected sub-clouds. For each projected sub-cloud,
a point at one end of the buckled plate is sampled first and its k nearest neighbors are
determined such that the Euclidean distance between the point and its furthest neighbor
does not exceed a predefined threshold (e.g., half of the thickness of the plate). These k
points will be grouped into cluster 1 and the rest of the points will be temporarily grouped
into cluster 2. Next, the furthest neighbor point is selected as the new point and its nearest
neighbors are determined from cluster 2 using the same distance threshold. These neighbor
points will again be assigned to cluster 1, and the remaining cloud will be temporarily
assigned to cluster 2. The process is repeated until no more points can be assigned to cluster 1.
• The DBPPC method is implemented on the projected sub-clouds iteratively until all the
points within the entire cloud are successfully grouped into two clusters.
During these processes, point associations between the original points and projected points
will be stored so that the original points can be clustered based on the clustering of their
corresponding projected point indices. The algorithmic implementation of the DBPPC method is
presented in Appendix B. Figure 4.29 shows a comparison of sample point projected locations
when small and large grid step sizes are selected. This shows when the step size is relatively small,
the algorithm can be effectively performed. Finally, two separate clusters of the original points
will be obtained for the quantification of buckling out-of-plane displacements. To quantify the
out-of-plane displacement for the steel plate damper, a reference plane should be selected. The
reference plane may be arbitrarily defined by users, or defined as the undeformed (undamaged)
plane of the plate.
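A simplified sketch of the layer-separation idea follows; it replaces the furthest-neighbor walk with an equivalent flood-fill over a k-d tree, applied to one projected sub-cloud, with the distance threshold set to about half the plate thickness as described above. All names and test data are illustrative.

```python
import numpy as np
from scipy.spatial import cKDTree

# Minimal sketch of the DBPPC layer separation: grow cluster 1 greedily
# from one end of the buckled plate so the walk never jumps across to the
# opposite surface (cross-surface gaps exceed the distance threshold).

def separate_layers(points_2d, start_idx, dist_thresh):
    """points_2d: (N, 2) projected sub-cloud. Returns boolean mask of cluster 1."""
    tree = cKDTree(points_2d)
    in_cluster1 = np.zeros(len(points_2d), dtype=bool)
    frontier = [start_idx]
    while frontier:
        idx = frontier.pop()
        if in_cluster1[idx]:
            continue
        in_cluster1[idx] = True
        # Neighbors closer than the threshold lie on the same surface.
        for j in tree.query_ball_point(points_2d[idx], dist_thresh):
            if not in_cluster1[j]:
                frontier.append(j)
    return in_cluster1  # remaining points form cluster 2

# Two parallel "surfaces" 6 mm apart, threshold = half thickness (3 mm):
xs = np.linspace(0, 100, 200)
layer1 = np.column_stack([xs, np.zeros_like(xs)])
layer2 = np.column_stack([xs, 6 + np.zeros_like(xs)])
mask = separate_layers(np.vstack([layer1, layer2]), start_idx=0, dist_thresh=3)
print(mask[:200].all(), mask[200:].any())  # True False
```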
Figure 4.29 Illustration of DBPPC method
4.3.2.5 Accuracy validation
The reconstruction by structure from motion algorithms can achieve high accuracy in both
field applications and laboratory settings as presented in Section 4.2. To further demonstrate this
in the scenario of reconstructing steel plate structures considered in this research, this section
investigates the accuracy in displacement quantification results of the steel plate damper.
In this study, about 300 images are extracted from a 4K video recorded for the steel damper,
which are used to reconstruct the 3D point cloud. The reconstructed point cloud is then calibrated
to the real-world scale using the ground-controlled points. Next, after multi-view vision-based 3D
object detection and point cloud postprocessing, the buckled plate is isolated. The proposed
DBPPC method is applied to separate the two layers of points of the plate, where the vertical grid
step size is set constant as 10, and the distance threshold is set as 3. As the two clusters lead to
very similar quantification results, one cluster (i.e., the one in the top right corner in Figure 4.30)
is selected to generate a detailed out-of-plane displacement distribution plot. Figure 4.31 presents
the out-of-plane displacement distribution, where the horizontal axis represents the longitudinal
direction of the plate, and the vertical axis represents the vertical direction of the plate. The
quantification results of the steel plate damper are compared with the ground truth results
(measured by a Fowler digital caliper with an accuracy of 0.02 mm) at the four benchmark points
(BPs) of the damper, as presented in Table 4-7. The locations of the four BPs are indicated
in Figure 4.31 and Figure 4.32, where two of the four BPs are selected at the top edge, while the
other two are selected at the bottom edge. The mean error of the benchmark points is estimated to
be 1.07 mm. The maximum error is 1.30 mm, observed at BP 1, and the minimum error is 0.85 mm
at BP 4. This shows the proposed method can quantify the out-of-plane structural displacements
with high accuracy.
Table 4-7 Comparison of the estimated out-of-plane displacements with the ground truth values at four
benchmark points
Figure 4.31 Out-of-plane displacement measurements for the steel plate damper. The units are in [mm]
It should be noted that, depending on the accuracy needed for reconstructing different types
of structures, the number of images will vary. For large-structure applications such as
buildings, more images should be collected. This can be efficiently achieved using drones
equipped with consumer-grade cameras recording video at sufficiently high resolution (e.g.,
1080p or 4K).
4.3.3 Implementation
The proposed method is implemented on a full-scale steel corrugated plate wall.
The image database is established using a smartphone camera with automatic capturing mode.
There are 495 images collected for the steel corrugated plate wall with a resolution of 3840 x 2160
(which were extracted from a 4K video recorded for about 20 seconds). This can also be achieved
efficiently using drones recording videos of reasonably high resolution. In field applications, there
may not be a sufficient open field of view to capture images at 360 degrees. To demonstrate the
effectiveness of the proposed methodology under a relatively small field of view, the video for the
steel corrugated plate wall was recorded within about a 120-degree field of view on one side of
the wall.
As shown in Figure 4.33, with data association, structure from motion, multi-view stereo, and
point cloud preprocessing, the point clouds are successfully reconstructed for the steel corrugated
plate wall. Sample images in the input database and the scene graph are also shown for illustration
purposes. Implementation of the vision-based 3D reconstruction procedures, including data
association, structure from motion, and multi-view stereo, is done in Meshroom (Python API) and
Metashape (Python API), followed by the point cloud preprocessing (cleaning) procedures.
After the proposed 3D object detection method is applied, the point cloud of the steel
corrugated plate wall is extracted. In this case, only one iteration of the plane segmentation is
applied to remove the ground plane successfully, as shown in Figure 4.34. Next, the out-of-plane
displacement quantification results are reported. For convenience, the reference plane is manually
defined as a flat plane (following the centerline of the corrugation) as shown in Figure 4.35. Once
the reference plane is defined, the relative out-of-plane displacements can be determined, as shown
in Figure 4.36. This shows the developed framework can be effectively used to provide detailed
out-of-plane displacement measurements for full-scale steel plate structures.
Figure 4.33 Vision-based 3D reconstruction procedures for the steel corrugated plate wall
Figure 4.34 One iteration of plane segmentation for the steel corrugated panel. The units are in [mm]
Figure 4.35 Illustration of the reference plane for the steel corrugated plate wall. The units are in [mm]
Figure 4.36 Quantification of out-of-plane displacement distribution for the steel corrugated wall panel.
4.3.4 Conclusions
Steel plate structures are commonly used as load-bearing elements and energy dissipation
devices. Buckling is one of the dominant damage types experienced by many steel plate
components. In this section, a 3D vision-based pipeline has been developed and implemented to
quantify out-of-plane damage for steel plate structures. The framework consists of a 3D vision-
based scene reconstruction pipeline, a newly proposed multi-view 3D object detection method,
and point cloud postprocessing methods including a newly proposed DBPPC algorithm. The
results indicate the proposed framework can successfully reconstruct the steel plate structures,
effectively localize the steel components from a 3D scene, and accurately quantify the out-of-plane
displacements with an accuracy of ~1 mm. The main contributions and novelties of the section are
summarized as follows: a) a 3D vision-based quantification pipeline is developed for the
out-of-plane buckling damage of steel plate structures; b) a multi-view CNN-based 3D object
detection method is implemented to detect structural components; c) the proposed method
provides a non-contact measurement approach and a more economical solution in both equipment
and setup cost than traditional measurement devices; d) the proposed method provides finer
measurement results compared to traditional contact-type displacement sensors, which measure
displacement only at limited sensor locations; e) the proposed method can provide the information
needed to perform geometry model updating of steel plate structures.
There are certain limitations in this study; therefore, recommendations for further studies are
provided. The proposed framework has been examined on common steel plate structures whose
primary failure mode is buckling. In the future, the damage quantification algorithms should be
further developed and validated for more types of steel structures and damage scenarios, such as
a combination of buckling, fracture, and other damage types.
4.4 Vision-based SHM methods for structural bolted components
4.4.1 Introduction
Structural bolts are critical parts that connect structural elements in place. Structural
components such as beam-column joints and column-base connections can experience complete
failure if the bolts get loosened to a certain level, which may result in a catastrophic system-level
collapse. Besides, some of the innovative energy dissipation devices such as friction dampers
heavily rely on the bolts to generate the desired friction force. The seismic energy absorption
potential of such damping devices will deteriorate with the loosening of bolts, and consequently,
affect the global performance of the building. Therefore, robust monitoring methods should be
developed to detect damages in bolted components, and if the damage has been found, repair or
replacement actions should be applied to maintain the structural integrity, prior to extreme natural
hazards.
Earlier, traditional structural health monitoring (SHM) methods (Wang et al., 2013) were
developed based on contact-type sensors. These methods identify damage based on the structural
modal properties (i.e.,
stiffness and damping), which are related to natural frequencies and mode shapes. The contact
sensor-based SHM methods to identify bolt loosening have also been developed in recent years
(Yang & Chang, 2006; Wang et al., 2013; Sevillano, Sun, & Perera, 2016). However, these
contact sensor-based methods have several limitations. Contact sensors are unreliable when
subjected to changes of environmental conditions, such as temperature and humidity, which could
lead to false detection (Xia et al., 2012; Li, Deng, & Xie, 2015). These methods require dedicated
experts to set up the sensors, high-precision instrumentation, and a software package to account
for environmental variation effects (Huynh, & Kim, 2017; Huynh, & Kim, 2018). Moreover, in
the case of bolt loosening detection, these methods could recognize damage in the bolted
assemblies, but could not precisely localize the loosened bolts (Ramana, Choi, & Cha, 2019). Such
methods are labor-intensive, expensive, and may be impractical in real-world applications for
assessing bolt loosening in a device with a large number of bolts of different types, which would
require many sensors and substantial instrumentation.
In recent years, vision-based SHM has evolved as a reliable and efficient approach for
structural damage assessment. In comparison to contact-type sensors, vision-based methods offer
non-contact detection, low sensor cost, and easier
installation and operation. For example, a simple commercial-grade camera can be easily fix-
mounted to a beam-column bolted connection and capture multiple bolts at the same time. The
image or video data can be acquired wirelessly in real time, and efficiently processed and analyzed
by modern consumer-grade computers. In addition, due to the visual nature of cameras, the damage
can be intuitively visualized and verified.
In recent years, convolutional neural networks (CNNs), which fall into a category of deep
neural networks (or deep learning), have been shown to prominently outperform traditional image
processing techniques (IPTs). CNN-based vision methods have been effectively implemented in
the damage detection of various structural components (e.g., Cha, Choi, & Büyüköztürk, 2017;
Xu, Gui, & Han, 2020; Azimi, & Pekcan, 2020; Miao, Ji, Okazaki, & Takahashi, 2021; Gao, Zhai,
& Mosalam, 2021; Sajedi & Liang, 2021). To date, only a limited number of vision-based studies
on bolt loosening detection have been conducted. In summary, most existing studies on
vision-based bolt loosening detection have been developed upon 2D computer vision and are
generally categorized into front view-based and side view-based detection methods. Front
view-based detection methods localize the bolts in an image captured from the front view,
followed by quantification of the bolt loosening rotation angle.
Many existing front view-based detection methods rely on the Hough Transform (HT) algorithm
(Duda & Hart, 1972), such as the work by Park, Kim, & Kim (2015) or Park, Huynh, Choi, &
Kim (2015). Kong & Li (2018) adopted an image registration approach for bolt loosening angle
estimation. Huynh et al. (2019) employed HT and R-CNN algorithms to estimate the bolt rotation
angle, which was validated on a full-scale bridge connection using an unmanned aerial vehicle. Ta
and Kim (2020) further applied similar algorithms to detect a combination of bolt loosening and
corrosion. Zhao and Wang (2019) applied a dual-class single shot detector to localize the bolt and
a specific symbol on the bolt simultaneously, where the bounding box locations are used to
quantify the bolt rotation angle. Although the results of these studies have been promising, their
limitations can be summarized as follows:
a) The methodology relied on the HT algorithm to detect lines and circles, which may not
perform well when the bolt assembly becomes more complicated, with the existence of washers,
or in the situation of light reflection on the bolts, shadows cast by surrounding objects on the
bolts, or other background noise.
b) In the HT-based studies, the estimation of rotation angle is conducted on two static images,
using geometric transformation analysis of edge lines of bolts. This only works effectively when
the rotation angle between these two images is less than 60 degrees, due to the geometric nature
of the hexagon-shaped bolt nuts examined. This constraint may not be satisfied in real-world
scenarios when structures are experiencing severe shaking (due to major earthquakes) and the bolts
may rotate by more than 60 degrees between measurements.
c) Although the RCNN methods implemented in some of these studies provide reasonable
accuracy with proper training, the speed of RCNN is too slow for real-time applications (Redmon
& Farhadi, 2018). To be more specific, the rotational speed of bolts under severe earthquake
shaking can be relatively high (e.g., up to or even over 90 degrees per second). In order to ensure
the tracking method can track bolt rotation, the rotation of bolts must be less than 60
degrees between two adjacent video frames. This means the minimum processing speed for
real-time detection and tracking of bolt rotation is 1.5 FPS (= 90 degrees per second / 60 degrees
per frame), which RCNN cannot reliably achieve on consumer-grade computers.
d) In the study presented by Zhao and Wang (2019), the dual-class single shot detector is
trained to detect a specific symbol on the bolt. However, different types of bolts have different
symbols. The trained detector cannot be directly applied to other types of bolts without excessive
training for many different bolts. Moreover, this method is not very accurate, because the predicted
bounding box is typically subjected to jittering, which cannot precisely localize the symbols and
bolts.
On the other hand, side view-based detection methods localize bolts and quantify bolt
loosening in images captured from the side view. The bolt loosening length along the longitudinal
direction of the bolt can be estimated. In recent years, side view-based detection methods have
been implemented, such as the integration of HT and support vector machine for bolt loosening
quantification (Cha et al., 2016; Ramana et al., 2017; Ramana et al., 2019). These studies were
more focused on qualitative evaluation by localizing the bolts in an image and assigning tight or
loose labels, while no further quantifications were implemented. Zhang et al. (2020) employed
Faster R-CNN to localize tight
and loose bolts, and quantify the extension length of the bolt loosening. While this method shows
the capability of localizing the bolts, the extension length quantification requires the camera to be
placed at an appropriate angle to achieve high accuracy. Besides, a reference ruler is used in this
method to aid the quantification process. Although this can be practically done, it requires more
human interventions. More recently, Zhang and Yuen (2021) proposed a bolt loosening
quantification method using an orientation-aware bounding box approach, which can address
multiple orientations of bolt assemblies in the image. The method uses the aspect ratio of the
loosened bolts as the looseness quantification metric. However, the aspect ratio does not provide
a direct measurement of the longitudinal loosened length. Moreover, the experiments in this study
were conducted on bird's-eye view images, which means the quantification results need to be
further transformed to recover the actual loosened length along the bolt axis.
4.4.2 Overview and application scenarios of the proposed bolt loosening quantification
methodologies
To address these issues, in this research, two methodologies are proposed. The first method is
a front view-based method, while the second method is a full-view (i.e., full 3D view) method.
The first method is built upon 2D vision methods, which should be considered as long-term
monitoring solutions for bolts. In this case, cameras are assumed to be preinstalled at appropriate
locations and facing towards the front face of the bolt cap. The first proposed method is aimed at
addressing the limitations of the existing front view-based methods.
On the other hand, the second method is built upon 3D vision-based evaluation pipeline, which
reconstructs the bolted devices from 2D images captured. The application of such a method should
be considered for post-disaster inspection where photos or videos of the bolted devices can be
taken at multiple locations and angles. This can be conducted manually, or more efficiently by
UAVs or UGVs equipped with cameras recording videos at high frame rates (e.g., 30/60 frames
per second). The second proposed method is aimed at addressing the limitations of the existing
side view-based methods.
To address the limitations of the front view-based methods, a real-time detection and tracking
method, named RTDT-bolt, for bolt rotation monitoring is proposed. The proposed method is the
first of its kind to
detect and track bolt rotation interactively, using vision-based techniques. The procedures have
been briefly summarized herein, and the implementation details will be explained in Section 4.4.5.
First, the object detection algorithm, YOLOv3-tiny, which was built upon the original architecture
of YOLOv3 (Redmon, & Farhadi, 2018), was trained for accurate real-time bolt localization in an
image, under various lighting conditions. YOLOv3-tiny has a reduced depth of the convolutional
layers compared to the original YOLOv3, thus greatly improving the detection speed, while still
maintaining competitive accuracy. Second, feature points (FPs) are generated using the Shi-
Tomasi corner detection algorithm (Shi & Tomasi, 1994) within the regions of interest (ROIs)
generated by the detector, and tracked from frame to frame using the optical-flow KLT
feature-tracking algorithm (Tomasi & Kanade, 1991). The geometric transformation analysis
based on the
M-estimator Sample Consensus (MSAC) algorithm (Torr, & Zisserman, 2000) is then developed
to estimate the frame-to-frame rotation. Third, the optical flow algorithm tends to fail in the
presence of sudden changes in pixel values, likely due to illumination changes and accumulated
errors from background noise during tracking (Nixon & Aguado, 2019); to address this issue, the
tracking algorithm is combined with the YOLOv3-tiny algorithm to re-detect the
target when the tracking gets lost. The tracking continues with the new FPs generated every time
when the new detection is imposed. The proposed method can allow the users to continuously
track the bolt rotation in real time. To demonstrate the effectiveness and examine the potential
limitations of the proposed method, extensive parameter studies have been conducted, including
the number of image pyramid levels (NP), bi-directional error threshold (BE), search block size
(BS), and the maximum number of iterations during tracking (NI). Details of such parameters
will be discussed in the following sections.
It should be noted that the traditional HT-based method may potentially be used to monitor
the total rotation greater than 60 degrees, provided that the HT algorithm can reliably identify the
bolt edges, and also the incremental rotation of the bolt between two adjacent frames is less than
60 degrees. However, the processing of the HT algorithm to detect edges of the bolt may not
perform well in the relatively complicated situations aforementioned. Even if the HT algorithm
can accurately detect the edges of the bolt in every frame, the processing of such an algorithm on
all the frames is time-consuming, compared to the optical flow tracking algorithms (Nixon, &
Aguado, 2019).
Figure 4.37 depicts the integrated method, RTDT-bolt, to robustly monitor the rotation of
structural bolts. First, YOLOv3 has been adopted and modified to create a new version of
YOLOv3-tiny to localize the bolts with ROI bounding boxes, in the 1st video frame (Pan & Yang,
2021). The Shi-Tomasi algorithm is then employed to identify high-quality FPs within the ROIs
for tracking purposes. Second, the optical-flow KLT feature-tracking algorithm is applied to track
the FPs generated, from frame to frame. Third, the YOLOv3-tiny algorithm will be integrated with
the KLT tracking algorithm to ensure the high performance of tracking. The YOLOv3-tiny
detector will generate new ROIs for the bolts, if the number of FPs being tracked falls below a
certain threshold (e.g., less than 50% of the initial number of FPs identified). This will not only
eliminate the loss of tracking problem due to external environment changes, such as changes in
lighting conditions, but also effectively reduce the accumulated error from long-time tracking.
Lastly, the total rotation angle of the bolt can be calculated as the sum of the rotation angle
determined at each detect-track interval. Specific details of these procedures are discussed in the
following sections. A minimal sketch of this detect-track loop is given below.
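The sketch below strings the loop together with standard OpenCV calls; `detect_bolts` stands in for the trained YOLOv3-tiny model (a hypothetical helper returning integer boxes), and the corner counts and thresholds are illustrative rather than the study's exact settings.

```python
import cv2
import numpy as np

# Minimal sketch of the RTDT-bolt loop: detect -> track -> re-detect when
# the tracked feature points (FPs) fall below a fraction of the initial set.

def track_bolt_rotation(video_path, detect_bolts, min_fp_ratio=0.5):
    """detect_bolts(frame) -> list of (x, y, w, h) integer boxes
    (hypothetical wrapper around the trained YOLOv3-tiny model)."""
    cap = cv2.VideoCapture(video_path)
    ok, frame = cap.read()
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    def fresh_points(g, box):
        # Restrict Shi-Tomasi corners to the bolt's ROI.
        x, y, w, h = box
        mask = np.zeros_like(g)
        mask[y:y + h, x:x + w] = 255
        return cv2.goodFeaturesToTrack(g, 200, 0.01, 5, mask=mask)

    pts = fresh_points(gray, detect_bolts(frame)[0])
    n0, total_angle = len(pts), 0.0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray2 = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        new_pts, status, _ = cv2.calcOpticalFlowPyrLK(gray, gray2, pts, None)
        st = status.flatten() == 1
        good_old, good_new = pts[st], new_pts[st]
        # Frame-to-frame rotation from a RANSAC/MSAC-fitted similarity transform.
        M, _ = cv2.estimateAffinePartial2D(good_old, good_new, method=cv2.RANSAC)
        if M is not None:
            total_angle += np.degrees(np.arctan2(M[1, 0], M[0, 0]))
        if len(good_new) < min_fp_ratio * n0:
            # Tracking degraded: re-detect the bolt and regenerate fresh FPs.
            good_new = fresh_points(gray2, detect_bolts(frame)[0])
            n0 = len(good_new)
        pts, gray = good_new.reshape(-1, 1, 2), gray2
    return total_angle
```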
Figure 4.37 Flowchart of the RTDT-Bolt method
In this section, the architecture of the YOLOv3-tiny detector for bolt localization is presented
(Figure 4.38). The architecture consists of Convolution, Batch Normalization, Leaky ReLU
(Conv-BN-Leaky ReLU) blocks, and max
pooling layers. The input image will be resized to 416 x 416 in width and height, before entering
the networks for training. During training, YOLOv3-tiny divides the image into 26x26 grid cells.
Each grid cell has 3 anchor boxes, of which each has an object score, multiple class scores
(depending on the number of classes being detected), and 4 bounding box coordinates. In this case,
the number of classes is one, for structural bolts. Consequently, the output of YOLOv3-tiny has a
dimension of 26 x 26 x 18, where '26' represents the number of grid cells along each dimension of the output, and '18' represents, for each of the three anchor boxes, the class score, object score, and bounding box values (i.e., 1 class score for bolt + 4 bounding box values + 1 object score = 6 values; 6 values x 3 anchor boxes = 18 values).
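The channel count generalizes to (number of anchors) x (4 bounding box values + 1 object score + number of class scores); a trivial MATLAB check of the arithmetic above:

% Output channel count of the YOLOv3-tiny detection head for the single-class bolt detector.
numClasses = 1;   % only 'bolt'
numAnchors = 3;   % anchor boxes per grid cell
gridSize   = 26;  % grid cells along each dimension
channels   = numAnchors * (4 + 1 + numClasses);   % 4 bbox + 1 objectness + class scores = 18
fprintf('output dimension: %d x %d x %d\n', gridSize, gridSize, channels);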
Figure 4.38 Architecture of the YOLOv3-tiny object detector
The training data were collected in the structural laboratory at The University of British
Columbia. The friction damping device recently developed at UBC was selected to demonstrate
the integrated method (Figure 4.39). First, bolt rotation was achieved by rotating the bolt on the
backside of the damping device. Meanwhile, the videos of bolt rotation were recorded by the
iPhone Xs Max smartphone device on the front side, at 4K video quality settings. The phone
camera was placed at various angles in order to generate more variety in the dataset. Then, the
video frames were processed and extracted to generate the training images for YOLOv3-tiny.
Standard data augmentation techniques such as cropping, horizontal flipping, small translations,
rotation, and small scaling were also applied such that the object being localized is still included in
the augmented images. In this case, 3808 images were generated from the original frames of all
video files.
Figure 4.39 Image of the experimental bolted component
The proposed integrated method is designed to deal with illumination changes, whose effects have rarely been studied in existing vision-based structural engineering research but which can happen quite often in real-world situations. In order to enhance the data variety and the robustness of the detector against illumination changes, a lighting augmentation approach (Chaichulee et al., 2017) is applied to generate three different lighting
conditions for an image. The procedures are briefly explained as follows: a) convert the image
from red-green-blue (RGB) space to hue-saturation-lightness (HSL) color space; b) obtain the
histogram of average lightness value of all the original images; c) evenly divide the histogram into
3 sections and compute the mean of lightness for each section; d) for each original image falling
in one section, two more images are generated by scaling its average lightness to the other two
mean lightness levels of the other two sections, respectively. Details of this augmentation method are described in Chaichulee et al. (2017). In this regard, the database after the lighting augmentation contains 11424 images, i.e., three lighting variants of each of the 3808 original images.
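As a minimal MATLAB sketch of this lighting augmentation (an assumption-laden illustration, not the exact implementation used in this study): base MATLAB offers HSV rather than HSL conversion, so the V channel stands in for lightness, and the folder and file names are placeholders.

% Sketch of the three-level lighting augmentation. Assumptions: the HSV V
% channel approximates HSL lightness; 'frames' is a hypothetical image folder.
files = dir(fullfile('frames', '*.jpg'));
n = numel(files);
avgL = zeros(n, 1);
for i = 1:n                                        % step b): average lightness per image
    hsv = rgb2hsv(im2double(imread(fullfile('frames', files(i).name))));
    avgL(i) = mean(hsv(:,:,3), 'all');
end
edges = linspace(min(avgL), max(avgL), 4);         % step c): three equal sections
secMean = zeros(3, 1);
for s = 1:3
    secMean(s) = mean(avgL(discretize(avgL, edges) == s));
end
for i = 1:n                                        % step d): two extra lighting variants
    hsv = rgb2hsv(im2double(imread(fullfile('frames', files(i).name))));
    mySec = min(3, discretize(avgL(i), edges));    % section of this image
    for s = setdiff(1:3, mySec)
        out = hsv;
        out(:,:,3) = min(out(:,:,3) * (secMean(s) / avgL(i)), 1);  % rescale lightness
        imwrite(hsv2rgb(out), fullfile('frames', sprintf('aug_%d_%d.jpg', i, s)));
    end
end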
In the end, 70% of the augmented image database is assigned as the training dataset and the
rest is selected as the testing dataset. Consequently, 11424 × 0.7 ≈ 7997 and 11424 × 0.3 ≈ 3427 images are used for training and testing, respectively. The images are resized
to 416 × 416, before being input into the networks for training. Then, the selection of anchor boxes
is conducted for training data using the methodology presented in Redmon and Farhadi (2018) and
Pan and Yang (2020). The anchor box dimensions will be utilized by YOLOv3-tiny to predict the
bounding box location for objects in input images. The training of YOLOv3-tiny was implemented
by back-propagation and stochastic gradient descent with momentum (SGDM), where the learning
rate was chosen as 0.001, the mini-batch size was chosen as 6, and a maximum number of training epochs was specified.
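For reference, a sketch of how the anchor-box selection and the stated SGDM settings map onto MATLAB's toolbox functions is given below; the datastore variable and the epoch count are placeholders, not values reported in this study.

% Anchor estimation and training options (sketch). Assumption: 'blds' is a
% boxLabelDatastore over the labelled bolt bounding boxes.
numAnchors = 6;
[anchorBoxes, meanIoU] = estimateAnchorBoxes(blds, numAnchors);   % clustering-based selection

opts = trainingOptions('sgdm', ...
    'InitialLearnRate', 0.001, ...   % learning rate used in this study
    'MiniBatchSize',    6, ...       % mini-batch size used in this study
    'MaxEpochs',        80, ...      % placeholder: epoch count not reported here
    'Shuffle',          'every-epoch');
% The YOLOv3-tiny network is then trained with these options following the
% toolbox's YOLO v3 training workflow.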
In addition, to demonstrate the speed and accuracy of the YOLOv3-tiny against the existing
object detection algorithms, the RCNN and YOLOv3 detection algorithms are also implemented for comparison. The RCNN algorithm is built on AlexNet, which is adopted from Ta and Kim (2020).
Similarly, the training of RCNN was implemented by back-propagation and stochastic gradient
descent (SGD), where the learning rate was chosen as 0.000001, the mini-batch size was chosen
as 32 and the maximum number of training epochs was set to 10. The YOLOv3 algorithm is built
on Darknet-53, which is adopted from Redmon and Farhadi (2018). The training setting of
YOLOv3 is the same as that of YOLOv3-tiny. All the training was implemented in MATLAB
R2021a (MATLAB, 2021) on two computers: a Lenovo Legion Y740 (a Core i7-8750H @ 2.20 GHz, 16 GB DDR4 memory and an 8 GB GeForce RTX 2070 Max-Q GPU) and an Alienware Aurora R8 (a Core i7-9700K @ 3.60 GHz, 16 GB DDR4 memory and an 8 GB GeForce RTX 2070 GPU).
In this section, the integrated RTDT-Bolt method, for robust detection and tracking of bolt
loosening is described. Earlier research (Park et al., 2015b; Huynh et al., 2019; Ta, & Kim, 2020)
has demonstrated the effectiveness of the vision-based methods for bolt rotation estimation.
Although their results are promising, these studies have the limitations mentioned above. In particular, the HT algorithm employed in these studies may not be able to accurately detect lines and circles in complex images, e.g., when the image contains washers, light reflections, or shades of neighboring objects. To demonstrate this, the HT algorithm using three different edge detection methods (i.e., Canny, Prewitt and Log) is applied to three types of
images, including the original image, smoothed image, and sharpened image. The original image
is cropped from an image of the friction damping device presented. The smoothed image is
obtained by applying the Gaussian image filter with a standard deviation of two to the original
image, while the sharpened image is generated by subtracting a blurred (unsharp) variant of the
image from itself. The original, blurred and sharpened images were processed by the three methods
(i.e., Canny, Prewitt and Log), respectively, to identify edges in the images. Then, the HT
algorithm is used to identify the straight line edges, which is achieved by collecting the votes of
the identified edges by the three methods in the Hough space and selecting the highest votes as the
fitted line edges. The procedures were implemented in MATLAB R2021a, where the edge
sensitivity threshold for the three HT methods is set as [0.1 0.8], 0.05, and 0.004, respectively.
Results in Figure 4.40 and Figure 4.41 indicate that the identification of the hexagon-shaped edges
of the bolts in our experiments is difficult using HT methods. The neighboring object(s), circular-
shaped washer, shades, and background noise can cause the algorithm to fail in identifying bolt
edges. Besides, although the RCNN method employed by some of these studies provides good
localization accuracy, the architecture of the RCNN is relatively heavy. Hence, its speed is too
slow for applications where real-time performance is desired. On the other hand, the effectiveness of optical-
flow-based tracking algorithms has been demonstrated in vision-based structural motion tracking
(Ji, & Chang, 2008; Chen et al., 2015; Zheng, Shao, Racic, & Brownjohn, 2016; Cha, Chen, &
Büyüköztürk, 2017; Kuddus et al., 2019). Two main limitations have been identified. First, these
studies focused on the extraction of horizontal or vertical translations; the investigation of rotation estimation of structural components that exhibit rotational behavior, such as bolts, remains very rare.
Second, although these studies have shown promising results, there existed several challenges,
such as outdoor lighting conditions. In essence, the optical flow methods are known to work well
for tracking objects that have a rigid-body profile and distinct visual texture, but they tend to fail in
the situation of sudden external environment changes, such as a change in outdoor lighting
conditions, light reflection, or shades of neighboring objects (Nixon, & Aguado, 2019).
Figure 4.40 Image preprocessing of the structural bolts.
Figure 4.41 Hough transformation of the original image, smoothed image, and sharpened image, using the Canny, Prewitt and Log edge detection methods
The proposed RTDT-Bolt method aims to address these issues. The scope herein is to: a) achieve real-time performance in both detection and tracking; b) provide solutions to measure the rotation of bolts up to any range; c) enhance the robustness of traditional KLT tracking algorithms against illumination changes and background noise. Detailed implementation of the proposed method is as follows. First, the pretrained YOLOv3-tiny detector is applied to generate the bounding box (i.e., ROI) for each bolt in the 1st frame of the video. Second, these
ROIs will be extracted from the original video frame and the Shi-Tomasi algorithm is applied to
generate FPs inside the ROIs. This step is essential to eliminate the need to process the entire video
frame, but rather focus on the ROIs, thus greatly reducing the computational burden. There are
two essential parameters involved in the Shi-Tomasi algorithm, including the minimum quality
measure, and the Gaussian filter dimension. The minimum quality measure determines the
minimum threshold below which the FPs will be discarded. It is recommended to set a reasonably
large value to eliminate low-quality points. The Gaussian filter dimension determines the
dimension of the Gaussian filter used to smooth the gradient of the input image. In this study, the
minimum quality of the Shi-Tomasi FP generation is set as 0.2, while the Gaussian filter dimension
is set as 5.
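In MATLAB's Computer Vision Toolbox, this step corresponds directly to detectMinEigenFeatures; a minimal sketch with the parameter values above is given below (the frame and ROI variables are placeholders):

% Shi-Tomasi feature-point generation inside a detected ROI (sketch).
gray    = im2gray(frame);                  % 'frame': current video frame
corners = detectMinEigenFeatures(gray, ...
    'ROI',        roi, ...                 % [x y w h] bounding box from YOLOv3-tiny
    'MinQuality', 0.2, ...                 % minimum quality measure
    'FilterSize', 5);                      % Gaussian filter dimension
fps0 = corners.Location;                   % M-by-2 array of feature points for tracking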
Further, the KLT tracking algorithm is initialized on the FPs generated. As the tracking moves
forward, some FPs may get lost, due to external environmental changes such as lighting conditions,
the variation of background noise, or the change of the relative location of cameras with respect to
the bolts under severe earthquake shaking. If the number of the FPs being tracked falls below a predefined threshold, the tracking is considered lost. When this occurs, the real-time detector, YOLOv3-tiny, is re-applied to the specific frame where the tracking gets lost to generate new
ROIs again. New FPs will be generated inside the ROIs and the tracking continues on the new FPs
in the same way as before. The above detect-track steps are repeated whenever the tracking is
considered lost throughout the videos. Meanwhile, the geometric transformation matrix about the
origin, 𝑇, can be evaluated based on the locations of FPs obtained by the KLT tracking algorithm
between two adjacent frames (MathWorks, 2021). Similar to Kuddus et al. (2019), the MSAC
algorithm (Torr & Zisserman, 2000) is applied to remove outliers in this step. Then, the rotation
angle can be extracted from the transformation matrix using the following steps. Consider a rigid-body object in the MATLAB image coordinate system that is rotated by an angle $\theta$ about the origin, and translated by $t_x$ pixels in the horizontal direction and $t_y$ pixels in the vertical direction, which can be expressed by:

$$[x_{i+1} \quad y_{i+1} \quad 1] = [x_i \quad y_i \quad 1]\, T \tag{4.7}$$

where $x_{i+1}$, $y_{i+1}$ are the horizontal and vertical pixel coordinates, respectively, of a feature point on the object in the $(i+1)$th frame. Similarly, $x_i$ and $y_i$ are those in the $i$th frame. $T$ is the transformation matrix about the origin:

$$T = \begin{bmatrix} \cos\theta & \sin\theta & 0 \\ -\sin\theta & \cos\theta & 0 \\ t_x & t_y & 1 \end{bmatrix} \tag{4.8}$$
Further, the transformation matrix $T^*$ with respect to an arbitrary point $(a, b)$ (e.g., the center of rotation) can be obtained as:

$$[x_{i+1} \quad y_{i+1} \quad 1] = [x_i \quad y_i \quad 1]\left(\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -a & -b & 1 \end{bmatrix} T + \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ a & b & 0 \end{bmatrix}\right) = [x_i \quad y_i \quad 1]\, T^* \tag{4.10}$$

where

$$T^* = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -a & -b & 1 \end{bmatrix} T + \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ a & b & 0 \end{bmatrix} \tag{4.11}$$
It is observed that the 1st row and 2nd row of $T^*$ are the same as those of $T$. Therefore, the incremental bolt rotation angle can be easily extracted from the estimated transformation matrix $T$.
Finally, the incremental rotation estimated at each interval is summed up to determine the total rotation angle of the bolt, $\varphi$. The time history of the rotation can also be generated. These procedures are repeated for every detect-track interval throughout the video.
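Under the row-vector convention of Equations (4.8)-(4.11), the detect-track loop and the angle accumulation can be sketched in MATLAB as follows. This is a simplified illustration, not the exact implementation: 'detector' is assumed to be a pretrained detector object exposing the toolbox detect() method, the file name and re-detection threshold are placeholders, and estimateGeometricTransform applies MSAC internally to reject outliers.

% Sketch of the RTDT-Bolt detect-track loop with rotation accumulation.
v = VideoReader('bolt_video.mp4');                    % hypothetical input video
frame = readFrame(v);
bbox  = detect(detector, frame);                      % ROI of the bolt in the 1st frame
c     = detectMinEigenFeatures(im2gray(frame), 'ROI', bbox(1,:), ...
            'MinQuality', 0.2, 'FilterSize', 5);
tracker = vision.PointTracker('MaxBidirectionalError', 2);
initialize(tracker, c.Location, frame);
prevPts = c.Location;
phi = 0;  minPts = 7;                                 % re-detection threshold (placeholder)
while hasFrame(v)
    frame = readFrame(v);
    [pts, valid] = tracker(frame);
    if nnz(valid) < minPts                            % tracking considered lost
        bbox = detect(detector, frame);               % re-detect and re-initialize
        c    = detectMinEigenFeatures(im2gray(frame), 'ROI', bbox(1,:), ...
                   'MinQuality', 0.2, 'FilterSize', 5);
        setPoints(tracker, c.Location);
        prevPts = c.Location;
        continue
    end
    % Similarity transform between adjacent frames, per the [x y 1]*T convention.
    tform  = estimateGeometricTransform(prevPts(valid,:), pts(valid,:), 'similarity');
    dTheta = atan2(tform.T(1,2), tform.T(1,1));       % incremental rotation, cf. Eq. (4.8)
    phi    = phi + dTheta;                            % accumulate total rotation
    prevPts = pts;                                    % keep indices aligned with the tracker
end
fprintf('total rotation: %.2f rad\n', phi);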
To assess the feasibility of the proposed integrated method, it is necessary to obtain the
ground-truth value for the rotation of the bolt. This can be done using the following simple steps:
a) manually label the line edges of the bolts, at an appropriate interval of video frames such that
the rotation experienced by the bolt does not exceed 60 degrees within each interval; b) similar to
the method presented by Ta and Kim (2020), apply geometric transformation method for all the
labeled line edges, and compute the rotation of the bolt, 𝜃𝐺𝑇,𝑗 , as the mean rotation of all the line
edges, for each interval, $j$; c) sum up the rotation over all intervals to determine the total ground-truth rotation:

$$\varphi_{GT} = \sum_{j=1}^{n} \theta_{GT,j}, \qquad n = \text{number of intervals} \tag{4.12}$$
In the end, the accuracy of the bolt rotation estimation, in percentage, can be determined as follows:

$$\text{Accuracy} = \max\left(0,\; 1 - \left|\frac{\varphi - \varphi_{GT}}{\varphi_{GT}}\right|\right) \tag{4.13}$$

where $\varphi$ is the total rotation angle estimated by the RTDT-Bolt method, and $\varphi_{GT}$ is the ground-truth total rotation angle.
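Equation (4.13) translates to a one-line MATLAB check (phi and phiGT in radians):

% Accuracy of the rotation estimate per Eq. (4.13).
accuracy = max(0, 1 - abs((phi - phiGT) / phiGT));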
In order to examine the capability and potential limitations of the proposed RTDT-Bolt
method, extensive parameter studies are conducted. In this section, the scope of the parameter
studies is described. The sensitivity of the results to the selection of detection and tracking parameters is examined.
Given that the KLT algorithm is the essential part of the proposed RTDT-Bolt method, the
parameters of the KLT algorithm are investigated with the values shown in Table 4-8. These parameters are described below; a configuration sketch follows the list.
a) Number of pyramid levels (NP): The KLT tracking algorithm generates an image pyramid,
where each subsequent level of the pyramid decreases in resolution by a factor of two compared
to the previous level. If the number of pyramid levels is set to greater than one, the algorithm tracks
the points at multiple levels of resolutions, which may potentially enhance the tracking
effectiveness. However, as the computational cost increases with the increase in the number of
pyramid levels, it is recommended to select an appropriate value to balance between speed and
accuracy.
b) Bi-directional error threshold (BE): The bi-directional error is calculated based on the FPs
in the two adjacent frames. Essentially, the algorithm conducts forward-backward tracking. It
tracks the FPs from the preceding frame to the current frame, and then traces back the same FPs
to the previous frame. The bi-directional error represents the pixel distance in the image coordinate
system, between the original location of the points and the backward-tracing location. The FPs
will be abandoned when the error associated with them is greater than the threshold.
c) Search block size (BS): This metric determines the neighboring area searched around each point being tracked.
d) Maximum number of iterations (NI): This parameter is the maximum number of iterative
searches performed by the KLT algorithm to determine the new location of each FP until it
converges.
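For reference, the four parameters map one-to-one onto the options of MATLAB's vision.PointTracker; the values below are illustrative rather than the base-model values:

% KLT tracker configuration exposing the four study parameters.
tracker = vision.PointTracker( ...
    'NumPyramidLevels',      3, ...        % NP: image pyramid levels
    'MaxBidirectionalError', 2, ...        % BE: forward-backward error threshold [pixels]
    'BlockSize',             [11 11], ...  % BS: neighbourhood searched around each point
    'MaxIterations',         30);          % NI: iteration cap per point per frame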
The parameter studies are initiated from a base model, whose selection of values for each parameter is shown in Table 4-8.

As discussed in Section 4.4.1, existing side view-based quantification methods developed in 2D computer vision are sensitive to the camera viewing angle. To address this limitation, the proposed methodology consists of vision-based 3D reconstruction, multi-view CNN-based detection of structural bolted devices, and bolt loosening
length quantification, as shown in Figure 4.42. Image sources for 3D reconstruction can be
obtained using common consumer-grade cameras such as smartphone cameras, or unmanned aerial
vehicles (UAVs) with appropriate camera specs. In this study, a structural bolted device with
loosened bolts at UBC structural laboratory is selected to examine the effectiveness of the
proposed methodology. Other similar experimental setups have been proposed in recent years (Cha
et al., 2016; Ramana et al., 2017; Ramana et al., 2019; Zhang et al., 2020; Zhang & Yuen, 2021).
At the time of writing, to the best of the author’s knowledge, the proposed methodology is the
first-of-its-kind using 3D vision, multi-view CNN-based methods, and advanced point cloud
processing, for bolt loosening quantification in the structural engineering field. The proposed
methodology is fully automated and does not require a specific camera viewing angle to provide quantification results, as opposed to the existing 2D vision-based methods described in Section 4.4.1.
The procedures for the 3D reconstruction of the bolted device are similar to those presented
in Section 4.2 and Section 4.3. As shown in Figure 4.43, the procedures consist of data association, structure from motion, which determines a sparse point cloud of the plate structure from the estimated camera poses, and multi-view stereo, which generates a dense point cloud of the plate structure based on the sparse point cloud and the input RGB images. Details of the 3D reconstruction pipeline have been described earlier in this dissertation.
Figure 4.43 Vision-based 3D reconstruction of the structural bolted device
4.4.4.2 Multi-view structural bolted device detection
Once the scene point cloud is generated, a multi-view object detection method based on
YOLOv3-tiny is applied, as previously presented in Section 4.3.2. In this case, the YOLOv3-tiny
is retrained to localize the structural bolted device. The scene cloud is projected to two predefined
views where the YOLOv3-tiny is applied to generate the bounding box on the rendered views,
respectively. The generated bounding boxes on the two view planes will be fused to extract the
point cloud of the structural bolted component from the original 3D scene cloud (Figure 4.44).
Figure 4.44 Structural bolted device localization
4.4.4.3 Structural bolt loosening quantification
This section presents a fully automated bolt looseness quantification method, which consists
of a front-view bolt localization method to find a reference plane, followed by a side-view bolt loosening length quantification.

This section describes the bolt localization method, which will provide a reference for bolt
loosening localization in the subsequent section. In this study, localization of the bolts is achieved
by applying CNN-based vision methods on an image which shows a front view of the bolts. This
requires the projection of the point cloud onto the plane which provides a clear front view of the
bolts. Plane fitting is implemented using the MSAC algorithm (Torr & Zisserman, 2000) to
identify multiple candidate planes of the bolted device. Next, the point cloud is projected onto these detected planes, which leads to multiple rendered 2D images. In order to identify the plane which
contains the front face of the bolts, a YOLOv3-tiny detector is adopted from Pan and Yang (2021)
and then deployed to localize the bolts. The plane where front-view bolts are detected will be selected as the reference plane for loosening quantification.
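A compact sketch of this plane search is shown below, using the MSAC-based pcfitplane from MATLAB's Computer Vision Toolbox; the distance tolerance and the rasterization helper renderPlaneImage are hypothetical stand-ins for the actual rendering step.

% Sketch of front-view reference-plane selection. Assumptions: 'ptCloud' is
% the extracted bolted-component cloud; 'detector' is the bolt YOLOv3-tiny.
remaining = ptCloud;
refPlane  = [];
for p = 1:3                                          % try a few principal planes
    [model, inIdx] = pcfitplane(remaining, 1.0);     % MSAC plane fit, assumed 1 mm tolerance
    inliers = select(remaining, inIdx);
    prm = model.Parameters;                          % plane: a*x + b*y + c*z + d = 0
    nrm = prm(1:3) / norm(prm(1:3));
    d   = (inliers.Location * prm(1:3)' + prm(4)) / norm(prm(1:3));  % signed distances
    proj = inliers.Location - d .* nrm;              % project inliers onto the plane
    img  = renderPlaneImage(proj);                   % hypothetical rasterization helper
    if ~isempty(detect(detector, img))               % bolts visible: front view found
        refPlane = model;
        break
    end
    remaining = select(remaining, setdiff(1:remaining.Count, inIdx)); % try next plane
end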
Figure 4.45 Front view-based bolt localization
This section presents the procedures to quantify bolt loosening length, which is defined as the
distance between the bottom of the bolt cap and the supporting plate or surface underneath the bolt cap (Figure 4.46).
Figure 4.46 Illustration of loosened bolts and tight bolts.
A combined YOLO detection and top-down convex hull (YOLO-TDCH) method is proposed
for automated quantification of bolt loosening length. The detailed implementation procedures are
described herein:
• The predicted bounding boxes by YOLOv3-tiny (in the previous section) will be utilized
to form a cuboid boundary (Figure 4.47) for each bolt along the direction of the normal
vector of the front-view reference plane identified (in the previous section).
A series of small sub-clouds (with a relatively small step) is sampled along the direction of
the normal vector of the front-view reference plane (Figure 4.48). Then, within the sub-
cloud, the points within each cuboid boundary obtained in step 1 will be projected to the
reference plane. Further, the convex hull of the projected points belonging to each cuboid
boundary will be determined, respectively. The area of each convex hull will be calculated.
• The above process is repeated for all the small sub-clouds from the top down. The area of
each convex hull will be consistently checked for each cuboid boundary within each sub-
cloud. The bolt loosening length can be estimated considering the two following criteria.
a) If a bolt is tight (e.g., Bolt 1 and Bolt 3 in Figure 4.46), there will be a relatively constant
convex hull area at the beginning (i.e., bolt cap region) of the sub-cloud stepping-down
process, and then a sudden increase of the convex hull area when the sub-cloud sampling
reaches the structural surface underneath the bolt. b) If a bolt is loosened (e.g., Bolt 2 and
Bolt 4 in Figure 4.46), there will be a relatively constant convex hull area in the first region
(i.e., bolt cap region), followed by a sudden decrease of the convex hull area at the
beginning of the second region (i.e., bolt thread region due to loosening), and then followed
by a sudden increase of the convex hull area when the sub-cloud sampling reaches the steel
surface underneath the bolt. The plane travel distance within the second region will be taken as the bolt loosening length.
Figure 4.48 Sub-cloud top-down sampling process
The YOLO-TDCH method is fully automated during the image and point cloud data processing,
without human intervention to identify reference planes for quantification of bolt loosening length.
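A condensed MATLAB sketch of the top-down convex-hull sweep for a single bolt is given below; the local coordinate frame, step size, and the area-change thresholds are simplifying assumptions, and convhull's second output supplies the hull area.

% Sketch of the TDCH sweep for one bolt. Assumptions: 'pts' (N-by-3) are the
% points inside this bolt's cuboid boundary, in a local frame whose z axis is
% the reference-plane normal, with z = 0 on the plate and z increasing
% towards the bolt cap.
step  = 0.2;                                         % sub-cloud thickness [mm], assumed
zVals = max(pts(:,3)):-step:0;                       % sweep from the cap down to the plate
areas = zeros(numel(zVals) - 1, 1);
for k = 1:numel(zVals) - 1
    in = pts(:,3) <= zVals(k) & pts(:,3) > zVals(k+1);      % one thin sub-cloud
    if nnz(in) >= 3
        [~, areas(k)] = convhull(pts(in,1), pts(in,2));     % projected hull area
    end
end
capArea  = median(areas(1:5));                       % assumed cap-region samples
dropIdx  = find(areas < 0.5 * capArea, 1);           % sudden decrease: thread region
plateIdx = find(areas > 2.0 * capArea, 1);           % sudden increase: plate reached
if isempty(dropIdx) || dropIdx > plateIdx
    looseningLength = 0;                             % tight bolt: no thread region
else
    looseningLength = (plateIdx - dropIdx) * step;   % travel within the second region
end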
This section presents the results of method 1, using the RTDT-Bolt method. In order to examine the effectiveness of method 1, a friction damper device at the UBC structural laboratory is selected.
In this section, the training and testing results of the RCNN, YOLOv3, and YOLOv3-tiny for
bolt localization are presented. Six anchor boxes are selected based on the training data and applied
for the training of both YOLOv3 and YOLOv3-tiny, using the methodology described in Section
2.2. The dimensions of the anchor boxes are presented in Table 4-9. Figure 4.49 provides the
precision-recall curves of the three object detectors, for training and testing, respectively. The precision-recall curve is a widely used evaluation metric for object detection (Everingham, Van Gool, Williams, Winn, & Zisserman, 2010). In short, a low false positive rate
corresponds to a high precision value, and a low false negative rate reflects a high recall value.
The overall performance of the algorithm is reflected by the area under the recall-precision curve,
where a large value of area indicates the detector has both high recall and precision. In other words,
if a detector has high precision but low recall, it can only detect objects in a few sample images,
although the localization accuracy is high once the object is recalled. A detector with high recall
but low precision can retrieve objects in many images, but the localization error is high inside the
images. In the end, the average precision (AP) can be computed from the precision-recall plot, as
the weighted average of precision at each recall value. The AP values for both training and testing
of the three object detectors are presented in Figure 4.49. Overall, all the three detectors have
achieved high AP during training. However, during testing, YOLOv3 and YOLOv3-tiny show
similar performance, while RCNN achieves slightly lower AP. This is because only one class (i.e.,
bolt) needs to be detected, which does not require over-complex CNNs such as RCNN and the
original YOLOv3 to achieve the desired accuracy. Figure 4.50 provides sample images processed
by the YOLOv3-tiny where all the bolts are detected by bounding boxes with a high confidence
score. More sample testing results are presented in Appendix C. In addition, the speed of the three
object detectors was also examined by applying the RCNN, YOLOv3 and YOLOv3-tiny,
respectively, through the full testing set 5 times. The average speed is calculated as the number of
frames or images processed per second (FPS). Table 4-10 shows a speed comparison of RCNN,
YOLOv3 and YOLOv3-tiny, using the Alienware Aurora R8 computer and software platform
presented in Section 2.2.2. As shown in Table 4-10, the RCNN method achieves about 0.05 FPS,
while YOLOv3 runs at 2.23 FPS. The speed of RCNN is less than the required minimum speed
(1.5 FPS, i.e., an assumed bolt rotational speed of 90 degrees per second divided by the 60-degree limit on rotation between detections), which indicates RCNN is too
slow to be adopted in the proposed RTDT method. On the other hand, the speed of YOLOv3 is
very close to the minimum required speed. In the situation of slightly lower hardware specs,
YOLOv3 cannot achieve real-time speed. The proposed YOLOv3-tiny achieves about 25 FPS,
which is about 500 times faster than the RCNN, and about 10 times faster than YOLOv3. This
demonstrates the speed and accuracy of YOLOv3-tiny for localizing the steel bolts in real time.
Table 4-9 Anchor box dimensions
Index           1   2   3   4   5   6
Width [pixels]  53  47  36  37  32  29
Height [pixels] 42  37  38  35  33  30
Figure 4.49 Precision-recall curve for (a) training, and (b) testing
Figure 4.50 Sample results of YOLOv3-tiny detection of steel bolts
As one of the major goals of this study is to deal with illumination changes, which hamper the performance of traditional tracking algorithms, this section intends to showcase the effectiveness of the proposed RTDT-Bolt method against
illumination changes. The RTDT-Bolt method is implemented with the parameters of the base
model. The minimum threshold for the number of FPs is set to 7 in these experiments. If the
number of FPs during tracking is below this threshold, the YOLOv3-tiny re-applies detection, and
new FPs will be generated for continuous tracking. Besides, to demonstrate the advantages of the
integrated method over the traditional optical flow algorithms, the KLT tracking algorithm without
the YOLOv3-tiny was also investigated in parallel. Figure 4.51 illustrates the light-changing
scenarios conducted in the laboratory, where the light was switched on and off about every 10
seconds. Figure 4.52 shows a close-up video frame montage of the bolt being processed by the
RTDT-Bolt method. It can be observed that extra light reflection appeared on the surface of the bolt when the light was switched on and disappeared when the light was switched off. The
experiment results indicate the KLT tracking algorithm without the YOLOv3-tiny instantly lost
tracking when the light was switched on for the first time (i.e., at around the 350th frame), and the bolt could not be tracked in any of the remaining frames. In comparison, the RTDT-Bolt method can
redetect and continue to track the bolt, when the previous tracking got lost due to light change, as
shown in Figure 4.52. The rotation transformation of the points being tracked is imposed on the
ROI bounding box for better visualization. The total rotation angle estimated by the RTDT-Bolt
method is 12.68 rad in the anti-clockwise direction, which corresponds to an accuracy of 95.1% (with a ground-truth value of 13.25 rad). In addition, the processing speed has also been
examined. The proposed method achieves about 17 frames per second on the original 4K video
frames, and about 325 frames per second on the cropped video frames (cropped from the 4K video
frame using the ROI detected by the YOLOv3-tiny). This demonstrates both the accuracy and real-
time speed of the proposed method in monitoring the bolt rotation angle.
Figure 4.51 Montage of videos processed by the RTDT-bolt method: original video frame with the
illustration of the changing light conditions, and a highlight of the bolt under investigation by the rectangular
box; closed-up video frame, with the illustration of detection, tracking, and re-detection. (Note: the frame
index is shown at the top-left corner of each thumbnail image. Frame rate: 30 frames per second)
Figure 4.52 Montage of videos processed by the RTDT-bolt method: closed-up video frame, with the
illustration of detection, tracking, and re-detection. (Note: the frame index is shown at the top-left corner of each thumbnail image)
4.4.5.3 Parameter studies
Parameter studies are conducted on the proposed RTDT-Bolt method to assess the sensitivity
of the rotation estimation to the selected parameters shown in Table 4-11. There is a total of 4 x 4
x 4 x 4 = 256 runs. In order to ensure that the studied parameters are the only controlled variables in each set of runs, the light condition is kept consistent in all the parameter studies. The comparison between
the estimated rotation of the bolts with the ground truth values in both short and long video scenarios is presented in Table 4-11.
Table 4-11 Comparison of the estimated rotation and ground truth rotation for the six bolts in the short and long video scenarios
Figure 4.53 Illustration of bolt index
The results were obtained by the base model whose parameters are shown in Table 4-8. The
results indicate the proposed model can accurately quantify the rotation of the loosened bolts and
non-loosened bolts in both short and long video scenarios. Figure 4.54 shows a detailed
comparison of the estimated time-history rotation with the ground truth for bolt 6 in the
short video scenario. The corresponding montage of sample video frames is shown in Figure 4.55,
where two reference lines are used to better visualize the tracking process of the bolt rotation. The
initial location of the bolt is indicated by the line with smaller line width, while the current
rotational position of the bolt is represented by the line with a larger line width. In addition, the
rotation angle corresponding to each thumbnail image is indicated at the bottom right of the image,
while the frame number is shown at the top left of the image.
Figure 4.54 Time-history rotation estimation of the bolt in the short video with the base model
Figure 4.55 Montage of sample close-up frames of the short video processed by the base model in
parameter studies (Note: each thumbnail image with the labeled frame index corresponds to the associated rotation angle in Figure 4.54)
Figure 4.56 depicts the effects of these parameters on the rotation estimation results. As shown
in Figure 4.56 (a1), as the number of pyramids increases, the accuracy increases in general, which
is particularly well reflected in the situation of medium accuracy due to the large block size (i.e.,
31) being used. However, the number of pyramid levels does not have a substantial effect on the
rotation estimation, when the accuracy is relatively high already, as depicted in Figure 4.56 (a2,
a3). On the other hand, the effect of the maximum bidirectional error on the estimation accuracy
is quite limited in general, as shown in Figure 4.56 (a2, b, c2, and d2). Figure 4.56 (c) indicates
that the search block size has a great impact on the accuracy of rotation estimation. In general, as
the block size increases, the performance of the proposed method decreases. This can be attributed
to the fact that the KLT tracking algorithm is more likely to mismatch the points being tracked at
the current frame, with the incorrect points in the next frame, when the search area becomes larger
where more adjacent (but irrelevant) points are included. Figure 4.56 (c1) also shows that when
the number of pyramid levels is low, as the search block size increases, the accuracy degrades
faster than the case where the number of pyramid levels is set higher. However, the search block
size will have less impact on the performance, if the maximum bidirectional error, the maximum
number of iterations, and the number of pyramid levels are set appropriately, as shown in Figure
4.56 (c2, c3). In these cases, the accuracy converges to about 90%, with the increase of the block
size, maximum bidirectional error, and the maximum number of iterations. Lastly, the maximum
number of iterations has minor effects on the results, as shown in Figure 4.56 (d). This implies that
the number of iterations can be set reasonably low to achieve faster speed, if there exist hardware
limitations.
Table 4-12 Complete results of parameter studies, expressed by the accuracy of rotation estimation
(expressed in percentage)
NI = 10 NI = 20 NI = 30 NI = 40
BE = 2 BE = 6 BE = 10 BE = 15 BE = 2 BE = 6 BE = 10 BE = 15 BE = 2 BE = 6 BE = 10 BE = 15 BE = 2 BE = 6 BE = 10 BE = 15
BS = 5
NP = 1 99.81% 97.81% 97.83% 97.19% 98.34% 98.31% 98.83% 98.78% 98.63% 98.95% 98.89% 99.06% 98.56% 98.92% 98.83% 98.74%
NP = 2 99.40% 96.58% 99.12% 99.16% 99.07% 99.19% 99.34% 99.07% 99.18% 98.86% 98.85% 98.83% 98.66% 98.86% 98.86% 98.42%
NP = 3 99.85% 98.59% 97.01% 97.24% 99.38% 99.45% 99.70% 99.61% 99.74% 99.66% 99.51% 99.82% 99.50% 99.52% 99.39% 99.44%
NP = 4 99.49% 97.79% 97.78% 97.68% 99.85% 99.43% 99.58% 99.42% 99.86% 99.32% 99.54% 99.64% 99.62% 99.41% 99.48% 99.59%
BS = 11
NP = 1 92.11% 91.23% 91.48% 91.02% 93.03% 92.44% 91.51% 92.15% 93.14% 91.99% 92.25% 92.68% 92.86% 92.51% 92.64% 91.90%
NP = 2 92.20% 91.05% 91.08% 90.60% 92.10% 93.60% 91.95% 92.12% 92.09% 92.30% 92.39% 92.43% 92.08% 92.44% 92.32% 92.29%
NP = 3 91.81% 91.13% 91.39% 92.89% 93.38% 92.12% 91.80% 92.44% 92.01% 91.97% 92.41% 92.38% 92.42% 92.31% 93.03% 92.40%
NP = 4 92.14% 91.27% 91.30% 90.86% 92.26% 91.81% 92.16% 91.88% 92.39% 91.95% 92.39% 92.30% 92.10% 92.12% 91.88% 92.42%
BS = 21
NP = 1 88.43% 87.30% 88.43% 87.23% 92.92% 87.46% 87.38% 87.24% 88.34% 87.23% 87.30% 92.83% 92.95% 89.43% 88.43% 87.16%
NP = 2 87.78% 88.40% 93.61% 87.45% 92.79% 87.82% 87.27% 92.85% 88.22% 88.35% 90.49% 88.01% 88.56% 87.32% 88.34% 87.12%
NP = 3 88.43% 87.26% 88.22% 93.13% 88.34% 88.56% 87.18% 88.54% 88.31% 92.79% 88.59% 89.98% 88.39% 92.79% 87.22% 93.00%
NP = 4 88.22% 92.97% 87.21% 88.33% 87.65% 88.31% 87.34% 92.84% 84.98% 88.45% 92.91% 88.28% 88.56% 88.40% 92.90% 88.24%
BS = 31
NP = 1 90.58% 90.58% 79.84% 85.54% 84.67% 80.27% 79.63% 79.91% 90.58% 79.57% 90.58% 81.62% 85.07% 90.58% 90.58% 46.90%
NP = 2 90.57% 90.58% 90.63% 90.92% 90.87% 90.58% 90.87% 90.58% 90.58% 71.85% 90.87% 90.58% 77.80% 73.60% 85.07% 90.70%
NP = 3 85.10% 90.58% 90.66% 90.58% 84.84% 90.58% 90.58% 90.58% 90.58% 90.58% 85.04% 90.58% 90.58% 90.63% 73.21% 85.09%
NP = 4 79.77% 90.57% 90.57% 79.55% 90.57% 90.57% 90.57% 90.57% 81.79% 90.57% 90.63% 90.86% 90.56% 90.57% 85.33% 90.57%
In addition, a complete set of results (i.e., 256 runs) obtained from the parameter studies is
summarized in Table 4-12, where the highest and lowest accuracies are highlighted. In these
experiments, the run with the highest accuracy (i.e., 99.86%) is achieved when the number of pyramid levels is set at the highest, and the maximum bidirectional error and search block size are set
at the lowest. This is explicable because this set of parameters imposes the most stringent
requirements for the algorithm to limit the tracking error. On the other hand, the run with the lowest
accuracy (i.e., 46.9%) is observed, where the number of pyramid levels, maximum bidirectional error, search block size, and the maximum number of iterations are set to 1, 15, 31, and 40,
respectively. This is close to the worst-case scenario among all the experiments, which is also in line with the parameter trends discussed above.
In short, a reasonably high value (i.e., greater than 2) for the number of pyramid levels, and a
sufficiently low value (i.e., less than 21) for the search block size must be specified to achieve the
desired accuracy (i.e., over 90%). Meanwhile, the maximum number of iterations does not have
noticeable effects on the accuracy, and therefore, it can be set at a reasonably low value (e.g., 10)
to reduce computational cost. Overall, the parameter studies confirm the effectiveness of the
proposed method, and provide recommendations to achieve high accuracy and speed in both the
short video and long video scenarios. It should be noted that, in the situation of long-term
monitoring which can last for months or years, the detection of bolts by the YOLOv3-tiny can be
set to apply at an appropriate interval (e.g., every 5 min) to reset the tracking. This will further
alleviate the potential accumulated errors due to noise during long-time tracking.
Figure 4.56 Effects on the rotation estimation accuracy of (a1-a3) the number of pyramid levels, (b1-b3) the maximum bidirectional error, (c1-c3) the search block size, and (d1-d3) the maximum number of iterations
4.4.6 Experiments and results – method 2
This section presents the results of method 2, using the proposed 3D vision-based pipeline to
quantify the bolt loosening length from a common experimental setup (shown in Section 4.4.4). In
this experiment, a friction damper developed in the structural laboratory at The University of
British Columbia is selected to examine the proposed method. First, photos of the friction device were captured from multiple locations and angles using a consumer-grade camera. Alternatively, a drone can be deployed for large-scale civil structures to capture images more efficiently. Next, the
3D reconstruction methods were applied to obtain a point cloud of the friction device. The point cloud generated typically contains irrelevant surrounding objects. Therefore, the region of interest
(where bolts are located) should be identified first. This can be done manually, or by using 3D
object detection methods (pretrained to localize the bolted component) as presented earlier in this
dissertation. This may also be done using point cloud classification algorithms (e.g., PointNet) to
classify the region of interest into a specific class. This requires a robust pretrained deep learning
algorithm and a significant amount of time for labelling. Besides, this typically does not provide highly
accurate localization, because some points of the irrelevant objects, or points in the background,
may be wrongly classified into the desired class, which can degrade the follow-up bolt loosening quantification.

Given the point cloud processing demand and the available computational power, in this
section, the region of interest is manually specified to be the central area of the structural
component which includes the 6 central bolts only (Figure 4.57), while other bolts are neglected
165
in the quantification procedures. It should be noted that the concept of the loosening quantification remains the same regardless of the number of bolts considered.

Next, as bolts are typically used to connect plates, a plane segmentation algorithm was applied
to identify the major plane of interest where bolts are placed. Then, the proposed YOLO-TDCH
method was applied. At each time of plane segmentation, the point cloud was rendered in that
plane to form a 2D image, which was processed by the pretrained YOLOv3-tiny to check the
existence of the bolts. If the bolts are not found by the YOLOv3-tiny, the plane will be discarded.
If the bolts are found by the YOLOv3-tiny, the segmented plane will be recorded. In this
experiment, the first principal plane identified by the plane segmentation algorithm is shown in
Figure 4.58. When the point cloud was rendered onto the plane, all the bolts were successfully detected by the pretrained YOLOv3-tiny (Figure 4.59).
Figure 4.58 Plane segmentation of the 3D point cloud of the friction damper (units are in mm)
Figure 4.59 Bolt localizations by the pretrained YOLOv3-tiny
Table 4-13 presents a summary of the bolt loosening quantification results where the index
for each bolt is shown in Figure 4.53. Results show that the quantification error for all the bolts is
less than 0.5 mm compared to the ground truth values measured by the digital caliper presented in Section 4.3. This shows the proposed 3D vision-based bolt loosening quantification method can
achieve high accuracy. Besides, it does not require human intervention during image and point
cloud data processing. Although the manual selection of the region of interest is used in this experiment, it can be replaced by the automated localization methods presented earlier in this dissertation.
Table 4-13 Bolt loosening quantification results
4.4.7 Conclusions
Structural bolts are commonly used to connect structural components. The forces in the
structural bolts are highly dependent on bolt rotation. This research proposed two different methods to evaluate bolt loosening. The first method, RTDT-Bolt, is developed to detect and track the rotation of bolts in the front view. The efficient YOLOv3-tiny detector has
been established and trained to precisely localize the bolts in real time. Then, the YOLOv3-tiny is
combined with the KLT tracking algorithm to improve the tracking performance. The effectiveness
of the proposed method, in dealing with tracking loss problems due to light changes, has been demonstrated experimentally. In addition, extensive parameter studies have been conducted to examine the capability and potential limitations of the
proposed method. The results indicate the proposed RTDT-Bolt method can reliably quantify the
bolt rotation with over 90% accuracy using the recommended range for the parameters. It is also
found that the number of pyramid levels and the search block size have great impacts on the
rotation estimation, while the maximum number of iterations and the maximum bidirectional error
do not have substantial effects on the results. The proposed RTDT method has multiple advantages
including a) achieving real-time performance in both detection and tracking; b) providing solutions
to measure the rotation of bolts up to any range; c) enhancing the robustness of traditional KLT tracking algorithms against illumination changes and background noise.

The second method is built upon a 3D vision-based evaluation pipeline. This method aims to address the limitations of the existing bolt loosening evaluation methods developed in 2D computer vision. The proposed method consists of vision-based 3D scene reconstruction and bolt
loosening length quantification using a newly proposed YOLO-TDCH method. At the time of
writing, to the best of the author’s knowledge, the proposed methodology is the first-of-its-kind
using 3D vision and advanced point cloud processing algorithms, for bolt loosening quantification
in the structural engineering field. Experimental results indicate that the average quantification error is as low as ~1 mm. Besides, the proposed methodology is fully automated and does
not rely on a specific camera viewing angle to provide quantification results, as opposed to the
existing 2D vision-based methods described in Section 4.4.1. Moreover, it does not require human
intervention during the image and point cloud data processing to determine the bolt loosening
length.
Overall, the two proposed methods can provide economical and accurate solutions to bolt loosening evaluation in two scenarios: a continuous real-time monitoring solution for bolts using a preinstalled camera facing towards the front view of bolts, and a post-
disaster inspection solution where photos or videos of the bolted devices can be taken at multiple
locations and angles by human inspectors, or more efficiently by drones or ground robots. Both methods have been demonstrated to address the limitations of the existing methods in their respective application scenarios.
Chapter 5: Combined vision-based SHM and loss estimation framework
5.1 Overview
Over the last decade, vision-based methods have achieved great success in structural visual damage detection. Vision-based methods output structural damage information, such as concrete cracks, concrete spalling and steel corrosion, for specific structural components. While damage
evaluation provides important and necessary information to engineers and researchers to assess
the residual performance and safety of the structural components in many situations, local damage
information of specific structural components may not be useful enough compared to global
damage information at the system level for owners or decision makers. Besides, damage
information can be difficult to understand for owners or decision-makers who are likely to lack
engineering knowledge, but instead pay more attention to the repair or replacement cost of the
structures if they are damaged or completely collapse. In such situations, it is necessary to convert
such damage information to other metrics (e.g., repair cost) that are easier for them to interpret.
Within this context, this chapter presents a combined vision-based structural damage detection
and loss quantification framework. The loss quantification procedures are adopted from part of the
existing PBEE loss evaluation procedures. To implement the framework, the local damage
information will be integrated into global damage information and combined with the PBEE
methodology to estimate the loss information, which can be more easily conveyed to stakeholders
to aid their decision-making. This chapter first provides a detailed description of the combined
vision-based SHM and loss estimation framework proposed in this dissertation, followed by a case
study showing its implementation on a reinforced concrete building for illustration purposes. Part
of this chapter is adopted from the author’s publication, Pan & Yang (2020).
5.2 Methodology
5.2.1 Overview
The proposed framework is described in this section. As depicted in Figure 5.1, first, images
or video frames should be acquired from a post-disaster site inspection, which can be performed
manually, or captured by preinstalled cameras, or more flexibly and efficiently by UAVs and
UGVs. In an ideal situation, images of the structural systems and components should be taken
from multiple views to facilitate the comprehensive evaluation. Second, the collected image data
are processed by vision-based methods, such as those presented in Chapter 4, to determine the
damage status at both the system level and component level. If a structural system is identified as collapsed, then the total loss is taken as the replacement cost. If a structural system is classified as non-collapse, component-level damage evaluation methods are applied to determine the damage
states of all the structural components and non-structural components of interest. Once the
component damage states are identified, the corresponding repair costs for the components are quantified using the fragility database and the corresponding PBEE loss quantification procedures
(ATC-58, 2007). Finally, the total repair cost of the building is determined by summing the repair costs of all structural and non-structural components, taking into account their respective unit cost distributions. The process is repeated using Monte Carlo simulation to
consider uncertainties arising from different involved stages within the PBEE framework. In the
end, a cumulative loss distribution curve can be obtained as an additional financial metric to aid
decision-making.
5.2.2 Vision-based damage evaluation
Damage evaluation within the framework is performed using vision-based methods. Depending on the structural system types, component types and damage types, different
vision-based damage evaluation methods should be considered. These include but are not limited
to the vision-based damage detection methods presented in this dissertation and other existing
methods reviewed in Section 2. It should be noted that new vision-based methods are being
developed and remain an active area of research in the field, which can be later integrated into the
proposed framework in the future. This will allow the framework to be implemented on various
types of civil structures at both the component level and the system level.
The loss estimation methodology used in this dissertation is adopted from the PBEE
methodology. The theory of the PBEE methodology was mainly developed by researchers at the
Pacific Engineering Research Center (PEER) between 1997 and 2010. As early as 2004, the
concept of the methodology was first presented by Moehle and Deierlein (2004), which involves four stages: seismic hazard analysis (intensity measure), structural response (engineering demand parameter), fragility data (damage measure), and loss data (decision variable). The framework was further developed by Yang et al.
(2009) and implemented for the seismic evaluation of a building. Since then, the implementation
of the framework has been applied in seismic assessments of structures under various scenarios.
It has been shown that the framework can be effectively adopted into current design and assessment practice. The PBEE methodology can be summarized by the following equation:

$$\lambda(dv < DV) = \iiint G\langle DV|DM \rangle \, dG\langle DM|EDP \rangle \, dG\langle EDP|IM \rangle \, d\lambda(IM) \tag{5.1}$$

The equation consists of four components. The first part, $\lambda(IM)$, is the probabilistic seismic
hazard analysis (PSHA) which is associated with the intensity measure such as earthquake
magnitude, source-site distance. The second part, 𝐺〈𝐸𝐷𝑃|𝐼𝑀〉, represents the structural analysis
which determines structural responses (e.g., floor displacements, accelerations) using the
representative ground motions with proper scaling factors obtained from PSHA. The third part,
𝐺〈𝐷𝑀|𝐸𝐷𝑃〉, is the estimation of damage extent based on the structural responses. The fourth part,
𝐺〈𝐷𝑉|𝐷𝑀〉, is the estimation of loss information for decision making based on the damage extent.
The output of the equation, 𝜆(𝑑𝑣 < 𝐷𝑉), is the decision variable which is selected by owners or
decision makers depending on the application scenario. For example, if the probable repair
cost under a specific earthquake hazard level is of interest to the owners, $\lambda(dv < DV)$ can be expressed in terms of the repair cost.
In this dissertation, the major portion is dedicated to the development and application of vision-based damage detection methods. The output of the vision-based methodology as presented in Chapter 3 provides damage information directly. Therefore, the loss information can be quantified based on the damage measures alone:

$$\lambda(dv < DV) = \int G\langle DV|DM \rangle \, d\lambda(DM) \tag{5.2}$$
As shown in Equation (5.2), the seismic hazard information and the structural analysis need
not be considered. The damage output by vision-based damage detection methods can be directly
used to evaluate the loss distribution curve, which can be represented as the probability of not
exceeding a certain repair cost or repair time. When implementing Equation (5.2), the loss
information of each component is first evaluated using the fragility database developed as a
companion product of the PBEE framework. The loss information for the local components will
then be integrated to obtain the total loss distribution of the entire structure at the system level.
The process can be done efficiently using Monte Carlo simulations, as described in Yang et al.
(2009).
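A minimal Monte Carlo sketch of this component-to-system cost aggregation is given below; the lognormal unit-cost parameters are illustrative placeholders rather than fragility-database values, and the damage-state counts echo the case study in Section 5.3.

% Monte Carlo aggregation of component repair costs into a loss distribution.
nSim    = 10000;
counts  = [17 26 14];                  % components detected in DS1, DS2, DS3
medCost = [5e3 15e3 40e3];             % placeholder median unit costs per DS [USD]
beta    = [0.4 0.4 0.4];               % placeholder lognormal dispersions
total = zeros(nSim, 1);
for s = 1:nSim
    for ds = 1:3
        % Sample a lognormal unit cost for every component in this damage state.
        unitCost = exp(log(medCost(ds)) + beta(ds) * randn(counts(ds), 1));
        total(s) = total(s) + sum(unitCost);
    end
end
sortedTotal = sort(total);                         % empirical loss distribution curve
pNonExceed  = (1:nSim)' / nSim;                    % probability of non-exceedance
medianCost  = sortedTotal(round(0.5 * nSim));      % e.g., the 50% repair cost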
This section presents a brief description of the fragility database. For better illustration, a
sample component is taken from the fragility database (ATC 58, 2007) as shown in Table 5-1 and
Table 5-2. In this example, the cost information related to each damage state (DS) is presented.
For each damage state, the minimum cost refers to the unit cost to conduct a repair action,
considering all possible economies of scale (which corresponds to maximum quantity) and
operation efficiencies. On the contrary, the maximum cost is the unit cost with no benefits from
scale and operation efficiencies, which corresponds to the minimum quantity. The uncertainties of
the unit cost are typically assumed to follow a normal or lognormal distribution. Figure 5.2 illustrates the unit cost distribution for a sample component.
Table 5-1 Fragility data for a sample concrete column (component ID: B1041.031a).
Table 5-2 Description of the damage states for the sample component
DS index Description
0 No damage
1 Light damage: visible narrow cracks and/or very limited
spalling of concrete
2 Moderate damage: cracks, large area of spalling concrete
cover without exposure of steel bars
3 Severe damage: crushing of core concrete, and/or
exposed reinforcement buckling or fracture
5.3 A case study on a post-disaster damage inspection survey
For the purpose of illustration, a prototype RC building (Sim et al., 2015), as shown in Figure
5.3, is selected to evaluate its damage and repair cost after an earthquake. The prototype building
is a mid-rise RC building.
As the case study is conducted on a concrete building, the CNN-based methods described in
Section 4.2 are applied to determine the damage states of the RC structural components from the
available images collected. The repair cost is then estimated based on the damage states. In
summary, the dual-CNN algorithm determines that 17 of the surveyed columns are in DS1, 26 in DS2,
and 14 in DS3. Using the unit cost information provided in Table 5-1, the total repair cost of these
RC components is calculated. It should be noted that as the survey contains primarily the RC
columns, only the RC columns are considered in this case study. However, a similar concept can be applied to other structural and non-structural component types when such data are available.
Figure 5.3 Case study of an RC building with sample results (a) system-level identification: non-collapse
and (b) component-level damage evaluation: severe damage with detection of steel exposure (left)
The process is repeated 10000 times with Monte Carlo procedures to simulate the dispersion
of the repair costs. Finally, the results are presented in a cumulative distribution function as shown
in Figure 5.4. The cost simulation results can provide critical risk data for decision-making and
resource allocation during post-disaster reconstruction. For example, the decision maker can use
the 50% probability of non-exceedance to identify the median repair cost for the building. In the
example presented in Figure 5.4, the median repair cost is $2.69 million USD for the prototype
building.
Figure 5.4 Repair cost distribution corresponding to the hypothetical case
This chapter presents the theory and implementation of the proposed framework which
combines vision-based structural damage detection and PBEE-based loss estimation to facilitate
the loss evaluation. The aim of this chapter is to provide an additional financial metric which can
be more easily conveyed to decision-makers and stakeholders who may lack engineering
knowledge. Overall, the results of the case study indicate that rapid loss estimation of buildings can be achieved through the proposed framework.
However, the case study presented is certainly not comprehensive. Several limitations are
summarized below:
• Due to COVID-19 impacts and limited access to a post-disaster site, the author of the
dissertation did not get a chance to conduct an in-person post-disaster site inspection
survey. Instead, the case study presented in this chapter is based on an earlier post-disaster
site inspection conducted by Sim et al. (2015). The survey was conducted for an RC
building which primarily contains damaged RC columns based on the images collected.
Therefore, in this study, the total loss information of the building is determined based on
the damage information of the RC columns captured by the images collected. As a result, the component type variability is rather limited in the case study. Besides, the repair cost
estimation through the proposed framework cannot be compared with the real financial
loss which is unavailable at the time of writing. Nevertheless, the author deems the concept
of the proposed framework can be generalized to quantify the total loss of buildings with
the presence of more component types, when more data are available.
• The post-disaster site inspection survey does not provide detailed information about the
non-structural components. Therefore, the case study presented only considers structural components. The damage evaluation of non-structural components typically requires the floor responses (e.g., accelerations) measured on each floor of the structural systems. This can be achieved by traditional contact-type vibration sensors, or by the vision-based vibration measurement methods discussed in Chapter 6 as part of the ongoing and future work. The measured floor response
can be used to estimate the damage state and the associated repair cost of the non-structural
components. Further, the total loss of the entire structure can be calculated as the sum of the losses from both the structural and non-structural components.
• In the case study presented, due to the limited available data, the unit cost functions of the
RC component are directly taken from the PBEE fragility database without calibration. In reality, the unit costs may vary depending on the region, the availability of contractors, and cost inflation, particularly right after a major disaster that may cause a wide fluctuation in labour and material prices.
• To limit the scope, implementation of the loss estimation framework is only limited to the
repair cost for the prototype building. The framework should be further extended to other decision variables such as repair time.
While the case study presented in this chapter is relatively simple, the concept of the proposed
vision-based damage detection and loss estimation framework can be generalized to different types
of structural systems which are built with different structural components. It should be noted that,
although the proposed framework is general, when dealing with different structural types (e.g., steel, masonry, timber, etc.) and structural components, different damage evaluation methods, such as the methods presented in Chapter 4 or other existing damage evaluation methods, should be adopted accordingly. Future work should also include more damage types to facilitate multi-damage type evaluation of more complex structural systems, and measure critical structural response quantities such as floor displacements and accelerations to facilitate a
more comprehensive damage and loss evaluation, particularly for non-structural components.
Chapter 6: Conclusions
This chapter first presents a summary of the dissertation, followed by the main research
findings and major contributions of the dissertation to the field. Further, it discusses the limitations of the completed research, the active ongoing research, as well as recommendations on future research directions.
6.1 Summaries
This dissertation first provides an extensive literature review on SHM methods including vibration-based SHM, non-destructive testing and evaluation (NDTE), and vision-based methods. Limitations of the methods falling in each of these categories are discussed, such as high instrumentation cost and installation complexity. Subsequently, more attention has been given to the review and discussion of the contributions and
limitations of existing 2D vision-based methods. Despite the promising results achieved by the 2D
vision methods in recent years, it is noted that there exist several main issues that are hard or even
impossible to be addressed by many of these 2D vision methods. These include but are not limited
to their unstable robustness against background noise, insufficient algorithm speed for real-world applications, sensitivity to camera locations and poses, and the inability to quantify damage in 3D space. This dissertation therefore proposes a 3D vision-based structural damage assessment and loss estimation framework, which aims to address the
limitations of existing 2D vision-based methods, and provide more rapid and comprehensive
evaluation solutions, thus offering a more effective complementary assessment tool in addition to
6.2 Main contributions
This dissertation has developed a series of advanced 2D and 3D vision-based SHM methods, which have been validated on three prevalent types of structural components including
RC structures, steel structures, and structural bolted connections that are widely used in the field.
At the time of writing, there exist almost no attempts at 3D vision methods in structural damage
evaluation. Their developments and applications are currently in the infancy stage. Below, the main contributions are summarized:
1) For the first time, this dissertation proposes a structural damage and performance evaluation framework which combines vision-based damage detection methods and PBEE loss quantification procedures. The outcomes of the framework provide additional critical information such as repair cost or repair time for owners or decision makers.
2) Within the framework, the research first proposed enhanced 2D vision-based methods
from several aspects as follows. a) It has improved the accuracy of the structural
component damage classification for RC structures, using the dual CNN scheme (by
combining the classification and object detection CNNs). Such a concept can be generalized to other structural types. b) It has enhanced the robustness of tracking against environmental effects such as background noise and illumination changes, using the integrated RTDT-Bolt method. c) Compared
to many existing methods for damage classification and localization, it has optimized the
speed of the local damage evaluation algorithms towards real-time performance, which
paves a way for rapid real-world applications. d) It has achieved very high accuracy (>
95%), and eliminated the hard limitations of many existing vision-based studies in bolt
loosening quantification (i.e., the 60-degree constraint to estimate hexagon bolt loosening
angle).
3) Within the framework, the research proposed novel 3D vision-based methods, leveraging the recent advancements of deep learning and computer vision. This includes the adoption of structure-from-motion-based 3D reconstruction and deep learning-based point cloud processing. These 3D vision methods address two main limitations of the 2D vision-based methods which are widely developed and applied in the existing published literature: a) the assessment results of 2D vision-based methods are sensitive to camera locations and poses; b) 2D vision-based methods can only assess in-plane damage features, and are incapable of analyzing out-of-plane damage patterns. The research thus expands vision-based damage evaluation from the 2D image plane to 3D space.
4) The proposed 3D vision-based methods are much more economical compared to TLS-based (terrestrial laser scanning) methods. In addition, compared to contact-type sensors, which typically require a relatively complicated setup, the proposed methods provide a more economical and convenient instrumentation and operation process.
5) The concept of the proposed framework, and the 3D vision-based damage evaluation
methods can be generalized to other types of structures such as masonry and timber
structures.
6) As will be presented in detail in Section 6.3, one additional contribution of this dissertation
is to bridge the gap between vibration-based SHM and vision-based SHM. As an ongoing and future research direction, Section 6.3 discusses an economical and efficient vision-based method to measure the vibration response of structures. The outcomes can be
analyzed using vibration-based theories to evaluate the health condition of the structures.
Besides, the outcomes will allow the incorporation of non-structural components into the proposed loss estimation framework.
6.3 Ongoing and future research
The research presented in this dissertation has the limitations noted above. Future
research will be focused on addressing these limitations. In addition, based on a brief review of the recent literature between 2020 and 2022 in the field of vision-based structural damage detection, several research trends can be identified:
• A transition from 2D image-based damage evaluation to damage quantification in 3D space,
• A transition from manual data collection to autonomous data collection through the use of advanced robotic systems (e.g., drones, ground robots, etc.), with the development of supporting navigation and data processing algorithms.
Within this context, the subsections below summarize the ongoing and future works to be
continued.
6.3.1 Extension to other structural components and types
In this dissertation, the proposed 3D vision methods have been validated on structural
components of reinforced concrete columns, single-panel steel corrugated wall structures, and
structural bolted connections, respectively. It is envisioned that new 3D vision algorithms will be
developed and examined on other structural component types such as RC walls, RC slabs, steel
beams or columns of different shapes, and more complicated structural connections. Besides, new
3D vision-based methods should be expanded to other structural types such as masonry and timber
structural components.
At the time of writing, extensive laboratory experimental tests are being conducted at the
structural laboratory at UBC. These include monotonic and cyclic pushover tests on full-scale
three-panel and six-panel steel corrugated plate walls (with a more realistic real-world construction
setup that consists of the roof panel and footing connections), reinforced masonry walls, and
innovative timber connections. Currently, new 3D vision methods are being developed and
investigated to evaluate the damages of these structures. The damage data collected during the
experimental tests will be utilized to train and validate the new damage detection methods.
6.3.2 Vision-based measurement of structural vibration
In this dissertation, the vision methods are developed to evaluate the damages of structural
components, while non-structural components (e.g., furniture, equipment) are neglected. Although the evaluation of the damage status of non-structural components is difficult and imposes uncertainties, previous studies such as FEMA P-58 suggested that their damage states can be related to floor accelerations and storey drifts. To facilitate such an evaluation, at the time of writing, one of the active ongoing works is to develop accurate and economical vision-based methods to measure the vibration responses of structures. Sensors are widely used in structural engineering to provide structural response parameters such as displacements, accelerations and strain (Mukhopadhyay, 2011; Kralovec & Schagerl, 2020).
Traditional contact-type sensors such as linear potentiometers are widely used in laboratories to
measure structural displacements. However, the accuracy of the linear potentiometer is relatively low, which makes it unsuitable for dynamic shake table testing. Besides, the linear potentiometer is only able to measure displacement along a single axis and is prone to breaking when subjected to accidental bending moments during a test. The high degree of robustness and accuracy of the
linear variable differential transducer (LVDT) results in better performance than other types of
displacement transducers. However, its performance can be affected by stray capacitance effects
and electromagnetic interference (Masi et al., 2011; Mandal et al., 2018). In addition, these contact-
type sensors need to be placed on tested structures, and consequently the properties and responses
of the structures may be affected, especially for small-scale structures. On the other hand,
noncontact-type sensors such as laser displacement transducers (Stanbridge & Ewins, 1999) are
easier to install and can provide relatively accurate displacements. However, the measurement
range of these transducers is relatively small. Radar interferometry is another type of noncontact
sensing technique, but the displacement measured can be easily affected by a systematic and
deterministic error which cannot be eliminated (Pieraccini, 2013). Global Positioning System (GPS) technologies can measure structural displacements at a reasonable accuracy (Im, Hurlebaus, & Kang, 2013; Häberling et al., 2015). However, its sampling rate and measurement resolution are relatively limited (Im, Hurlebaus, & Kang, 2013). In addition, non-contact-type wireless sensors are usually expensive and require specialized workers to install and operate, which hampers their applications in many scenarios. In recent years, vision-based methods for measuring structural vibrations have received more attention. A vision-based sensor setup is typically built
by a camera with zoom capability, together with image processing software and hardware platforms. The effectiveness of different vision-based methods has been demonstrated in the vibration measurement of various structures, such as the digital image correlation (DIC) method (e.g., Dutton, Take, & Hoult, 2014; Ghorbani, Matta, & Sutton, 2015), the template matching technique (e.g., Fukuda, Feng, & Shinozuka, 2010; Feng & Feng, 2016), and phase-based motion magnification methods (e.g., Chen et al., 2015; Yang et al., 2017). Several limitations can be observed. Although
the DIC algorithm can provide full-field measurement with high accuracy, it is sensitive to image noise. In addition, the implementation of DIC usually requires large physical targets (e.g., a spray-painted speckle pattern or a chessboard) covering a large region of the specimen to be measured, and the template matching technique is relatively unreliable against light variation. Although the phase-based motion magnification methods can achieve high accuracy in extracting modal frequencies and damping ratios, the algorithms are only suitable for linear structures. More recently, several researchers have examined optical flow-based
algorithms for structural motion tracking. There exist numerous computational models for
estimating the optical flow (Beauchemin & Barron, 1995). One of the intensity-based differential
methods, named Kanade–Lucas–Tomasi (KLT) (Tomasi & Kanade, 1991), is commonly used in
structural motion tracking for its high precision under a stable and well-controlled environment
(Yoon et al., 2016; Zhao et al., 2019; Kuddus et al., 2019). Even though these studies have well
demonstrated the effectiveness of the KLT method for structural vibration measurements, several
limitations have been observed: a) Most studies relied on the manual specification of the region of
interest (ROI) at the initial video frame where tracking is initiated on this ROI throughout the
video. The performance of the tracking algorithm employed in these studies is relatively sensitive
objects, which can happen frequently in real-world situations. Consequently, the experiments can
fail if the tracking gets lost in the middle of the experiments due to the change in these external
conditions. b) Some studies used fiducial markers (i.e., an object used as a point of reference in
imaging tasks) and the associated detection and tracking algorithm, but these algorithms are not
robust against background noise and relatively poor lighting conditions; c) Some other studies
implemented target-free tracking algorithms, which track objects based on their local distinct texture features. Although these methods can greatly enhance the convenience of vibration tests, they require the existence of such distinct features, which may not always be present. Besides, these methods are unreliable if the local texture features change, such as when cracks develop on the tracked texture region as the structure undergoes relatively large deformations. Further, these target-free tracking algorithms are also prone to failure when lighting conditions change.
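To make the KLT-based tracking discussed above concrete, the following Python sketch (a minimal illustration, assuming OpenCV is available; the video file name shake_table_test.mp4 is a hypothetical placeholder, not a file from this research) detects Shi-Tomasi corner features and tracks them across frames with the pyramidal Lucas-Kanade optical flow; the mean vertical feature motion between frames serves as a crude vibration signal.

import cv2
import numpy as np

# Hypothetical input video of a vibration test (not from this dissertation's dataset).
cap = cv2.VideoCapture("shake_table_test.mp4")
ok, first = cap.read()
prev_gray = cv2.cvtColor(first, cv2.COLOR_BGR2GRAY)

# Detect distinct corner features (Shi & Tomasi, 1994) to initialize KLT tracking.
pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=50, qualityLevel=0.01, minDistance=10)

motion = []
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Pyramidal Lucas-Kanade optical flow: track the features into the new frame.
    new_pts, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray, pts, None)
    good_new = new_pts[status.flatten() == 1]
    good_old = pts[status.flatten() == 1]
    # Mean vertical feature motion between frames, in pixels; a calibration
    # factor would be needed to convert this to physical displacement.
    motion.append(float(np.mean(good_new[:, 0, 1] - good_old[:, 0, 1])))
    prev_gray, pts = gray, good_new.reshape(-1, 1, 2)
cap.release()

Note that this sketch inherits the limitations noted above: it assumes the tracked features remain visible and distinct throughout the video.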
Recently, convolutional neural networks (CNNs) have been widely applied in many structural
engineering problems such as image classification, object detection and segmentation. Compared
to the classical computer vision algorithms, one advantage of these data-driven methods is their
robustness against background noise and changes in external environmental conditions. To address
the aforementioned limitations, at the time of writing, one of the active ongoing works is to
examine a combined CNN-based object detection and tracking pipeline for real-time structural displacement measurement. The pipeline uses a CNN-based object detector to automatically localize the region of interest to initiate the measurement, and a real-time tracker to continuously
track and record the motion of the structures. Preliminary results indicate the proposed method can
achieve very high accuracy at a low cost in laboratory conditions (e.g., shake table tests, static
pushover tests). Furthermore, it is expected the proposed method will be examined in the field in
the near future. Once the method is fully validated, it is envisioned that:
• The proposed method will yield higher robustness against environmental effects compared to the classical vision-based tracking algorithms;
• The proposed method will have a lower cost and easier installation process than most contact-type and non-contact-type sensors;
• More importantly, the proposed method can bridge the gap between vision-based and vibration-based methods in SHM. It can be easily integrated into the 3D vision-based structural damage evaluation and loss quantification framework proposed in this dissertation, such that the integrated framework can measure global structural vibrations and local structural component damages concurrently, without a need for contact-type sensors.
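As a minimal illustration of such a detection-and-tracking pipeline (a sketch under stated assumptions, not the dissertation's implementation), the snippet below initializes an OpenCV single-object tracker from the output of a CNN detector; the function detect_roi and the video file name are hypothetical placeholders.

import cv2

def detect_roi(frame):
    # Placeholder for a CNN-based object detector (e.g., a pretrained YOLO model)
    # returning one bounding box (x, y, w, h) around the structural target.
    # The detector itself is outside the scope of this sketch.
    raise NotImplementedError

cap = cv2.VideoCapture("test_video.mp4")  # hypothetical input video
ok, frame = cap.read()

# One-time CNN detection to initialize the region of interest automatically,
# replacing the manual ROI specification criticized earlier in this section.
box = tuple(int(v) for v in detect_roi(frame))

# Real-time single-object tracker; CSRT is one robust choice shipped with
# opencv-contrib (its API location varies across OpenCV versions); the actual
# tracker used in the ongoing work may differ.
tracker = cv2.TrackerCSRT_create()
tracker.init(frame, box)

centers = []
while True:
    ok, frame = cap.read()
    if not ok:
        break
    ok, (x, y, w, h) = tracker.update(frame)
    if ok:
        centers.append((x + w / 2.0, y + h / 2.0))  # ROI center trajectory
cap.release()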
6.3.3 Robot-based autonomous data collection
Another ongoing direction is to adopt robotic platforms for enhancing the data collection autonomy and efficiency. In recent years, some researchers have
made limited attempts at robot-based structural damage inspection. Despite the achievements
made, most of these studies are limited to applications of a single UAV or UGV for inspection of
small-scale structures, or a small portion of full-scale structures. A single UAV or UGV is usually
incapable of examining full-scale civil structures which have large sizes and typically consist of
many structural components placed in different spatial locations. The UAVs or UGVs in most
existing studies are manually controlled, while a few are semi-automated, not achieving fully autonomous inspection.
Within this context, one future research will be focused on the implementation of advanced
multi-robot systems into the proposed framework. The multi-robot systems to be developed will
be based on multiple UAVs or UGVs, or a combination of drone fleets and UGVs. The idea behind
this is to leverage the advantages of all the robots by integrating them, while alleviating the
disadvantages of each robot. The UAVs and UGVs will be equipped with data acquisition,
transmission and processing systems, and a variety of manipulators or sensors. It is expected that
the robots will be designed to collaborate with each other during the inspection. Novel machine
learning, reinforcement learning, and deep learning-based algorithms will be adopted from other
scientific and engineering fields, and further developed to control the robotic systems for
autonomous navigation, efficient data collection, and sensor data processing. Once validated, it
will significantly enhance the inspection capability of the existing robot-based inspection methods.
This is because civil infrastructure such as bridges or multi-storey buildings is typically very large
and complex, which is extremely difficult or impossible to be fully inspected by existing ground
robots or drones with the current navigation and data collection philosophy.
Bibliography
Abdeljaber, O., & Avci, O. (2016). Nonparametric structural damage detection algorithm for
ambient vibration response: utilizing artificial neural networks and self-organizing maps. Journal
Abdeljaber, O., Avci, O., Kiranyaz, S., Gabbouj, M., & Inman, D. J. (2017). Real-time vibration-based structural damage detection using one-dimensional convolutional neural networks. Journal of Sound and Vibration, 388, 154–170.
Agarwal, S., Furukawa, Y., Snavely, N., Simon, I., Curless, B., Seitz, S. M., & Szeliski, R.
Amezquita-Sanchez, J. P., & Adeli, H. (2016). Signal processing techniques for vibration-based health monitoring of smart structures.
Amezquita-Sanchez, J. P., Valtierra-Rodriguez, M., & Adeli, H. (2018). Wireless smart sensors for monitoring the health condition of civil infrastructure. Scientia Iranica, 25(6), 2913–2925.
An, Y., Chatzi, E., Sim, S. H., Laflamme, S., Blachowski, B., & Ou, J. (2019). Recent progress and future trends on damage identification methods for bridge structures. Structural Control and Health Monitoring.
procedures for new and existing buildings. ATC, Redwood City, CA, USA.
Azimi, M., & Pekcan, G. (2020). Structural health monitoring using extremely compressed data through deep learning. Computer-Aided Civil and Infrastructure Engineering, 35(6), 597–614.
Bahrebar, M., Kabir, M. Z., Zirakian, T., Hajsadeghi, M., & Lim, J. B. (2016). Structural
Bao, Y., & Li, H. (2021). Machine learning paradigm for structural health
Bayissa, W. L., Haritos, N., & Thelandersson, S. (2008). Vibration-based structural damage
identification using wavelet transform. Mechanical systems and signal processing, 22(5), 1194-
1215.
Beauchemin, S. S., & Barron, J. L. (1995). The computation of optical flow. ACM Computing Surveys, 27(3), 433–466.
Beckman, G. H., Polyzois, D., & Cha, Y. J. (2019). Deep learning-based automatic volumetric
Betti, M., Facchini, L., & Biagini, P. (2015). Damage detection on a three-storey steel frame
using artificial neural networks and genetic algorithms. Meccanica, 50(3), 875-886.
Bochkovskiy, A., Wang, C. Y., & Liao, H. Y. M. (2020). YOLOv4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934.
Brincker, R., & Ventura, C. (2015). Introduction to operational modal analysis. John Wiley &
Sons.
Carrilho, A. C., Galo, M., & Santos, R. C. (2018). Statistical outlier detection
Cha, Y. J., Chen, J. G., & Büyüköztürk, O. (2017). Output-only computer vision based
damage detection using phase-based optical flow and unscented Kalman filters. Engineering
Cha, Y. J., Choi, W., & Büyüköztürk, O. (2017). Deep learning-based crack damage detection using convolutional neural networks. Computer-Aided Civil and Infrastructure Engineering, 32(5), 361–378.
Cha, Y. J., Choi, W., Suh, G., Mahmoudkhani, S., & Büyüköztürk, O. (2018). Autonomous structural visual inspection using region-based deep learning for detecting multiple damage types. Computer-Aided Civil and Infrastructure Engineering, 33(9), 731–747.
Cha, Y. J., You, K., & Choi, W. (2016). Vision-based detection of loosened bolts using the
Hough transform and support vector machines. Automation in Construction, 71, 181–188.
Chaichulee, S., Villarroel, M., Jorge, J., Arteta, C., Green, G., McCormick, K., Zisserman, A.,
& Tarassenko, L. (2017). Multi-task convolutional neural network for patient detection and skin
segmentation in continuous non-contact vital sign monitoring. 2017 12th IEEE International
Conference on Automatic Face & Gesture Recognition (FG 2017), Washington, DC (pp. 266–
272).
damage detection of a damaged steel truss bridge. Engineering Structures, 122, 156-173.
Chen, J. G., Wadhwa, N., Cha, Y. J., Durand, F., Freeman, W. T., & Buyukozturk, O. (2015).
Modal identification of simple structures with high-speed video using motion magnification.
Cheng, C. S., Behzadan, A. H., & Noshadravan, A. (2021). Deep learning for post-hurricane
Chun, P., Izumi, S., and Yamane, T. (2021), Automatic detection method of cracks from
concrete surface imagery using two-step Light Gradient Boosting Machine, Computer-Aided Civil
Dalal, N., & Triggs, B. (2005). Histograms of oriented gradients for human detection. In International Conference on Computer Vision & Pattern Recognition (CVPR'05), Vol. 1, 886–893.
Deng, J., Lu, Y., and Lee, V. C. (2020). “Concrete crack detection with handwriting script
Deng, K., Pan, P., Li, W., & Xue, Y. (2015). Development of a buckling restrained shear panel
Dou, C., Pi, Y. L., & Gao, W. (2018). Shear resistance and post-buckling behavior of
corrugated panels in steel plate shear walls. Thin-Walled Structures, 131, 816-826.
Dutton, M., Take, W. A., & Hoult, N. A. (2014). Curvature monitoring of beams using digital
Dwivedi, S. K., Vishwakarma, M., & Soni, A. (2018). Advances and researches on non
Edelsbrunner, H., Kirkpatrick, D., & Seidel, R. (1983). On the shape of a set of points in the plane. IEEE Transactions on Information Theory, 29(4), 551–559.
for Structural Damage Detection, Computer-Aided Civil and Infrastructure Engineering, 36:10,
1249-1269.
Etebarian, H., Yang, T. Y., & Tung, D. P. (2019). Seismic design and performance evaluation
Everingham, M., Van Gool, L., Williams, C. K., Winn, J., & Zisserman, A. (2010). The PASCAL visual object classes (VOC) challenge. International Journal of Computer Vision, 88(2), 303–338.
Farzampour, A., Mansouri, I., & Hu, J. W. (2018). Seismic behavior investigation of the
Feng, D., & Feng, M. Q. (2016). Vision‐based multipoint displacement measurement for
structural health monitoring. Structural Control and Health Monitoring, 23(5), 876-890.
Fischler, M. A., & Bolles, R. C. (1981). Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Communications of the ACM, 24(6), 381–395.
Fukuda, Y., Feng, M. Q., & Shinozuka, M. (2010). Cost-effective vision-based system for monitoring dynamic response of civil engineering structures. Structural Control and Health Monitoring.
Future Cities Canada. (2018). Building Our Urban Futures: Inside Canada’s Infrastructure and
Gao, Y., & Mosalam, K. M. (2018). Deep transfer learning for image-based structural damage recognition. Computer-Aided Civil and Infrastructure Engineering, 33(9), 748–768.
Gao, Y., Zhai, P., & Mosalam, K. M. (2021). Balanced semisupervised generative adversarial network for damage assessment from low-data imbalanced-class regime. Computer-Aided Civil and Infrastructure Engineering.
German, S., Brilakis, I., & DesRoches, R. (2012). Rapid entropy-based detection and
properties measurement of concrete spalling with machine vision for post-earthquake safety
Ghiasi, R., Torkzadeh, P., & Noori, M. (2016). A machine-learning approach for structural
damage detection using least square support vector machine based on a new combinational kernel
Ghorbani, R., Matta, F., & Sutton, M. A. (2015). Full-field deformation measurement and
crack mapping on confined masonry walls using digital image correlation. Experimental
Girshick, R., Donahue, J., Darrell, T., & Malik, J. (2014). Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 580–587).
Goulet, C. A., Haselton, C. B., Mitrani-Reiser, J., Beck, J. L., Deierlein, G. G., Porter, K. A., & Stewart, J. P. (2007). Evaluation of the seismic performance of a code-conforming reinforced-concrete frame building—From seismic hazard to collapse safety and economic losses. Earthquake Engineering & Structural Dynamics, 36(13), 1973–1997.
Gulgec, N.S., Takáč, M., and Pakzad, S.N. (2020), Structural Sensing with Deep Learning:
Strain Estimation from Acceleration Data for Fatigue Assessment, Computer-Aided Civil and
Häberling, S., Rothacher, M., Zhang, Y., Clinton, J. F., & Geiger, A. (2015). Assessment of
high-rate GPS using a single-axis shake table. Journal of Geodesy, 89(7), 697-709.
Hakim, S. J. S., Razak, H. A., & Ravanfar, S. A. (2015). Fault diagnosis on beam-like
structures from modal parameters using artificial neural networks. Measurement, 76, 45-61.
Ham, Y., Han, K. K., Lin, J. J., & Golparvar-Fard, M. (2016). Visual monitoring of civil
Hart, P. E., & Duda, R. O. (1972). Use of the Hough transformation to detect lines and curves in pictures. Communications of the ACM, 15(1), 11–15.
Hartley, R. I., & Sturm, P. (1997). Triangulation. Computer Vision and Image Understanding, 68(2), 146–157.
Hartley, R. I., Gupta, R., & Chang, T. (1992, June). Stereo from uncalibrated cameras.
Hartley, R., & Zisserman, A. (2003). Multiple view geometry in computer vision. Cambridge
university press.
He, K., Gkioxari, G., Dollár, P., & Girshick, R. (2017). Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision (pp. 2961–2969).
He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 770–778.
Hinton, G. E., Osindero, S., & Teh, Y. (2006). A fast learning algorithm for deep belief nets. Neural Computation, 18(7), 1527–1554.
Hirschmüller, H. (2008). Stereo processing by semiglobal matching and mutual information. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30(2), 328–341.
Hosang, J., Benenson, R., & Schiele, B. (2014). How good are detection proposals, really?
Hosang, J., Benenson, R., Dollár, P., & Schiele, B. (2015). What makes for effective detection
proposals? IEEE Transactions on Pattern Analysis and Machine Intelligence, 38(4), 814–830.
Hoskere, V., Park, J. W., Yoon, H., & Spencer, B. F., Jr. (2019). Vision- based modal survey
of civil infrastructure using unmanned aerial vehicles. Journal of Structural Engineering, 145(7),
04019062.
Hu, F., Zhao, J., Huang, Y., & Li, H. (2019). Learning structural graph layouts and 3D shapes
Huang, G., Liu, Z., Van Der Maaten, L., & Weinberger, K. Q. (2017). Densely connected
convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern
Recognition, 4700–4708.
Huynh, T. C., & Kim, J. T. (2017). Quantification of temperature effect on impedance monitoring via PZT interface for prestressed tendon anchorage. Smart Materials and Structures, 26(12), 125004.
Huynh, T. C., & Kim, J. T. (2018). RBFN-based temperature compensation method for
Huynh, T. C., Park, J. H., Jung, H. J., & Kim, J. T. (2019). Quasi-autonomous bolt-loosening detection method using vision-based deep learning and image processing. Automation in Construction.
Iandola, F. N., Han, S., Moskewicz, M. W., Ashraf, K., Dally, W. J. & Keutzer, K. (2016).
SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5 MB model size. arXiv
preprint arXiv:1602.07360.
Im, S. B., Hurlebaus, S., & Kang, Y. J. (2013). Summary review of GPS technology for structural health monitoring. Journal of Structural Engineering, 139(10), 1653–1664.
Jahanshahi, M. R., Kelly, J. S., Masri, S. F., & Sukhatme, G. S. (2009). A survey and evaluation of promising approaches for automatic image-based defect detection of bridge structures.
Jang, K., An, Y.-K., Kim, B., and Cho, S. (2021). “Automated crack evaluation of a high-rise
bridge pier using a ring-type climbing robot.” Computer-Aided Civil and Infrastructure
Jiang, S., & Zhang, J. (2020). Real-time crack assessment using deep neural networks with wall-climbing unmanned aerial system. Computer-Aided Civil and Infrastructure Engineering.
Jiang, K., Han, Q., Du, X., & Ni, P. (2021). A decentralized unsupervised structural condition diagnosis approach using deep auto-encoders. Computer-Aided Civil and Infrastructure Engineering.
Kang, L., Wu, L., & Yang, Y. H. (2014). Robust multi-view l2 triangulation via optimal inlier
Kim, H., Yoon, J., Hong, J., & Sim, S. H. (2021). Automated Damage Localization and
Kim, M. K., Sohn, H., & Chang, C. C. (2015). Localization and quantification of concrete
spalling defects using terrestrial laser scanning. Journal of Computing in Civil Engineering, 29(6),
04014086.
Koch, C., Georgieva, K., Kasireddy, V., Akinci, B., & Fieguth, P. (2015). A review on
computer vision based defect detection and condition assessment of concrete and asphalt civil
Koch, C., Paal, S. G., Rashidi, A., Zhu, Z., König, M., & Brilakis, I. (2014). Achievements
Kohavi, R., & Provost, F. (1998). Confusion matrix. Machine Learning, 30(2–3), 271–274.
Kong, S.Y., Fan, J.S., Liu, Y.F., Wei, X.C., and Ma, X.W. (2021), Automated Crack
Kong, X., & Li, J. (2018). Image registration-based bolt loosening detection of steel joints.
Kong, X., & Li, J. (2018). Vision‐based fatigue crack detection of steel structures using video
Kopsaftopoulos, F. P., & Fassois, S. D. (2013). A functional model based statistical time series
method for vibration based damage detection, localization, and magnitude estimation. Mechanical
Kralovec, C., & Schagerl, M. (2020). Review of structural health monitoring methods
Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems, 25, 1097–1105.
Kuddus, M. A., Li, J., Hao, H., Li, C., & Bi, K. (2019). Target-free vision-based technique for
Kullaa, J. (2010). Vibration-based structural health monitoring under variable environmental or operational conditions. In New trends in vibration based structural health monitoring (pp. 107–181). Springer.
Kumar, S., & Mahto, D. G. (2013). Recent trends in industrial and other engineering
Research, 4(9).
Lang, A. H., Vora, S., Caesar, H., Zhou, L., Yang, J., & Beijbom, O. (2019). PointPillars: Fast encoders for object detection from point clouds. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 12697–12705).
LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278–2324.
Li, H., Deng, X., & Dai, H. (2007). Structural damage detection using the combination method
of EMD and wavelet analysis. Mechanical Systems and Signal Processing, 21(1), 298-306.
Li, J., Deng, J., & Xie, W. (2015). Damage detection with streamlined structural health
Li, R., Yuan, Y., Zhang, W., & Yuan, Y. (2018). Unified vision-based methodology for
Liang, X. (2019). Image-based post-disaster inspection of reinforced concrete bridge systems
using deep learning with Bayesian optimization. Computer-Aided Civil and Infrastructure
Liu, D., Bai, R., Wang, R., Lei, Z., & Yan, C. (2019). Experimental study on compressive
buckling behavior of J-stiffened composite panels. Optics and Lasers in Engineering, 120, 31-39.
Liu, J., Yang, X., Lau, S., Wang, X., Luo, S., Lee, C.S., and Ding, L., (2020), Automated
pavement crack detection and segmentation based on two-step convolutional neural network,
Liu, Y.F., Nie, X., Fan, J.S., and Liu, X.G. (2020), “Image-based Crack Assessment of Bridge
Piers using Unmanned Aerial Vehicles and 3D Scene Reconstruction,” Computer-Aided Civil and
Lowe, D. G. (2004). Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2), 91–110.
Lu, X., Xu, Y., Tian, Y., Cetiner, B., & Taciroglu, E. (2021). A deep learning approach to
rapid regional post-event seismic damage assessment using time-frequency distributions of ground
Luo, C., Yu, L., Yan, J., Li, Z., Ren, P., Bai, X., Yang, E., and Liu, Y. (2021), Autonomous
detection of damage to multiple steel surfaces from 360° panoramas using deep neural networks,
Maeda, H., Kashiyama, T., Sekimoto, Y., Seto, T., Omata, H. (2021), Generative Adversarial
Networks for Road Damage Detection, Computer-Aided Civil and Infrastructure Engineering,
36:1, 47-60.
Maeda, M., Matsukawa, K., & Ito, Y. (2014, July). Revision of guideline for post-earthquake
Mandal, H., Bera, S. K., Saha, S., Sadhu, P. K., & Bera, S. C. (2018). Study of a modified
LVDT type displacement transducer with unlimited range. IEEE Sensors Journal, 18(23), 9501-
9514.
Simon, M., Milz, S., Amende, K., & Gross, H. M. (2018). Complex-YOLO: Real-time 3D object detection on point clouds. arXiv preprint arXiv:1803.06199.
Masi, A., Danisi, A., Losito, R., Martino, M., & Spiezia, G. (2011). Study of magnetic
Sensors, 2011.
MathWorks. (2021). estimateGeometricTransform. https://www.mathworks.com/help/vision/ref/estimategeometrictransform.html
MATLAB. (2021). Version 9.10.0 (R2021a). The MathWorks Inc.
Nixon, M., & Aguado, A. (2019). Feature extraction and image processing for computer vision. Academic Press.
McCulloch, W. S., & Pitts, W. (1943). A logical calculus of the ideas immanent in nervous activity. The Bulletin of Mathematical Biophysics, 5(4), 115–133.
Miao, Z., Ji, X., Okazaki, T., Takahashi, N. (2021), Pixel-level multi-category detection of
Micheletti, N., Chandler, J. H., & Lane, S. N. (2015). Investigating the geomorphological
Mitrani-Resier, J., Wu, S., & Beck, J. L. (2016). Virtual Inspector and its application to
immediate pre-event and post-event earthquake loss and safety assessment of buildings. Natural
Mizoguchi, T., Koda, Y., Iwaki, I., Wakabayashi, H., Kobayashi, Y., Shirai, K., ... & Lee, H.
Moehle, J., & Deierlein, G. G. (2004, August). A framework methodology for performance-
based earthquake engineering. In 13th world conference on earthquake engineering (Vol. 679).
Vancouver: WCEE.
Muja, M., & Lowe, D. G. (2009). Fast approximate nearest neighbors with automatic
Wang, N., Zhao, Q., Li, S., Zhao, X., & Zhao, P. (2018). Damage classification for masonry historic structures using convolutional neural networks based on still images. Computer-Aided Civil and Infrastructure Engineering, 33(12), 1073–1089. https://doi.org/10.1111/mice.12411
Wang, N., Zhao, X., Zhao, P., Zhang, Y., Zou, Z., & Ou, J. (2019). Automatic damage detection of historic masonry buildings based on mobile deep learning. Automation in Construction, 103, 53–66.
Ni, F., Zhang, J., and Noori, M.N. (2020), Deep Learning for Data Anomaly Detection and
Nister, D., & Stewenius, H. (2006, June). Scalable recognition with a vocabulary tree. In 2006
Pan, X., & Yang, T. Y. (2020). Postdisaster image-based damage detection and repair cost
estimation of reinforced concrete buildings using dual convolutional neural networks. Computer-
Pan, X., & Yang, T. Y. (2021). Image-based monitoring of bolt loosening through deep-learning-based integrated detection and tracking. Computer-Aided Civil and Infrastructure Engineering.
Pan, X., & Yang, T. Y. (2022). 3D vision-based out-of-plane displacement quantification for steel plate structures using structure-from-motion, deep learning, and point-cloud processing. Computer-Aided Civil and Infrastructure Engineering. https://doi.org/10.1111/mice.12906
Park, H. G., Kwack, J. H., Jeon, S. W., Kim, W. K., & Choi, I. R. (2007). Framed steel plate
wall behavior under cyclic lateral loading. Journal of structural engineering, 133(3), 378-388.
Park, H. S., Lee, H. M., Adeli, H., & Lee, I. (2007). A new approach for health monitoring of
structures: terrestrial laser scanning. Computer‐Aided Civil and Infrastructure Engineering, 22(1),
19-30.
Park, J. H, Kim, T., & Kim, J. (2015). Image-based bolt-loosening detection technique of bolt
Engineering 11th International Workshop on Advanced Smart Materials and Smart Structures
Park, J. H., Huynh, T. C., Choi, S. H., & Kim, J. T. (2015). Vision-based technique for bolt-
loosening detection in wind turbine tower. Wind and Structures, 21(6), 709–726.
Park, S. W., Park, H. S., Kim, J. H., & Adeli, H. (2015). 3D displacement measurement model
for health monitoring of structures using a motion capture system. Measurement, 59, 352-362.
Peeters, B., Maeck, J., & De Roeck, G. (2001). Vibration-based damage detection in civil
engineering: excitation sources and temperature effects. Smart materials and Structures, 10(3),
518.
Papaelias, M. P., Roberts, C., & Davis, C. L. (2008). A review on non-destructive evaluation
Ramana, L., Choi, W., & Cha, Y. J. (2017). Automated vision-based loosened bolt detection
using the cascade detector. In C. Walber, E. Wee Sit, P. Walter, & S. Seidlitz (Eds.), Sensors and
Ramana, L., Choi, W., & Cha, Y. J. (2019). Fully automated vision-based loosened bolt
detection using the Viola–Jones algorithm. Structural Health Monitoring, 18(2), 422-434.
Redmon, J., & Farhadi, A. (2018). YOLOv3: An incremental improvement. arXiv preprint arXiv:1804.02767.
Redmon, J., & Farhadi, A. (2017). YOLO9000: Better, faster, stronger. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 7263–7271).
Redmon, J., Divvala, S., Girshick, R., & Farhadi, A. (2016). You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 779–788).
Ren, S., He, K., Girshick, R., & Sun, J. (2015). Faster R-CNN: Towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems, 28, 91–99.
Ren, S., He, K., Girshick, R., & Sun, J. (2017). Faster R-CNN: Towards real-time object
detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine
Rosenblatt, F. (1958). The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review, 65(6), 386–408.
Rumelhart, D., Hinton, G., & Williams, R. (1986). Learning representations by back-propagating errors. Nature, 323(6088), 533–536.
Sabouri-Ghomi, S., Ventura, C. E., & Kharrazi, M. H. (2005). Shear analysis and design of
Sahoo, D. R., Singhal, T., Taraithia, S. S., & Saini, A. (2015). Cyclic behavior of shear-and-
flexural yielding metallic dampers. Journal of Constructional Steel Research, 114, 247-257.
Sajedi, S. O., & Liang, X. (2021). Uncertainty-assisted deep vision structural health
Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., & Chen, L. C. (2018). MobileNetV2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 4510–4520).
Schönberger, J. L., & Frahm, J. M. (2016). Structure-from-motion revisited. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 4104–4113).
Sevillano, E., Sun, R., & Perera, R. (2016). Damage detection based on power dissipation measured with PZT sensors through the combination of electro-mechanical impedances and guided waves.
Shi, J., & Tomasi, C. (1994). Good features to track. In IEEE Conference on Computer Vision and Pattern Recognition (pp. 593–600).
Sim, C., Laughery, L., Chiou, T. C., & Weng, P. (2018). 2017 Pohang Earthquake. Retrieved from https://datacenterhub.org/resources/14728
Sim, C., Song, C., Skok, N., Irfanoglu, A., Pujol, S., & Sozen, M. (2015). Database of low-rise reinforced concrete buildings with earthquake damage. Retrieved from https://datacenterhub.org/dv_dibbs/view/1012:dibbs/experiments_dv/
Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556.
Singer, J., Arbocz, J., & Weller, T. (2002). Buckling experiments: experimental methods in
buckling of thin-walled structures, volume 2: shells, built-up structures, composites and additional
Sinha, S. K., Fieguth, P. W., & Polak, M. A. (2003). Computer vision techniques for automatic
Sirca Jr, G. F., & Adeli, H. (2018). Infrared thermography for detecting defects in concrete
Soukup, D., & Huber-Mörk, R. (2014). Convolutional neural networks for steel surface defect
detection from photometric stereo images. International Symposium Visual Computing, New
Spencer Jr, B. F., Hoskere, V., & Narazaki, Y. (2019). Advances in computer vision-based civil infrastructure inspection and monitoring. Engineering, 5(2), 199–222.
Stanbridge, A. B., & Ewins, D. J. (1999). Modal testing using a scanning laser Doppler vibrometer. Mechanical Systems and Signal Processing, 13(2), 255–270.
Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., … Rabinovich, A. (2015). Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 1–9).
Ta, Q. B., & Kim, J. T. (2020). Monitoring of corroded and loosened bolts in steel structures
The Globe and Mail. (2016). Municipalities not spending enough to maintain infrastructure:
review. https://www.theglobeandmail.com/news/politics/municipalities-not-spending-enough-to-
maintaininfrastructure-review/article28234692/.
Tomasi, C., & Kanade, T. (1991). Detection and tracking of point features. International Journal of Computer Vision.
Tong, J. Z., Guo, Y. L., & Zuo, J. Q. (2018). Elastic buckling and load-resistant behaviors of
double-corrugated-plate shear walls under pure in-plane shear loads. Thin-Walled Structures, 130,
593-612.
Tong, Z., Yuan, D., Gao, J., and Wang, Z. (2020), Pavement defect detection with fully
Torr, P. H., & Zisserman, A. (2000). MLESAC: A new robust estimator with application to
estimating image geometry. Computer Vision and Image Understanding, 78(1), 138–156.
Triggs, B., McLauchlan, P. F., Hartley, R. I., & Fitzgibbon, A. W. (1999, September). Bundle
Turner, J., & Pretlove, A. J. (1991). Acoustics for engineers. Macmillan International Higher
Education.
Vamvoudakis-Stefanou, K. J., Sakellariou, J. S., & Fassois, S. D. (2018). Vibration-based damage detection for a population of nominally identical structures: Unsupervised Multiple Model (MM) statistical time series type methods. Mechanical Systems and Signal Processing, 111, 149–171.
Van Genechten, B. (2008). Theory and practice on Terrestrial Laser Scanning: Training
Spain.
Vetrivel, A., Gerke, M., Kerle, N., Nex, F. C., & Vosselman, G. (2018). Disaster damage detection through synergistic use of deep learning and 3D point cloud features derived from very high resolution oblique aerial images, and multiple-kernel-learning. ISPRS Journal of Photogrammetry and Remote Sensing.
Vigh, L. G., Deierlein, G. G., Miranda, E., Liel, A. B., & Tipping, S. (2013). Seismic
performance assessment of steel corrugated shear wall system using non-linear analysis. Journal
Wang, T., Song, G., Liu, S., Li, Y., & Xiao, H. (2013). Review of bolted connection
Westoby, M. J., Brasington, J., Glasser, N. F., Hambrey, M. J., & Reynolds, J. M. (2012). 'Structure-from-Motion' photogrammetry: A low-cost, effective tool for geoscience applications. Geomorphology, 179, 300–314.
Wu, R. T., & Jahanshahi, M. R. (2020). Data fusion approaches for structural health
monitoring and system identification: past, present, and future. Structural Health
Xia, Y., Chen, B., Weng, S., Ni, Y. Q., & Xu, Y. L. (2012). Temperature effect on vibration
properties of civil structures: a literature review and case studies. Journal of Civil Structural Health
Xiong, B., Jancosek, M., Elberink, S. O., & Vosselman, G. (2015). Flexible building
Xu, J., Gui, C., and Han, Q. (2020), Recognition of rust grade and rust ratio of steel structures
Xue, Y. D., & Li, Y. C. (2018). A fast detection method via region-based fully convolutional
neural networks for shield tunnel lining defects. Computer-Aided Civil and Infrastructure
Yang, J., & Chang, F. K. (2006). Detection of bolt loosening in C– C composite thermal
protection panels: II. Experimental verification. Smart Materials and Structures, 15(2), 591.
Yang, T. Y., Banjuradja, W., Etebarian, H., & Tobber, L. (2021). Numerical modeling of
Yang, T. Y., Li, T., Tobber, L., & Pan, X. (2019). Experimental Test of Novel Honeycomb
Yang, T. Y., Li, T., Tobber, L., & Pan, X. (2020). Experimental and numerical study of
https://doi.org/10.1016/j.engstruct.2019.109814
Yang, T. Y., Moehle, J., Stojadinovic, B., & Der Kiureghian, A. (2009). Seismic performance evaluation of facilities: Methodology and implementation. Journal of Structural Engineering, 135(10), 1146–1154.
Yang, Y., Dorn, C., Mancini, T., Talken, Z., Kenyon, G., Farrar, C., & Mascareñas, D. (2017). Blind identification of full-field vibration modes from video measurements with phase-based video motion magnification. Mechanical Systems and Signal Processing.
Yeum, C. M., & Dyke, S. J. (2015). Vision-based automated crack detection for bridge inspection. Computer-Aided Civil and Infrastructure Engineering, 30(10), 759–770.
Yeum, C. M., Dyke, S. J., Ramirez, L., & Benes, B. (2016). Big visual data analysis for
Yi, J., Gil, H., Youm, K., & Lee, H. (2008). Interactive shear buckling behavior of
Yoon, H., Elanwar, H., Choi, H., Golparvar-Fard, M., & Spencer Jr, B. F. (2016). Target-free approach for vision-based structural system identification using consumer-grade cameras. Structural Control and Health Monitoring, 23(12), 1405–1416.
Yuen, K. V., & Lam, H. F. (2006). On the complexity of artificial neural networks for smart
Yun, J. P., Kim, D., Kim, K., Lee, S. J., Park, C. H., & Kim, S. W. (2017). Vision-based
surface defect inspection for thick steel plates. Optical Engineering, 56(5), 053108.
Zhang, A., Wang, K. C. P., Li, B., Yang, E., Dai, X., Peng, Y., … Chen, C. (2017). Automated
Zhang, C., Chang, C., & Jamshidi, M. (2020). Concrete bridge surface damage detection using a single-stage detector. Computer-Aided Civil and Infrastructure Engineering, 35(4), 389–409.
Zhang, C., Zhang, Z., & Shi, J. (2012). Development of high deformation capacity low yield
strength steel shear panel damper. Journal of Constructional Steel Research, 75, 116-130.
Zhang, Y. and Yuen, K.V. (2021), Crack detection using fusion features-based broad learning
system and image processing, Computer-Aided Civil and Infrastructure Engineering, 36:12.
Zhang, Y., & Lin, W. (2022). Computer‐vision‐based differential remeshing for updating the
geometry of finite element model. Computer‐Aided Civil and Infrastructure Engineering, 37(2),
185-203.
Zhang, Y., & Yuen, K. V. (2022). Bolt damage identification based on orientation-aware
Zhang, Y., Sun, X., Loh, K. J., Su, W., Xue, Z., & Zhao, X. (2020). Autonomous bolt
loosening detection using deep learning. Structural Health Monitoring, 19(1), 105-122.
Zhao, J., Bao, Y., Guan, Z., Zuo, W., Li, J., & Li, H. (2019). Video-based multiscale identification approach for tower vibration of a cable-stayed bridge model under earthquake excitation.
Zhao, X., Tootkaboni, M., & Schafer, B. W. (2015). Development of a laser-based geometric
Zhao, X., Zhang, Y., & Wang, N. (2019). Bolt loosening angle detection technology using
Appendices
Appendix A Data collection methods
This section describes the possible data collection methods in current inspection practices.
Currently, manual inspection is widely used in the field. For example, in the case of a typical
bridge inspection, specialized trucks and supporting mechanical arms are required during the
process. In addition, ropes are required to allow the inspector to conduct a more detailed
investigation of some portions of the bridges. In this case, an inspector can use measurement
devices and cameras to capture critical damage information, which will be reported in the site
inspection survey. This clearly shows that the manual inspection approaches are not only relatively inefficient and potentially cause traffic shutdowns, but also impose life safety risks on the inspectors.
The current manual inspection practices are time-consuming, inherently subjective, highly dependent on the proper training of the inspectors, and pose a threat to the life safety of the inspectors. As a result, automated structural damage inspection is an active area of research: rapid data collection and analysis right after natural disasters can enable more timely and informed risk management decisions. Therefore, this section presents potential robotic solutions
(i.e., UAVs and UGVs) to facilitate the data collection process. The deployment of the UAV fleet
and UGVs on sites is a future trend for efficient and safe data collection, especially for inspections
at large scale, which is very labor-intensive, time-consuming, and expensive when conducted
manually.
The UAVs (or drones) have the flexibility of flying freely in many possible directions at
different altitudes in the air where the number of obstacles is generally less than that on the ground.
With the rapid development of consumer-grade drones by companies such as DJI and Parrot, it is
more feasible for engineers and researchers to embrace such advanced technologies at a more
affordable cost. In addition, the robotic research community has made great efforts in collaborating
with industrial partners to create programmable drones with supporting software development kits (SDKs), which allow researchers from different disciplines to collaborate on a more
integrated project. For example, with the SDK, state-of-the-art computer vision algorithms
developed by computer vision researchers can be integrated with drone technologies to enable novel autonomous navigation and data collection with little or even no human control. The
figure below depicts an example of a programmable drone, ANAFI AI, developed by Parrot. The
drone can reach a maximum speed of 17m/sec forward, and 16m/sec backward and laterally, with
a wind resistance of 14m/sec. The drone has a built-in front camera setup which consists of stereo
cameras and an RGB camera, which has the capability of rotating about three axes, at a speed of 300°/s on the pitch and roll axes, and 200°/s on the yaw axis. The stereo camera can be effectively used
for mapping and 3D vision-based scene reconstruction, while the RGB camera can provide high-resolution images which can be processed by state-of-the-art vision algorithms. In addition, the
built-in time-of-flight (ToF) sensor measures the ground distance, which can be used to calibrate
and refine the 3D reconstruction of structures. In the situation of indoor flight where the GPS signal
is normally denied, the measurements from the built-in ToF sensor, the vertical camera and the
IMU device will be fused to provide a more reliable estimation of velocity and ground distance.
In the situation of low-light conditions, the built-in LED lights beside the vertical camera allow
the vision algorithms developed to be more effectively used to ensure more stable navigation. In
addition, the drone has 4G connectivity and can switch automatically between WIFI and 4G for wireless data transmission. This significantly extends its applicability and operation range compared to other drones that only support WIFI or other local area networks. In short, the hardware
configuration, the SDK, and the resources from the open-source communities make the ANAFI
AI drone a strong candidate for UAV-based structural damage detection, such as bridge damage inspection. Another advantage of the drone is its small size, which makes it easier to pass through narrow passages of civil structures. Although it has a limited battery capacity at the current time, it is programmable, which makes it ideal for developing customized autonomous inspection applications.
On the other hand, in recent years, UGVs have gained more traction in both research developments and field applications. For civil engineering applications, one of the UGV
candidates is the Husky UGV. The Husky UGV is a medium-sized ground robot which has a
relatively large payload to accommodate more hardware units than UAVs in general. Users can
install different types of manipulators such as robotic arms, or sensors such as RGB cameras, stereo
cameras, LiDAR, GPS, and IMU, onto the Husky robot. The diversity of sensors equipped by this
robot can allow various types of measurements which reflect structural health conditions to be
recorded. The sensor data can be processed by the state-of-the-art structural damage evaluation
algorithms developed in vision-based SHM and vibration-based SHM domains. This will facilitate
a more comprehensive structural damage evaluation. Moreover, the robot is fully supported by
Robot Operating System (ROS) which is an open-source robotics development platform. Using
the stereo camera or LiDAR sensors, SLAM algorithms can be effectively developed, while the
control algorithms for the manipulators installed can also be developed, all on ROS. This will
greatly enhance the level of autonomous navigation and data collection. In short, the diversity of
the hardware configuration, the full support in ROS, and the additional resources from the open-
source communities make the Husky robot a strong candidate for UGV-based structural damage detection.
In addition, the development of robot dogs has received great momentum in recent years.
Currently, most applications of robot dogs are for entertainment, security and public safety
purposes. However, applications of robot dogs to structural damage evaluation remain scarce. The author of the dissertation believes the robot dog can be effectively used for structural
damage detection in the near future. For example, the robot dog, Go 1, developed by
UnitreeRobotics, is a potential candidate. This robot dog has a so-called super sensory system (5
sets of fish-eye stereo depth cameras and 3 sets of hypersonic sensors), which allows the robot to
have full view coverage. It has built-in AI processing units which can reach a total computational
power of 1.5 TFLOPS. The robot can also be equipped with LiDAR, which can be used together
with the advanced camera system to allow more reliable and accurate autonomous positioning and
obstacle avoidance both indoors and outdoors. The robot can also be equipped with manipulators
such as a robotic arm to perform different active control tasks. In addition, the Go 1 robot can be
configured to have both 4G and WIFI connectivity for efficient wireless data transmission. Lastly,
the educational version of the Go 1 robot dog provides multiple programming APIs which allow
researchers to further develop the robot dog for specific applications. For example, the Go 1 robot
dog can be potentially developed for autonomous indoor damage inspection of buildings. The
robot dog can climb stairs, and the robotic arm can be programmed to open the doors. These two
combined features make it superior to many of the existing wheeled ground robots and drones.
The advanced sensory system allows the robot dog to perform the SLAM task and damage data
collection simultaneously both indoors and outdoors during a building damage assessment. The
pretrained AI model can be deployed to the robot dog using the Python programming API
provided, such that real-time local image processing can be potentially achieved using the built-in
computational units, without a separate computer. In the situation of a large amount of data
processing (e.g., dense point cloud from the 3D reconstruction of civil structures) which cannot be
handled locally, the data can be wirelessly streamed back via the built-in 4G or WIFI connectivity for remote processing.
It is envisioned that in the future, robotic technologies such as UAVs and UGVs will be gradually adopted into civil infrastructure inspection and maintenance practices, with the continued development of robotic hardware, autonomous navigation algorithms, and supporting regulations.
Appendix B Pseudocodes
In this section, pseudocodes are presented to provide an additional illustration of the detailed algorithmic implementation of some key algorithms and methods developed and validated in this research. Pseudocodes for standard algorithms, whether developed in this research or adopted from other research studies, are not presented here. Guidelines for the development, optimization, finetuning, and deployment of these algorithms are available on many open-source platforms. Similarly, the development and implementation of the 3D reconstruction pipeline can be accessed from open-source communities.
Pseudocode for concrete spalling quantification for cuboid RC columns

% Input: 3D point cloud of the RC column and its surrounding ground

% Search for the ground plane and the side surface planes of the RC column
While the number of side planes detected < 4 or the ground plane is not found
    Segment a plane from the point cloud (e.g., using RANSAC plane fitting)
    Check the similarity between the segmented plane and the ground plane
    If the segmented plane normal and the ground plane normal are close enough
        Label the segmented plane as the ground plane
        Save the fitted plane information and the inlier points fitting the plane
        Remove all the inlier points fitting this plane from the point cloud
    Else if the segmented plane is vertical (i.e., a side surface of the column)
        Label the segmented plane as a side plane
        Save the fitted plane information and the inlier points fitting the plane
        Remove all the inlier points fitting this plane from the point cloud
    End if
End while

% Search for the top surface by stepping down from the highest point of the cloud
While the number of points fitting the respective plane < the predefined threshold
    Select the next point along the vertical axis
    Establish a plane that is parallel to the ground plane through this point
    Count the number of points (i.e., plane inliers) that fit this plane
    % The inlier points are defined as the points having a point-to-plane
    % distance less than a certain distance threshold
End while

% Recover 3D geometry using the detected ground, side and top planes
% Determine the volume of the damaged RC column
Volume of the damaged RC column = Volume of the alpha shape of the point cloud
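The plane segmentation and alpha-shape volume steps above can be sketched in Python with the open-source Open3D library as follows; the file name, iteration count, distance threshold, and alpha value are illustrative assumptions rather than values used in this research.

import numpy as np
import open3d as o3d

# Hypothetical input point cloud of an RC column (file name assumed).
pcd = o3d.io.read_point_cloud("column.ply")

planes = []
rest = pcd
# Iteratively segment principal planes with RANSAC, mirroring the While loop above.
for _ in range(5):  # e.g., one ground plane plus up to four side planes
    plane_model, inliers = rest.segment_plane(distance_threshold=0.02,
                                              ransac_n=3, num_iterations=1000)
    normal = np.asarray(plane_model[:3])
    # A normal close to the vertical axis indicates a ground-like plane;
    # otherwise the plane is treated as a side surface of the column.
    label = "ground" if abs(normal[2]) > 0.9 else "side"
    planes.append((label, plane_model, rest.select_by_index(inliers)))
    # Remove the inlier points of the fitted plane before the next iteration.
    rest = rest.select_by_index(inliers, invert=True)

# Alpha-shape volume of the damaged column cloud (alpha chosen for illustration).
mesh = o3d.geometry.TriangleMesh.create_from_point_cloud_alpha_shape(pcd, alpha=0.05)
if mesh.is_watertight():
    print("Damaged column volume:", mesh.get_volume())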
Pseudocode for concrete spalling quantification for circular-shape RC columns

% Input: 3D point cloud of the RC column and its surrounding ground

% Search for the ground plane
While the ground plane is not found
    Segment a plane from the point cloud (e.g., using RANSAC plane fitting)
    Check the similarity between the segmented plane and the ground plane
    If the segmented plane normal and the ground plane normal are close enough
        Label the segmented plane as the ground plane
        Save the fitted plane information and the inlier points fitting the plane
        Remove all the inlier points fitting this plane from the point cloud
    End if
End while

% Search for the top surface by stepping down from the highest point of the cloud
While the number of points fitting the respective plane < the predefined threshold
    Select the next point along the vertical axis
    Establish a plane that is parallel to the ground plane through this point
    Count the number of points (i.e., plane inliers) that fit this plane
    % The inlier points are defined as the points having a point-to-plane
    % distance less than a certain distance threshold
End while

% Fit the cross-sectional radius of the column
% Initialize the stepping position (at the lower or upper bound of the point cloud
% along one selected axis)
Stepping range = [Lower stepping position, Upper stepping position]
While the stepping position is inside the stepping range
    Sample a cross-sectional slice of the point cloud at the stepping position
    Fit a circle to the sampled slice and record the fitted radius
    Move the stepping position forward by the step size
End while
Count and rank the fitted radii falling into the different bins, and take the dominant
radius as the cross-sectional radius

% Recover 3D geometry
Recover 3D geometry using the detected plane and the fitted cross-sectional radius
Volume of the damaged RC column = Volume of the alpha shape of the point cloud
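The per-slice radius fitting and binning steps above could be realized, for example, with an algebraic least-squares circle fit; the following NumPy sketch is a minimal illustration (the function names, step size, and bin count are assumptions, not the dissertation's implementation).

import numpy as np

def fit_circle(xy):
    # Algebraic least-squares (Kasa) circle fit to 2D slice points:
    # solve x^2 + y^2 + p*x + q*y + c = 0 for (p, q, c).
    x, y = xy[:, 0], xy[:, 1]
    A = np.column_stack([x, y, np.ones_like(x)])
    b = -(x**2 + y**2)
    (p, q, c), *_ = np.linalg.lstsq(A, b, rcond=None)
    cx, cy = -p / 2.0, -q / 2.0
    r = np.sqrt(cx**2 + cy**2 - c)
    return cx, cy, r

def dominant_radius(points, z_min, z_max, step=0.05):
    # Step along the column axis, fit a radius per slice, and bin the radii,
    # mirroring the "count and rank" step of the pseudocode above.
    radii = []
    for z in np.arange(z_min, z_max, step):
        sl = points[(points[:, 2] >= z) & (points[:, 2] < z + step)]
        if len(sl) >= 10:  # need enough points for a stable fit
            radii.append(fit_circle(sl[:, :2])[2])
    hist, edges = np.histogram(radii, bins=20)
    k = int(np.argmax(hist))
    # Return the midpoint of the most frequent radius bin.
    return 0.5 * (edges[k] + edges[k + 1])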
Pseudocode for the DBPPC method

% Input: 3D point cloud of the isolated (buckled) plate

% Initialize the stepping position (at the lower or upper bound of the point cloud
% along one selected axis)
While the stepping position is inside the stepping range
    Sample a sub-cloud at the current stepping position
    Select a point close to one side along the longitudinal direction of the plate
    Determine the distance (after projection) of the selected point to all other points
    While there remain unclustered points in the sub-cloud
        Sample k nearest neighbors of the selected point within the sub-cloud such that
        the point-to-point distance does not exceed the distance threshold, and assign
        them to the current cluster
        Determine the furthest point in the k nearest neighbors of the selected point and
        set the furthest point to the newly selected point
        If the newly selected point falls on the other surface of the plate
            Sample k nearest neighbors of the newly selected point from Cluster 2 such
            that the point-to-point distance does not exceed the distance threshold
        End if
    End while
    Assemble local clusters (of each sub-cloud) to global clusters (of the total cloud)
    Move the stepping position forward by the grid size
End while
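A minimal Python sketch of the neighbor-growing idea behind one cluster of the DBPPC method is shown below, assuming SciPy is available; it is a simplified illustration, not the dissertation's exact implementation, and the parameter values are placeholders.

import numpy as np
from scipy.spatial import cKDTree

def grow_cluster(points, seed_idx, k=8, dist_threshold=0.01):
    # Grow one surface cluster by repeatedly hopping to nearby neighbors:
    # starting from a seed point, take the k nearest neighbors within a
    # distance threshold, add them to the cluster, and continue from the
    # furthest admissible neighbor, as in the pseudocode above.
    k = min(k, len(points))
    tree = cKDTree(points)
    cluster = {int(seed_idx)}
    frontier = int(seed_idx)
    while True:
        dists, idx = tree.query(points[frontier], k=k)
        near = idx[dists <= dist_threshold]
        new = set(near.tolist()) - cluster
        if not new:
            break  # no new neighbors within the threshold: cluster is complete
        cluster |= new
        frontier = int(near[np.argmax(dists[dists <= dist_threshold])])
    return np.array(sorted(cluster))

Points on the other surface of the buckled plate lie further than the distance threshold from this cluster, so a second seed grown the same way yields the second surface cluster.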
Pseudocode for the YOLO-TDCH method

% Input: 3D point cloud of the bolted connection
Require: Vision-based object detector (e.g., pretrained YOLO in this dissertation) for bolt
localization

Apply the plane segmentation algorithm until no more principal planes can be found
% A principal plane is defined as a plane that has a sufficient number of inlier points
% fitting the plane
Project the point cloud onto the planes, which leads to multiple render images
For each render image
    Run the pretrained detector to localize the bolts
    If bolts are detected
        Record the detected bounding boxes and the corresponding reference plane
    End if
End for
For each detected bolt
    Establish a cuboid boundary box for the bolt, using the respective detected
    bounding boxes and the normal vector of the reference planes identified
    % Stepping along the direction of the normal vector of the reference plane
    While the stepping position is inside the range of the cuboid boundary box
        Sample a sub-cloud within the cuboid boundary box at the stepping position
        Compute and record the convex hull of the sampled sub-cloud
    End while
    Use the recorded convex hull information to identify the bolt cap
    Use the recorded convex hull information to identify the supporting surface beneath
    the bolt
    % The above process is repeated for all the small sub-clouds from the top down. The
    % area of each convex hull is consistently checked for each cuboid boundary within
    % each sub-cloud. The bolt loosening length can be estimated considering the two
    % following criteria. a) If a bolt is tight, there will be a relatively constant convex
    % hull area at the beginning (i.e., bolt cap region) of the sub-cloud stepping-down
    % process, and then a sudden increase of the convex hull area when the sub-cloud
    % sampling reaches the structural surface underneath the bolt. b) If a bolt is loosened,
    % there will be a relatively constant convex hull area in the first region (i.e., bolt cap
    % region), followed by a sudden decrease of the convex hull area (i.e., bolt thread
    % region due to loosening) at the beginning of the second region, and then followed
    % by a sudden increase of the convex hull area when the sub-cloud sampling reaches
    % the steel surface underneath the bolt. The plane travel distance within the second
    % region is determined as the bolt loosening length.
    Bolt loosening length = Distance between the bottom of the bolt cap and the
    supporting surface underneath the bolt
End for
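The convex-hull-area stepping described above can be sketched in Python with SciPy as follows; the slice step, area thresholds, and function names are illustrative assumptions rather than the dissertation's actual parameters.

import numpy as np
from scipy.spatial import ConvexHull, QhullError

def hull_area_profile(points, axis=2, step=0.001):
    # Convex-hull area of successive slices along the bolt axis.
    # points: (N, 3) sub-cloud inside one bolt's cuboid boundary box, with
    # `axis` aligned to the reference-plane normal.
    lo, hi = points[:, axis].min(), points[:, axis].max()
    positions, areas = [], []
    for z in np.arange(hi, lo, -step):  # step from the bolt cap downward
        sl = points[(points[:, axis] <= z) & (points[:, axis] > z - step)]
        if len(sl) >= 3:
            try:
                xy = np.delete(sl, axis, axis=1)
                # For 2D input, ConvexHull.volume is the enclosed area.
                areas.append(ConvexHull(xy).volume)
                positions.append(z)
            except QhullError:
                pass  # skip degenerate (e.g., collinear) slices
    return np.array(positions), np.array(areas)

def loosening_length(positions, areas, cap_area):
    # Extent of the low-area (thread) region between the bolt-cap plateau and
    # the area jump at the supporting surface; the 0.8 factor is illustrative.
    thread = positions[areas < 0.8 * cap_area]
    return float(thread.max() - thread.min()) if len(thread) else 0.0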
Appendix C Additional results
C.1 System-level collapse classification
This section presents additional sample system-level testing results, including system-level collapse and non-collapse classification (sample images omitted; figure labels: Collapse, Non-collapse).
C.2 RC structures
This section presents additional sample results for component-level damage state classification, steel exposure detection, and 3D reconstruction and plane segmentation of RC columns.
C.2.1 RC component-level damage classification
Sample component-level classification results for the four damage states: no damage, light damage, moderate damage, and severe damage (images omitted).
C.2.2 RC steel exposure detection
Sample steel exposure detection results (images omitted).
C.2.3 3D reconstruction and plane segmentation of RC columns
Figure captions (images omitted): 3D reconstruction of RC column 1; plane segmentation results of RC column 1; 3D reconstruction of RC column 2; plane segmentation results of RC column 2; 3D reconstruction of RC column 3; plane segmentation results of RC column 3.
C.3 Steel plate structures
This section presents the sample classification results for the steel plate dampers and steel corrugated plate walls. In addition, it presents some additional sample 3D reconstruction results of these structures.
C.3.1 Classification of steel plate structures
Figure captions (images omitted): undamaged steel plate dampers; damaged steel plate dampers (due to buckling); (a) undamaged steel corrugated plate wall; (b) damaged steel corrugated plate wall (due to buckling).
C.3.2 3D reconstruction of steel solid plate dampers
Figure captions (images omitted): 3D reconstruction of steel solid plate damper 1 (multiple views); 3D reconstruction of steel solid plate damper 2 (multiple views).
C.3.3 3D reconstruction of steel honeycomb plate dampers
Figure captions (images omitted): 3D reconstruction of steel honeycomb damper 1 (multiple views); 3D reconstruction of steel honeycomb damper 2 (multiple views); 3D reconstruction of steel honeycomb damper 3 (multiple views).
C.3.4 3D reconstruction of steel corrugated panels
Figure caption (image omitted): 3D reconstruction of the steel corrugated panel.
C.4 Structural bolted components
This section presents some additional sample bolt localization results by the pretrained detector, as well as 3D reconstruction results of structural bolted devices.
C.4.1 Structural bolts detection
Sample bolt detection results (images omitted).
C.4.2 3D reconstruction of structural bolted devices
Figure captions (images omitted): 3D reconstruction results of bolted component 1 (multiple views); 3D reconstruction results of bolted component 2, friction damper (multiple views).
C.5 Parameter studies
The plane segmentation results are directly affected by the selection of the distance threshold during plane fitting. If the threshold is set too low, only a small subset of inliers will be assigned to each fitted plane, which means a large number of iterations are required to isolate the buckled plates. This can be very computationally expensive if the point cloud is large. On the other hand, if the distance threshold is set too high, the quality of the fitting will be degraded. In this study, three values of the distance threshold are examined. The figure below shows the results of plane segmentation under the three distance thresholds, where the red region indicates the fitted plane whose inlier points are removed after each iteration. As the distance threshold decreases, the number of inliers during each plane fitting iteration becomes smaller, which means more iterations are required to isolate the buckled plate. In this study, to achieve more efficient fitting of the non-buckled plates while still maintaining a relatively high plane fitting quality, the distance threshold is selected as 20 mm. In this case, the number of iterations required to isolate the buckled plate is only 3, which is computationally efficient. After the plane segmentation, the central buckled plate is isolated for further processing. It should be noted that this process can be semi-automated when dealing with highly complicated structural assemblies, or when the selection of the distance threshold is difficult. In these scenarios, plane segmentation can be used to remove most of the irrelevant flat planes, while minor human intervention can be considered to further isolate the regions of interest.
Figure: Three iterations of plane segmentation results using distance thresholds of (a) 10 mm; (b) 15 mm; (c) 20 mm (note: the x, y, and z axes are in mm).
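The threshold study above could be reproduced with a simple parameter sweep; the Python sketch below uses Open3D's RANSAC plane segmentation, with the file name and iteration counts as illustrative assumptions.

import open3d as o3d

# Hypothetical input cloud of the corrugated wall specimen (file name assumed).
pcd = o3d.io.read_point_cloud("corrugated_wall.ply")

# Sweep the RANSAC distance threshold (in the cloud's units; 0.010-0.020 m
# corresponds to the 10-20 mm values examined above) and report how many
# points each fitted plane absorbs per iteration.
for thresh in (0.010, 0.015, 0.020):
    rest = pcd
    for it in range(3):  # three segmentation iterations, as in the figure above
        _, inliers = rest.segment_plane(distance_threshold=thresh,
                                        ransac_n=3, num_iterations=1000)
        print(f"threshold={thresh * 1000:.0f} mm, iteration {it + 1}: "
              f"{len(inliers)} inlier points removed")
        rest = rest.select_by_index(inliers, invert=True)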
In this parameter study, grid sizes of 5 mm, 10 mm, 20 mm, and 30 mm have been investigated. Similarly, distance thresholds of 2 mm, 3 mm, 4 mm, and 5 mm have been investigated. The table below shows the results of the parameter study. As shown in the table, when the grid size is 20 mm or less and the distance threshold is 4 mm or less, the DBPPC method can successfully separate the points from the two surfaces. The figures below show a success case and a failure case.
Figure: Success case (distance threshold = 3 mm, grid size = 10 mm) in the planar view.