
THREE-DIMENSIONAL VISION-BASED STRUCTURAL DAMAGE DETECTION

AND LOSS ESTIMATION – TOWARDS MORE RAPID AND COMPREHENSIVE


ASSESSMENT

by

Xiao Pan

B.E., National University of Ireland, Galway, 2016


B.E., Jiangnan University, 2016
M.Sc., Imperial College London, 2017

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF


THE REQUIREMENTS FOR THE DEGREE OF

DOCTOR OF PHILOSOPHY
in
THE FACULTY OF GRADUATE AND POSTDOCTORAL STUDIES
(Civil Engineering)

THE UNIVERSITY OF BRITISH COLUMBIA


(Vancouver)

December 2022

© Xiao Pan, 2022


The following individuals certify that they have read, and recommend to the Faculty of Graduate
and Postdoctoral Studies for acceptance, the dissertation entitled:

THREE-DIMENSIONAL VISION-BASED STRUCTURAL DAMAGE DETECTION AND


LOSS ESTIMATION – TOWARDS MORE RAPID AND COMPREHENSIVE
ASSESSMENT

submitted by Xiao Pan in partial fulfillment of the requirements for

the degree of Doctor of Philosophy

in Civil Engineering

Examining Committee:
Tony T.Y. Yang, Professor, Department of Civil Engineering, UBC
Supervisor
Carlos Ventura, Professor, Department of Civil Engineering, UBC
Supervisory Committee Member

Hojjat Adeli, Professor, Departments of Civil, Environmental and Geodetic Engineering,


Electrical and Computer Engineering, Biomedical Engineering, Neurology, and Neuroscience,
The Ohio State University
Supervisory Committee Member
Frank Lam, Professor, Department of Wood Science, UBC
University Examiner
Minghao Li, Associate Professor, Department of Wood Science, UBC
University Examiner

Abstract

Civil engineering structures such as buildings and bridges inevitably experience damage due

to aging effects and natural disasters such as earthquakes. Damage inspection of these structures

is of vital importance to maintain their functionalities. Early damage identification can greatly

alleviate or prevent catastrophic failure in the event of natural disasters.

Traditional manual inspection is inefficient and highly reliant on the proper training and experience

of inspectors, which may result in false conclusions and erroneous evaluation reports. In recent

decades, structural health monitoring (SHM) methods such as vibration-based SHM and non-

destructive testing and evaluation (NDTE) methods have been developed to automate the

inspection process. These methods generally require relatively complicated and expensive

instrumentation to evaluate the conditions of structures. More recently, computer vision-based (or

vision-based) SHM has been established as a convenient, economical and efficient complementary

approach to the other SHM methods for civil structures. In comparison to contact-type vibration

sensors, vision-based methods use low-cost, non-contact sensors that are easy to install and

operate. However, most existing vision-based SHM methods are built on 2D computer vision

where the evaluation outcomes are sensitive to camera locations and poses. Besides, these 2D

vision methods are limited to the evaluation of in-plane damages and are not directly capable of

quantifying damages in 3D space. In short, existing 2D vision methods may not provide a reliable

and comprehensive damage evaluation outcome.

To address these limitations, this dissertation proposes a 3D vision-based SHM and loss

estimation framework, which aims to provide a more rapid and comprehensive damage evaluation

and loss assessment of civil structures. Within the framework, the dissertation is strongly focused

on the development and application of advanced 2D and 3D vision-based SHM methods for civil

structures. Experiments of the vision algorithms developed have been conducted on three prevalent

structural types including reinforced concrete structures, steel structures and structural bolted

components. Results show that the proposed 3D vision-based damage evaluation and loss

quantification framework can achieve high accuracy and low cost in damage recognition,

localization and quantification, and provide more comprehensive assessment results which can be

more easily conveyed to owners and decision-makers.

Lay Summary

The civil and structural engineering industries in many countries are pursuing faster

construction, and more efficient and automated operation and maintenance of civil structures that

are smarter, more sustainable and resilient. This requires advanced and efficient automated

technologies to be developed and validated in this field. This dissertation proposes a 3D computer

vision-based structural damage evaluation and loss quantification framework, which aims to

provide a more rapid and comprehensive automated inspection methodology for civil structures.

Successful implementation of these automated technologies will greatly enhance the consistency

and efficiency in civil infrastructure inspection and maintenance, thus making cities and civil

structures smarter, more resilient and sustainable. Moreover, it will greatly help alleviate the labor

shortage of skilled workers in the field of infrastructure inspection in Canada.

Preface

Most of this dissertation has been published or is under review in peer-reviewed journals. The

following summarizes the publications and manuscripts under review where the contributions of

the authors are indicated.

Published/Accepted contributions

The following publications were prepared by the primary author, Xiao Pan, whereas the

coauthors provided technical and editorial comments. The author of this dissertation is responsible

for the literature review, experimental tests, data collection, formulation development,

computational analysis, data processing, and results presentation of the publications as detailed

below:

1. Pan, X., & Yang, T. Y. (2020). Postdisaster image‐based damage detection and repair cost

estimation of reinforced concrete buildings using dual convolutional neural networks.

Computer‐Aided Civil and Infrastructure Engineering, 35(5), 495-510. – Chapter 3, 4 and

5.

2. Pan, X., & Yang, T. Y. (2021). Image-based monitoring of bolt loosening through deep-

learning-based integrated detection and tracking. Computer‐Aided Civil and Infrastructure

Engineering, 1–16. – Chapter 4.

3. Pan, X., Yang, T. Y. (2022). 3D vision-based out-of-plane displacement quantification for

steel plate structures using structure from motion, deep learning and point cloud

processing. Computer-aided Civil and Infrastructure Engineering, 00, 1– 15. – Chapter 4

4. Pan, X., Vaze, S., Xiao, Y., Tavasoli, S., & Yang, T.Y. “Structural damage detection of steel

corrugated panels using computer vision and deep learning.” Canadian Society for Civil

Engineering (CSCE) conference, 2022, Whistler, British Columbia, Canada. – Chapter 4

and Appendix B

In the below manuscript, the author of this dissertation is responsible for the experimental

testing, data collection and data preprocessing for training and validation of algorithms in Chapter

4 of the dissertation.

5. Yang, T. Y., Li, T., Tobber, L., & Pan, X. (2020). Experimental and numerical study of

honeycomb structural fuses. Engineering Structures, 204, 109814. – Chapter 4.

Under-review contributions

The following under-review manuscripts were prepared by the primary author, Xiao Pan,

whereas the coauthors provided technical and editorial comments. The author of this dissertation

is responsible for the literature review, experimental tests, data collection, formulation

development, computational analysis, data processing, and results presentation of the publications

as detailed below:

6. Pan, X., Yang, T. Y., Xiao, Y., Yao, H., & Adeli, H. (2022). Vision-based real-time

structural vibration measurement through interactive deep-learning-based detection and

tracking methods. Under review. – Chapter 6.

7. Pan, X., Yang, T. Y. (2022). 3D vision-based structural bolt loosening quantification using

deep learning and photogrammetry techniques. Under review. – Chapter 4.

In the below manuscript, the author of this dissertation is responsible for the implementation

of the vision-based damage detection algorithms in Chapter 4 of the dissertation.

8. Tavasoli, S., Pan, X., Yang, T. Y. (2022). Autonomous indoor navigation and vision-based

damage assessment of reinforced concrete structures using low-cost drones. Under review

– Chapter 4.

Table of Contents

Abstract ................................................................................................................................. iii

Lay Summary ..........................................................................................................................v

Preface ................................................................................................................................... vi

Table of Contents .................................................................................................................. ix

List of Tables ....................................................................................................................... xvi

List of Figures ................................................................................................................... xviii

List of Abbreviations ........................................................................................................ xxiv

Acknowledgements .......................................................................................................... xxvii

Dedication .......................................................................................................................... xxix

Chapter 1: Introduction .........................................................................................................1

1.1 Background ................................................................................................................. 1

1.2 Research goals and contributions................................................................................ 3

1.3 Objectives and methodology....................................................................................... 6

1.4 Thesis organization ..................................................................................................... 8

Chapter 2: Literature review ...............................................................................................10

2.1 Overview ................................................................................................................... 10

2.2 Vibration-based SHM methods................................................................................. 12

2.3 NDTE-based SHM methods ..................................................................................... 13

2.4 TLS-based SHM methods ......................................................................................... 14

2.5 Vision-based SHM methods ..................................................................................... 16

2.5.1 Overview ........................................................................................................... 16

2.5.2 Non-DL-based vision methods for SHM .......................................................... 17

2.5.3 DL-based vision methods for SHM .................................................................. 18

2.5.3.1 Advances in artificial neural networks........................................................ 18

2.5.3.2 Developments and applications of DL-based vision methods for SHM of civil structures ................ 19

2.6 Discussion of limitations of the existing methods .................................................... 21

2.6.1 Limitations of vibration-based and NDTE-based SHM methods ..................... 21

2.6.2 Limitations of TLS-based methods ................................................................... 22

2.6.3 Limitations of existing vision-based methods .................................................. 22

2.7 Research motivations ................................................................................................ 24

Chapter 3: 3D vision-based SHM and loss estimation framework for civil structures –

towards more rapid and comprehensive assessment ...............................26

3.1 Overview ................................................................................................................... 26

3.2 Introduction ............................................................................................................... 26

3.3 Data collection .......................................................................................................... 28

3.4 System-level collapse recognition ............................................................................ 29

3.5 Component-level damage recognition ...................................................................... 32

3.6 Component-level damage localization ...................................................................... 33


3.7 Component-level damage quantification .................................................................. 35

3.7.1 Overview ........................................................................................................... 35

3.7.2 Vision-based 3D reconstruction ........................................................................ 37

3.7.2.1 Data association .......................................................................................... 40

3.7.2.2 Structure-from-motion ................................................................................ 40

3.7.2.3 Multi-view stereo ........................................................................................ 42

3.7.2.4 Dense point-cloud preprocessing ................................................................ 42

3.7.3 Multi-view structural components localization ................................................ 43

3.7.4 Dense point cloud postprocessing..................................................................... 46

3.8 Loss quantification .................................................................................................... 47

Chapter 4: Development and application of vision-based SHM methods .......................50

4.1 Overview ................................................................................................................... 50

4.2 Vision-based SHM methods for RC structures ......................................................... 51

4.2.1 Introduction ....................................................................................................... 51

4.2.2 Methodology ..................................................................................................... 54

4.2.2.1 Overview ..................................................................................................... 54

4.2.2.2 CNN-based classification ............................................................................ 54

4.2.2.2.1 System-level collapse recognition ....................................................... 56

4.2.2.2.2 Component-level damage state pre-classification................................ 58

4.2.2.3 Steel reinforcement object detection ........................................................... 60

4.2.2.4 Component-level damage state determination ............................................ 66

4.2.2.5 Component-level damage quantification .................................................... 71

4.2.2.5.1 Cuboid-shape concrete columns .......................................................... 72

4.2.2.5.2 Cylinder-shape concrete columns ........................................................ 74

4.2.3 Experiments and results .................................................................................... 75

4.2.3.1 System-level failure classification .............................................................. 75

4.2.3.2 Component-level damage state classification ............................................. 77

4.2.3.3 Steel reinforcement object detection ........................................................... 80

4.2.3.4 Component-level damage state determination ............................................ 82

4.2.3.5 Component-level damage quantification .................................................... 84

4.2.3.6 Quantification accuracy validation ............................................................. 85

4.2.4 Conclusions ....................................................................................................... 90

4.3 Vision-based SHM methods for steel structures ....................................................... 91

4.3.1 Introduction ....................................................................................................... 91

4.3.2 Methodology ..................................................................................................... 95

4.3.2.1 Vision-based 3D reconstruction .................................................................. 97

4.3.2.2 Multi-view vision-based structural component detection ........................... 98

4.3.2.2.1 Description of the 3D object detection setup ....................................... 98

4.3.2.2.2 YOLOv3-tiny real-time detection networks ........................................ 99

4.3.2.2.3 Training and validation of YOLOv3-tiny .......................................... 100

4.3.2.3 Localization of points of interest .............................................................. 105

4.3.2.4 Structural out-of-plane displacements quantification ............................... 107

4.3.2.5 Accuracy validation .................................................................................. 111

4.3.3 Implementation ..................................................................... 114

4.3.3.1 Description of image collection setup ....................................................... 114

4.3.3.2 Dense point cloud 3D reconstruction ........................................................ 114

4.3.3.3 Out-of-plane displacement quantification ................................................. 115

4.3.4 Conclusions ..................................................................................................... 119

4.4 Vision-based SHM methods for structural bolted components .............................. 120

4.4.1 Introduction ..................................................................................................... 120

4.4.2 Overview and application scenarios of the proposed bolt loosening quantification

methodologies ..................................................................................................................... 125

4.4.3 Methodology – method 1 ................................................................................ 125

4.4.3.1 Overview of real-time integrated detection and tracking framework ....... 127

4.4.3.2 YOLOv3-tiny real-time detection networks ............................................. 128

4.4.3.3 RTDT-Bolt method ................................................................................... 132

4.4.3.4 Evaluation of the ground truth rotation angle ........................................... 138

4.4.3.5 Description of parameter studies............................................................... 139

4.4.4 Methodology – method 2 ................................................................................ 140

4.4.4.1 Vision-based 3D reconstruction ................................................................ 142

4.4.4.2 Multi-view structural bolted device detection .......................................... 144

4.4.4.3 Structural bolt loosening quantification .................................................... 146

4.4.4.3.1 Front-view structural bolt localization ............................................... 146

4.4.4.3.2 Side-view bolt looseness quantification ............................................. 147

4.4.5 Experiments and results – method 1 ............................................................... 150

4.4.5.1 Training and testing results of RCNN, YOLOv3 and YOLOv3-tiny ....... 150

4.4.5.2 The integrated method against illumination changes ................................ 154

4.4.5.3 Parameter studies ...................................................................................... 158

4.4.6 Experiments and results – method 2 ............................................................... 165

4.4.7 Conclusions ..................................................................................................... 169

Chapter 5: Combined vision-based SHM and loss estimation framework ...................172

5.1 Overview ................................................................................................................. 172

5.2 Methodology ........................................................................................................... 173

5.2.1 Overview ......................................................................................................... 173

5.2.2 Vision-based damage evaluation .................................................................... 175

5.2.3 PBEE methodology......................................................................................... 175

5.2.4 Description of the PBEE fragility database .................................................... 177

5.3 A case study on a post-disaster damage inspection survey..................................... 179

5.3.1 Description of the case study. ......................................................................... 179

5.3.2 Evaluation results ............................................................................................ 179

5.4 Discussion of the case study ................................................................................... 181

Chapter 6: Conclusions ......................................................................................................184

6.1 Summaries............................................................................................................... 184

6.2 Main contributions .................................................................................................. 185

6.3 Ongoing and future work ........................................................................................ 187

6.3.1 Development of new 3D vision methods for structural damage assessment .. 188

6.3.2 Development of new vision-based structural vibration measurements .......... 189

6.3.3 Integration of advanced robotic technologies into the proposed framework .. 193

Bibliography ........................................................................................................................195

Appendices...........................................................................................................................221

Appendix A Data collection methods .............................................................................. 221

A.1 Manual data collection .................................................................................... 221

A.2 Robot-based data collection ............................................................................ 221

Appendix B Pseudocodes ................................................................................................ 227

Appendix C Additional results ........................................................................................ 239

C.1 Sample system-level classification ................................................................. 239

C.2 RC structures ................................................................................................... 242

C.3 Steel plate structures ....................................................................................... 281

C.4 Structural bolted components ......................................................................... 295

C.5 Parameter studies ............................................................................................ 304

List of Tables

Table 3-1 Description of damage state classes ...................................................................... 32

Table 4-1 Description of damage state classes ...................................................................... 59

Table 4-2 Selection of Anchor properties .............................................................................. 64

Table 4-3 System-level and component-level training parameters and performance of transfer

learning from three different pretrained models ........................................................................... 75

Table 4-4 Summary of the spalling quantification of the RC columns ................................. 88

Table 4-5 Summary of the steel exposure quantification of the RC columns ....................... 89

Table 4-6 Estimation of anchor box dimensions ................................................................. 103

Table 4-7 Comparison of the estimated out-of-plane displacements with the ground truth

values at four benchmark points ................................................................................................. 112

Table 4-8 Parameters examined for KLT tracking algorithms ............................................ 140

Table 4-9 Estimation of anchor box dimensions ................................................................. 152

Table 4-10 Speed comparison of RCNN, YOLOv3 and YOLOv3-tiny ............................. 152

Table 4-11 Comparison of the estimated rotation and ground truth rotation for the six bolts in

the short video processed by the base model .............................................................................. 158

Table 4-12 Complete results of parameter studies, expressed by the accuracy of rotation

estimation (expressed in percentage) .......................................................................................... 162

Table 4-13 Bolt loosening quantification results ................................................................. 169

Table 5-1 Fragility data for a sample concrete column (component ID: B1041.031a). ...... 178
Table 5-2 Description of the damage states for the sample component .............................. 178

List of Figures

Figure 2.1 Vibration-based structural health monitoring setup ............................................. 13

Figure 3.1 Vision-based SHM and loss estimation framework ............................................. 27

Figure 3.2 Building collapse in Wenchuan earthquake, 2008 ............................................... 30

Figure 3.3 Vision-based collapse identification .................................................................... 31

Figure 3.4 Vision-based damage state recognition ................................................................ 33

Figure 3.5 Vision-based damage localization: concrete cracks localization ......................... 34

Figure 3.6 Vision-based damage localization: steel reinforcement exposure localization .... 35

Figure 3.7 Vision-based damage localization: concrete spalling localization ....................... 35

Figure 3.8 3D vision-based damage quantification pipeline for structural components ....... 37

Figure 3.9 Vision-based 3D reconstruction procedures of an RC column ............................ 39

Figure 3.10 Multi-view structural component localization in a 3D point cloud ................... 45

Figure 3.11 Point cloud processing: concrete spalling quantification ................................... 47

Figure 4.1 Architecture of ResNet-50 for system-level collapse identification .................... 57

Figure 4.2 Architecture of ResNet-50 for component-level damage classification .............. 58

Figure 4.3 The schematic architecture of YOLOv2 built on ResNet-50 for steel ................. 61

Figure 4.4 Relationship between Mean IoU and number of dimension priors ...................... 63

Figure 4.5 Illustration of K-means clustering results with 10 anchor sizes........................... 64

Figure 4.6 Geometric properties of the predicted bounding box and the anchor prior ......... 66

Figure 4.7 Sample images of RC columns that classification model wrongly identifies as DS

2 while ground truth is DS3 .......................................................................................................... 68

Figure 4.8 Flowchart of component damage state inspection scheme .................................. 69

Figure 4.9 System-level collapse identification for training and validation sets using ResNet-

50: (a) accuracy and (b) loss ......................................................................................................... 76

Figure 4.10 System-level collapse versus no collapse: confusion matrices of (a) training set

and (b) testing set .......................................................................................................................... 76

Figure 4.11 Sample testing images of the building with predicted probability for each class

....................................................................................................................................................... 77

Figure 4.12 Component-level DS classification for training and validation sets using ResNet-

50: (a) accuracy and (b) loss ......................................................................................................... 79

Figure 4.13 Component-level damage state identification: confusion matrices of (a) training

(left) and (b) testing set. ................................................................................................................ 79

Figure 4.14 True prediction of sample testing images of the building with predicted probability

for each class ................................................................................................................................. 80

Figure 4.15 False prediction of sample testing images with ground truth of “Severe Damage”

....................................................................................................................................................... 80

Figure 4.16 Recall-precision curve of training (upper) and testing (lower) ...................... 82

Figure 4.17 Detection of steel bars highlighted by yellow rectangular bounding boxes (a)

Sample testing images, (b) detection of exposed steel bars (lower) in testing images wrongly

predicted by classification model (upper) ..................................................................................... 83

Figure 4.18 Confusion matrix with consideration of only the classification model (left); the

combined classification and object detection model (right) ......................................................... 83

Figure 4.19 Plane segmentation (units are in mm) ................................................................ 87

Figure 4.20 Steel reinforcement exposure detection (place holder) ...................................... 89

Figure 4.21 3D vision-based steel buckling quantification methodology ............................. 97

Figure 4.22 Vision-based 3D reconstruction of steel plate structures ................................... 98

Figure 4.23 Architecture of the YOLOv3-tiny object detector ........................................... 100

Figure 4.24 Steel components identification ....................................................................... 102

Figure 4.25 Precision-recall curve of training ..................................................................... 103

Figure 4.26 Precision-recall curve of testing ....................................................................... 104

Figure 4.27 Sample testing results of YOLOv3-tiny on the rendered scenes and real-world

images ......................................................................................................................................... 105

Figure 4.28 Plane segmentation for an I-shaped steel plate damper ................................... 107

Figure 4.29 Illustration of DBPPC method ......................................................................... 110

Figure 4.30 Illustration of reference planes ......................................................................... 110

Figure 4.31 Out-of-plane displacement measurements for the steel plate damper. The units are

in [mm]........................................................................................................................................ 113

Figure 4.32 Illustration of benchmark points ...................................................................... 113

Figure 4.33 Vision-based 3D reconstruction procedures for the steel corrugated plate wall

..................................................................................................................................................... 116

Figure 4.34 One iteration of plane segmentation for the steel corrugated panel. The units are

in [mm]........................................................................................................................................ 116

Figure 4.35 Illustration of the reference plane for the steel corrugated plate wall. The units are

in [mm]........................................................................................................................................ 117

Figure 4.36 Quantification of out-of-plane displacement distribution for the steel corrugated

wall panel. The units are in [mm] ............................................................................................... 118

Figure 4.37 Flowchart of the RTDT-Bolt method ............................................................... 128

Figure 4.38 Architecture of the YOLOv3-tiny object detector ........................................... 129

Figure 4.39 Image of the experimental bolted component .................................................. 130

Figure 4.40 Image preprocessing of the structural bolts...................................................... 134

Figure 4.41 Hough transformation of the original image, smoothed image, and sharpened

image, using the Canny method, Prewitt method and Log method, respectively. ...................... 135

Figure 4.42 3D vision-based bolt loosening quantification methodology. .......................... 141

Figure 4.43 Vision-based 3D reconstruction of the structural bolted device ...................... 143

Figure 4.44 Structural bolted device localization ................................................................ 145

Figure 4.45 Front view-based bolt localization ................................................................... 147

Figure 4.46 Illustration of loosened bolts and tight bolts. ................................................... 148

Figure 4.47 Cuboid boundaries formed by YOLOv3-tiny predictions................................ 149

Figure 4.48 Sub-cloud top-down sampling process ............................................................ 150

Figure 4.49 Precision-recall curve for (a) training, and (b) testing ..................................... 153

Figure 4.50 Sample results of YOLOv3-tiny detection of steel bolts ................................. 154
Figure 4.51 Montage of videos processed by the RTDT-bolt method: original video frame

with the illustration of the changing light conditions, and a highlight of the bolt under investigation

by the rectangular box; closed-up video frame, with the illustration of detection, tracking, and re-

detection. (Note: the frame index is shown at the top-left corner of each thumbnail image. Frame

rate: 30 frames per second) ......................................................................................................... 156

Figure 4.52 Montage of videos processed by the RTDT-bolt method: closed-up video frame,

with the illustration of detection, tracking, and re-detection. (Note: the frame index is shown at the

top-left corner of each thumbnail image. Frame rate: 30 frames per second) ............................ 157

Figure 4.53 Illustration of bolt index ................................................................................... 159

Figure 4.54 Time-history rotation estimation of the bolt in the short video with the base model

..................................................................................................................................................... 160

Figure 4.55 Montage of sample close-up frames of the short video processed by the base

model in parameter studies (Note: each thumbnail image with the labeled frame index corresponds

to the associated ground truth point in Figure 4.54 .................................................................... 160

Figure 4.56 Sensitivity study of rotation estimation to the selected parameters, with a control

parameter of (a1-a3) number of pyramid levels, (b1-b3) maximum bidirectional error, (c1-c3)

search block size, and (d1-d3) maximum number of iterations .................................................. 164

Figure 4.57 Illustration of the region of interest .................................................................. 166

Figure 4.58 Plane segmentation of the 3D point cloud of the friction damper (units are in mm)

..................................................................................................................................................... 167

Figure 4.59 Bolt localizations by the pretrained YOLOv3-tiny .......................................... 168

Figure 5.1 A picture of the workflow .................................................................................. 174


Figure 5.2 Typical consequence function for repair costs ................................................... 178

Figure 5.3 Case study of an RC building with sample results (a) system-level identification:

non-collapse and (b) component-level damage evaluation: severe damage with detection of steel

exposure (left), and moderate damage (right) ............................................................................. 180

Figure 5.4 Repair cost distribution corresponding to the hypothetical case ........................ 181

List of Abbreviations

AE=Acoustic emissions

AI=Artificial intelligence

ATC=Applied technology council

CNN=Convolutional neural network

Conv=Convolution

DBPPC=Distance-based point-projection-clustering

DIC=Digital image correlation

DL=Deep learning

DM=Damage measure

DS=Damage state

DV=Decision variable

EDP=Engineering demand parameter

FC=Fully-connected

FEMA=Federal emergency management agency

FP=Feature points

FPS=Frames per second

GAP=Global average pooling

GPR=Ground penetrating radar

GPS=Global positioning system

HT=Hough transform

IM=Intensity measure

IPT=Image processing technique

IT=Infrared thermography

KLT=Kanade-Lucas-Tomasi

LM=Laser testing methods

LVDT=Linear variable differential transducer

MSAC=M-estimator Sample Consensus

MT=Magnetic particle testing

NDTE=Non-destructive testing and evaluation

PBEE=Performance-based earthquake engineering

PSHA=Probabilistic seismic hazard analysis

PZT=Piezoelectric

RANSAC=Random Sample Consensus

RC=Reinforced concrete

RCNN=Region-based convolutional neural network

ReLU=Rectified Linear Unit

RGB=Red-green-blue

ROI=Region of interest

RT=Radiographic testing

RTDT=Real-time detection and tracking

SDD=Structural damage detection

SfM=Structure-from-motion

SHM=Structural health monitoring

TLS=Terrestrial laser scanning

UAV=Unmanned aerial vehicle

UGV=Unmanned ground vehicle

UT=Ultrasonic testing

YOLO=You only look once

YOLO-TDCH=YOLO detection and top-down convex hull

Acknowledgements

This endeavor would not have been possible without the support of numerous people. I will
be forever thankful for all the relationships and connections formed during my undergraduate and
graduate studies. Life is a perpetual struggle to maintain a balance between various opposing
forces. I would have never got to this point without the endless support from my beloved family
members, friends, colleagues, and academic advisors during the days of prosperity and adversity,
particularly during the 8 years of my overseas studies.
Words cannot express my gratitude to my Ph.D. supervisor, Prof. Tony Yang, for his
invaluable feedback, encouragement, and scrutiny of my work during this incredible journey. I
have been deeply influenced by his vision, insights, and passion about research and developments
in the field of civil and structural engineering.
I also could not have undertaken this journey without my Ph.D. comprehensive exam
committee and the supervisory committee, who generously provided numerous constructive
feedback. Special thanks to Prof. Carlos Ventura and Prof. Hojjat Adeli for giving critical inputs
to help me advance to my Ph.D. candidacy and articulate my dissertation from a diverse
perspective of both academia and industry. I am very grateful to my master’s supervisor, Dr.
Christian Malaga-Chuquitaype at Imperial College London, who continuously interacted with and
supported me in research during this journey. Besides, I would like to express my sincere gratitude
to my undergraduate advisors, Prof. Xinmin Zhan, Prof. Padraic O'Donoghue, Dr. Bryan McCabe,
Prof. Chaosheng Zhang, at the National University of Ireland, Galway, and Prof. Yun Zou at
Jiangnan University for their thoughtful comments and recommendations in pursuit of my career
as a student and structural engineering researcher. Moreover, many thanks to Prof. Yunxin Pan at
the Hong Kong University of Science and Technology, and Prof. Qipei Mei at the University of
Alberta, for giving me important suggestions as an early-career researcher.
Over the past few years of my Ph.D. program, I have had several great opportunities to
conduct various experimental tests including steel honeycomb damper tests, friction damper tests,
steel corrugated panel tests, nonlinear shake table control tests, vision-based shake table tests,
reinforced concrete column tests, timber bridge tests, and masonry prism tests. Therefore, I would
like to extend my sincere thanks to my fellow students, T. Li, T. Qiao, X. Zuo, S. Vaze, Y. Xiao,
Y. Hsu, M. Dou, for the opportunities to work with them on these tests, which have greatly
enriched my hands-on experience and expertise. Besides, the experimental data collected in some
of these tests are crucial to validate the essential algorithms and methods developed in this
dissertation. Furthermore, I had the pleasure of working with my colleagues, S. Tavasoli, Y. Xiao,
M. Azimi, and E. Faraji, with whom I discussed many novel ideas on programming with robotic
control and sensing technologies; this remains an active area of ongoing and future work during
and beyond my Ph.D. research.
I offer my enduring gratitude to the civil engineering department, the university, and all the
funding agencies. The research and teaching opportunities granted, and the professional seminars
and workshops organized by them have greatly inspired me to continue my research and teaching
in academia.
I would also like to express my profound appreciation to many of my supportive friends
outside of research and work, H. Wu, Y. Xiao, S. Tavasoli, S. Vaze, P. Kakoty, T. Li, H. Zhang,
S. Zhuo, X. Xie, H. Xu, D. Tung, M. Muazzam, R. Fu, W. Li, J. Li, to name some of them, for
sharing their values, helping me enjoy a fruitful life and build a sense of identity, resilience and
perseverance during this hard journey.
Last but not least, I am deeply indebted to my loving parents and other family members
for their unconditional financial and moral support throughout the years. Thank you so much for
being a source of love, advice and friendship.

Dedication

I dedicate this dissertation to my family: my loving parents, and the ones to come

Chapter 1: Introduction

1.1 Background

Civil engineering structures such as buildings and bridges deteriorate continuously due to

aging and environmental impacts. The Canadian Infrastructure Report notes that 40% of the

infrastructure is rated in poor condition, with estimated repair or replacement costs of $141

billion (The Globe and Mail, 2016). For example, the province of British Columbia is

situated in a high seismic zone, where a significant earthquake can result in $75 billion in financial

losses and thousands of casualties and injuries. Failure to understand the impact of deteriorating

infrastructure can significantly hinder the region’s, as well as Canada’s, ability to recover, while

further impeding social and economic progress. Infrastructure investments in Canada are

estimated to reach $11 trillion by 2067, of which 60-70% will be used to replace the existing

infrastructure (Future Cities Canada, 2018). Currently, there is a lack of efficient and cost-effective

methods to maintain the functionalities of civil structures. Traditional structural maintenance

consists of regular human inspections on site, which are slow and heavily dependent on the proper

training and experience of inspectors. Data obtained by visual inspection are manually analyzed

and documented by the inspectors or engineers, which is inherently biased and can be inconsistent

from time to time. This may result in false conclusions and generate erroneous evaluation reports.

On the other hand, manual inspection procedures are usually unsafe and inefficient, because the

civil structures are relatively large and can be constructed in a harsh environment. Besides, some

portions of large or complex infrastructures are practically impossible to inspect with

traditional human-based approaches. In addition, there is a significant labor shortage of skilled

workers in the field of infrastructure inspection in Canada. These issues result in very high long-

term maintenance costs and become even more challenging when a rapid assessment of

infrastructure on a regional scale is demanded by decision-makers right after the event of natural

hazards such as earthquakes.

Within this context, there is a compelling need to provide efficient, accurate and economical

automated structural health monitoring (SHM) methods (e.g., Salawu, 1997; An et al., 2019) that

use novel sensing technologies and data processing algorithms to replace traditional manual

inspection methods, and that are more reliable, consistent, and less dependent on environmental

conditions. Prior to natural disasters, engineers and researchers can use these SHM methods in

long-term monitoring of civil structures for early damage detection, followed by necessary repair

actions to avoid catastrophic failure of the civil structures. During and immediately after natural

disasters, the health condition of civil structures can be rapidly obtained to aid decision-makers in

allocating critical resources (such as fire trucks, ambulances and police) to prevent the risks from

growing, thus shortening the recovery process.

Vibration-based and non-destructive testing and evaluation (NDTE)-based SHM methods

have been well established in recent decades to enhance the efficiency of damage inspections. In

general, vibration-based SHM methods typically rely on contact-type sensors to measure global

structural response and identify damage from changes in structural properties (e.g., stiffness and

damping), which are reflected in modal properties such as natural frequencies and mode shapes. On the other hand, NDTE-

based methods are widely applied to damage detection with more focus on the local component

level. While the vibration-based and NDTE-based methods have shown promising results, there

exist several limitations, such as the high sensitivity of sensors to environmental effects,

complicated instrumentation setup, and high cost, to name a few. A more detailed discussion of

these limitations will be presented in Chapter 2.
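
To make the underlying idea concrete, the following minimal sketch (not taken from this dissertation; the signal, sampling rate, and the simulated 2 Hz mode are hypothetical stand-ins for real accelerometer data) illustrates the core principle of vibration-based SHM: estimating a structure's dominant natural frequency from a measured acceleration record, since a persistent shift in that frequency between inspections can indicate stiffness loss.

import numpy as np

fs = 100.0                                    # sampling rate of the accelerometer [Hz]
t = np.arange(0, 60, 1 / fs)                  # a 60-second acceleration record
accel = np.sin(2 * np.pi * 2.0 * t) + 0.2 * np.random.randn(t.size)  # simulated 2 Hz mode plus noise

spectrum = np.abs(np.fft.rfft(accel))         # Fourier amplitude spectrum
freqs = np.fft.rfftfreq(accel.size, d=1 / fs)

dominant = freqs[np.argmax(spectrum[1:]) + 1]  # peak-picking, skipping the DC bin
print(f"estimated fundamental frequency: {dominant:.2f} Hz")
# A sustained drop in this frequency relative to the baseline suggests a loss of stiffness (damage).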

In recent years, computer vision-based (or vision-based) SHM has been established as an

economical, efficient and novel complementary approach to vibration-based and NDTE-based

SHM methods for structural damage detection in various civil engineering applications (Spencer

Jr, Hoskere, & Narazaki, 2019). In comparison to contact-type sensors, vision-based methods use

non-contact detection and have relatively low costs for sensors, installation, and operation.

In addition, due to the nature of cameras, the damage features in the image are immune to some

environmental effects such as temperature or humidity within the operating range. Despite the

achievements of existing vision-based SHM methods, there exist several limitations such as

insufficient algorithm speed for real-time deployment, limited algorithm robustness and constraints

that hamper real-world applications, and an insufficient level of damage evaluation that does not

provide accurate and comprehensive damage assessment. A detailed discussion of the

limitations of the existing vision-based research will be presented in Chapter 2.

1.2 Research goals and contributions

To address the aforementioned limitations, this dissertation proposes a 3D vision-based

SHM and loss estimation framework to provide a more rapid and comprehensive performance

assessment of civil structures. Within the framework, advanced 2D and 3D vision methods, and

3D point cloud processing techniques have been developed to provide more comprehensive

damage evaluation. In addition, the vision-based damage evaluation pipeline is combined with a

loss estimation scheme to provide additional evaluation metrics. The main goals and contributions

of the research are summarized below:

1) Propose a 3D vision-based damage detection and loss estimation framework for civil

structures: The research is first intended to propose a 3D vision-based structural damage

detection and loss evaluation framework which provides a more rapid and comprehensive

solution to evaluate structural damages and the associated loss information that can be

more easily conveyed to owners, stakeholders, and decision-makers to aid their decision

making.

2) Enhance existing 2D vision-based damage detection methods: Within the framework,

the research has proposed enhanced 2D vision methods from several aspects as follows.

a) It has improved the robustness of vision-based methods against external environmental

effects (e.g., background noise and illumination changes). b) It has improved

the speed of the local damage evaluation algorithms towards real-time

performance, which provides a firm foundation for future deployments in rapid real-world

applications (e.g., structural component and damage localization). c) It has eliminated the

computational limitations of several existing vision-based methods for damage

quantification (e.g., concrete spalling and bolt loosening angle).

3) Develop the advanced 3D vision-based methods for a more comprehensive damage

evaluation of civil structures: Within the framework, the research has developed 3D

vision-based damage evaluation methodologies for civil structures, as opposed to most

existing vision-based research that is built on 2D computer vision methods. The research

first expands the scope of most existing 2D vision-based damage detection methods from

damage recognition and localization, towards more detailed quantification in 3D space.

The proposed 3D vision-based framework consists of a preliminary rapid 2D vision-based

system-level and component-level damage assessment, followed by a more detailed 3D

vision-based damage quantification of structural components. The 3D vision methodology

incorporates 3D vision-based reconstruction, a novel 3D vision-based multi-view

structural component detection method, and newly proposed point cloud processing

methods to recognize, localize and quantify the structural damages. The effectiveness of

the rapid 2D vision-based assessment methods has been examined on the system level and

component level. The effectiveness of the 3D vision-based quantification methods has

been validated on three examples of structural components, each drawn from one of three

prevalent types: reinforced concrete (RC) structures, steel structures,

and structural bolted components. Compared to the majority of the existing

damage detection methods developed in 2D computer vision, the proposed 3D vision

methods provide a more comprehensive damage evaluation.

4) Bridge the gap between vibration-based SHM and vision-based SHM through vision-

based vibration measurement: The dissertation is focused on the development and

application of computer vision techniques for visual damage evaluation of civil structures.

Further, as ongoing and future research, the dissertation discusses an economical and

efficient vision-based method to measure the vibration response of structures. The

outcomes can be analyzed by vibration-based SHM methods to conduct additional

structural health evaluation using vibration theories. Besides, the outcomes allow the

damage state assessment of non-structural components to be incorporated into the

proposed framework in the near future. Once achieved, this will provide even more

comprehensive damage and loss evaluation outcomes of civil structures.

1.3 Objectives and methodology

This dissertation first proposes a 3D vision-based structural damage detection and loss

estimation framework for civil structures. The development and implementation of the framework

consist of structural damage data collection and preparation, training and validation of the

methods, implementation of the methods on three common types of civil structures, as well as a

case study to illustrate the loss estimation. The dissertation is first dedicated to the development,

optimization and validation of the new 2D vision methods to enhance the existing 2D vision-based

structural damage detection methods. Further, to achieve a more comprehensive structural damage

evaluation, the research has proposed more advanced 3D computer vision-based methods to detect

and quantify structural damages. The methodology consists of vision-based 3D dense point cloud

reconstruction, 3D vision-based multi-view object detection, 3D point cloud processing, and

damage quantification procedures. The 2D vision, 3D vision, and deep learning algorithms are

developed and implemented in OpenCV, PyTorch, and MATLAB. The 3D point cloud

reconstruction pipeline is implemented in Meshroom, Metashape and Open3D. The point cloud

processing algorithms are developed and implemented in Open3D and MATLAB.
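
As a concrete illustration of the point cloud processing step, the short sketch below is an assumption-laden example rather than the dissertation's actual code: the file name, units, and thresholds are hypothetical. It uses Open3D to fit a reference plane to a reconstructed dense point cloud with RANSAC and to compute signed point-to-plane deviations, which is the kind of building block used for out-of-plane displacement and spalling quantification in later chapters.

import numpy as np
import open3d as o3d

# Load the dense point cloud produced by the SfM/MVS reconstruction step (file name is illustrative)
pcd = o3d.io.read_point_cloud("column_dense_cloud.ply")

# Light preprocessing: down-sample and remove statistical outliers
pcd = pcd.voxel_down_sample(voxel_size=2.0)                      # units follow the reconstruction scale
pcd, _ = pcd.remove_statistical_outlier(nb_neighbors=20, std_ratio=2.0)

# RANSAC plane fit of the dominant (undamaged) surface: ax + by + cz + d = 0
plane_model, inlier_idx = pcd.segment_plane(distance_threshold=1.5,
                                            ransac_n=3,
                                            num_iterations=1000)
a, b, c, d = plane_model

plane_cloud = pcd.select_by_index(inlier_idx)                    # fitted reference plane
residual_cloud = pcd.select_by_index(inlier_idx, invert=True)    # points off the plane

# Signed point-to-plane distances of the residual points, a proxy for out-of-plane deviation
pts = np.asarray(residual_cloud.points)
signed_dist = (pts @ np.array([a, b, c]) + d) / np.linalg.norm([a, b, c])
print(f"max deviation: {signed_dist.max():.2f}, min deviation: {signed_dist.min():.2f}")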

The development and application of deep learning-based computer vision algorithms for SHM

require structural damage data to train, validate and test the computer vision and deep learning

algorithms. The structural damage data in this research were collected from available online

datasets (which will be presented in detail in Chapter 4), and experimental tests conducted in the

structural laboratory at the University of British Columbia (UBC).

The collected data are utilized to train the deep learning algorithms to perform classification

and object detection tasks on images of civil structures. Once trained, the algorithms are expected

to identify the structural condition at the system level (e.g., collapse or non-collapse), structural

damage states at the component level (e.g., light damage or severe damage), and to localize

critical structural components (e.g., structural columns, structural bolts) and critical damage

features (e.g., concrete spalling, exposure of steel reinforcement in damaged reinforced concrete

components). Meanwhile, this research also examines 2D and 3D vision-based methods in damage

quantification of three common structural types. Further, these trained deep learning and computer

vision algorithms are validated through experimental test specimens in the UBC structural

laboratory, and on on-site data collected from web sources.
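
As an illustration of the classification task, the sketch below shows one plausible transfer-learning setup in PyTorch for a two-class system-level problem such as collapse versus non-collapse. The folder layout, hyperparameters, and training schedule are hypothetical placeholders, not the settings actually used in Chapter 4.

import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

tfm = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])
# Hypothetical folder with collapse/ and non_collapse/ subfolders
train_set = datasets.ImageFolder("data/system_level/train", transform=tfm)
loader = torch.utils.data.DataLoader(train_set, batch_size=16, shuffle=True)

model = models.resnet50(weights="IMAGENET1K_V1")   # ImageNet-pretrained backbone (older torchvision: pretrained=True)
model.fc = nn.Linear(model.fc.in_features, 2)      # replace the head with 2 output classes

device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

model.train()
for epoch in range(10):
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()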

Once the damage information has been determined, the corresponding repair costs for the

components can be estimated according to the ATC-58 (2007) fragility database and the associated

guidelines. This fragility database is one of the essential products of the ATC-58 project

established by the Applied Technology Council (ATC) in contract with the Federal Emergency

Management Agency (FEMA) to develop FEMA P-58 Seismic Performance Assessment of

Buildings, Methodology, and Implementation (also known as Performance-Based Earthquake

Engineering). The total financial loss of the building can be determined by aggregating the repair quantities of all damaged structural and nonstructural components, taking into account their respective unit cost distributions. The loss information can be easily conveyed to decision-makers and

stakeholders who lack engineering knowledge to perform early repair actions, or for risk

management and resource allocation during post-disaster reconstruction.

The proposed framework and SHM methods provide guidance to engineers and researchers

to practically implement them on civil structures at a large scale. Successful implementation of

these methods will reduce the cost, enhance efficiency and consistency during infrastructure

inspection, and shorten the post-disaster recovery process after the event of natural hazards such

as earthquakes. In addition, it can help alleviate the skilled worker shortage issue in Canada.

1.4 Thesis organization

This thesis is organized into six chapters with the following content.

Chapter 1: Introduction provides a description of the general background, research

motivation, main goals and contribution, objectives and methodology, and the thesis organization.

Chapter 2: Literature review provides a review of the various SHM methods, including

vibration-based methods, NDTE-based methods, TLS-based methods, and computer vision-based

methods for civil structures. The review discusses the advancements and limitations of the existing

SHM methods. More efforts have been made to highlight the limitations of existing vision-based

SHM methods, and demonstrate the need for a more systematic 3D vision-based SHM

and loss estimation framework to provide a more rapid and comprehensive evaluation.

Chapter 3: 3D Vision-based SHM and loss estimation framework for civil structures –

towards more rapid and comprehensive assessment proposes a more comprehensive 3D vision-

based SHM and loss estimation framework. The main objectives of the framework are stated in

this chapter. Additionally, the detailed methodology is presented including the system-level

evaluation, component-level evaluation, and loss quantification.

Chapter 4: Development and application of vision-based SHM methods provides a

detailed description of the development and application of 2D and 3D vision-based SHM methods

for RC structures, steel structures and structural bolted components. Data collection and

preparation, development and training of the algorithms, experimental validations, as well as real-

world implementation for the proposed vision-based SHM methods are presented in detail. The

main contributions of the proposed methods to the existing methods are highlighted, while the

limitations are also summarized.

Chapter 5: Combined vision-based SHM and loss estimation framework presents the

general background of the FEMA P-58 performance-based loss estimation methodology, the

description of the fragility database, and a case study to illustrate the implementation of the

proposed vision-based SHM and loss estimation framework.

Chapter 6: Conclusions provides a summary of this dissertation. The key contributions and

limitations are presented. Discussions for the ongoing and future work are provided.

Chapter 2: Literature review

2.1 Overview

Civil infrastructure deteriorates continuously due to aging problems, human activities, and

environmental impacts. Damages to infrastructure will result in lower structural performance.

Accumulation or growth of these damages may result in catastrophic failure if no remedial action

is taken. Therefore, monitoring structural damage is of great importance to maintain structural

integrity, provide early warnings, and prevent catastrophic events. Traditional infrastructure

maintenance consists of regular human inspections, which are laborious, and heavily dependent

on the experience and expertise of inspectors. Moreover, this procedure is often unsafe and inefficient, and some complex infrastructures are practically impossible to inspect manually.

To address these issues, in the past decades, a wide variety of SHM methods such as vibration-

based and non-destructive testing and evaluation (NDTE)-based methods have been successfully

developed and applied to evaluate structural damages of many types of infrastructure at both the

system level and component level. The vibration-based methods are generally intended to detect

damage patterns by analyzing structural vibration response data (Wu & Jahanshahi, 2018). These

SHM methods have been proven to make the monitoring process more feasible, efficient and

reliable. The NDTE methods rely on special tools to estimate material properties, or to indicate

the presence of other defects or discontinuities. They have been shown to provide effective and

more detailed results in localized applications (Kumar & Mahto, 2013). In addition, research using

terrestrial laser scanners (TLS) has been attempted for the evaluation of different damage types,

such as concrete spalling quantification. The results indicate that the high-end TLS can reach a

desirable scanning accuracy (Park et al., 2007; Kim et al., 2015).

On the other hand, in recent years, computer vision-based (or vision-based) SHM methods

were investigated. As a complementary monitoring approach to other SHM methods, vision-based

methods have shown very promising results in detecting damages that are visually observable by

cameras or human beings. This indicates its great potential to significantly improve damage

inspection efficiency, compared to traditional manual inspection methods. In addition, it provides

a more economical and feasible way to perform damage localization and quantification in some

scenarios where vibration-based or NDTE-based SHM methods cannot be effectively or easily

applied at a reasonable cost.

The chapter is organized as follows. It starts with a brief review of common SHM methods,

including vibration-based SHM methods, general non-destructive testing and evaluation (NDTE)-

based SHM methods, and terrestrial laser scanning (TLS)-based SHM methods. Further, the

subsequent sections of the chapter are focused on a review of vision-based SHM methods. This

consists of advancements in computer vision and deep learning in recent years, followed by the

developments and applications of deep learning-based vision methods in structural damage

detection. The review highlights the advancements and limitations of these existing vision-based

SHM methods. Further, the chapter concludes with the need for a more advanced

framework to perform vision-based structural damage detection and loss estimation, leveraging

the state-of-the-art developments in the computer vision community, to provide more

comprehensive evaluation outcomes for engineers and decision-makers, thus benefiting the

research community, civil engineering industries, and society.

2.2 Vibration-based SHM methods

Vibration-based structural damage detection (SDD) methods are generally applied to civil structures at the global level

where the vibration response of the structures is utilized to understand the global state of the

structures (Peeters, Maeck, & De Roeck, 2001; Li, Deng, & Dai, 2007; Bayissa, Haritos, &

Thelandersson, 2008; Brincker, & Ventura, 2015; Amezquita-Sanchez, & Adeli, 2016). The idea

of vibration-based methods originated from the assessment of train wheels by hammer tapping the

wheels and analyzing the resulting sound (Turner & Pretlove, 1991). With the advancement of

sensing techniques and computational hardware, vibration-based SDD theories have received

enormous developments and applications in both model-driven methods (e.g., Kopsaftopoulos &

Fassois, 2013; Chang, & Kim, 2016; Vamvoudakis-Stefanou, Sakellariou, & Fassois, 2018), and

data-driven methods (e.g., Yuen, & Lam, 2006; Betti, Facchini, & Biagini, 2015; Hakim, Razak,

& Ravanfar, 2015; Ghiasi, Torkzadeh, & Noori, 2016; Abdeljaber, & Avci, 2016; Gulgec, Takáč,

& Pakzad, 2020; Ni, Zhang, & Noori, 2020; Jiang et al., 2021; Sajedi, & Liang, 2021; Eltouny, &

Liang, 2021). The vibration response is typically measured and recorded by a sensor network

deployed at predefined locations. The response will be processed in time, frequency, or modal

domains by software algorithms and translated into damage indicators, which are used to recognize the existence of damage and, if damage is identified, its location and severity

(Figure 2.1).

Figure 2.1 Vibration-based structural health monitoring setup

2.3 NDTE-based SHM methods

NDTE-based methods are widely applied to damage detection with more focus on the local

component level. In general, dedicated sensing techniques are required such as ultrasonic testing

(UT), acoustic emissions (AE), infrared thermography (IT), radiographic testing (RT), magnetic

particle testing (MT), laser testing methods (LM), ground penetrating radar (GPR), and

piezoelectric (PZT) sensing (Ph Papaelias, Roberts, & Davis, 2008; Dwivedi, Vishwakarma, &

Soni, 2018). For example, electro-mechanical impedance-based SDD approaches are widely used

for component-level damage detection of small structural members. Piezoelectric units are

installed on the structural members to be monitored, and act as actuators and sensors

simultaneously. The units are excited at a relatively high frequency. The impedance signals across

the piezoelectric units are recorded and analyzed to assess the damage condition of the adjacent

area of the units. On the other hand, radiographic testing (RT) utilizes x-rays that are able to penetrate specimens and generate a radiograph showing any changes in thickness or defects. If an object has internal voids, more x-rays will pass through the void area, and the part beneath that area will have more exposure than that under the non-void area. This method can be considered to detect small subsurface voids of thin-walled structures.

2.4 TLS-based SHM methods

Terrestrial laser scanning (TLS) is an efficient method to construct a 2D or 3D digital

representation of structures by projecting laser beams onto the surfaces of the structures (Van

Genechten, 2008). The word “laser” is an acronym for Light Amplification by

Stimulated Emission of Radiation. A laser typically emits light in a narrow, low-divergence beam

with a well-defined wavelength, and has a large propagation distance. It propagates mostly in a

well-defined direction, at a constant speed in a certain medium. Due to these properties, laser is

well suited for measurement purposes for civil structures. The measurement methods using laser

light can be classified as triangulation-based methods and time-based methods. In the former type

of methods, a laser emitter and a camera are placed at a certain distance away from each other, at

a constant angle facing towards the object to be scanned. The laser emitter is used to project laser

beams onto the surface of the object to create a pattern (e.g., a set of points), which is captured by the camera. Using the triangulation

principle based on the matching point pairs, the 3D shape of the structure can be constructed. The

triangulation scanners generally have a relatively short range (typically less than 10 meters) but

can reach very high accuracy. The latter type, the time-based methods, relies on two scanning principles: pulse-based (time-of-flight) and

phase-based. A time-of-flight scanner has a transmitter which sends out lights, a receiver which

receives the lights reflecting from the surface of an object to be scanned, and a clocking device

which measures the light travel time. The time-of-flight scanners employ the simple fact that light

travels at a constant velocity in a certain medium. Once the time delay from the light source to the

scanning surface, and back to the source is measured, the distance between the light source to the

scanning surface can be measured. The surface can be constructed accordingly. The measurement

accuracy of time-of-flight scanners is highly dependent on the clocking mechanism, desired time

resolution, the counting rate, etc. The accuracy is also affected by signal strength, noise, time jitter,

and sensitivity of the threshold detector, etc. Phase-based scanners do not rely on high-precision

clocking mechanisms. The scanner modulates a light before projecting it onto an object's surface.

The reflected light wave from the surface is collected by a receiver. The phase difference between

the original wave and the reflected wave can be calculated, which can be used to determine the

distance between the light source to the object's surface. In general, compared to time-of-flight

scanners, phase-based scanners have higher speeds and resolution, but less precision. The accuracy

of the phase-based scanners is affected by signal strength, noise, stability of the modulation

oscillator, etc.
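For illustration, the two time-based principles can be summarized by the standard range relations (the symbols here are introduced for clarity rather than taken from the cited references): a pulse-based scanner infers the range from the round-trip travel time \Delta t as d = c\,\Delta t / 2, so that a measured delay of roughly 66.7 ns corresponds to d \approx 10 m; a phase-based scanner modulated at frequency f_m infers the range from the measured phase shift \Delta\varphi as d = c\,\Delta\varphi / (4\pi f_m), valid within one ambiguity interval of c/(2 f_m).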

In recent years, research using TLS devices has been conducted to identify structural damages

in 3D space. The scanning results are point cloud coordinates in 3D space. For example, Mizoguchi et al. (2013) quantified the scaling damage of a concrete bridge pier using a TLS. Kim et al. (2015)

employed a TLS to localize and quantify concrete spalling. Kim et al. (2021) proposed a damage

quantification framework for concrete bridge piers with more complicated shapes by processing

the point cloud obtained using a TLS. The accuracy of the TLS as reported in many of these studies

in structural engineering field applications generally ranges from 5 to 15mm.

2.5 Vision-based SHM methods

2.5.1 Overview

On the other hand, vision-based methods have been proposed as complementary damage

detection approaches to the aforementioned methods. Vision-based methods have been proven

to be very effective in detecting structural surface damages that can be captured by modern

cameras or human eyes. In general, the implementation of vision-based methods is relatively

straightforward, and consists of the following parts:

• Cameras which can be carried by humans, or mounted to unmanned aerial vehicles

(UAVs) and unmanned ground vehicles (UGVs) to facilitate more efficient data

collection.

• Postprocessing algorithms which can be developed, validated, compiled, and deployed at

ease on a wide range of open-source platforms, thanks to the open-source nature of the

machine learning, deep learning, and computer vision communities.

• A computational unit typically equipped with a dedicated GPU for efficient parallel

processing of image batches. In recent years, computational power has benefited

significantly from the rapid developments of graphics, gaming, computer and mobile

device industry, leading to a fairly low cost when processing image data of structural

damages. For example, many newly developed lightweight vision algorithms (e.g.,

MobileNet by Sandler et al., 2018) can be effectively implemented on modern mobile

smartphones.

From the algorithm’s perspective, vision-based SHM methods can be classified into non-deep-

learning (DL)-based methods and DL-based methods. These methods will be discussed as follows.

2.5.2 Non-DL-based vision methods for SHM

Prior to 2012, most computer vision methods developed for SDD were based on

traditional image processing techniques (IPT) such as color thresholding, image filtering, template

matching, edge-based segmentation, threshold-based segmentation, region-based segmentation,

histogram transform, and texture pattern recognition (Jahanshahi et al., 2009; Koch et al., 2014;

Koch et al., 2015). For example, Sinha et al. (2003) investigates different types of filtering

techniques, feature extraction and pattern recognition algorithms in identifying underground

pipeline defects such as cracks and holes. It was concluded that the background lighting condition

and the complicated texture on the pipe surface impose a big challenge for the damage detection

process. Choi and Kim (2005) used color, texture, and shape to characterize corrosion images and

classify the images into six categories (non-corroded specimen, crevice corrosion, intergranular,

pitting, fretting, and uniform corrosion. Although the accuracy was reported at ~85%, the

algorithms were validated on a relatively small dataset. German, Brilakis, & DesRoches (2012)

employed an entropy-based thresholding algorithm to localize the spalling area of RC columns.

Further, the authors employed a connected image pixel labelling algorithm, a global adaptive

thresholding algorithm, and a template matching algorithm, to estimate the concrete spalling length

and depth. The spalling localization accuracy was reported at ~80%, while the average error of length and depth measurement was reported at ~4%.

2.5.3 DL-based vision methods for SHM

The transition from non-DL-based methods to DL-based methods occurred when the

breakthrough was achieved by AlexNet (Krizhevsky, Sutskever, and Hinton, 2012), which is built

upon convolutional neural networks (CNNs, a type of artificial neural networks) and has shown

substantial accuracy increase and robustness in image classification, object detection, and semantic

segmentation tasks compared to all the previous methods.

2.5.3.1 Advances in artificial neural networks

The development of artificial neural networks can generally be divided into three phases. The first phase dates back to the 1940s–1960s, when the theories of biological learning (McCulloch and Pitts, 1943) and the first artificial neural networks, such as the Perceptron (Rosenblatt, 1958), were implemented. The second phase occurred between 1980 and 1995, when the back-propagation technique (Rumelhart, Hinton, & Williams, 1986) was developed to train a neural network with one or two hidden layers.

into deep neural networks (DNNs), where multiple layers can be trained through the back-

propagation algorithm. One such application was the work done by LeCun, Bottou, Bengio, &

Haffner (1998) for document recognition. The third phase of neural networks (also named deep

learning) began with the breakthrough in 2006 when Hinton, Osindero, and Teh (2006)

demonstrated that a so-called deep belief network could be efficiently trained using a greedy layer-

wise pretraining strategy. With the rapid growth and optimization of deep learning algorithms, the increasing size of training data, and enhanced computational power, convolutional neural networks (CNNs, or ConvNets), a class of DNNs, have been advancing rapidly. Unlike traditional neural networks that utilize multiple fully-connected (FC)

layers, the hidden layers of a CNN typically include a series of convolutional layers that convolve

with multiplication or other dot product through learnable filters. In recent years, CNNs have

dominated the fields of computer vision, speech recognition, and natural language processing.

Within the field of computer vision, CNNs such as AlexNet, developed by Krizhevsky, Sutskever, and Hinton (2012), have shown a substantial increase in accuracy and efficiency over previous algorithms. With the success of AlexNet, CNNs have been successfully applied

in computer vision for classification, object detection, semantic segmentation, and visual object

tracking. In addition to AlexNet, other deeper CNN networks such as VGG Net (Simonyan &

Zisserman, 2014), Google Net (Szegedy et al., 2015), Deep Residual Net (He, Zhang, Ren, & Sun,

2016), DenseNet (Huang, Liu, Van Der Maaten, & Weinberger, 2017) and MobileNet (Sandler et

al., 2018) have been developed.
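To make the distinction between convolutional and fully-connected layers concrete, the following is a minimal PyTorch sketch of a small CNN image classifier; the layer sizes and the two-class output are illustrative assumptions only, not a network developed in this dissertation.

```python
import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    """Minimal CNN: a stacked convolutional feature extractor followed by a fully-connected head."""
    def __init__(self, num_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),  # learnable 3x3 filters over the RGB input
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),                             # spatial downsampling
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),                     # global average pooling
        )
        self.classifier = nn.Linear(32, num_classes)     # fully-connected output layer

    def forward(self, x):
        x = self.features(x)                             # (N, 32, 1, 1)
        x = torch.flatten(x, 1)                          # (N, 32)
        return self.classifier(x)                        # class logits

logits = SmallCNN()(torch.randn(1, 3, 224, 224))         # one dummy 224x224 RGB image
```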

2.5.3.2 Developments and applications of DL-based vision methods for SHM of civil

structures

Ever since the great achievement by AlexNet, CNN-based vision methods have been widely

developed and applied in damage identification and localization of various types of infrastructure.

CNN-based vision methods have been proven effective for structural damage classification. These

include metal surface defects detection (Soukup & Huber-Mörk, 2014), post-disaster collapse

classification (Yeum, Dyke, Ramirez, & Benes, 2016), joint damage detection through a one-

dimensional CNN (Abdeljaber, Avci, Kiranyaz, Gabbouj, & Inman, 2017), concrete crack

detection using a sliding window technique (Cha, Choi, & Büyüköztürk, 2017), pavement crack

detection (Zhang et al., 2017; Vetrivel et al., 2018), damage detection of masonry structures (Wang

et al., 2018; Wang et al., 2019), structural damage classification with the proposal of Structural

ImageNet (Gao & Mosalam, 2018), regional post-disaster damage assessment based on time-

frequency distribution plots of ground motions (Lu et al., 2021), and post-hurricane preliminary

damage assessment of buildings using aerial imagery (Cheng, Behzadan, & Noshadravan, 2021).

In addition to classification, CNNs can be used in the field of object detection which involves

classification and localization of an object (e.g., damage area). Prior to the use of CNNs, object

detection was dominated by the use of histogram of oriented gradients (HOG) (Dalal, & Triggs,

2005) and scale-invariant feature transform (SIFT) (Lowe, 2004). In 2014, Girshick, Donahue,

Darrell and Malik (2014) proposed the region-based CNNs (R-CNNs), which utilizes the Region

Proposal Function (RPF) in order to localize and segment objects. It significantly improved the

global performance compared to the previous best result on PASCAL Visual Object Classes

(VOC) challenge 2012. The PASCAL VOC challenge ran each year from 2005 to 2012,

which provides a benchmark in visual object category recognition and detection with a standard

dataset of images and annotation, and standard evaluation procedures. Discussion of object

detection proposal methods can be found in Hosang, Benenson, and Schiele (2014), Hosang,

Benenson, Dollár, and Schiele (2015). Further, the Fast R-CNN (Girshick et al., 2015) and the

Faster R-CNN (Ren, He, Girshick, & Sun, 2017) were developed to improve the speed and

accuracy of the R-CNN. Region-based CNN methods (e.g. RCNN, Fast-RCNN, and Faster-

RCNN) have been successfully implemented in civil engineering applications. In recent years, the

development and application the vision-based detectors have been well demonstrated for visual

damage localization of different types of structural systems (Spencer Jr, Hoskere, & Narazaki,

2019), such as reinforced concrete (RC) structures (e.g., Li et al., 2018; Liang et al., 2019; Zhang,

Chang & Jamshidi, 2020; Zhang & Yuen, 2021; Deng, Lu, & Lee, 2020; Liu et al., 2020; Chun,

Izumi, & Yamane, 2021; Miao et al., 2021; Maeda et al., 2021; Jiang et al., 2021), masonry

structures (e.g., Wang et al., 2018; Wang et al., 2019), steel structures (e.g., Yeum & Dyke, 2015;

Yun et al., 2017; Kong & Li, 2018; Xu, Gui, & Han, 2020; Luo et al., 2021; Pan & Yang, 2022),

roads and pavements (e.g., Tong et al., 2020), tunnels (e.g., Xue & Li, 2018), or multiple damage

detection of different types of structures (Cha, Choi, Suh, Mahmoudkhani & Büyüköztürk, 2018).

2.6 Discussion of limitations of the existing methods

2.6.1 Limitations of vibration-based and NDTE-based SHM methods

While the vibration-based and NDTE-based methods offer great advantages in their specific

application scenarios, several limitations can be observed as follows. a) Most vibration-based

methods require careful design of sensor placements, where numerical modelling of the target

structure is typically required to identify suitable sensor locations. Although this can be practically

achieved, there exist uncertainties in the modelling assumptions made. Besides, the process requires considerable effort and becomes more difficult for structures with irregular shapes. b) Although NDTE-

based methods provide relatively detailed results at the local component level, generalization of

these detailed evaluation methods at the global infrastructure level becomes practically very

challenging and expensive. c) Both vibration-based and NDTE-based methods require relatively

complicated setup protocols and dedicated data processing software. This demands more

expensive training and the hiring of professional experts.

2.6.2 Limitations of TLS-based methods

The accuracy of the TLS as reported in many studies generally ranges from 5 to 15mm, and

is suitable for applications where an error of 5-15mm in quantifying local damage areas does not

greatly affect the global health inspection of the entire structures. Although the results are

promising in those studies, the TLS devices may not be accurate enough to quantify structural

damages of relatively small magnitude, such as crack widths of less than 3 mm, or out-of-plane buckling displacements of steel plate structures of less than 10 mm. Besides, these TLS are

generally expensive and may not be readily available to many researchers and engineers. Although

the low-end and mid-tier laser scanners cost less, they typically have much lower detection range

and accuracy. Moreover, these TLS devices can only be operated by qualified and well-trained

persons with extra precautions in accordance with laser safety guidelines.

2.6.3 Limitations of existing vision-based methods

As the dissertation is strongly focused on the developments and applications of vision-based SHM methods, this section summarizes the main limitations of existing vision-based SHM methods in detail to emphasize the need for a more comprehensive vision-based damage evaluation framework:

• The algorithms employed in non-DL-based methods typically require a careful

preprocessing of the images such that the background information is sufficiently removed

to ensure the algorithm is applied to a relatively small region of interest. More importantly,

non-DL methods are generally not robust against background noise. The evaluation outcomes are sensitive to image noise, image quality, and lighting conditions. Hence, the real-

world applications of the non-DL-based methods are rather limited.

• The current trend of computer vision developments and applications is strongly focused on

data-driven algorithms. The DL-based methods receive an enormous amount of attention

due to their high accuracy and robustness in real-world applications. However, the speed

of the CNN-based vision algorithms (i.e., DL-based methods) achieved in many damage

detection applications is not sufficiently high, which may hamper their deployment as rapid

assessment tools in real-world applications.

• A vast majority of existing vision methods (i.e., both non-DL-based and DL-based

methods) were developed in the area of 2D computer vision, which assumes the camera to

be placed at appropriate locations and poses to obtain informative photos or videos. The

evaluation outcomes are relatively sensitive to the locations and poses where the photos or

videos were taken.

• Existing 2D vision methods may not provide a comprehensive damage assessment of

structures. For example, consider a structural component that is damaged on one side and undamaged on the other; if only photos of the undamaged side are collected, no damage will be reported for this structural component, leading to a false evaluation.

• Most of these vision methods were developed for qualitative structural state identification

(e.g., building collapse) and damage localization, while vision-based damage

quantification, particularly in 3D space, remains very limited.

• Some 2D vision methods show promising results in quantifying in-plane damage extent

(e.g., concrete crack width, concrete spalling area, bolt rotation). However, they are

incapable of quantifying out-of-plane damage features such as concrete spalling volume or

spalling depth, and out-of-plane structural deformations. This is due to the limitation that

processing 2D images individually cannot yield out-of-plane 3D damage information.

• Lastly, the vision-based methods output structural damage information, such as concrete cracks, concrete spalling and steel corrosion, for specific structural components. Damage

information may be of prime interest to engineers and researchers to understand the

structural residual stiffness and capacity. However, local damage information of specific

structural components may not be useful enough, and can be difficult to understand for

owners or decision-makers who likely lack engineering knowledge, but instead pay more

attention to the global condition of the structures, cost to repair the damaged structures or

socio-economic impacts due to potential interruptions caused by damages. In such

situations, it is necessary to convert such damage information to other metrics (e.g., repair

cost) that are easier for owners, stakeholders, and decision-makers to interpret.

2.7 Research motivations

To address these limitations and challenges, this dissertation proposes a more comprehensive

3D vision-based structural damage evaluation and performance quantification framework. Within

the framework, first, a 3D computer vision-based structural damage evaluation methodology is

proposed to identify, localize, and quantify structural damages in 3D space. At the time of writing,

there exist almost no attempts at 3D vision methods in structural damage evaluation, and their

developments and applications are currently in their infancy (Bao & Li, 2021). Within the

damage evaluation methodology, damage evaluation based on both 2D and 3D vision is

investigated. The 2D vision research is intended to optimize the speed of the existing 2D vision

algorithms for real-time local applications, while still maintaining high accuracy compared to the

existing methods. The 3D vision research is aimed at providing more detailed damage

quantification of structural components. The last part of the research is intended to combine the

damage evaluation pipeline with the performance quantification procedures to provide additional

loss metrics that are easier to convey to owners and decision-makers.

Chapter 3: 3D vision-based SHM and loss estimation framework for civil

structures – towards more rapid and comprehensive assessment.

3.1 Overview

In this chapter, a 3D vision-based SHM and loss estimation framework is proposed, which

aims to provide a more rapid and comprehensive performance assessment of civil structures.

Within the framework, the vision-based damage detection pipeline consists of system-level failure

identification, component-level damage state recognition, localization, and quantification of civil

structures. Further, the damage detection pipeline is combined with loss quantification procedures

to provide additional metrics to aid decision-making. The methodology presented in this chapter

will be implemented on three prevalent types of civil structures, as will be detailed in Chapter 4.

3.2 Introduction

A rapid vision-based SHM and loss estimation framework is proposed in this study to identify

damages and quantify the associated loss of civil structures (Figure 3.1). First, system-level and

component-level images for the structures are collected, which can be achieved by manual site

inspection, unmanned aerial vehicles (UAVs), or preinstalled cameras. The system-level images

are assessed by system-level classification CNNs to confirm if the structure has collapsed. If the

system-level collapse is identified, the replacement loss of the structure should be estimated

considering various factors such as the current market value of the structure, cost of demolition,

and potential cost inflation due to disruptions in the entire local supply chain in a natural disaster.

If the building is identified as non-collapse, the component-level images are input into component-

level classification and localization CNNs. Once the component damage states are identified, the
corresponding repair costs for the components are identified and the total loss of the structure

can be computed accordingly through the PBEE methodology and FEMA P-58 fragility database.

The dissertation is focused on the development and application of CNN-based vision methods to

identify structural damage states at the system level and component level. The loss estimation is

achieved by adopting existing well-established literature from FEMA P-58 Seismic Performance

Assessment of Buildings, Methodology, and Implementation. The total loss is the general decision

variable, which can be defined as repair cost and repair time, etc. Within the context, the

framework proposed in this dissertation makes the first attempt to integrate the proposed CNN-

based computer vision methods with the PBEE methodology to facilitate loss evaluation of civil

structures.

Figure 3.1 Vision-based SHM and loss estimation framework


3.3 Data collection

This section briefly discusses possible image data collection methods. The methods of data

collection can be determined based on the application scenarios of structural damage detection,

which is typically classified into long-term continuous structural health monitoring, scheduled

periodic site inspections, and post-disaster site inspections, etc. In the long-term monitoring

scenario, preinstalled sensing devices such as RGB or thermal cameras, and other structural

vibration measurement sensors can be considered. In the periodic and post-disaster site inspection

scenarios, manual inspection, UAV-based and UGV-based inspection, or the combinations of

these methods can be considered. In these two scenarios, although manual inspection is a common

practice at the time of writing, the deployment of UAV fleets and UGVs on sites is very likely to

be a future trend for autonomous, efficient and safe data collection, especially for applications at

large scale (Ham et al., 2016). In the post-disaster scenario, rapid data collection and analysis right

after the disaster is particularly important to provide valuable inputs to decision-makers to make

informed risk management decisions. The current manual inspection practices are time-

consuming, inherently biased, highly dependent on the proper training of the inspectors, and pose a threat to the life safety of the inspectors. Therefore, it is of prime interest to researchers to develop

more intelligent and efficient autonomous data collection methods for next-generation vision-

based structural damage detection. It should be noted that this dissertation is not focused on the

development of autonomous data collection methods. However, for completeness, a more detailed

discussion about the use of advanced robotic technologies (e.g., UAVs and UGVs) for image data

collection is provided in Appendix A.

During data collection, to facilitate a more accurate and comprehensive structural damage

assessment, images or videos of the structural systems and components should be taken from

multiple views at multiple locations. This is because first, civil structures generally have a large

scale where structural components are constructed at different spatial locations, and second, some

portions of structural components may be damaged while the remaining portions may stay intact.

3.4 System-level collapse recognition

Civil structures continuously deteriorate. Structural collapse can occur if the structures are not

properly maintained, and consequently, the accumulation of local damages may cause a global

failure of the structure. On the other hand, in extreme natural events such as major earthquakes,

structures can experience significant natural forces exceeding their designed resisting capacity. In

this case, structural collapse may occur within a very short period without any prewarning.

Identification of structural collapse is important to provide information to decision-makers,

engineers and rescue workers. On one hand, structural collapse greatly contributes to the total loss

of a region, which is important information to decision-makers. The collapse of lifeline systems in

particular needs immediate attention during the post-disaster recovery process, and repair or

reconstruction actions need to be applied promptly. On the other hand, chances are survivors may

be hidden or stuck under the debris due to structural collapse. Rapid identification of structural

collapse in a large region can greatly help deploy rescue workers efficiently to the key locations

where collapse occurs, thus saving more human lives.

Figure 3.2 Building collapse in Wenchuan earthquake, 2008

Therefore, the framework incorporates the identification of structural collapse in the first

place. As structural collapse manifests in a wide variety of forms, traditional computer vision methods built upon hand-crafted features do not work well. This research adopts advanced CNN architectures to build the classification vision methods for collapse recognition. The structure

is classified by CNN-based vision algorithms into collapse or non-collapse (Figure 3.3). If the

structure is identified as collapse, the total loss of the structure is taken as the output. In this case,

no further evaluation of structural components is required. If the structure does not experience

collapse, the damages to critical structural components (e.g., columns) will be evaluated.
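As an illustration of how such a collapse/non-collapse classifier might be applied at inference time, the sketch below scores a single system-level photo with a trained network; the file names, class order, and preprocessing constants are hypothetical placeholders, not artifacts of this research.

```python
import torch
from torchvision import transforms
from PIL import Image

CLASSES = ["non-collapse", "collapse"]                     # assumed class order
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],       # common ImageNet normalization constants
                         std=[0.229, 0.224, 0.225]),
])

model = torch.jit.load("collapse_classifier.pt")           # hypothetical exported (TorchScript) model
model.eval()

img = preprocess(Image.open("building_overview.jpg").convert("RGB")).unsqueeze(0)
with torch.no_grad():
    probs = torch.softmax(model(img), dim=1).squeeze(0)    # class probabilities
print({c: float(p) for c, p in zip(CLASSES, probs)})
```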

Figure 3.3 Vision-based collapse identification

It should be noted that the above classification procedures by the CNN model only conduct a

rapid assessment at the system level to identify whether collapse has occurred visually. In some

cases, a building may not collapse after an earthquake, but can have unacceptable residual drift

(e.g., exceeding the code drift limit, or not expected to achieve satisfactory performance even after

repair). The owners may decide to demolish the building. Therefore, the building is still considered

the same as “collapse” and the replacement cost should be evaluated. The residual drift may be

obtained using distance measurement tools during the post-disaster inspection, or using sensors

preinstalled on each floor of the building. In brief, while the system-level CNN model can provide

a rapid assessment, the CNN model and the drift measurement should both be considered for more

comprehensive evaluation at the system level.

3.5 Component-level damage recognition

If the structures are identified as non-collapse, component-level damage state recognition will

be conducted. Classification of damage states can follow existing structural damage evaluation codes. One of the notable structural performance evaluation guidelines, FEMA P-58

(ATC-58, 2007), can be considered. The guideline provides the ATC-58 database where damage

states of structural and non-structural components are defined. For example, the definition of

damage states for reinforced concrete beam-column joints is shown in Table 3-1. Figure 3.4

presents an example of damage state recognition results for structural columns. If the structural

components are identified as having no damage, there will be zero associated loss for these

components. Otherwise, if damages are recognized in any structural components, damage

localization and quantification will be performed as shown in the subsequent sections. Such

evaluation procedures have been practically accepted and implemented manually for many years

as demonstrated by Nakano, Maeda, Kuramoto, & Murakami (2004) and Maeda, Matsukawa,

& Ito (2014).

Table 3-1 Description of damage state classes

DS index    Description
0           No damage
1           Light damage: visible narrow cracks and/or very limited spalling of concrete
2           Moderate damage: cracks, large area of spalling concrete cover without exposure of steel bars
3           Severe damage: crushing of core concrete, and/or exposed reinforcement buckling or fracture

Figure 3.4 Vision-based damage state recognition

3.6 Component-level damage localization

Damage state classification provides a preliminary assessment of structural damages.

However, a single damage classification network may not perform well in situations where multiple complex damage patterns are present on the structural components at the same time. In this case, a
second network dedicated to identifying local damage features can be additionally implemented

to enhance the accuracy of damage state recognition. Details of an example of such cases will be

demonstrated in Section 4.2. Besides, identifying the locations of damage within the structural

components is important to understand the residual performance of the structures. Figure 3.5,

Figure 3.6, and Figure 3.7 present examples of damage localization for reinforced concrete

columns including concrete cracks, steel reinforcement exposure, and concrete spalling,

respectively.
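For illustration only, a damage localization step of this kind could be prototyped with an off-the-shelf detector such as torchvision's Faster R-CNN fine-tuned on damage classes; the class list and checkpoint name below are hypothetical, and this is not the specific detector developed in Chapter 4.

```python
import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

DAMAGE_CLASSES = ["background", "crack", "spalling", "rebar_exposure"]  # assumed label set

# Faster R-CNN with a ResNet-50 FPN backbone, box head sized to the damage classes (torchvision >= 0.13 API)
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(
    weights=None, num_classes=len(DAMAGE_CLASSES))
model.load_state_dict(torch.load("damage_detector.pth", map_location="cpu"))  # hypothetical weights
model.eval()

img = to_tensor(Image.open("rc_column.jpg").convert("RGB"))
with torch.no_grad():
    pred = model([img])[0]                                  # dict with boxes (xyxy), labels, scores

for box, label, score in zip(pred["boxes"], pred["labels"], pred["scores"]):
    if score > 0.5:                                         # keep confident detections only
        print(DAMAGE_CLASSES[int(label)], [round(float(v), 1) for v in box], float(score))
```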

Figure 3.5 Vision-based damage localization: concrete cracks localization

Figure 3.6 Vision-based damage localization: steel reinforcement exposure localization

Figure 3.7 Vision-based damage localization: concrete spalling localization

3.7 Component-level damage quantification

3.7.1 Overview

While the damage state classification and damage feature localization methods provide the

preliminary information on damage severity and location, they do not offer detailed damage

quantifications. Quantification of damages, such as the concrete spalling volume or spalling depth
of concrete structures, provides important metrics in damage inspection scenarios (Beckman,

Polyzois, & Cha, 2019; Kim, Yoon, Hong, & Sim, 2021). In such cases, as described in Section

2.5, existing 2D vision-based methods cannot determine such metrics. Most of the existing vision

methods were developed for qualitative structural state identification (e.g., building collapse) and

localization, while vision-based damage quantification research, particularly in 3D space, remains

very limited. In essence, quantification of structural damages cannot be accurately achieved in 2D

space in many situations. Hence, 3D computer vision methods are considered in this research.

Therefore, this section provides an overview of the 3D vision-based quantification procedures,

which consists of vision-based 3D reconstruction to generate dense 3D point cloud data (i.e., RGB

+ depth) from 2D images, multi-view CNN-based 3D object detection, and point cloud processing

for damage quantification, as shown in Figure 3.8. Image sources for 3D reconstruction can be

obtained using common consumer-grade cameras such as smartphone cameras, unmanned aerial

vehicles (UAVs) or unmanned ground vehicles (UGVs) with appropriate camera specs. These

procedures are described in the following sections. More details of the damage quantification

procedures with respect to specific structural components investigated in this research are

presented in Chapter 4.

It should be noted that in many situations, 3D reconstruction of the entire structural system

can be very computationally expensive and is generally not needed. In this dissertation, 2D vision-

based methods are first used to identify system-level structural collapse. If collapse is identified,

the evaluation is completed. If no collapse is identified, then the structural component-level

evaluation will be conducted. When evaluating structural component damages, 2D vision-based

methods are first applied to recognize and localize the damaged area to provide qualitative damage

measures. Further, 3D reconstruction is only applied to the damaged structural components

identified, or the damaged area localized. In some situations, depending on the application

scenarios and the requirements of the damage evaluation outcomes, 3D reconstruction may not be

needed. For example, if a concrete column is identified to have a complete failure, 3D

reconstruction is not needed because the column should be replaced, rather than repaired. On the

other hand, if a concrete column has light cracks and concrete spalling, quantification of crack

width and spalling volume will be useful to assess its residual performance, guide the repair action,

and estimate the associated cost.

Figure 3.8 3D vision-based damage quantification pipeline for structural components

3.7.2 Vision-based 3D reconstruction

In this section, 3D reconstruction procedures are described. As shown in Figure 3.9, in general,

3D reconstruction consists of data association which establishes image-to-image connection,

camera pose estimation using the geometrically validated image pairs, structure-from-motion

which determines sparse point cloud from the image scene graph, multi-view stereo which

generates dense point cloud based on the sparse point cloud and the input RGB images, and dense

point cloud preprocessing which cleans the reconstructed dense cloud using outlier removal

methods.

Figure 3.9 Vision-based 3D reconstruction procedures of an RC column

3.7.2.1 Data association

First, a sequence of image frames is collected to form an unstructured image database. It

should be noted that the resolution of the images should be maintained reasonably high (e.g.,

1080p, 4K, or higher) for better reconstruction accuracy. Second, the feature points extraction

algorithm, scale-invariant feature transform (SIFT) proposed by Lowe (2004), is applied to all the

images to detect feature points. Next, the image correspondence is established between image pairs

that share similar features. This can be achieved using a naïve search to find the most similar features in one image to another, iterating through the entire database. However, this is

very computationally expensive. In recent years, many efficient algorithms have been proposed to

find the image correspondence as summarized by Schonberger & Frahm (2016). In this research,

the vocabulary tree (VocTree) algorithm (Nister & Stewenius, 2006), is selected for this purpose.

Finally, the output of the above steps is a scene graph, which contains many image pairs within

the database.
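A minimal two-image sketch of the feature extraction and matching step is given below, using OpenCV's SIFT implementation with a brute-force matcher and Lowe's ratio test; the image paths are placeholders, and the actual pipeline relies on the vocabulary tree for database-scale correspondence search rather than exhaustive matching.

```python
import cv2

img1 = cv2.imread("view_01.jpg", cv2.IMREAD_GRAYSCALE)     # placeholder image paths
img2 = cv2.imread("view_02.jpg", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)               # keypoints + 128-D descriptors
kp2, des2 = sift.detectAndCompute(img2, None)

matcher = cv2.BFMatcher(cv2.NORM_L2)
knn = matcher.knnMatch(des1, des2, k=2)                     # two nearest neighbours per descriptor

# Lowe's ratio test keeps only distinctive correspondences
good = [m for m, n in knn if m.distance < 0.75 * n.distance]
print(f"{len(good)} putative correspondences between the two views")
```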

3.7.2.2 Structure-from-motion

Once the image correspondence is completed, feature matching is conducted to find the closest

matched features in the corresponding image pairs using the KD-tree-based approach (Muja &

Lowe, 2009). The camera matrix (i.e., camera projection matrices such as the fundamental matrix

for the uncalibrated camera, or essential matrix for the calibrated camera) for the image pair can

be estimated using epipolar geometry (Hartley & Zisserman, 2003). The corresponding point pairs

from two image views must satisfy the following equation,

\mathbf{x}_i^{T}\,\mathbf{F}\,\mathbf{x}_j = 0, \quad \text{where } \mathbf{x}_i \in P_i \text{ and } \mathbf{x}_j \in P_j
\mathbf{F} = \mathbf{K}_i^{-T}\,[\mathbf{t}]_{\times}\,\mathbf{R}\,\mathbf{K}_j^{-1}                    (3.1)
P_i = \{(x_a, y_a) \mid a = 1, 2, \ldots, N\}
P_j = \{(x_b, y_b) \mid b = 1, 2, \ldots, M\}

where \mathbf{x}_i refers to the coordinates of a point from the feature set P_i identified in the i-th image, and \mathbf{x}_j is the corresponding point from the feature set P_j identified in the j-th image. The camera matrix to be estimated is denoted as \mathbf{F}. \mathbf{K}_i and \mathbf{K}_j are the camera intrinsic matrices. The parameters \mathbf{R} and \mathbf{t} define the relative rotation and translation, respectively, between the two views, where [\mathbf{t}]_{\times} is the anti-symmetric (skew-symmetric) matrix of \mathbf{t}. During this process, if a camera matrix is found that

maps a sufficient number of feature points from one image to the other, these points will be

considered geometrically validated. Besides, the Random Sample Consensus (RANSAC) algorithm

(Fischler & Bolles, 1981) is required to reduce the effects of outliers. The image that is determined

to have insufficient shared features with any other images will be considered an outlier and

excluded from the computations that follow. The estimated camera matrices are used to generate

a sparse point cloud of the scene using the triangulation methods developed by Hartley, Gupta, &

Chang (1992), or Hartley & Sturm (1997), or more recently by Kang, Wu, & Yang (2014). Finally,

the bundle adjustment (BA) algorithm (Triggs et al., 1999) is applied to adjust the camera

parameters and points estimated previously such that the reprojection error is minimized, to further

refine the sparse cloud.
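Continuing the two-view sketch above, the geometric verification and triangulation can be illustrated with OpenCV as follows; here the calibrated (essential-matrix) form is used, the intrinsic matrix K is an assumed example, and the bundle adjustment step is omitted.

```python
import numpy as np
import cv2

# Matched pixel coordinates taken from the SIFT sketch above (kp1, kp2, good)
pts1 = np.float32([kp1[m.queryIdx].pt for m in good])
pts2 = np.float32([kp2[m.trainIdx].pt for m in good])

# Assumed pinhole intrinsics: focal length 3000 px, principal point at the image centre
K = np.array([[3000., 0., 960.],
              [0., 3000., 540.],
              [0., 0., 1.]])

E, inliers = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC, prob=0.999, threshold=1.0)
_, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=inliers)   # relative rotation and translation

P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])              # first camera at the world origin
P2 = K @ np.hstack([R, t])                                     # second camera from the recovered pose

inl = inliers.ravel() == 1
pts4d = cv2.triangulatePoints(P1, P2, pts1[inl].T, pts2[inl].T)
sparse_xyz = (pts4d[:3] / pts4d[3]).T                          # homogeneous -> Euclidean 3D points
```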

3.7.2.3 Multi-view stereo

The estimated sparse point cloud and the images from the scene graph are used to generate a

dense 3D point cloud of the scene. This step aims to approximate the depth value for every

single pixel of the images, using the semi-global matching (SGM) method (Hirschmuller, 2007).

During this process, it is recommended to use CUDA-enabled GPU to speed up the computation.

This process can achieve sub-pixel accuracy (Hirschmuller, 2007).
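For intuition only, the disparity estimation at the heart of SGM can be illustrated on a single rectified stereo pair with OpenCV's StereoSGBM implementation; the multi-view stereo step in Meshroom/Metashape fuses depth maps from many views and is not limited to rectified pairs, and the parameters below are typical defaults rather than tuned values.

```python
import cv2

left = cv2.imread("rectified_left.jpg", cv2.IMREAD_GRAYSCALE)   # placeholder rectified image pair
right = cv2.imread("rectified_right.jpg", cv2.IMREAD_GRAYSCALE)

sgbm = cv2.StereoSGBM_create(
    minDisparity=0,
    numDisparities=128,          # disparity search range in pixels, must be divisible by 16
    blockSize=5,
    P1=8 * 5 * 5,                # smoothness penalties of the semi-global cost aggregation
    P2=32 * 5 * 5,
    uniquenessRatio=10,
)
disparity = sgbm.compute(left, right).astype("float32") / 16.0  # fixed-point output -> pixel units
# For a rectified pair, depth is proportional to (focal length x baseline) / disparity
```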

3.7.2.4 Dense point-cloud preprocessing

Prior to any point cloud post-processing steps, it is important to denoise the original point

cloud data. In this study, statistical outlier removal (Carrilho, Galo, & Santos, 2018) and radius

outlier removal methods are implemented on the reconstructed dense cloud. The statistical outlier

removal assumes that the distance between a point and its neighboring points is normally

distributed. The algorithm checks for every point in the point cloud, and calculates the mean

distance between the point and its K nearest neighbors. The points will be considered outliers if

they are not within N standard deviations from the mean. The values of K and N are manually

specified, typically chosen as 20 and 2, respectively. The radius outlier removal method eliminates

points that have a small number of neighboring points in a sphere with a given radius around them.

The procedures are performed in Open3D (Python API) which is an open-source library for 3D

data processing. It should be noted that in the situation of highly noisy dense clouds (which may

be attributed to the source images collected in poor lighting conditions, insufficient image

resolution, or low-configuration cameras), additional manual point cloud cleaning can also be

considered.
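A minimal Open3D sketch of the two denoising steps is shown below; the file names are placeholders, K = 20 and N = 2 follow the typical choices mentioned above, and the radius-filter values are illustrative and scene-dependent.

```python
import open3d as o3d

pcd = o3d.io.read_point_cloud("dense_cloud.ply")               # placeholder reconstructed dense cloud

# Statistical outlier removal: drop points farther than N standard deviations from the mean
# distance to their K nearest neighbours (K = 20, N = 2)
pcd_stat, _ = pcd.remove_statistical_outlier(nb_neighbors=20, std_ratio=2.0)

# Radius outlier removal: drop points with fewer than nb_points neighbours within the given radius
pcd_clean, _ = pcd_stat.remove_radius_outlier(nb_points=16, radius=0.05)

o3d.io.write_point_cloud("dense_cloud_clean.ply", pcd_clean)
```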

3.7.3 Multi-view structural components localization

The proposed 3D vision reconstruction method generates the 3D coordinates as well as the

color information for the point cloud. Therefore, it can be efficiently processed using image

processing techniques leveraging the recent developments of CNN-based computer vision

methods. Existing point cloud-based 3D object detection algorithms, such as Complex-YOLO

(Martin, Stefan, & Karl, 2018) and PointPillar (Lang et al., 2019), are designed to process

organized LiDAR point clouds only. Hence, they cannot be directly applied to the unorganized

photogrammetric point cloud generated by the 3D vision-based reconstruction pipeline.

For these reasons, a multi-view vision-based 3D object detection method is developed to

detect structural components from an unorganized 3D scene cloud. This method builds upon a

CNN-based 2D object detector. Depending on the application criteria, the CNN-based detector

will be trained to detect various types of civil structural components with available training images.

Depending on the selection of the CNN-based object detector, the above procedures can achieve

real-time performance. Besides, compared to traditional image-based object detection algorithms,

one advantage of CNN-based object detection algorithms is their robustness in localizing objects

of interest in relatively complicated external environments, where many irrelevant objects and

background noise are present. In order to extract the object of interest from the 3D scene, a

minimum of two camera viewpoints are required to remove background objects sufficiently, as

illustrated in Figure 3.10. In this research, for convenience, the point cloud of the 3D scene is

automatically rendered onto the XZ and YZ plane views, which will then be processed by the CNN-

based object detector to localize the components. The generated bounding boxes on the two view

planes will be used to crop out the point cloud of the structural component from the original 3D scene

cloud.
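The idea of rendering the cloud onto the two view planes, detecting the component in each rendering, and cropping with the resulting boxes is sketched below; detect_component stands in for the trained CNN detector and is a hypothetical placeholder, and the pixel-to-coordinate bookkeeping is deliberately simplified.

```python
import numpy as np
import open3d as o3d

def render_plane_view(points, axes, res=0.01):
    """Rasterize two selected coordinate axes of the cloud into a binary occupancy image."""
    uv = points[:, axes]
    origin = uv.min(axis=0)
    pix = ((uv - origin) / res).astype(int)
    img = np.zeros(pix.max(axis=0) + 1, dtype=np.uint8)
    img[pix[:, 0], pix[:, 1]] = 255
    return img, origin

def inside_box(points, axes, origin, box, res=0.01):
    """Mask of points whose projection falls inside a detected box (umin, vmin, umax, vmax) in pixels."""
    uv = (points[:, axes] - origin) / res
    return (uv[:, 0] >= box[0]) & (uv[:, 0] <= box[2]) & (uv[:, 1] >= box[1]) & (uv[:, 1] <= box[3])

pcd = o3d.io.read_point_cloud("scene_cloud.ply")               # placeholder scene cloud
pts = np.asarray(pcd.points)

xz_img, xz_origin = render_plane_view(pts, axes=(0, 2))        # XZ view rendering
yz_img, yz_origin = render_plane_view(pts, axes=(1, 2))        # YZ view rendering

box_xz = detect_component(xz_img)                              # hypothetical CNN detector, one box per view
box_yz = detect_component(yz_img)

mask = inside_box(pts, (0, 2), xz_origin, box_xz) & inside_box(pts, (1, 2), yz_origin, box_yz)
component = pcd.select_by_index(np.where(mask)[0])             # cropped point cloud of the component
```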

It should be noted that the above object detection procedures to extract the structural

components are very important before further dense point cloud postprocessing. This is because,

a) processing of the entire original cloud is very inefficient and computationally demanding, and

b) once the structural components are extracted, algorithms can be strategically designed to process

the point cloud of structural components only, without dealing with any other irrelevant

surrounding objects. In fact, without removing the surrounding irrelevant objects, many standard

dense point cloud processing algorithms, such as line fitting and plane fitting, cannot be effectively

applied for damage quantification of the structural components.

Figure 3.10 Multi-view structural component localization in a 3D point cloud

3.7.4 Dense point cloud postprocessing

Once the structural components are localized, the damage regions of the components should

be identified and quantified. Various types of point-cloud processing algorithms, such as line/arc fitting and plane segmentation, can be considered for different types of structures such as steel or

concrete components. For example, Figure 3.11 shows a spalling quantification approach applied

to a reinforced concrete column. This can be achieved by plane fitting and point projection

methods. Detailed algorithmic implementation of this will be presented in Chapter 4. In the end,

critical damage quantities, such as total concrete spalling volume and location, as well as the steel

reinforcement exposure length can be estimated.
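As a simplified illustration of the plane-fitting and point-projection idea (the full quantification algorithm is detailed later in this chapter), the intact face of the member can be approximated with RANSAC plane segmentation in Open3D, and points lying behind that plane treated as the spalled region; the file name and thresholds are assumed values.

```python
import numpy as np
import open3d as o3d

face = o3d.io.read_point_cloud("column_face.ply")          # placeholder cloud of one column face

# RANSAC plane fit to the (mostly undamaged) face: ax + by + cz + d = 0
plane, inliers = face.segment_plane(distance_threshold=0.005, ransac_n=3, num_iterations=1000)
a, b, c, d = plane

pts = np.asarray(face.points)
normal = np.array([a, b, c])
signed_dist = (pts @ normal + d) / np.linalg.norm(normal)  # signed distance of every point to the plane

# Points noticeably behind the fitted face plane are treated as spalled material
# (the normal may need flipping so that it points outward from the member)
spall_mask = signed_dist < -0.005                           # 5 mm tolerance, illustrative only
max_depth = float(-signed_dist[spall_mask].min()) if spall_mask.any() else 0.0
print(f"approximate maximum spalling depth: {max_depth * 1000:.1f} mm")
```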

It should be noted that damage quantification methods vary when processing different types

of structures. This research investigates three examples of structural components, each drawn from one of three prevalent structural types: reinforced concrete structures, steel structures, and

structural bolted devices. Details of the development and experiments of damage quantifications

for the structural components investigated in this research are presented in Chapter 4.

Figure 3.11 Point cloud processing: concrete spalling quantification

3.8 Loss quantification

Structural damage evaluation methods presented in Section 3.4-3.7 provide the outcomes of

damage recognition, localization and quantification. In some situations, such outcomes are

sufficient for engineers and researchers to assess the damage condition and residual capacity of

the structure. However, as stated in Section 2.6, local damage information of specific structural

components may not be useful enough and can be difficult to understand for owners or decision-

makers who are likely to lack engineering knowledge, but instead pay more attention to the

global condition of the structures, and cost to repair the structures if damaged. In such situations,

it is particularly useful to convert such damage information into other metrics (e.g., repair cost) that are easier for owners, stakeholders, and decision-makers to interpret when making more informed

decisions.

Therefore, in this section, a loss quantification approach is presented to provide an additional

quantification of the financial metric associated with damage outcomes. For this purpose, the 3D

vision-based damage evaluation methodology presented previously is combined with the loss (e.g.,

repair cost) quantification methodology. Notably, one of the loss quantification methodologies is

known as Performance-based Earthquake Engineering (PBEE) evaluation (ATC-58, 2007). An

essential product of the PBEE methodology is the development of the fragility database and its

implementation guidelines. The fragility database comes from the ATC-58 project established by

the Applied Technology Council in contract with the Federal Emergency Management Agency

(FEMA) to develop FEMA P-58 Seismic Performance Assessment of Buildings, Methodology

and Implementation. Traditionally, the loss quantification step within the PBEE framework is

based on damage states estimated from structural response quantities (e.g., floor displacements,

accelerations) of structural and non-structure components. Such implementations of the PBEE

framework for cost evaluation of buildings have been widely attempted (Goulet et al., 2007; Yang,

Moehle, Stojadinovic, & Der Kiureghian, 2009; Mitrani-Resier, Wu, & Beck, 2016).

In this research, rather than estimating damages from response quantities, the damage states

of the structural components are determined by the proposed vision-based methodologies as

documented from Section 3.4 to Section 3.7. Once damage states are determined, the loss

quantification is adopted from the PBEE procedures. The total loss of the structures can be

calculated using the Monte Carlo simulations with the repair cost/time functions from the PBEE

database as follows.

𝜆(𝐷𝑉) = ∫ 𝐺⟨𝐷𝑉|𝐷𝑀⟩𝑑(𝐷𝑀) (3.2)

where 𝐷𝑀 stands for damage measures. The parameter, 𝐷𝑉, stands for decision variables such as

repair cost or repair time. The result, 𝜆(𝐷𝑉) is typically represented as a cumulative loss

distribution curve (i.e., a summary of the probability of repair cost exceeding a certain value),

which can be easily conveyed to owners or decision makers. To illustrate the loss quantification

procedures, a case study will be presented in Chapter 5.
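To make the Monte Carlo loss aggregation of Equation (3.2) concrete, the following is a minimal Python/NumPy sketch (an illustration, not the implementation used in this research). It assumes hypothetical per-component lognormal repair-cost consequence functions (median cost and dispersion in the style of the FEMA P-58 database), samples the total repair cost, and reports the empirical exceedance curve, i.e., the loss curve λ(DV). All component entries, costs, and dispersions below are illustrative assumptions.

```python
import numpy as np

# Hypothetical mapping from vision-identified damage states to repair-cost
# consequence functions (lognormal median and dispersion); values are illustrative only.
components = [
    {"id": "RC column A", "ds": 3, "median_cost": 25_000.0, "beta": 0.40},
    {"id": "RC column B", "ds": 2, "median_cost": 8_000.0,  "beta": 0.35},
    {"id": "RC column C", "ds": 1, "median_cost": 1_500.0,  "beta": 0.30},
]

def simulate_total_repair_cost(components, n_sims=10_000, seed=0):
    """Monte Carlo realizations of the total repair cost (the decision variable DV)."""
    rng = np.random.default_rng(seed)
    total = np.zeros(n_sims)
    for comp in components:
        if comp["ds"] == 0:          # no damage -> no repair cost
            continue
        # Lognormal sampling: ln(cost) ~ Normal(ln(median), beta)
        total += rng.lognormal(np.log(comp["median_cost"]), comp["beta"], n_sims)
    return total

def exceedance_curve(samples):
    """Empirical P(DV > dv), i.e., the cumulative loss curve of Equation (3.2)."""
    dv = np.sort(samples)
    prob_exceed = 1.0 - np.arange(1, len(dv) + 1) / len(dv)
    return dv, prob_exceed

costs = simulate_total_repair_cost(components)
dv, p = exceedance_curve(costs)
print(f"Median total repair cost: {np.median(costs):,.0f}")
print(f"P(repair cost > 40,000): {np.interp(40_000, dv, p):.2f}")
```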

Chapter 4: Development and application of vision-based SHM methods

4.1 Overview

This dissertation is strongly focused on the development and application of 3D vision-based

structural damage detection methods for civil structures. Therefore, this chapter presents the most

significant portion of the dissertation. The chapter consists of detailed studies on damage detection of three examples of structural components, each drawn from one of three common and widely used structural categories: RC structures, steel structures, and structural bolted connections. Section 4.1 depicts the general workflow of the CNN-based 3D vision

methods, from damage recognition to localization and quantification. Section 4.2 is focused on the

damage evaluation of structural RC columns, where the majority of the findings have been

published in Pan & Yang (2020). Section 4.3 proposes an out-of-plane damage evaluation pipeline

of steel plate structures, where the research outcomes have been reported in Pan & Yang (2022).

Section 4.4 investigates vision methods for bolt loosening evaluation of a structural bolted

component, where part of the research outcomes has been published in Pan & Yang (2021).

Comparative studies and parameter studies have been conducted to demonstrate the

effectiveness and efficiency of the proposed damage detection methods over the existing methods.

The results in this chapter indicate that the proposed 3D vision and CNN-based methodology can

achieve high accuracy and efficiency, and more comprehensive evaluation, compared to many of

the existing methods. This provides a more solid foundation when these damage detection methods

are combined with the PBEE-based loss estimation scheme as will be shown in Chapter 5.

4.2 Vision-based SHM methods for RC structures

4.2.1 Introduction

Reinforced concrete (RC) structures are one of the most prevalent structural systems

constructed worldwide. With many of these structures built in high seismic zones, the performance

of RC buildings after strong earthquake shaking is becoming a significant concern for many

building owners. When an earthquake happens, decision-makers such as city planners, and

emergency management departments need a first-hand response to allocate resources to manage

the damaged infrastructure. This requires rapid performance assessments of the facilities.

Traditional post-earthquake inspections were performed manually, and the results may be biased

and highly reliant on the proper training of the inspectors and qualitative engineering judgments.

The processing time may also be very long, due to the large amount of data processing required.

These deficiencies can be overcome if the current manual evaluation processes are fully automated

(Zhu & Brilakis, 2010). On the other hand, although conventional image processing techniques

(IPTs) have been applied in the past, these methods are relatively time-consuming and not robust against background noise, making them ineffective in practice.

In recent years, CNN-based vision methods have been investigated for damage detection of

reinforced concrete structures, such as concrete crack detection using a sliding window technique

(Cha, Choi, & Büyüköztürk, 2017), structural damage classification of reinforced concrete

structures aided by transfer learning (Gao & Mosalam, 2018), structural multiple damage

localization using Faster-RCNN (Cha, Choi, Suh, Mahmoudkhani & Büyüköztürk, 2018), near-

real-time concrete defect detection with geolocalization using a unified vision-based methodology

(Li, Yuan, Zhang, & Yuan, 2018), and RC bridge column recognition and spalling localization

using deep learning with Bayesian optimization (Liang, 2019), concrete crack detection using

robotic technologies (Liu et al., 2020; Jiang & Zhang, 2020), and crack growth quantification using

dual CNNs (Kong et al., 2021).

While many of these CNN-based methods can provide reasonable accuracy, they are still relatively slow for real-time practical applications in which images are recorded at high frame rates (frames per second, FPS). To address this deficiency, Redmon, Divvala, Girshick, & Farhadi (2016) presented YOLO (You Only Look Once) for real-time object detection. While YOLO is extremely fast, it makes more localization errors and achieves relatively low recall compared to region-based CNN methods. To further improve recall and localization accuracy, Redmon & Farhadi (2017) developed the YOLOv2 algorithm. They showed that YOLOv2 significantly improves recall and localization accuracy while remaining roughly 10 times faster (in FPS) than Faster-RCNN on the VOC 2007 database.

CNN-based classification for civil engineering applications has been hampered by limited

training data (Gao & Mosalam, 2018). In general, a single classification model can provide

reasonable accuracy if the training data covers a wide range of hidden features. However, even

when the size of the training data is sufficiently large, the classification model may still not perform

well if the training data is not properly pre-processed to identify the localized damage (Gao &

Mosalam, 2018). For example, an image may contain multiple damage states where a portion of

the structure has fractured, while the other part of the structure remains undamaged.

At the time of the publication of Pan & Yang (2020), although region-based CNN methods had been widely applied in civil engineering, there had been little to no attempt to apply regression-based detection methods such as YOLOv2 to structural damage detection. Moreover, developments and applications of 3D vision methods for structural damage detection remain very limited and are currently at an infancy stage (Bao & Li, 2021). For example, Xiong et al. (2015) proposed a 3D reconstruction algorithm to reconstruct buildings with complex roofs. The algorithm was validated on both LiDAR point clouds and photogrammetric point clouds. Hu et al. (2019) proposed a learning-based 3D reconstruction method to determine the structural graph layouts of long-span steel bridges. These studies focused on the applicability of the algorithms for reconstructing civil structures; applications of 3D vision methods to structural damage quantification, in particular, remain scarce (Bao & Li, 2021).

To address the aforementioned limitations, this research proposes a hierarchical approach which

evaluates the damage of RC structures at both the system level and component level. At the system

level, CNN-based classification is implemented to identify collapse. If no collapse is identified,

CNN-based classification is then implemented to identify the damage state of RC components (i.e.,

RC columns in this study). To enhance the accuracy in recognizing the damage state of the RC

column, a second CNN is added to focus on detecting steel reinforcement exposure. Steel

reinforcement exposure is a critical damage feature of the most severe damage state that imposes

a big life safety threat and requires high repair costs. Furthermore, 3D vision-based damage

quantification procedures are proposed for estimating concrete spalling volume and steel

longitudinal reinforcement exposure length. Therefore, the main contributions of the research are

summarized as follows: (a) it established component training data that follow the codified damage state classification of RC columns; (b) it examined the applicability of the real-time detector, YOLOv2, to identify the critical damage feature of RC columns; (c) it proposed and successfully implemented the dual CNN method, which combines a classification network and a YOLOv2 object detection network to improve the accuracy achieved by a single classification network; and (d) it proposed 3D vision-based damage quantification procedures for RC columns, a concept that can also be generalized to other RC components such as RC beams and walls.

4.2.2 Methodology

4.2.2.1 Overview

The 3D vision-based damage evaluation of RC structures consists of system-level evaluation

and component-level evaluation. Section 4.2.2.2 presents the CNN-based classification methods

for system-level collapse identification and component-level damage state recognition. This step

provides a preliminary rapid damage assessment of the RC structures, which can be considered if

efficiency is the priority for decision-making, such as in post-disaster scenarios. Section 4.2.2.3

presents the CNN-based object detection algorithm which intends to localize critical damage

features of RC structures. Section 4.2.2.4 demonstrates how combining classification and object detection results improves the reliability and accuracy of determining structural damage states.

Section 4.2.2.5 presents detailed 3D vision-based damage quantification procedures for two

common types of RC columns.

4.2.2.2 CNN-based classification

In this research, CNNs were used to identify the damage states of the building systems and

components. Typical CNNs involve multiple types of layers, including Convolution (Conv) layers,

Rectified Linear Unit (ReLU) layers, Pooling layers, Fully-connected (FC) layers and Loss layer
(e.g. Softmax layer). The Conv layer combined with the subsequent layer, ReLU, constitute the

essential computational block for CNNs. This is the feature that distinguishes CNNs from the

traditional fully connected deep learning network. One of the advantages of CNNs is that they drastically improve the computational efficiency of the traditional neural network because the

number of training parameters enclosed in the filter of CNNs is significantly less than the number

of weights utilized by fully connected layers which are the only layers presented in the traditional

feed-forward neural network. Besides, CNNs preserve the spatial locality of pixel dependencies

and enforce the learnable filters to achieve the strongest response to a local input pattern.

During the forward pass, the output from the previous layer is convolved with each one of the

learnable filters, which yields a stack of two-dimensional arrays. Applying the desired nonlinear

activation function (such as ReLU) to these two-dimensional arrays leads to a volume of two-

dimensional activation maps. After a single or multiple Conv-ReLU blocks, the pooling layer is

introduced which is a form of non-linear down-sampling. The objective of the pooling layer is to

reduce the number of parameters to improve the computation efficiency. During the pooling

process, the input image is partitioned into sub-regions which may or may not overlap with each

other. If max pooling is used, the maximum value of each sub-region is taken.

Following several Conv-ReLU blocks and pooling layers, the resulting layer is transposed to

an FC layer. The output can be computed as matrix multiplication followed by a bias offset, which

then substitutes into an activation function. For example, in VGG-19, the CNNs end up with 3 FC

layers with dimensions of 4096, 4096, and 1000, respectively. In addition, because FC layers contain most of the parameters of the entire CNN, they are prone to overfitting, which can be alleviated by incorporating dropout layers. The idea of dropout is to randomly deactivate a fraction of the units in the FC layers during training, which improves computational efficiency and has been shown to alleviate overfitting (Srivastava, Hinton, Krizhevsky, Sutskever, & Salakhutdinov, 2014). In VGG-19, 50% dropout is applied to the FC layers. Finally, the output of the last FC layer is passed to a Loss layer (i.e., Softmax in this study), which determines the probability of each class (i.e., how confidently the model believes the input image belongs to each class). The classification result is taken as the class with the highest predicted probability.
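As a concrete illustration of the layer operations described above (convolution, ReLU activation, max pooling, a fully connected layer, and a softmax output), the following is a minimal NumPy sketch of a single forward pass. The filter and FC weights are random placeholders rather than trained parameters, and the 28×28 single-channel input and four-class output are illustrative assumptions only.

```python
import numpy as np

def conv2d(image, kernel):
    """'Valid' 2D convolution (cross-correlation) of a single-channel image."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    return np.maximum(x, 0.0)

def max_pool(x, size=2):
    """Non-overlapping max pooling over size x size sub-regions."""
    h, w = x.shape
    h, w = h - h % size, w - w % size            # crop to a multiple of the pool size
    x = x[:h, :w].reshape(h // size, size, w // size, size)
    return x.max(axis=(1, 3))

def softmax(z):
    e = np.exp(z - z.max())                      # subtract the max for numerical stability
    return e / e.sum()

rng = np.random.default_rng(0)
image = rng.random((28, 28))                     # toy single-channel input
kernel = rng.standard_normal((3, 3)) * 0.1       # one (untrained) 3x3 learnable filter
feature_map = max_pool(relu(conv2d(image, kernel)))
features = feature_map.ravel()                   # flatten before the FC layer
W = rng.standard_normal((4, features.size)) * 0.01   # FC weights for 4 damage-state classes
b = np.zeros(4)
probs = softmax(W @ features + b)
print("Predicted class:", int(np.argmax(probs)), "probabilities:", np.round(probs, 3))
```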

4.2.2.2.1 System-level collapse recognition

Based on the CNNs presented, the status of the reinforced concrete building can be classified

as collapse or non-collapse. Multiple pre-trained models can be used to facilitate the training

process. In this study, transfer learning from the three pretrained models including AlexNet

(Krizhevsky, Sutskever, and Hinton, 2012), VGG-19 (Simonyan & Zisserman, 2014) and ResNet-

50 (He, Zhang, Ren, & Sun, 2016) is applied for the binary classification task. Transfer learning

is a machine learning technique that takes advantage of a model pretrained on a source domain and fine-tunes part of its parameters with a limited amount of labeled data in the target domain, which can greatly accelerate the training process in situations of data scarcity.

In deep learning, there is a trend toward developing deeper and deeper networks that aim to solve more complex tasks and improve performance. However, research has shown that training very deep neural networks becomes difficult, and the accuracy can reach a plateau or even degrade (He, Zhang, Ren, & Sun, 2016). ResNets, which introduce shortcut connections, were developed by He, Zhang, Ren, & Sun (2016) to address these problems. It has been demonstrated that this form of network is easier to train than plain deep convolutional neural networks and that the problem of accuracy deterioration is resolved. The complete architecture of ResNet-50 adopted in

this study is shown in Figure 4.1. The ResNet-50 contains a sequence of Convolution-Batch

Normalization-ReLU (Conv-BN-ReLU) blocks. Batch normalization is added right after each

convolution and before ReLU activation to stabilize training, speed up convergence, and regularize

the model. After a series of Conv-BN-ReLU blocks, global average pooling (GAP) is performed

to reduce the dimensionality, which is then followed by the FC layer associated with the softmax

function.

Figure 4.1 Architecture of ResNet-50 for system-level collapse identification

Due to the limited number of images for civil engineering applications, 686 images are

collected from datacenterhub.org at Purdue University and Google Images, of which 240 images show collapsed buildings, while 446 show non-collapsed buildings. The image

preprocessing is conducted to reduce the inconsistency in image classification following the same

approach adopted by Gao & Mosalam (2018). The preprocessed images will be resized

appropriately to 224x224 or 227x227 pixels (depending on what network is chosen) before being

fed into the CNNs for state and damage classification. The performance of the model is verified

through the training and validation process. In this case, 80% of the collected images are chosen
as the training data and the rest is chosen as the testing data. Further, within the training set, 20%

of the images are set as the validation data and the remaining images are used to train the model.

Therefore, 686 × 0.8 × 0.8 ≈ 439, 686 × 0.8 × 0.2 ≈ 110, and 686 × 0.2 ≈ 137 images are

allocated for training, validation, and testing purposes, respectively.

4.2.2.2.2 Component-level damage state pre-classification

As per the proposed evaluation scheme depicted in Figure 4.2, if the RC building is identified

as non-collapse, the subsequent step is to determine the damage state of the structural components.

In this study, the definition of several damage states for RC structural columns is shown in Table

4-1, which is adopted from the ATC-58 damage state (DS) definitions for reinforced concrete

columns. These procedures have been shown and practically implemented for many years as

demonstrated by Nakano, Maeda, Kuramoto, & Murakami (2004), and Maeda, Matsukawa, & Ito

(2014).

Figure 4.2 Architecture of ResNet-50 for component-level damage classification

The RC columns, the critical load-bearing components of the RC buildings, are selected to

demonstrate the component-level classification. In total, there are 2260 images collected from the

damage survey conducted by Sim, Laughery, Chiou, & Weng (2018), EERI Learning from

Earthquake Reconnaissance Archive and Google Image. The number of images for DS 0, DS 1,

DS 2 and DS 3 is 496, 404, 580 and 780, respectively. Similar to before, image preprocessing and

resizing are applied before training. Also, 80% of the acquired images for each damage class are

chosen as its training set and 20% as the testing set. The validation set is chosen as 20% of the

training set and the rest of the training set is used to train the model.

Table 4-1 Description of damage state classes

DS index   Description
0          No damage
1          Light damage: visible narrow cracks and/or very limited spalling of concrete
2          Moderate damage: cracks, large area of spalling of concrete cover without exposure of steel bars
3          Severe damage: crushing of core concrete, and/or exposed reinforcement buckling or fracture

As shown in Table 4-1, four DS classes need to be distinguished. Similar to system-level

classification, the pretrained AlexNet, VGG nets, and ResNet-50 are selected for transfer learning.

The trained model with the highest test accuracy is adopted to demonstrate the applicability of the

classification of multiple damage states. The construction of the network is similar to the previous one, except that the last three layers (a fully connected layer, a Softmax layer, and a classification output layer) are updated with new labels and the new number of classes (i.e., 4 damage states in this case).

4.2.2.3 Steel reinforcement object detection

In addition to damage state classification, a CNN-based object detection model is also

introduced in this study to identify steel reinforcement exposed due to concrete spalling. In

comparison to image classification, object detection goes one step further by localizing the object within an image and predicting its class label. The output of object detection would be

different bounding boxes with their labels in the image. While R-CNN methods (i.e., R-CNN, Fast

R-CNN, Faster R-CNN) have been widely attempted in civil engineering applications, they are

still relatively slow for real-time applications. This study designed and applied a specific YOLOv2

object detection network for the identification of reinforcement exposure. A detailed comparison of object detection networks, including R-CNN variants and YOLO, is presented in Redmon & Farhadi (2017).

In general, YOLOv2 consists of a customized feature extractor which is usually a series of

Conv-BN-ReLU blocks and pooling layers, followed by localization and classification layers

which predicts the bounding box location and the class score, respectively. In this study, YOLOv2

built on ResNet-50 is adopted for steel reinforcement detection. First, the layers after the third

Conv-BN-ReLU block of ResNet-50 (as shown in Figure 4.3) are removed such that the remaining

layers can work as a feature extractor. Second, a detection subnetwork is added which comprises

groups of serially connected Conv-BN-ReLU blocks. Details of layer properties within the

detection subnetwork are illustrated in Figure 4.3. In conclusion, the detection is modelled as a

regression problem. The output of the network contains S x S grid cells of which each predicts B

boundary boxes. Each boundary box includes 4 parameters for the position, 1 box confidence score

(objectness) and C class probabilities. The final prediction is expressed as a tensor with the size

of 𝑆 × 𝑆 × 𝐵 × (4 + 1 + 𝐶).

Figure 4.3 The schematic architecture of YOLOv2 built on ResNet-50 for steel reinforcement detection

The objective of training the neural network is to minimize the multi-part loss function shown in Equation (4.1):

\[
\begin{aligned}
& \lambda_{coord}\sum_{i=0}^{S^2}\sum_{j=0}^{B} I_{i,j}^{obj}\left[(x_i-\hat{x}_i)^2+(y_i-\hat{y}_i)^2\right]
+ \lambda_{coord}\sum_{i=0}^{S^2}\sum_{j=0}^{B} I_{i,j}^{obj}\left[\left(\sqrt{w_i}-\sqrt{\hat{w}_i}\right)^2+\left(\sqrt{h_i}-\sqrt{\hat{h}_i}\right)^2\right] \\
& + \sum_{i=0}^{S^2}\sum_{j=0}^{B} I_{i,j}^{obj}\left(C_i-\hat{C}_i\right)^2
+ \lambda_{noobj}\sum_{i=0}^{S^2}\sum_{j=0}^{B} I_{i,j}^{noobj}\left(C_i-\hat{C}_i\right)^2
+ \sum_{i=0}^{S^2} I_i^{obj}\sum_{c\in classes}\left(p_i(c)-\hat{p}_i(c)\right)^2
\end{aligned}
\tag{4.1}
\]

where I_ij^obj = 1 if the j-th boundary box in cell i is responsible for detecting the object, and 0 otherwise. Similarly, I_i^obj = 1 if an object appears in cell i, and 0 otherwise, and I_ij^noobj is the complement of I_ij^obj. The parameters x_i and y_i are the predicted bounding box position, x̂_i and ŷ_i refer to the ground truth position, w_i and h_i are the width and height of the predicted bounding box, while the associated ground truth values are denoted as ŵ_i and ĥ_i. The term C_i is the confidence score and Ĉ_i is the intersection over union of the predicted bounding box with the ground truth. The multiplier λ_coord is the weight for the loss in the boundary box coordinates and λ_noobj is the weight for the loss in the background. As most boxes generated do not contain any objects, indicating the model detects background more frequently than objects, to put more emphasis on the boundary box accuracy, λ_coord is set to 5 by default and λ_noobj is chosen as 0.5 by default.

The network learns to adapt predicted boxes appropriately with regards to ground truth data

during training. However, it would be much easier for the network to learn if better anchor priors

are selected. Therefore, to facilitate the training process, K-means clustering as suggested by

Redmon & Farhadi (2017) is implemented to search for the tradeoff between the complexity of

the model and the number of bounding boxes required to achieve the desired performance. Once

the number of anchors is specified, the K-means clustering algorithm takes as input the dimensions

of ground truth boxes labelled in the training data, and outputs the desired dimensions of anchor

boxes and the mean IoU with the ground truth data. Clearly, the selection of more anchor boxes

provides a higher mean IoU, but also causes more computational cost. Through the parametric

study on the number of anchors, the relationship between the mean Intersection-over-Union (IoU)

and the number of anchors is established in Figure 4.4, which shows that 10 anchors is a reasonable choice, where the mean IoU can reach about 0.8.

original work by Redmon & Farhadi (2017) where the network utilizes 5 box priors to classify and

localize 20 different classes, this study only focuses on the detection of one class (i.e. steel

reinforcement), indicating more anchors can be used without losing too much computational

efficiency. Figure 4.5 depicts the dimensional properties of each ground truth box as well as its

associated clustering. Anchor dimensions corresponding to each cluster, determined by K-means

clustering approach, are reported in Table 4-2. These anchors will be utilized to determine the bounding box properties as shown in Equations (4.2) to (4.6). In summary, B is chosen as 10. C is

equal to 1 which corresponds to steel exposure. As a result, the predicted tensor has a size of

26×26×60.
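The anchor selection procedure described above can be sketched as follows. This is a minimal NumPy illustration of K-means clustering on the (width, height) pairs of labelled boxes, using the distance metric commonly adopted for anchor clustering, d = 1 − IoU; the ground-truth box dimensions generated below are hypothetical stand-ins for the labelled training data.

```python
import numpy as np

def iou_wh(boxes, centroids):
    """IoU between boxes and centroids given only (width, height), i.e., boxes
    aligned at a common corner, as used for anchor clustering."""
    w = np.minimum(boxes[:, None, 0], centroids[None, :, 0])
    h = np.minimum(boxes[:, None, 1], centroids[None, :, 1])
    inter = w * h
    union = (boxes[:, 0] * boxes[:, 1])[:, None] + \
            (centroids[:, 0] * centroids[:, 1])[None, :] - inter
    return inter / union

def kmeans_anchors(boxes, k=10, n_iter=100, seed=0):
    """K-means with distance d = 1 - IoU; returns anchor sizes and the mean IoU."""
    rng = np.random.default_rng(seed)
    centroids = boxes[rng.choice(len(boxes), k, replace=False)]
    for _ in range(n_iter):
        assign = np.argmax(iou_wh(boxes, centroids), axis=1)   # nearest = largest IoU
        new_centroids = np.array([
            boxes[assign == j].mean(axis=0) if np.any(assign == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    mean_iou = iou_wh(boxes, centroids).max(axis=1).mean()
    return centroids, mean_iou

# Hypothetical ground-truth box dimensions (width, height) in pixels from labelling.
gt_boxes = np.random.default_rng(1).integers(40, 350, size=(200, 2)).astype(float)
anchors, mean_iou = kmeans_anchors(gt_boxes, k=10)
print(np.round(anchors), f"mean IoU = {mean_iou:.2f}")
```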

Figure 4.4 Relationship between Mean IoU and number of dimension priors

Figure 4.5 Illustration of K-means clustering results with 10 anchor sizes

Table 4-2 Selection of Anchor properties

Group             1    2    3    4    5    6    7    8    9    10
Width [pixels]    104  174  174  107  105  67   274  208  138  54
Height [pixels]   98   309  132  285  167  206  338  213  199  77

The network predicts 10 bounding boxes at each cell in the output feature map. For each

bounding box, 5 coordinates are predicted including 𝑡𝑥 , 𝑡𝑦 , 𝑡𝑤 , 𝑡ℎ and 𝑡0 . As illustrated in Figure

4.6, the bold solid box is the predicted boundary box and the dotted rectangle is the anchor prior.

Assuming the cell is offset from the top left corner of the image by (𝑐𝑥 , 𝑐𝑦 ) and the anchor box

prior has a width of 𝑏𝑎𝑛𝑐ℎ𝑜𝑟 and height of ℎ𝑎𝑛𝑐ℎ𝑜𝑟 , then equations (4.2) to (4.6) can be derived.

Equations (4.2)-(4.3) predict the location of the bounding box and (4.4)-(4.5) predict the

dimensions of the bounding box based on anchor box dimensions. Equation (4.6) is related to

objectness prediction that involves the IoU of the ground truth, the proposed box, and the

conditional probability of the class given that there is an object.

$b_x = \sigma(t_x) + c_x$  (4.2)

$b_y = \sigma(t_y) + c_y$  (4.3)

$b = b_{anchor}\, e^{t_w}$  (4.4)

$h = h_{anchor}\, e^{t_h}$  (4.5)

$\Pr(object) \cdot IoU(b, object) = \sigma(t_0)$  (4.6)
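A minimal sketch of how Equations (4.2) to (4.6) convert the raw network outputs for one anchor in one grid cell into a bounding box and an objectness score is given below; the numerical inputs, function names, and the use of grid-cell units are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def decode_box(t, cell_xy, anchor_wh):
    """Decode raw outputs (t_x, t_y, t_w, t_h, t_0) for one anchor in one grid cell
    into box coordinates and an objectness score, per Equations (4.2)-(4.6).
    Coordinates here are expressed in grid-cell units."""
    t_x, t_y, t_w, t_h, t_0 = t
    c_x, c_y = cell_xy                     # offset of the cell from the top-left corner
    b_anchor, h_anchor = anchor_wh         # anchor prior width and height
    b_x = sigmoid(t_x) + c_x               # Eq. (4.2)
    b_y = sigmoid(t_y) + c_y               # Eq. (4.3)
    b_w = b_anchor * np.exp(t_w)           # Eq. (4.4)
    b_h = h_anchor * np.exp(t_h)           # Eq. (4.5)
    objectness = sigmoid(t_0)              # Eq. (4.6): Pr(object) * IoU(b, object)
    return (b_x, b_y, b_w, b_h), objectness

# Example: one anchor (sizes in grid-cell units) in the cell with offset (12, 7).
box, score = decode_box(t=np.array([0.2, -0.1, 0.3, 0.1, 1.5]),
                        cell_xy=(12, 7), anchor_wh=(3.1, 4.0))
print(np.round(box, 2), round(float(score), 2))
```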

Similar to other CNN models, the YOLOv2 is trained by back-propagation and stochastic

gradient descent (SGD). The learning rate is constant and set to 10⁻⁴, and the mini-batch size is set to

16. The input image size is 416 x 416, which is identical to what has been adopted by Redmon &

Farhadi (2017) for finetuning detection subnetwork. The training and testing images of YOLOv2

are taken separately from the DS 3 images which have been used in the training and testing of the

DS classification model. Data augmentation such as cropping, flipping, and small rotation is

applied such that the augmented images still contain the object that needs to be detected. The

training is implemented in MATLAB R2019a on two computers: Alienware Aurora R8 (a Core

i7-9700K @ 3.60 GHz, 16 GB DDR4 memory and 8 GB memory GeForce RTX 2070 GPU) and

a Lenovo Legion Y740 (a Core i7-8750H @2.20 GHz, 16 GB DDR4 memory and 8 GB memory

GeForce RTX 2070 max-q GPU).

Figure 4.6 Geometric properties of the predicted bounding box and the anchor prior

4.2.2.4 Component-level damage state determination

Post-earthquake cost evaluation of the RC building strongly relies on the classification

accuracy of the damage state. A single classification model generally performs well if trained on

a large dataset which covers a wide range of hidden features. Besides, it requires that the image scene be properly pre-processed such that the targeted region (i.e., RC column with/without damage

in this study) dominates the entire image. Moreover, the classification model may not perform well

if different classes have obvious shared features. In the case of multiple irrelevant objects in the background, or a column with multiple damage features presented in a single image (i.e., the crack feature is shared by DS 1, DS 2 and DS 3; the spalling feature is shared by DS 2 and DS 3), the classification model may fail to identify the damage state class correctly. For example, Figure 4.7 shows an image of a column with many small cracks and limited spalling that, at the same time, also presents evident exposed steel bars concentrated in relatively small regions; the classification model fails to identify it as DS 3 (i.e., the ground truth label) in our experiments. This is interpretable because, in this case, the damage features of DS 1, DS 2 and DS 3 are all included in one single image, so the predicted probability for DS 3 is not the highest. This is where the detection of steel bars is needed to reinforce the identification of the severe damage state, DS 3, in which the exposed reinforcement needs to be specifically captured.

Figure 4.7 Sample images of RC columns that the classification model wrongly identifies as DS 2 while the ground truth is DS 3

In this regard, a novel dual CNN-based inspection approach (Figure 4.8) is proposed to

facilitate the process. The classification model is trained across all the damage states as defined in

Table 4-1. On the other hand, the localization of steel bars is implemented using the YOLOv2
object detection approach. The advantage of the object detection approach is its ability to focus on

damage-sensitive features (i.e., exposed reinforcement bars in this case) which distinguish DS 3

from DS 0, DS 1 and DS 2. It is noted that the detection of exposed reinforcement is crucial because

most of the tensile stiffness and strength of the reinforced concrete components are contributed by

the reinforcement. Therefore, CNN-based detection is employed to reinforce the identification of

DS 3 in case the classification fails to classify it. In fact, from the safety point of view,

identification of components in the DS 3 condition is critical after an earthquake because these components are prone to fail completely in aftershocks, which may lead to a partial or complete collapse of the building and, consequently, a significant increase in repair cost, injuries, and fatalities.

even if in some rare cases, the final damage state is identified as DS3 while the ground truth label

is one of the others.

Figure 4.8 Flowchart of component damage state inspection scheme

The results of the classification model including the label and its associated probability are

obtained. Meanwhile, the reinforcement detection model checks the existence of exposed steel

bars. The final decision on the damage state takes advantage of both outcomes from the

classification model and object detection model. As shown in Figure 4.8, each image is evaluated

by the two models in parallel. For a given image, it is first resized to fit the size of input layers of

the classification networks and object detection networks, respectively. The classification model

predicts the probability of each damage state and takes the one with the highest probability as the

result. Meanwhile, the object detection model aims at detecting the exposed steel bars. If the steel

bars are not detected, the classification result is directly output as the final inspection outcome. If

the steel bars are captured by the detection model, then DS 3 should be returned as the final

decision. The proposed dual CNN-based framework builds on the traditional damage classification

model, and extends the evaluation scope by analyzing the local details (i.e., steel bars in this case).

The object detection model does not change the fundamental diagnosis logic of the classification

model but reinforces the identification of DS 3 through the localization of exposed bars on top of

the classification model. This comes from three bases. First, object detection is a more complex

task than classification because it involves both localizing and classifying the target in the scene.

Solely relying on object detection results is more likely to lower the overall evaluation accuracy

(potentially caused by insufficient recall). Second, there are some other damage features of DS 3

such as crushing of concrete and substantial shear failure mechanisms, while the proposed

object detection model is trained to detect exposed steel bars only. The identification of such

multiple features still partly relies on the classification model. Third, steel reinforcement is one of

the most essential load-carrying components in RC columns. The proposed detection network of

steel bars introduces one redundancy to reinforce the identification of the most severe damage case

which is more likely to cause an unexpected system failure in response to aftershocks, and

consequently higher chances of injuries and death, as well as substantial repair cost and longer

repair time.
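The decision logic of the dual CNN scheme (Figure 4.8) can be summarized in a short sketch. The function name, the probability dictionary, and the detection score threshold below are illustrative assumptions, not part of the trained models.

```python
def fuse_damage_state(classifier_output, detections, score_threshold=0.5):
    """Combine the damage-state classifier and the steel-bar detector as in Figure 4.8.
    classifier_output: dict of damage-state label -> probability (e.g., {'DS0': .., ...}).
    detections: list of (bounding_box, score) pairs from the rebar detector.
    Returns the final damage-state label."""
    # Classification branch: take the label with the highest predicted probability.
    cls_label = max(classifier_output, key=classifier_output.get)
    # Detection branch: any confident exposed-rebar detection forces DS 3.
    rebar_found = any(score >= score_threshold for _, score in detections)
    return "DS3" if rebar_found else cls_label

# Example with hypothetical outputs: the classifier favours DS 2, but the detector
# finds an exposed bar, so the fused decision is DS 3.
probs = {"DS0": 0.02, "DS1": 0.10, "DS2": 0.56, "DS3": 0.32}
boxes = [((120, 230, 40, 160), 0.83)]
print(fuse_damage_state(probs, boxes))   # -> DS3
```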

4.2.2.5 Component-level damage quantification

In this section, component-level damage quantification procedures for two common shapes of

concrete columns are developed including cuboid-shape and cylinder-shape concrete columns.

Two damage features are considered to be quantified including concrete spalling volume and the

exposed length of steel reinforcements. The damaged columns to be quantified are assumed to

have some remaining portions of top, bottom, and side surfaces. In other words, those severely

damaged columns which have any one of the top, bottom or side surfaces completely missing will not be considered for quantification. This assumption should be reasonable given

that these severely damaged columns should be immediately replaced, rather than repaired.

Therefore, quantifications of concrete spalling and steel rebar exposure in such scenarios are rather

unnecessary. For these columns, the CNN-based classification methods described in Section

4.2.2.2 to Section 4.2.2.4 should provide sufficient damage evaluation results, without

further quantification steps.

Quantification of the concrete spalling volume is achieved by two main processes: a) use

detected planes or surfaces to recover the undamaged geometry of the structural component, b)

subtract the volume of the damaged point cloud from the undamaged geometry to determine the

spalling volume quantification. Detailed procedures are presented in the sections below.

4.2.2.5.1 Cuboid-shape concrete columns

In this section, the damage evaluation methodology for the concrete spalling and steel

reinforcement (rebar) exposure of rectangular columns is discussed. As stated in Section 3.7,

quantification of concrete spalling is not feasible using 2D vision methods. This section proposes

a 3D vision-based method as described below.

• First, a point cloud processing algorithm, plane segmentation, is implemented to identify

major planes of the RC columns. The plane segmentation is achieved by the M-estimator

SAmple Consensus (MSAC) algorithm (Torr & Zisserman, 2000), which is a similar

variant of the RANSAC algorithm, where a distance threshold should be provided.

• Next, these planes are used to automatically recover the undamaged configuration of the

RC columns. Plane normal vector will be checked with respect to that of the ground plane

which is assumed to be a known metric. It should be noted that the prior knowledge of

ground plane vectors is reasonable as many existing algorithms (e.g., [ref]) can be used to

identify ground from images and point clouds. In fact, in the situation of large-scale 3D

scene reconstruction, the ground plane can be easily recognized using MSAC or

RANSAC. This is because the first principal plane detected by these algorithms is always

the ground plane due to the fact that the ground plane contains the greatest number of

points. To recover the original undamaged configuration of the RC columns, the top,

bottom and side surfaces should be identified.

• The side surface planes are typically easy to segment. Using the intersection lines of the

side surfaces and the ground plane, the cross-sectional area of the RC column can be

determined. In order to determine the volume of the RC column, the column height is

required. The column height can be assumed as prior knowledge of the storey height, or

realized by an automated algorithm. In the former case, while human interventions are

needed, such information should be easy to obtain. In the latter case, a plane segmentation

algorithm can be applied to identify the ceiling plane, which is typically parallel to the

ground plane. Then, the distance between the ground plane and the ceiling plane can be

determined as the column height.

• Once the undamaged configuration of the RC component is reconstructed, the concrete spalling volume can be calculated as the difference between the undamaged configuration

and the damaged configuration of the RC component. Calculation of the volume of the

undamaged configuration of the RC columns is rather straightforward as the columns are

typically in cuboid shape or cylinder shape. The calculation of the volume of the damaged

configuration of the RC columns is determined based on the alpha shape which is

determined using the methodology first presented by Edelsbrunner, Kirkpatrick, & Seidel

(1983). The alpha shape defines piecewise linear curves that create a boundary for the

point cloud. To implement the algorithm, a shrink factor should be provided where a

shrink factor of 0 will lead to the convex hull of the points, while a shrink factor of 1 will

result in the most compact boundary for the point cloud. Therefore, it is generally

recommended to use an intermediate value (e.g., 0.5) between 0 and 1.
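The core steps of the procedure above can be sketched as follows. This is a minimal illustrative Python sketch (assuming NumPy and SciPy), not the implementation used in this research: the RANSAC-style plane fit stands in for MSAC (MSAC differs mainly in how inliers are scored), the convex hull stands in for the alpha shape (it corresponds to a shrink factor of 0, whereas an intermediate shrink factor gives a tighter boundary), and the synthetic point cloud, thresholds, and as-built column dimensions are assumptions for illustration.

```python
import numpy as np
from scipy.spatial import ConvexHull

def ransac_plane(points, dist_thresh=0.25, n_iter=1000, seed=0):
    """Minimal RANSAC-style plane fit. Returns the plane (normal, d) with n.x + d = 0
    and the inlier mask; MSAC differs only in how candidate planes are scored."""
    rng = np.random.default_rng(seed)
    best_inliers, best_plane = None, None
    for _ in range(n_iter):
        p1, p2, p3 = points[rng.choice(len(points), 3, replace=False)]
        normal = np.cross(p2 - p1, p3 - p1)
        if np.linalg.norm(normal) < 1e-9:
            continue                                   # degenerate (collinear) sample
        normal = normal / np.linalg.norm(normal)
        d = -normal @ p1
        dist = np.abs(points @ normal + d)
        inliers = dist < dist_thresh
        if best_inliers is None or inliers.sum() > best_inliers.sum():
            best_inliers, best_plane = inliers, (normal, d)
    return best_plane, best_inliers

def spalling_volume(damaged_points, undamaged_volume):
    """Spalling volume = as-built (undamaged) volume minus the volume enclosed by the
    damaged point cloud (here approximated with a convex hull)."""
    damaged_volume = ConvexHull(damaged_points).volume
    return undamaged_volume - damaged_volume

# Synthetic example (units: mm): a flat ground patch plus a column-sized cloud. The
# dominant plane found first is the ground plane, consistent with the discussion above.
rng = np.random.default_rng(1)
ground = np.column_stack([rng.random((3000, 2)) * 2000.0, rng.normal(0, 0.1, 3000)])
column = rng.random((2000, 3)) * [400.0, 400.0, 3000.0]
cloud = np.vstack([ground, column])
plane, inliers = ransac_plane(cloud, dist_thresh=0.25)
print("ground-plane normal:", np.round(plane[0], 2), "inliers:", int(inliers.sum()))
print("spalling volume of column cloud [mm^3]:",
      round(spalling_volume(column, 400 * 400 * 3000), 1))
```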

On the other hand, quantification of the steel reinforcement exposure is also presented. The

pretrained YOLOv2 is deployed to localize steel reinforcement in the rendered images of the 3D

reconstructed point cloud. In this study, for simplicity and the purpose of demonstrating the

concept, the vertical reinforcement exposure length is considered. The vertical exposure length is

the projected exposure length along the longitudinal direction of the RC column, regardless of

whether the steel rebar is buckled (or bent) or not. Within this context, the exposure length can be

estimated as the height of the bounding box detected for the steel rebars. In the situation of multiple

bounding boxes detected on the same RC column, the steel exposure length of that column is taken

as the union of the projected length from all the bounding boxes.
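The union-of-projections rule described above can be sketched as follows; the bounding-box extents in the example are hypothetical values along the column's longitudinal axis.

```python
def exposed_length_from_boxes(boxes_y_extents):
    """Vertical steel exposure length taken as the union of the bounding boxes'
    projections onto the column's longitudinal axis.
    boxes_y_extents: list of (y_min, y_max) in metres along the column axis."""
    intervals = sorted(boxes_y_extents)
    total, cur_start, cur_end = 0.0, None, None
    for y0, y1 in intervals:
        if cur_end is None or y0 > cur_end:          # disjoint interval: close the previous one
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = y0, y1
        else:                                        # overlapping interval: extend the union
            cur_end = max(cur_end, y1)
    if cur_end is not None:
        total += cur_end - cur_start
    return total

# Example: two overlapping detections and one separate detection on the same column.
print(exposed_length_from_boxes([(0.10, 0.25), (0.20, 0.32), (0.60, 0.70)]))  # -> 0.32
```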

4.2.2.5.2 Cylinder-shape concrete columns

In the situation of circular-shaped columns, most processing steps remain similar to Section

4.2.2.5.1, except for the following. a) Plane segmentation can still be used to identify the top and

bottom surfaces of the columns. However, since the cylinder columns do not have flattened side

surfaces, rather than using plane segmentation to detect the side surfaces, a circle fitting algorithm

can be performed on the cross-section of the columns multiple times along the longitudinal

direction of the columns. The fitted circle, together with the distance between the top and bottom

surfaces are used to establish the original undeformed circular column. Detailed algorithmic

implementation of such procedures is presented in Appendix B. In addition, quantification of the

steel reinforcement exposure for the cylinder-shape RC columns can be done in the same way as

presented for cuboid-shape RC columns.
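A minimal sketch of the circle-fitting step is given below, using an algebraic (Kasa) least-squares fit on one cross-sectional slice; the specific fitting algorithm adopted in Appendix B may differ, and the synthetic points below are illustrative. In practice, the fit would be repeated on slices at several heights along the column's longitudinal axis.

```python
import numpy as np

def fit_circle(xy):
    """Algebraic (Kasa) least-squares circle fit to 2D cross-section points.
    Solves x^2 + y^2 + a*x + b*y + c = 0 and returns (centre, radius)."""
    x, y = xy[:, 0], xy[:, 1]
    A = np.column_stack([x, y, np.ones_like(x)])
    rhs = -(x**2 + y**2)
    (a_coef, b_coef, c_coef), *_ = np.linalg.lstsq(A, rhs, rcond=None)
    cx, cy = -a_coef / 2.0, -b_coef / 2.0
    radius = np.sqrt(cx**2 + cy**2 - c_coef)
    return (cx, cy), radius

# Example: noisy points sampled around a circle of radius 0.25 m centred at (1.0, 2.0).
rng = np.random.default_rng(0)
theta = rng.uniform(0, 2 * np.pi, 200)
pts = np.column_stack([1.0 + 0.25 * np.cos(theta), 2.0 + 0.25 * np.sin(theta)])
pts += rng.normal(0, 0.002, pts.shape)
centre, radius = fit_circle(pts)
print(np.round(centre, 3), round(float(radius), 3))
```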

4.2.3 Experiments and results

This subsection presents the results for system-level collapse recognition, and component-

level damage detection and localization. The training parameter settings and the overall testing

accuracy of the CNN classification models have been summarized in Table 4-3.

Table 4-3 System-level and component-level training parameters and performance of transfer learning

from three different pretrained models

Pretrained CNN models AlexNet VGG-19 ResNet-50


Input size 227 × 227 × 3 224 × 224 × 3 224 × 224 × 3
Initial learning rate 0.0001 0.0001 0.0001
Regularization factor 0.0001 0.0001 0.0001
Momentum coefficient 0.90 0.90 0.90
System-level testing accuracy 93.15% 95.63% 95.92%
Component-level testing accuracy 85.17% 87.17% 87.47%

4.2.3.1 System-level failure classification

Although the testing accuracy among all three pretrained models is relatively close, the

ResNet-50, which has the deepest architecture, yields slightly higher accuracy than the other two CNN models. The loss and accuracy of ResNet-50 during training are presented in Figure 4.9. Both

the training and validation accuracy exceeds 90% after around 80 epochs of training and

approaches 100% at the end. It is generally acknowledged that the validation dataset can provide

an unbiased assessment of a model fit on the training dataset (Krizhevsky, Sutskever, and Hinton,

2012; He, Zhang, Ren, & Sun, 2016). An increase in the error on the validation dataset is a sign of

overfitting to the training dataset. A high and stable validation accuracy is

observed in Figure 4.9 which demonstrates the applicability of the system-level classification

model for collapse identification. Figure 4.10 compares the confusion matrices (Kohavi & Provost,

1998) between training and testing results. For instance, 95% of the testing images which have

ground-truth labels as collapse are successfully predicted while only 5% of these images are

misclassified as “non-collapse”. Moreover, sample testing images with probability for their

associated class are shown in Figure 4.11. More sample results are presented in Appendix C. The

trained model can predict the correct class for the images with high probability.


Figure 4.9 System-level collapse identification for training and validation sets using ResNet-50: (a)

accuracy and (b) loss

Figure 4.10 System-level collapse versus no collapse: confusion matrices of (a) training set and (b) testing

set

Figure 4.11 Sample testing images of the building with predicted probability for each class

4.2.3.2 Component-level damage state classification

Similar to the system-level failure classification, Table 4-3 presents the identical training

parameters and performance comparison of three different pretrained models (AlexNet, VGG-19,

ResNet-50) for the classification of the component damage states. In general, all three models have

high accuracy, while the ResNet-50 has slightly higher accuracy than AlexNet and VGG-19. The

loss and accuracy for ResNet-50 during the training process are presented in Figure 4.12, which

shows both the training and validation accuracies are approaching 100% at the end. The

performance of the trained model is confirmed by the confusion matrix for training and testing as

shown in Figure 4.13. Figure 4.14 shows the classification of a few sample images with a correct

prediction. More sample results are presented in Appendix C.1. The results show that the trained

model can classify different damage states with reasonably high accuracy, although the

classification accuracy with respect to moderate damage (i.e., DS 2) and severe damage (i.e., DS

3) is not as high as that regarding the class of no damage (i.e., DS 0) and light damage (i.e., DS 1).

The results reflect there is an increasing difficulty in detecting damage features from DS 0 to DS

3. Basically, there is no damage feature in DS 0 when RC columns are in almost perfect condition

in which case the trained CNN model only needs to identify the column profile without any extra

damage features. Similarly, only cracks and very limited spalling are enclosed in DS 1, where

slightly more damage features are introduced compared to DS 0. However, more features are

usually observed in DS 2, such as light or severe cracks and a large area of spalling. In the case of

DS 3, the model performs reasonably well to detect its own damage features which include

exposure of significant length of steel bars, crushing of concrete and buckling or fracture of

reinforcement. However, it occasionally misclassifies the damage state as DS 2, while the ground

truth is DS 3 (Figure 4.15). There are two potential reasons. First, DS 2 and DS 3 have many

common damage features, such as cracks and a large amount of concrete spalling. Second,

exposure of steel reinforcement is not evident in some cases while cracks or significant spalling

may dominate the entire image (Figure 4.7). To overcome such deficiency, a novel object detection

algorithm is implemented and combined with the classification algorithm to identify the damage

states more accurately. Experiments and results of the dual CNN-based method will be discussed

in Section 4.2.3.4.


Figure 4.12 Component-level DS classification for training and validation sets using ResNet-50: (a)

accuracy and (b) loss


Figure 4.13 Component-level damage state identification: confusion matrices of (a) training set and (b) testing set.

Figure 4.14 True prediction of sample testing images of the building with predicted probability for each

class

Figure 4.15 False prediction of sample testing images with ground truth of “Severe Damage”

4.2.3.3 Steel reinforcement object detection

This subsection presents the results regarding the detection of exposed longitudinal

reinforcement in RC columns to demonstrate the applicability of YOLOv2 in this scenario. The

performance of an object detector is usually evaluated using a precision-recall curve (Everingham,

Van Gool, Williams, Winn, & Zisserman, 2010). A low false positive rate leads to high precision

and a low false negative rate results in a high recall. In other words, a large area under the recall-

precision curve indicates the high performance of the detector with both high recall and precision.

A detector with high recall but low precision retrieves many results, but most of its predicted labels

are incorrect (e.g., incorrect bounding box locations, low IoU). A detector with high precision but

low recall can localize the object very accurately once the object is successfully recalled, but only

very few results can be recalled. The average precision (AP) is often used to quantify the

performance of an object detector (Girshick, 2015; Ren, He, Girshick, & Sun, 2017), which is

determined as the area under the precision-recall curve. Mean AP (mAP) is defined as the mean

of calculated APs for all classes. Figure 4.16 presents the precision-recall curve for training and

testing. The mAP for both training and testing has demonstrated the applicability of YOLOv2 for

detecting steel bars. The testing results in Figure 4.16 (b) indicate there still exists room for

improving the mAP with the larger training set, particularly improving the recall. It should be

noted that the detection of steel bars is more difficult than detecting objects with a more regular

pattern because the steel bars may buckle or fracture in a very complex way in different situations.

Figure 4.17 provides sample detection images where the steel bars are localized by rectangular

bounding boxes. Figure 4.17 (b) (upper) shows the images which were wrongly classified by the

traditional classification method, while Figure 4.17 (b) (lower) shows the same images in which

the exposed steel bars are localized by YOLOv2. More sample testing results are presented in

Appendix C.1.
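For reference, the average precision metric used above can be computed from ranked detections as sketched below. This simple variant integrates the raw precision-recall curve with the trapezoidal rule; common benchmarks (e.g., PASCAL VOC) use interpolated variants, and the detection scores and match flags in the example are hypothetical.

```python
import numpy as np

def average_precision(scores, is_true_positive, n_ground_truth):
    """Average precision as the area under the precision-recall curve.
    scores: confidence of each detection; is_true_positive: 1 if the detection matches
    a ground-truth box (IoU above threshold), else 0; n_ground_truth: labelled objects."""
    order = np.argsort(-np.asarray(scores))                 # rank detections by confidence
    tp = np.asarray(is_true_positive, dtype=float)[order]
    fp = 1.0 - tp
    cum_tp, cum_fp = np.cumsum(tp), np.cumsum(fp)
    recall = cum_tp / n_ground_truth
    precision = cum_tp / (cum_tp + cum_fp)
    # Integrate precision over recall (trapezoidal area under the curve).
    return float(np.trapz(precision, recall))

# Example with hypothetical detections: 5 detections, 4 of them correct, 5 labelled bars.
ap = average_precision(scores=[0.9, 0.85, 0.7, 0.6, 0.3],
                       is_true_positive=[1, 1, 0, 1, 1],
                       n_ground_truth=5)
print(f"AP = {ap:.2f}")
```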

Figure 4.16 Recall-precision curves of (a) training and (b) testing sets

4.2.3.4 Component-level damage state determination

As illustrated in Figure 4.8, the proposed evaluation scheme combines the classification model

and object detection model to reinforce the identification of the most severe damage state. Figure

4.18 presents the identification summary of a single classification model and the dual CNNs model

using the confusion matrix method. Overall, the employment of the YOLOv2 object detector

improves the identification accuracy of DS 3 by 7.5%. It should be noted that the classification

accuracy can be potentially improved by expanding the training on the large dataset. However, the

training data for civil engineering applications is relatively limited as aforementioned. The

YOLOv2 network demonstrated in this study can be considered as a local-feature-oriented detector

on the basis of existing classification networks to specially focus on the identification of DS 3

images when the training dataset is relatively small.


Figure 4.17 Detection of steel bars highlighted by yellow rectangular bounding boxes (a) Sample testing

images, (b) detection of exposed steel bars (lower) in testing images wrongly predicted by classification model

(upper)

Figure 4.18 Confusion matrix with consideration of only the classification model (left); the combined

classification and object detection model (right)

4.2.3.5 Component-level damage quantification

This section presents the implementation procedures and results for the RC components. Due

to the limited experimental specimens available at the UBC structural laboratory, only three

cuboid-shape columns are considered in the experiments. The concept of the proposed

quantification procedures for columns of other shapes will be similar.

The 3D point clouds of the three RC column specimens are reconstructed first, using a series

of images with a resolution of 4032 x 3024, captured at different viewpoints by a smartphone

camera. Such a resolution provides an image-to-object ratio of about 9 pixels/mm. Given that the

3D reconstruction pipeline used in this study can achieve sub-pixel accuracy, the reconstructed

point cloud using these high-resolution images can theoretically achieve very high accuracy (i.e.,

well below 1mm). To validate the accuracy, for each column, several benchmark points are marked

on different surfaces of the specimens before quantification procedures. After the validation, the

3D point clouds are processed by plane segmentation algorithms. In this study, the distance

threshold to identify planes is chosen as 0.25 mm. Figure 4.19 depicts the plane segmentation

results for a sample RC column investigated. It is shown that the ground plane, and all the side

planes of the RC column are successfully identified after 6 iterations of the plane segmentation

algorithm. Full sample results of all three RC columns are presented in Appendix C.1.

In order to automatically find the top surface plane, the following procedures are considered.

First, the furthest point with respect to the ground plane is determined, which is used as a reference

to establish a plane parallel to the ground plane. Then, the number of points (i.e., plane inliers) that

fit this plane is counted. The inlier points are defined as the points having a point-to-plane distance

less than a certain distance threshold. If the number of points fitting this plane is greater than a

predefined threshold, this plane is considered the top surface plane of the RC column. Otherwise,

this point is considered noise. The above procedures will be repeated on the next furthest point

until the number of points fitting the respective plane is greater than the predefined threshold. This

first plane that meets the requirement will be regarded as the top surface plane. Further, using all

the detected planes, a 3D cuboid shape can be reconstructed. The spalling volume can be quantified

by simply subtracting the volume of the damaged RC column from the volume of the reconstructed

3D cuboid.
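The top-surface search described above can be sketched as follows; the distance threshold, inlier-count threshold, and synthetic point cloud below are illustrative assumptions.

```python
import numpy as np

def find_top_plane(points, ground_normal, ground_d, dist_thresh=0.5, min_inliers=200):
    """Search for the column's top surface: starting from the point farthest from the
    ground plane, test a plane parallel to the ground through that point; accept it as
    the top surface once it gathers enough inliers, otherwise treat the point as noise."""
    n = ground_normal / np.linalg.norm(ground_normal)
    heights = points @ n + ground_d                  # signed distance to the ground plane
    for idx in np.argsort(-np.abs(heights)):         # farthest points first
        candidate_d = -(points[idx] @ n)             # plane through this point, parallel to ground
        inliers = np.abs(points @ n + candidate_d) < dist_thresh
        if inliers.sum() >= min_inliers:
            return n, candidate_d, inliers           # first accepted plane = top surface
    return None

# Synthetic column cloud (mm) with a dense flat top at z = 3000 plus a few stray noise
# points above it; the search should skip the noise and lock onto the top surface.
rng = np.random.default_rng(0)
body = rng.random((3000, 3)) * [400.0, 400.0, 2999.0]                          # column body
top = np.column_stack([rng.random((500, 2)) * 400.0, np.full(500, 3000.0)])    # top face
noise = np.array([[200.0, 200.0, 3050.0], [150.0, 180.0, 3120.0]])             # stray points
cloud = np.vstack([body, top, noise])
normal, d, inliers = find_top_plane(cloud, np.array([0.0, 0.0, 1.0]), 0.0)
print("top plane height ~", round(-d, 1), "mm; inliers:", int(inliers.sum()))
```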

4.2.3.6 Quantification accuracy validation

The reconstruction by structure from motion algorithms can achieve high accuracy in both

field applications and laboratory settings. This has been well demonstrated in many existing studies (e.g., Hirschmuller, 2007; Westoby et al., 2012; Micheletti, Chandler, & Lane, 2015;

Morgan et al., 2017). To further demonstrate the effective and accurate reconstruction of the RC

components using structure from motion, this section presents the quantification results of concrete

spalling and steel exposure length, and investigates their accuracy.

About 450 images were collected for each RC column to reconstruct the 3D point cloud. The

reconstructed point cloud is then calibrated to the real-world scale. There are typically three ways

to achieve this: a) Calibration of the camera at each viewpoint to explicitly determine the intrinsic

and extrinsic parameters. This requires considerable effort and is generally not recommended. b) Integration of distance measurement devices with the camera so that the reconstructed cloud can be scaled accordingly to real-world dimensions. c) The use of ground control points with known real-world locations (Hartley, Gupta, & Chang, 1992), which is a method widely used in the surveying industry. In this study, as only a single uncalibrated camera is used, the method of using ground control points is selected.
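A minimal sketch of scaling a reconstruction with ground control points is given below: the scale factor is estimated as the ratio of known real-world distances to the corresponding reconstructed distances, averaged over control-point pairs. A full registration (e.g., a similarity transform) would also recover rotation and translation; the control-point coordinates used here are hypothetical.

```python
import numpy as np

def scale_from_control_points(recon_pts, world_pts):
    """Estimate the metric scale of an arbitrary-scale SfM reconstruction from ground
    control points: the ratio of known real-world distances to reconstructed distances,
    averaged over all point pairs. recon_pts and world_pts are corresponding Nx3 arrays."""
    ratios = []
    n = len(recon_pts)
    for i in range(n):
        for j in range(i + 1, n):
            d_recon = np.linalg.norm(recon_pts[i] - recon_pts[j])
            d_world = np.linalg.norm(world_pts[i] - world_pts[j])
            ratios.append(d_world / d_recon)
    return float(np.mean(ratios))

# Example with three hypothetical control points: reconstruction units are arbitrary,
# world coordinates are in millimetres; the recovered scale is applied to the whole cloud.
recon = np.array([[0.0, 0.0, 0.0], [1.2, 0.0, 0.0], [0.0, 0.9, 0.4]])
world = np.array([[0.0, 0.0, 0.0], [600.0, 0.0, 0.0], [0.0, 450.0, 200.0]])
scale = scale_from_control_points(recon, world)
print(f"scale = {scale:.1f} mm per reconstruction unit")
```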

Table 4-4 shows a summary of the concrete spalling quantification results. Besides, the

estimation error with respect to the ground truth values is presented, which ranges from about 3% to 11% for the three columns examined. The average estimation error of the three columns is about

6.7%. Such errors may be attributed to the following aspects.

First, the alpha shape estimated based on the point cloud may not represent the true concrete

continuum, due to the suboptimal shrink factor used. It should be noted that this study is not

focused on searching for the optimal shrink factor that best fits the concrete column scenario. This

is because such optimal shrink factor may not hold consistent for different types of concrete

columns, nor other types of civil structures. Hence, it is recommended to use a rational value as

long as it provides reasonable accuracy.

Second, in this study, the concrete spalled debris is collected and measured by a laboratory

weight scale. The concrete spalling volume is then calculated by dividing the measured mass by

the manufacturer-claimed density of the concrete. Such processes inevitably contain errors such

as the errors during the concrete debris collection process, and also the difference between the

manufacturer's claimed density and the true density of the concrete.

Overall, the estimation errors should be deemed reasonably small and acceptable for the

inspection of civil structures. This demonstrates the proposed 3D vision-based quantification

method can achieve high accuracy at a low cost.

Figure 4.19 Plane segmentation (units are in mm)

Table 4-4 Summary of the spalling quantification of the RC columns

             Estimated concrete spalling volume [cm³]   Ground truth [cm³]   Error [%]
RC Column 1  5215                                        5868                 11.1
RC Column 2  12199                                       12956                5.8
RC Column 3  9916                                        10232                3.1

Table 4-5 summarizes the steel reinforcement exposure estimation results. The estimation

error with respect to the ground truth values is below 0.05m. These errors are mainly due to the

bounding box jitter (i.e., the effect formally defined as the inconsistency in aspect ratio and minor

random translations of the predicted bounding boxes). In other words, the predicted bounding

boxes may not always encompass the objects inside tightly. Figure 4.20 shows sample steel

reinforcement exposure detection performed by the pretrained YOLOv2. It is shown that all the

steel reinforcement exposure in all viewpoints is successfully localized. On the other hand, it is

also observed that the predicted boxes are not guaranteed to be the minimum boxes that fit the

objects inside. Nevertheless, such accuracy is deemed acceptable for the inspection of these

components.

Table 4-5 Summary of the steel exposure quantification of the RC columns

             Estimated steel exposure length [m]   Ground truth [m]   Error [m]
RC Column 1  0.15                                   0.14               0.01
RC Column 2  0.38                                   0.43               0.05
RC Column 3  0.14                                   0.11               0.03

Figure 4.20 Steel reinforcement exposure detection (place holder)

4.2.4 Conclusions

Rapid post-disaster damage estimation and cost evaluation of RC structures are becoming a

crucial need for owners and decision-makers for risk management and resource allocation. In this

section, a novel 3D vision-based damage evaluation pipeline is developed and implemented on

reinforced concrete structures. Within the framework, a dual CNN scheme is proposed to enhance

the accuracy and reliability of a single CNN classification scheme. The main conclusions are

summarized herein: 1) both system-level and component-level classification models were trained and deployed successfully, following post-disaster damage state classification guidelines for RC structures; 2) a real-time object detector, YOLOv2 built on ResNet-50, was implemented for the first time in this context to demonstrate its applicability for detecting exposure of steel bars, achieving 98.2% and 84.5% mAP in the training and testing processes, respectively; 3) in comparison to a single CNN classification scheme, the combined YOLOv2 and classification scheme improves the classification accuracy by 7.5% in identifying the most severe damage state; and 4) the 3D vision-based quantification pipeline yields reasonably high accuracy in both concrete spalling quantification (average error of about 6.7%) and steel reinforcement exposure estimation (errors within 0.05 m). Overall, this section demonstrates the

applicability of the proposed framework on RC structures at both the system level and component

level. The concept of the 3D vision-based pipeline and the dual CNN scheme can be generalized

to other types of structures to provide a more comprehensive assessment, and enhance the damage

evaluation accuracy and robustness for other types of civil structures.

4.3 Vision-based SHM methods for steel structures

4.3.1 Introduction

Steel structures are constructed widely around the world. Common surface damage observed in steel structures includes corrosion, delamination, cracks, and fracture. With the growing need for faster manufacturing and more economical construction, complemented by better computational methods, investigations of lightweight thin-walled steel components have gained more traction in recent decades. To date, thin-walled steel structures have been widely used as load-bearing elements resisting both gravity forces and lateral loads from natural events such as earthquakes and winds. Earlier research has shown that shear buckling, one of the critical failure modes, occurs in both hot-rolled and cold-formed thin-walled steel plate elements (Sabouri-Ghomi, Ventura, & Kharrazi, 2005; Park et al., 2007; Yi et al., 2008; Dou, Pi, & Gao,

2018). More recently, shear buckling has also been observed in steel plate damping devices

developed for energy dissipation in high seismic activity zones taking advantage of the high

ductility of the steel material (Zhang, Zhang, & Shi, 2012; Deng et al., 2015; Sahoo et al., 2015;

Yang et al., 2020, 2021).

The out-of-plane displacements due to buckling of these steel plate components must be

identified and quantified to evaluate their residual performance after major earthquakes so that

repair or replacement actions can be executed accordingly. When performing buckling and post-

buckling analysis of such components, out-of-plane displacements are of prime interest to

researchers and engineers (Singer, Arbocz, & Weller, 2002). Measuring the out-of-plane

displacements and determining the buckling and post-buckling shapes are crucial for further analyses and interpretation of results (Singer, Arbocz, & Weller, 2002). In addition, measuring the

deformed geometry (i.e., both in-plane and out-of-plane deformations) of structures provides the

necessary information to perform geometry model updating for the structures in the field after

natural disasters (Zhang & Lin, 2022). Existing methods to measure out-of-plane displacements

due to buckling can be typically achieved by displacement sensors such as potentiometers (Singer,

Arbocz, & Weller, 2002), line laser device-based measurement systems (Zhao, Tootkaboni, &

Schafer, 2015), motion capture systems (Park et al., 2015), and fringe projection systems (Liu et

al., 2019). When using contact-type displacement sensors, preliminary numerical analyses are

typically required to identify critical buckling regions to determine appropriate sensor placements.

The measurement results may not be accurate if insufficient sensors are used, or if sensor locations

are not well identified. These methods are also relatively expensive and require careful sensor placement design and relatively complicated installation processes. On the other hand, although the line

laser-based methods can provide higher accuracy, they require a dedicated and complicated

supporting setup, and the total cost is relatively high, which hampers their wide applications in the

structural engineering field. Motion capture systems can provide a time history of structural

displacements in 3D space but are generally limited to small-scale structures in laboratory

conditions. Besides, motion capture systems generally require specific markers to be attached to

the test structures, where the installation can be very difficult for large-scale civil structures. Fringe

projection systems are generally limited to small-scale structures where the projector can capture

the entire object being scanned. Hence, it is difficult to apply fringe projection systems to full-

scale civil structures. Moreover, the cost of both motion capture systems and fringe projection

systems is relatively high.

On the other hand, in recent years, research using TLS has also been conducted to identify

structural damages in 3D space. The scanning results are point cloud coordinates in 3D space. For

example, Mizoguchi et al. (2013) quantifies the scaling damage of concrete bridge pier using a

TLS. Kim et al. (2015) employed a TLS to localize and quantify concrete spalling. Kim et al.

(2021) proposed a damage quantification framework for concrete bridge piers with more

complicated shapes by processing the point cloud obtained using a TLS. The accuracy of the TLS

as reported in many studies generally ranges from 3 to 15mm, and is suitable for applications

where an error of 3-15mm in quantifying local damage areas does not greatly affect the global

health inspection of the entire structures. Although the results are promising in those studies, the

TLS devices may not be accurate enough to quantify structural damages at a relatively small

magnitude such as steel plate deformation (due to buckling) of less than 10 mm. Moreover, these

TLS devices are generally expensive and may not be readily available to many researchers and engineers. Furthermore, although low-end and mid-tier laser scanners cost less, they typically have a much shorter detection range and lower accuracy.

In recent years, the effectiveness of the vision-based methods has been well demonstrated for

visual damage detection of different types of structural systems such as reinforced concrete (RC)

structures, masonry structures, and steel structures (Spencer Jr, Hoskere, & Narazaki, 2019). In the

case of steel structures, vision-based damage detection methods have been proposed with a focus

on identifying and localizing steel surface damages. For example, Yeum & Dyke (2015) employed

an integral channel-based sliding window method to localize bolts and utilized the Hessian matrix-

based edge detector to detect cracks near bolts on a steel beam. Yun et al. (2017) applied a

combined Gabor filter and double-thresholding binarization method to identify the shape of the

steel surface stains. Kong & Li (2018) examined a video tracking method to detect and quantify

the fatigue cracks of steel structures under repetitive loads. In their study, feature point detection

and tracking are applied to track the motion of the structure. The crack regions are detected by

searching for discontinuities in the motion during tracking. Finally, the crack opening is quantified

based on the tracked location of the two small windows deployed to the crack regions identified.

These methods were built on traditional vision algorithms, which are generally not robust against

background noise. Cha et al. (2018) employed a deep learning-based method, Faster RCNN, to

detect steel corrosion and delamination. The results indicated the CNN-based method can achieve

high accuracy and robustness. Despite the achievements made in these studies, several limitations

can be identified: (a) these methods were developed in 2D computer vision, so the assessment results are sensitive to camera locations and angles; (b) these methods are primarily designed to detect in-plane damage and are not directly capable of accurately quantifying out-of-plane damage features, such as out-of-plane deformations due to buckling or fracture, because processing 2D RGB or grayscale images individually cannot provide out-of-plane damage information.

The majority of the existing studies on vision-based damage detection of steel structures are limited to in-plane surface damages such as corrosion, delamination, cracks, and other surface defects. To address these limitations and challenges, a 3D computer vision-based framework is proposed in this research to detect and quantify out-of-plane displacements of steel structures at high accuracy and low cost. The framework is briefly described herein. First, a sequence of

image frames is used to reconstruct the 3D dense point cloud (i.e., RGB-D data) of the scene which

contains the steel components of interest using image association, structure-from-motion, and

multi-view stereo algorithms. Second, a multi-view object detection method is proposed to localize

the steel structures in the 3D scene. In this step, a CNN-based object detector is trained to perform

object detection for rendered images of the 3D scene at multiple camera views. The bounding

boxes generated at different views will be used to extract the 3D point cloud of the steel

components in the scene. This step is crucial to remove point clouds of irrelevant objects in the background and allow only the object(s) of interest to remain in the scene. Further, the buckling region of the steel components is isolated using plane fitting algorithms to remove adjacent non-buckled plates. Finally, a distance-based point-projection-clustering (DBPPC) algorithm is proposed to cluster the points for quantification of the out-of-plane amplitude. The proposed framework has been examined on a steel plate damping device (Yang et al., 2021) and a full-scale steel corrugated plate wall. The results indicate the proposed framework can

successfully localize the steel components in a 3D scene, and accurately quantify the out-of-plane

damage with an error of about 1mm at the benchmark points of the experimental specimen. This

shows the proposed framework can achieve high accuracy at a significantly lower cost compared

to traditional methods to measure the out-of-plane damage extent. In addition, the proposed

framework is easier to set up compared to these traditional methods.

4.3.2 Methodology

As discussed in Section 4.3.1, many in-plane damage types of steel structures (e.g., surface cracks and surface corrosion) have been addressed by existing studies; this dissertation therefore focuses on quantifying out-of-plane displacements due to buckling, which cannot be addressed by existing 2D vision-based research. The proposed methodology consists of

vision-based 3D scene reconstruction, multi-view CNN-based steel component detection and out-

of-plane displacement quantification, as shown in Figure 4.21. As both undamaged and damaged

structural components may be present on-site, prior to the implementation of the proposed

methodology, a simple binary CNN-based classification network can be applied to identify

whether the structural component is damaged. The proposed methodology will only be applied

when the structural component is identified as damaged. This will eliminate the need to reconstruct

and assess the undamaged structural components. As the demonstration of the system-level

classification method has been presented in Section 4.2, to maintain the focus of the dissertation,

the detailed implementation of a similar system-level classification method on the steel corrugated

panels is not repeated in this section. Instead, sample system-level classification results are

provided in Appendix C.2.

Image sources for 3D reconstruction can be obtained using common consumer-grade cameras

such as smartphone cameras, or unmanned aerial vehicles (UAVs) with appropriate camera specifications. In this study, two types of steel plate structures in the structural laboratory at UBC are selected to demonstrate the effectiveness of the proposed framework: a damaged steel plate damping device (Yang et al., 2021) and a full-scale damaged steel corrugated plate wall. Plate dampers of similar types are used as energy-dissipating devices in the event of earthquakes and have received extensive investigation (Zhang, Zhang, & Shi, 2012; Deng et al., 2015; Sahoo et al., 2015; Etebarian, Yang, & Tung, 2019; Yang et al., 2019). Meanwhile, buckling behavior investigations and performance evaluations of steel corrugated panels have also been widely conducted (Vigh et al., 2013; Bahrebar et al., 2016; Mansouri & Hu, 2018; Tong, Guo, & Zuo, 2018). As these components represent common steel plate structures and are prone to out-of-plane

deformations due to buckling in the event of earthquakes, they are considered good candidates to

examine the proposed framework.

Figure 4.21 3D vision-based steel buckling quantification methodology

4.3.2.1 Vision-based 3D reconstruction

In this section, 3D reconstruction procedures of steel plate structures are briefly described. As

shown in Figure 4.22, the 3D reconstruction of steel plate structures consists of three stages: data association, which establishes the image-to-image connections between pairs of unstructured images of the steel plate structures that share similar features; structure-from-motion, which estimates the camera poses and a sparse point cloud of the plate structure; and multi-view stereo, which generates a dense point cloud of the plate structure based on the sparse point cloud and the input RGB images.
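These reconstruction stages can also be scripted in photogrammetry packages. The minimal sketch below assumes the Metashape 1.x Python API (the exact function names differ in other Metashape versions and in Meshroom) and is intended only to illustrate the match-align-densify sequence for a folder of extracted video frames.

```python
# Minimal sketch of scripted photogrammetric reconstruction.
# Assumes the Metashape 1.x Python API; function names (e.g., buildDenseCloud)
# differ in other versions, so treat this as illustrative only.
import glob
import Metashape

doc = Metashape.Document()
chunk = doc.addChunk()
chunk.addPhotos(sorted(glob.glob("frames/*.jpg")))   # extracted video frames (hypothetical path)

chunk.matchPhotos()       # data association: detect and match features
chunk.alignCameras()      # structure-from-motion: camera poses + sparse cloud
chunk.buildDepthMaps()    # per-view depth estimation
chunk.buildDenseCloud()   # multi-view stereo: dense point cloud

chunk.exportPoints("steel_plate_dense.ply")           # hypothetical output file
doc.save("steel_plate_project.psx")
```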

Figure 4.22 Vision-based 3D reconstruction of steel plate structures

4.3.2.2 Multi-view vision-based structural component detection

4.3.2.2.1 Description of the 3D object detection setup

The concept of a multi-view vision-based 3D object detection method has been presented in

Section 3.7. Similarly, the proposed 3D object detection method is applied to detect structural

components from an unorganized 3D scene cloud. In this section, the detector is trained to detect

the steel plate damper and the steel corrugated plate wall. In order to extract the object of interest

from the 3D scene, a minimum of two camera viewpoints are required to remove background

objects sufficiently, as illustrated in Figure 4.24. In this section, the point cloud of the 3D scene is

automatically rendered onto the XZ and YZ plane view, which will then be processed by a CNN-

based object detector to localize the steel component. In the presence of multiple steel plate

components, different cuboid 3D boxes will be generated for different components, where

displacement quantification will be performed for each component separately. In this section, the

CNN-based object detector is selected as a specific version of YOLOv3-tiny.
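To make the fusion step concrete, the sketch below crops a point cloud using two 2D bounding boxes predicted on the XZ and YZ orthographic views. The box coordinates and the random scene cloud are hypothetical placeholders, not values from this study; in the actual pipeline the boxes come from the trained detector.

```python
# Simplified sketch of fusing two orthographic-view bounding boxes into a 3D crop.
# Box coordinates and the scene cloud are placeholders for illustration.
import numpy as np

points = np.random.rand(100000, 3) * 5.0          # placeholder scene cloud (x, y, z) in metres

box_xz = (1.0, 3.5, 0.0, 2.8)                     # (x_min, x_max, z_min, z_max) from the XZ view
box_yz = (0.5, 2.0, 0.0, 2.8)                     # (y_min, y_max, z_min, z_max) from the YZ view

x, y, z = points[:, 0], points[:, 1], points[:, 2]
mask = ((x >= box_xz[0]) & (x <= box_xz[1]) &
        (y >= box_yz[0]) & (y <= box_yz[1]) &
        (z >= max(box_xz[2], box_yz[2])) & (z <= min(box_xz[3], box_yz[3])))

component_cloud = points[mask]                    # points inside the fused 3D cuboid
print(component_cloud.shape)
```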

4.3.2.2.2 YOLOv3-tiny real-time detection networks

In this section, the architecture and novelty of the proposed YOLOv3-tiny detector are

described. The architecture of YOLOv3 has been adopted and modified to create a new version of YOLOv3-tiny for the localization of the steel plate components. Compared to the original YOLOv3 built on Darknet-53, which has 247 layers in total, the developed YOLOv3-tiny has a total of only 44 layers. This is achieved by reducing the depth of the convolutional layers of YOLOv3. The main advantage of YOLOv3-tiny is that it achieves over 10 times higher speed (as will be shown in Section 3.1) than the original YOLOv3, while still maintaining sufficient precision for localizing the steel components. The

architecture of the proposed YOLOv3-tiny is depicted in Figure 4.23. In general, YOLOv3-tiny

consists of a series of Convolution, Batch Normalization, Leaky ReLU (Conv-BN-Leaky ReLU)

blocks, and max pooling layers. The input image will be resized to 416 x 416 in width and height,

before entering the networks for training. During training, YOLOv3-tiny divides the image into

26x26 grid cells. Each grid cell has 6 anchor boxes, of which each has an object score, multiple

class scores (depending on the number of classes being detected), and 4 bounding box coordinates.

In this case, the number of classes is two, for steel plate dampers and steel corrugated plate walls.

Consequently, the output of YOLOv3-tiny has a dimension of 26 x 26 x 42, where '26' represents the number of grid cells along each output dimension, and '42' represents, for each of the six anchor boxes, the class scores, object score, and bounding box coordinates (i.e., 2 class scores + 4 bounding box values + 1 object score = 7 values; 7 values x 6 anchor boxes = 42 values).
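As a quick sanity check of the head dimensions described above, the following snippet computes the number of output channels from the number of anchors and classes. This is a generic YOLO-style calculation, not code from this study.

```python
# Generic YOLO-style output channel calculation (illustrative, not project code).
def yolo_output_channels(num_anchors: int, num_classes: int) -> int:
    # Each anchor predicts: 4 box coordinates + 1 object score + class scores.
    return num_anchors * (4 + 1 + num_classes)

print(yolo_output_channels(num_anchors=6, num_classes=2))  # 42 -> 26 x 26 x 42 head
print(yolo_output_channels(num_anchors=3, num_classes=1))  # 18 -> 26 x 26 x 18 head
```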

Figure 4.23 Architecture of the YOLOv3-tiny object detector

4.3.2.2.3 Training and validation of YOLOv3-tiny

Image data for training and testing of the YOLOv3-tiny algorithm were collected in the UBC structural laboratory environment. In addition, rendered images of the 3D reconstructed scenes

were also collected. Standard data augmentation techniques are applied such as horizontal flipping,

small translations, rotation, small cropping, and scaling. As a result, 3248 images were generated

for the steel damper, and 2702 images were generated for the steel corrugated panel, where 70%

of the images are randomly selected from each component type for training, and the rest is used

for testing. The YOLOv3-tiny detector was trained by back-propagation and stochastic gradient

descent with momentum (SGDM). The initial learning rate was set to 0.001, and the mini-batch size and number of training epochs were set to 6 and 65, respectively. Training of the YOLOv3-tiny was done in MATLAB R2021a, with a hardware configuration of a Core i7-9700K CPU and an RTX 2070 GPU. To train the YOLOv3-tiny networks, anchor box selection is required. Table 4-6 shows the anchor boxes estimated based on the training data. YOLOv3-tiny predicts localization bounding boxes based on these anchor boxes.
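Anchor dimensions such as those in Table 4-6 are typically estimated by clustering the labeled box sizes in the training set. The sketch below uses a plain k-means over box widths and heights as a simplified stand-in for the IoU-distance clustering of Redmon and Farhadi (2018); the box dimensions listed are made-up values for illustration.

```python
# Simplified anchor estimation: k-means over labeled box widths/heights (pixels).
# The YOLO papers cluster with an IoU-based distance; Euclidean k-means is used
# here only as an easy-to-read approximation. Box dimensions are made up.
import numpy as np
from sklearn.cluster import KMeans

box_wh = np.array([[110, 100], [95, 88], [200, 190], [220, 95], [150, 70], [140, 145],
                   [118, 104], [92, 83], [190, 185], [225, 98], [145, 75], [150, 140]])

kmeans = KMeans(n_clusters=6, n_init=10, random_state=0).fit(box_wh)
anchors = np.round(kmeans.cluster_centers_).astype(int)
print(anchors)   # six anchor (width, height) pairs used to configure the detector
```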

The performance of an object detector is generally assessed by the precision-recall diagram

(Everingham et al., 2010). A low false-positive rate corresponds to a high precision value, and a

low false-negative rate reflects a high recall value. If a detector attains high precision but low

recall, it can detect objects in only a few images, but the localization accuracy is high once the

object is recalled. If a detector has low precision but high recall, it can retrieve objects in many

images, but the localization precision is not high. The overall performance of the algorithm is

quantified as the average precision (AP), which is computed from the precision-recall plot as the

weighted average of precision at each recall value. The precision-recall curves of the YOLOv3-

tiny during training and testing are reported in Figure 4.25 and Figure 4.26, respectively, where

the APs for the steel plate damper and steel corrugated plate wall are above 0.9. This shows the

trained YOLOv3-tiny is capable of detecting the steel plate damper and corrugated plate wall at

high precision.

Figure 4.27 shows the deployment of the trained YOLOv3-tiny on sample testing rendered

scenes and real-world images. Both the steel plate damper and the steel corrugated wall are

successfully localized with a high probability. The trained YOLOv3-tiny model will be used in the

proposed multi-view 3D object detection method to extract the steel components from the rendered

images of the 3D scene. The predicted bounding box locations from multiple camera views are

fused to extract the steel components from the 3D scene.

Figure 4.24 Steel components identification

Table 4-6 Estimation of anchor box dimensions

Anchor index       1     2     3     4     5     6
Width [pixels]     115   90    196   222   147   146
Height [pixels]    107   85    187   96    72    141

Figure 4.25 Precision-recall curve of training

Figure 4.26 Precision-recall curve of testing

Figure 4.27 Sample testing results of YOLOv3-tiny on the rendered scenes and real-world images

4.3.2.3 Localization of points of interest

In most cases, the structural components identified may be composed of multiple parts. In the example shown in Figure 4.28, parts A, B, and C are not the region of interest and need to be removed, either manually or automatically using surface fitting strategies such as plane segmentation. The M-estimator SAmple Consensus (MSAC) algorithm (Torr & Zisserman, 2000), a variant of the RANSAC algorithm, can be used for plane segmentation, where a distance threshold should be provided. The distance threshold defines the maximum distance from a point to the plane. If the distance between a point and the fitted plane is less than the threshold, the point will be considered an inlier to the fitted plane. Such distance thresholds should be chosen according

to the resolution of the point cloud (which is dependent on the input image resolution), the point

cloud down-sampling rate if any, and also the computational power of the hardware (which should

be considered particularly for very large point clouds). The plane fitting algorithm is first applied

to the original point cloud data, where the first principal plane (i.e., the plane that contains the

maximum number of inliers) and the corresponding inliers will be identified. Next, these inlier

points are removed, and the fitting algorithm is applied to the remaining point cloud. The process

can be repeated iteratively until the plate of interest is isolated.
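The iterative removal described above can be sketched with Open3D's RANSAC-based plane fitting, used here as a stand-in for the MSAC variant. The file name, distance threshold, and number of iterations are assumptions chosen for illustration rather than the values used in this study.

```python
# Sketch of iterative plane segmentation with Open3D (RANSAC-based segment_plane,
# used here as a stand-in for MSAC). File name and parameters are illustrative.
import open3d as o3d

pcd = o3d.io.read_point_cloud("component_cloud.ply")   # hypothetical input cloud

remaining = pcd
for i in range(3):                                      # remove up to 3 principal planes
    plane_model, inliers = remaining.segment_plane(distance_threshold=2.0,   # in cloud units
                                                   ransac_n=3,
                                                   num_iterations=1000)
    print(f"Plane {i}: {plane_model}, {len(inliers)} inliers")
    plane_cloud = remaining.select_by_index(inliers)                 # the fitted plane
    remaining = remaining.select_by_index(inliers, invert=True)      # keep the rest

o3d.visualization.draw_geometries([remaining])          # what is left after peeling planes
```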

When dealing with structural assemblies with various shapes of parts, plane segmentation and

other types of surface fitting strategies (e.g., Kim et al., 2021) can be used to remove most of the

irrelevant surfaces, while minor manual interventions can be used to further refine the remaining

point cloud.

Figure 4.28 Plane segmentation for an I-shaped steel plate damper

4.3.2.4 Structural out-of-plane displacements quantification

Once the region to be measured is isolated, the out-of-plane displacements can be quantified

with respect to a predefined reference plane. The results can provide detailed and accurate out-of-

plane structural displacements in laboratory conditions, as well as crucial information to perform

geometry model updating in field applications, as shown in a recent study by Zhang & Lin (2022).

To limit the scope of this study, the framework proposed in this dissertation is dedicated to quantifying the out-of-plane displacement of steel plate structures, while model updating is not further implemented.

To quantify the out-of-plane displacements of the steel plate structures, users only need to

take photos from one side of the structures (i.e., less than 180 degrees field of view). In this case,

data points will only appear on one surface of the isolated plate. However, if the data collection is

done using 360 degrees field of view, dual layers will exist corresponding to the two surfaces of

the plate. In such a situation, the two layers of points must be separated. There are three options: 1) manually separate the points from the two surfaces; 2) adjust the video prior to the 3D reconstruction such that the extracted frames only contain views from one side of the structure; or 3) use a point cloud clustering method to separate the points from the two surfaces. In this study, a

distance-based point-projection-clustering (DBPPC) method is developed to automatically

separate the two layers of points, as shown in Figure 4.29. Detailed implementation of the DBPPC

method on the point cloud is described below:

• The points on the top-most and bottom-most surfaces of the buckled plate are eliminated

such that these points are not considered in the clustering process.

• Starting from one corner of the remaining point cloud, a sub-cloud is sampled by taking a

relatively small grid step along the vertical direction. The sampled sub-cloud is then

projected to a specified plane (i.e., XY plane in this study).

• The DBPPC method is applied to the projected sub-clouds. For each projected sub-cloud,

a point at one end of the buckled plate is sampled first and its k nearest neighbors are

determined such that the Euclidean distance between the point and its furthest neighbor

does not exceed a predefined threshold (e.g., half of the thickness of the plate). These k

points will be grouped into cluster 1 and the rest of the points will be temporarily grouped

into cluster 2. Next, the furthest neighbor point is selected as the new point and its nearest

neighbors are determined from cluster 2 using the same distance threshold. These neighbor

points will be again assigned to cluster 1, and the remaining cloud will be temporarily

assigned to cluster 2. The process is repeated until no more points can be assigned to cluster

1 within the sub-cloud.

• The DBPPC method is implemented on the projected sub-clouds iteratively until all the

points within the entire cloud are successfully grouped into two clusters.

During these processes, point associations between the original points and projected points

will be stored so that the original points can be clustered based on the clustering of their

corresponding projected point indices. The algorithmic implementation of the DBPPC method is

presented in Appendix B. Figure 4.29 shows a comparison of sample point projected locations

when small and large grid step sizes are selected. This shows that when the step size is relatively small, the algorithm performs effectively. Finally, two separate clusters of the original points will be obtained for the quantification of buckling out-of-plane displacements. To quantify the out-

of-plane displacement for the steel plate damper, a reference plane should be selected. The

reference plane may be arbitrarily defined by users, or defined as the undeformed (undamaged)

configuration of the structure (Figure 4.30).
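A simplified sketch of the DBPPC cluster-growing idea is given below for a single projected sub-cloud: cluster 1 grows from a seed point by repeatedly absorbing neighbors within a distance threshold, and the leftover points form cluster 2. The function name, toy data, and parameter values are assumptions for illustration; the full algorithm is given in Appendix B.

```python
# Simplified sketch of the DBPPC cluster-growing step on one projected sub-cloud.
# A seed point is grown into cluster 1 by repeatedly absorbing neighbours within
# a distance threshold; remaining points form cluster 2. Parameters are illustrative.
import numpy as np
from scipy.spatial import cKDTree

def grow_cluster(points_2d: np.ndarray, seed_idx: int, dist_threshold: float) -> np.ndarray:
    """Return a boolean mask of points assigned to cluster 1."""
    tree = cKDTree(points_2d)
    in_cluster = np.zeros(len(points_2d), dtype=bool)
    frontier = [seed_idx]
    in_cluster[seed_idx] = True
    while frontier:
        idx = frontier.pop()
        neighbours = tree.query_ball_point(points_2d[idx], r=dist_threshold)
        for n in neighbours:
            if not in_cluster[n]:
                in_cluster[n] = True
                frontier.append(n)
    return in_cluster

# Two parallel layers of points separated by roughly a plate thickness (toy data, mm).
layer1 = np.column_stack([np.linspace(0, 100, 200), np.zeros(200)])
layer2 = np.column_stack([np.linspace(0, 100, 200), 6.0 * np.ones(200)])
sub_cloud = np.vstack([layer1, layer2])

mask = grow_cluster(sub_cloud, seed_idx=0, dist_threshold=3.0)   # ~half plate thickness
print(mask.sum(), (~mask).sum())                                  # 200 points in each cluster
```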

Figure 4.29 Illustration of DBPPC method

Figure 4.30 Illustration of reference planes

4.3.2.5 Accuracy validation

The reconstruction by structure from motion algorithms can achieve high accuracy in both

field applications and laboratory settings as presented in Section 4.2. To further demonstrate this

in the scenario of reconstructing steel plate structures considered in this research, this section

investigates the accuracy in displacement quantification results of the steel plate damper.

In this study, about 300 images are extracted from a 4K video recorded for the steel damper,

which are used to reconstruct the 3D point cloud. The reconstructed point cloud is then calibrated

to the real-world scale using the ground-controlled points. Next, after multi-view vision-based 3D

object detection and point cloud postprocessing, the buckled plate is isolated. The proposed

DBPPC method is applied to separate the two layers of points of the plate, where the vertical grid

step size is set constant as 10, and the distance threshold is set as 3. As the two clusters lead to

very similar quantification results, one cluster (i.e., the one in the top right corner in Figure 4.30)

is selected to generate a detailed out-of-plane displacement distribution plot. Figure 4.31 presents

the out-of-plane displacement distribution, where the horizontal axis represents the longitudinal

direction of the plate, and the vertical axis represents the vertical direction of the plate. The

quantification results of the steel plate damper are compared with the ground truth results

(measured by a Fowler digital caliper, which has an accuracy of 0.02 mm) at the four benchmark points (BPs) of the damper, as presented in Table 4-7. The locations of the four BPs have been indicated

in Figure 4.31 and Figure 4.32, where two of the four BPs are selected at the top edge, while the

other two are selected at the bottom edge. The mean error of the benchmark points is estimated to

be 1.07 mm. The maximum error is 1.30 mm, observed at BP 1, and the minimum error is 0.85 mm, observed at BP 4. This shows the proposed method can quantify the out-of-plane structural displacements

of steel plate structures at high accuracy.

Table 4-7 Comparison of the estimated out-of-plane displacements with the ground truth values at four benchmark points

                      Proposed method (mm)   Ground truth (mm)   Error (mm)
Point 1 (x = -25.2)   11.93                  13.23               1.30
Point 2 (x = -25.2)   9.92                   8.89                1.03
Point 3 (x = 46.6)    -10.52                 -11.62              1.10
Point 4 (x = 46.6)    -8.81                  -9.66               0.85
Average               -                      -                   1.07

Figure 4.31 Out-of-plane displacement measurements for the steel plate damper. The units are in [mm]

Figure 4.32 Illustration of benchmark points

It should be noted that, depending on the accuracy needed for reconstructing different types of structures, the number of images will vary. For large-structure applications such as buildings, more images should be collected. This can be achieved efficiently using drones equipped with consumer-grade cameras recording video at sufficiently high resolution (e.g., 1080p, 4K) and frame rates (e.g., 60 frames per second).

4.3.3 Implementation

The proposed method is implemented on a full-scale steel corrugated plate wall, which has a

wide range of structural engineering applications.

4.3.3.1 Description of image collection setup

The image database is established using a smartphone camera with automatic capturing mode.

There are 495 images collected for the steel corrugated plate wall with a resolution of 3840 x 2160

(which were extracted from a 4K video recorded for about 20 seconds). This can also be achieved

efficiently using drones recording videos of reasonably high resolution. In field applications, there

may not be a sufficient open field of view to capture images at 360 degrees. To demonstrate the

effectiveness of the proposed methodology under a relatively small field of view, the video for the

steel corrugated plate wall was recorded within about a 120-degree field of view on one side of

the corrugated panel.

4.3.3.2 Dense point cloud 3D reconstruction

As shown in Figure 4.33, with data association, structure from motion, multi-view stereo, and

point cloud preprocessing, the point clouds are successfully reconstructed for the steel corrugated

plate wall. Sample images in the input database and the scene graph are also shown for illustration purposes. Implementation of the vision-based 3D reconstruction procedures, including data association, structure from motion, and multi-view stereo, is done in Meshroom (Python API) and Metashape (Python API). The point cloud preprocessing (cleaning) procedures are performed in Open3D (Python API), an open-source library for 3D data processing.
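A minimal sketch of the Open3D cleaning step is shown below; the file names, voxel size, and outlier-removal parameters are assumptions chosen for illustration rather than the exact values used here.

```python
# Sketch of point cloud preprocessing (cleaning) in Open3D.
# File names, voxel size, and outlier parameters are illustrative assumptions.
import open3d as o3d

pcd = o3d.io.read_point_cloud("corrugated_wall_dense.ply")   # dense cloud from MVS

# Down-sample to a manageable density (units follow the calibrated cloud, e.g., mm).
pcd_down = pcd.voxel_down_sample(voxel_size=2.0)

# Remove sparse reconstruction noise via statistical outlier removal.
pcd_clean, kept_indices = pcd_down.remove_statistical_outlier(nb_neighbors=20, std_ratio=2.0)

o3d.io.write_point_cloud("corrugated_wall_clean.ply", pcd_clean)
print(f"{len(pcd.points)} -> {len(pcd_clean.points)} points after cleaning")
```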

4.3.3.3 Out-of-plane displacement quantification

After the proposed 3D object detection method is applied, the point cloud of the steel

corrugated plate wall is extracted. In this case, only one iteration of the plane segmentation is

applied to remove the ground plane successfully, as shown in Figure 4.34. Next, the out-of-plane

displacement quantification results are reported. For convenience, the reference plane is manually

defined as a flat plane (following the centerline of the corrugation) as shown in Figure 4.35. Once

the reference plane is defined, the relative out-of-plane displacements can be determined, as shown

in Figure 4.36. This shows the developed framework can be effectively used to provide detailed

out-of-plane displacements of steel plate structures. The quantification procedures are

implemented in MATLAB R2021a.
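Once a reference plane is defined, the out-of-plane value at each point is simply its signed distance to that plane. The sketch below, with an assumed plane and made-up point coordinates, shows this calculation.

```python
# Signed point-to-plane distance as the out-of-plane displacement measure.
# Plane coefficients and points are made-up values for illustration.
import numpy as np

# Reference plane a*x + b*y + c*z + d = 0 (e.g., following the corrugation centerline).
a, b, c, d = 0.0, 1.0, 0.0, -50.0          # a plane at y = 50 mm

points = np.array([[10.0, 52.3, 5.0],
                   [20.0, 47.1, 5.0],
                   [30.0, 50.0, 5.0]])     # cleaned, calibrated cloud (mm)

normal = np.array([a, b, c])
signed_dist = (points @ normal + d) / np.linalg.norm(normal)
print(signed_dist)                         # [ 2.3, -2.9, 0.0 ] mm out-of-plane
```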

Figure 4.33 Vision-based 3D reconstruction procedures for the steel corrugated plate wall

Figure 4.34 One iteration of plane segmentation for the steel corrugated panel. The units are in [mm]

Figure 4.35 Illustration of the reference plane for the steel corrugated plate wall. The units are in [mm]

Figure 4.36 Quantification of out-of-plane displacement distribution for the steel corrugated wall panel.

The units are in [mm]

4.3.4 Conclusions

Steel plate structures are commonly used as load-bearing elements and energy dissipation

devices. Buckling is one of the dominant damage types experienced by many steel plate

components. In this section, a 3D vision-based pipeline has been developed and implemented to

quantify out-of-plane damage for steel plate structures. The framework consists of a 3D vision-

based scene reconstruction pipeline, a newly proposed multi-view 3D object detection method,

and point cloud postprocessing methods including a newly proposed DBPPC algorithm. The

results indicate the proposed framework can successfully reconstruct the steel plate structures,

effectively localize the steel components from a 3D scene, and accurately quantify the out-of-plane

displacements with an accuracy of about 1 mm. The main contributions and novelties of this section are summarized as follows: a) a novel 3D vision-based framework has been implemented to quantify out-of-plane displacements of steel plate structures; b) a novel vision-based multi-view 3D object detection method is implemented to detect structural components; c) the proposed method provides a non-contact measurement approach and a more economical solution, in both equipment and setup cost, than traditional measurement methods using contact-type displacement sensors or laser devices; d) the proposed method provides finer measurement results than traditional contact-type displacement sensors, which measure displacement only at limited sensor locations; and e) the proposed method can provide the information needed to perform geometry model updating of steel plate structures in the field.

There are certain limitations in this study. Therefore, recommendations for further studies are

also summarized: a) this study is focused on out-of-plane displacement quantification of two common steel plate structures whose primary failure mode is buckling; in the future, the damage quantification algorithms should be further developed and validated for more complicated structural steel assemblies; b) 3D vision-based quantification of multiple coexisting damage types of steel structures, such as a combination of buckling, fracture, and steel cracks, should be further investigated.

4.4 Vision-based SHM methods for structural bolted components

4.4.1 Introduction

Structural bolts are critical parts that hold structural elements in place. Structural components such as beam-column joints and column-base connections can experience complete failure if the bolts become loosened beyond a certain level, which may result in a catastrophic system-level

collapse. Besides, some of the innovative energy dissipation devices such as friction dampers

heavily rely on the bolts to generate the desired friction force. The seismic energy absorption

potential of such damping devices will deteriorate with the loosening of bolts, and consequently,

affect the global performance of the building. Therefore, robust monitoring methods should be

developed to detect damages in bolted components, and if the damage has been found, repair or

replacement actions should be applied to maintain the structural integrity, prior to extreme natural

hazards.

Earlier, traditional structural health monitoring (SHM) methods (Wang et al., 2013) were

developed to replace time-consuming manual inspections of bolted connections. In general, SHM

methods using contact-type sensors identify damage based on the structural modal properties (i.e.,

stiffness and damping), which are related to natural frequencies and mode shapes. The contact

sensor-based SHM methods to identify bolt loosening have also been developed in recent years

(Yang & Chang, 2006; Wang et al., 2013; Sevillano, Sun, & Perera, 2016). However, these

contact sensor-based methods have several limitations. Contact sensors are unreliable when subjected to changes in environmental conditions, such as temperature and humidity, which could lead to false detection (Xia et al., 2012; Li, Deng, & Xie, 2015). These methods require dedicated

experts to set up the sensors, high-precision instrumentation, and a software package to account

for environmental variation effects (Huynh, & Kim, 2017; Huynh, & Kim, 2018). Moreover, in

the case of bolt loosening detection, these methods could recognize damage in the bolted

assemblies, but could not precisely localize the loosened bolts (Ramana, Choi, & Cha, 2019). Such

methods are labor-intensive, expensive, and may be impractical in real-world applications to assess

the bolt loosening in a device with a large number of bolts of different types, which requires many

sensors to be set up differently.

In recent years, vision-based SHM has evolved as a reliable and efficient way for structural

damage detection in various civil engineering applications. In comparison to the contact-type

sensors, vision-based methods offer non-contact detection, low sensor cost, and easier installation and operation. For example, a simple commercial-grade camera can easily be fixed to a beam-column bolted connection and capture multiple bolts at the same time. The

image or video data can be acquired wirelessly in real time, and efficiently processed and analyzed

by modern consumer-grade computers. In addition, due to the nature of cameras, the damage

information encoded in the image is not affected by changes in temperature or humidity.

In recent years, convolutional neural networks (CNNs), which fall into a category of deep

neural networks (or deep learning), have been shown to prominently outperform traditional image

processing techniques (IPTs). CNN-based vision methods have been effectively implemented in

the damage detection of various structural components (e.g., Cha, Choi, & Büyüköztürk, 2017;

Xu, Gui, & Han, 2020; Azimi, & Pekcan, 2020; Miao, Ji, Okazaki, & Takahashi, 2021; Gao, Zhai,

& Mosalam, 2021; Sajedi, & Liang, 2021). To date, a limited number of vision-based studies for

bolt loosening detection were conducted. In summary, most of the existing studies on vision-based

bolt loosening detection have been developed upon 2D computer vision, which is generally

categorized into front view-based detection and side view-based detection methods.

Front-view-based bolt loosening detection typically consists of the localization of bolts in an

image captured from the front view, followed by quantification of the bolt loosening rotation angle.

Many existing front view-based detection methods rely on Hough Transform (HT) algorithm

(Hart, & Duda, 1972), such as the work by Park, Kim, & Kim (2015) or Park, Huynh, Choi, &

Kim (2015). Kong & Li (2018) adopted an image registration approach for bolt loosening angle

estimation. Huynh et al. (2019) employed HT and R-CNN algorithms to estimate the bolt rotation

angle, which was validated on a full-scale bridge connection using an unmanned aerial vehicle. Ta

and Kim (2020) further applied similar algorithms to detect a combination of bolt loosening and

corrosion. Zhao and Wang (2019) applied a dual-class single shot detector to localize the bolt and

a specific symbol on the bolt simultaneously, where the bounding box locations are used to

quantify the bolt rotation angle. Although the results of these studies have been promising, their

methods bear several main limitations. To wit:

a) The methodology relied on the HT algorithm to detect lines and circles, which may not

perform well when the bolt assembly becomes more complicated, with the existence of washers,

or in the situation of light reflection on the bolts, shades of the surrounding objects on the bolts, or

the presence of background noise in reality;

b) In the HT-based studies, the estimation of rotation angle is conducted on two static images,

using geometric transformation analysis of edge lines of bolts. This only works effectively when

the rotation angle between these two images is less than 60 degrees, due to the geometric nature

of the hexagon-shaped bolt nuts examined. This constraint may not be satisfied in real-world

scenarios when structures are experiencing severe shaking (due to major earthquakes) and the bolts

get loosened by a relatively large angle (above 60 degrees).

c) Although the RCNN methods implemented in some of these studies provide reasonable accuracy with proper training, the speed of RCNN is too slow for real-time applications (Redmon,

& Farhadi, 2018). To be more specific, the rotational speed of bolts under severe earthquake

shaking can be relatively high (e.g., up to or even over 90 degrees per second). In order to ensure

the tracking method can track bolt rotation, it requires the rotation of bolts to be less than 60

degrees between two adjacent video frames. This means the minimum processing speed for real-

time detection and tracking of bolt rotation is 1.5 FPS (= 90 degrees per second / 60 degrees per

frame), which cannot be achieved by the RCNN method implemented on consumer-grade

computers.

d) In the study presented by Zhao and Wang (2019), the dual-class single shot detector is

trained to detect a specific symbol on the bolt. However, different types of bolts have different

symbols. The trained detector cannot be directly applied to other types of bolts without excessive

training for many different bolts. Moreover, this method is not very accurate, because the predicted

bounding box is typically subjected to jittering, which cannot precisely localize the symbols and

bolts.

On the other hand, side view-based detection methods localize bolts and quantify bolt

loosening in images captured from the side view. The bolt loosening length along the longitudinal

direction of the bolt can be estimated. In recent years, side view-based detection methods have

been implemented, such as the integration of HT and support vector machine for bolt loosening

quantification (Cha et al., 2016; Ramana et al., 2017; Ramana et al., 2019). These studies were

more focused on qualitative evaluation by localizing the bolts in an image and assigning

“loosened” or “non-loosened” labels to the detected bolts. No precise bolt loosening

quantifications were implemented. Zhang et al. (2020) employed Faster R-CNN to localize tight

and loose bolts, and quantify the extension length of the bolt loosening. While this method shows

the capability of localizing the bolts, the extension length quantification requires the camera to be

placed at an appropriate angle to achieve high accuracy. Besides, a reference ruler is used in this

method to aid the quantification process. Although this can be practically done, it requires more

human interventions. More recently, Zhang and Yuen (2021) proposed a bolt loosening

quantification method using an orientation-aware bounding box approach, which can address

multiple orientations of bolt assemblies in the image. The method uses the aspect ratio of the

loosened bolts as the looseness quantification metric. However, the aspect ratio does not provide

a direct measurement of the longitudinal loosened length. Moreover, the experiments in this study

were conducted on bird-eye view images, this means the quantification results need to be further

geometrically adjusted manually to determine the exact loosening length.

4.4.2 Overview and application scenarios of the proposed bolt loosening quantification

methodologies

To address these issues, in this research, two methodologies are proposed. The first method is

a front view-based method, while the second method is a full-view (i.e., full 3D view) method.

The first method is built upon 2D vision methods, which should be considered as long-term

monitoring solutions for bolts. In this case, cameras are assumed to be preinstalled at appropriate

locations and facing towards the front face of the bolt cap. The first proposed method is aimed at

addressing the limitations of the existing front view-based method as aforementioned.

On the other hand, the second method is built upon a 3D vision-based evaluation pipeline, which reconstructs the bolted devices from the captured 2D images. The application of such a method should

be considered for post-disaster inspection where photos or videos of the bolted devices can be

taken at multiple locations and angles. This can be conducted manually, or more efficiently by

UAVs or UGVs equipped with cameras recording videos at high frame rates (e.g., 30/60 frames

per second). The second proposed method is aimed at addressing the limitations of the existing

side view-based method as previously described.

4.4.3 Methodology – method 1

In the first method, a combined 2D vision-based real-time detect-track (RTDT) method for bolt rotation, named RTDT-Bolt, is proposed. The proposed method is the first of its kind to detect and track bolt rotation interactively using vision-based techniques. The procedures have

been briefly summarized herein, and the implementation details will be explained in Section 4.4.5.

First, the object detection algorithm, YOLOv3-tiny, which was built upon the original architecture

of YOLOv3 (Redmon, & Farhadi, 2018), was trained for accurate real-time bolt localization in an

image, under various lighting conditions. YOLOv3-tiny has a reduced depth of the convolutional

layers compared to the original YOLOv3, thus greatly improving the detection speed, while still

maintaining competitive accuracy. Second, feature points (FPs) are generated using the Shi-

Tomasi corner detection algorithm (Shi, & Tomasi, 1994) within the regions of interest (ROIs)

detected by YOLOv3-tiny, and tracked through Kanade-Lucas-Tomasi (KLT) optical flow

tracking algorithm (Tomasi, & Kanade, 1991). The geometric transformation analysis based on

M-estimator Sample Consensus (MSAC) algorithm (Torr, & Zisserman, 2000) is then developed

to estimate the frame-to-frame rotation. Third, the optical flow algorithm tends to fail in the presence of sudden changes in pixel values, likely due to illumination changes and accumulated errors from background noise during tracking (Nixon, & Aguado, 2019); to address this issue, the tracking algorithm is combined with the YOLOv3-tiny algorithm to re-detect the target when the tracking gets lost. The tracking continues with the new FPs generated every time

when the new detection is imposed. The proposed method can allow the users to continuously

track the bolt rotation in real time. To demonstrate the effectiveness and examine the potential

limitations of the proposed method, extensive parameter studies have been conducted, including

the number of image pyramid levels (NP), bi-directional error threshold (BE), search block size

(BS), and the maximum number of iterations during tracking (NI). Details of such parameters will

be explained in Section 4.4.3.5.

It should be noted that the traditional HT-based method may potentially be used to monitor

the total rotation greater than 60 degrees, provided that the HT algorithm can reliably identify the

bolt edges, and also the incremental rotation of the bolt between two adjacent frames is less than

60 degrees. However, the processing of the HT algorithm to detect edges of the bolt may not

perform well in the relatively complicated situations aforementioned. Even if the HT algorithm

can accurately detect the edges of the bolt in every frame, the processing of such an algorithm on

all the frames is time-consuming, compared to the optical flow tracking algorithms (Nixon, &

Aguado, 2019).

4.4.3.1 Overview of real-time integrated detection and tracking framework

Figure 4.37 depicts the integrated method, RTDT-bolt, to robustly monitor the rotation of

structural bolts. First, YOLOv3 has been adopted and modified to create a new version of

YOLOv3-tiny to localize the bolts with ROI bounding boxes, in the 1st video frame (Pan & Yang,

2021). The Shi-Tomasi algorithm is then employed to identify high-quality FPs within the ROIs

for tracking purposes. Second, the optical-flow KLT feature-tracking algorithm is applied to track

the FPs generated, from frame to frame. Third, the YOLOv3-tiny algorithm will be integrated with

the KLT tracking algorithm to ensure the high performance of tracking. The YOLOv3-tiny

detector will generate new ROIs for the bolts, if the number of FPs being tracked falls below a

certain threshold (e.g., less than 50% of the initial number of FPs identified). This will not only

eliminate the loss of tracking problem due to external environment changes, such as changes in

lighting conditions, but also effectively reduce the accumulated error from long-time tracking.

Lastly, the total rotation angle of the bolt can be calculated as the sum of the rotation angle

determined at each detect-track interval. Specific details of these procedures are discussed in the following subsections.
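The detect-track loop can be sketched with OpenCV's Shi-Tomasi corner detector and pyramidal Lucas-Kanade tracker, as below. This is a simplified illustration of the re-detection logic only: the video path is hypothetical, the 50% threshold mirrors the example in the text, and detect_bolt_roi is a hypothetical wrapper around the trained YOLOv3-tiny model rather than the MATLAB implementation used in this study.

```python
# Simplified sketch of the detect-track loop: Shi-Tomasi features tracked by
# pyramidal KLT, with re-detection when too many features are lost.
# detect_bolt_roi() is a hypothetical wrapper around the trained YOLOv3-tiny model.
import cv2
import numpy as np

def detect_bolt_roi(frame):
    # Placeholder: return (x, y, w, h) of the bolt bounding box from the detector.
    return (100, 100, 120, 120)

cap = cv2.VideoCapture("bolt_rotation.mp4")           # hypothetical input video
ok, frame = cap.read()
prev_gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

x, y, w, h = detect_bolt_roi(frame)
mask = np.zeros_like(prev_gray); mask[y:y + h, x:x + w] = 255
pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=200, qualityLevel=0.2, minDistance=5, mask=mask)
n_init = len(pts)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    new_pts, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray, pts, None)
    pts = new_pts[status.flatten() == 1].reshape(-1, 1, 2)

    if len(pts) < 0.5 * n_init:                        # tracking considered lost: re-detect
        x, y, w, h = detect_bolt_roi(frame)
        mask = np.zeros_like(gray); mask[y:y + h, x:x + w] = 255
        pts = cv2.goodFeaturesToTrack(gray, maxCorners=200, qualityLevel=0.2, minDistance=5, mask=mask)
        n_init = len(pts)

    prev_gray = gray
```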

Figure 4.37 Flowchart of the RTDT-Bolt method

4.4.3.2 YOLOv3-tiny real-time detection networks

In this section, the architecture of the YOLOv3-tiny detector for bolt localization is described. As depicted in Figure 4.38, in general, YOLOv3-tiny consists of a series of

Convolution, Batch Normalization, Leaky ReLU (Conv-BN-Leaky ReLU) blocks, and max

pooling layers. The input image will be resized to 416 x 416 in width and height, before entering

the networks for training. During training, YOLOv3-tiny divides the image into 26x26 grid cells.

Each grid cell has 3 anchor boxes, of which each has an object score, multiple class scores

(depending on the number of classes being detected), and 4 bounding box coordinates. In this case,

the number of classes is one, for structural bolts. Consequently, the output of YOLOv3-tiny has a

dimension of 26 x 26 x 18, where '26' represents the number of grid cells along each output dimension, and '18' represents, for each of the three anchor boxes, the class score, object score, and bounding box coordinates (i.e., 1 class score for the bolt + 4 bounding box values + 1 object score = 6 values; 6 values x 3 anchor boxes = 18 values).

Figure 4.38 Architecture of the YOLOv3-tiny object detector

The training data were collected in the structural laboratory at The University of British

Columbia. The friction damping device recently developed at UBC was selected to demonstrate

the integrated method (Figure 4.39). First, bolt rotation was achieved by rotating the bolt on the

backside of the damping device. Meanwhile, the videos of bolt rotation were recorded by the

iPhone Xs Max smartphone device on the front side, at 4K video quality settings. The phone

camera was placed at various angles in order to generate more variety in the dataset. Then, the

video frames were processed and extracted to generate the training images for YOLOv3-tiny.

Standard data augmentation techniques such as cropping, horizontal flipping, small translations,

rotation, and small scaling are also applied such that the object being localized is still included in

the augmented images. In this case, 3808 images were generated from the original frames of all

video files.
Figure 4.39 Image of the experimental bolted component

The proposed integrated method is designed to deal with illumination changes, whose effects have rarely been studied in existing structural engineering vision-based research but can happen quite often in real-world situations. In order to increase the data variety and enhance the robustness of the object detector against illumination changes, a lighting-condition-oriented data

augmentation approach (Chaichulee et al., 2017) is applied to generate three different lighting

conditions for an image. The procedures are briefly explained as follows: a) convert the image

from red-green-blue (RGB) space to hue-saturation-lightness (HSL) color space; b) obtain the

histogram of average lightness value of all the original images; c) evenly divide the histogram into

3 sections and compute the mean of lightness for each section; d) for each original image falling

in one section, two more images are generated by scaling its average lightness to the other two

mean of lightness levels in the other two sections, respectively. Details of such augmentation

method are described in (Chaichulee et al., 2017). In this regard, the database after the lighting

augmentation contains 3808 × 3 = 11424 images.
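A minimal sketch of the lightness-scaling step is given below using OpenCV's HLS conversion. The target lightness levels and file names are assumed values, and the exact procedure of Chaichulee et al. (2017) differs in how the three levels are derived from the dataset histogram.

```python
# Sketch of lighting-condition-oriented augmentation: rescale the average lightness
# of an image to target levels in HLS space. Target levels are assumed values;
# Chaichulee et al. (2017) derive them from the dataset lightness histogram.
import cv2
import numpy as np

def rescale_lightness(img_bgr, target_mean_l):
    hls = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2HLS).astype(np.float32)
    current_mean_l = hls[:, :, 1].mean()
    hls[:, :, 1] = np.clip(hls[:, :, 1] * (target_mean_l / current_mean_l), 0, 255)
    return cv2.cvtColor(hls.astype(np.uint8), cv2.COLOR_HLS2BGR)

img = cv2.imread("bolt_frame.jpg")                                       # hypothetical training image
augmented = [rescale_lightness(img, t) for t in (80.0, 140.0, 200.0)]    # three lighting levels
for i, out in enumerate(augmented):
    cv2.imwrite(f"bolt_frame_light{i}.jpg", out)
```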

In the end, 70% of the augmented image database is assigned as the training dataset and the

rest is selected as the testing dataset. Consequently, 11424 × 0.7 ≈ 7997, and 11424 × 0.3 ≈

3427 images are allocated for training and testing, respectively. The images are resized

to 416 × 416, before being input into the networks for training. Then, the selection of anchor boxes

is conducted for training data using the methodology presented in Redmon and Farhadi (2018) and

Pan and Yang (2020). The anchor box dimensions will be utilized by YOLOv3-tiny to predict the

bounding box location for objects in input images. The training of YOLOv3-tiny was implemented

by back-propagation and stochastic gradient descent with momentum (SGDM), where the learning

rate was chosen as 0.001, the mini-batch size was chosen as 6 and the maximum number of training

epochs was set to 80.

In addition, to demonstrate the speed and accuracy of the YOLOv3-tiny against the existing

object detection algorithms, the RCNN and YOLOv3 detection algorithms are also implemented,

respectively. The RCNN algorithm is built on AlexNet, which is adopted from Ta and Kim (2020).

Similarly, the training of RCNN was implemented by back-propagation and stochastic gradient

descent (SGD), where the learning rate was chosen as 0.000001, the mini-batch size was chosen

as 32 and the maximum number of training epochs was set to 10. The YOLOv3 algorithm is built

on Darknet-53, which is adopted from Redmon and Farhadi (2018). The training setting of

YOLOv3 is the same as that of YOLOv3-tiny. All the training was implemented in MATLAB

R2021a (MATLAB, 2021) on two computers: a Lenovo Legion Y740 (a Core i7-8750H @ 2.20 GHz, 16 GB DDR4 memory, and an 8 GB GeForce RTX 2070 Max-Q GPU) and an Alienware Aurora R8 (a Core i7-9700K @ 3.60 GHz, 16 GB DDR4 memory, and an 8 GB GeForce RTX 2070 GPU).

4.4.3.3 RTDT-Bolt method

In this section, the integrated RTDT-Bolt method for robust detection and tracking of bolt loosening is described. Earlier research (Park et al., 2015b; Huynh et al., 2019; Ta, & Kim, 2020)

has demonstrated the effectiveness of the vision-based methods for bolt rotation estimation.

Although their results are promising, these studies have the aforementioned limitations. In

particular, the HT algorithm employed in these studies may not be able to accurately detect lines

and circles in complex images, e.g., when the image contains washers, light reflections, shades of

surrounding objects, or background noise. To illustrate this challenging phenomenon, the HT

algorithm using three different methods (i.e., Canny, Prewitt and Log) is applied to three types of

images, including the original image, smoothed image, and sharpened image. The original image

is cropped from an image of the friction damping device presented. The smoothed image is

obtained by applying the Gaussian image filter with a standard deviation of two to the original

image, while the sharpened image is generated by subtracting a blurred (unsharp) variant of the

image from itself. The original, blurred and sharpened images were processed by the three methods

(i.e., Canny, Prewitt and Log), respectively, to identify edges in the images. Then, the HT

algorithm is used to identify the straight line edges, which is achieved by collecting the votes of

the identified edges by the three methods in the Hough space and selecting the highest votes as the

fitted line edges. The procedures were implemented in MATLAB R2021a, where the edge

sensitivity threshold for the three HT methods is set as [0.1 0.8], 0.05, and 0.004, respectively.

Results in Figure 4.40 and Figure 4.41 indicate that the identification of the hexagon-shaped edges

of the bolts in our experiments is difficult using HT methods. The neighboring object(s), circular-

shaped washer, shades, and background noise can cause the algorithm to fail in identifying bolt

edges. Besides, although the RCNN method employed by some of these studies provides good

localization accuracy, the architecture of the RCNN is relatively heavy. Hence, its speed is too

slow for real-time applications when such performance is desired. On the other hand, the effectiveness of optical-

flow-based tracking algorithms has been demonstrated in vision-based structural motion tracking

(Ji, & Chang, 2008; Chen et al., 2015; Zheng, Shao, Racic, & Brownjohn, 2016; Cha, Chen, &

Büyüköztürk, 2017; Kuddus et al., 2019). Two main limitations have been identified. First, these

studies focused on the extraction of horizontal or vertical translations; the investigation of rotation estimation for structural components that exhibit rotational behavior, such as bolts, remains very rare.

Second, although these studies have shown promising results, there existed several challenges,

such as outdoor lighting conditions. In essence, the optical flow methods are known to work well

for tracking objects that have a rigid-body profile and distinct visual texture, but it tends to fail in

the situation of sudden external environment changes, such as a change in outdoor lighting

conditions, light reflection, or shades of neighboring objects (Nixon, & Aguado, 2019).

Figure 4.40 Image preprocessing of the structural bolts.

Figure 4.41 Hough transformation of the original image, smoothed image, and sharpened image, using

the Canny method, Prewitt method and Log method, respectively.

The proposed RTDT-Bolt method aims to address these issues. The scope herein is to: a) achieve real-time performance in both detection and tracking; b) provide solutions to measure the rotation of bolts up to any range; c) enhance the robustness of traditional KLT tracking algorithms

against illumination changes and background noise. Detailed implementation of the proposed

RTDT-Bolt method is described herein. First, the YOLOv3-tiny algorithm is implemented to

generate the bounding box (i.e., ROI) for each bolt in the 1st frame of the video. Second, these

ROIs will be extracted from the original video frame and the Shi-Tomasi algorithm is applied to

generate FPs inside the ROIs. This step eliminates the need to process the entire video frame by focusing only on the ROIs, thus greatly reducing the computational burden. There are

two essential parameters involved in the Shi-Tomasi algorithm, including the minimum quality

measure, and the Gaussian filter dimension. The minimum quality measure determines the

minimum threshold below which the FPs will be discarded. It is recommended to set a reasonably

large value to eliminate low-quality points. The Gaussian filter dimension determines the

dimension of the Gaussian filter used to smooth the gradient of the input image. In this study, the

minimum quality of the Shi-Tomasi FP generation is set as 0.2, while the Gaussian filter dimension

is set as 5.
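As a minimal illustration of this step, the sketch below generates Shi-Tomasi feature points inside a single YOLOv3-tiny bounding box using the MATLAB Computer Vision Toolbox; the frame file name and ROI values are placeholders.

frame = rgb2gray(imread('frame_0001.png'));            % hypothetical first video frame
roi   = [520 310 140 140];                             % hypothetical [x y width height] box from YOLOv3-tiny
pts   = detectMinEigenFeatures(frame, 'ROI', roi, ...
                               'MinQuality', 0.2, ...  % minimum quality measure
                               'FilterSize', 5);       % Gaussian filter dimension
imshow(frame); hold on; plot(pts.selectStrongest(50)); % visual check of the generated FPs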

Further, the KLT tracking algorithm initiates on the FPs generated. As the tracking moves

forward, some FPs may get lost, due to external environmental changes such as lighting conditions,

the variation of background noise, or the change of the relative location of cameras with respect to

the bolts under severe earthquake shaking. If the number of FPs being tracked falls below a predefined threshold, the tracking is considered lost. In that case, the real-time detector, YOLOv3-tiny, is re-applied to the specific frame where the tracking is lost to generate new

ROIs again. New FPs will be generated inside the ROIs and the tracking continues on the new FPs

in the same way as before. The above detect-track steps are repeated whenever the tracking is

considered lost throughout the videos. Meanwhile, the geometric transformation matrix about the

origin, 𝑇, can be evaluated based on the locations of FPs obtained by the KLT tracking algorithm

between two adjacent frames (MathWorks, 2021). Similar to Kuddus et al. (2019), the MSAC

algorithm (Torr & Zisserman, 2000) is applied to remove outliers in this step. Then, the rotation

angle can be extracted from the transformation matrix using the following steps. Consider a rigid-body object in the MATLAB image coordinate system that is rotated by an angle $\theta$ about the origin and translated by $t_x$ pixels in the horizontal direction and $t_y$ pixels in the vertical direction, which can be expressed by:

$$\begin{bmatrix} x_{i+1} & y_{i+1} & 1 \end{bmatrix} = \begin{bmatrix} x_i & y_i & 1 \end{bmatrix} T \qquad (4.7)$$

where $x_{i+1}$, $y_{i+1}$ are the horizontal and vertical pixel coordinates, respectively, of a feature point on the object in the $(i+1)$th frame. Similarly, $x_i$ and $y_i$ are those in the $i$th frame. $T$ is the 2D affine geometric transformation matrix about the origin as defined in MATLAB:

$$T = \begin{bmatrix} \cos\theta & \sin\theta & 0 \\ -\sin\theta & \cos\theta & 0 \\ t_x & t_y & 1 \end{bmatrix} \qquad (4.8)$$

Further, the transformation matrix $T^*$ with respect to an arbitrary point (e.g., the center of rotation of the bolt) with coordinates $(a, b)$ can be derived from $T$:

$$\begin{bmatrix} x_{i+1} & y_{i+1} & 1 \end{bmatrix} = \begin{bmatrix} x_i - a & y_i - b & 1 \end{bmatrix} T + \begin{bmatrix} a & b & 0 \end{bmatrix} \qquad (4.9)$$

Simplifying equation (4.9) gives:

$$\begin{bmatrix} x_{i+1} & y_{i+1} & 1 \end{bmatrix} = \begin{bmatrix} x_i & y_i & 1 \end{bmatrix} \left( \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -a & -b & 1 \end{bmatrix} T + \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ a & b & 0 \end{bmatrix} \right) = \begin{bmatrix} x_i & y_i & 1 \end{bmatrix} T^* \qquad (4.10)$$

where,

$$T^* = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -a & -b & 1 \end{bmatrix} T + \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ a & b & 0 \end{bmatrix} \qquad (4.11)$$
It is observed that the first and second rows of $T^*$ are the same as those of $T$. Therefore, the incremental bolt rotation angle can be easily extracted from the estimated transformation matrix $T$. Finally, the incremental rotation estimated at each interval is summed up to determine the total rotation angle of the bolt, $\varphi$. The time history of the rotation can also be generated. The procedures were implemented in MATLAB R2021a.
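A minimal MATLAB sketch of this detect-track-estimate loop is given below. It assumes the Computer Vision Toolbox; the video file name and the initial ROI (which the full RTDT-Bolt method obtains from YOLOv3-tiny) are placeholders, and the re-detection step is indicated only by a comment.

videoReader = VideoReader('bolt_video.mp4');           % hypothetical video file
frame = rgb2gray(readFrame(videoReader));
roi   = [520 310 140 140];                             % hypothetical ROI from YOLOv3-tiny, [x y w h]
pts   = detectMinEigenFeatures(frame, 'ROI', roi, 'MinQuality', 0.2, 'FilterSize', 5);
tracker = vision.PointTracker('NumPyramidLevels', 3, 'MaxBidirectionalError', 6, ...
                              'BlockSize', [5 5], 'MaxIterations', 30);
initialize(tracker, pts.Location, frame);

phi = 0;                                               % accumulated rotation [rad]
oldPoints = pts.Location;
while hasFrame(videoReader)
    frame = rgb2gray(readFrame(videoReader));
    [newPoints, valid] = tracker(frame);
    if nnz(valid) < 7
        % Tracking considered lost: here YOLOv3-tiny would be re-applied to
        % regenerate the ROI and new feature points (omitted in this sketch).
        break
    end
    % MSAC-based similarity fit between consecutive frames; tform.T follows the
    % MATLAB postmultiply convention of Eqs. (4.7)-(4.8)
    tform = estimateGeometricTransform(oldPoints(valid, :), newPoints(valid, :), 'similarity');
    phi   = phi + atan2(tform.T(1, 2), tform.T(1, 1));  % incremental angle from the first row of T
    oldPoints = newPoints(valid, :);
    setPoints(tracker, oldPoints);
end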

4.4.3.4 Evaluation of the ground truth rotation angle

To assess the feasibility of the proposed integrated method, it is necessary to obtain the

ground-truth value for the rotation of the bolt. This can be done using the following simple steps:

a) manually label the line edges of the bolts, at an appropriate interval of video frames such that

the rotation experienced by the bolt does not exceed 60 degrees within each interval; b) similar to

the method presented by Ta and Kim (2020), apply geometric transformation method for all the

labeled line edges, and compute the rotation of the bolt, $\theta_{GT,j}$, as the mean rotation of all the line edges, for each interval $j$; c) sum up the rotation for each interval to determine the total ground-truth rotation angle, $\varphi_{GT}$, as shown below,

$$\varphi_{GT} = \sum_{j=1}^{n} \theta_{GT,j}, \qquad n = \text{number of intervals} \qquad (4.12)$$

In the end, the accuracy of bolt rotation estimation in percentage can be determined as follows (when $\varphi_{GT} \neq 0$),

$$\text{Accuracy} = \max\left(0,\; 1 - \left|\frac{\varphi - \varphi_{GT}}{\varphi_{GT}}\right|\right) \qquad (4.13)$$

where $\varphi$ is the total rotation angle estimated by the RTDT-Bolt method, and $\varphi_{GT}$ is the ground-truth rotation angle.
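As a minimal numerical illustration of Eqs. (4.12) and (4.13), the per-interval ground-truth rotations and the RTDT-Bolt estimate below are hypothetical values:

thetaGT = [0.52 0.49 0.55 0.51];                 % hypothetical per-interval ground-truth rotations [rad]
phiGT   = sum(thetaGT);                          % total ground-truth rotation, Eq. (4.12)
phi     = 2.01;                                  % hypothetical RTDT-Bolt estimate [rad]
accuracy = max(0, 1 - abs((phi - phiGT)/phiGT)); % Eq. (4.13)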

4.4.3.5 Description of parameter studies

In order to examine the capability and potential limitations of the proposed RTDT-Bolt

method, extensive parameter studies are conducted. In this section, the scope of the parameter

studies is described. The sensitivity of results to the selection of detection and tracking parameters

is reported and discussed in detail.

Given that the KLT algorithm is the essential part of the proposed RTDT-Bolt method, the

parameters of the KLT algorithm are investigated with the values shown in Table 4-8. These

parameters are explained as follows:

a) Number of pyramid levels (NP): The KLT tracking algorithm generates an image pyramid,

where each subsequent level of the pyramid decreases in resolution by a factor of two compared

to the previous level. If the number of pyramid levels is set to greater than one, the algorithm tracks

the points at multiple levels of resolutions, which may potentially enhance the tracking

effectiveness. However, as the computational cost increases with the increase in the number of

pyramid levels, it is recommended to select an appropriate value to balance between speed and

accuracy.

b) Bi-directional error threshold (BE): The bi-directional error is calculated based on the FPs

in the two adjacent frames. Essentially, the algorithm conducts forward-backward tracking. It

tracks the FPs from the preceding frame to the current frame, and then traces back the same FPs
to the previous frame. The bi-directional error represents the pixel distance in the image coordinate

system, between the original location of the points and the backward-tracing location. The FPs

will be abandoned when the error associated with them is greater than the threshold.

c) Search block size (BS): This metric determines the neighboring area around the point being

tracked. The computational time increases as the block size increases.

d) Maximum number of iterations (NI): This parameter is the maximum number of iterative

searches performed by the KLT algorithm to determine the new location of each FP until it

converges.

Table 4-8 Parameters examined for KLT tracking algorithms

Parameter    Examined values      Value of the base model
NP           1, 2, 3, 4           3
BE           2, 6, 10, 20         6
BS           5, 11, 21, 31        5
NI           10, 20, 30, 40       30

The parameter studies are initiated from a base model, whose selection of values for each

parameter is also provided in Table 4-8.

4.4.4 Methodology – method 2

In the second method, 3D vision-based bolt loosening quantification is described. Since the existing side view-based quantification methods developed in 2D computer vision have the limitations described in Section 4.4.1, the dissertation develops a 3D vision-based bolt

loosening quantification pipeline. The proposed methodology consists of vision-based 3D scene

reconstruction, multi-view CNN-based detection of structural bolted devices, and bolt loosening
length quantification, as shown in Figure 4.42. Image sources for 3D reconstruction can be

obtained using common consumer-grade cameras such as smartphone cameras, or unmanned aerial

vehicles (UAVs) with appropriate camera specs. In this study, a structural bolted device with

loosened bolts at UBC structural laboratory is selected to examine the effectiveness of the

proposed methodology. Other similar experimental setups have been proposed in recent years (Cha

et al., 2016; Ramana et al., 2017; Ramana et al., 2019; Zhang et al., 2020; Zhang & Yuen, 2021).

At the time of writing, to the best of the author’s knowledge, the proposed methodology is the

first-of-its-kind using 3D vision, multi-view CNN-based methods, and advanced point cloud

processing, for bolt loosening quantification in the structural engineering field. The proposed

methodology is fully automated and does not require a specific camera viewing angle for

accurate quantification, as opposed to the existing 2D vision-based methods described in Section

4.4.1.

Figure 4.42 3D vision-based bolt loosening quantification methodology.


4.4.4.1 Vision-based 3D reconstruction

The procedures for the 3D reconstruction of the bolted device are similar to those presented

in Section 4.2 and Section 4.3. As shown in Figure 4.43, the procedures consist of data association

which determines the image-to-image connection of a pair of unstructured images, structure-from-

motion which determines a sparse point cloud of the plate structure from the estimated camera poses,

and multi-view stereo which generates a dense point cloud of the plate structure based on the sparse

point cloud and the input RGB images. Details of the 3D reconstruction pipeline have been

described in Section 3.7.

Figure 4.43 Vision-based 3D reconstruction of the structural bolted device

4.4.4.2 Multi-view structural bolted device detection

Once the scene point cloud is generated, a multi-view object detection method based on

YOLOv3-tiny is applied, as previously presented in Section 4.3.2. In this case, the YOLOv3-tiny

is retrained to localize the structural bolted device. The scene cloud is projected to two predefined

views where the YOLOv3-tiny is applied to generate the bounding box on the rendered views,

respectively. The generated bounding boxes on the two view planes will be fused to extract the

point cloud of the structural bolted component from the original 3D scene cloud (Figure 4.44).
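The sketch below outlines this two-view fusion in MATLAB point-cloud terms; detectDeviceBox() is a hypothetical helper that renders the cloud on one of the two predefined view planes, runs the retrained YOLOv3-tiny, and returns the detected box mapped back to the cloud's metric coordinates as [min max] limits along the two in-plane axes.

pc  = pcread('scene.ply');                        % hypothetical reconstructed scene cloud
xyz = pc.Location;

[xLimF, yLim] = detectDeviceBox(pc, 'front');     % front view bounds the x and y extents
[xLimT, zLim] = detectDeviceBox(pc, 'top');       % top view bounds the x and z extents

% Fuse the two 2D boxes into a single 3D crop of the structural bolted device
in = xyz(:, 1) >= max(xLimF(1), xLimT(1)) & xyz(:, 1) <= min(xLimF(2), xLimT(2)) & ...
     xyz(:, 2) >= yLim(1) & xyz(:, 2) <= yLim(2) & ...
     xyz(:, 3) >= zLim(1) & xyz(:, 3) <= zLim(2);
devicePc = select(pc, find(in));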

Figure 4.44 Structural bolted device localization
4.4.4.3 Structural bolt loosening quantification

This section presents a fully automated bolt looseness quantification method, which consists

of a front-view bolt localization method to find a reference plane, followed by a side-view bolt

localization method for looseness quantification.

4.4.4.3.1 Front-view structural bolt localization

This section describes the bolt localization method, which will provide a reference for bolt

loosening localization in the subsequent section. In this study, localization of the bolts is achieved

by applying CNN-based vision methods on an image which shows a front view of the bolts. This

requires the projection of the point cloud onto the plane which provides a clear front view of the

bolts. Plane fitting is implemented using the MSAC algorithm (Torr & Zisserman, 2000) to

identify multiple planes of the bolted devices. Next, the point cloud is projected onto these planes

detected, which will lead to multiple rendered 2D images. In order to identify the plane which

contains the front face of the bolts, a YOLOv3-tiny detector is adopted from Pan and Yang (2021)

and then deployed to localize the bolts. The plane where front-view bolts are detected will be

deemed as the front-view reference plane.
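A minimal sketch of this plane search is shown below, using MSAC plane fitting from the MATLAB Computer Vision Toolbox; renderPlaneView() and detectBolts() are hypothetical helpers standing for the point-to-plane projection/rendering step and the pretrained YOLOv3-tiny detector, and the file name and fitting tolerance are placeholders.

pc = pcread('bolted_device.ply');                        % hypothetical cropped device cloud
remaining = pc;
refPlane  = [];
for k = 1:3                                              % examine the first few principal planes
    [model, ~, outIdx] = pcfitplane(remaining, 0.005);   % MSAC fit, 5 mm tolerance (assumed)
    img = renderPlaneView(pc, model);                    % hypothetical: project the cloud onto the plane
    if ~isempty(detectBolts(img))                        % front-view bolts detected on this plane
        refPlane = model;                                % keep it as the front-view reference plane
        break
    end
    remaining = select(remaining, outIdx);               % discard this plane and continue
end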

Figure 4.45 Front view-based bolt localization

4.4.4.3.2 Side-view bolt looseness quantification

This section presents the procedures to quantify bolt loosening length, which is defined as the

distance between the bottom of the bolt cap and the supporting plate or surface underneath the

bolt, as illustrated in Figure 4.46.

Figure 4.46 Illustration of loosened bolts and tight bolts.

A combined YOLO detection and top-down convex hull (YOLO-TDCH) method is proposed

for automated quantification of bolt loosening length. The detailed implementation procedures are

described herein:

• The predicted bounding boxes by YOLOv3-tiny (in the previous section) will be utilized

to form a cuboid boundary (Figure 4.47) for each bolt along the direction of the normal

vector of the front-view reference plane identified (in the previous section).

• A series of small sub-clouds (with a relatively small step) is sampled along the direction of

the normal vector of the front-view reference plane (Figure 4.48). Then, within the sub-

cloud, the points within each cuboid boundary obtained in step 1 will be projected to the

reference plane. Further, the convex hull of the projected points belonging to each cuboid

boundary will be determined, respectively. The area of each convex hull will be calculated.

• The above process is repeated for all the small sub-clouds from the top down. The area of

each convex hull will be consistently checked for each cuboid boundary within each sub-

cloud. The bolt loosening length can be estimated considering the two following criteria.

a) If a bolt is tight (e.g., Bolt 1 and Bolt 3 in Figure 4.46), there will be a relatively constant

convex hull area at the beginning (i.e., bolt cap region) of the sub-cloud stepping-down

process, and then a sudden increase of the convex hull area when the sub-cloud sampling

reaches the structural surface underneath the bolt. b) If a bolt is loosened (e.g., Bolt 2 and

Bolt 4 in Figure 4.46), there will be a relatively constant convex hull area in the first region

(i.e., bolt cap region), followed by a sudden decrease of the convex hull area at the

beginning of the second region (i.e., bolt thread region due to loosening), and then followed

by a sudden increase of the convex hull area when the sub-cloud sampling reaches the steel

surface underneath the bolt. The plane travel distance within the second region will be determined as the bolt loosening length (a code sketch of this top-down sampling is given after this list).
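The sketch below illustrates the top-down convex-hull test for a single bolt cuboid. P is assumed to be the N-by-3 set of points inside that cuboid, expressed in a coordinate frame whose z-axis is aligned with the reference-plane normal (this change of frame is assumed to have been done beforehand), and the slab thickness is a placeholder value.

step  = 0.5;                                     % assumed slab thickness along the normal [mm]
zTop  = max(P(:, 3));  zBot = min(P(:, 3));
edges = zTop:-step:zBot;                         % sample sub-clouds from the top down
hullArea = nan(numel(edges) - 1, 1);
for s = 1:numel(edges) - 1
    inSlab = P(:, 3) <= edges(s) & P(:, 3) > edges(s + 1);
    xy = P(inSlab, 1:2);                         % project slab points onto the reference plane
    if size(xy, 1) >= 3
        [~, hullArea(s)] = convhull(xy(:, 1), xy(:, 2));   % convex hull area of this slab
    end
end
% A drop in hullArea after the constant cap region marks the start of the exposed
% thread; a later jump marks the supporting plate. The loosening length is the
% travel distance between the two transitions (number of slabs times step).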

Figure 4.47 Cuboid boundaries formed by YOLOv3-tiny predictions

Figure 4.48 Sub-cloud top-down sampling process

Overall, unlike the traditional 2D vision-based methods, the proposed 3D vision-based

YOLO-TDCH method is fully automated during the image and point cloud data processing,

without human intervention to identify reference planes for quantification of bolt loosening length.

4.4.5 Experiments and results – method 1

This section presents the results of method 1, the RTDT-Bolt method. In order to examine the effectiveness of method 1, a friction damper device at the UBC structural laboratory is selected.

4.4.5.1 Training and testing results of RCNN, YOLOv3 and YOLOv3-tiny

In this section, the training and testing results of the RCNN, YOLOv3, and YOLOv3-tiny for

bolt localization are presented. Six anchor boxes are selected based on the training data and applied

for the training of both YOLOv3 and YOLOv3-tiny, using the methodology described in Section

2.2. The dimensions of the anchor boxes are presented in Table 4-9. Figure 4.49 provides the

precision-recall curves of the three object detectors, for training and testing, respectively. The

performance of an object detection algorithm is usually assessed by the precision-recall diagram

(Everingham, Van Gool, Williams, Winn, & Zisserman, 2010). In short, a low false positive rate

corresponds to a high precision value, and a low false negative rate reflects a high recall value.

The overall performance of the algorithm is reflected by the area under the recall-precision curve,

where a large value of area indicates the detector has both high recall and precision. In other words,

if a detector has high precision but low recall, it can only detect objects in a few sample images,

although the localization accuracy is high once the object is recalled. A detector with high recall

but low precision can retrieve objects in many images, but the localization error is high inside the

images. In the end, the average precision (AP) can be computed from the precision-recall plot, as

the weighted average of precision at each recall value. The AP values for both training and testing

of the three object detectors are presented in Figure 4.49. Overall, all the three detectors have

achieved high AP during training. However, during testing, YOLOv3 and YOLOv3-tiny show

similar performance, while RCNN achieves slightly lower AP. This is because only one class (i.e.,

bolt) needs to be detected, which does not require over-complex CNNs such as RCNN and the

original YOLOv3 to achieve the desired accuracy. Figure 4.50 provides sample images processed

by the YOLOv3-tiny where all the bolts are detected by bounding boxes with a high confidence

score. More sample testing results are presented in Appendix C. In addition, the speed of the three

object detectors was also examined by applying the RCNN, YOLOv3 and YOLOv3-tiny,

respectively, through the full testing set 5 times. The average speed is calculated as the number of

frames or images processed per second (FPS). Table 4-10 shows a speed comparison of RCNN,

YOLOv3 and YOLOv3-tiny, using the Alienware Aurora R8 computer and software platform

presented in Section 2.2.2. As shown in Table 4-10, the RCNN method achieves about 0.05 FPS,

while YOLOv3 runs at 2.23 FPS. The speed of RCNN is less than the required minimum speed

(1.5 FPS, assuming 90 degrees per second of bolt rotational speed), which indicates RCNN is too

slow to be adopted in the proposed RTDT method. On the other hand, the speed of YOLOv3 is

very close to the minimum required speed. In the situation of slightly lower hardware specs,

YOLOv3 cannot achieve real-time speed. The proposed YOLOv3-tiny achieves about 25 FPS,

which is about 500 times faster than the RCNN, and about 10 times faster than YOLOv3. This

demonstrates the speed and accuracy of YOLOv3-tiny for localizing the steel bolts in real time.
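The timing procedure can be summarized by the short sketch below, where detectBolts() stands for the detector under test and testImgs is an assumed cell array of test image file names (both hypothetical).

nRuns  = 5;                                      % pass through the full testing set five times
tStart = tic;
for r = 1:nRuns
    for i = 1:numel(testImgs)
        detectBolts(imread(testImgs{i}));        % hypothetical detector wrapper
    end
end
fps = nRuns * numel(testImgs) / toc(tStart);     % average frames (images) per second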

Table 4-9 Estimation of anchor box dimensions

Index 1 2 3 4 5 6
Width [pixels] 53 47 36 37 32 29
Height [pixels] 42 37 38 35 33 30

Table 4-10 Speed comparison of RCNN, YOLOv3 and YOLOv3-tiny

Object detector RCNN YOLOv3 YOLOv3-tiny


Speed [FPS] 0.05 2.23 25.16

Figure 4.49 Precision-recall curve for (a) training, and (b) testing

Figure 4.50 Sample results of YOLOv3-tiny detection of steel bolts

4.4.5.2 The integrated method against illumination changes

As one of the major goals of this study is to deal with illumination changes, which hampers

the application of traditional optical-flow-based tracking algorithms in real-world situations, this

section intends to showcase the effectiveness of the proposed RTDT-Bolt method against

illumination changes. The RTDT-Bolt method is implemented with the parameters of the base

model. The minimum threshold for the number of FPs is set to 7 in these experiments. If the

number of FPs during tracking is below this threshold, the YOLOv3-tiny re-applies detection, and

new FPs will be generated for continuous tracking. Besides, to demonstrate the advantages of the

integrated method over the traditional optical flow algorithms, the KLT tracking algorithm without
the YOLOv3-tiny was also investigated in parallel. Figure 4.51 illustrates the light-changing

scenarios conducted in the laboratory, where the light was switched on and off about every 10

seconds. Figure 4.52 shows a close-up video frame montage of the bolt being processed by the

RTDT-Bolt method. It can be observed that extra light reflection appeared on the surface of the bolt when the light was switched on and disappeared when the light was switched off. The

experiment results indicate the KLT tracking algorithm without the YOLOv3-tiny instantly lost

tracking when the light was switched on for the first time (i.e., at around the 350th frame), and the bolt could not be tracked in any of the remaining frames. In comparison, the RTDT-Bolt method can

redetect and continue to track the bolt, when the previous tracking got lost due to light change, as

shown in Figure 4.52. The rotation transformation of the points being tracked is imposed on the

ROI bounding box for better visualization. The total rotation angle estimated by the RTDT-Bolt

method is 12.68 rads in the anti-clockwise direction, which corresponds to an accuracy of 95.1%

(with a ground-truth value of 13.25 rads). In addition, the processing speed has also been

examined. The proposed method achieves about 17 frames per second on the original 4K video

frames, and about 325 frames per second on the cropped video frames (cropped from the 4K video

frame using the ROI detected by the YOLOv3-tiny). This demonstrates both the accuracy and real-

time speed of the proposed method in monitoring the bolt rotation angle.

Figure 4.51 Montage of videos processed by the RTDT-Bolt method: original video frame with the illustration of the changing light conditions, and a highlight of the bolt under investigation by the rectangular box; close-up video frame, with the illustration of detection, tracking, and re-detection. (Note: the frame

index is shown at the top-left corner of each thumbnail image. Frame rate: 30 frames per second)

Figure 4.52 Montage of videos processed by the RTDT-Bolt method: close-up video frame, with the

illustration of detection, tracking, and re-detection. (Note: the frame index is shown at the top-left corner of

each thumbnail image. Frame rate: 30 frames per second)

4.4.5.3 Parameter studies

Parameter studies are conducted on the proposed RTDT-Bolt method to assess the sensitivity of the rotation estimation to the selected parameters shown in Table 4-8. There is a total of 4 x 4 x 4 x 4 = 256 runs. In order to maintain a unique controlled parameter in each set of runs, the light condition is kept consistent in all the parameter studies. The comparison between the estimated rotation of the bolts and the ground truth values in both the short and long video scenarios is first summarized in Table 4-11.

Table 4-11 Comparison of the estimated rotation and ground truth rotation for the six bolts in the short and long videos processed by the base model

                                 Bolt 1    Bolt 2    Bolt 3    Bolt 4    Bolt 5    Bolt 6
Short video (441 frames)
  Estimated rotation [rads]       0.026    -0.017    -0.019     0.022     0.024     8.42
  Ground truth rotation [rads]    0         0         0         0         0         8.45
Long video (10,504 frames)
  Estimated rotation [rads]       0.034     0.028    -0.026     0.031     0.026    51.61
  Ground truth rotation [rads]    0         0         0         0         0        54.32

Figure 4.53 Illustration of bolt index

The results were obtained by the base model whose parameters are shown in Table 4-8. The

results indicate the proposed model can accurately quantify the rotation of the loosened bolts and

non-loosened bolts in both short and long video scenarios. Figure 4.54 shows a detailed

comparison of the time-history rotation of the bolt with the ground truth for bolt 6 in the

short video scenario. The corresponding montage of sample video frames is shown in Figure 4.55,

where two reference lines are used to better visualize the tracking process of the bolt rotation. The

initial location of the bolt is indicated by the line with smaller line width, while the current

rotational position of the bolt is represented by the line with a larger line width. In addition, the

rotation angle corresponding to each thumbnail image is indicated at the bottom right of the image,

while the frame number is shown at the top left of the image.

Figure 4.54 Time-history rotation estimation of the bolt in the short video with the base model

Figure 4.55 Montage of sample close-up frames of the short video processed by the base model in

parameter studies (Note: each thumbnail image with the labeled frame index corresponds to the associated

ground truth point in Figure 4.54)

Figure 4.56 depicts the effects of these parameters on the rotation estimation results. As shown

in Figure 4.56 (a1), as the number of pyramids increases, the accuracy increases in general, which

is particularly well reflected in the situation of medium accuracy due to the large block size (i.e.,

31) being used. However, the number of pyramid levels does not have a substantial effect on the

rotation estimation, when the accuracy is relatively high already, as depicted in Figure 4.56 (a2,

a3). On the other hand, the effect of the maximum bidirectional error on the estimation accuracy

is quite limited in general, as shown in Figure 4.56 (a2, b, c2, and d2). Figure 4.56 (c) indicates

that the search block size has a great impact on the accuracy of rotation estimation. In general, as

the block size increases, the performance of the proposed method decreases. This can be attributed

to the fact that the KLT tracking algorithm is more likely to mismatch the points being tracked at

the current frame, with the incorrect points in the next frame, when the search area becomes larger

where more adjacent (but irrelevant) points are included. Figure 4.56 (c1) also shows that when

the number of pyramid levels is low, as the search block size increases, the accuracy degrades

faster than the case where the number of pyramid levels is set higher. However, the search block

size will have less impact on the performance, if the maximum bidirectional error, the maximum

number of iterations, and the number of pyramid levels are set appropriately, as shown in Figure

4.56 (c2, c3). In these cases, the accuracy converges to about 90%, with the increase of the block

size, maximum bidirectional error, and the maximum number of iterations. Lastly, the maximum

number of iterations has minor effects on the results, as shown in Figure 4.56 (d). This implies that

the number of iterations can be set reasonably low to achieve faster speed, if there exist hardware

limitations.

Table 4-12 Complete results of parameter studies, expressed by the accuracy of rotation estimation

(expressed in percentage)

NI = 10 NI = 20 NI = 30 NI = 40
BE = 2 BE = 6 BE = 10 BE = 15 BE = 2 BE = 6 BE = 10 BE = 15 BE = 2 BE = 6 BE = 10 BE = 15 BE = 2 BE = 6 BE = 10 BE = 15
NP =1 99.81% 97.81% 97.83% 97.19% 98.34% 98.31% 98.83% 98.78% 98.63% 98.95% 98.89% 99.06% 98.56% 98.92% 98.83% 98.74%
NP =2 99.40% 96.58% 99.12% 99.16% 99.07% 99.19% 99.34% 99.07% 99.18% 98.86% 98.85% 98.83% 98.66% 98.86% 98.86% 98.42%
BS = 5
NP =3 99.85% 98.59% 97.01% 97.24% 99.38% 99.45% 99.70% 99.61% 99.74% 99.66% 99.51% 99.82% 99.50% 99.52% 99.39% 99.44%
NP =4 99.49% 97.79% 97.78% 97.68% 99.85% 99.43% 99.58% 99.42% 99.86% 99.32% 99.54% 99.64% 99.62% 99.41% 99.48% 99.59%
NP =1 92.11% 91.23% 91.48% 91.02% 93.03% 92.44% 91.51% 92.15% 93.14% 91.99% 92.25% 92.68% 92.86% 92.51% 92.64% 91.90%
NP =2 92.20% 91.05% 91.08% 90.60% 92.10% 93.60% 91.95% 92.12% 92.09% 92.30% 92.39% 92.43% 92.08% 92.44% 92.32% 92.29%
BS = 11
NP =3 91.81% 91.13% 91.39% 92.89% 93.38% 92.12% 91.80% 92.44% 92.01% 91.97% 92.41% 92.38% 92.42% 92.31% 93.03% 92.40%
NP =4 92.14% 91.27% 91.30% 90.86% 92.26% 91.81% 92.16% 91.88% 92.39% 91.95% 92.39% 92.30% 92.10% 92.12% 91.88% 92.42%
NP =1 88.43% 87.30% 88.43% 87.23% 92.92% 87.46% 87.38% 87.24% 88.34% 87.23% 87.30% 92.83% 92.95% 89.43% 88.43% 87.16%
NP =2 87.78% 88.40% 93.61% 87.45% 92.79% 87.82% 87.27% 92.85% 88.22% 88.35% 90.49% 88.01% 88.56% 87.32% 88.34% 87.12%
BS = 21
NP =3 88.43% 87.26% 88.22% 93.13% 88.34% 88.56% 87.18% 88.54% 88.31% 92.79% 88.59% 89.98% 88.39% 92.79% 87.22% 93.00%
NP =4 88.22% 92.97% 87.21% 88.33% 87.65% 88.31% 87.34% 92.84% 84.98% 88.45% 92.91% 88.28% 88.56% 88.40% 92.90% 88.24%
NP =1 90.58% 90.58% 79.84% 85.54% 84.67% 80.27% 79.63% 79.91% 90.58% 79.57% 90.58% 81.62% 85.07% 90.58% 90.58% 46.90%
NP =2 90.57% 90.58% 90.63% 90.92% 90.87% 90.58% 90.87% 90.58% 90.58% 71.85% 90.87% 90.58% 77.80% 73.60% 85.07% 90.70%
BS = 31
NP =3 85.10% 90.58% 90.66% 90.58% 84.84% 90.58% 90.58% 90.58% 90.58% 90.58% 85.04% 90.58% 90.58% 90.63% 73.21% 85.09%
NP =4 79.77% 90.57% 90.57% 79.55% 90.57% 90.57% 90.57% 90.57% 81.79% 90.57% 90.63% 90.86% 90.56% 90.57% 85.33% 90.57%

In addition, a complete set of results (i.e., 256 runs) obtained from the parameter studies is

summarized in Table 4-12, where the highest and lowest accuracies are highlighted. In these

experiments, the run with the highest accuracy (i.e., 99.86%) is achieved when the number of pyramid levels is set at the highest value, and the maximum bidirectional error and search block size are set

at the lowest. This is explicable because this set of parameters imposes the most stringent

requirements for the algorithm to limit the tracking error. On the other hand, the run with the lowest

accuracy (i.e., 46.9%) is observed, where the number of pyramid levels, maximum bidirectional

error, search block size, and the maximum number of iterations are set to 1, 20, 31, and 40,

respectively. This is close to the worst-case scenario among all the experiments, which is also in

line with our expectations.

In short, a reasonably high value (i.e., greater than 2) for the number of pyramid levels, and a

sufficiently low value (i.e., less than 21) for the search block size must be specified to achieve the

desired accuracy (i.e., over 90%). Meanwhile, the maximum number of iterations does not have

noticeable effects on the accuracy, and therefore, it can be set at a reasonably low value (e.g., 10)

to reduce computational cost. Overall, the parameter studies confirm the effectiveness of the

proposed method, and provide recommendations to achieve high accuracy and speed in both the

short video and long video scenarios. It should be noted that, in the situation of long-term

monitoring which can last for months or years, the detection of bolts by the YOLOv3-tiny can be

set to apply at an appropriate interval (e.g., every 5 min) to reset the tracking. This will further

alleviate the potential accumulated errors due to noise during long-time tracking.



Figure 4.56 Sensitivity study of rotation estimation to the selected parameters, with a control parameter

of (a1-a3) number of pyramid levels, (b1-b3) maximum bidirectional error, (c1-c3) search block size, and (d1-

d3) maximum number of iterations

4.4.6 Experiments and results – method 2

This section presents the results of method 2, using the proposed 3D vision-based pipeline to

quantify the bolt loosening length from a common experimental setup (shown in Section 4.4.4). In

this experiment, a friction damper developed in the structural laboratory at The University of

British Columbia is selected to examine the proposed method. First, photos of the friction device

were collected using a smartphone camera. In the real-world implementation, a consumer-grade

drone can be deployed for large-scale civil structures to capture images more efficiently. Next, the

3D reconstruction methods were applied to obtain a point cloud of the friction device. This point

cloud generated typically contains irrelevant surrounding objects. Therefore, the region of interest

(where bolts are located) should be identified first. This can be done manually, or by using 3D

object detection methods (pretrained to localize the bolted component) as presented earlier in this

dissertation. This may also be done using point cloud classification algorithms (e.g., PointNet) to

classify the region of interest into a specific class. This requires a robust pretrained deep learning

algorithm and a significant amount of time for labelling. Besides, this typically does not provide highly

accurate localization, because some points of the irrelevant objects, or points in the background,

may be wrongly classified into the desired class, which can cause the follow-up bolt loosening quantification process to fail.

Given the point cloud processing demand and the available computational power, in this

section, the region of interest is manually specified to be the central area of the structural

component which includes the 6 central bolts only (Figure 4.57), while other bolts are neglected

in the quantification procedures. It should be noted that the concept of the loosening quantification

of the selected bolts can be similarly applied to other bolts.

Figure 4.57 Illustration of the region of interest

Next, as bolts are typically used to connect plates, a plane segmentation algorithm was applied

to identify the major plane of interest where bolts are placed. Then, the proposed YOLO-TDCH

method was applied. Each time a plane was segmented, the point cloud was rendered onto that

plane to form a 2D image, which was processed by the pretrained YOLOv3-tiny to check the

existence of the bolts. If the bolts are not found by the YOLOv3-tiny, the plane will be discarded.

If the bolts are found by the YOLOv3-tiny, the segmented plane will be recorded. In this

experiment, the first principal plane identified by the plane segmentation algorithm is shown in

Figure 4.58. When the point cloud was rendered onto the plane, all the bolts were successfully

recognized by the YOLOv3-tiny, as shown in Figure 4.59.

Figure 4.58 Plane segmentation of the 3D point cloud of the friction damper (units are in mm)

Figure 4.59 Bolt localizations by the pretrained YOLOv3-tiny

Table 4-13 presents a summary of the bolt loosening quantification results where the index

for each bolt is shown in Figure 4.53. Results show that the quantification error for all the bolts is within 0.5 mm compared to the ground truth values measured by the digital caliper presented

in Section 4.3. This shows the proposed full view-based bolt loosening quantification method can

achieve high accuracy. Besides, it does not require human intervention during image and point

cloud data processing. Although the manual selection of the region of interest is used in this

experiment, such a process can be automated using the methods aforementioned.

Table 4-13 Bolt loosening quantification results

           Estimated value [mm]    Ground truth (measured) [mm]    Error [mm]
Bolt 1     15.9                    15.5                            -0.4
Bolt 2     43.0                    43.5                             0.5
Bolt 3      0.3                     0.0                            -0.3
Bolt 4      0.5                     0.0                            -0.5
Bolt 5     -0.4                     0.0                             0.4
Bolt 6     30.2                    30.0                            -0.2

4.4.7 Conclusions

Structural bolts are commonly used to connect structural components. The forces in the

structural bolts are highly dependent on bolt rotation. This research proposed two different

methods for two different application scenarios.

The first method is a combined vision-based method, named RTDT-Bolt, to interactively

detect and track the rotation of bolts in the front view. The efficient YOLOv3-tiny detector has

been established and trained to precisely localize the bolts in real time. Then, the YOLOv3-tiny is

combined with the KLT tracking algorithm to improve the tracking performance. The effectiveness

of the proposed method, in dealing with tracking loss problems due to light changes, has been

demonstrated over the traditional optical-flow-based tracking algorithms. Further, extensive

parameter studies have been conducted to examine the capability and potential limitations of the

proposed method. The results indicate the proposed RTDT-Bolt method can reliably quantify the

bolt rotation with over 90% accuracy using the recommended range for the parameters. It is also

found that the number of pyramid levels and the search block size have great impacts on the

rotation estimation, while the maximum number of iterations and the maximum bidirectional error

do not have substantial effects on the results. The proposed RTDT method has multiple advantages

including a) achieving real-time performance in both detection and tracking; b) providing solutions

to measure the rotation of bolts up to any range; c) enhancing the robustness of traditional KLT

tracking algorithms against illumination changes and background noise.

The second method is built upon 3D vision-based evaluation pipeline. This method aims to

address the limitations of existing side view-based quantification methods developed in 2D

computer vision. The proposed method consists of vision-based 3D scene reconstruction and bolt

loosening length quantification using a newly proposed YOLO-TDCH method. At the time of

writing, to the best of the author’s knowledge, the proposed methodology is the first-of-its-kind

using 3D vision and advanced point cloud processing algorithms, for bolt loosening quantification

in the structural engineering field. Experimental results indicate that the average quantification error is within approximately 1 mm. Besides, the proposed methodology is fully automated and does

not rely on a specific camera viewing angle to provide quantification results, as opposed to the

existing 2D vision-based methods described in Section 4.4.1. Moreover, it does not require human

intervention during the image and point cloud data processing to determine the bolt loosening

length.

Overall, the two proposed methods can provide economical and accurate solutions to bolt

loosening quantification in their respective application scenario, including a long-term monitoring

solution for bolts using a preinstalled camera facing the front view of the bolts, and a post-

disaster inspection solution where photos or videos of the bolted devices can be taken at multiple

locations and angles by human inspectors, or more efficiently by drones or ground robots. Both

two methods have been demonstrated to address the limitations of the existing methods in their

respective application scenario.

Chapter 5: Combined vision-based SHM and loss estimation framework

5.1 Overview

Over the last decade, vision-based methods have achieved great success in structural visual damage detection. Vision-based methods output structural damage information, such as concrete cracks, concrete spalling and steel corrosion, for specific structural components. While damage

evaluation provides important and necessary information to engineers and researchers to assess

the residual performance and safety of the structural components in many situations, local damage

information of specific structural components may not be useful enough compared to global

damage information at the system level for owners or decision makers. Besides, damage

information can be difficult to understand for owners or decision-makers who are likely to lack

engineering knowledge, but instead pay more attention to the repair or replacement cost of the

structures if they are damaged or completely collapse. In such situations, it is necessary to convert

such damage information into other metrics (e.g., repair cost) that are easier to interpret for

owners, stakeholders, and decision-makers.

Within this context, this chapter presents a combined vision-based structural damage detection

and loss quantification framework. The loss quantification procedures are adopted from part of the

existing PBEE loss evaluation procedures. To implement the framework, the local damage

information will be integrated into global damage information and combined with the PBEE

methodology to estimate the loss information, which can be more easily conveyed to stakeholders

to aid their decision-making. This chapter first provides a detailed description of the combined

vision-based SHM and loss estimation framework proposed in this dissertation, followed by a case

study showing its implementation on a reinforced concrete building for illustration purposes. Part

of this chapter is adopted from the author’s publication, Pan & Yang (2020).

5.2 Methodology

5.2.1 Overview

The proposed framework is described in this section. As depicted in Figure 5.1, first, images

or video frames should be acquired from a post-disaster site inspection, which can be performed

manually, or captured by preinstalled cameras, or more flexibly and efficiently by UAVs and

UGVs. In an ideal situation, images of the structural systems and components should be taken

from multiple views to facilitate the comprehensive evaluation. Second, the collected image data

are processed by vision-based methods, such as those presented in Chapter 4, to determine the

damage status at both the system level and component level. If a structural system is identified as

collapsed, then the total loss is determined as the replacement cost. If a structural system is estimated as non-collapsed, component-level damage evaluation methods are applied to determine the damage

states of all the structural components and non-structural components of interest. Once the

component damage states are identified, the corresponding repair costs for the components are determined using the fragility database and the corresponding PBEE loss quantification procedures

(ATC-58, 2007). Finally, the total repair cost of the building is determined by adding the total

repair quantities from all structural and non-structural components taking into account their

suitable unit cost distribution. The process will be repeated using Monte Carlo simulation to

consider uncertainties arising from different involved stages within the PBEE framework. In the

end, a cumulative loss distribution curve can be obtained as an additional financial metric to aid

decision-making.

Figure 5.1 A picture of the workflow

5.2.2 Vision-based damage evaluation

The proposed framework involves system-level and component-level damage evaluation

methods. Depending on the structural system types, component types and damage types, different

vision-based damage evaluation methods should be considered. These include but are not limited

to the vision-based damage detection methods presented in this dissertation and other existing

methods reviewed in Section 2. It should be noted that new vision-based methods are being

developed and remain an active area of research in the field, which can be later integrated into the

proposed framework in the future. This will allow the framework to be implemented on various

types of civil structures at both the component level and the system level.

5.2.3 PBEE methodology

The loss estimation methodology used in this dissertation is adopted from the PBEE

methodology. The theory of the PBEE methodology was mainly developed by researchers at the

Pacific Engineering Research Center (PEER) between 1997 and 2010. As early as 2004, the

concept of the methodology was first presented by Moehle and Deierlein (2004) which involves

the quantification of multiple components including earthquake hazards information (intensity

measure), structural response (engineering demand parameter), fragility data (damage measure),

and loss data (decision variable). Further, the framework was further developed by Yang et al.

(2009) and implemented for the seismic evaluation of a building. Since then, the implementation

of the framework has been applied in seismic assessments of structures under various scenarios.

The application scenarios of the framework include a) seismic performance assessment of

structures; b) performance-based design of new structures; c) seismic retrofit design of existing

structures. It has been shown that the framework can be effectively adopted into the current design,

analysis, and construction practices.

The mathematical representation of the methodology can be described as follows,

$$\lambda(dv < DV) = \iiint G\langle DV \mid DM \rangle \, dG\langle DM \mid EDP \rangle \, dG\langle EDP \mid IM \rangle \, d\lambda(IM) \qquad (5.1)$$

The equation consists of four components. The first part, 𝜆(𝐼𝑀), is the probabilistic seismic

hazard analysis (PSHA) which is associated with the intensity measure such as earthquake

magnitude, source-site distance. The second part, 𝐺〈𝐸𝐷𝑃|𝐼𝑀〉, represents the structural analysis

which determines structural responses (e.g., floor displacements, accelerations) using the

representative ground motions with proper scaling factors obtained from PSHA. The third part,

𝐺〈𝐷𝑀|𝐸𝐷𝑃〉, is the estimation of damage extent based on the structural responses. The fourth part,

𝐺〈𝐷𝑉|𝐷𝑀〉, is the estimation of loss information for decision making based on the damage extent.

The output of the equation, 𝜆(𝑑𝑣 < 𝐷𝑉), is the decision variable which is selected by owners or

decision makers depending on different application scenario. For example, if the probable repair

cost under a specific earthquake hazard level is of interest to the owners, 𝜆(𝑑𝑣 < 𝐷𝑉) can be

represented as the annual rate of not exceeding a repair cost threshold.

In this dissertation, the major portion is dedicated to the development and application of vision-

based damage detection methods. The output of the vision-based methodology as presented in

Chapter 3 provides damage information. Therefore, the loss information can be quantified based on

the damage information, using a simplified PBEE equation as shown below,

$$P(dv < DV) = \int G\langle DV \mid DM \rangle \, dG\langle DM \rangle \qquad (5.2)$$

As shown in Equation (5.2), the seismic hazard information and the structural analysis need

not be considered. The damage output by vision-based damage detection methods can be directly

used to evaluate the loss distribution curve, which can be represented as the probability of not

exceeding a certain repair cost or repair time. When implementing Equation (5.2), the loss

information of each component is first evaluated using the fragility database developed as a

companion product of the PBEE framework. The loss information for the local components will

then be integrated to obtain the total loss distribution of the entire structure at the system level.

The process can be done efficiently using Monte Carlo simulations, as described in Yang et al.

(2009).

5.2.4 Description of the PBEE fragility database

This section presents a brief description of the fragility database. For better illustration, a

sample component is taken from the fragility database (ATC 58, 2007) as shown in Table 5-1 and

Table 5-2. In this example, the cost information related to each damage state (DS) is presented.

For each damage state, the minimum cost refers to the unit cost to conduct a repair action,

considering all possible economies of scale (which corresponds to maximum quantity) and

operation efficiencies. On the contrary, the maximum cost is the unit cost with no benefits from

scale and operation efficiencies, which corresponds to the minimum quantity. The uncertainties of

the unit cost are typically assumed to follow a normal or lognormal distribution. Figure 5.2

provides a schematic diagram of the unit cost function.

Table 5-1 Fragility data for a sample concrete column (component ID: B1041.031a).

Damage state index    Mean cost (USD$) at minimum quantity    Mean cost (USD$) at maximum quantity    Dispersion
DS 0                  0                                        0                                       0
DS 1                  25,704                                   20,910                                  0.39
DS 2                  38,978                                   25,986                                  0.32
DS 3                  47,978                                   31,986                                  0.30

Table 5-2 Description of the damage states for the sample component

DS index    Description
0           No damage
1           Light damage: visible narrow cracks and/or very limited spalling of concrete
2           Moderate damage: cracks, large area of spalling concrete cover without exposure of steel bars
3           Severe damage: crushing of core concrete, and/or exposed reinforcement buckling or fracture

Figure 5.2 Typical consequence function for repair costs

5.3 A case study on a post-disaster damage inspection survey

5.3.1 Description of the case study

For the purpose of illustration, a prototype RC building (Sim et al., 2015), as shown in Figure

5.3, is selected to evaluate its damage and repair cost after an earthquake. The prototype building

is a mid-rise RC building.

5.3.2 Evaluation results

As the case study is conducted on a concrete building, the CNN-based methods described in

Section 4.2 are applied to determine the damage states of the RC structural components from the

available images collected. The repair cost is then estimated based on the damage states. In

summary, the dual CNN algorithm determines that 17 such columns are in DS1, 26 in DS2,

and 14 in DS3. Using the unit cost information provided in Table 5-1, the total repair cost of these

RC components is calculated. It should be noted that as the survey contains primarily the RC

columns, only the RC columns are considered in this case study. However, a similar concept can

be applied to other component types when present.

Figure 5.3 Case study of an RC building with sample results (a) system-level identification: non-collapse

and (b) component-level damage evaluation: severe damage with detection of steel exposure (left), and

moderate damage (right)

The process is repeated 10000 times with Monte Carlo procedures to simulate the dispersion

of the repair costs. Finally, the results are presented in a cumulative distribution function as shown

in Figure 5.4. The cost simulation results can provide critical risk data for decision-making and

resource allocation during post-disaster reconstruction. For example, the decision maker can use

the 50% probability of non-exceedance to identify the median repair cost for the building. In the

example presented in Figure 5.4, the median repair cost is $2.69 million USD for the prototype

building.
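A minimal sketch of this Monte Carlo loss simulation is given below, using the damage-state counts identified above (17, 26 and 14 columns in DS1, DS2 and DS3) and the unit costs of Table 5-1. Sampling each unit cost from a lognormal distribution centred on the maximum-quantity mean cost is a simplifying assumption, and the Statistics and Machine Learning Toolbox is assumed for lognrnd and prctile.

nSim   = 10000;                          % number of Monte Carlo realizations
counts = [17 26 14];                     % columns identified in DS1, DS2, DS3
mcost  = [20910 25986 31986];            % mean unit cost at maximum quantity [USD], Table 5-1
beta   = [0.39 0.32 0.30];               % dispersions, Table 5-1
totalCost = zeros(nSim, 1);
for s = 1:nSim
    for ds = 1:3
        unit = lognrnd(log(mcost(ds)), beta(ds), counts(ds), 1);   % sampled unit repair costs
        totalCost(s) = totalCost(s) + sum(unit);
    end
end
medianCost = prctile(totalCost, 50);     % 50% probability of non-exceedance (median repair cost)
% Plotting the empirical CDF of totalCost gives the repair cost distribution curve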

Figure 5.4 Repair cost distribution corresponding to the hypothetical case

5.4 Discussion of the case study

This chapter presents the theory and implementation of the proposed framework which

combines vision-based structural damage detection and PBEE-based loss estimation to facilitate

the loss evaluation. The aim of this chapter is to provide an additional financial metric which can

be more easily conveyed to decision-makers and stakeholders who may lack engineering

knowledge. Overall, the results of the case study indicate the rapid loss estimation of buildings can

be achieved using the proposed framework.

However, the case study presented is certainly not comprehensive. Several limitations are

summarized below:

• Due to COVID-19 impacts and limited access to a post-disaster site, the author of the

dissertation did not get a chance to conduct an in-person post-disaster site inspection

survey. Instead, the case study presented in this chapter is based on an earlier post-disaster

site inspection conducted by Sim et al. (2015). The survey was conducted for an RC

building which primarily contains damaged RC columns based on the images collected.

Therefore, in this study, the total loss information of the building is determined based on

the damage information of the RC columns captured by the images collected. As a result, the component type variability is rather limited in the case study. Besides, the repair cost

estimation through the proposed framework cannot be compared with the real financial

loss which is unavailable at the time of writing. Nevertheless, the author deems the concept

of the proposed framework can be generalized to quantify the total loss of buildings with

the presence of more component types, when more data are available.

• The post-disaster site inspection survey does not provide detailed information about the

non-structural components. Therefore, the case study presented only considers structural

components, while non-structural components are completely neglected. According to the

PBEE loss quantification procedures, damage states of non-structural components are

generally related to resulting structural floor response quantities (e.g., displacements,

velocities, accelerations) from a disaster event. Therefore, to provide a comprehensive loss

estimation, additional structural vibration measurement sensors are required to be installed

on each floor of the structural systems. This can be achieved by traditional contact-type

sensors, or more economically and conveniently by vision-based methods which will be

discussed in Chapter 6 as part of the ongoing and future work. The measured floor response

can be used to estimate the damage state and the associated repair cost of the non-structural

components. Further, the total loss of the entire structure can be calculated as the sum of

the loss contributed by the structural components and non-structural components.

• In the case study presented, due to the limited available data, the unit cost functions of the

RC component are directly taken from the PBEE fragility database without calibration. In

a real application of site damage inspection, cost calibration should be performed

considering different factors such as repair strategies, repair sequence, scheduling of

contractors, and cost inflation particularly right after a major disaster that may cause a wide

range of disruptions in the local supply chain, etc.

• To limit the scope, implementation of the loss estimation framework is limited to the

repair cost for the prototype building. The framework should be further extended to

estimate other metrics such as repair time.

While the case study presented in this chapter is relatively simple, the concept of the proposed

vision-based damage detection and loss estimation framework can be generalized to different types

of structural systems which are built with different structural components. It should be noted that,

although the proposed framework is general, when dealing with different structural types (e.g., steel, masonry, timber, etc.) and structural components, different damage evaluation methods, such as the methods presented in Chapter 4 and existing damage evaluation methods proposed by other researchers as reviewed in Chapter 2, should be considered. In addition, new

2D and 3D vision-based structural damage evaluation methods should be further developed to

include more damage types to facilitate multi-damage type evaluation of more complex structural

systems. Furthermore, additional contact-type or vision sensors should be installed to measure

critical structural response quantities such as floor displacements and accelerations to facilitate a

more comprehensive damage and loss evaluation, particularly for non-structural components.

Chapter 6: Conclusions

This chapter first presents a summary of the dissertation, followed by the main research

findings and major contributions of the dissertation to the field. Further, it discusses the

limitations of the completed research, the active ongoing research, as well as recommendations on

future research directions.

6.1 Summaries

This dissertation first provides an extensive literature review on SHM methods including

vibration-based methods, NDTE-based methods, TLS-based methods and computer vision-based

methods. Limitations of the methods falling in each of these categories are discussed, such as high

cost, complicated instrumentation process, high sensitivity to environmental effects, etc.

Subsequently, more attention has been given to the review and discussion of the contributions and

limitations of existing 2D vision-based methods. Despite the promising results achieved by the 2D

vision methods in recent years, it is noted that there exist several main issues that are hard or even

impossible to address with many of these 2D vision methods. These include but are not limited

to their unstable robustness against background noise, insufficient algorithm speed for real-world

implementation, high sensitivity of evaluation outcomes to camera setup, incapability to provide

out-of-plane or 3D evaluation outcomes, etc. Therefore, this dissertation proposes a 3D vision-

based structural damage assessment and loss estimation framework, which aims to address the

limitations of existing 2D vision-based methods, and provide more rapid and comprehensive

evaluation solutions, thus offering a more effective complementary assessment tool in addition to

other types of SHM methods.

6.2 Main contributions

The dissertation is strongly focused on the development and application of 3D vision-based

methods, which have been validated on three prevalent types of structural components including

RC structures, steel structures, and structural bolted connections that are widely used in the field.

At the time of writing, very few attempts have been made to apply 3D vision methods to structural damage evaluation; their development and application are still in their infancy. The main contributions of the research are summarized below:

1) For the first time, this dissertation proposes a structural damage and performance

assessment framework which combines the novel 3D vision-based structural damage

detection methods and PBEE loss quantification procedures. The outcomes of the

proposed pipeline provide a more comprehensive structural damage assessment, as well as additional critical information, such as repair cost or repair time, that helps owners and decision-makers make more informed risk management decisions.

2) Within the framework, the research first proposed enhanced 2D vision-based methods

in several respects, as follows: a) It has improved the accuracy of structural

component damage classification for RC structures, using the dual CNN scheme (by

combining the classification and object detection CNNs). Such a concept can be

potentially extended to improve the accuracy of damage evaluation of other types of

structures. b) It has enhanced the robustness of vision-based methods against external

environmental effects such as background noise and illumination changes, using the

concept of combining vision-based object detection and tracking algorithms. c) Compared

to many existing methods for damage classification and localization, it has optimized the speed of the local damage evaluation algorithms towards real-time performance, which paves the way for rapid real-world applications. d) It has achieved very high accuracy (> 95%) and eliminated a hard limitation of many existing vision-based bolt loosening quantification studies (i.e., the 60-degree constraint when estimating the loosening angle of hexagonal bolts).

3) Within the framework, the research proposed novel 3D vision-based methods, leveraging

the recent advancements of deep learning and computer vision. This includes the adoption

of robust and accurate vision-based 3D reconstruction procedures, the development and

validation of a multi-view 3D object detection method, as well as the development and validation of multiple point cloud processing algorithms presented in the dissertation (a minimal numerical sketch of the point-cloud-based out-of-plane deviation idea is given after this list). These 3D vision methods address two main limitations of the 2D vision-based methods widely developed and applied in the existing published literature: a) the assessment results of 2D vision-based methods are sensitive to camera locations and poses; b) 2D vision-based methods can only assess in-plane damage features and are incapable of analyzing out-of-plane damage patterns. The research expands the

scope of existing 2D vision-based methods from damage recognition, localization, and 2D

quantification, towards more detailed quantification in 3D space. The effectiveness of the

3D vision-based methods has been validated on three prevalent types of structures

including RC structures, steel structures, and structural bolted components.

4) The proposed 3D vision-based methods are much more economical compared to TLS-

based damage quantification methods in field applications. In laboratory conditions, compared to contact-type sensors, which typically require a relatively complicated setup, the proposed methods provide a more economical and convenient instrumentation and measurement solution.

5) The concept of the proposed framework and the 3D vision-based damage evaluation

methods can be generalized to other types of structures such as masonry and timber

structures.

6) As will be presented in detail in Section 6.3, one additional contribution of this dissertation

is to bridge the gap between vibration-based SHM and vision-based SHM. As part of the ongoing and future research, Section 6.3 discusses an economical and efficient vision-based method to measure the vibration response of structures. The outcomes can be

analyzed using vibration-based theories to evaluate the health condition of the structures.

Besides, the outcomes will allow the incorporation of non-structural components into the

proposed framework to facilitate a more comprehensive assessment in the near future.
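To make the point-cloud processing idea referenced in contribution 3 concrete, the following minimal Python sketch (not the exact algorithms developed in this dissertation) fits a reference plane to a reconstructed point cloud by a least-squares (SVD) fit and evaluates the signed out-of-plane deviation of every point. The synthetic plate cloud, bulge shape, and units are assumptions used purely for demonstration.

```python
import numpy as np

def fit_plane_least_squares(points):
    """Fit a reference plane through an (N, 3) point cloud by least squares.

    The plane passes through the centroid; its unit normal is the right singular
    vector associated with the smallest singular value of the centered cloud.
    """
    centroid = points.mean(axis=0)
    _, _, vt = np.linalg.svd(points - centroid, full_matrices=False)
    normal = vt[-1]
    return centroid, normal / np.linalg.norm(normal)

def out_of_plane_deviation(points, centroid, normal):
    """Signed distance of every point from the fitted reference plane."""
    return (points - centroid) @ normal

if __name__ == "__main__":
    # Synthetic, nominally flat plate with a small bulge (assumed values, for
    # illustration only; units are taken to be metres).
    rng = np.random.default_rng(0)
    xy = rng.uniform(-0.5, 0.5, size=(2000, 2))
    bulge = 0.01 * np.exp(-(xy ** 2).sum(axis=1) / 0.05)
    cloud = np.column_stack([xy, bulge + rng.normal(0.0, 1e-4, 2000)])

    c, n = fit_plane_least_squares(cloud)
    d = out_of_plane_deviation(cloud, c, n)
    print(f"maximum out-of-plane deviation: {1000 * np.abs(d).max():.2f} mm")
```

In practice, the reconstructed cloud would first be cleaned (e.g., outlier removal) and segmented to the component of interest before such a plane fit is meaningful.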

6.3 Ongoing and future work

The research presented in this dissertation has the limitations discussed above. Future research will focus on addressing these limitations. In addition, based on a brief review of the literature published between 2020 and 2022 in the field of vision-based structural damage detection, several future trends can be identified:

• A transition from damage evaluation in a planar 2D space to a more comprehensive 3D

space,

• A transition from manual data collection to autonomous data collection through the use of advanced robotic systems (e.g., drones, ground robots), together with the development of advanced control algorithms/systems and sensor data fusion and analysis,

• A transition from local structural components to full-scale civil structures,

• A transition from laboratory applications to field applications.

Within this context, the subsections below summarize the ongoing and future work.

6.3.1 Development of new 3D vision methods for structural damage assessment

In this dissertation, the proposed 3D vision methods have been validated on structural

components of reinforced concrete columns, single-panel steel corrugated wall structures, and

structural bolted connections. It is envisioned that new 3D vision algorithms will be

developed and examined on other structural component types such as RC walls, RC slabs, steel

beams or columns of different shapes, and more complicated structural connections. Besides, new

3D vision-based methods should be expanded to other structural types such as masonry and timber

structural components.

At the time of writing, extensive laboratory experimental tests are being conducted at the

structural laboratory at UBC. These include monotonic and cyclic pushover tests on full-scale

three-panel and six-panel steel corrugated plate walls (with a more realistic real-world construction

setup that consists of the roof panel and footing connections), reinforced masonry walls, and

innovative timber connections. Currently, new 3D vision methods are being developed and

investigated to evaluate the damage to these structures. The damage data collected during the

experimental tests will be utilized to train and validate the new damage detection methods.

6.3.2 Development of new vision-based structural vibration measurements

In this dissertation, the vision methods are developed to evaluate the damage to structural

components, while non-structural components (e.g., furniture, equipment) are neglected. Although

the evaluation of the damage status of non-structural components is difficult and imposes

uncertainties, previous studies such as FEMA P-58 suggest that their damage states can be related to floor displacements, velocities, or accelerations in a probabilistic manner. This requires measuring these floor response quantities. To facilitate a more comprehensive PBEE-based evaluation, at the time of writing, one active line of ongoing work is to develop accurate and

economical vision-based real-time structural vibration measurement approaches.
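For reference, a common FEMA P-58-style formulation (stated here in generic form; the specific medians, dispersions, and consequence values for any given component are assumptions to be taken from the relevant fragility database) expresses the probability of a component reaching or exceeding damage state $ds_i$, given an engineering demand parameter (EDP), as a lognormal fragility function:

$$P\left(DS \ge ds_i \mid EDP = x\right) = \Phi\!\left(\frac{\ln\left(x/\theta_i\right)}{\beta_i}\right)$$

where $x$ is the measured peak floor response (e.g., peak floor acceleration or interstorey drift), $\theta_i$ is the median demand at which damage state $ds_i$ is reached, $\beta_i$ is the logarithmic standard deviation (dispersion), and $\Phi(\cdot)$ is the standard normal cumulative distribution function. The expected repair cost of the component can then be estimated as $E[C \mid EDP = x] = \sum_i P(DS = ds_i \mid x)\,c_i$, where $c_i$ is the repair-cost consequence associated with damage state $ds_i$.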

Structural vibration measurement is important in SHM. Nowadays, different types of sensors

are widely used in structural engineering to provide structural response parameters such as

displacements, accelerations and strain (Mukhopadhyay, 2011; Kralovec & Schagerl, 2020).

Traditional contact-type sensors such as linear potentiometers are widely used in laboratories to

measure structural displacements. However, the accuracy of the linear potentiometer is limited, making it unsuitable for dynamic shake-table testing. Besides, the linear potentiometer can only measure displacement along a single axis and is easily damaged when subjected to accidental bending moments during a test. The high robustness and accuracy of the

linear variable differential transformer (LVDT) result in better performance than other types of

displacement transducers. However, its performance can be affected by stray capacitance effects

and electromagnetic interference (Masi et al., 2011; Mandal et al., 2018). In addition, these contact-

type sensors need to be placed on tested structures, and consequently the properties and responses

of the structures may be affected, especially for small-scale structures. On the other hand,

noncontact-type sensors such as laser displacement transducers (Stanbridge & Ewins, 1999) are

easier to install and can provide relatively accurate displacements. However, the measurement

range of these transducers is relatively small. Radar interferometry is another type of noncontact

sensing technique, but the displacement measured can be easily affected by a systematic and

deterministic error which cannot be eliminated (Pieraccini, 2013). Global Positioning System

(GPS) provides a remote, non-intrusive method that can deliver dynamic measurements at rates above 20 Hz with reasonable accuracy (Im, Hurlebaus, & Kang, 2013; Häberling et al., 2015). However, its

accuracy is influenced by electromagnetic noise and many environmental conditions (Im,

Hurlebaus, & Kang, 2013). In addition, non-contact type wireless sensors are usually expensive

and require specialized workers to install and operate, which hampers their applications in many

structural laboratories (Amezquita-Sanchez, Valtierra-Rodriguez, & Adeli, 2018).

In recent years, innovative and cost-effective computer vision approaches to measure

structural vibrations have received more attention. A vision-based sensing setup typically consists of a camera with zoom capability, together with image-processing software and hardware platforms. The effectiveness of different vision-based methods has previously been demonstrated for vibration

measurement of various structures, such as the digital image correlation (DIC) method (e.g.,

Dutton, Take, & Hoult, 2014; Ghorbani, Matta, & Sutton, 2015), the template matching technique (e.g., Fukuda, Feng, & Shinozuka, 2010; Feng & Feng, 2016), and phase-based motion magnification methods (e.g., Chen et al., 2015; Yang et al., 2017). Several limitations can be observed. Although

the DIC algorithm can provide full-field measurement with high accuracy, it is sensitive to image noise. Moreover, implementing DIC usually requires attaching large physical targets (e.g., a spray-painted speckle pattern or a chessboard) that cover a large region of the specimen. The template matching algorithm is relatively computationally expensive and unreliable under lighting variations. Although the phase-based motion magnification methods can

achieve high accuracy in extracting modal frequencies and damping ratios, the algorithms are only

suitable for linear structures. More recently, several researchers have examined optical flow-based

algorithms for structural motion tracking. There exist numerous computational models for

estimating the optical flow (Beauchemin & Barron, 1995). One of the intensity-based differential

methods, named Kanade–Lucas–Tomasi (KLT) (Tomasi & Kanade, 1991), is commonly used in

structural motion tracking for its high precision under a stable and well-controlled environment

(Yoon et al., 2016; Zhao et al., 2019; Kuddus et al., 2019). Even though these studies have well

demonstrated the effectiveness of the KLT method for structural vibration measurements, several

limitations have been observed: a) Most studies relied on the manual specification of the region of

interest (ROI) at the initial video frame where tracking is initiated on this ROI throughout the

video. The performance of the tracking algorithm employed in these studies is relatively sensitive

to external environmental conditions, such as illumination changes and shadows cast by surrounding objects, which occur frequently in real-world situations. Consequently, the experiments can fail if tracking is lost partway through due to changes in these external conditions. b) Some studies used fiducial markers (i.e., objects used as reference points in imaging tasks) and the associated detection and tracking algorithms, but these algorithms are not robust against background noise and relatively poor lighting conditions; c) Some other studies

implemented target-free tracking algorithms, which track objects based on their distinct local texture features. Although these methods can greatly enhance the convenience of vibration tests, they require such distinct features to exist, which may not always be the case. Besides, these methods are unreliable if the local texture features change, for example when cracks develop in the textured region as the structure undergoes relatively large deformations. Further, these target-free tracking algorithms are also prone to failure when lighting conditions change or background noise accumulates during long-term tracking.
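As an illustration of the kind of KLT-based tracking loop discussed above, the following minimal Python/OpenCV sketch tracks Shi–Tomasi corners inside a manually specified ROI using pyramidal Lucas–Kanade optical flow and records the mean pixel displacement per frame. The video filename and ROI coordinates are placeholders (assumptions), and no scale calibration or robustness measures are included.

```python
import cv2
import numpy as np

# Minimal sketch of a KLT-based displacement tracking loop, in the spirit of the
# optical-flow methods reviewed above. The video filename and the manually
# specified ROI are placeholders (assumptions), not values from this research.
cap = cv2.VideoCapture("shake_table_test.mp4")        # hypothetical video file
ok, frame = cap.read()
prev_gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

x, y, w, h = 400, 300, 80, 80                         # manually specified ROI (assumed)
roi_mask = np.zeros_like(prev_gray)
roi_mask[y:y + h, x:x + w] = 255

# Shi-Tomasi corners inside the ROI serve as the tracked feature points.
pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=50, qualityLevel=0.01,
                              minDistance=5, mask=roi_mask)
origin = pts.reshape(-1, 2).mean(axis=0)              # reference position (pixels)
displacements = []

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Pyramidal Lucas-Kanade optical flow from the previous frame to the current one.
    new_pts, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray, pts, None,
                                                  winSize=(21, 21), maxLevel=3)
    good = new_pts[status.flatten() == 1]
    if len(good) == 0:                                # tracking lost; a real system would re-detect
        break
    displacements.append(good.reshape(-1, 2).mean(axis=0) - origin)
    prev_gray, pts = gray, good.reshape(-1, 1, 2)

cap.release()
print("tracked frames:", len(displacements))
# A pixel-to-millimetre scale factor (from a known physical dimension) would be
# needed to convert these pixel displacements into physical displacements.
```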

Recently, convolutional neural networks (CNNs) have been widely applied in many structural

engineering problems such as image classification, object detection and segmentation. Compared

to the classical computer vision algorithms, one advantage of these data-driven methods is their

robustness against background noise and changes in external environmental conditions. To address

the aforementioned limitations, at the time of writing, one active line of ongoing work is to examine a combined CNN-based object detection and tracking pipeline for real-time structural vibration measurements. This pipeline consists of a real-time object detector that automatically localizes the region of interest to initiate the measurement, and a real-time tracker that continuously tracks and records the motion of the structure (a minimal sketch of this detector-initialized tracking idea is given after the list below). Preliminary results indicate the proposed method can

achieve very high accuracy at a low cost in laboratory conditions (e.g., shake table tests, static

pushover tests). Furthermore, it is expected that the proposed method will be examined in the field in

the near future. Once the method is fully validated, it is envisioned that:

• The proposed method will yield higher robustness against environmental effects compared

to the existing KLT-only methods,

• The proposed method will have a lower cost and easier installation process than most

contact-type sensors and laser-based measuring approaches.

• More importantly, the proposed method can bridge the gap between vision-based and

vibration-based methods in SHM. It can be easily integrated into the 3D vision-based structural damage evaluation and loss quantification framework proposed in this

dissertation, to provide even more comprehensive evaluation outcomes, particularly for

non-structural components. For example, preinstalled cameras can be used to capture

structural vibrations and local structural component damages concurrently, without a need

to install additional contact-type vibration measurement sensors.
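As a rough sketch of the detector-initialized tracking pipeline referenced above, the following Python code wraps a KLT tracking loop with a placeholder detection step. The detect_roi function is a hypothetical stub standing in for any real-time CNN detector (e.g., a trained YOLO-family model) and is not the specific network developed in this research; the video filename is likewise a placeholder.

```python
import cv2
import numpy as np

def detect_roi(frame):
    """Hypothetical detector stub returning one bounding box (x, y, w, h).

    In practice this would run a real-time CNN object detector (e.g., a trained
    YOLO-family model) on the frame; a fixed box is returned here purely for
    illustration.
    """
    return 400, 300, 80, 80

def init_klt_points(gray, box):
    """Initialize KLT feature points inside the detected bounding box."""
    x, y, w, h = box
    mask = np.zeros_like(gray)
    mask[y:y + h, x:x + w] = 255
    return cv2.goodFeaturesToTrack(gray, maxCorners=50, qualityLevel=0.01,
                                   minDistance=5, mask=mask)

def measure_vibration(video_path):
    """Detector-initialized KLT tracking; the ROI is re-detected if tracking is lost."""
    cap = cv2.VideoCapture(video_path)
    ok, frame = cap.read()
    prev_gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    pts = init_klt_points(prev_gray, detect_roi(frame))
    origin = pts.reshape(-1, 2).mean(axis=0)
    trace = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        new_pts, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray, pts, None)
        good = new_pts[status.flatten() == 1]
        if len(good) < 10:                        # tracking lost: re-detect the ROI
            pts = init_klt_points(gray, detect_roi(frame))
        else:
            pts = good.reshape(-1, 1, 2)
            trace.append(pts.reshape(-1, 2).mean(axis=0) - origin)
        prev_gray = gray
    cap.release()
    return np.array(trace)                        # pixel displacement history

# Example usage with a hypothetical recording:
# displacements = measure_vibration("shake_table_test.mp4")
```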

6.3.3 Integration of advanced robotic technologies into the proposed framework

As indicated in Section 3.3, this dissertation is dedicated to developing and validating 3D

vision-based structural damage detection methods, without making additional attempts to enhance data collection autonomy and efficiency. In recent years, some researchers have

made limited attempts at robot-based structural damage inspection. Despite the achievements

made, most of these studies are limited to applications of a single UAV or UGV for inspection of

small-scale structures, or a small portion of full-scale structures. A single UAV or UGV is usually

incapable of examining full-scale civil structures, which are large and typically consist of

many structural components placed in different spatial locations. The UAVs or UGVs in most

existing studies are manually controlled, while a few are semi-automated; none achieves fully autonomous navigation and inspection.

Within this context, one line of future research will focus on integrating advanced multi-robot systems into the proposed framework. The multi-robot systems to be developed will

be based on multiple UAVs or UGVs, or a combination of drone fleets and UGVs. The idea behind

this is to leverage the advantages of all the robots by integrating them, while alleviating the

disadvantages of each robot. The UAVs and UGVs will be equipped with data acquisition,

transmission and processing systems, and a variety of manipulators or sensors. It is expected that

the robots will be designed to collaborate with each other during the inspection. Novel machine

learning, reinforcement learning, and deep learning-based algorithms will be adopted from other

scientific and engineering fields, and further developed to control the robotic systems for

autonomous navigation, efficient data collection, and sensor data processing. Once validated, it

will significantly enhance the inspection capability of the existing robot-based inspection methods.

This is because civil infrastructure such as bridges or multi-storey buildings is typically very large

and complex, and is extremely difficult or impossible to inspect fully with existing ground robots or drones under the current navigation and data collection philosophy.

Bibliography

Abdeljaber, O., & Avci, O. (2016). Nonparametric structural damage detection algorithm for

ambient vibration response: utilizing artificial neural networks and self-organizing maps. Journal

of Architectural Engineering, 22(2), 04016004.

Abdeljaber, O., Avci, O., Kiranyaz, S., Gabbouj, M., & Inman, D. J. (2017). Real-time

vibration-based structural damage detection using one-dimensional convolutional neural

networks. Journal of Sound and Vibration, 388, 154–170.

Agarwal, S., Furukawa, Y., Snavely, N., Simon, I., Curless, B., Seitz, S. M., & Szeliski, R.

(2011). Building rome in a day. Communications of the ACM, 54(10), 105-112.

Amezquita-Sanchez, J. P., & Adeli, H. (2016). Signal processing techniques for vibration-

based health monitoring of smart structures. Archives of Computational Methods in

Engineering, 23(1), 1-15.

Amezquita-Sanchez, J. P., Valtierra-Rodriguez, M., & Adeli, H. (2018). Wireless smart

sensors for monitoring the health condition of civil infrastructure. Scientia Iranica, 25(6), 2913-

2925.

An, Y., Chatzi, E., Sim, S. H., Laflamme, S., Blachowski, B., & Ou, J. (2019). Recent progress

and future trends on damage identification methods for bridge structures. Structural Control and

Health Monitoring, 26(10), e2416.

ATC-58. (2007). Development of next generation performance-based seismic design

procedures for new and existing buildings. ATC, Redwood City, CA, USA.
Azimi, M., & Pekcan, G. (2020). Structural health monitoring using extremely compressed

data through deep learning. Computer- Aided Civil and Infrastructure Engineering, 35(6), 597–

614.

Bahrebar, M., Kabir, M. Z., Zirakian, T., Hajsadeghi, M., & Lim, J. B. (2016). Structural

performance assessment of trapezoidally-corrugated and centrally-perforated steel plate shear

walls. Journal of Constructional Steel Research, 122, 584-594.

Bao, Y., & Li, H. (2021). Machine learning paradigm for structural health

monitoring. Structural Health Monitoring, 20(4), 1353-1372.

Bayissa, W. L., Haritos, N., & Thelandersson, S. (2008). Vibration-based structural damage

identification using wavelet transform. Mechanical systems and signal processing, 22(5), 1194-

1215.

Beauchemin, S. S., & Barron, J. L. (1995). The computation of optical flow. ACM computing

surveys (CSUR), 27(3), 433-466.

Beckman, G. H., Polyzois, D., & Cha, Y. J. (2019). Deep learning-based automatic volumetric

damage quantification using depth camera. Automation in Construction, 99, 114-124.

Betti, M., Facchini, L., & Biagini, P. (2015). Damage detection on a three-storey steel frame

using artificial neural networks and genetic algorithms. Meccanica, 50(3), 875-886.

Bochkovskiy, A., Wang, C. Y. & Liao, H. Y. M. (2020). Yolov4: Optimal speed and accuracy

of object detection. arXiv preprint arXiv:2004.10934.

Brincker, R., & Ventura, C. (2015). Introduction to operational modal analysis. John Wiley &

Sons.

Carrilho, A. C., Galo, M., & Santos, R. C. (2018). Statistical outlier detection method for airborne LiDAR data. International Archives of the Photogrammetry, Remote Sensing & Spatial Information Sciences.

Cha, Y. J., Chen, J. G., & Büyüköztürk, O. (2017). Output-only computer vision based

damage detection using phase-based optical flow and unscented Kalman filters. Engineering

Structures, 132, 300–313.

Cha, Y. J., Choi, W., & Büyüköztürk, O. (2017). Deep learning- based crack damage detection

using convolutional neural networks. Computer-Aided Civil and Infrastructure Engineering,

32(5), 361– 378.

Cha, Y. J., Choi, W., Suh, G., Mahmoudkhani, S., & Büyüköztürk, O. (2018). Autonomous

structural visual inspection using region- based deep learning for detecting multiple damage types.

Computer-Aided Civil and Infrastructure Engineering, 33(9), 731–747.

Cha, Y. J., You, K., & Choi, W. (2016). Vision-based detection of loosened bolts using the

Hough transform and support vector machines. Automation in Construction, 71, 181–188.

Chaichulee, S., Villarroel, M., Jorge, J., Arteta, C., Green, G., McCormick, K., Zisserman, A.,

& Tarassenko, L. (2017). Multi-task convolutional neural network for patient detection and skin

segmentation in continuous non-contact vital sign monitoring. 2017 12th IEEE International

Conference on Automatic Face & Gesture Recognition (FG 2017), Washington, DC (pp. 266–

272).

Chang, K. C., & Kim, C. W. (2016). Modal-parameter identification and vibration-based

damage detection of a damaged steel truss bridge. Engineering Structures, 122, 156-173.

Chen, J. G., Wadhwa, N., Cha, Y. J., Durand, F., Freeman, W. T., & Buyukozturk, O. (2015).

Modal identification of simple structures with high-speed video using motion magnification.

Journal of Sound and Vibration, 345, 58–71.

Cheng, C. S., Behzadan, A. H., & Noshadravan, A. (2021). Deep learning for post-hurricane

aerial damage assessment of buildings. Computer-Aided Civil and Infrastructure Engineering,

36(6), 695– 710.

Chun, P., Izumi, S., and Yamane, T. (2021), Automatic detection method of cracks from

concrete surface imagery using two-step Light Gradient Boosting Machine, Computer-Aided Civil

and Infrastructure Engineering, 36:1, 61-72.

Dalal, N., & Triggs, B. (2005). Histograms of oriented gradients for human detection. In

International Conference on Computer Vision & Pattern Recognition (CVPR’05), Vol. 1, 886–

893, IEEE Computer Society.

Deng, J., Lu, Y., and Lee, V. C. (2020). “Concrete crack detection with handwriting script

interferences using faster region‐based convolutional neural network.” Computer-Aided Civil

and Infrastructure Engineering, 35(4), 373–388.

Deng, K., Pan, P., Li, W., & Xue, Y. (2015). Development of a buckling restrained shear panel

damper. Journal of Constructional Steel Research, 106, 311-321.

Dou, C., Pi, Y. L., & Gao, W. (2018). Shear resistance and post-buckling behavior of

corrugated panels in steel plate shear walls. Thin-Walled Structures, 131, 816-826.

Dutton, M., Take, W. A., & Hoult, N. A. (2014). Curvature monitoring of beams using digital

image correlation. Journal of Bridge Engineering, 19(3), 05013001.

Dwivedi, S. K., Vishwakarma, M., & Soni, A. (2018). Advances and researches on non

destructive testing: A review. Materials Today: Proceedings, 5(2), 3690-3698.

Edelsbrunner, H., Kirkpatrick, D., & Seidel, R. (1983). On the shape of a set of points in the

plane. IEEE Transactions on information theory, 29(4), 551-559.

Eltouny, K.A. and Liang, X. (2021), Bayesian-Optimized Unsupervised Learning Approach

for Structural Damage Detection, Computer-Aided Civil and Infrastructure Engineering, 36:10,

1249-1269.

Etebarian, H., Yang, T. Y., & Tung, D. P. (2019). Seismic design and performance evaluation

of dual-fused H-frame system. Journal of Structural Engineering, 145(12), 04019158.

Everingham, M., Van Gool, L., Williams, C. K., Winn, J., & Zisserman, A. (2010). The PASCAL visual object classes (VOC) challenge. International Journal of Computer Vision, 88(2), 303–338.

Farzampour, A., Mansouri, I., & Hu, J. W. (2018). Seismic behavior investigation of the

corrugated steel shear walls considering variations of corrugation geometrical

characteristics. International Journal of Steel Structures, 18(4), 1297-1305.

Feng, D., & Feng, M. Q. (2016). Vision‐based multipoint displacement measurement for

structural health monitoring. Structural Control and Health Monitoring, 23(5), 876-890.

Fischler, M. A., & Bolles, R. C. (1981). Random sample consensus: a paradigm for model

fitting with applications to image analysis and automated cartography. Communications of the

ACM, 24(6), 381-395.


Fukuda, Y., Feng, M. Q., & Shinozuka, M. (2010). Cost‐effective vision‐based system for

monitoring dynamic response of civil engineering structures. Structural Control and Health

Monitoring, 17(8), 918-936.

Future Cities Canada. (2018). Building Our Urban Futures: Inside Canada’s Infrastructure and

Real Estate Needs. https://futurecitiescanada.ca/resources/.

Gao, Y., & Mosalam, K. M. (2018). Deep transfer learning for image- based structural damage

recognition. Computer-Aided Civil and Infrastructure Engineering, 33(9), 748–768.

Gao, Y., Zhai, P., & Mosalam, K. M. (2021). Balanced semisupervised generative adversarial

network for damage assessment from low- data imbalanced-class regime. Computer-Aided Civil

and Infras- tructure Engineering, 36(9), 1094–1113.

German, S., Brilakis, I., & DesRoches, R. (2012). Rapid entropy-based detection and

properties measurement of concrete spalling with machine vision for post-earthquake safety

assessments. Advanced Engineering Informatics, 26(4), 846-858.

Ghiasi, R., Torkzadeh, P., & Noori, M. (2016). A machine-learning approach for structural

damage detection using least square support vector machine based on a new combinational kernel

function. Structural Health Monitoring, 15(3), 302-316.

Ghorbani, R., Matta, F., & Sutton, M. A. (2015). Full-field deformation measurement and

crack mapping on confined masonry walls using digital image correlation. Experimental

Mechanics, 55(1), 227-243.

Girshick, R. (2015). Fast R-CNN. In Proceedings of the IEEE International Conference on

Computer Vision, 1440–1448.

Girshick, R., Donahue, J., Darrell, T., & Malik, J. (2014). Rich feature hierarchies for accurate

object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer

Vision and Pat- tern Recognition, 580–587.

Goulet, C. A., Haselton, C. B., Mitrani-Reiser, J., Beck, J. L., Deierlein, G. G., Porter, K. A.,

& Stewart, J. P. (2007). Evaluation of the seismic performance of a code-conforming reinforced-

concrete frame building—From seismic hazard to collapse safety and economic losses.

Earthquake Engineering & Structural Dynamics, 36(13), 1973–1997.

Gulgec, N.S., Takáč, M., and Pakzad, S.N. (2020), Structural Sensing with Deep Learning:

Strain Estimation from Acceleration Data for Fatigue Assessment, Computer-Aided Civil and

Infrastructure Engineering, 35:12, 1349-1364

Häberling, S., Rothacher, M., Zhang, Y., Clinton, J. F., & Geiger, A. (2015). Assessment of

high-rate GPS using a single-axis shake table. Journal of Geodesy, 89(7), 697-709.

Hakim, S. J. S., Razak, H. A., & Ravanfar, S. A. (2015). Fault diagnosis on beam-like

structures from modal parameters using artificial neural networks. Measurement, 76, 45-61.

Ham, Y., Han, K. K., Lin, J. J., & Golparvar-Fard, M. (2016). Visual monitoring of civil

infrastructure systems via camera-equipped Unmanned Aerial Vehicles (UAVs): a review of

related works. Visualization in Engineering, 4(1), 1-8.

Hart, P. E., & Duda, R. O. (1972). Use of the Hough transformation to detect lines and curves

in pictures. Communications of the ACM, 15(1), 11–15.

Hartley, R. I., & Sturm, P. (1997). Triangulation. Computer vision and image

understanding, 68(2), 146-157.

Hartley, R. I., Gupta, R., & Chang, T. (1992, June). Stereo from uncalibrated cameras.

In CVPR (Vol. 92, pp. 761-764).

Hartley, R., & Zisserman, A. (2003). Multiple view geometry in computer vision. Cambridge

university press.

He, K., Gkioxari, G., Dollár, P., & Girshick, R. (2017). Mask R-CNN. Proceedings of the

IEEE International Conference on Computer Vision, Venice, Italy (pp. 2961–2969).

He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 770–778.

Hinton, G. E., Osindero, S., & Teh, Y. (2006). A fast learning algorithm for deep belief nets.

Neural Computation, 18, 1527–1554.

Hirschmuller, H. (2007). Stereo processing by semiglobal matching and mutual

information. IEEE Transactions on pattern analysis and machine intelligence, 30(2), 328-341.

Hosang, J., Benenson, R., & Schiele, B. (2014). How good are detection proposals, really?

Presented at Proceedings of British Machine Vision Conference, Nottingham, England.

Hosang, J., Benenson, R., Dollár, P., & Schiele, B. (2015). What makes for effective detection

proposals? IEEE Transactions on Pattern Analysis and Machine Intelligence, 38(4), 814–830.

Hoskere, V., Park, J. W., Yoon, H., & Spencer, B. F., Jr. (2019). Vision- based modal survey

of civil infrastructure using unmanned aerial vehicles. Journal of Structural Engineering, 145(7),

04019062.

Hu, F., Zhao, J., Huang, Y., & Li, H. (2019). Learning structural graph layouts and 3D shapes

for long span bridges 3D reconstruction. arXiv preprint arXiv:1907.03387.

Huang, G., Liu, Z., Van Der Maaten, L., & Weinberger, K. Q. (2017). Densely connected

convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern

Recognition, 4700–4708.

Huynh, T. C., & Kim, J. T. (2017). Quantification of temperature effect on impedance

monitoring via PZT interface for prestressed tendon anchorage. Smart Materials and Structures,

26(12), 125004.

Huynh, T. C., & Kim, J. T. (2018). RBFN-based temperature compensation method for

impedance monitoring in prestressed tendon anchorage. Structural Control and Health

Monitoring, 25(6), e2173.

Huynh, T. C., Park, J. H., Jung, H. J., & Kim, J. T. (2019). Quasi- autonomous bolt-loosening

detection method using vision-based deep learning and image processing. Automation in

Construction, 105, 102844.

Iandola, F. N., Han, S., Moskewicz, M. W., Ashraf, K., Dally, W. J. & Keutzer, K. (2016).

SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5 MB model size. arXiv

preprint arXiv:1602.07360.

Im, S. B., Hurlebaus, S., & Kang, Y. J. (2013). Summary review of GPS technology for

structural health monitoring. Journal of Structural Engineering, 139(10), 1653-1664.

Jahanshahi, M. R., Kelly, J. S., Masri, S. F., & Sukhatme, G. S. (2009). A survey and

evaluation of promising approaches for automatic image- based defect detection of bridge

structures. Structure and Infrastructure Engineering, 5(6), 455–486.

Jang, K., An, Y.-K., Kim, B., and Cho, S. (2021). “Automated crack evaluation of a high-rise

bridge pier using a ring-type climbing robot.” Computer-Aided Civil and Infrastructure

Engineering, 36:1, 14-29.

Jiang, S., and Zhang, J. (2020). “Real‐time crack assessment using deep neural networks

with wall ‐ climbing unmanned aerial system. ” Computer-Aided Civil and Infrastructure

Engineering, 35(6), 549–564.

Jiang; K., Han, Q., Du, X., and Ni, P. (2021), A decentralized unsupervised structural

condition diagnosis approach using deep auto-encoders, Computer-Aided Civil and Infrastructure

Engineering, 36:6, 711-732.

Kang, L., Wu, L., & Yang, Y. H. (2014). Robust multi-view l2 triangulation via optimal inlier

selection and 3d structure refinement. Pattern Recognition, 47(9), 2974-2992.

Kim, H., Yoon, J., Hong, J., & Sim, S. H. (2021). Automated Damage Localization and

Quantification in Concrete Bridges Using Point Cloud-Based Surface-Fitting Strategy. Journal of

Computing in Civil Engineering, 35(6), 04021028.

Kim, M. K., Sohn, H., & Chang, C. C. (2015). Localization and quantification of concrete

spalling defects using terrestrial laser scanning. Journal of Computing in Civil Engineering, 29(6),

04014086.

Koch, C., Georgieva, K., Kasireddy, V., Akinci, B., & Fieguth, P. (2015). A review on

computer vision based defect detection and condition assessment of concrete and asphalt civil

infrastructure. Advanced Engineering Informatics, 29(2), 196–210.

Koch, C., Paal, S. G., Rashidi, A., Zhu, Z., König, M., & Brilakis, I. (2014). Achievements

and challenges in machine vision-based inspection of large concrete structures. Advances in

Structural Engineering, 17(3), 303-318.

Kohavi, R., & Provost, F. (1998). Confusion matrix. Machine Learning, 30(2–3), 271–274.

Kong, S.Y., Fan, J.S., Liu, Y.F., Wei, X.C., and Ma, X.W. (2021), Automated Crack

Assessment and Quantitative Growth Monitoring, Computer-Aided Civil and Infrastructure

Engineering, 36:5, 656-674.

Kong, X., & Li, J. (2018). Image registration-based bolt loosening detection of steel joints.

Sensors, 18(4), 1000.

Kong, X., & Li, J. (2018). Vision‐based fatigue crack detection of steel structures using video

feature tracking. Computer‐Aided Civil and Infrastructure Engineering, 33(9), 783-799.

Kopsaftopoulos, F. P., & Fassois, S. D. (2013). A functional model based statistical time series

method for vibration based damage detection, localization, and magnitude estimation. Mechanical

Systems and Signal Processing, 39(1-2), 143-161.

Kralovec, C., & Schagerl, M. (2020). Review of structural health monitoring methods

regarding a multi-sensor approach for damage assessment of metal and composite

structures. Sensors, 20(3), 826.

Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep

convolutional neural networks. In Advances in Neural Information Processing Systems, 1097–

1105.

Kuddus, M. A., Li, J., Hao, H., Li, C., & Bi, K. (2019). Target-free vision-based technique for

vibration measurements of structures subjected to out-of-plane movements. Engineering

Structures, 190, 210–222.

Kullaa, J. (2010). Vibration-based structural health monitoring under variable environmental

or operational conditions. In New trends in vibration based structural health monitoring (pp. 107-

181). Springer, Vienna.

Kumar, S., & Mahto, D. G. (2013). Recent trends in industrial and other engineering

applications of nondestructive testing: a review. International Journal of Scientific & Engineering

Research, 4(9).

Lang, A. H., Vora, S., Caesar, H., Zhou, L., Yang, J., & Beijbom, O. (2019). PointPillars: Fast

encoders for object detection from point clouds. In Proceedings of the IEEE/CVF Conference on

Computer Vision and Pattern Recognition (pp. 12697-12705).

LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to

document recognition. Proceedings of the IEEE, 86(11), 2278–2324.

Li, H., Deng, X., & Dai, H. (2007). Structural damage detection using the combination method

of EMD and wavelet analysis. Mechanical Systems and Signal Processing, 21(1), 298-306.

Li, J., Deng, J., & Xie, W. (2015). Damage detection with streamlined structural health

monitoring data. Sensors, 15(4), 8832– 8851.

Li, R., Yuan, Y., Zhang, W., & Yuan, Y. (2018). Unified vision-based methodology for

simultaneous concrete defect detection and geolocalization. Computer-Aided Civil and

Infrastructure Engineering, 33(7), 527–544.

Liang, X. (2019). Image-based post-disaster inspection of reinforced concrete bridge systems

using deep learning with Bayesian optimization. Computer-Aided Civil and Infrastructure

Engineering, 34(5), 415–430.

Liu, D., Bai, R., Wang, R., Lei, Z., & Yan, C. (2019). Experimental study on compressive

buckling behavior of J-stiffened composite panels. Optics and Lasers in Engineering, 120, 31-39.

Liu, J., Yang, X., Lau, S., Wang, X., Luo, S., Lee, C.S., and Ding, L., (2020), Automated

pavement crack detection and segmentation based on two-step convolutional neural network,

Computer-Aided Civil and Infrastructure Engineering, 35:11, 1291-1305.

Liu, Y.F., Nie, X., Fan, J.S., and Liu, X.G. (2020), “Image-based Crack Assessment of Bridge

Piers using Unmanned Aerial Vehicles and 3D Scene Reconstruction,” Computer-Aided Civil and

Infrastructure Engineering, 35:5, 511-529.

Lowe, D. G. (2004). Distinctive image features from scale-invariant key- points. International

Journal of Computer Vision, 60(2), 91–110.

Lu, X., Xu, Y., Tian, Y., Cetiner, B., & Taciroglu, E. (2021). A deep learning approach to

rapid regional post-event seismic damage assessment using time-frequency distributions of ground

motions. Earthquake Engineering & Structural Dynamics, 50(6), 1612– 1627.

Luo, C., Yu, L., Yan, J., Li, Z., Ren, P., Bai, X., Yang, E., and Liu, Y. (2021), Autonomous

detection of damage to multiple steel surfaces from 360° panoramas using deep neural networks,

Computer-Aided Civil and Infrastructure Engineering, 36:12.

Maeda, H., Kashiyama, T., Sekimoto, Y., Seto, T., Omata, H. (2021), Generative Adversarial

Networks for Road Damage Detection, Computer-Aided Civil and Infrastructure Engineering,

36:1, 47-60.
Maeda, M., Matsukawa, K., & Ito, Y. (2014, July). Revision of guideline for post-earthquake

damage evaluation of RC buildings in Japan. In Tenth US National Conference on Earthquake

Engineering, Frontiers of Earthquake Engineering. Anchorage, Alaska, 21–25.

Mandal, H., Bera, S. K., Saha, S., Sadhu, P. K., & Bera, S. C. (2018). Study of a modified

LVDT type displacement transducer with unlimited range. IEEE Sensors Journal, 18(23), 9501-

9514.

Martin, S., Stefan, M., & Karl, A. (2018). Complex-yolo: real-time 3d object detection on

point clouds. In Computer vision and pattern recognition. arXiv:1803.06199.

Masi, A., Danisi, A., Losito, R., Martino, M., & Spiezia, G. (2011). Study of magnetic

interference on an LVDT: FEM modeling and experimental measurements. Journal of

Sensors, 2011.

MathWorks. (2021). Computer vision toolbox: User’s guide (R2021a).

https://www.mathworks.com/help/vision/ref/estimategeometrictransform.html

MATLAB. (2021). Version 9.10.0 (R2021a). The MathWorks Inc.

Nixon, M., & Aguado, A. (2019). Feature extraction and image processing for computer vision. Academic Press.

McCulloch, W. S., & Pitts, W. (1943). A logical calculus of ideas immanent in nervous

activity. Bulletin of Mathematical Biophysics, 5, 115– 133.

Miao, Z., Ji, X., Okazaki, T., Takahashi, N. (2021), Pixel-level multi-category detection of

visible seismic damage of reinforced concrete components, Computer-Aided Civil and

Infrastructure Engineering, 36:5, 620-637.

Micheletti, N., Chandler, J. H., & Lane, S. N. (2015). Investigating the geomorphological

potential of freely available and accessible structure‐from‐motion photogrammetry using a

smartphone. Earth Surface Processes and Landforms, 40(4), 473-486.

Mitrani-Resier, J., Wu, S., & Beck, J. L. (2016). Virtual Inspector and its application to

immediate pre-event and post-event earthquake loss and safety assessment of buildings. Natural

Hazards, 81(3), 1861– 1878.

Mizoguchi, T., Koda, Y., Iwaki, I., Wakabayashi, H., Kobayashi, Y., Shirai, K., ... & Lee, H.

S. (2013). Quantitative scaling evaluation of concrete structures based on terrestrial laser

scanning. Automation in construction, 35, 263-274.

Moehle, J., & Deierlein, G. G. (2004, August). A framework methodology for performance-

based earthquake engineering. In 13th world conference on earthquake engineering (Vol. 679).

Vancouver: WCEE.

Muja, M., & Lowe, D. G. (2009). Fast approximate nearest neighbors with automatic

algorithm configuration. VISAPP (1), 2(331-340), 2.

Mukhopadhyay, S. C. (Ed.). (2011). New developments in sensing technology for structural

health monitoring (Vol. 96). Springer Science & Business Media.

Nakano, Y., Maeda, M., Kuramoto, H., & Murakami, M. (2004, August). Guideline for post-

earthquake damage evaluation and rehabilitation of RC buildings in Japan. In 13th World

Conference on Earthquake Engineering (No. 124).

Ni, F., Zhang, J., and Noori, M.N. (2020), Deep Learning for Data Anomaly Detection and

Data Compression of a Long-span Suspension Bridge, Computer-Aided Civil and Infrastructure

Engineering, 35:7, 685-700.

Nister, D., & Stewenius, H. (2006, June). Scalable recognition with a vocabulary tree. In 2006

IEEE Computer Society Conference on Computer Vision and Pattern Recognition

(CVPR'06) (Vol. 2, pp. 2161-2168). Ieee.

Pan, X., & Yang, T. Y. (2020). Postdisaster image-based damage detection and repair cost

estimation of reinforced concrete buildings using dual convolutional neural networks. Computer-

Aided Civil and Infrastructure Engineering, 35(5), 495–510. https://doi.org/10.1111/mice.12549

Pan, X., & Yang, T. Y. (2021). Image-based monitoring of bolt loosening through deep-

learning-based integrated detection and tracking. Computer‐Aided Civil and Infrastructure

Engineering, 1–16. https://doi.org/10.1111/mice.12797

Pan, X., & Yang, T. Y. (2022). 3D vision‐based out‐of‐plane displacement quantification

for steel plate structures using structure ‐ from ‐ motion, deep learning, and point ‐ cloud

processing. Computer ‐ Aided Civil and Infrastructure Engineering.

https://doi.org/10.1111/mice.12906

Park, H. G., Kwack, J. H., Jeon, S. W., Kim, W. K., & Choi, I. R. (2007). Framed steel plate

wall behavior under cyclic lateral loading. Journal of structural engineering, 133(3), 378-388.

Park, H. S., Lee, H. M., Adeli, H., & Lee, I. (2007). A new approach for health monitoring of

structures: terrestrial laser scanning. Computer‐Aided Civil and Infrastructure Engineering, 22(1),

19-30.

Park, J. H, Kim, T., & Kim, J. (2015). Image-based bolt-loosening detection technique of bolt

joint in steel bridges. 6th International Conference on Advances in Experimental Structural

Engineering 11th International Workshop on Advanced Smart Materials and Smart Structures

Technology, Champaign, IL (pp. 1–2).

Park, J. H., Huynh, T. C., Choi, S. H., & Kim, J. T. (2015). Vision-based technique for bolt-

loosening detection in wind turbine tower. Wind and Structures, 21(6), 709–726.

Park, S. W., Park, H. S., Kim, J. H., & Adeli, H. (2015). 3D displacement measurement model

for health monitoring of structures using a motion capture system. Measurement, 59, 352-362.

Peeters, B., Maeck, J., & De Roeck, G. (2001). Vibration-based damage detection in civil

engineering: excitation sources and temperature effects. Smart materials and Structures, 10(3),

518.

Ph Papaelias, M., Roberts, C., & Davis, C. L. (2008). A review on non-destructive evaluation

of rails: state-of-the-art and future development. Proceedings of the Institution of Mechanical

Engineers, Part F: Journal of Rail and rapid transit, 222(4), 367-384.

Pieraccini, M. (2013). Monitoring of civil infrastructures by interferometric radar: A

review. The Scientific World Journal, 2013.

Ramana, L., Choi, W., & Cha, Y. J. (2017). Automated vision-based loosened bolt detection

using the cascade detector. In C. Walber, E. Wee Sit, P. Walter, & S. Seidlitz (Eds.), Sensors and

instrumentation (Vol. 5, pp. 23–28). Springer.


Ramana, L., Choi, W., & Cha, Y. J. (2019). Fully automated vision-based loosened bolt detection using the Viola–Jones algorithm. Structural Health Monitoring, 18(2), 422–434.

Redmon, J. & Farhadi, A. (2018). Yolov3: An incremental improvement. arXiv preprint

arXiv:1804.02767.

Redmon, J., & Farhadi, A. (2017). YOLO9000: Better, faster, stronger. In Proceedings of the

IEEE Conference on Computer Vision and Pattern Recognition, 7263–7271.

Redmon, J., Divvala, S., Girshick, R., & Farhadi, A. (2016). You only look once: Unified,

real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and

Pattern Recognition, 779– 788.

Ren, S., He, K., Girshick, R., & Sun, J. (2015). Faster R-CNN: Towards real-time object

detection with region proposal net- works. Advances in Neural Information Processing Systems,

28, 91–99.

Ren, S., He, K., Girshick, R., & Sun, J. (2017). Faster R-CNN: Towards real-time object

detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine

Intelligence, 39(6), 1137–1149.

Rosenblatt, F. (1958). The perceptron: A probabilistic model for information storage and

organization in the brain. Psychological Review, 65, 386–408.

Rumelhart, D., Hinton, G., & Williams, R. (1986). Learning representations by back-

propagating errors. Nature, 323, 533–536.

Sabouri-Ghomi, S., Ventura, C. E., & Kharrazi, M. H. (2005). Shear analysis and design of

ductile steel plate walls. Journal of Structural Engineering, 131(6), 878-889.

Sahoo, D. R., Singhal, T., Taraithia, S. S., & Saini, A. (2015). Cyclic behavior of shear-and-

flexural yielding metallic dampers. Journal of Constructional Steel Research, 114, 247-257.

Sajedi, S. O., & Liang, X. (2021). Uncertainty-assisted deep vision structural health

monitoring. Computer-Aided Civil and Infrastructure Engineering, 36(2), 126–142.

Salawu, O. S. (1997). Detection of structural damage through changes in frequency: a

review. Engineering structures, 19(9), 718-723.

Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., & Chen, L. C. (2018). MobileNetV2:

Inverted residuals and linear bottlenecks. In Proceed- ings of the IEEE Conference on Computer

Vision and Pattern Recog- nition, 4510–4520.

Schonberger, J. L., & Frahm, J. M. (2016). Structure-from-motion revisited. In Proceedings

of the IEEE conference on computer vision and pattern recognition (pp. 4104-4113).

Sevillano, E., Sun, R., & Perera, R. (2016). Damage detection based on power dissipation

measured with PZT sensors through the combi- nation of electro-mechanical impedances and

guided waves. Sensors, 16(5), 639.

Shi, J., & Tomasi, C. (1994). Good features to track. In IEEE Conference on Computer Vision and Pattern Recognition, 593–600.

Sim, C., Laughery, L., Chiou, T. C., & Weng, P. (2018). 2017 Pohang Earthquake:

Reinforced concrete building damage survey. Retrieved from

https://datacenterhub.org/resources/14728
Sim, C., Song, C., Skok, N., Irfanoglu, A., Pujol, S., & Sozen, M. (2015). Database of low-

rise reinforced concrete buildings with earth- quake damage. Retrieved from

https://datacenterhub.org/dv_dibbs/ view/1012:dibbs/experiments_dv/

Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale

image recognition. arXiv preprint arXiv:1409.1556.

Singer, J., Arbocz, J., & Weller, T. (2002). Buckling experiments: experimental methods in

buckling of thin-walled structures, volume 2: shells, built-up structures, composites and additional

topics (Vol. 2). John Wiley & Sons.

Sinha, S. K., Fieguth, P. W., & Polak, M. A. (2003). Computer vision techniques for automatic

structural assessment of underground pipes. Computer ‐ Aided Civil and Infrastructure

Engineering, 18(2), 95-112.

Sirca Jr, G. F., & Adeli, H. (2018). Infrared thermography for detecting defects in concrete

structures. Journal of Civil Engineering and Management, 24(7), 508-515.

Soukup, D., & Huber-Mörk, R. (2014). Convolutional neural networks for steel surface defect

detection from photometric stereo images. International Symposium Visual Computing, New

York, NY: Springer International Publishing, 668–677.

Spencer Jr, B. F., Hoskere, V., & Narazaki, Y. (2019). Advances in computer vision-based

civil infrastructure inspection and monitoring. Engineering, 5(2), 199-222.

Stanbridge, A. B., & Ewins, D. J. (1999). Modal testing using a scanning laser Doppler

vibrometer. Mechanical systems and signal processing, 13(2), 255-270.

Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., … Rabinovich, A. (2015).

Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and

Pattern Recognition, 1–9.

Ta, Q. B., & Kim, J. T. (2020). Monitoring of corroded and loosened bolts in steel structures

via deep learning and hough transforms. Sensors, 20(23), 6888.

The Globe and Mail. (2016). Municipalities not spending enough to maintain infrastructure:

review. https://www.theglobeandmail.com/news/politics/municipalities-not-spending-enough-to-

maintaininfrastructure-review/article28234692/.

Tomasi, C., & Kanade, T. (1991). Detection and tracking of point. Inter- national Journal of

Computer Vision, 9, 137–154.

Tong, J. Z., Guo, Y. L., & Zuo, J. Q. (2018). Elastic buckling and load-resistant behaviors of

double-corrugated-plate shear walls under pure in-plane shear loads. Thin-Walled Structures, 130,

593-612.

Tong, Z., Yuan, D., Gao, J., and Wang, Z. (2020), Pavement defect detection with fully

convolutional network with an uncertainty framework, Computer-Aided Civil and Infrastructure

Engineering, 35:8, 832-849.

Torr, P. H., & Zisserman, A. (2000). MLESAC: A new robust estimator with application to

estimating image geometry. Computer Vision and Image Understanding, 78(1), 138–156.

Triggs, B., McLauchlan, P. F., Hartley, R. I., & Fitzgibbon, A. W. (1999, September). Bundle

adjustment—a modern synthesis. In International workshop on vision algorithms (pp. 298-372).

Springer, Berlin, Heidelberg.

Turner, J., & Pretlove, A. J. (1991). Acoustics for engineers. Macmillan International Higher

Education.

Vamvoudakis-Stefanou, K. J., Sakellariou, J. S., & Fassois, S. D. (2018). Vibration-based

damage detection for a population of nominally identical structures: Unsupervised Multiple Model

(MM) statistical time series type methods. Mechanical Systems and Signal Processing, 111, 149-

171.

Van Genechten, B. (2008). Theory and practice on Terrestrial Laser Scanning: Training

material based on practical applications. Universidad Politecnica de Valencia Editorial; Valencia,

Spain.

Vetrivel, A., Gerke, M., Kerle, N., Nex, F. C., & Vosselman, G. (2018). Disaster damage

detection through synergistic use of deep learning and 3D point cloud features derived from very

high resolution oblique aerial images, and multiple-kernel-learning, ISPRS Journal of Pho-

togrammetry and Remote Sensing, 140, 45–59.

Vigh, L. G., Deierlein, G. G., Miranda, E., Liel, A. B., & Tipping, S. (2013). Seismic

performance assessment of steel corrugated shear wall system using non-linear analysis. Journal

of Constructional Steel Research, 85, 48-59.

Wang, N., Zhao, Q., Li, S., Zhao, X., & Zhao, P. (2018). Damage classification for masonry

historic structures using convolutional neural networks based on still images. Computer‐Aided

Civil and Infrastructure Engineering, 33(12), 1073-1089.

Wang, N., Zhao, X., Zhao, P., Zhang, Y., Zou, Z., & Ou, J. (2019). Automatic damage

detection of historic masonry buildings based on mobile deep learning. Automation in

Construction, 103, 53-66.

Wang, T., Song, G., Liu, S., Li, Y., & Xiao, H. (2013). Review of bolted connection

monitoring. International Journal of Distributed Sensor Networks, 9(12), 871213.

Westoby, M. J., Brasington, J., Glasser, N. F., Hambrey, M. J., & Reynolds, J. M. (2012).

‘Structure-from-Motion’photogrammetry: A low-cost, effective tool for geoscience

applications. Geomorphology, 179, 300-314.

Wu, R. T., & Jahanshahi, M. R. (2020). Data fusion approaches for structural health

monitoring and system identification: past, present, and future. Structural Health

Monitoring, 19(2), 552-586.

Xia, Y., Chen, B., Weng, S., Ni, Y. Q., & Xu, Y. L. (2012). Temperature effect on vibration

properties of civil structures: a literature review and case studies. Journal of Civil Structural Health

Monitoring, 2(1), 29–46.

Xiong, B., Jancosek, M., Elberink, S. O., & Vosselman, G. (2015). Flexible building

primitives for 3D building modeling. ISPRS Journal of Photogrammetry and Remote

Sensing, 101, 275-290.

Xu, J., Gui, C., and Han, Q. (2020), Recognition of rust grade and rust ratio of steel structures

based on ensembled convolutional neural network, Computer-Aided Civil and Infrastructure

Engineering, 35:10, 1160-1174.

Xue, Y. D., & Li, Y. C. (2018). A fast detection method via region-based fully convolutional

neural networks for shield tunnel lining defects. Computer-Aided Civil and Infrastructure

Engineering, 33(8), 638– 654.

Yang, J., & Chang, F. K. (2006). Detection of bolt loosening in C– C composite thermal

protection panels: II. Experimental verification. Smart Materials and Structures, 15(2), 591.
Yang, T. Y., Banjuradja, W., Etebarian, H., & Tobber, L. (2021). Numerical modeling of

welded wide flange fuses. Engineering Structures, 238, 112181.

Yang, T. Y., Li, T., Tobber, L., & Pan, X. (2019). Experimental Test of Novel Honeycomb

Structural Fuse. ce/papers, 3(3-4), 451-456. https://doi.org/10.1002/cepa.1082

Yang, T. Y., Li, T., Tobber, L., & Pan, X. (2020). Experimental and numerical study of

honeycomb structural fuses. Engineering Structures, 204, 109814.

https://doi.org/10.1016/j.engstruct.2019.109814

Yang, T. Y., Moehle, J., Stojadinovic, B., & Der Kiureghian, A. (2009). Seismic performance

evaluation of facilities: Methodology and implementation. Journal of Structural Engineering,

135(10), 1146–1154.

Yang, Y., Dorn, C., Mancini, T., Talken, Z., Kenyon, G., Farrar, C., & Mascareñas, D. (2017).

Blind identification of full-field vibration modes from video measurements with phase-based video

motion magnification. Mechanical Systems and Signal Processing, 85, 567-590.

Yeum, C. M., & Dyke, S. J. (2015). Vision-based automated crack detec- tion for bridge

inspection. Computer-Aided Civil and Infrastructure Engineering, 30(10), 759–770.

Yeum, C. M., Dyke, S. J., Ramirez, L., & Benes, B. (2016). Big visual data analysis for

damage evaluation in civil engineering. In International Conference on Smart Infrastructure and

Construction, Cambridge, UK, June 27–29.

Yi, J., Gil, H., Youm, K., & Lee, H. (2008). Interactive shear buckling behavior of

trapezoidally corrugated steel webs. Engineering structures, 30(6), 1659-1666.

Yoon, H., Elanwar, H., Choi, H., Golparvar‐Fard, M., & Spencer Jr, B. F. (2016). Target‐free

approach for vision‐based structural system identification using consumer‐grade

cameras. Structural Control and Health Monitoring, 23(12), 1405-1416.

Yuen, K. V., & Lam, H. F. (2006). On the complexity of artificial neural networks for smart

structures monitoring. Engineering Structures, 28(7), 977-984.

Yun, J. P., Kim, D., Kim, K., Lee, S. J., Park, C. H., & Kim, S. W. (2017). Vision-based

surface defect inspection for thick steel plates. Optical Engineering, 56(5), 053108.

Zhang, A., Wang, K. C. P., Li, B., Yang, E., Dai, X., Peng, Y., … Chen, C. (2017). Automated

pixel-level pavement crack detection on 3D asphalt surfaces using a deep-learning network.

Computer-Aided Civil and Infrastructure Engineering, 32(10), 805–819.

Zhang, C., Chang, C., and Jamshidi, M. (2020). “Concrete bridge surface damage detection

using a single‐stage detector.” Computer-Aided Civil and Infrastructure Engineering, 35(4),

389–409.

Zhang, C., Zhang, Z., & Shi, J. (2012). Development of high deformation capacity low yield

strength steel shear panel damper. Journal of Constructional Steel Research, 75, 116-130.

Zhang, Y., & Yuen, K. V. (2021). Crack detection using fusion features-based broad learning system and image processing. Computer-Aided Civil and Infrastructure Engineering, 36(12).

Zhang, Y., & Lin, W. (2022). Computer‐vision‐based differential remeshing for updating the

geometry of finite element model. Computer‐Aided Civil and Infrastructure Engineering, 37(2),

185-203.

Zhang, Y., & Yuen, K. V. (2022). Bolt damage identification based on orientation-aware

center point estimation network. Structural Health Monitoring, 21(2), 438-450.

Zhang, Y., Sun, X., Loh, K. J., Su, W., Xue, Z., & Zhao, X. (2020). Autonomous bolt

loosening detection using deep learning. Structural Health Monitoring, 19(1), 105-122.

Zhao, J., Bao, Y., Guan, Z., Zuo, W., Li, J., & Li, H. (2019). Video‐based multiscale

identification approach for tower vibration of a cable‐stayed bridge model under earthquake

ground motions. Structural Control and Health Monitoring, 26(3), e2314.

Zhao, X., Tootkaboni, M., & Schafer, B. W. (2015). Development of a laser-based geometric

imperfection measurement platform with application to cold-formed steel

construction. Experimental Mechanics, 55(9), 1779-1790.

Zhao, X., Zhang, Y., & Wang, N. (2019). Bolt loosening angle detection technology using

deep learning. Structural Control and Health Monitoring, 26(1), e2292.

Appendices

Appendix A Data collection methods

A.1 Manual data collection

This section describes the possible data collection methods in current inspection practices.

Currently, manual inspection is widely used in the field. For example, in the case of a typical

bridge inspection, specialized trucks and supporting mechanical arms are required during the

process. In addition, ropes are required to allow the inspector to conduct a more detailed

investigation of some portions of the bridges. In this case, an inspector can use measurement

devices and cameras to capture critical damage information, which will be reported in the site

inspection survey. This illustrates that manual inspection approaches are not only relatively inefficient and prone to causing traffic shutdowns, but also impose life-safety risks on the inspectors.

A.2 Robot-based data collection

The current manual inspection practices are time-consuming, inherently subjective, highly dependent on the proper training of the inspectors, and pose a threat to the life safety of the

inspectors, particularly in a post-disaster scenario. Currently, autonomous data collection for

structural damage inspection is an active area of research. Rapid data collection and analysis right

after disasters is particularly important to provide valuable inputs to decision-makers to make

informed risk management decisions. Therefore, this section presents potential robotic solutions

(i.e., UAVs and UGVs) to facilitate the data collection process. The deployment of UAV fleets and UGVs on site is a future trend for efficient and safe data collection, especially for large-scale inspections, which are labor-intensive, time-consuming, and expensive when conducted manually.

UAVs (or drones) can fly freely in many directions and at different altitudes, where the number of obstacles is generally smaller than on the ground.

With the rapid development of consumer-grade drones by companies such as DJI and Parrot, it is

more feasible for engineers and researchers to embrace such advanced technologies at a more

affordable cost. In addition, the robotics research community has made great efforts, in collaboration with industrial partners, to create programmable drones with a supporting software development kit (SDK), which allows researchers from different disciplines to collaborate on a more

integrated project. For example, with the SDK, state-of-the-art computer vision algorithms

developed by computer vision researchers can be integrated with drone technologies, such as novel

Visual Simultaneous Localization and Mapping (VSLAM) algorithms, to facilitate more autonomous navigation and data collection with little or even no human control. The

figure below depicts an example of a programmable drone, ANAFI AI, developed by Parrot. The

drone can reach a maximum speed of 17 m/s forward and 16 m/s backward and laterally, with a wind resistance of 14 m/s. The drone has a built-in front camera setup consisting of a stereo camera pair and an RGB camera, which can rotate about three axes at speeds of 300°/s on the pitch and roll axes and 200°/s on the yaw axis. The stereo cameras can be effectively used for mapping and 3D vision-based scene reconstruction, while the RGB camera provides high-resolution images that can be processed by state-of-the-art vision algorithms. In addition, the

built-in time-of-flight (ToF) sensor measures the ground distance, which can be used to calibrate

and refine the 3D reconstruction of structures. For indoor flights, where the GPS signal is typically unavailable, measurements from the built-in ToF sensor, the vertical camera, and the IMU are fused to provide a more reliable estimate of velocity and ground distance. Under low-light conditions, the built-in LED lights beside the vertical camera allow the vision algorithms to operate more effectively, ensuring more stable navigation. In

addition, the drone has 4G connectivity and can switch automatically between WIFI and 4G for

wireless data transmission. This significantly extends its applicability and operating range compared with other drones that only support WIFI or other local area networks. In short, the hardware

configuration, the SDK, and the resources from the open-source communities make the ANAFI

AI drone a strong candidate for UAV-based structural damage detection, such as bridge damage

inspection, or rapid post-disaster site condition assessment.

Figure. ANAFI drone (from Parrot)


The figure below shows a micro-UAV, Tello, from DJI. The advantage of this drone is its small size, which makes it easier to pass through narrow passages of civil structures. Although it currently has a limited battery capacity, it is programmable, which makes it ideal for developing and validating drone-related algorithms under laboratory conditions.

Figure. Tello drone (from DJI)
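As an illustration of this programmability, the following is a minimal sketch of a scripted data-collection flight, assuming the open-source djitellopy Python wrapper; the flight distances, rotation angles, and file names are placeholders chosen for illustration only.

# Minimal scripted flight sketch for a Tello drone (assumes the open-source
# djitellopy wrapper; distances, angles, and file names are illustrative only).
import cv2
from djitellopy import Tello

tello = Tello()
tello.connect()                              # connect to the drone over WIFI
print("Battery:", tello.get_battery(), "%")

tello.streamon()                             # start the onboard video stream
frame_reader = tello.get_frame_read()

tello.takeoff()
for i in range(4):                           # capture images from four viewpoints
    tello.move_forward(50)                   # move forward 50 cm (placeholder distance)
    cv2.imwrite("view_%d.png" % i, frame_reader.frame)
    tello.rotate_clockwise(90)               # turn towards the next viewpoint
tello.land()
tello.end()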

On the other hand, in recent years, UGVs have gained more traction in both research development and field applications. For civil engineering applications, one of the UGV

candidates is the Husky UGV. The Husky UGV is a medium-sized ground robot which has a

relatively large payload to accommodate more hardware units than UAVs in general. Users can

install different types of manipulators such as robotic arms, or sensors such as RGB cameras, stereo
cameras, LiDAR, GPS, and IMU, onto the Husky robot. The diversity of sensors carried by this robot allows various types of measurements reflecting structural health conditions to be recorded. The sensor data can be processed by state-of-the-art structural damage evaluation algorithms developed in the vision-based SHM and vibration-based SHM domains. This will facilitate a more comprehensive structural damage evaluation. Moreover, the robot is fully supported by the Robot Operating System (ROS), an open-source robotics development platform. Using

the stereo camera or LiDAR sensors, SLAM algorithms can be effectively developed on ROS, together with the control algorithms for the installed manipulators. This will

greatly enhance the level of autonomous navigation and data collection. In short, the diversity of

the hardware configuration, the full support in ROS, and the additional resources from the open-

source communities make the Husky robot a strong candidate for UGV-based structural damage

detection, such as road or bridge damage inspection.
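To illustrate how such sensor data can be accessed on a ROS-based UGV, the following is a minimal sketch of a ROS 1 (rospy) node that subscribes to an RGB camera topic and periodically saves frames for later damage evaluation; the topic name and the saving interval are assumptions that depend on the actual sensor drivers installed.

# Minimal rospy sketch: subscribe to an RGB camera topic on a ROS-based UGV
# and periodically save frames for offline damage evaluation.
# The topic name "/camera/image_raw" is an assumption; it depends on the camera driver.
import cv2
import rospy
from cv_bridge import CvBridge
from sensor_msgs.msg import Image

bridge = CvBridge()
frame_count = 0

def image_callback(msg):
    global frame_count
    frame = bridge.imgmsg_to_cv2(msg, desired_encoding="bgr8")
    if frame_count % 30 == 0:                # keep roughly one frame per second at 30 fps
        cv2.imwrite("ugv_frame_%06d.png" % frame_count, frame)
    frame_count += 1

if __name__ == "__main__":
    rospy.init_node("damage_data_collector")
    rospy.Subscriber("/camera/image_raw", Image, image_callback, queue_size=1)
    rospy.spin()                             # process callbacks until shutdown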

In addition, the development of robot dogs has gained great momentum in recent years. Currently, most applications of robot dogs are for entertainment, security, and public safety purposes, while applications of robot dogs for structural damage evaluation remain scarce. The author of this dissertation believes robot dogs can be effectively used for structural damage detection in the near future. For example, the Go 1 robot dog, developed by Unitree Robotics, is a potential candidate. This robot dog has a so-called super sensory system (five sets of fish-eye stereo depth cameras and three sets of ultrasonic sensors), which gives the robot full view coverage. It has built-in AI processing units which can reach a total computational

have full view coverage. It has built-in AI processing units which can reach a total computational

power of 1.5 TFLOPS. The robot can also be equipped with LiDAR, which can be used together

with the advanced camera system to allow more reliable and accurate autonomous positioning and

obstacle avoidance both indoors and outdoors. The robot can also be equipped with manipulators

such as a robotic arm to perform different active control tasks. In addition, the Go 1 robot can be

configured to have both 4G and WIFI connectivity for efficient wireless data transmission. Lastly,

the educational version of the Go 1 robot dog provides multiple programming APIs, which allow researchers to further develop the robot dog for specific applications. For example, the Go 1 robot

dog can be potentially developed for autonomous indoor damage inspection of buildings. The

robot dog can climb stairs, and the robotic arm can be programmed to open doors. These two

combined features make it superior to many of the existing wheeled ground robots and drones.

The advanced sensory system allows the robot dog to perform SLAM and damage data collection simultaneously, both indoors and outdoors, during a building damage assessment. A pretrained AI model can be deployed to the robot dog using the Python programming API provided, such that real-time local image processing can potentially be achieved using the built-in computational units, without a separate computer. When a large amount of data (e.g., a dense point cloud from the 3D reconstruction of civil structures) cannot be processed locally, the data can be streamed wirelessly via the built-in 4G or WIFI connectivity to powerful computers or cloud computing platforms for further processing.
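As a sketch of the kind of onboard processing described above, the snippet below runs a pretrained image classifier on frames from a camera stream. A generic torchvision model is used here only as a stand-in for the damage classification networks developed in this research; a recent torchvision release is assumed, and the camera index is a placeholder.

# Sketch of onboard inference on streamed camera frames. A generic torchvision
# classifier stands in for the damage classification networks; the camera index
# is a placeholder and a recent torchvision release is assumed.
import cv2
import torch
from torchvision import models, transforms

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.eval()

preprocess = transforms.Compose([
    transforms.ToPILImage(),
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

cap = cv2.VideoCapture(0)                    # placeholder camera index
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    with torch.no_grad():
        logits = model(preprocess(rgb).unsqueeze(0))
    print("Predicted class index:", int(logits.argmax(dim=1)))
cap.release()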

It is envisioned that in the future, robotic technologies such as UAVs and UGVs will be

gradually adopted into civil infrastructure inspection and maintenance practices, with the

continuous development of hardware and software, and a decrease in their cost.

Appendix B Pseudocodes

In this section, pseudocodes are presented to provide an additional illustration of the detailed algorithmic implementation of some key algorithms and methods developed and validated in this research.

The CNN-based classification, object detection, and segmentation algorithms developed in this research or adopted from other studies are not presented here. Guidelines for the development, optimization, fine-tuning, and deployment of these algorithms are available on many open-source machine learning/deep learning platforms such as PyTorch and TensorFlow. Similarly, the development and implementation of the 3D reconstruction pipeline can be found in the user manuals of Meshroom and Metashape.

Pseudocode for concrete spalling quantification for cuboid RC columns

% Concrete spalling quantification procedures

% Plane-based 3D cuboid recovery method

% Alpha shape-based volume estimation

% Input

Require: Ground (reference) plane

Require: Distance threshold for plane segmentation algorithm

Require: The point cloud of the damaged RC columns

% Search for ground plane and side surface planes of the RC columns

While the number of side planes detected < 4 or the ground plane is not found
    Apply plane segmentation algorithm
    Check the similarity between the segmented plane and the ground plane
    If the segmented plane normal and the ground plane normal are close enough
        If the distance between the two planes is small enough
            Ground plane is detected
            Save the fitted plane information and the inlier points fitting the plane
            Remove all the inlier points fitting this plane from the point cloud
        End if
    Else if the segmented plane is orthogonal to the ground plane
        Side surface plane is detected
        Save the fitted plane information and the inlier points fitting the plane
        Remove all the inlier points fitting this plane from the point cloud
    Else
        Plane detected is not of interest and is discarded
    End if
End while

% Search for the top surface plane of the RC columns

While the number of points fitting the candidate top plane < the predefined threshold
    Determine the furthest remaining point with respect to the ground plane
    Establish a plane that is parallel to the ground plane and passes through this point
    Count the number of points (i.e., plane inliers) that fit this plane % the inlier points are
        defined as the points having a point-to-plane distance less than a certain distance threshold
    If the number of points fitting this plane > the predefined threshold
        This plane is considered the top surface plane of the RC column
        Save the plane information % top surface plane found
    Else
        This point is considered noise and is removed from further consideration
    End if
End while

% Recover 3D geometry

Recover 3D geometry using all the detected planes

% Determine the volume of the damaged RC column

Volume of the damaged RC column = Volume of the alpha shape of the point cloud

% Output (Quantification of the concrete spalling volume)

Concrete spalling volume = Volume of the reconstructed 3D cuboid - Volume of the damaged RC column
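The plane-related steps above can be realized with off-the-shelf point cloud libraries. The snippet below is a minimal sketch, assuming the open-source Open3D library, of the iterative RANSAC plane segmentation and the alpha-shape volume estimation; the file name, distance threshold, and alpha value are placeholders, and the classification of each fitted plane as ground, side, or irrelevant follows the pseudocode above.

# Minimal sketch (assuming Open3D) of iterative RANSAC plane segmentation and
# alpha-shape volume estimation for a damaged RC column point cloud.
# The file name, distance threshold, and alpha value are placeholders.
import open3d as o3d

pcd = o3d.io.read_point_cloud("damaged_column.ply")

planes = []
remaining = pcd
for _ in range(5):                                    # ground plane + up to four side planes
    model, inlier_idx = remaining.segment_plane(distance_threshold=0.02,
                                                ransac_n=3,
                                                num_iterations=1000)
    planes.append((model, remaining.select_by_index(inlier_idx)))
    remaining = remaining.select_by_index(inlier_idx, invert=True)
    # Classifying the fitted plane as ground, side, or irrelevant (via its normal
    # direction and plane-to-plane distance) follows the pseudocode above.

# Alpha-shape volume of the damaged column (alpha controls surface tightness).
mesh = o3d.geometry.TriangleMesh.create_from_point_cloud_alpha_shape(pcd, alpha=0.05)
if mesh.is_watertight():
    damaged_volume = mesh.get_volume()
    print("Damaged column volume:", damaged_volume)
# Spalling volume = volume of the intact cuboid recovered from the detected
# ground/side/top planes minus damaged_volume.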

Pseudocode for concrete spalling quantification for circular-shape RC columns

% Concrete spalling quantification procedures

% Circle-based 3D cylinder recovery method

% Alpha shape-based volume estimation

% Input

Require: Ground (reference) plane

Require: Distance threshold for circle fitting algorithm

Require: Distance threshold for plane segmentation algorithm

Require: Step size for point cloud sampling
Require: The point cloud of the damaged RC columns

% Search for ground plane

While the ground plane is not found
    Apply plane segmentation algorithm
    Check the similarity between the segmented plane and the ground plane
    If the segmented plane normal and the ground plane normal are close enough
        If the distance between the two planes is small enough
            Ground plane is detected
            Save the fitted plane information and the inlier points fitting the plane
            Remove all the inlier points fitting this plane from the point cloud
        End if
    Else
        Plane detected is not of interest and is discarded
    End if
End while

% Search for the top surface plane of the RC columns

While the number of points fitting the candidate top plane < the predefined threshold
    Determine the furthest remaining point with respect to the ground plane
    Establish a plane that is parallel to the ground plane and passes through this point
    Count the number of points (i.e., plane inliers) that fit this plane % the inlier points are
        defined as the points having a point-to-plane distance less than a certain distance threshold
    If the number of points fitting this plane > the predefined threshold
        This plane is considered the top surface plane of the RC column
        Save the plane information % top surface plane found
    Else
        This point is considered noise and is removed from further consideration
    End if
End while

% Search for the cross-sectional radius of the RC columns

% Initialize the stepping position (at the lower or upper bound of the point cloud along one selected axis)
Lower stepping position = initial stepping position
While the stepping position is inside the range of the point cloud:
    Upper stepping position = Lower stepping position + step size
    Stepping range = [Lower stepping position, Upper stepping position]
    Sample a sub-cloud using the stepping range
    Project the sub-cloud points onto the ground reference plane
    Fit a circle to the projected points using the RANSAC algorithm
    Store the fitted circle information
    Reset the lower stepping position to the upper stepping position
End while
Discretize the range of fitted radii into multiple bins
Count the fitted radii falling into each bin and rank the bins by their counts
Cross-sectional radius = Fitted radius of the highest-ranked bin

% Recover 3D geometry

Recover 3D geometry using the detected plane and the fitted cross-sectional radius

% Determine the volume of the damaged RC column

Volume of the damaged RC column = Volume of the alpha shape of the point cloud

% Output (Quantification of the concrete spalling volume)

Concrete spalling volume = Volume of the reconstructed 3D cylinder - Volume of the damaged RC column
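The circle-fitting step above has no single standard library call; the following is a minimal NumPy sketch of a RANSAC circle fit applied to the projected 2D points of one sub-cloud slice, using assumed values for the iteration count and inlier distance threshold.

# Minimal NumPy sketch of RANSAC circle fitting on the 2D projected points of one
# sub-cloud slice. The iteration count and inlier distance threshold are assumed values.
import numpy as np

def fit_circle_3pts(p1, p2, p3):
    # Circle through three points via the perpendicular-bisector linear system.
    A = np.array([[p2[0] - p1[0], p2[1] - p1[1]],
                  [p3[0] - p1[0], p3[1] - p1[1]]], dtype=float)
    b = 0.5 * np.array([p2 @ p2 - p1 @ p1, p3 @ p3 - p1 @ p1])
    center = np.linalg.solve(A, b)
    return center, np.linalg.norm(p1 - center)

def ransac_circle(points_2d, n_iters=500, dist_thresh=0.003):
    best_inliers, best_fit = 0, None
    rng = np.random.default_rng(0)
    for _ in range(n_iters):
        idx = rng.choice(len(points_2d), size=3, replace=False)
        try:
            center, radius = fit_circle_3pts(*points_2d[idx])
        except np.linalg.LinAlgError:        # collinear sample, skip
            continue
        residual = np.abs(np.linalg.norm(points_2d - center, axis=1) - radius)
        n_inliers = int((residual < dist_thresh).sum())
        if n_inliers > best_inliers:
            best_inliers, best_fit = n_inliers, (center, radius)
    return best_fit

# Example: recover the radius of a 0.25 m column slice from noisy synthetic points.
theta = np.linspace(0.0, 2.0 * np.pi, 200)
pts = 0.25 * np.c_[np.cos(theta), np.sin(theta)] + np.random.normal(0, 0.001, (200, 2))
center, radius = ransac_circle(pts)
print("Fitted radius:", radius)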

Pseudocode for the DBPPC method

% DBPPC procedures: Two while loops

% Input

Require: Step size

Require: Distance threshold

Require: The point cloud being clustered

% Initialize the stepping position (at the lower or upper bound of the point cloud along one
selected axis)

Lower stepping position = initial stepping position

% Outer while loop: vertical sampling process
While the stepping position is inside the range of the point cloud:
    Upper stepping position = Lower stepping position + step size
    Stepping range = [Lower stepping position, Upper stepping position]
    Sample a sub-cloud using the stepping range
    Reset the lower stepping position to the upper stepping position
    Select a point close to one side along the longitudinal direction of the plate
    Project the points onto a predefined plane (e.g., the XY plane in this dissertation)
    Determine the distance (after projection) of the selected point to all other points
    Sample the k nearest neighbors of the selected point within the sub-cloud such that the
        point-to-point distance does not exceed the distance threshold
    Assign the sampled points to Cluster 1
    Assign the remaining cloud temporarily to Cluster 2
    Determine the furthest point among the k nearest neighbors of the selected point and set it
        as the newly selected point
    % Inner while loop: horizontal sampling process
    While the sampled points are not empty:
        Sample the k nearest neighbors of the newly selected point from Cluster 2 such that
            the point-to-point distance does not exceed the distance threshold
        If the sampled points are not empty
            Move the sampled points from Cluster 2 to Cluster 1
            Determine the furthest point among the newly sampled points and set it as the
                newly selected point % advance the sampling front across the plate
        End if
    End while
    Assemble the local clusters (of each sub-cloud) into global clusters (of the total cloud)
End while
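For illustration, the following is a minimal sketch of the neighbor-growing step of the DBPPC method for one vertical sub-cloud, assuming a SciPy k-d tree on the projected 2D coordinates; for simplicity it grows the cluster using all neighbors within the distance threshold rather than a fixed number k, and the distance threshold and seed-point choice are placeholders consistent with the pseudocode above.

# Minimal sketch of the DBPPC neighbour-growing step for one vertical sub-cloud.
# Points are projected onto the XY plane and grown from a seed near one side of
# the plate; the distance threshold (in the cloud units) is a placeholder.
import numpy as np
from scipy.spatial import cKDTree

def grow_cluster(sub_cloud_xyz, dist_thresh=0.004):
    xy = sub_cloud_xyz[:, :2]                 # projection onto the XY plane
    seed = int(np.argmin(xy[:, 0]))           # a point close to one side of the plate
    tree = cKDTree(xy)

    in_cluster = np.zeros(len(xy), dtype=bool)
    in_cluster[seed] = True
    frontier = [seed]
    while frontier:                           # horizontal sampling / region growing
        current = frontier.pop()
        for j in tree.query_ball_point(xy[current], r=dist_thresh):
            if not in_cluster[j]:
                in_cluster[j] = True
                frontier.append(j)

    cluster1 = sub_cloud_xyz[in_cluster]      # points on the near surface
    cluster2 = sub_cloud_xyz[~in_cluster]     # remaining points (the other surface)
    return cluster1, cluster2

# Example on synthetic data: two parallel surfaces 10 mm apart along y.
x = np.linspace(0.0, 1.0, 400)
surface1 = np.c_[x, np.zeros(400), np.full(400, 0.05)]
surface2 = np.c_[x, np.full(400, 0.010), np.full(400, 0.05)]
near, far = grow_cluster(np.vstack([surface1, surface2]))
print(len(near), len(far))                    # expected: 400 and 400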

Pseudocode for the YOLO-TDCH method

% Structural bolt loosening quantification

% Front-view bolt localization method to find reference planes

% Bolt looseness quantification

% Input

Require: Vision-based object detector (e.g., pretrained YOLO in this dissertation) for bolt
localization

Require: Step size for point cloud sampling

Require: The point cloud of the bolted component

% Search for reference planes

Apply plane segmentation algorithm until no more principal planes can be found % principal

plane is defined as the plane that has a sufficient number of inlier points fitting the plane

Save all the fitted planes
Project the point cloud onto each fitted plane, which produces multiple rendered images
For each rendered image
    Apply the vision-based object detector to the rendered image
    If bolts are successfully localized
        Reference plane is found
        Save this plane
    End if
End for
Save all the reference planes

% Quantification of bolt loosening length

Establish a cuboid boundary box for each bolt, using the respective detected

bounding boxes and the normal vector of the reference planes identified

For each cuboid boundary box
    % Stepping along the direction of the normal vector of the reference plane
    Lower stepping position = initial stepping position (at the top of the cuboid boundary box)
    While the stepping position is inside the range of the cuboid boundary box:
        Upper stepping position = Lower stepping position + step size
        Stepping range = [Lower stepping position, Upper stepping position]
        Sample a sub-cloud using the stepping range
        Project all sub-cloud points onto the reference plane
        Determine and record the convex hull of the projected points
        Determine and record the convex hull area
        Reset the lower stepping position to the upper stepping position
    End while
End for

% Recognize bolt cap

Use the recorded convex hull information to identify the bolt cap

Use the recorded convex hull information to identify the supporting surface beneath
the bolt

% The above process is repeated for all the small sub-clouds from the top down. The

area of each convex hull will be consistently checked for each cuboid boundary within

each sub-cloud. The bolt loosening length can be estimated considering the two following

criteria. a) If a bolt is tight, there will be a relatively constant convex hull area at the

beginning (i.e., bolt cap region) of the sub-cloud stepping-down process, and then a

sudden increase of the convex hull area when the sub-cloud sampling reaches the

structural surface underneath the bolt. b) If a bolt is loosened, there will be a relatively

constant convex hull area in the first region (i.e., bolt cap region), followed by a sudden

decrease of the convex hull area (i.e., bolt thread region due to loosening) at the

beginning of the second region, and then followed by a sudden increase of the convex

hull area when the sub-cloud sampling reaches the steel surface underneath the bolt. The

plane travel distance within the second region will be determined as the bolt loosening

length.

% Output: Bolt loosening length

Bolt loosening length = Distance between the bottom of the bolt cap and the supporting
surface underneath the bolt
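The convex-hull stepping stage above can be sketched compactly with SciPy, as below; the slice thickness (step) and the assumption that the bolt axis is aligned with the z axis are placeholders, and the bolt localization itself (the pretrained YOLO detector) is not repeated here.

# Sketch of the convex-hull area profile along a bolt axis using SciPy's ConvexHull.
# bolt_cloud is an (N, 3) array of points inside one cuboid boundary box, with the
# z axis assumed to be aligned with the reference-plane normal (placeholder).
import numpy as np
from scipy.spatial import ConvexHull

def hull_area_profile(bolt_cloud, step=0.002):
    z = bolt_cloud[:, 2]
    positions, areas = [], []
    lower = z.max()                          # start at the bolt cap and step downwards
    while lower > z.min():
        mask = (z <= lower) & (z > lower - step)
        pts_2d = bolt_cloud[mask, :2]        # projection of the slice onto the reference plane
        if len(pts_2d) >= 3:
            # For 2D input points, ConvexHull.volume is the enclosed area.
            positions.append(lower)
            areas.append(ConvexHull(pts_2d).volume)
        lower -= step
    return np.array(positions), np.array(areas)

# The loosening length is then read off this profile: the travel distance between the
# end of the (roughly constant) bolt-cap region and the jump in area where the slices
# reach the supporting surface, following the criteria described above.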

Appendix C Additional results

This section presents additional sample results.

C.1 Sample system-level classification

This section presents additional sample system-level testing results, including system-level collapse and non-collapse classification.

Collapse

Non-collapse
C.2 RC structures

This section presents additional sample results for component-level damage state classification and localization, and 3D reconstruction outcomes.
C.2.1 RC component-level damage classification

No damage (six sample figures)

Light damage (five sample figures)

Moderate damage (five sample figures)

Severe damage (eight sample figures)
C.2.2 RC steel exposure detection

C.2.3 3D reconstruction and plane segmentation of RC columns

3D reconstruction of RC column 1

Plane segmentation results of RC column 1

3D reconstruction of RC column 2

Plane segmentation results of RC column 2

3D reconstruction of RC column 3

Plane segmentation results of RC column 3
C.3 Steel plate structures

This section presents sample classification results for the steel plate dampers and steel corrugated plate walls. In addition, it presents additional sample 3D reconstruction results of the steel structures investigated.

C.3.1 Damage classification

Undamaged steel plate dampers; damaged steel plate dampers (due to buckling)

(a) Undamaged steel corrugated plate wall; (b) damaged steel corrugated plate wall (due to buckling)
C.3.2 3D reconstruction of steel solid plate dampers

3D reconstruction of steel solid plate damper 1 (multiple views)

3D reconstruction of steel solid plate damper 2 (multiple views)
C.3.3 3D reconstruction of steel honeycomb plate dampers

3D reconstruction of steel honeycomb damper 1 (multiple views)

3D reconstruction of steel honeycomb damper 2 (multiple views)

3D reconstruction of steel honeycomb damper 3 (multiple views)
C.3.4 3D reconstruction of steel corrugated panels

3D reconstruction of the steel corrugated panel
C.4 Structural bolted components

This section presents additional sample bolt localization results obtained by the pretrained YOLOv3-tiny model, and 3D reconstruction results of structural bolted connections.
C.4.1 Structural bolt detection

C.4.2 3D reconstruction of structural bolted devices

3D reconstruction results of bolted component 1 (multiple views)

3D reconstruction results of bolted component 2 (friction damper, multiple views)
C.5 Parameter studies

This section presents parameter studies on the plane segmentation algorithm and on the grid size and distance threshold of the DBPPC method.

C.5.1 Parameter studies on plane segmentation

As described in Chapter 4, to fit points to a plane, a distance threshold should be provided.

The plane segmentation results are directly affected by the selection of the distance threshold during plane fitting. If the threshold is set too low, only a small subset of inliers will be determined to fit a plane, which means a large number of iterations are required to isolate the buckled plates. This can be very computationally expensive if the point cloud is large. On the other hand, if the distance threshold is set too high, the quality of the fitting will be degraded. In this study, three values of the distance threshold are examined. The figure below shows the results of plane segmentation under the three distance thresholds, where the red region indicates the fitted plane whose inlier points are removed after each iteration. As the distance threshold decreases, the number of inliers during each plane fitting iteration becomes smaller, which means more iterations are required to isolate the buckled plate. In this study, to achieve more efficient fitting of the non-buckled plates while still maintaining a relatively high plane fitting quality, the distance threshold is selected as 20 mm. In this case, only three iterations are required to isolate the buckled plate, which is computationally efficient. After the plane segmentation, the central buckled plate is isolated for further processing. It should be noted that this process can be semi-automated when dealing with highly complicated structural assemblies, or when the selection of the distance threshold is difficult. In these scenarios, plane segmentation can be used to remove most of the irrelevant flat planes, while minor human intervention can be considered to further refine the remaining cloud.

Figure: Three iterations of plane segmentation results using distance thresholds of (a) 10 mm; (b) 15 mm; (c) 20 mm (Note: the x, y, and z axes are in mm)
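A threshold study like the one above can be scripted directly; the snippet below is a minimal sketch, assuming the open-source Open3D library, that repeats the iterative plane segmentation for several candidate distance thresholds and reports how many iterations are needed before most of the flat (non-buckled) plates are peeled off. The file name, thresholds, stopping rule, and iteration cap are placeholders.

# Sketch of a distance-threshold study for iterative RANSAC plane segmentation
# (assumes Open3D; the file name, thresholds, and stopping rule are placeholders).
import open3d as o3d

pcd = o3d.io.read_point_cloud("corrugated_panel.ply")
total_points = len(pcd.points)

for thresh in [0.010, 0.015, 0.020]:                  # candidate thresholds in metres
    remaining = pcd
    iterations = 0
    # Peel off flat (non-buckled) planes until most of the cloud has been removed.
    while len(remaining.points) > 0.2 * total_points and iterations < 50:
        _, inlier_idx = remaining.segment_plane(distance_threshold=thresh,
                                                ransac_n=3,
                                                num_iterations=1000)
        remaining = remaining.select_by_index(inlier_idx, invert=True)
        iterations += 1
    print("threshold = %.0f mm: %d iterations, %d points left for the buckled plate"
          % (thresh * 1000, iterations, len(remaining.points)))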

C.5.2 Parameter studies on the grid size and distance threshold of the DBPPC method

In this parameter study, grid sizes of 5 mm, 10 mm, 20 mm, and 30 mm have been investigated. Similarly, distance thresholds of 2 mm, 3 mm, 4 mm, and 5 mm have been investigated. The table below shows the results of the parameter study. As shown in the table, when the grid size is 20 mm or less and the distance threshold is 4 mm or less, the DBPPC method can successfully separate the points from the two surfaces. The figures below show a success case and a failure case.

                             Grid size [mm]
                             5         10        20        30
Distance threshold    2      Success   Success   Success   Fail
[mm]                  3      Success   Success   Success   Fail
                      4      Success   Success   Success   Fail
                      5      Fail      Fail      Fail      Fail

Success (distance threshold = 3 mm, grid size = 10 mm) in the orthogonal view

Success (distance threshold = 3 mm, grid size = 10 mm) in the planar view

Failure (distance threshold = 3 mm, grid size = 30 mm) in the planar view


