Applications of object detection in modular construction based on a comparative evaluation of deep learning algorithms

Chang Liu
Faculty of Engineering, University of New South Wales, Sydney, Australia

Samad M.E. Sepasgozar and Sara Shirowzhan
Faculty of Arts, Design and Architecture, University of New South Wales, Sydney, Australia, and

Gelareh Mohammadi
Faculty of Engineering, University of New South Wales, Sydney, Australia

Received 1 March 2020; revised 5 August 2020, 2 October 2020, 2 January 2021 and 24 February 2021; accepted 8 March 2021

Abstract
Purpose – The practice of artificial intelligence (AI) is increasingly being promoted by technology
developers. However, its adoption rate is still reported as low in the construction industry due to a lack of
expertise and the limited reliable applications for AI technology. Hence, this paper aims to present the detailed
outcome of experimentations evaluating the applicability and the performance of AI object detection
algorithms for construction modular object detection.
Design/methodology/approach – This paper provides a thorough evaluation of two deep learning
algorithms for object detection, including the faster region-based convolutional neural network (faster RCNN)
and single shot multi-box detector (SSD). Two types of metrics are also presented; first, the average recall and
mean average precision by image pixels; second, the recall and precision by counting. To conduct the
experiments using the selected algorithms, four infrastructure and building construction sites are chosen to
collect the required data, comprising a total of 990 images of three different but common modular objects: modular panels, safety barricades and site fences.
Findings – The results of the comprehensive evaluation of the algorithms show that the performance of faster RCNN and SSD depends on the context in which detection occurs. Indeed, surrounding objects and the backgrounds of the objects affect the level of accuracy obtained from the AI analysis and may particularly affect precision and recall. The analysis of loss lines shows that the loss lines for selected objects depend on both their geometry and the image background. The results on selected objects show that faster RCNN offers higher accuracy than SSD for detection of selected objects.
Research limitations/implications – The results show that modular object detection is crucial in construction for obtaining the information required to meet project quality and safety objectives. The detection process can significantly improve the monitoring of object installation progress in an accurate, machine-based manner that avoids human errors. The results of this paper are limited to three construction sites, but future investigations can cover more tasks or objects from different construction sites in a fully automated manner.
Originality/value – This paper’s originality lies in offering new AI applications in modular construction, using a large first-hand data set collected from three construction sites. Furthermore, the paper presents the scientific evaluation results of implementing recent object detection algorithms across a set of extended metrics using the original training and validation data sets to improve the generalisability of the experimentation. This paper also provides practitioners and scholars with a workflow on AI applications in the modular context and the first-hand referencing data.
Keywords Innovation, Construction management, Construction technology,
Construction engineering management, IT building design construction, Modular objects,
Artificial intelligence, Deep learning, Object detection, Construction
Paper type Research paper

1. Introduction
With the development of advanced technologies, automation in construction has improved
substantially (Ahmadi-Karvigh et al., 2019). This has led to an increasing number of new technologies being adopted in the construction industry.
Safety monitoring, an important task in construction, is usually undertaken and recorded
manually. Although construction workers may recognise potential safety hazards, the
general absence of a safety protocol for temporary structures means that workers will
usually be reluctant to take action to neutralise the threat (Yuan et al., 2016). Giving priority
to the development of digital technology in construction should improve safety monitoring
considerably. Monitoring progress is another critical aspect of the construction endeavour
besides safety monitoring. Monitoring construction progress accurately can help contractors
control and manage costs and scheduling (Love et al., 2019). It can also help project
managers to make timely decisions and take corrective actions in response to anticipated
delays. However, construction progress planners or project managers generally only collect
information manually during project execution. Thus, information about the actual
construction progress is error-prone, incomplete and not available on time due to errors
committed during manual operations (Srewil and Scherer, 2013). Doxel (2018), an artificial
intelligence (AI) company, states that unlike their manufacturing counterparts, most
construction managers do not have access to real-time feedback about progress and quality,
so improving the efficiency of progress monitoring is vital.
The growth of AI in the construction market should not be ignored. Its valuation was US
$429.2m in 2018 and is projected to reach US$4.51bn by 2026, as reported by Reports and
Data (2019). The report shows that the growth of the AI market is mainly driven by the
contribution to efficiency in construction processes as well as the increasing need for
construction safety. According to another report (Zion Market Research, 2018), the valuation
of the AI market was US$312m in 2017 and is expected to reach US$3,161m by 2024,
growing at a compound annual growth rate of 38.14% between 2018 and 2024. The
development of applications that ease the analysis of building surveys and structures has been an important benefit of AI incorporating computer vision. For instance, AI can help
engineers to decide the workload of machinery and labour across different requirements
accurately to prevent budget overruns (Reports and Data, 2019). The report by Crystal
Market Research (2018) states that rising security and productivity concerns are two major
factors driving the growth of the AI market. This report also indicated that construction
organisations are adopting AI to extract information from large masses of data to improve
construction safety and progress efficiency. The expanding availability of digital data has
also helped to accelerate the development of AI in the construction industry. The integration
of computer vision and deep learning can be a feasible method for AI-based construction
safety and progress monitoring. A survey provided by Chui and Malhotra (2018) shows that
robotic process automation, computer vision and deep learning are the three most applied
aspects of AI technologies. Computer vision research in construction industries has already
been conducted in progress monitoring (Doxel, 2018) and object tracking for safety reasons
(Guo et al., 2018). Moreover, as reported by Blanco et al. (2018), applying deep learning can be considered as an approach to optimise and correct the schedules of an ongoing project to sequence tasks more effectively and meet target deadlines.
Computer vision refers to the use of computers to replicate human visual functions, with
the aim of making meaningful judgements on the actual target and scene content based on
the perceived image or videos. As a common type of technology in AI, computer vision is
applied in other disciplines such as medicine, city planning and agriculture. The medical field uses computer vision to analyse patients’ disorders based on X-ray images (Mintz and Brodie, 2019). Land use analysis takes advantage of computer vision by processing the
remote-sensing images for evaluating the condition of land use, such as subsidence (Du
et al., 2018). Agriculture uses it to detect fruits and vegetables (Tripathi and Maktedar, 2020).
Modular construction has also developed rapidly in the construction industry. Modular
construction means the construction process in which “modular” elements using the same
materials are produced off-site and assembled on construction sites. Marketsandmarkets
(2018) reports that the value of the modular construction market reached US$106.15bn in
2017 and is expected to reach US$157.19bn by 2023. Hence, the use of modular construction
is projected to become more popular in construction. In addition, modular installation is an
important part of building construction because it usually saves labour and time. However,
there is no reliable data source for engineers and construction workers to check construction
quality at upper levels of modular buildings. Thus, data collection methods for modular
installation should be improved.
Although the merits of AI in construction are obvious, AI adoption in the industry still faces several barriers. The increased number of privacy invasion cases and the lack of a skilled workforce are among the reasons inhibiting AI in the construction
sector (Reports and Data, 2019). The report provided by Crystal Market Research (2018) also
states that a lack of professionals and the unstructured nature of the construction industry
are the main reasons that limit the market growth of AI. In addition, a McKinsey and
Company report (Blanco et al., 2018) shows that the construction industry lags behind other
industries in terms of implementing AI technologies. The report indicates that few
construction companies currently have the capacity to adopt AI solutions, lacking processes,
personnel and tools. Overall, AI adoption in construction still needs further development
and improvement.
Another problem is that few computer vision methods focus on detecting modular (or
repeated) objects. As a part of AI, computer vision methods can efficiently detect various
objects for different construction purposes. These methods have received much attention
and are perceived as beneficial solutions for construction because they are low cost and
easily used on-site (Fathi et al., 2015). Typically, detected objects in previous studies include
construction workers (Fang et al., 2018; Kim et al., 2019), hardhats (Mneymneh et al., 2017;
Mneymneh et al., 2018), excavators (Soltani et al., 2016; Kim et al., 2019), trucks (Kim and
Kim, 2018), earthmoving construction equipment (Golparvar-Fard et al., 2013), wheel loaders
(Kim et al., 2019), windows (Yang et al., 2016), roofs (Siddula et al., 2016), safety guardrails
(Kolar et al., 2018) and predefined objects (Walsh et al., 2013). However, there are few
methods focusing on analysing and obtaining information from modular objects.
Modular temporary objects are always ignored in construction projects because they will
be removed when the project finishes. Besides, with the increasing number of modular
objects in construction, their monitoring should be improved, and lack of oversight is a
reason for delays in the construction schedule and the occurrence of incidents. Hence, there
is a lack of computer vision algorithms for modular objects’ detection in construction
projects. As a part of safety monitoring, checking the quality of modular temporary structures and avoiding falling modular objects such as ceiling panels is indispensable, with more than 15,500 construction workers injured by falling objects in New South Wales workplaces over the past four years (SafeWork NSW, 2019). Hence, modular temporary
structures should be checked constantly to ensure safety in construction. For example,
construction fences and façade panels should be fitted carefully to prevent them from flying
off in high winds and hurting people.
Third, the time-consuming nature of some non-image-based data collection tools may not
be ideal for construction monitoring in certain situations. For example, previous studies
used different types of sensor-based technologies such as laser scanners to improve safety or
measure physical progress on construction sites (Shirowzhan et al., 2018). However, the
installation of sensors and other expensive hardware tools sometimes is time-consuming
and may not be permitted for detecting objects in special areas or dangerous construction
sites. Moreover, computer vision technologies may provide more advantages, including
flexibility and lower-cost solutions, than non-image-based tools.
In conclusion, there are three main problems existing in construction. First, due to a lack
of expertise and the limited reliable applications for AI technology, its adoption rate is still
perceived to be low in the construction industry. To improve the application of AI in the
construction industry, conducting experiments using AI algorithms in construction should
be considered. The second problem is that the detection of modular objects has not been covered in
previous studies. Third, several construction activities such as safety monitoring or
monitoring progress need detection of the location of objects. However, many current labour-intensive construction processes are time-consuming and inefficient, and they act as barriers to improving detection.
Considering these problems, modular construction object detection may help to
resolve them. Hence, the aim of this research is to investigate the applicability and
performance of object detection algorithms for modular objects’ detection in the construction
industry.

2. Object detection algorithms


As a branch in computer vision, object detection has gained relevance in recent years
because its techniques have wide applicability in different disciplines. With the application
of object detection technology in different fields, a variety of excellent object detection
algorithms appear. Faster region-based convolutional neural networks (faster RCNN) (Ren
et al., 2016) and the single shot multi-box detector (SSD) (Liu et al., 2016) are two of the most
common algorithms for detecting different objects such as construction signs (Han et al.,
2019) and vehicles (Wang et al., 2019).
Faster RCNN is optimised from RCNN. First, regions of interest are extracted by a
selective search algorithm in RCNN. Then, a standard convolutional neural network (CNN)
is used for classifying and adjusting these regions. RCNN was one of the first modern incarnations of the convolutional network for object detection and has undergone several revisions since its release. Faster RCNN is the third iteration of the RCNN (Girshick et al.,
2014) series of papers (RCNN to fast RCNN to faster RCNN), which was first released in
NIPS in 2015. Although faster RCNN was proposed in 2015, it is still the basis of several
object detection algorithms, which is very rare in the rapidly changing field of machine
learning (ML). This suggests that it is worth discussing its application in construction object
detection.
Faster RCNN is applied based on the following steps. The first step is to use a CNN pre-trained on an image classification task (e.g. Imagenet) and take the mid-level features produced by the network. Each input image is represented as a tensor of its height, width and per-pixel channel depth and is then fed into the pre-trained CNN to obtain the feature map of a middle layer. This feature map serves as the extracted features used in the next process. Each layer of the convolution network extracts more abstract features from the information in the previous layer. For instance, the first layer usually extracts simple edges, and the second layer learns patterns built from those edges. Finally, a convolution feature map is obtained that is much smaller than the original image in its spatial dimensions (coordinates). The region proposal network (RPN) is then applied. It uses the features computed by the CNN to obtain a pre-set number of areas that may contain the target. Next, region of interest pooling on the feature map of the target is used to store the feature information related to the target as a new tensor. The subsequent process is consistent with the RCNN model: the faster RCNN algorithm uses two separate fully connected layers for each target.
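To illustrate how such a pre-trained faster RCNN is typically invoked in practice, the following minimal Python sketch loads a COCO pre-trained detector from TensorFlow Hub and runs it on a single image. The Hub handle, the sample file name and the 0.5 score cut-off are illustrative assumptions rather than the exact configuration used in this study.

```python
# Minimal inference sketch; the Hub handle, file name and score threshold
# are illustrative assumptions, not the configuration used in this study.
import tensorflow as tf
import tensorflow_hub as hub

# A COCO pre-trained faster RCNN module published on TensorFlow Hub.
detector = hub.load(
    "https://tfhub.dev/tensorflow/faster_rcnn/resnet50_v1_640x640/1")

image = tf.io.decode_jpeg(tf.io.read_file("site_photo.jpg"), channels=3)
inputs = tf.expand_dims(image, axis=0)  # the model expects a batch of uint8 images

outputs = detector(inputs)
boxes = outputs["detection_boxes"][0].numpy()    # normalised [ymin, xmin, ymax, xmax]
scores = outputs["detection_scores"][0].numpy()  # confidence score per box

keep = scores >= 0.5  # keep only confident detections
print(f"{int(keep.sum())} objects detected:", boxes[keep])
```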
SSD is one of the most common detection algorithms in the field of object detection. It
uses CNN to achieve “end-to-end” detection. Input is an original image, and output is a
detection result without external tools or processes for feature extraction and candidate box
generation. In the object prediction phase, the network outputs a set of candidate rectangular boxes containing locations and category scores; the penultimate set of boxes represents the aggregated detection results of the network. Because of the large number of candidate rectangular boxes and the overlap of multiple boxes, the few high-quality boxes need to be filtered out through post-processing.
The usual method is non-maximum suppression. Non-maximum suppression is used to
ensure the target object is detected with only one predicted (or proposal) bounding box.
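A minimal sketch of greedy non-maximum suppression is given below, assuming axis-aligned boxes in [ymin, xmin, ymax, xmax] form and a NumPy environment; production detectors use optimised built-in variants of this routine.

```python
import numpy as np

def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the highest-scoring box, drop heavily overlapping ones.

    boxes: (N, 4) array of [ymin, xmin, ymax, xmax]; scores: (N,) confidences.
    Returns the indices of the boxes that survive suppression.
    """
    order = scores.argsort()[::-1]  # candidate indices, best score first
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(best)
        # IOU of the best box against all remaining candidates
        ymin = np.maximum(boxes[best, 0], boxes[order[1:], 0])
        xmin = np.maximum(boxes[best, 1], boxes[order[1:], 1])
        ymax = np.minimum(boxes[best, 2], boxes[order[1:], 2])
        xmax = np.minimum(boxes[best, 3], boxes[order[1:], 3])
        inter = np.maximum(0, ymax - ymin) * np.maximum(0, xmax - xmin)
        area_best = (boxes[best, 2] - boxes[best, 0]) * (boxes[best, 3] - boxes[best, 1])
        areas = (boxes[order[1:], 2] - boxes[order[1:], 0]) * \
                (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_best + areas - inter)
        order = order[1:][iou < iou_threshold]  # drop boxes that overlap too much
    return keep
```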
Although faster RCNN and SSD were proposed in 2015 and 2016, respectively, they are
still the basis of several object detection algorithms, which is very rare in the rapidly
changing field of deep learning. This suggests that their application in construction object detection is worth discussing. Girshick et al. (2014) compared three detectors, faster
RCNN, region-based fully convolutional networks (R-FCN) and SSD with six feature
extractors VGG-16, Resnet-101, Inception V2, Inception V3, Inception Resnet (V2) and
MobileNet. The results reflected that SSD is the quickest and faster R-CNN has the best
performance generally. A major advantage of faster RCNN is that it can greatly improve the
speed of generating bounding boxes (Ren et al., 2016). An advantage of SSD is the quick
image-processing capability. However, little research has been implemented for monitoring
modular objects with these two methods. Therefore, they were selected for their reliability
and validity. The comparison of faster RCNN and SSD is necessary to help engineers and
researchers select which one to apply in construction projects. Hence, this research selected
these two algorithms to compare in terms of efficiency and accuracy.
Based on the previous studies, few two-dimensional (2D) image data sets of construction
objects were recorded and shared online for research, though data sets of several objects can
be downloaded online, such as vehicles, plants and humans (Lin et al., 2014; Imagenet, 2019).
Due to the lack of related research and databases, the detection of construction objects is still
in the initial stage. Although deep learning research related to object detection in different
aspects of building construction has been performed, there is a limited focus on modular
objects. Modular objects exist in nearly all construction projects, but construction engineers
sometimes omit the monitoring of them. Hence, an online free access database of
construction objects should be built for promoting the development of object detection in the
construction industry, especially modular objects. Due to the lack of image data, there is also
a lack of evaluation of the performance of object detection in modular objects. Thus, it is
necessary to evaluate the performance of object detection for modular objects to come to a
more thorough understanding of the AI application in construction.
3. Methodology
To improve the efficiency and accuracy of monitoring safety and modular installation
progress on construction sites, the modular objects should be detected and tracked.
However, most detection and recording is manual, and research on automated construction modular object detection is insufficient. To resolve the issue, two deep learning object
detection algorithms are applied. This paper tests the performance of the selected
algorithms in construction applications.
This study detects three different but common modular objects in construction, including
barricades, construction site fences and ceiling panels, for the detection analysis. Data are
collected from four construction sites in Sydney, including the University of New South
Wales (UNSW), Dee Why region, Anzac Parade and Sydney city sites. Faster RCNN and
SSD are selected for training models in TensorFlow. Trained object detection models were
evaluated across the following metrics with test data sets:
• average recall (AR) (Cyganek, 2013) and the mean average precision (mAP) (Ren et al., 2016) based on image pixels; and
• precision and recall based on counting numbers (Fang et al., 2018; Jiang et al., 2019; Susutti et al., 2019).

3.1 Algorithm implementation procedure


Figure 1 shows a procedure including four main steps for implementing object detection
algorithms. First, image data were taken from different construction sites. In the second
step, object detection models were built in the TensorFlow platform. Configuring the
computer development environment, deep learning platform (TensorFlow) and object
detection API were the three main tasks in this step. Third, the models based on chosen
object detection algorithms were trained on the training data set and adjusted based on
validation data sets to improve the generalisability of the object detection algorithm to
unseen data. Two models based on faster RCNN and SSD were trained on the labelled
images. The number of images, the chosen algorithm and training steps influenced detection
accuracy. The last step was to analyse the performance of the trained models across the selected metrics.

3.2 Data preparation


Selected modular objects included safety barricades, fences and panels. These three types of
construction objects were easier to access than other objects, such as steel structure and
prefabricated composite walls, and were commonly found at construction sites. Considering
the limitation of time and construction sites, only one shape of each kind of object was selected for this object detection research. After pre-processing, the image data were divided into training, validation and test sets. This research selected faster RCNN and SSD for object detection. They are two of the most common algorithms in computer vision (Cyganek, 2013; Jiang et al., 2018).

[Figure 1. Flowchart of the four stages of object detection: (1) data set preparation (data collection; labelling with bounding boxes); (2) platform building (development environment; TensorFlow machine learning platform; object detection API); (3) model training (choosing the algorithm; training, validation and test sets; training the model; loss); (4) analysis (mAP and AR based on pixels; precision and recall by counting numbers).]
In this study, 990 images covering three types of objects were chosen for all training,
validation and testing samples. The number of images from the barricade, fence and panel
were 342, 504 and 144, respectively. The number of images for each object was also guided by earlier investigations of object detection and was limited to 500. For
instance, Mustamo (2018) applied 200, 50 and 50 images for training, validation and test
data sets to a people-oriented detection test, and also applied 100, 50 and 50 images to a
different people-oriented detection test. Phadnis et al. (2018) used around 150 images for
each detected object, such as cell phones. In addition, the number of panels was lower than
the number of barricades and fences. The reason was that panel images of the target type
were harder to find and collect than those of the other two because only the Dee Why
construction site used that type of panels. Hence, there were fewer images in panel data sets
than fence and barricade sets.
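As an illustration of this splitting step, the short Python sketch below divides a list of image paths into training, validation and test sets; the 80/15/5 proportions and the fixed seed are assumptions for demonstration, while the actual counts used in this study are those listed in Table 1.

```python
import random

def split_dataset(image_paths, train_frac=0.80, val_frac=0.15, seed=42):
    """Shuffle once with a fixed seed, then slice into train/val/test.

    The 80/15/5 proportions are illustrative; Table 1 lists the actual counts.
    """
    paths = list(image_paths)
    random.Random(seed).shuffle(paths)       # reproducible shuffle
    n_train = int(len(paths) * train_frac)
    n_val = int(len(paths) * val_frac)
    return (paths[:n_train],                 # training set
            paths[n_train:n_train + n_val],  # validation set
            paths[n_train + n_val:])         # test set

# Example: splitting the 504 fence images (hypothetical file names).
train_set, val_set, test_set = split_dataset([f"fence_{i}.jpg" for i in range(504)])
```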

3.3 Platform and model building


This study is implemented on the TensorFlow platform, which contains several deep learning algorithm packages for the convenience of programmers. Faster RCNN and SSD
models are trained and validated after the installation of TensorFlow. Microsoft Common
Objects in Context (COCO) pre-trained faster RCNN and SSD models were chosen instead of
the training-from-scratch approach. A pre-trained model was applied to save training time
(Mustamo, 2018) and compensate for not having a big data set. The use of the COCO data set
resulted in a straightforward fine-tuning of the parameters for the training set.
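For readers reproducing this workflow, labelled images are normally serialised into TFRecord files before fine-tuning with the TensorFlow object detection API. The sketch below shows one way to encode a single labelled image, assuming normalised [ymin, xmin, ymax, xmax] boxes; the feature keys follow the API's documented conventions, although the exact conversion script used in this study is not specified.

```python
import tensorflow as tf

def make_example(jpeg_bytes, height, width, boxes, class_id):
    """Encode one labelled image as a tf.train.Example for the TF object
    detection API; boxes are normalised [ymin, xmin, ymax, xmax] per object."""
    def _bytes(values):
        return tf.train.Feature(bytes_list=tf.train.BytesList(value=values))
    def _floats(values):
        return tf.train.Feature(float_list=tf.train.FloatList(value=values))
    def _ints(values):
        return tf.train.Feature(int64_list=tf.train.Int64List(value=values))

    feature = {
        "image/encoded": _bytes([jpeg_bytes]),
        "image/format": _bytes([b"jpeg"]),
        "image/height": _ints([height]),
        "image/width": _ints([width]),
        "image/object/bbox/ymin": _floats([b[0] for b in boxes]),
        "image/object/bbox/xmin": _floats([b[1] for b in boxes]),
        "image/object/bbox/ymax": _floats([b[2] for b in boxes]),
        "image/object/bbox/xmax": _floats([b[3] for b in boxes]),
        "image/object/class/label": _ints([class_id] * len(boxes)),
    }
    return tf.train.Example(features=tf.train.Features(feature=feature))
```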

4. Metrics and computation methods


Two classifications of metrics evaluated different aspects of the trained models. The first
two metrics are AR and mAP calculated with pixels. One of the reasons for selecting them is
that the TensorFlow platform provides related codes for calculating AR and mAP to save
programming time. The second reason is that they are commonly used in previous studies (Girshick et al., 2014; Ren et al., 2016; Redmon et al., 2016), but rarely for modular objects. The last two metrics are precision and recall, which are based on counting numbers. They were
applied as evaluation metrics in previous studies (Fang et al., 2018), so this research selected
precision and recall for increasing the reliability of evaluation with more metrics.
In the first classification, three types of AR, including AR@1, AR@10 and AR@100
were selected for evaluating the performance of the selected algorithms because these have
been used in previous studies in computer science (Cyganek, 2013). For example, Pinheiro
et al. (2015) selected AR@10, AR@100 and AR@1000 to judge model performance. Lin et al.
(2017) analysed AR@100 and AR@1000 to evaluate the performance of their proposed
model. “AR@1” means AR with one detection per image; “AR@10” means AR with ten
detections per image; and “AR@100” means AR with 100 detections per image
(TensorFlow, 2019).
AR is the average of all recalls with different detections. The closer to 1 the AR value, the
better the performance of the trained model. Recall was calculated as shown in equation (1). Recall reflects the proportion of all positive elements that are correctly classified; the detected elements in this research are predicted bounding boxes. Based on the relationship
of predicted bounding boxes and ground truth bounding boxes, four situations exist,
including true positive (TP), true negative (TN), false positive (FP) and false negative (FN):
Recall = TP / (TP + FN)    (1)

Besides AR, three types of mAP were evaluated in this research, including mAP,
mAP@0.50IOU and mAP@0.75IOU. The meaning of each type is explained as follows:
“mAP” means mean average precision over classes averaged over intersection over union
(IOU) thresholds of [0.50:0.05:0.95]; “mAP@0.50IOU” means mAP with IOU higher than
0.50; “mAP@0.75IOU” means mAP with IOU higher than 0.75. Thus, mAP@0.75IOU is a
stricter metric than mAP@0.50IOU. IOU in the image is the proportion of “the area of
overlap of two images” divided by “the area of union of two images”:

IOU = Area of Overlap / Area of Union    (2)

This research selected mAP because it is widely applied to analyse the performance of
trained models in earlier object detection studies. For instance, Lin et al. (2017) used mAP
and mAP@0.50IOU to compare feature pyramids of different methods. Mordan et al. (2017)
used mAP, mAP@0.50IOU and mAP@0.75IOU together, to compare their proposed
algorithms with other algorithms.
The following equations (3) and (4) explain the meaning of mAP. Precision represents the proportion of the detection results that correspond to the real area of the selected object. The
meanings of TP and FP in equation (3) are the same as in equation (1):

Precision = TP / (TP + FP)    (3)
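The counting-based precision and recall reported later in Table 3 require matching each predicted box to a ground-truth box before TP, FP and FN can be counted. A minimal sketch of one common greedy matching scheme is given below, assuming an IOU threshold of 0.5 and boxes in [ymin, xmin, ymax, xmax] form; the exact matching rule used in this study is not detailed, so this is illustrative only.

```python
def iou(a, b):
    """Intersection over union of two [ymin, xmin, ymax, xmax] boxes (equation (2))."""
    ymin, xmin = max(a[0], b[0]), max(a[1], b[1])
    ymax, xmax = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ymax - ymin) * max(0.0, xmax - xmin)
    union = ((a[2] - a[0]) * (a[3] - a[1]) +
             (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def count_tp_fp_fn(pred_boxes, gt_boxes, iou_threshold=0.5):
    """Greedily match predictions to unmatched ground truth, then apply
    equations (1) and (3) to the resulting TP/FP/FN counts."""
    matched, tp = set(), 0
    for pb in pred_boxes:
        candidates = [(iou(pb, gb), i) for i, gb in enumerate(gt_boxes)
                      if i not in matched]
        best_iou, best_i = max(candidates, default=(0.0, None))
        if best_iou >= iou_threshold:
            matched.add(best_i)  # each ground-truth box matches at most once
            tp += 1
    fp, fn = len(pred_boxes) - tp, len(gt_boxes) - tp
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return tp, fp, fn, precision, recall
```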

The general definition of AP is the area under the precision-recall curve, where the x-axis is the recall and the y-axis is the precision:

AP = ∫₀¹ p(r) dr    (4)

where p(r) is the precision at recall r.


The term mAP is the mean of AP of all detected objects. In other words, AP is for one
single object, and mAP is for all objects:

mAP = AVG(AP for each object class)    (5)

where AP takes the average value of the precision across all recall values.
The formula of AP (mAP) in the COCO official data set is:
AP[.50:.05:.95] = (1/10) Σ_{th ∈ {0.50, 0.55, ..., 0.95}} AP(IOU_th = th)    (6)

where AP(IOU_th = th) means the AP calculated when the IOU threshold equals th.
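A compact numerical sketch of these definitions follows: AP as the area under the precision-recall curve (equation (4)) and the COCO-style mAP as the average of AP over the ten IOU thresholds and over classes (equations (5) and (6)). Note that the official COCO evaluator uses 101-point interpolation, so the trapezoidal approximation here is a simplification.

```python
import numpy as np

def average_precision(precisions, recalls):
    """Approximate AP (equation (4)) as the area under the precision-recall
    curve, integrating numerically after sorting the points by recall."""
    p, r = np.asarray(precisions), np.asarray(recalls)
    order = np.argsort(r)
    return float(np.trapz(p[order], r[order]))

def coco_map(ap_table):
    """COCO-style mAP (equations (5) and (6)): average AP over the IOU
    thresholds 0.50:0.05:0.95, then over object classes.

    ap_table maps class name -> {iou_threshold: AP at that threshold}.
    """
    thresholds = [round(0.50 + 0.05 * i, 2) for i in range(10)]
    per_class = [np.mean([aps[t] for t in thresholds])
                 for aps in ap_table.values()]
    return float(np.mean(per_class))
```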
The trained models were tested on the images in the test data set. The test images had similar quality to those in the training data sets; otherwise, precision would decrease.
The loss function is designed to evaluate the performance of an ML algorithm (Mitchell,
1997). The loss calculation code in this research was applied to calculate the value of the loss function from the predicted results (Jianshu, 2018). Model training was optimised based on
the loss function.

5. Results and findings


Table 1 shows the results of four experiments using both CPU and GPU. To start, the
training-from-scratch approach was used. Training from scratch means training without
pre-trained models. All data and weights in the programme codes are prepared from scratch.
Faster RCNN was used for the experiments; comparing the training results can guide the selection of appropriate hardware.
The first three experiments in Table 1 (Nos. 1 to 3) were carried out to compare two
factors, time needed for training and the influence of the number of training images. The
reason for choosing fence objects was that the background, including walls and grass of
the fences at the selected construction sites, is complicated (such as Figure 2(e)). To shorten
the experiments, a workstation with GPU was used. The number of training steps and the
training object (i.e. the fence) were identical to the first experiment to yield comparable
results. This experiment was carried out in 2.5 h, which was very fast and enabled the
investigators to run all the required experiments in an efficient way. Thus, GPU was used
for running the rest of analysis, including the training models in subsequent experiments in
this research.
In addition, the COCO pre-trained model replaced the training-from-scratch approach.
This study used the validation data set to monitor the model training to avoid overfitting.
The training steps shown in Table 2 were chosen by considering the validation. The results
of AR and mAP in this research were obtained based on these training steps. As the models
and detected objects were different, the training steps were different.
Table 2 also shows that mAP, mAP@0.50IOU, mAP@0.75IOU values of SSD are all
lower than those of faster RCNN for detecting all three types of objects. In addition, all
values of mAP are lower than mAP@0.50IOU and mAP@0.75IOU. Similar to AR, the
higher value of mAP represents the higher accuracy of the trained model. Hence, the results
show that faster RCNN offers higher accuracy than SSD for the detection of these objects, no
matter which type of AR or mAP.
The results of detecting different objects by counting are presented in Table 3. The table
shows that the recall value of faster RCNN is higher than that of SSD in each object detection
experiment. The precision of SSD for the three objects is higher than those of faster RCNN.
The details of the results are presented in the following paragraphs with six example images
chosen from the test data set.

Table 1. Details of experiments including fence and barricade objects

No. | TensorFlow version | Object | Training images | Validation images | Test images | Training steps | Training duration
1 | CPU | Fence | 148 | 25 | 8 | 38,848 | 8 days
2 | GPU | Fence | 148 | 25 | 8 | 38,848 | 2.5 h
3 | GPU | Fence | 471 | 60 | 8 | 38,848 | 7 h
4 | GPU | Barricade | 503 | 66 | 8 | 38,848 | 7 h
[Figure 2. Sample results for the test images.]
The results show that the background of the images may decrease both precision and recall where the objects are selected in construction sites. For example, the fences’ mesh misleads the algorithm into detecting some lines or shapes in the background of the image in different
locations. Figures 2(a)–(d) show examples of the results of the barricade tests. In Figure 2(a),
all barricades have been detected with the faster RCNN method. The result of the SSD model
is shown in Figure 2(b). The confidence scores of the two detected barricades based on the
SSD method are lower than those based on the faster RCNN method. The faster RCNN
model can detect small objects better than SSD. In addition, faster RCNN has more FP
errors, as shown in the comparison of Figures 2(c) and (d). Yellow barricades that are not the
targeted objects are detected based on faster RCNN, as shown in Figure 2(c). However, it can
also be argued that depending on the goal, colour-independent models are also useful. Two
examples of fence detection are shown in Figures 2(e)–(h). Example I, shown in Figures 2(e)
and (f), was taken on one of the construction sites for the education building.
Due to the difference of the image backgrounds between Example II of the fence (Figures
2(g) and (h)) and the fence training data set, the results of detecting the locations of fences are
poor for both faster RCNN and SSD methods. Faster RCNN detection shows more FP errors
than SSD. For instance, one FP error exists in the upper-left corner of Example I (Figure 2(i)).
Compared with faster RCNN, the SSD results missed the panel in the bottom right corner
shown in Figure 2(j). The biggest panel, in the top-left corner, was detected but with a wrongly placed label. Hence, faster RCNN performs better than SSD in panel detection. Figures 2(k)
and (l) indicate that both faster RCNN and SSD models can detect vertical or horizontal

Table 2. Results of faster RCNN and SSD across AR and mAP based on pixels

Metric | Barricade: faster RCNN | Barricade: SSD | Fence: faster RCNN | Fence: SSD | Panel: faster RCNN | Panel: SSD
Training steps | 10,000 | 15,678 | 1,500 | 873 | 4,000 | 4,300
AR@1 | 0.8389 | 0.8278 | 0.8279 | 0.6885 | 0.1926 | 0.1588
AR@10 | 0.9208 | 0.9097 | 0.9049 | 0.7328 | 0.7956 | 0.6118
AR@100 | 0.9208 | 0.9097 | 0.9049 | 0.7328 | 0.8015 | 0.6118
mAP | 0.9017 | 0.8876 | 0.8743 | 0.6918 | 0.7474 | 0.5388
mAP@0.50IOU | 0.9996 | 0.9897 | 0.9981 | 0.9955 | 0.9875 | 0.9090
mAP@0.75IOU | 0.9996 | 0.9886 | 0.9981 | 0.9584 | 0.9354 | 0.6590

Table 3. Results of faster RCNN and SSD across recall and precision by counting

Object (construction site) | Total objects | Algorithm | TP | FP | FN | Precision | Recall
Barricade (Anzac) | 92 | Faster RCNN | 87 | 18 | 5 | 0.83 | 0.95
Barricade (Anzac) | 92 | SSD | 52 | 8 | 40 | 0.87 | 0.57
Barricade (City) | 89 | Faster RCNN | 75 | 8 | 14 | 0.90 | 0.84
Barricade (City) | 89 | SSD | 47 | 3 | 42 | 0.94 | 0.53
Fence (Anzac) | 42 | Faster RCNN | 33 | 11 | 9 | 0.75 | 0.79
Fence (Anzac) | 42 | SSD | 27 | 4 | 15 | 0.87 | 0.64
Fence (Education building) | 41 | Faster RCNN | 24 | 12 | 17 | 0.67 | 0.59
Fence (Education building) | 41 | SSD | 23 | 6 | 18 | 0.79 | 0.56
Panel (Commercial building) | 119 | Faster RCNN | 117 | 25 | 2 | 0.82 | 0.98
Panel (Commercial building) | 119 | SSD | 96 | 8 | 23 | 0.91 | 0.62
panels accurately, which means TP is high. FP also exists in images with vertical or
horizontal panels, and faster RCNN has a higher FP level.
The analysis of barricade loss lines shows that the training processes for faster RCNN
and SSD are different. Figure 3(a) shows that the training steps of the faster RCNN method
for barricades may not need 40,000 steps. The loss line turns nearly horizontal at around
10,000 steps. Compared with faster RCNN, the SSD curve is still decreasing after 10,000
steps. The loss lines of fence detection (Figures 3(c) and (d)) show that the SSD loss line will
become parallel to the x-axis, but the faster RCNN curve may need more steps before it
becomes parallel. The loss lines for panel detection (Figures 3(e) and (f)) had similar lines to
those for barricade detection (Figures 3(a) and (b)). The faster RCNN line turns parallel at
around 15,000–20,000 steps, while the SSD line continues to be volatile and may need more
steps.
Therefore, it can be concluded that the loss lines for different objects depend on both the
object geometry and the image background. For example, the loss line of faster RCNN for
the fence is more volatile than that for the barricade or panel. The reason is probably
connected with the backgrounds of the fence images being busier because of the meshed
nature of many fences. It can be concluded that the backgrounds of the detected objects and
shapes of objects can influence training accuracy.
[Figure 3. Training losses (value of loss against training steps) for the three objects: (a) faster RCNN for barricade; (b) SSD for barricade; (c) faster RCNN for fence; (d) SSD for fence; (e) faster RCNN for panel; (f) SSD for panel.]
6. Discussion and contributions
6.1 Discussion
As the results indicate, the two types of metrics show different results for the three types of objects; the following paragraphs discuss each type of metric in turn.
The metrics based on pixels are discussed first. The results of AR suggest that faster
RCNN and SSD for the detection of barricades and fences performed well, but panel
detection was poor based on a comparison with previous studies. Lin et al. (2017) applied the
feature pyramid network in a faster RCNN system, and the value of AR@100 was 44.0%
tested on a COCO data set. Pinheiro et al. (2016) evaluated different models with AR ranging from 35.5% to 39.3%. The values of AR for barricade and fence detection in this paper were all
higher than these studies. However, the highest values among the three types of AR for
faster RCNN and SSD for panel detection were 80 and 61%, respectively, shown in Table 2.
These two values are even lower than the lowest values for barricade and fence detection.
The mAP also reflected different performances of the two algorithms in Table 2.
Compared with previous studies, the performance of these two algorithms is acceptable
based on an evaluation of mAP. Girshick et al. (2014) reported the values of mAP in their
research 53.7% on the PASCAL VOC 2010 data set. The evaluation value of mAP was
35.1% in the experiment implemented by Uijlings et al. (2013). Ren et al. (2016) evaluated the
performance of faster RCNN on the PASCAL VOC 2007 data set and PASCAL VOC 2012
data set with the values of 73.2 and 70.4%, respectively. Redmon et al. (2016) applied YOLO
with a value of 63.4% of mAP on the PASCAL VOC 2007 data set. All these values are lower
than the values of mAP obtained in this research. Hence, the performance of faster RCNN
and SSD in this research is acceptable.
Compared with previous studies, the second type of metrics shows that the performances
of both algorithms are acceptable for barricades, fences and panels based on the values of
precision and recall by counting. As the authors have not come across similar research for
detecting these three objects before, the algorithms’ performance is compared for the
selected objects. For example, Fang et al. (2018) mentioned that the values of detection
precision of workers and excavators based on their research were 91 and 95%, and recalls
were 79% and 81%, respectively. Their values are similar to the values in this research.
The performance for barricade detection is always good, no matter which of the two algorithms is used. The precision of the two methods for barricade detection is
similar, but the recall of faster RCNN is higher than SSD. Hence, it could be concluded that
faster RCNN performs better than SSD for barricade detection. As for fence detection, faster RCNN achieves higher recall, although SSD shows higher precision (Table 3). Besides,
faster RCNN had fewer FN errors for fence detection. The third object is the ceiling panel.
The precision of the SSD algorithm is better, while the recall level of faster RCNN is higher.
Faster RCNN has fewer FN errors, while SSD has fewer FP errors.
The result of the precision and recall analyses on the testing samples in this research
gives an in-depth understanding of the detection of differently shaped objects and helps to
select an appropriate method for monitoring project progress when modular objects are in
use. The results reflect that faster RCNN and SSD both have their own advantages and
disadvantages for the detection of the three objects. The results also revealed that the
detection quality depends on different pixels, shapes or the background of object images.
For example, the images that have fewer pixels take less time for training. Images with
different backgrounds and shapes result in different loss lines. Hence, the choice of the
algorithm should be based on the specific situation. The results show that the values of AR
and mAP for faster RCNN were higher than the values of the same metrics for SSD in
detection experiments of the three types of objects (Table 2). The FP and FN errors with precision and recall for the test data sets were also analysed, showing that faster RCNN has higher recall values of 0.95, 0.84, 0.79, 0.59 and 0.98 compared with SSD values of 0.57, 0.53, 0.64, 0.56 and 0.62 across all construction sites (Table 3). However, SSD performed better in terms of
precision evaluation. Hence, it could be stated that the choice of algorithms depends on the
context and shape of the detected object.
The quality of some images for small object detection is inadequate for all three target
objects. Although faster RCNN performs better than SSD in small object detection, some
small objects still cannot be detected, as in the examples in Figure 2. Beyond the inherent disadvantages of the algorithms, this situation may be due to the lack of small-object images in the training data set. Observation of several construction sites in Sydney showed that their surroundings are full of roads and buildings. It was impossible to collect image data for
selected objects, which were far away or were blocked by structures or buildings. Hence,
there are few examples of small panels in the data set. This situation was especially serious
for panel data collection. Panels were next to each other in ceilings, so the photos were only
taken under the panels. Thus, further work should consider how to improve the detection of
small objects for the two algorithms, especially SSD.

6.2 Contribution
The main theoretical contributions of this research are threefold: the research has extended
the application of object detection technology to modular object detection in construction,
which has not been recorded in the literature; the study has carefully examined the
application of the chosen algorithms in the construction context by using more quantified
metrics; the research has compiled first-hand data sets from selected construction sites over
a 12-month period, thus providing a valuable data set for construction scholars because
there are limited data sets for modular object detection from construction sites. Each of the
main contributions is explained as follows. The findings shed new light on automated
monitoring of modular objects based on computer vision technology.
The first theoretical contribution of this research is to analyse information about modular objects by applying object detection algorithms. Previous studies ignored the effect of
similar items around a targeted object on the precision of the detection. For example, their
selected sample objects were limited to large volumetric shapes, complicated and irregular
forms such as construction workers (Fang et al., 2018; Kim et al., 2019), hardhats (Mneymneh
et al., 2017; Mneymneh et al., 2018), excavators (Soltani et al., 2016; Kim et al., 2019) and
trucks (Kim and Kim, 2018). By contrast, this study used smaller modular objects such as
rectangular panels, safety barricades and construction fences. This research also extended
object detection applications to modular objects.
The second theoretical contribution is that this research applied more metrics for
evaluating object detection than previous studies. Different metrics were used for evaluating
the performance of the algorithms. This is because the performance of detection algorithms
cannot be judged by one metric alone (Alsing, 2018; Fang et al., 2018; Mustamo, 2018).
Specifically, in the object detection experiments, this research evaluated object detection
using mAP and AR based on pixels, and precision and recall by counting, which is an
advance on other studies (Mustamo, 2018). Hence, this research has conducted a comprehensive study on the performance of two object detection algorithms to identify the better performing one. It can thus be a reference for related research and help to
increase the rigour of future analysis methods.
The third theoretical contribution of the research is that it used original images taken
from real construction sites for appraising the performance of the detection algorithms for
modular objects. Due to the lack of an online free construction object image data set (COCO, 2019; Imagenet, 2019; Open-Images, 2019), image data collection is an important factor that
needs to be considered for object detection in construction activities. Although previous
studies and online data sets discussed and provided methods of detecting various types of
objects (Lin et al., 2014; Imagenet, 2019), little is known about the detection of modular
construction objects. As limited data is often a problem in the image processing field of
computer science, especially for construction objects, this research collected images of
modular objects for edge and object detection in construction. This provides a valuable
benchmark for research for detecting construction objects, especially modular objects. Thus,
the present study lays the groundwork for future research into construction object detection.
The details of the practical contribution are stated as follows. First, the results show that
the location of an area of target objects can be detected. This can improve the efficiency of
construction safety monitoring. One contribution of this work is to document information
about construction objects and how to detect them. Modular objects are items that should be
considered for construction monitoring because failing to check their position on site has
resulted in several accidents (SafeWork Australia, 2016). Departing from previous object
detection studies that only considered safety from the construction workers’ viewpoint
(Fang et al., 2018), this research has investigated different algorithms for monitoring three
different construction modular objects. For instance, detection of temporary modular
structures, such as barricades and fences, can help construction engineers update the
drawings and three-dimensional (3D) models of their location in a construction project, and
construction engineers will notice hazards in time if they apply these methods off site.
Second, this research could help to improve productivity with object installation progress
monitoring. After detecting selected target objects, construction employees can count the number of installed objects, evaluate productivity based on that number and know where the new installations are located. Thus, the site manager can monitor progress
through AI, which can save both time and money.

6.3 Implication
The methods of object detection in this research can be used for improving safety and monitoring object installation progress on construction sites. First, the methods in this
research can be applied to safety monitoring. Based on this research, object detection can
show the placement of different objects and whether they are in their correct position or not.
This will help project managers to document temporary modular structures, fencing and
barricade works. The documentation will help to update the digital as-built models or
drawings to cover areas that might be identified as risky areas, and construction progress
can also be visualised. The documentation and visualisation processes will contribute to
improving safety practices and risk analysis in construction projects. Thus, the application
can help engineers to update drawings or 3D models to identify risk areas over time.
Second, the methods help to monitor construction work progress by detecting the
number of installed modular objects such as panels. Object detection can show which area
on a construction site has been completed to help monitor how much of the workload has
been finished. This information is then recorded on the daily data sets that the project
managers receive from their construction sites. For example, this research would be useful
for construction practitioners because they can identify the location of fences quickly and
estimate installation progress (i.e. number of fences per day) directly based on
detection by trained models. The methods presented in this research can also be applied to
measure the installation progress of other construction objects in different projects. To
apply these methods, a set of digital images would be required, which could be collected by
less skilled labourers from construction sites during installation. Hence, object detection
technologies can assist in object installation.

7. Conclusion
This study aimed to focus on monitoring modular objects and evaluating the performance of
algorithms for modular object detection in the construction context. In the coming years,
object detection will still be a mainstream research direction in this field. Unlike
conventional construction safety and installation progress monitoring methods using
manual records, deep learning can automatically learn task-related features through end-to-
end training and obtain a high-level abstract representation of images through multi-layer
non-linear transformation.
Compared with conventional object detection research (Alsing, 2018), the theoretical
contribution of this research lies in evaluating two popular object detection algorithms to
analyse images of modular objects in construction, including barricades, fences and ceiling
panels. The practical contribution of this research is that it can help construction engineers
to monitor construction projects for safety and object installation progress.
As with other studies that mainly investigated a limited number of objects, this paper is
also limited to the evaluation of three main modular objects on construction sites. Future
studies should cover more objects with different shapes using the procedures used in this
paper to provide a more extensive comparison base for construction practitioners. In
addition, the specifications of the workstations may affect processing speed, so high-
performance computers are recommended for future studies.
The number of images used in this experimentation was limited to 990 due to the
accessibility of construction sites. While the number of images used for similar image
processing experiments in construction is limited to 500 in the literature, some scholars in
other disciplines have used over 1,000 images for detection purposes. Hence, the number can
be increased in the future if the situation allows the researcher to collect the data.
One more point to be considered in future studies is the choice of the threshold of IOU
when calculating the precision and recall by counting the number of detected objects. The
previous literature did not discuss methods of deciding thresholds because the literature
only focused on the framework of the whole system that can be applied in construction.
In the future, risk ratings for projects can be potentially computed based on object
detection. Moreover, the application of object detection can be used for generating data
related to a project, and then elements of real-time training can be implemented to enhance
skills and improve team leadership.
The development of an automated system is suggested because it may assist as-built model creation based on the actual installation done at the site. There is a need to update the
current shop drawings and create digital versions of as-built drawings for clients. The
results of the detection algorithms will help in drafting as-built models and updating 2D
drawings.

References
Ahmadi-Karvigh, S., Becerik-Gerber, B. and Soibelman, L. (2019), “Intelligent adaptive automation: a
framework for an activity-driven and user-centered building automation”, Energy and Buildings,
Vol. 188-189, pp. 184-199, doi: 10.1016/j.enbuild.2019.02.007.
Alsing, O. (2018), Mobile Object Detection Using TensorFlow Lite and Transfer Learning, DiVA.
Blanco, J.L., Fuchs, S., Matt, P. and Ribeirinho, M.J. (2018), “Artificial intelligence: construction
technology’s next frontier”, Building Economist, pp. 7-13.
Chui, M. and Malhotra, S. (2018), “Notes from the AI frontier: AI adoption advances, but foundational barriers remain”.
COCO (2019), “coco api”, available at: https://github.com/cocodataset/cocoapi (accessed 5 July 2020).
Crystal Market Research (2018), “Artificial intelligence (AI) in construction industry by technology
(natural language processing and machine learning and deep learning) by stage (pre-
construction, construction stage and post-construction) by component (solutions and services)
by application (project management, risk management, field management, supply chain
management and schedule management) by deployment type (on-premise and cloud) - global
market analysis and forecast to 2025”, July 2018, available at: www.crystalmarketresearch.com/
report-sample/IC071084 (accessed 5 July 2020).
Cyganek, B. (2013), Object Detection and Recognition in Digital Images: Theory and Practice, John Wiley
and Sons.
Doxel (2018), “Introducing artificial intelligence for construction productivity”, Medium, available at: https://
medium.com/@doxel/introducing-artificial-intelligence-for-construction-productivity-38a74bbd6d07
Du, Z., Ge, L., Ng, A.H.M., Zhu, Q., Yang, X. and Li, L. (2018), “Correlating the subsidence pattern and
land use in Bandung, Indonesia with both sentinel-1/2 and ALOS-2 satellite images”,
International Journal of Applied Earth Observation and Geoinformation, Vol. 67, pp. 54-68.
Fang, W., Ding, L., Luo, H. and Love, P.E. (2018), “Falls from heights: a computer vision-based
approach for safety harness detection”, Automation in Construction, Vol. 91, pp. 53-61.
Fang, W., Ding, L., Zhong, B., Love, P.E. and Luo, H. (2018), “Automated detection of workers and
heavy equipment on construction sites: a convolutional neural network approach”, Advanced
Engineering Informatics, Vol. 37, pp. 139-149, doi: 10.1016/j.aei.2018.05.003.
Fathi, H., Dai, F. and Lourakis, M. (2015), “Automated as-built 3D reconstruction of civil infrastructure
using computer vision: Achievements, opportunities, and challenges”, Advanced Engineering
Informatics, Vol. 29 No. 2, pp. 149-161, doi: 10.1016/j.aei.2015.01.012.
Girshick, R., Donahue, J., Darrell, T. and Malik, J. (2014), “Rich feature hierarchies for accurate object
detection and semantic segmentation”, Proceedings of the IEEE conference on computer vision
and pattern recognition, Columbus, OH, pp. 580-587.
Golparvar-Fard, M., Heydarian, A. and Niebles, J.C. (2013), “Vision-based action recognition of
earthmoving equipment using spatio-temporal features and support vector machine classifiers”,
Advanced Engineering Informatics, Vol. 27 No. 4, pp. 652-663, doi: 10.1016/j.aei.2013.09.001.
Guo, S., Zhang, P. and Yang, J. (2018), “System dynamics model based on evolutionary game theory for
quality supervision among construction stakeholders”, Journal of Civil Engineering and
Management, Vol. 24 No. 4, pp. 316-328, doi: 10.3846/jcem.2018.3068.
Han, C., Gao, G. and Zhang, Y. (2019), “Real-time small traffic sign detection with revised faster-RCNN”,
Multimedia Tools and Applications, Vol. 78 No. 10, pp. 13263-13278, doi: 10.1007/s11042-018-6428-0.
Imagenet (2019), “Imagenet dataset”, available at: www.image-net.org/ (accessed 20 June 2019).
Jiang, H., Cheng, M.M., Li, S.J., Borji, A. and Wang, J. (2019), “Joint salient object detection and existence
prediction”, Frontiers of Computer Science, Vol. 13 No. 4, pp. 778-788, doi: 10.1007/s11704-017-
6613-8.
Jiang, X., Hadid, A., Pang, Y., Granger, E., and Feng, X. (2018), Deep Learning in Object Detection and
Recognition, Springer.
JianShu (2018), “Object detection API source reading notes (in Chinese)”, available at: www.jianshu.
com/p/cc90803f0bcd (accessed 26 July 2019).
Kim, H. and Kim, H. (2018), “3D reconstruction of a concrete mixer truck for training object detectors”,
Automation in Construction, Vol. 88, pp. 23-30, doi: 10.1016/j.autcon.2017.12.034.
Kim, D., Liu, M., Lee, S. and Kamat, V.R. (2019), “Remote proximity monitoring between mobile
construction resources using camera-mounted UAVs”, Automation in Construction, Vol. 99,
pp. 168-182.
CI Kolar, Z., Chen, H. and Luo, X. (2018), “Transfer learning and deep convolutional neural networks for
safety guardrail detection in 2D images”, Automation in Construction, Vol. 89, pp. 58-70, doi:
10.1016/j.autcon.2018.01.003.
Lin, T.Y., Dollar, P., Girshick, R., He, K., Hariharan, B. and Belongie, S. (2017), “Feature pyramid
networks for object detection”, Proceedings of the IEEE conference on computer vision and
pattern recognition, pp. 2117-2125.
Lin, T.Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollar, P. and Zitnick, C.L. (2014),
“Microsoft COCO: common objects in context”, European conference on computer vision,
Springer, pp. 740-755.
Liu, J., Jennesse, J.M. and Holley, P. (2016), “Utilizing light unmanned aerial vehicles for the inspection
of curtain walls: a case study”, Construction Research Congress, pp. 2651-2659.
Love, P.E., Zhou, J. and Matthews, J. (2019), “Project controls for electrical, instrumentation and control
systems: enabling role of digital system information modelling”, Automation in Construction,
Vol. 103, pp. 202-212, doi: 10.1016/j.autcon.2019.03.010.
Marketsandmarkets (2018), “Modular construction market by type (permanent, relocatable), material (steel, precast concrete, wood, plastic), end-use sector (housing, commercial, education, healthcare, industrial), and region - global forecast to 2023”, available at: www.marketsandmarkets.com/Market-Reports/modular-construction-market-11812894.html
Mintz, Y. and Brodie, R. (2019), “Introduction to artificial intelligence in medicine”, Minimally Invasive
Therapy and Allied Technologies, Vol. 28 No. 2, pp. 73-81.
Mitchell, T.M. (1997), Machine Learning, McGraw-Hill, New York, NY.
Mneymneh, B.E., Abbas, M. and Khoury, H. (2017), “Automated hardhat detection for construction safety
applications”, Procedia Engineering, Vol. 196, pp. 895-902, doi: 10.1016/j.proeng.2017.08.022.
Mneymneh, B.E., Abbas, M. and Khoury, H. (2018), “Vision-based framework for intelligent monitoring of hardhat wearing on construction sites”, Journal of Computing in Civil Engineering, Vol. 33 No. 2, p. 04018066, doi: 10.1061/(ASCE)CP.1943-5487.0000813.
Mordan, T., Thome, N., Cord, M. and Henaff, G. (2017), “Deformable part-based fully convolutional network for object detection”, arXiv preprint arXiv:1707.06175.
Mustamo, P. (2018), “Object detection in sports: TensorFlow object detection API case study”, available
at: https://storage.googleapis.com/openimages/web/factsfigures.html (accessed 14 May 2019).
Phadnis, R., Mishra, J. and Bendale, S. (2018), “Objects talk-object detection and pattern tracking using
TensorFlow”, 2018 Second International Conference on Inventive Communication and
Computational Technologies (ICICCT), IEEE, pp. 1216-1219.
Pinheiro, P.O., Collobert, R. and Dollar, P. (2015), “Learning to segment object candidates”, Advances in
Neural Information Processing Systems, pp. 1990-1998.
Pinheiro, P.O., Lin, T.Y., Collobert, R. and Dollar, P. (2016), “Learning to refine object segments”,
European Conference on Computer Vision, Springer, pp. 75-91.
Redmon, J., Divvala, S., Girshick, R. and Farhadi, A. (2016), “You only look once: unified, real-time
object detection”, Proceedings of the IEEE conference on computer vision and pattern recognition,
pp. 779-788.
Ren, S., He, K., Girshick, R. and Sun, J. (2016), “Faster R-CNN: towards real-time object detection with region proposal networks”, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 39 No. 6, pp. 1137-1149.
Reports and Data (2019), “Artificial intelligence (AI) in construction market by technology (machine
learning and deep learning, natural language processing), by component, by phase, by deployment
type, by applications, by organization size, by end-use, and segment forecasts, 2016-2026”, available
at: www.reportsanddata.com/report-detail/artificial-intelligence-ai-in-construction-market (accessed
14 May 2019).
SafeWork Australia (2016), “Number of serious claims by industry 2000–01 to 2014–15”, SafeWork Australia.
SafeWork NSW (2019), “Falling objects”, New South Wales (NSW) Government, Australia, available at: www.safework.nsw.gov.au/hazards-a-z/falling-objects
Shirowzhan, S., Sepasgozar, S.M. and Liu, C. (2018), “Monitoring physical progress of indoor buildings
using mobile and terrestrial point clouds”, Construction Research Congress, pp. 602-611.
Siddula, M., Dai, F., Ye, Y. and Fan, J. (2016), “Classifying construction site photos for roof detection: a
machine-learning method towards automated measurement of safety performance on roof sites”,
Construction Innovation, Vol. 16 No. 3, pp. 368-389, doi: 10.1108/CI-10-2015-0052.
Soltani, M.M., Zhu, Z. and Hammad, A. (2016), “Automated annotation for visual recognition of
construction resources using synthetic images”, Automation in Construction, Vol. 62, pp. 14-23,
doi: 10.1016/j.autcon.2015.10.002.
Srewil, Y. and Scherer, R.J. (2013), “Effective construction process monitoring and control through a
collaborative cyber-physical approach”, Working Conference on Virtual Enterprises, Springer,
pp. 172-179.
Susutti, W., Lursinsap, C. and Sophatsathit, P. (2019), “Pedestrian detection by using weighted channel
features with hierarchical region reduction”, Journal of Signal Processing Systems, Vol. 91 No. 6,
pp. 587-608, doi: 10.1007/s11265-018-1361-z.
TensorFlow (2019), “TensorFlow tutorials”, available at: www.tensorflow.org/tutorials (accessed 10
January 2019).
Tripathi, M.K. and Maktedar, D.D. (2020), “A role of computer vision in fruits and vegetables among
various horticulture products of agriculture fields: a survey”, Information Processing in
Agriculture, Vol. 7 No. 2, pp. 183-203.
Uijlings, J.R., Van De Sande, K.E., Gevers, T. and Smeulders, A.W. (2013), “Selective search for object
recognition”, International Journal of Computer Vision, Vol. 104 No. 2, pp. 154-171, doi: 10.1007/
s11263-013-0620-5.
Walsh, S.B., Borello, D.J., Guldur, B. and Hajjar, J.F. (2013), “Data processing of point clouds for object
detection for structural engineering applications”, Computer-Aided Civil and Infrastructure
Engineering, Vol. 28 No. 7, pp. 495-508, doi: 10.1111/mice.12016.
Wang, H., Yu, Y., Cai, Y., Chen, X., Chen, L. and Liu, Q. (2019), “A comparative study of state-of-the-art
deep learning algorithms for vehicle detection”, IEEE Intelligent Transportation Systems
Magazine, Vol. 11 No. 2, pp. 82-95, doi: 10.1109/MITS.2019.2903518.
Yang, J., Shi, Z. and Wu, Z. (2016), “Vision-based action recognition of construction workers using
dense trajectories”, Advanced Engineering Informatics, Vol. 30 No. 3, pp. 327-336.
Yuan, X., Anumba, C.J. and Parfitt, M.K. (2016), “Cyber-physical systems for temporary structure
monitoring”, Automation in Construction, Vol. 66, pp. 1-14.
Zion Market Research (2018), “AI-in-construction market by technology (natural language processing
and machine learning and deep learning): global industry perspective, comprehensive analysis,
and forecast, 2017-2024”, available at: www.zionmarketresearch.com/report/ai-in-construction-
market (accessed 5 July 2020).
Corresponding author
Samad M.E. Sepasgozar can be contacted at: sepas@unsw.edu.au