
International Journal of Construction Management

ISSN: (Print) (Online) Journal homepage: www.tandfonline.com/journals/tjcm20

Improved detection network model based on YOLOv5 for warning safety in construction sites

Nguyen Ngoc-Thoan, Dao-Quang Thanh Bui, Cuong N. N. Tran & Duc-Hoc Tran

To cite this article: Nguyen Ngoc-Thoan, Dao-Quang Thanh Bui, Cuong N. N. Tran & Duc-Hoc Tran (2024) Improved detection network model based on YOLOv5 for warning safety in construction sites, International Journal of Construction Management, 24:9, 1007-1017, DOI: 10.1080/15623599.2023.2171836

To link to this article: https://doi.org/10.1080/15623599.2023.2171836

Published online: 09 Feb 2023.


Improved detection network model based on YOLOv5 for warning safety in construction sites

Nguyen Ngoc-Thoan (a), Dao-Quang Thanh Bui (b,c), Cuong N. N. Tran (d) and Duc-Hoc Tran (b,c)

(a) Faculty of Building and Industrial Construction, Hanoi University of Civil Engineering (HUCE), Ha Noi, Viet Nam; (b) Faculty of Civil Engineering, Ho Chi Minh City University of Technology (HCMUT), Ho Chi Minh City, Vietnam; (c) Vietnam National University Ho Chi Minh City, Linh Trung Ward, Ho Chi Minh City, Vietnam; (d) Faculty of International Business and Economics, University of Economics and Business, Vietnam National University, Hanoi, Vietnam

ABSTRACT
Guaranteeing worker safety is a crucial task in construction site management. Many accidents on construction sites are caused by falls, collisions, electrocution, or entrapment in operating equipment. The personal protective equipment (PPE) specified in safety rules is widely used to ensure workers' safety. Monitoring the use of PPE has relied on traditional methods such as physical monitoring and video observation, which waste time, suffer from poor timeliness, and miss inspections. To overcome these limitations, this study utilized the new You Only Look Once (YOLO) algorithm, YOLOv5, which includes four network structures, namely YOLOv5s, YOLOv5m, YOLOv5l, and YOLOv5x, for safety detection. A data set with 11978 samples was used to establish a digital safety monitoring system via training and testing phases. The comparison results among the four models show that YOLOv5s performed the best, with an average detection speed of 110 frames per second, which fulfils the real-time detection requirements. This study contributes to the state of the knowledge by (i) providing a one-step solution for the automatic identification of PPE on construction sites; (ii) proposing a valuable tool to assist site safety engineers in automatically detecting the PPE worn by construction workers; and (iii) demonstrating the effectiveness and superiority of the presented approach via a large detection dataset with 11978 samples and a real construction case.

ARTICLE HISTORY
Received 6 November 2022
Accepted 19 January 2023

KEYWORDS
classification; computer vision; construction safety; object detection; YOLO

HIGHLIGHTS
1. Providing a one-step solution for automatic identification of PPE on construction sites, in contrast to widely used multi-phase hardhat wearing detection methods.
2. Introducing four network structures of the new YOLO version, named YOLOv5s, YOLOv5m, YOLOv5l, and YOLOv5x, for automatic detection of the PPE worn by construction workers.
3. Constructing a new PPE detection dataset with 11978 samples that covers considerable variations.

Introduction

Construction site works are among the most hazardous jobs in industrial sectors. Although construction involves a strong need for workers, digitalization is rarely applied to change productivity via automation and artificial intelligence (Wu et al. 2019; Chung et al. 2020). According to the United States Bureau of Labor Statistics, the construction industry has a higher rate of fatal occupational accidents than the national average (Bureau of Labor Statistics 2020). Construction accidents account for nearly 30% of all workplace accidents in the United Kingdom, while in China, the number of fatal injuries in housing and municipal construction projects was expected to reach nearly 1000 cases by 2021 (Development and M.t.M.o.H.a.U.-R 2017). Traumatic brain injuries are responsible for approximately 24% of all construction fatalities. As a result, procedures and policies are needed to ensure construction safety (Bhagwat and Delhi 2021).
Personal protective equipment (PPE) such as hardhats, gloves, vests, and safety goggles is an essential tool for risk management. However, despite having been trained and educated in advance, workers sometimes disregard the mandatory safety requirements at construction sites for a variety of reasons. As a result, current safety control practices rely heavily on detectors' manual monitoring and reporting (Lee et al. 2021).
Several researchers have proposed methods for automatically detecting personal protective equipment in order to enhance the quality of safety inspection jobs (Akinlolu et al. 2022; Liu et al. 2022; Yap et al. 2022). Vision-based techniques largely dominate the costly sensor-based solutions (Kelm et al. 2013; Shrestha et al. 2015; Fang et al. 2020; Guo et al. 2021). Typically, the construction site surveillance camera serves as the frame source, and vision-based approaches execute hardhat wearing detection through several phases, including hardhat localization and identification, which primarily involve human movement identification (Fang et al. 2018). Despite numerous extensive studies, vision-based methods for detecting PPE remain complicated (Wu and Zhao 2018; Paneru and Jeelani 2021). Current studies were carried out in specific scenes, which pose difficulties in generalising to other construction sites with high variability in background and pedestrian states (Deng et al. 2022a). The surveillance

CONTACT Duc-Hoc Tran tdhoc@hcmut.edu.vn Faculty of Civil Engineering, Ho Chi Minh City University of Technology (HCMUT), 268 Ly Thuong Kiet Street,
District 10, Ho Chi Minh City, Vietnam
© 2023 Informa UK Limited, trading as Taylor & Francis Group

camera has trouble distinguishing between small objects and background noise and overlapping instances. Furthermore, some samples may have a similar image region or partial occlusion that makes detection difficult. Finally, there are no datasets available for developing and testing models for detecting PPE in a variety of situations (Rao et al. 2022).
The majority of recent vision-based methods are only concerned with detecting hardhats on construction sites. Other equipment, such as safety vests, gloves, safety goggles, and steel toe shoes, should be monitored in addition to the hardhat to ensure worker safety. A variety of studies has been conducted to identify multi-class PPE (Nath et al. 2020). Open-access commercial software, named smartvid.io, is based on deep learning for detecting multiple PPE components. A real-time PPE detection methodology is of considerable significance for assuring safety; however, it is scant in the literature (Xie et al. 2018). Recent studies considered real time as processing at least 5 video frames per second (FPS) (Redmon et al. 2016; Redmon and Farhadi 2017). Due to the fast rate, the PPE detection process is an independent platform along with construction-related objects to enable the recognition of complicated and elusive spatial relationships. Hence, a rapid technique is required that is able to deal with videos frame by frame and respond with the results in real time to track exactly the motion of objects and impending collisions in live video.
In computer vision, the most popular algorithm for detecting objects is the region-based convolutional neural network (R-CNN) (Girshick 2015). Due to the low speed of the original R-CNN, faster variants of it have been proposed, including Mask R-CNN (He et al. 2017) and Faster R-CNN (Ren et al. 2017). However, the above-mentioned algorithms are unable to handle the detection of real-time targets from live video streams. Currently, the most effective techniques for detecting objects in real time include the single shot detector (SSD) (Yi et al. 2019), you only look once (YOLO) (Redmon et al. 2016), the region-based fully convolutional network (R-FCN), and RetinaNet (Lin et al. 2017). Real-time computing is usually achieved by sacrificing accuracy. So far, only YOLO (remarkably, its recent variants) is both quicker and more precise compared to other substitutes. Previous findings showed that YOLO considerably outperforms the SSD and R-FCN in hardhat detection with a higher frame rate than those algorithms (Xie et al. 2018).
This study aims to address the task of detecting personal protective equipment on construction sites. The goal is to detect the presence of required personal protective equipment (PPE) on a worker. Because of the multiphase process with craftwork features, the aforementioned task is difficult. To alleviate this issue, the YOLOv5 model is capable of high-speed automatic feature learning while maintaining detection accuracy when compared to traditional image processing approaches. Major contributions are as follows: (1) providing a one-step solution for automatic identification of the PPE on construction sites, in contrast to widely used multi-phase hardhat wearing detection methods; (2) introducing four network structures of the new YOLO version, named YOLOv5s, YOLOv5m, YOLOv5l, and YOLOv5x, for automatic detection of the PPE worn by construction workers; (3) constructing a new PPE detection dataset with 11978 samples that covers considerable variations. The following section provides related works on safety detection. The third section explains the developed methodologies. The fourth section analyzes and discusses the research results, and the final section draws conclusions.

Related works

Previous studies on the automatic detection of PPE presence can be classified into two categories: sensor-based and vision-based detection (Nath et al. 2020). Sensor-based methods consist of installing a set of sensors and analysing the signals using data collected from the sensors. These methods focused on remote tracking systems like wireless local area networks (WLANs) and radio frequency identification (RFID) (Dong et al. 2015; Zhang et al. 2019). RFID labels are stuck on each PPE component, and a scanner at the entrance gates checks the tags to monitor whether proper PPE is worn. Kelm et al. (2013) proposed a dynamic RFID technique for automated and rapid detection of the presence of PPE on workers. Dong et al. (2015) developed a real time location system (RTLS) and virtual construction for automatic detection and assessment of the use of PPEs. Zhang et al. (2019) applied an internet of things (IoT) technique to detect the hardhat-use status via sensors that were tagged into the hardhat. Naticchia et al. (2013) used a WLAN to determine the wearing of PPE on workers and verify if it abides by the regulations. Nevertheless, existing sensor-based methodologies require manual tags or sensors in the PPE components and the installation and maintenance of complex sensor networks. These requirements increase project costs and may prevent practical implementation.
In contrast to sensor-based detection methods, vision-based methods have recently garnered considerable interest. These methods use a standard camera, pattern recognition, and advanced computer-based techniques to establish a firm basis for detecting hardhat wearing. Vision-based techniques were able to interpret complex construction sites more comprehensively, precisely, and quickly (Seo et al. 2015). Wu and Zhao (2018) integrated local binary patterns (LBP), Hu moment invariants (HMI), and color histograms (CH) to establish a hybrid color descriptor for helmet identification with various colors including red, yellow, blue, and non-hardhat. Mneymneh et al. (2019) created an integrated framework to detect hardhat wearing based on computer vision techniques. The dataset was captured from recorded videos on the construction site. Firstly, the standard deviation matrix (SDM) was used to identify mobile objects, and then the histogram of oriented gradients (HOG) descriptor was applied for the hardhat wearing detection.
Examples of video-sequence-based detection approaches include face and facial feature detection in color images (Shan et al. 2011), gradient-based image edge detection (Shrestha et al. 2015), and the histogram of oriented gradients descriptor to monitor the PPE. Additional instances of HOG-based machine learning techniques are k-nearest neighbor and support vector machine algorithms to analyze unsafe behaviors on construction sites. Generally, these methods use multiple steps and depend on handmade features to monitor whether a worker is wearing PPE. Therefore, they may face difficulty in detecting the PPE against backgrounds with weather changes, various views, and occlusions.
Deep learning techniques have grown widely in the object detection area due to their ability to deal with multi-scale features of data. Fang et al. (2018) developed a novel method based on Faster R-CNN to automatically monitor the use of hardhats on construction sites. A dataset of 81,000 image frames was used to train and test the proposed model. However, the model was unable to identify the hardhat colors. Kolar et al. (2018) proposed a convolutional neural network to

detect safety railing systems. Ding et al. (2018) integrated a long short-term memory into a convolutional neural network to automatically monitor unsafe acts in the workplace. The model only focuses on a single device; therefore, further study is required to deal with multiple pieces of equipment or workers. Siddula et al. (2016) integrated a Gaussian mixture model and CNNs for target detection on a construction site rooftop. More recently, Nath et al. (2019) developed CNN-based models for common construction-related object detection.
With the sharp growth of deep learning, the series of You Only Look Once (YOLO) detection methods has gradually become the mainstream approach for safety detection. Redmon et al. (2016) first introduced the one-stage object detection algorithm YOLO. Boudjit and Ramzan (2022) presented YOLOv2 based on images from drones for human detection. The method used the real-time benefits of YOLOv2 with great precision to reduce computational time. Deng et al. (2022b) introduced a lightweight YOLOv3 to detect helmet wearing. The proposed method achieved higher detection accuracy than YOLOv5l and less computing time than YOLOv5m. Wang et al. (2020) proposed an improved version of YOLOv3 for real-time helmet wearing detection with different scenarios of occlusion, tiny objects, and dense clusters. Zeng et al. (2022) proposed an improved deep learning method based on YOLOv4 for automatically detecting hardhat wearing. The improved version rose 4% in accuracy compared to the original version. Nain et al. (2021) examined three deep learning algorithms, including YOLOv4, YOLOv5, and YOLACT++, for hardhat detection on the construction site. The experimental results showed that the considered algorithms achieved high scores of precision and recall, and quick speed. Recently, Zhang et al. (2022) used the new YOLOv5s algorithm to identify helmet wearing in practical situations. Sadiq et al. (2022) presented a robust YOLOv5 for safety helmet detection considering noise in the input image.
According to the aforementioned studies, computer vision-based data analysis is adequate for monitoring safety behavior. The majority of investigations concentrate on hardhat detection rather than full PPE. Therefore, this research continues to employ the newly developed You Only Look Once algorithm for proactive safety performance control with complete PPE detection.

Proposal methods

YOLOv5 methodologies

The YOLOv5 structural operation is comparable to its predecessor, YOLOv4; the current version is developed further in accordance with it. According to the network depth and the feature map width, YOLOv5 has four variants: YOLOv5s, YOLOv5m, YOLOv5l, and YOLOv5x (Dong et al. 2022). Among the four models, YOLOv5s obtained the results with the highest computing speed, and YOLOv5x achieved the highest precision in detecting. In general, the structure of the YOLOv5 network consists of four components: input, backbone, neck, and prediction. YOLOv5s is selected as the benchmark algorithm. Figure 1 displays the architectural operation of YOLOv5s, and each part is described below.
There are three components on the input side: mosaic data enhancement, adaptive anchor box calculation, and adaptive image scaling. Mosaic data enhancement combines four images to improve the dataset's ability to represent small targets. Adaptive image scaling improves accuracy by imposing minimum black border requirements on original images and converting irregularly sized images to a standard size. The adaptive anchor box mechanism refines the anchor box values by computing the difference between the prediction frame and the ground truth and iteratively updating the network parameters.
The YOLOv5s backbone consists of a focus module, Conv, C3, spatial pyramid pooling, and additional modules. The focus component realizes down-sampling while expanding the channel dimension, decreasing the number of floating-point operations per second, and enhancing speed. The convolution (Conv) layer is the fundamental unit of YOLOv5s and sequentially performs two-dimensional convolution, regularization, and activation on the input. The C3 module is composed of numerous structural modules known as bottleneck residuals. The spatial pyramid pooling module (He et al. 2014) is a pooling layer that performs maximum pooling using various kernel sizes and integrates the features via concatenation.
The neck network utilizes the structure of a feature pyramid network (FPN) and pixel aggregation network (PAN) (Zhu et al. 2022). The FPN transfers potent semantic features from the high-level feature maps to the low-level feature maps. In addition, the PAN transfers robust localization principles from lower to higher feature maps. The two structures mutually enhance the feature fusion ability of the neck network. The head generates a vector containing the class probability of the target object, the object score, and the location of the target object's bounding box.

YOLOv5s Improvement

The YOLOv5 model is an optimal selection for target detection due to its ability to provide results with high speed and accuracy (Qi et al. 2022). YOLOv5 is more efficient in big object detection; however, it may fail in detecting small targets against background noise or when targets overlap (Tan et al. 2021). By enhancing the model's scale and loss function, an advanced YOLOv5 model is proposed in order to improve the detection accuracy and speed for personal protective equipment of any object type. Below is a description of YOLOv5's enhanced version.
A high-level feature map of the detecting model has a larger receptive field, with an emphasis on abstract semantic information that is suitable for classification tasks, but yields low resolution and inadequate localization detail. In signal processing terms, a deep network layer typically has a greater loss of information for small objects. A low-level feature map has a small receptive field and a high resolution corresponding to the size of small objects. The low-level feature map generates a thorough analysis of the object's characteristics and data, which provides more advantages for extracting the contour, color, and other descriptive features, as well as for regression of the small object's position. The original YOLOv5 consists of three scales of image feature detectors. For instance, if the input image has a size of 640 × 640, there are three scale ranges of image features: 20 × 20, 40 × 40, and 80 × 80. This indicates that the template is unable to capture small objects that are smaller than 8 × 8 pixels. To address this problem, the advanced version of YOLOv5 adds a detection scale to the model, 160 × 160. 640 divided by 160 is four (4); therefore the improved model is able to detect objects of 4 × 4 pixels or more, which can fulfill the requirements in detecting

Figure 1. YOLOv5s structure.

small targets. Figure 2 represents the structure of the improved model.

Experimental results and discussion

Dataset creation

The dataset of personal protective equipment is primarily collected by internet protocol outdoor cameras at construction sites and through a web crawler named Roboflow (https://roboflow.com/). This study defines the PPE for construction site workers as including six categories: shoes, suits, marks, gloves, goggles, and hardhats, as shown in Figure 3. The Yolo-mark tool (Alexey 2016) labels each object on images from the collected dataset. This task marks bounding boxes of objects and generates annotation files, which include parameters of coordinates and box size together with the label type. Eleven thousand nine hundred and seventy-eight (11978) valid pictures were collected, then divided into three main parts: training (6473), validation (3570), and testing (1935) sets. Table 1 shows detailed information on the data set. Figure 4 plots the distribution of characteristics of the data set. This study applies a five-fold cross-validation technique to lower bias compared to random sampling methods. The model performance is evaluated by the average results obtained over five testing rounds.

Experimental settings and criteria

The performance of the considered models is evaluated using the free and open-source software TensorFlow. The experimental environment configuration is set as follows. The central processing unit (CPU) is an Intel(R) Core(TM) i7-6820 CPU @ 2.70 GHz. The graphics processing unit (GPU) is an NVIDIA Quadro M1200. The programming language is Python 3.9.10. The system

Figure 2. Improved YOLOv5s algorithm.
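As a quick check on the scale arithmetic described in the YOLOv5s Improvement section, the sketch below derives the detection grid sizes and the minimum detectable object side from the input resolution. The function names are ours for illustration; they are not part of YOLOv5.

```python
def detection_grids(input_size, strides):
    """Feature-map grid sizes for each detection stride.

    For a 640x640 input, the original YOLOv5 heads use strides
    32, 16, and 8 (grids 20x20, 40x40, 80x80); the improved model
    described in the text adds a stride-4 head (grid 160x160).
    """
    return [input_size // s for s in strides]

def min_detectable_pixels(input_size, finest_grid):
    """Smallest object side (in pixels) a head can resolve: one grid cell."""
    return input_size // finest_grid

# Original three scales vs the improved four-scale model
original = detection_grids(640, [32, 16, 8])      # [20, 40, 80]
improved = detection_grids(640, [32, 16, 8, 4])   # [20, 40, 80, 160]

print(original, min_detectable_pixels(640, 80))   # finest original cell: 8 px
print(improved, min_detectable_pixels(640, 160))  # improved cell: 4 px
```

This reproduces the paper's figures: 640 / 160 = 4, so the added head can in principle resolve objects down to 4 × 4 pixels, versus 8 × 8 for the original three-scale model.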

Table 1. Data set detail information.

Category          glove     goggles    helmet     marks    shoes     suit
Number of labels  984       1192       283        253      639       18

Category          no_glove  no_goggles no_helmet  no_mark  no_shoes  no_suit
Number of labels  1548      1384       229        640      525       15
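The five-fold cross-validation protocol mentioned in the Dataset creation section can be sketched with the standard library as follows. This is a minimal illustration of the split logic under our own assumptions, not the study's actual tooling.

```python
import random

def five_fold_splits(n_samples, seed=0):
    """Shuffle sample indices once, then yield (train, test) index lists
    for each of five folds, so every sample is tested exactly once."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    k = 5
    folds = [idx[i::k] for i in range(k)]  # round-robin gives near-equal folds
    for i in range(k):
        test = folds[i]
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, test

# With the paper's 11978 images, each test fold holds roughly 2395 samples.
sizes = [len(test) for _, test in five_fold_splits(11978)]
print(sizes)
```

Averaging the evaluation metrics over the five test folds gives the "average results obtained by five testing rounds" that the paper reports.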

environment is Ubuntu 16.04. The acceleration environment is CUDA Toolkit 9.0.
The measurement indicators precision, recall, and F1-score were used to evaluate the performance of the proposed improved YOLOv5s. Precision is computed as the ratio of correctly predicted positive examples to the total number of predicted positive examples. Recall is calculated as the ratio of the total number of true positives to the total number of true positives and false negatives. The F1-score is a combined measurement of precision and recall that considers their harmonic mean. F1 scores close to one indicate high precision in detection objectives. The three indicators can be formulated as the following equations:

Precision (Pr) = TP / (TP + FP)   (1)

Recall (Rc) = TP / (TP + FN)   (2)

F1 = (2 × Pr × Rc) / (Pr + Rc)   (3)

where true positive (TP) denotes the correctly detected objects. False positive (FP) and false negative (FN) indicate the wrongly detected and missed objects, respectively.

Results and comparison

The proposed YOLOv5s is tested with several types of images. Figure 5 shows the qualitative analysis based on two types of images (goggles and gloves). It is obvious that the improved version has increased detection accuracy by being able to identify small objects. The proposed model YOLOv5s was evaluated with different epochs. Table 2 shows the experimental results with 416 × 416 resolution images and a threshold equal to 0.25. As displayed in Table 2, a larger number of epochs yields better detection outcomes and is able to prevent over-fitting. An epoch is counted when all the data in the training set has been fed to the neural network once. When the data set is too large, not all the data can be included for training at a time. Therefore, the batch concept is used for the data set. Iterations are the number of batches needed to complete one epoch.

Figure 3. Six categories of personal protective equipment.

To verify the generality of the proposed method, three network resolutions are used. Figure 6 depicts the relationship between precision and recall. The tradeoff between precision and recall for different thresholds is visualized by the precision-recall curve. This curve is simply a plot with precision values on the vertical axis and recall values on the horizontal axis. A high area under the curve represents both high recall and high precision. High precision and recall values are desirable; however, there is a trade-off between these two factors. As shown in Figure 6, the curve for classifying the glove is the best, due to the large number of training images. Table 3 exhibits results for different scenes and categories of PPE on the evaluation indicators.
The confusion matrix (or error matrix) is one way to summarize the performance of a classifier for binary classification tasks. Figure 7 displays the confusion matrix outcomes of the proposed model.
Figure 8 displays the comparison of the improved YOLOv5s against four network structures, including YOLOv5s, YOLOv5m, YOLOv5l, and YOLOv5x, for safety detection in different scenes. The improved version shows no significant

Figure 4. Characteristics of data set.

Figure 5. Visual detection: Left: original model; Right: proposed model.
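Equations (1)–(3) translate directly into code. The following sketch computes the three indicators from raw counts; the function names and the example counts are illustrative, not results from the paper.

```python
def precision(tp, fp):
    # Eq. (1): fraction of predicted positives that are correct
    return tp / (tp + fp)

def recall(tp, fn):
    # Eq. (2): fraction of actual positives that are found
    return tp / (tp + fn)

def f1_score(pr, rc):
    # Eq. (3): harmonic mean of precision and recall
    return 2 * pr * rc / (pr + rc)

# Example with illustrative counts (not from the paper's experiments)
pr = precision(tp=90, fp=10)   # 0.9
rc = recall(tp=90, fn=30)      # 0.75
print(round(f1_score(pr, rc), 4))
```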

Table 2. Detection results for PPE with different epochs.

            Epochs
Indicators  4      8      16     32
Precision   0.8    0.82   0.86   0.93
Recall      0.85   0.9    0.97   0.99

difference in results with benchmarked methods in simple scenes and with large objects. For the fuzzy small objects case, the proposed algorithm can detect the PPE under serious occlusion, while the other algorithms cannot. In Figure 8(d), the improved version of YOLOv5s can identify the missing PPE. The detection outcomes demonstrate the robustness of the proposed method.
To verify the applicability of the proposed model, the input data is extracted from a real video of a construction case. The project is the Paihong industrial factory, which is located in the Bau Bang Industrial Park Extension, Binh Duong Province, Vietnam. The resulting video clip displays the information of each worker at the construction site, as shown in Figure 9. As can be seen, the model has identified the protective equipment

including helmets, safety vests, protective shoes, and masks, as well as the faces of workers, with relatively high accuracy. The proposed model is able to provide full information for management tasks such as the worker's name, status, details, and action time. For instance, the worker named An did not wear the hardhat at 9:15, December 12th, 2022. Accordingly, safety work can be easily controlled. The manager can introduce reasonable regulations and sanctions to control and improve the problem of ensuring that all required protective equipment for labor is met.
The performance of the proposed YOLOv5 is compared to those of the previous versions of YOLO via three indicators: precision, recall, and F1. The compared algorithms include YOLOv4, Faster-RCNN MobilenetV3, and Faster-RCNN Resnet50. These models have been customized using the same architecture library, Pytorch, to bring them to the same frame of reference for convenient comparison. Table 4 shows the experimental results of all considered algorithms. Compared to the previous version YOLOv4, YOLOv5 was not outstandingly better, but it still showed effectiveness in all indicators. For the two remaining models, the results showed that the model families using Region-based Convolutional Neural Networks such as MobilenetV3 and Resnet50 have not yet been able to occupy the position of YOLO in general, and YOLOv5 in particular, in terms of performance and application level. YOLOv5 outperforms the other considered algorithms on average, implying that YOLOv5

Table 3. Detection results for PPE with different network resolutions.

Network resolution  Frame rate  Indicator  Glove  Goggles  No_glove  No_goggles
608 × 608           6 Hz        FN         0.01   0.11     0.21      0.14
                                FP         0.06   0.39     0.30      0.40
                                TP         0.94   0.61     0.70      0.60
                                Precision  0.97   0.95     0.97      0.96
                                Recall     0.99   0.98     0.98      0.99
416 × 416           8 Hz        FN         0.01   0.12     0.22      0.15
                                FP         0.06   0.37     0.31      0.42
                                TP         0.93   0.59     0.69      0.58
                                Precision  0.95   0.93     0.95      0.95
                                Recall     0.98   0.97     0.96      0.98
320 × 320           10 Hz       FN         0.05   0.10     0.24      0.17
                                FP         0.07   0.35     0.34      0.43
                                TP         0.91   0.56     0.64      0.55
                                Precision  0.92   0.91     0.91      0.93
                                Recall     0.93   0.94     0.92      0.95
Figure 6. Precision-Recall curve.
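The precision-recall curve in Figure 6 is built by sweeping the detection confidence threshold. A minimal sketch of that sweep follows; the scores and hit/miss flags are illustrative, not the paper's data.

```python
def pr_points(scored_preds, n_positives, thresholds):
    """scored_preds: list of (confidence, is_true_positive) for all detections.
    Returns one (precision, recall) pair per confidence threshold."""
    points = []
    for t in thresholds:
        kept = [hit for conf, hit in scored_preds if conf >= t]
        tp = sum(kept)
        fp = len(kept) - tp
        prec = tp / (tp + fp) if kept else 1.0  # convention: empty set -> 1.0
        rec = tp / n_positives
        points.append((prec, rec))
    return points

# Lowering the threshold raises recall but can lower precision
preds = [(0.9, True), (0.8, True), (0.6, False), (0.4, True), (0.2, False)]
print(pr_points(preds, n_positives=3, thresholds=[0.5, 0.3]))
```

Plotting recall on the horizontal axis against precision on the vertical axis over a dense grid of thresholds produces the curve, and the area under it summarizes the trade-off the text describes.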

Figure 7. Confusion matrix outcomes.
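For reference, a confusion matrix like the one in Figure 7 can be accumulated from (true label, predicted label) pairs. This is a generic standard-library sketch, not the paper's code, and the label pairs are invented for illustration.

```python
from collections import Counter

def confusion_matrix(y_true, y_pred, labels):
    """Return a dict-of-dicts matrix[true][pred] = count."""
    counts = Counter(zip(y_true, y_pred))
    return {t: {p: counts.get((t, p), 0) for p in labels} for t in labels}

labels = ["helmet", "no_helmet"]
y_true = ["helmet", "helmet", "no_helmet", "no_helmet", "helmet"]
y_pred = ["helmet", "no_helmet", "no_helmet", "helmet", "helmet"]
m = confusion_matrix(y_true, y_pred, labels)
print(m["helmet"]["helmet"], m["helmet"]["no_helmet"])  # 2 1
```

Diagonal cells are correct detections; off-diagonal cells correspond to the FP/FN counts used in Equations (1)–(3).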



Figure 8. Visual detection of all models.

robustly facilitates the solving of the safety problem on construction sites.

Discussion

The experimental results demonstrate that the enhanced version of YOLOv5s outperforms all other detection models in terms of precision and recall. The following is a summary of additional findings:
The improved YOLOv5s outperforms the benchmarked models, indicating that including an additional detection scale is crucial in building a detection model. The benchmarked models own three scale ranges of image features, while the improved YOLOv5s has four scale ranges.
The improved model is able to detect more small objects. This effectively improves the detection accuracy, especially for detecting objects in an image with large size differences.
In terms of the computational process, the improved model spends more computing time than the benchmarked models. Nevertheless, the sacrifice is reasonable and acceptable regarding the benefits of high prediction accuracy.
The detection accuracy for each category of object in the dataset is relatively high. Both precision and recall values are above 90%. It is clear that the proposed model has improved detection accuracy.
The experimental results on a real construction case demonstrate the feasibility of the model. The model does not need any handcrafted feature selection and has a good capacity for extracting features from images. The high precision and recall show the great performance of the model compared to the other considered models. Therefore, the proposed model provides a useful tool for detecting PPE and enhancing on-site construction safety inspection.
In summary, the above findings adequately explain the success of the improved YOLOv5s in detecting personal protective

Figure 9. Detecting results from real construction case.
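The per-worker information shown in Figure 9 (name, status, missing equipment, action time) could be assembled into an alert record along these lines. The field names, required-PPE set, and example values are illustrative assumptions, not taken from the deployed system.

```python
from datetime import datetime

# Illustrative required set; the paper's system tracks six PPE categories
REQUIRED_PPE = {"helmet", "vest", "shoes", "mask"}

def build_alert(worker_name, detected_ppe, when):
    """Compare detected PPE against the required set and build a
    management record like the one described for worker 'An'."""
    missing = sorted(REQUIRED_PPE - set(detected_ppe))
    return {
        "worker": worker_name,
        "missing": missing,
        "status": "violation" if missing else "ok",
        "time": when.strftime("%H:%M, %B %d %Y"),
    }

alert = build_alert("An", {"vest", "shoes", "mask"}, datetime(2022, 12, 12, 9, 15))
print(alert["status"], alert["missing"])  # violation ['helmet']
```

A record like this is what would let a manager trace which worker lacked which item at which time, as the text describes.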

Table 4. Comparisons of different methods for hardhat detection.

Methods                   Precision  Recall  F1 score
YOLOv5                    0.74       0.66    0.70
YOLOv4                    0.73       0.62    0.67
Faster-RCNN MobilenetV3   0.59       0.54    0.57
Faster-RCNN Resnet50      0.65       0.58    0.61

640” for the size of images; batch for declaring the number of images in one training epoch; “data ppe.yaml” for declaring the collected database address; and “weights yolov5x.pt” to decide the training model. The fourth step is applied for testing, with the command “python detect.py” starting the recognition process. The last step is used for generating the output with two types of data, including image and video.

equipment on construction sites. This model is an effective tool Conclusions


for project managers in safety management with cost-saving.
The works at construction sites are hazardous and risky. Safety
management is necessary to reduce occupational accidents. This
Implications and practical contributions research introduced an improved version of YOLOv5 for detect-
ing personal protective equipment from practical necessity and
The proposed model overcomes existing research in controlling emergency needs at construction sites. The automatic tools over-
safety tasks at construction sites efficiently and achieving import- come the limitations of manual inspection and monitoring meth-
ant project objectives. Moreover, the research considers the per- ods and enable data management and recovery.
sonal protective equipment of workers with multiple outfits A large dataset including 11978 samples was utilized to evalu-
instead of a single one. On the practical side, the detection ate the comparative effectiveness of the proposed models. The
model will support the construction industry by allowing safety dataset was collected from real construction sites and through a
engineers to identify people’s unsafe behavior and enhance con- Web crawler. Apart from that, this study built an image database
struction on-site safety. of six categories of common PPE on construction sides. The
The model has practical implications since it does not impose comparison results with a five-fold cross-validation technique
any limitation on the operating of training and testing tasks. With revealed that the YOLOv5s was the most effective model with
minor modification, the model is easy to solve other encountered respect to precision and recall indicators. The improved version
problems in the field of the construction industry such as on-site of YOLOv5s yielded results with the highest value of precision
object recognition, building component searching, interior work and recall compared to those of benchmarked models.
progress monitoring, or structural integrity checking. The detection results on the test dataset and a real construc-
tion case confirmed that the model achieved good performance
Interface with high detection accuracy, low missing rate, and ability to meet
real-time requirements. In summary, the proposed model demon-
A detection interface was developed to make the proposed algo- strated a significant step forward in assisting site managers to con-
rithm much easier and to provide running clearly and graphic- trol safety situations. Despite the current advantages, the model
ally. Figure 10 demonstrates the main interface of the proposed has poor performance when the objects are not pretty clear, too
detection algorithm. The interface consists of five steps as fol- small and vague, and the background is too complicated. The
lows: The first step requires a user to install the required soft- quality of input data is limited in size due to of collecting source.
ware such as Git, Pip, Pytorch, or Python. The user needs to The application interface is quite basic, not outstanding, and opti-
search for the database in the second step. The data set can be mized. Further studies should consider the new dataset with dif-
extracted from the Roboflow website or self-search and self-label ferent conditions. Integrating the optimization algorithm into the
images. The third step is used for training with required parame- current model enhances computational speed and reduces plat-
ters as follows: “Python train.py” for running the training; “img form storage.
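The five-fold cross-validation used for model comparison partitions the 11978-sample dataset into five folds, each serving once as the validation set while the remaining four are used for training. A minimal pure-Python sketch of the index split (the shuffling seed is an arbitrary choice, not a value from the paper):

```python
import random

def kfold_indices(n_samples: int, k: int = 5, seed: int = 0):
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # round-robin split into k folds
    for i in range(k):
        val = folds[i]
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, val

# Each of the 11978 samples appears in exactly one validation fold.
splits = list(kfold_indices(11978, k=5))
```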

Figure 10. Detection platform of all models.
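The interface steps described above wrap the standard YOLOv5 command line. The helper below assembles the training and detection commands named in the text; the batch size and the detect-time weights path are illustrative assumptions, not values taken from the paper.

```python
def train_command(img=640, batch=16, data="ppe.yaml", weights="yolov5x.pt"):
    """Step 3: launch YOLOv5 training with the declared parameters."""
    return ["python", "train.py",
            "--img", str(img),      # input image size
            "--batch", str(batch),  # images per training batch (assumed value)
            "--data", data,         # dataset configuration file
            "--weights", weights]   # pretrained model to fine-tune

def detect_command(source, weights="best.pt"):
    """Step 4: run recognition on an image or video source."""
    return ["python", "detect.py", "--source", source, "--weights", weights]

# Prints: python train.py --img 640 --batch 16 --data ppe.yaml --weights yolov5x.pt
print(" ".join(train_command()))
print(" ".join(detect_command("site_camera.mp4")))
```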



Disclosure statement

No potential conflict of interest was reported by the author(s).

Funding

This research is funded by Vietnam National University Ho Chi Minh City (VNU-HCM) under grant number DS2022-20-01.
