Improved Detection Network Model Based on YOLOv5 for Warning Safety in Construction Sites
To cite this article: Nguyen Ngoc-Thoan, Dao-Quang Thanh Bui, Cuong N. N. Tran & Duc-
Hoc Tran (2024) Improved detection network model based on YOLOv5 for warning safety in
construction sites, International Journal of Construction Management, 24:9, 1007-1017, DOI:
10.1080/15623599.2023.2171836
HIGHLIGHTS
1. Providing a one-step solution for automatic identification of the PPE on construction sites, in contrast to widely used multi-phase hardhat wearing detection methods.
2. Introducing four network structures of the new YOLO version, named YOLOv5s, YOLOv5m, YOLOv5l, and YOLOv5x, for automatic detection of the PPE worn by construction workers.
3. Constructing a new PPE detection dataset with 11,978 samples that cover considerable variations.
CONTACT Duc-Hoc Tran tdhoc@hcmut.edu.vn Faculty of Civil Engineering, Ho Chi Minh City University of Technology (HCMUT), 268 Ly Thuong Kiet Street,
District 10, Ho Chi Minh City, Vietnam
© 2023 Informa UK Limited, trading as Taylor & Francis Group
camera has trouble distinguishing between small objects and background noise and overlapping instances. Furthermore, some samples may have a similar image region or partial occlusion that makes detection difficult. Finally, there are no datasets available for developing and testing models for detecting PPE in a variety of situations (Rao et al. 2022).

The majority of recent vision-based methods are concerned only with detecting hardhats on construction sites. Other equipment, such as safety vests, gloves, safety goggles, and steel toe shoes, should be monitored in addition to the hardhat to ensure worker safety. A variety of studies has been conducted to identify multi-class PPE (Nath et al. 2020). Open-access commercial software named smartvid.io uses deep learning to detect multiple PPE components. A real-time PPE detection methodology is of considerable significance for assuring safety; however, it is scant in the literature (Xie et al. 2018). Recent studies considered "real time" to mean processing at least 5 video frames per second (FPS) (Redmon et al. 2016; Redmon and Farhadi 2017). Owing to this fast rate, the PPE detection process can run as an independent platform alongside construction-related objects, enabling the recognition of complicated and elusive spatial relationships. Hence, a rapid technique is required that can deal with video frame by frame and respond with results in real time, to track exactly the motion of objects and impending collisions in live video.

In computer vision, the most widely used algorithm for detecting objects is the region-based convolutional neural network (R-CNN) (Girshick 2015). Due to the low speed of the original R-CNN, faster variants have been proposed, including Mask R-CNN (He et al. 2017) and Faster R-CNN (Ren et al. 2017). However, the above-mentioned algorithms are unable to handle real-time detection from live video streams. Currently, the most effective real-time detection techniques include the single shot detector (SSD) (Yi et al. 2019), you only look once (YOLO) (Redmon et al. 2016), the region-based fully convolutional network (R-FCN), and RetinaNet (Lin et al. 2017). Real-time computing typically sacrifices accuracy; so far, only YOLO (remarkably, its recent variants) is both faster and more precise than the alternatives. Previous findings showed that YOLO considerably outperforms the SSD and R-FCN in hardhat detection, with a higher frame rate than those algorithms (Xie et al. 2018).

This study aims to address the task of detecting personal protective equipment (PPE) on construction sites. The goal is to detect the presence of required PPE on a worker. Because of the multiphase process with craftwork features, this task is difficult. To alleviate this issue, the YOLOv5 model is capable of high-speed automatic feature learning while maintaining detection accuracy when compared to traditional image processing approaches. Major contributions are as follows: (1) providing a one-step solution for automatic identification of the PPE on construction sites, in contrast to widely used multi-phase hardhat wearing detection methods; (2) introducing four network structures of the new YOLO version, named YOLOv5s, YOLOv5m, YOLOv5l, and YOLOv5x, for automatic detection of the PPE worn by construction workers; (3) constructing a new PPE detection dataset with 11,978 samples that cover considerable variations. The following section provides related works on safety detection. The third section explains the developed methodologies. The fourth section analyzes and discusses the research results, and the final section draws conclusions.

Related works

Previous studies on the automatic detection of PPE presence can be classified into two categories: sensor-based and vision-based detection (Nath et al. 2020). Sensor-based methods consist of installing a set of sensors and analysing the signals using data collected from the sensors. These methods focused on remote tracking systems such as wireless local area networks (WLANs) and radio frequency identification (RFID) (Dong et al. 2015; Zhang et al. 2019). RFID labels are stuck on each PPE component, and a scanner at the entrance gates checks the tags to monitor whether proper PPE is worn. Kelm, Laußat (Kelm et al. 2013) proposed a dynamic RFID technique for automated and rapid detection of the presence of PPE on workers. Dong, He (Dong et al. 2015) developed a real time location system (RTLS) and virtual construction for automatic detection and assessment of PPE use. Zhang, Yan (Zhang et al. 2019) applied an internet of things (IoT) technique to detect hardhat-use status via sensors tagged into the hardhat. Naticchia, Vaccarini (Naticchia et al. 2013) used a WLAN to determine the wearing of PPE on workers and verify whether it abides by the regulations. Nevertheless, existing sensor-based methodologies require manual tags or sensors in the PPE components, along with the installation and maintenance of complex sensor networks. These requirements increase project costs and may prevent practical implementation.

In contrast to sensor-based detection methods, vision-based methods have recently garnered considerable interest. These methods use a standard camera, pattern recognition, and advanced computer-based techniques to establish a firm basis for detecting hardhat wearing. Vision-based techniques are able to interpret complex construction sites more comprehensively, precisely, and quickly (Seo et al. 2015). Wu and Zhao (2018) integrated local binary patterns (LBP), Hu moment invariants (HMI), and color histograms (CH) to establish a hybrid color descriptor for helmet identification with various colors, including red, yellow, blue, and non-hardhat. Mneymneh, Abbas (Mneymneh et al. 2019) created an integrated framework to detect hardhat wearing based on computer vision techniques. The dataset was captured from recorded videos on the construction site. Firstly, the standard deviation matrix (SDM) was used to identify mobile objects, and then the histogram of oriented gradients (HOG) descriptor was applied for hardhat wearing detection.

Examples of video-sequence-based detection approaches include faces and facial features detection in color images (Shan et al. 2011), gradient based image edge detection (Shrestha et al. 2015), and the histogram of oriented gradients descriptor to monitor the PPE. Additional instances of HOG-based machine learning techniques are the k-nearest neighbor and support vector machine algorithms used to analyze unsafe behaviors on construction sites. Generally, these methods used multiple steps and depend on handmade features to monitor whether a worker is wearing PPE. Therefore, they may face difficulty in detecting PPE against backgrounds with weather changes, various views, and occlusions.

Deep learning techniques have grown widely in the object detection area due to their ability to deal with multi-scale features of data. Fang, Li (Fang et al. 2018) developed a novel method based on Faster R-CNN to automatically monitor the use of hardhats on construction sites. A dataset of 81,000 image frames was used to train and test the proposed model. However, the model was unable to identify the hardhat colors. Kolar, Chen (Kolar et al. 2018) proposed a convolutional neural network to
detect safety railing systems. Ding, Fang (Ding et al. 2018) integrated a long short-term memory into a convolution neural network to automatically monitor unsafe acts in the workplace. The model only focuses on a single device; therefore, further study is required to deal with multiple pieces of equipment or workers. Siddula, Dai (Siddula et al. 2016) integrated a Gaussian mixture model and CNNs for target detection on a construction site rooftop. More recently, Nath, Chaspari (Nath et al. 2019) developed CNN-based models for common construction-related object detection.

With the sharp growth of deep learning, the series of You Only Look Once (YOLO) detection methods has gradually become the mainstream approach for safety detection. Redmon, Divvala (Redmon et al. 2016) first introduced the one-stage object detection algorithm YOLO. Boudjit and Ramzan (Boudjit and Ramzan 2022) presented YOLOv2-based human detection in images taken from drones; their method used the real-time benefits of YOLOv2 with great precision to reduce computational time. Deng, Li (Deng et al. 2022b) introduced a lightweight YOLOv3 to detect helmet wearing; the proposed method achieved higher detection accuracy than YOLOv5l and less computing time than YOLOv5m. Wang (Wang et al. 2020) proposed an improved version of YOLOv3 for real-time helmet wearing detection under different scenarios of occlusion, tiny objects, and dense clusters. Zeng, Duan (Zeng et al. 2022) proposed an improved deep learning method based on YOLOv4 for automatically detecting hardhat wearing; the improved version gained 4% accuracy compared to the original version. Nain, Sharma (Nain et al. 2021) examined three deep learning algorithms, including YOLOv4, YOLOv5, and YOLACT++, for hardhat detection on the construction site. The experimental results showed that the considered algorithms achieved high precision and recall with quick speed. Recently, Zhang, Xiao (Zhang et al. 2022) used the new YOLOv5s algorithm to identify helmet wearing in practical situations. Sadiq, Masood (Sadiq et al. 2022) presented a robust YOLOv5 for safety helmet detection considering noise in the input image.

According to the aforementioned studies, computer vision-based data analysis is adequate for monitoring safety behavior. The majority of investigations concentrate on hardhat detection rather than complete PPE. Therefore, this research employs the newly developed You Only Look Once algorithm for proactive safety performance control with complete PPE detection.

Proposal methods

YOLOv5 methodologies

The YOLOv5 structural operation is comparable to that of its predecessor, YOLOv4; the current version is developed further in accordance with YOLOv4. According to the network depth and the feature map width, YOLOv5 has four variants: YOLOv5s, YOLOv5m, YOLOv5l, and YOLOv5x (Dong et al. 2022). Among the four models, YOLOv5s obtains results with the highest computing speed, and YOLOv5x achieves the highest detection precision. In general, the structure of the YOLOv5 network consists of four components: input, backbone, neck, and prediction. YOLOv5s is selected as the benchmark algorithm. Figure 1 displays the architectural operation of YOLOv5s; each part is described below.

There are three components on the input side: mosaic data enhancement, adaptive anchor box calculation, and adaptive image scaling. Mosaic data enhancement combines four images to improve the dataset's ability to recognize small targets. Adaptive image scaling improves accuracy by imposing minimum black border requirements on original images and converting irregularly sized images to a standard size. The adaptive anchor box mechanism refines the anchor box values by computing the difference between the prediction frame and the ground truth and iteratively updating the network parameters.
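The adaptive image scaling step can be illustrated with a short sketch. This is not the authors' implementation; it is a minimal letterbox-style computation, assuming YOLOv5's usual convention of uniform scaling plus symmetric border padding to reach the target square size (the function name is illustrative):

```python
def letterbox_dims(w, h, target=640):
    """Compute the resized dimensions and the padding needed to fit an
    arbitrary image into a target x target square without distortion."""
    scale = min(target / w, target / h)            # uniform scale factor
    new_w, new_h = round(w * scale), round(h * scale)
    pad_x, pad_y = target - new_w, target - new_h  # total border to add
    # Split the border evenly between the two sides (left/right, top/bottom).
    return (new_w, new_h), (pad_x // 2, pad_x - pad_x // 2), (pad_y // 2, pad_y - pad_y // 2)

# A 1280x720 frame is scaled by 0.5 to 640x360, then padded 140 px
# on top and bottom to reach 640x640.
size, pad_lr, pad_tb = letterbox_dims(1280, 720)
```

Because the aspect ratio is preserved, only the shorter dimension receives a border, which keeps the added black region to the minimum the paragraph above describes.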
The YOLOv5s backbone consists of a focus module, Conv, C3, spatial pyramid pooling, and additional modules. The focus component performs down-sampling while expanding the channel dimension, decreasing the number of floating-point operations per second and enhancing speed. The convolution (Conv) layer is the fundamental unit of YOLOv5s and sequentially performs two-dimensional convolution, regularization, and activation on the input. The C3 module is composed of numerous structural modules known as bottleneck residuals. The spatial pyramid pooling module (He et al. 2014) is a pooling layer that performs maximum pooling with various kernel sizes and integrates the features via concatenation.

The neck network utilizes the structure of a feature pyramid network (FPN) and pixel aggregation network (PAN) (Zhu et al. 2022). The FPN transfers potent semantic features from the high-level feature maps to the low-level feature maps. In addition, the PAN transfers robust localization features from lower to higher feature maps. The two structures mutually enhance the feature-fusion ability of the neck network. The head generates a vector containing the class probability of the target object, the object score, and the location of the target object's bounding box.
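The down-sampling performed by the focus module can be sketched in plain Python. This is a hypothetical, framework-free illustration of the space-to-depth slicing (the real module is a network layer): every second pixel is taken at the four possible phase offsets, so an input of shape (C, H, W) becomes (4C, H/2, W/2) with no information discarded before the following convolution:

```python
def focus_slice(channels):
    """Space-to-depth: turn (C, H, W) into (4C, H/2, W/2) by sampling
    every other pixel at the four possible (row, col) phase offsets."""
    out = []
    for dy in (0, 1):            # row phase
        for dx in (0, 1):        # column phase
            for ch in channels:  # keep channel order within each phase
                out.append([row[dx::2] for row in ch[dy::2]])
    return out

# One 4x4 channel becomes four 2x2 channels.
img = [[[1, 2, 3, 4],
        [5, 6, 7, 8],
        [9, 10, 11, 12],
        [13, 14, 15, 16]]]
sliced = focus_slice(img)
```

Halving the spatial resolution while quadrupling the channel count is what lets the subsequent convolution run on a quarter of the pixels, which is the speed gain described above.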
YOLOv5s Improvement

The YOLOv5 model is an optimal selection for target detection due to its ability to provide results with high speed and accuracy (Qi et al. 2022). YOLOv5 is efficient in detecting large objects; however, it may fail under background noise and with overlapping small targets (Tan et al. 2021). By enhancing the model's scales and loss function, an advanced YOLOv5 model is proposed to improve PPE detection accuracy and speed for any type of object. Below is a description of YOLOv5's enhanced version.

A high-level feature map of the detection model has a larger receptive field, with an emphasis on abstract semantic information that is suitable for classification tasks, but it yields low resolution and inadequate localization detail. In signal processing terms, deeper network layers typically lose more information about small objects. A low-level feature map has a small receptive field and a high resolution corresponding to the size of small objects. The low-level feature map gives a thorough account of the object's characteristics, which is more advantageous for extracting the contour, color, and other descriptive features, as well as for regressing the small object's position. The original YOLOv5 consists of three scales of image feature detectors. For instance, if the input image has a size of 640 × 640, there are three scale ranges of image features: 20 × 20, 40 × 40, and 80 × 80. This indicates that the model is unable to capture small objects smaller than 8 × 8 pixels. To address this problem, the advanced version of YOLOv5 adds a fourth detection scale, 160 × 160, to the model. Since 640 divided by 160 is four (4), the improved model is able to detect objects of 4 × 4 pixels or more, which fulfills the requirements for detecting
small targets. Figure 2 represents the structure of the improved model.

Experimental results and discussion

Dataset creation

The dataset of personal protective equipment is primarily collected by internet protocol outdoor cameras at construction sites and through a web crawler named Roboflow (https://roboflow.com/). This study defines the PPE for construction site workers in six categories: shoes, suits, masks, gloves, goggles, and hardhats, as shown in Figure 3. The Yolo-mark tool (Alexey 2016) labels each object on images from the collected dataset. This task marks bounding boxes of objects and generates annotation files, which include the coordinates and box size together with the label type. Eleven thousand nine hundred and seventy-eight (11,978) valid pictures were collected and then divided into three main parts: the training (6473), validation (3570), and testing (1935) sets. Table 1 shows detailed information on the data set. Figure 4 plots the distribution of characteristics of the data set. This study applies a five-fold cross-validation technique to lower bias compared to random sampling methods. The model performance is evaluated by the average results obtained over five testing rounds.
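The five-fold protocol described above can be sketched as follows. This is an assumption about the mechanics rather than the authors' exact code: the 11,978 samples are partitioned into five disjoint folds, each round holds one fold out, and the per-round scores (hypothetical values here) are averaged:

```python
def kfold_indices(n, k=5):
    """Partition sample indices 0..n-1 into k contiguous, disjoint folds;
    the first n % k folds receive one extra sample."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for s in sizes:
        folds.append(list(range(start, start + s)))
        start += s
    return folds

# Each round holds one fold out for testing and trains on the rest;
# the reported score is the mean over the five rounds.
folds = kfold_indices(11978, k=5)
round_scores = [0.92, 0.94, 0.93, 0.95, 0.91]   # hypothetical per-round results
mean_score = sum(round_scores) / len(round_scores)
```

Averaging over folds uses every sample exactly once for testing, which is why the protocol has lower bias than a single random split.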
Experimental settings and criteria

The performance of the considered models is evaluated using the free and open-source software TensorFlow. The experimental environment configuration is set as follows: the central processing unit (CPU) is an Intel(R) Core(TM) i7-6820 CPU @ 2.70 GHz; the graphics processing unit (GPU) is an NVIDIA Quadro M1200; the programming language is Python 3.9.10. The system
Table 2. Detection results for PPE with different epochs.

             Epochs
Indicators   4      8      16     32
Precision    0.80   0.82   0.86   0.93
Recall       0.85   0.90   0.97   0.99

difference in results with benchmarked methods in simple scenes and on large objects. For the fuzzy small objects case, the proposed algorithm can detect the PPE under serious occlusion, while other algorithms cannot. In Figure 8(d), the improved version of YOLOv5s can identify the missing PPE. The detection outcomes demonstrate the robustness of the proposed method.

To verify the applicability of the proposed model, the input data is extracted from a real video of a construction case. The project is the Paihong industrial factory, located in Bau Bang Industrial Park Extension, Binh Duong Province, Vietnam. The resulting video clip displays the information of each worker at the construction site, as shown in Figure 9. As can be seen, the model has identified the protective equipment
including helmets, safety vests, protective shoes, and masks, as well as the faces of workers, with relatively high accuracy. The proposed model is able to provide full information for management tasks such as the name, status, and details of a worker, and the action time. For instance, the worker named An did not wear the hardhat at 9:15, December 12th, 2022. Accordingly, safety at work can be easily controlled. The manager can introduce reasonable regulations and sanctions to ensure that all required protective equipment for labor is worn.

The performance of the proposed YOLOv5 is compared to those of the previous versions of YOLO via three indicators: precision, recall, and F1. The compared algorithms include YOLOv4, Faster-RCNN MobilenetV3, and Faster-RCNN Resnet50. These models have been implemented using the same architecture library, Pytorch, to bring them to the same frame of reference for convenient comparison. Table 4 shows the experimental results of all considered algorithms. Compared to the previous version YOLOv4, YOLOv5 was not outstandingly better, but it still showed effectiveness in all indicators. For the two remaining models, the results showed that the model families using region-based convolutional neural networks, such as MobilenetV3 and Resnet50, have not yet been able to occupy the position of YOLO in general and YOLOv5 in particular in terms of performance and applicability. YOLOv5 outperforms the other considered algorithms on average, implying that YOLOv5 robustly facilitates the solving of the safety problem in construction sites.

Discussion

The experimental results demonstrate that the enhanced version of YOLOv5s outperforms all other detection models in terms of precision and recall. Following is a summary of additional findings:

The improved YOLOv5s outperforms the benchmarked models, indicating that including an additional detection scale in the model is crucial in building a detection model. The benchmarked models own three scale ranges of image features, while the improved YOLOv5s has four scale ranges.

The improved model is able to detect more small objects. This effectively improves the detection accuracy, especially for detecting objects in an image with large size differences.

In terms of computational process, the improved model spends more computing time than the benchmarked models. Nevertheless, the sacrifice is reasonable and acceptable given the benefit of high prediction accuracy.

The detection accuracy for each category of object in the dataset is relatively high. Both precision and recall are above 90%. It is clear that the proposed model has improved detection accuracy.

The experimental results on a real construction case demonstrate the feasibility of the model. The model does not need any handcrafted feature selection and has a good capacity for extracting features in the images. The high precision and recall show the great performance of the model compared to the other considered models. Therefore, the proposed model provides a useful tool for detecting PPE and enhancing on-site construction safety inspection.

In summary, the above findings adequately explain the success of the improved YOLOv5s in detecting personal protective equipment.
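The scale arithmetic behind these findings can be checked directly. Assuming the standard YOLO convention that each detection grid cell covers input_size / grid_size pixels, the sketch below reproduces the 8 × 8 px floor of the three-scale baseline and the 4 × 4 px floor of the four-scale improved model (function names are illustrative):

```python
def grid_sizes(input_size, strides):
    """Feature-map resolution at each detection scale."""
    return [input_size // s for s in strides]

def min_object_px(strides):
    """Smallest object footprint a head can resolve: one cell at the
    finest stride."""
    return min(strides)

baseline = grid_sizes(640, [8, 16, 32])     # original three scales
improved = grid_sizes(640, [4, 8, 16, 32])  # with the added 160 x 160 scale
```

Adding the stride-4 head yields the 160 × 160 grid, so the smallest resolvable object shrinks from roughly 8 × 8 to 4 × 4 pixels, matching the improvement described above.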
Table 4. Comparisons of different methods for hardhat detection.

Methods                    Precision   Recall   F1 score
YOLOv5                     0.74        0.66     0.70
YOLOv4                     0.73        0.62     0.67
Faster-RCNN MobilenetV3    0.59        0.54     0.57
Faster-RCNN Resnet50       0.65        0.58     0.61

640" for the size of images; "batch" for declaring the number of images in one training epoch; "data ppe.yaml" for declaring the collected database address; and "weights yolo5x.pt" to decide the training model. The fourth step is applied for testing, with the command "python detect.py" starting the recognition process. The last step generates the output with two types of data, including image and video.
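As a quick sanity check on Table 4, the F1 scores can be recomputed from the precision and recall columns as the harmonic mean F1 = 2PR/(P + R); the recomputed values agree with the tabulated ones to within rounding:

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

table4 = {                      # (precision, recall, reported F1)
    "YOLOv5": (0.74, 0.66, 0.70),
    "YOLOv4": (0.73, 0.62, 0.67),
    "Faster-RCNN MobilenetV3": (0.59, 0.54, 0.57),
    "Faster-RCNN Resnet50": (0.65, 0.58, 0.61),
}
for method, (p, r, reported) in table4.items():
    # Each reported F1 matches the recomputed value within rounding error.
    assert abs(f1_score(p, r) - reported) <= 0.01
```

The harmonic mean penalizes an imbalance between precision and recall, which is why it is the standard single-number summary for detector comparisons like this one.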
Disclosure statement

No potential conflict of interest was reported by the author(s).

Funding

This research is funded by Vietnam National University HoChiMinh City (VNU-HCM) under grant number DS2022-20-01.

References

Alexey AB. 2016. Yolo-mark GUI for marking bounded boxes of objects in images for training YOLO. GitHub, Online. https://github.com/AlexeyAB/Yolo_mark.
Akinlolu M, Haupt TC, Edwards DJ, Simpeh F. 2022. A bibliometric review of the status and emerging research trends in construction safety management technologies. Int J Construct Manage. 22(14):2699–2711.
Bhagwat K, Delhi VSK. 2021. Review of construction safety performance measurement methods and practices: a science mapping approach. Int J Construct Manage. 1–15. In press. DOI: 10.1080/15623599.2021.1924456.
Boudjit K, Ramzan N. 2022. Human detection based on deep learning YOLO-v2 for real-time UAV applications. J Exper Theor Artificial Intelligence. 34(3):527–544.
Bureau of Labor Statistics. 2020. Number and rate of fatal work injuries, by industry. [accessed 2022 July]. https://www.bls.gov/charts/census-of-fatal-occupational-injuries/number-and-rate-of-fatal-work-injuries-by-industry.htm.
Chung WWS, Tariq S, Mohandes SR, Zayed T. 2020. IoT-based application for construction site safety monitoring. Int J Construct Manage. 1–17. In press. DOI: 10.1080/15623599.2020.1847405.
Deng H, Tian M, Ou Z, Deng Y. 2022a. A semantic framework for on-site evacuation routing based on awareness of obstacle accessibility. Autom Constr. 136:104154.
Deng L, Li H, Liu H, Gu J. 2022b. A lightweight YOLOv3 algorithm used for safety helmet detection. Sci Rep. 12(1):10981.
Development, M.t.M.o.H.a.U.-R. 2017. The short reports of fatal accidents in China's building construction activities. [accessed 2022 July]. http://sgxxxt.mohurd.gov.cn/Public/AccidentList.aspx.
Ding L, Fang W, Luo H, Love PED, Zhong B, Ouyang X. 2018. A deep hybrid learning model to detect unsafe behavior: integrating convolution neural networks and long short-term memory. Autom Constr. 86:118–124.
Dong S, He Q, Li H, Yin Q. 2015. Automated PPE misuse identification and assessment for safety performance enhancement. In ICCREM 2015; p. 204–214.
Dong X, Yan S, Duan C. 2022. A lightweight vehicles detection network model based on YOLOv5. Eng Appl Artif Intell. 113:104914.
Fang Q, Li H, Luo X, Ding L, Luo H, Rose TM, An W. 2018. Detecting non-hardhat-use by a deep learning method from far-field surveillance videos. Autom Constr. 85:1–9.
Fang W, Love PED, Luo H, Ding L. 2020. Computer vision for behaviour-based safety in construction: a review and future directions. Adv Eng Inf. 43:100980.
Girshick R. 2015. Fast R-CNN. In 2015 IEEE International Conference on Computer Vision (ICCV); p. 1440–1448.
Guo BHW, Zou Y, Fang Y, Goh YM, Zou PXW. 2021. Computer vision technologies for safety science and management in construction: a critical review and future research directions. Saf Sci. 135:105130.
He K, Zhang X, Ren S, Sun J. 2014. Spatial pyramid pooling in deep convolutional networks for visual recognition. In: Fleet D, Pajdla T, Schiele B, Tuytelaars T, editors. Computer vision – ECCV 2014. Cham: Springer International Publishing; p. 346–361.
He K, Gkioxari G, Dollar P, Girshick R. 2017. Mask R-CNN. In 2017 IEEE International Conference on Computer Vision (ICCV); p. 2980–2988.
Kelm A, Laußat L, Meins-Becker A, Platz D, Khazaee MJ, Costin AM, Helmus M, Teizer J. 2013. Mobile passive Radio Frequency Identification (RFID) portal for automated and rapid control of Personal Protective Equipment (PPE) on construction sites. Autom Constr. 36:38–52.
Kolar Z, Chen H, Luo X. 2018. Transfer learning and deep convolutional neural networks for safety guardrail detection in 2D images. Autom Constr. 89:58–70.
Lee D, Khan N, Park C. 2021. Rigorous analysis of safety rules for vision intelligence-based monitoring at construction jobsites. Int J Construct Manage. 1–11. In press. DOI: 10.1080/15623599.2021.2007453.
Lin TY, Goyal P, Girshick R, He K, Dollar P. 2017. Focal loss for dense object detection. In 2017 IEEE International Conference on Computer Vision (ICCV); p. 2999–3007.
Liu J, Luo H, Liu H. 2022. Deep learning-based data analytics for safety in construction. Autom Constr. 140:104302.
Mneymneh BE, Abbas M, Khoury H. 2019. Vision-based framework for intelligent monitoring of hardhat wearing on construction sites. J Comput Civil Eng. 33(2):04018066.
Nain M, Sharma S, Chaurasia S. 2021. Authentication control system for the efficient detection of hard-hats using deep learning algorithms. J Discrete Mathematical Sci Cryptography. 24(8):2291–2306.
Nath ND, Behzadan AH, Paal SG. 2020. Deep learning for site safety: real-time detection of personal protective equipment. Autom Constr. 112:103085.
Nath ND, Chaspari T, Behzadan AH. 2019. Single- and multi-label classification of construction objects using deep transfer learning methods. J Information Technol Construct. 24:511–526. (Special issue Virtual, Augmented and Mixed: New Realities in Construction).
Naticchia B, Vaccarini M, Carbonari A. 2013. A monitoring system for real-time interference control on large construction sites. Autom Constr. 29:148–160.
Paneru S, Jeelani I. 2021. Computer vision applications in construction: current state, opportunities & challenges. Autom Constr. 132:103940.
Qi J, Liu X, Liu K, Xu F, Guo H, Tian X, Li M, Bao Z, Li Y. 2022. An improved YOLOv5 model based on visual attention mechanism: application to recognition of tomato virus disease. Comput Electron Agric. 194:106780.
Rao AS, Radanovic M, Liu Y, Hu S, Fang Y, Khoshelham K, Palaniswami M, Ngo T. 2022. Real-time monitoring of construction sites: sensors, methods, and applications. Autom Constr. 136:104099.
Redmon J, Divvala S, Girshick R, Farhadi A. 2016. You only look once: unified, real-time object detection. arXiv preprint.
Redmon J, Farhadi A. 2017. YOLO9000: better, faster, stronger. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
Ren S, He K, Girshick R, Sun J. 2017. Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Trans Pattern Anal Mach Intell. 39(6):1137–1149.
Sadiq M, Masood S, Pal O. 2022. FD-YOLOv5: a fuzzy image enhancement based robust object detection model for safety helmet detection. Int J Fuzzy Syst. 24(5):2600–2616.
Seo J, Han S, Lee S, Kim H. 2015. Computer vision techniques for construction safety and health monitoring. Adv Eng Inf. 29(2):239–251.
Shan D, Shehata M, Badawy W. 2011. Hard hat detection in video sequences based on face features, motion and color information. In 2011 3rd International Conference on Computer Research and Development.
Shrestha K, Shrestha PP, Bajracharya D, Yfantis EA. 2015. Hard-hat detection for construction safety visualization. J Constr Eng. 2015:721380.
Siddula M, Dai F, Ye Y, Fan J. 2016. Unsupervised feature learning for objects of interest detection in cluttered construction roof site images. Procedia Eng. 145:428–435.
Tan S, Lu G, Jiang Z, Huang L. 2021. Improved YOLOv5 network model and application in safety helmet detection. In 2021 IEEE International Conference on Intelligence and Safety for Robotics (ISR).
Wang H, Hu Z, Guo Y, Yang Z, Zhou F, Xu P. 2020. A real-time safety helmet wearing detection approach based on CSYOLOv3. Appl Sci. 10(19):6732.
Wu H, Zhao J. 2018. An intelligent vision-based approach for helmet identification for work safety. Comput Ind. 100:267–277.
Wu J, Cai N, Chen W, Wang H, Wang G. 2019. Automatic detection of hardhats worn by construction personnel: a deep learning approach and benchmark dataset. Autom Constr. 106:102894.
Xie Z, Liu H, Li Z, He Y. 2018. A convolutional neural network based approach towards real-time hard hat detection. In 2018 IEEE International Conference on Progress in Informatics and Computing (PIC), Suzhou, China. IEEE; p. 430–434.
Yap JBH, Skitmore M, Lam CGY, Lee WP, Lew YL. 2022. Advanced technologies for enhanced construction safety management: investigating Malaysian perspectives. Int J Construct Manage. 1–10. In press. DOI: 10.1080/15623599.2022.2135951.
Yi J, Wu P, Metaxas DN. 2019. ASSD: attentive single shot multibox detector. Comput Vision Image Understanding. 189:102827.
Zeng L, Duan X, Pan Y, Deng M. 2022. Research on the algorithm of helmet-wearing detection based on the optimized YOLOv4. The Visual Computer. In press. DOI: 10.1007/s00371-022-02471-9.
Zhang H, Yan X, Li H, Jin R, Fu H. 2019. Real-time alarming, monitoring, and locating for non-hard-hat use in construction. 145(3):04019006.
Zhang Y-J, Xiao F-S, Lu Z-M. 2022. Helmet wearing state detection based on improved Yolov5s. Sensors. 22(24):9843.
Zhu L, Lee F, Cai J, Yu H, Chen Q. 2022. An improved feature pyramid network for object detection. Neurocomputing. 483:127–139.