Professional Documents
Culture Documents
A Comparative Study of YOLO V4 and V5 Architectures On Pavement Crack Datasets Using Region-Based Detection
A Comparative Study of YOLO V4 and V5 Architectures On Pavement Crack Datasets Using Region-Based Detection
net/publication/370419009
CITATIONS READS
0 63
4 authors:
All content following this page was uploaded by Rauf Fatali on 25 July 2023.
1 Introduction
Frequent utilization of land transportation [1] causes road-pavement surfaces to
deteriorate and unstable climate events [2], such as heavy rainfall, flooding, and
temperature fluctuations, can contribute to further deterioration of the roads if
2 F. Rauf et al.
2 Related Works
Computer vision is one of the widely used fields of artificial intelligence in vari-
ous disciplines that involves several tasks with different strategies, such as object
classification, detection, and segmentation [9]. Object detection and segmenta-
tion involve two main phases: pattern recognition to recognize features from
image data, and classification to classify each object into the appropriate class.
While segmentation task uses pixel-wise detection to replace each pixel with the
appropriate class, object detection task localizes objects and draws bounding
boxes around them.
A Comparative Study of YOLO Architecture on Crack Detection 3
One of the early object detection algorithms, R-CNN, implements deep learn-
ing architecture in two stages: first, the selective search algorithm extracts region
proposals for regions of interest (RoI), then a convolutional neural network ex-
tracts feature from each RoI and classify the object. Fast R-CNN improved the
algorithm to run faster by extracting features in a single forward pass, instead of
feeding each RoI separately. Faster R-CNN further improved the algorithm by
using a Region Proposal Network which enables end-to-end training and faster
processing times. Hacıefendioğlu, K., and Başağa, H. B. [19] used a pre-trained
Faster R-CNN model to detect concrete road-pavement cracks under different
weather conditions. According to their findings, the Faster R-CNN model strug-
gles to detect cracks during fog, cloud, and nighttime. They demonstrated that
Faster R-CNN is noise sensitive which has a significant effect on the decrease
of overall accuracy by half. Xu, Xiangyang, et al. [20] compared R-CNN archi-
tecture with YOLOv3 considering both region-based and pixel-wise implemen-
tation. R-CNN performed well on the detection of deep and shallow crack types
4 F. Rauf et al.
YOLO head layer that is composed of three output blocks to predict bounding
boxes, objectness scores, and class probabilities for a set of predefined anchor
boxes. Additionally, compared to YOLOv4, YOLOv5 uses a new smooth and
non-monotonic Swish activation function [22]. This activation function has been
demonstrated to improve the efficiency of deep neural networks. Table 1 shows
the difference between the structural parts of YOLOv4 and YOLOv5. Although
YOLOv5 is an improved version of YOLOv4, there are some cases where it may
underperform YOLOv4 in terms of accuracy due to the accuracy-speed trade-off
inherent in the architecture.
Table 1: Architectural parts of YOLOv4 and YOLOv5.
YOLOv4 YOLOv5
Neural Network Type Fully convolutional Fully convolutional
Backbone CSPDarknet53 CSPDarknet53
Neck SSP and PANet PANet
Head YOLO layer YOLO layer
3 Comparison Methodology
Building an automated crack detection system is challenging due to the selection
of appropriate architecture which should be capable of detecting almost all thin
and small crack objects from each frame with high speed. In this paper, we
have focused on comparing the performance of two architectures, YOLOv4 and
YOLOv5. The selection of these architectures is warranted due to architectural
improvements at a large scale in YOLOv4 providing better results in terms of
accuracy, whereas YOLOv5 achieves similar results with faster inference time.
In this comparison, we used two publicly available image datasets, Edm-
Crack600 and RDD2022, to train and test YOLO architectures. Each dataset
represents diverse types of road-pavement cracks that can potentially impact the
future condition of the road in a distinct manner. Therefore, comparisons are
conducted not only by architectural evaluation but also by taking into account
the number of classes within datasets.
Fig. 3: Data samples from EdmCrack600 (a) and RDD2022 (b) datasets.
Taking all possible crack types into account is crucial for their future impact
on the road as it is mentioned before. EdmCrack600 dataset provides almost
all possible crack types from literature but in an imbalanced form. Removing
the imbalanced representation of classes has the capability of improving overall
accuracy while detecting all classes effectively. The general idea of the proposed
image augmentation technique is encompassed in Algorithm 1.
8 F. Rauf et al.
This algorithm can be divided into three main parts. The first part involves
identifying the under-represented classes from the whole dataset. This is done
by eliminating classes with high percentage representation within a dataset.
In the second part, we crop the identified under-represented classes from
the original images by checking whether each image is contained by the under-
represented class. The cropping is performed to ensure that the only region of
interest is preserved and not consists of over-represented crack types. Since each
image consists of several crack types, cropping is used in this step to avoid
increasing the number of other types and to achieve a more balanced represen-
tation. Lastly, the size of cropped image is set to a minimum size of 640 pixels
to meet the requirement of YOLO architecture.
The third and last part contains the application of the augmentation pipeline
to cropped images until a balanced representation is achieved. The augmenta-
tion process involves applying a series of transformations to the cropped image.
Applying appropriate transformations to datasets that exhibit variability based
on the characteristics can significantly impact the final results. For instance, in
the context of a crack, rotating a transverse crack by 90 degrees can result in
transforming it into a longitudinal crack.
In this last part, several transformations are used to augment cropped images.
Initially, a set of color transformations is applied that includes converting the
image to grayscale, adjusting brightness or contrast, and slightly shifting RGB
values in order to increase the variety of images. A subsequent set of transforma-
tions adds random noise to the images, such as rain, fog, snow, and blur to make
it challenging to detect cracks. Additionally, some image transformations such as
random flipping and rotating with low values are applied to the cropped images,
and save the resulting augmented images to their corresponding directory.
Implementing this proposed image augmentation technique has the potential
to solve class imbalance and increase the overall accuracy of the object detection
algorithm. To be able to compare and evaluate the impact of the number of crack
A Comparative Study of YOLO Architecture on Crack Detection 9
The YOLO architecture provides different sizes of networks to train object de-
tection tasks. For comparisons, we selected the YOLOv4, YOLOv5-large, and
YOLOv4-tiny, YOLOv5-small, accordingly large and small-sized networks.
Training of these architectures differs from each other according to their con-
figuration file. The recommendation of the authors was highly considered in find-
ing the optimal parameters of each architecture. In the YOLOv4 architecture,
its parameters have a dependence on class number according to the equations1 :
1
https://github.com/kiyoshiiriemon/yolov4 darknet#how-to-train-to-detect-your-
custom-objects
10 F. Rauf et al.
4 Experimental Results
4.1 Experimental conditions
We used a workstation equipped with 64GB RAM, Intel® Core™ i9-10900X
CPU @ 3.70GHz, and 2x NVIDIA GeForce RTX 2080 Ti 12GB GPU, which
provided fast and efficient computation. The 2x12GB memory size allowed us
to use optimal batch sizes during training and perform more complex computa-
tions. The implementation of these architectures in the PyTorch deep learning
framework made GPU training even easier by offering a high-level interface.
Python 3.10, Nvidia driver version 522.06, and CUDA version 11.8 is used for
the PyTorch environment.
The increased number of classes can affect the balanced class representation,
which is noticeable in the EdmCrack600 dataset. While a model is trained using
an imbalanced dataset, it will struggle to learn less-represented classes. Moreover,
having a large number of annotations per image that indicates multiple crack
samples in a single image, can pose a significant challenge during detection and
may lead to reduced model precision and recall.
In order to perform comparisons, the original datasets were first trained on
YOLO architectures using their built-in augmentation mechanism. Afterward,
the augmented Edmcrack600 dataset, which includes all possible crack types, was
trained by disabling the internal augmentations in the best-performed YOLO
architecture.
Table 3 shows results on original datasets where YOLOv5l outperformed
other architectures in terms of mAP, achieving a score of 0.656 and 0.408 on
the RDD2022 and EdmCrack600 dataset with a single crack type, respectively.
The overall results are not robust due to the mentioned complexity of both
datasets. We can easily infer that the imbalanced class representation in the
EdmCrack600 dataset has huge effects on its final results. The more balanced
RDD2022 dataset with a less number of crack types performed better in this
perspective. Additionally, increasing the size of the network resulted in longer
inference times, potentially leading to underwhelming real-time detection results.
The augmented EdmCrack600 dataset was trained and validated on the best-
performed architecture, YOLOv5l, which resulted in increasing mAP from 0.207
to 0.319, as shown in Table 4. It is observable that using augmentation tech-
12 F. Rauf et al.
niques has the potential to improve overall results. The proposed augmentation
strategy contributed to achieving a more balanced result by increasing the overall
precision and recall of under-represented crack types.
In addition to comparisons, hyperparameter tuning and removing the mosaic
augmentation from YOLO architecture were tested to see how they affect results.
The results were affected by the mosaic augmentation removal, whereas tuned
hyperparameters did not have much effect. The mosaic augmentation technique
assists to represent more crack types in a single image by combining four different
ones. Removing this technique resulted in a performance decrease by 8-9% mAP,
particularly for datasets containing more crack types.
In general, YOLOv5 architecture outperformed YOLOv4 in terms of both
accuracy and speed. Due to the large number of samples in RDD2022, YOLOv5
achieved higher precision, recall, and mAP values. Moreover, the implemen-
tation of the augmentation technique significantly increased the results of the
EdmCrack600 dataset with 8 different crack types.
5 Conclusion
Automating crack detection problems using smart cars can help us to detect
road-pavement distresses in the early phase. This paper represents a part of
our state-of-the-art by comparing object detectors that can be implemented in
real-time applications. The comparisons have been done by using two different
featured publicly available datasets, EdmCrack600 and RDD2022, and trained
them on the best-performed YOLO architectures, YOLOv4 and YOLOv5.
Detecting all possible cracks is crucial as different crack types affect the
pavement surfaces differently. Therefore, our comparison methodology exam-
ined models based on both the number of crack types and the architectures
themselves. Additionally, we proposed an augmentation strategy that augments
under-represented crack objects to handle an imbalanced class representation.
The comparisons showed that YOLOv5 outperformed YOLOv4 in terms of
accuracy and speed. Specifically, the large-sized version of YOLOv5 achieved
an mAP of 65.6% on RDD2022 with a single crack type and an almost similar
mAP of 63.2% on the same dataset with four different crack types. Augmenting
the EdmCrack600 dataset, which is represented by almost all crack types, par-
tially removed the imbalanced class representation, while improving the overall
results from 20.7% to 31.9%. These results highlight the importance of having
a well-balanced and well-annotated dataset with a sufficient amount of data
that can help us to implement real-time detection using these state-of-the-art
architectures.
As future work, we aim to investigate pixel-based algorithms that can improve
precision while considerably increasing inference time. We also plan to improve
the introduced augmentation algorithm to make it applicable in both techniques,
object detection and segmentation. Afterward, we are looking forward to build
a hybrid model that collaborates between the region and pixel-based models.
This fast and precise final model can serve as an additional brick to be added to
smart systems in future cars and contribute to the development of smart cities.
A Comparative Study of YOLO Architecture on Crack Detection 13
References
1. Pais, Jorge C., Sara IR Amorim, and Manuel JC Minhoto: Impact of traffic overload
on road pavement performance. Journal of transportation Engineering 139.9 (2013).
2. Qiao, Yaning, Gerardo W. Flintsch, Andrew R. Dawson, and Tony Parry: Exam-
ining effects of climatic factors on flexible pavement performance and service life.
Transportation research record 2349, 100-107 (2013).
3. World Health Organization. Violence, Injury Prevention, and World Health Orga-
nization. Global status report on road safety: time for action. World Health Orga-
nization (2009).
4. Varadharajan, Srivatsan, Sobhagya Jose, Karan Sharma, Lars Wander, and
Christoph Mertz: Vision for road inspection. In IEEE winter conference on ap-
plications of computer vision, pp. 115-122. IEEE, (2014).
5. Yu, Jiang-Miao, Charles Lee, and Lei-Lei Chen: Survival model-based economic eval-
uation of preventive maintenance practice on asphalt pavement. Journal of South
China University of Technology 40, no. 11, 133-137 (2012).
6. Arena, Fabio, Giovanni Pau, and Alessandro Severino: An overview on the current
status and future perspectives of smart cars. Infrastructures 5, no. 7, 53 (2020).
7. Huval, B., Wang, T., Tandon, S., Kiske, J., Song, W., Pazhayampallil, J., and Mu-
jica, F: An empirical evaluation of deep learning on highway driving. arXiv preprint
arXiv:1504.01716 (2015).
8. Dorafshan Sattar, Robert J. Thomas, and Marc Maguire: Comparison of deep con-
volutional neural networks and edge detectors for image-based crack detection in
concrete. Construction and Building Materials 186, 1031-1045 (2018).
9. Klette, Reinhard: Concise computer vision:An Introduction into Theory and Algo-
rithms. Vol. 233. London: Springer, (2014).
10. Girshick, Ross, Jeff Donahue, Trevor Darrell, and Jitendra Malik: Rich feature
hierarchies for accurate object detection and semantic segmentation. In Proceedings
of the IEEE conference on computer vision and pattern recognition (2014).
11. Girshick, Ross: Fast r-cnn. In Proceedings of the IEEE international conference on
computer vision, pp. 1440-1448 (2015).
12. Ren, Shaoqing, Kaiming He, Ross Girshick, and Jian Sun: Faster r-cnn: Towards
real-time object detection with region proposal networks. Advances in neural infor-
mation processing systems 28 (2015).
13. Liu, Wei, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed,
Cheng-Yang Fu, and Alexander C. Berg: Ssd: Single shot multibox detector. In Com-
puter Vision–ECCV 2016: 14th European Conference, Amsterdam, The Nether-
lands, October 11–14, 2016, Proceedings, Part I 14, pp. 21-37. Springer International
Publishing, (2016).
14. Redmon, Joseph, Santosh Divvala, Ross Girshick, and Ali Farhadi: You only look
once: Unified, real-time object detection. In Proceedings of the IEEE conference on
computer vision and pattern recognition, pp. 779-788 (2016).
15. Redmon, Joseph, and Ali Farhadi: YOLO9000: better, faster, stronger. In Pro-
ceedings of the IEEE conference on computer vision and pattern recognition, pp.
7263-7271 (2017).
16. Redmon, Joseph, and Ali Farhadi: Yolov3: An incremental improvement. arXiv
preprint arXiv:1804.02767 (2018).
17. Bochkovskiy Alexey, Chien-Yao Wang, and Hong-Yuan Mark Liao: Yolov4: Opti-
mal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020).
14 F. Rauf et al.
18. Glenn Jocher, Ayush Chaurasia, Alex Stoken, Jirka Borovec, NanoCode012,
Yonghye Kwon, Kalen Michael, TaoXie, Jiacong Fang, imyhxy, Lorna, Zeng Yifu,
Colin Wong, Abhiram V, Diego Montes, Zhiqiang Wang, Cristi Fati, Jebastin Nadar,
Laughing, Mrinal Jain. ultralytics/yolov5: v7.0 - YOLOv5 SOTA Realtime Instance
Segmentation (v7.0). Zenodo. https://doi.org/10.5281/zenodo.7347926 (2022).
19. Hacıefendioğlu, Kemal, and Hasan Basri Başağa: Concrete road crack detection
using deep learning-based faster R-CNN method. Iranian Journal of Science and
Technology, Transactions of Civil Engineering: 1-13 (2022).
20. Xu, Xiangyang, Mian Zhao, Peixin Shi, Ruiqi Ren, Xuhui He, Xiaojun Wei, and
Hao Yang: Crack detection and comparison study based on faster R-CNN and mask
R-CNN. Sensors 22, no. 3: 1215 (2022).
21. ChienYao Wang, HongYuan Mark Liao, YuehHua Wu, PingYang Chen, JunWei
Hsieh, and IHau Yeh. Cspnet: A new backbone that can enhance learning capability
of cnn. In 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition
Workshops (CVPRW), pages 1571–1580, (2020).
22. Ramachandran, Prajit, Barret Zoph, and Quoc V. Le: Searching for activation
functions. arXiv preprint arXiv:1710.05941 (2017).
23. Zhang, Yuexin, Jie Huang, and Fenghuang Cai: On bridge surface crack detection
based on an improved YOLO v3 algorithm. IFAC-PapersOnLine 53, no. 2 (2020).
24. Li, Li, Baihao Fang, and Jie Zhu: Performance Analysis of the YOLOv4 Algo-
rithm for Pavement Damage Image Detection with Different Embedding Positions
of CBAM Modules. Applied Sciences 12, no. 19, 10180 (2022).
25. Liu, Zhen, Wenxiu Wu, Xingyu Gu, Shuwei Li, Lutai Wang, and Tianjie Zhang:
Application of combining YOLO models and 3D GPR images in road detection and
maintenance. Remote Sensing 13, no. 6, 1081 (2021).
26. Yao, Gang, Yujia Sun, Mingpu Wong, and Xiaoning Lv: A real-time detection
method for concrete surface cracks based on improved YOLOv4. Symmetry 13, no.
9, 1716 (2021).
27. Yao, Gang, Yujia Sun, Yang Yang, and Gang Liao: Lightweight neural network for
real-time crack detection on concrete surface in fog. Frontiers in Materials 8 (2021).
28. Wan, Fang, Chen Sun, Hongyang He, Guangbo Lei, Li Xu, and Teng Xiao:
YOLO-LRDD: a lightweight method for road damage detection based on improved
YOLOv5s. EURASIP Journal on Advances in Signal Processing 2022, no. 1 (2022).
29. Teng, Shuai, Zongchao Liu, Gongfa Chen, and Li Cheng: Concrete crack detection
based on well-known feature extractor model and the YOLO v2 network. Applied
Sciences 11, no. 2 (2021).
30. Mandal, Vishal, Abdul Rashid Mussah, and Yaw Adu-Gyamfi: Deep learning
frameworks for pavement distress classification: A comparative analysis. In 2020
IEEE International Conference on Big Data (Big Data), pp. 5577-5583. IEEE,
(2020).
31. Qiu, Qiwen, and Denvid Lau: Real-time detection of cracks in tiled sidewalks using
YOLO-based method applied to unmanned aerial vehicle (UAV) images. Automa-
tion in Construction 147, 104745 (2023).
32. Q. Mei, M. Gül, and M.R. Azim, Densely connected deep neural network consid-
ering connectivity of pixels for automatic crack detection. Automation in Construc-
tion, 110: p. 103018 (2020).
33. Q. Mei, and M. Gül, A cost effective solution for pavement crack inspection using
cameras and deep neural networks. Construction and Building Materials, 256: p.
119397 (2020).
A Comparative Study of YOLO Architecture on Crack Detection 15