1. Introduction
Crack information is a primary indicator of pavement distress for pavement design and management purposes. As reported in MCAG (2009), once pavement cracks appear, the quality of the road surface deteriorates by 40% during the first 75% of its life and then, if left untreated, by another 40% during the next 12% of its life. Pavement cracks are usually divided into six types (www.asphaltinstitute.org/asphalt-pavement-distress-summary/): alligator crack, block crack, edge crack, longitudinal crack, transverse crack and reflection crack. Typical examples of them are shown in Figure 1. Traditional crack detection is conducted by skilled raters travelling over the road, which is time consuming, labour intensive and expensive. Manual inspection also exposes personnel to traffic hazards. Additionally, variability and poor repeatability are major issues because manual surveys are inconsistent. Hence, an automated inspection method for detecting pavement distress is essential.
Over the last decades, with rapid advances in machine vision, comprehensive pavement survey systems have been developed (Huang and Xu 2006, Wang et al. 2008). Depending on the sensor technology applied, three approaches dominate: (1) two-dimensional image processing (Mahler et al. 1991, Downey and Koutsopoulos 1993); (2) three-dimensional point cloud scanning (Smadja et al. 2010) and (3) three-dimensional line scanning (Wang et al. 2012, Wöhler 2013). In 1998, the Australian Commonwealth Scientific and Industrial Research Organization developed an automated road crack detection system, RoadCrack (Ferguson et al. 1998), which was able to identify cracks wider than 1 mm at highway speed. The emerging 3D laser-scanning technique has been fully adopted in the PaveVision3D Ultra system to achieve a 30-kHz scanning rate for 1 mm resolution pavement surface data at highway speeds of up to 100 km/h (Wang et al. 2008, Wang et al. 2012). These systems have proven their technological superiority in the automatic acquisition of pavement surface data, but the automated interpretation of pavement data still needs improvement, especially the classification of the major types of pavement cracks.
Conventionally, the framework for crack detection is based on designing a variety of feature extractors for each image pixel, followed by a binary classifier that determines whether the pixel contains a crack or not. Local image-processing methods, such as intensity thresholding, edge detection and sub-window based hand-crafted feature extraction (Oliveira and Lobato Correia 2008, Hu and Zhao 2010, Zou et al. 2012), are widely used in practice. However, two major challenges in the automatic interpretation of pavement images remain: (1) extracting cracks from a complex background using low-level image cues and (2) classifying the recognised cracks into a specific category. The pavement distress inspection industry has been struggling to produce production-worthy and fully automated systems.
The Convolutional Neural Network (CNN), introduced in 1980 (Fukushima 1980) and improved over the past two decades (Behnke 2003, Simard et al. 2003), is a powerful technology for image classification. Since the early 2000s, CNNs have been applied with great success to the classification (Gao et al. 2017, Pedraza et al. 2017), segmentation and recognition (Li et al. 2017) of objects. Face recognition is a major practical success (Garcia and Delakis 2004). CNNs have shown great potential compared with traditional methodologies. Local Connections, Shared Weights, Pooling and the Use of Many Layers are the four key ideas behind CNNs (Lecun et al. 2015). The architecture of a typical CNN (Figure 2) is structured with convolutional, pooling and fully connected layers.
Figure 2. Overall architecture of a typical CNN: the first stage and second stage are composed of convolutional layers and pooling layers; the classifier is composed of a few fully connected layers.
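To make the building blocks named above more concrete, the following is a minimal pure-Python sketch of one convolutional stage followed by pooling. It is an illustration only, not the authors' implementation: the function names, the 5 x 5 input and the 3 x 3 kernel are hypothetical. The same kernel is applied at every position (shared weights), each output value depends only on a small window (local connections), and max pooling then reduces the resolution of the feature map.

```python
def conv2d_valid(image, kernel):
    """Slide one shared kernel over the image (stride 1, no padding).

    Like most CNN libraries, this computes a cross-correlation
    (the kernel is not flipped).
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(feature_map):
    """Element-wise rectified linear unit."""
    return [[max(0, v) for v in row] for row in feature_map]


def max_pool_2x2(feature_map):
    """Non-overlapping 2 x 2 max pooling; a trailing odd row or column is dropped."""
    rows = len(feature_map) // 2 * 2
    cols = len(feature_map[0]) // 2 * 2
    return [
        [
            max(feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1])
            for c in range(0, cols, 2)
        ]
        for r in range(0, rows, 2)
    ]


# Hypothetical 5 x 5 patch: bright background (1) with a dark vertical line (0).
image = [[1, 1, 0, 1, 1] for _ in range(5)]
# Hypothetical 3 x 3 kernel that responds to a bright-to-dark horizontal step.
kernel = [[1, 0, -1] for _ in range(3)]

feature_map = conv2d_valid(image, kernel)   # 3 x 3 map, every row is [3, 0, -3]
pooled = max_pool_2x2(relu(feature_map))    # [[3]]
```

In a trained CNN the kernel values are learned rather than chosen by hand, and each layer holds several kernels, one per feature map.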
INTERNATIONAL JOURNAL OF PAVEMENT ENGINEERING 459
been widely applied, such as histogram-based algorithms and iterated clipping algorithms (Kirschke and Velinsky 1992, Oh et al. 1998, Li and Liu 2008). However, they cannot deliver consistently good results with a single threshold for the entire image, especially when image intensity varies with illumination conditions.
Edge-detection-based methods are another common kind of technique. Edge detectors such as Sobel, Prewitt, Laplacian and Canny (Canny 1986) are applied to the pavement images. In Yan et al. (2007), morphological filters are introduced to detect crack edges, with the assistance of a modified median filter to remove noise. In most cases, the main limitation of edge detection is that it yields disjoint crack curves.
Wavelet transforms are a powerful tool for pavement crack detection in the frequency domain. Because wavelet transforms perform well in processing linear singularities in 2D signals, the Ridgelet Transform and Curvelet Transform are applied to image segmentation and enhancement in Shan et al. (2005). In Zhou and Huang (2006), pavement cracks are extracted from the background by using wavelet-based statistical criteria. However, due to the anisotropic characteristic of wavelets, these approaches may encounter difficulty in detecting cracks with high curvature or low continuity.
Machine learning based methods have attracted more and more attention from researchers, with the rapid growth of data in transportation, such as traffic sign and pavement surface images. In Jahangiri and Rakha (2015), the authors adopt different supervised learning methods to develop multi-class classifiers that identify transportation modes, including driving a car, riding a bicycle, taking a bus, etc. In Chou et al. (1995), Cheng et al. (2001) and Nguyen et al. (2009), a vector of various features is extracted from sub-images cropped from the whole pavement images, and the vector is then used for training and classification. In Bray et al. (2006), the density and histogram are calculated as features that are passed to a neural network for the classification of images into crack maps or crack-free maps. For all these methods, the training and classification are conducted on extracted features rather than the raw images. Figure 3 depicts the two main modules in a traditional classification system. Due to the variability and richness of image data, it is almost impossible to build an accurate hand-crafted feature extractor. Without designing hand-crafted feature extractors, Zhang et al. (2016) and Cha et al. (2017) demonstrate that CNNs can provide high performance on pavement crack classification regardless of the intensity inhomogeneity and complexity of the background.
Figure 3. Traditional classification system with two modules: a fixed feature extractor and a trainable classifier.
3. Proposed method
PaveVision3D Ultra made by WayLink was used to acquire 3D pavement images. A great deal of data was collected at various sections of highways, allowing the authors to create a dataset. One thousand images were randomly selected from the collected database. Each image is 4096 × 2048 pixels and represents a pavement surface of roughly 4 m by 2 m.
3.1. Pre-processing and building dataset
In order to solve the classification problem, the proposed CNNs are trained on square image patches. The original 3D pavement images are divided into patches of size 512 × 512, as depicted in Figure 4. Both the training dataset and the testing dataset are made of these small patches.
In the training dataset and testing dataset, the small patches are classified into five types: (1) non-crack, (2) transverse crack, (3) longitudinal crack, (4) block crack and (5) alligator crack. Figure 5 presents a subset of the training dataset.
Figure 4. An illustration of the way to crop an original 3D pavement image into patches.
Figure 5. A subset of the training dataset; from the first to the fifth block diagram: non-crack, transverse crack, longitudinal crack, block crack and alligator crack.
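The cropping step of Section 3.1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the patches are assumed to tile the image without overlap, the image is assumed to be stored as 2048 rows by 4096 columns, and the helper names `patch_boxes` and `crop` are hypothetical.

```python
def patch_boxes(height, width, patch=512):
    """Return (top, left, bottom, right) boxes of a non-overlapping tiling.

    Assumption: patches do not overlap and incomplete border tiles are skipped.
    """
    return [
        (top, left, top + patch, left + patch)
        for top in range(0, height - patch + 1, patch)
        for left in range(0, width - patch + 1, patch)
    ]


def crop(image, box):
    """Cut one patch out of an image stored as a list of rows."""
    top, left, bottom, right = box
    return [row[left:right] for row in image[top:bottom]]


# One 4096 x 2048 image, treated here as 2048 rows by 4096 columns (an assumption).
boxes = patch_boxes(height=2048, width=4096)
print(len(boxes))  # 32 patches: 4 rows of 8

# Tiny demonstration on a 4 x 6 "image" with 2 x 2 patches.
tiny = [[r * 6 + c for c in range(6)] for r in range(4)]
tiny_patches = [crop(tiny, box) for box in patch_boxes(4, 6, patch=2)]
```

Under this tiling assumption each original image yields 32 patches of 512 × 512, whichever of the two dimensions is taken as the width.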
460 B. LI ET AL.
The convergence of training is usually faster if the average of each input data is close to zero (Lecun et al. 1998a), so the input data is normalised as follows:
$$N_I = \frac{I - \operatorname{mean}(I)}{\max(I) - \min(I)}, \tag{1}$$
where $N_I$ denotes the normalised data and $I$ denotes the input data.
In this classification problem, the target value is a five-dimensional column vector whose entries are '0' or '1'. Each training and testing patch is labelled with a vector according to Table 1.
Table 1. Target values for five types of data.
Type of crack | Non-crack | Transverse crack | Longitudinal crack | Block crack | Alligator crack
Target value | [1 0 0 0 0]^T | [0 1 0 0 0]^T | [0 0 1 0 0]^T | [0 0 0 1 0]^T | [0 0 0 0 1]^T
In deep learning, the sufficiency and diversity of the training dataset are fundamental to the success of training CNNs. Hence, more than 28,000 patches and corresponding target vectors make up the training dataset and testing dataset. The composition of the training dataset and testing dataset is listed in Table 2.
Table 2. The composition of the dataset.
Type of crack | Non-crack | Transverse crack | Longitudinal crack | Block crack | Alligator crack
Number of training data | 5552 | 5184 | 5419 | 5362 | 5445
Number of testing data | 300 | 300 | 300 | 300 | 300
3.2. CNN training
Stochastic gradient descent (SGD) – The goal of training a CNN is to find the values of all parameters, $W$, that minimise a loss function, $E$. The loss function $E^{(i)} = \mathcal{D}(D^{(i)}, F(W, Z^{(i)}))$ measures the discrepancy between $D^{(i)}$, the 'correct' or desired output for input $Z^{(i)}$, and the output $F(W, Z^{(i)})$ generated by the function $F$, that is, the CNN. The general problem of minimising a function with respect to a set of parameters is at the root of many issues in the field of computer science and mathematics. The SGD algorithm has proven to be a predominant methodology (Chou et al. 1995, Bray et al. 2006, Zhang et al. 2016). The loss function can be minimised by estimating the impact of small variations of the parameter values in the CNN. The simplest minimisation procedure using SGD is
$$W(k + 1) = W(k) - \alpha \frac{\partial E}{\partial W}, \tag{2}$$
where $\alpha$ is the learning rate, which is a constant less than 1, and $k$ is the iteration number.
Since the proposed CNN is a multi-layer feedforward network, the relationship between the CNN parameters and the loss function $E$ is implicit and complex. Hence, the chain rule of calculus is applied to calculate the derivatives. In this way, the gradients at the mth layer are computed from the gradients at the (m + 1)th layer, from which the backpropagation algorithm derives its name.
Batch training vs. SGD – The SGD algorithm involves 'on-line' or incremental training, which means that the network parameters are updated after each input is presented. In batch training, by contrast, the average gradient is computed after a batch of inputs has been fed to the network, and the parameters are then updated once per batch; this is a good choice in practice. The updating equation for the batch training algorithm is
$$W(k + 1) = W(k) - \frac{\alpha}{Q} \sum_{q=1}^{Q} \frac{\partial E}{\partial W}, \tag{3}$$
where $Q$ denotes the batch size.
3.3. The proposed architectures of CNNs
To maximise the performance of classification, it is essential to compose the CNN layers well. With more hidden layers, the performance of classification could be improved, but training may slow down. Moreover, highly complex CNNs with many layers and parameters may overfit. Inspired by AlexNet (Krizhevsky et al. 2012) and LeNet (Lecun et al. 1998b), four CNN architectures are proposed through some experiments, which are outlined in Table 3. For notational convenience, we refer to the CNNs by their names (CNN-1 to CNN-4). The parameters of a convolutional layer are denoted as 'Conv<number of feature maps>-<size of receptive field>' for brevity.
Table 3. CNN configurations.
CNN-1 | CNN-2 | CNN-3 | CNN-4
Input layer: 256*256 3D pavement image (all four CNNs)
Conv6-3*3 | Conv6-5*5 | Conv6-7*7 | Conv6-9*9
maxpooling (all four CNNs)
Conv12-3*3 | Conv12-5*5 | Conv12-7*7 | Conv12-9*9
maxpooling (all four CNNs)
Conv24-3*3 | Conv24-5*5 | Conv24-7*7 | Conv24-9*9
maxpooling (all four CNNs)
FC-1 (all four CNNs)
FC-2 (all four CNNs)
FC-3 (all four CNNs)
The loss function $E$ is the mean square error for all CNN architectures, which is defined as
$$E = \frac{1}{Q} \sum_{i=1}^{Q} \left(D^{(i)} - O^{(i)}\right)^2, \tag{4}$$
where $D^{(i)}$ is the ith target vector, and $O^{(i)}$ is the ith output vector.
ReLU and log-sigmoid are applied as the activation functions in the convolutional layers and the fully connected layers, respectively. Before the training dataset is fed to the CNNs, each patch is resized to 256 × 256 in order to save computing time. The batch size $Q$ is equal to 5. All the data processing and CNN training were run on the MATLAB platform on a personal computer equipped with an Intel Core i7-6700T microprocessor and 12 GB of RAM. Once convergence is achieved, the training process is terminated, as illustrated in Figure 6.
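As a worked illustration of the formulas in Sections 3.1 to 3.3, the sketch below implements the normalisation of Eq. (1), the target vectors of Table 1, the mean square error of Eq. (4) and the batch update of Eq. (3) in plain Python. It is a sketch under assumptions rather than the authors' MATLAB implementation: the function names are hypothetical, data are held as flat lists, $(D^{(i)} - O^{(i)})^2$ is read as the squared Euclidean distance between the two vectors, and the handling of a constant patch in `normalise` is an added assumption.

```python
CRACK_TYPES = ["non-crack", "transverse crack", "longitudinal crack",
               "block crack", "alligator crack"]


def normalise(values):
    """Eq. (1): subtract the mean, then divide by the value range."""
    mean = sum(values) / len(values)
    span = max(values) - min(values)
    if span == 0:
        # Assumption: a constant patch maps to all zeros (the paper does not say).
        return [0.0 for _ in values]
    return [(v - mean) / span for v in values]


def target_vector(crack_type):
    """Table 1: one-hot target for one of the five classes."""
    return [1 if name == crack_type else 0 for name in CRACK_TYPES]


def mse_loss(targets, outputs):
    """Eq. (4), reading (D - O)^2 as the squared Euclidean distance."""
    total = 0.0
    for d, o in zip(targets, outputs):
        total += sum((dj - oj) ** 2 for dj, oj in zip(d, o))
    return total / len(targets)


def batch_update(weights, gradients, alpha):
    """Eq. (3): one step with the gradient averaged over Q samples.

    `gradients` holds one gradient list per sample; with Q = 1 this
    reduces to the on-line SGD step of Eq. (2).
    """
    q = len(gradients)
    return [
        w - alpha / q * sum(g[j] for g in gradients)
        for j, w in enumerate(weights)
    ]


pixels = normalise([0, 5, 10])                        # [-0.5, 0.0, 0.5]
label = target_vector("block crack")                  # [0, 0, 0, 1, 0]
loss = mse_loss([[1, 0], [0, 1]], [[1, 0], [0, 0]])   # 0.5
# Two per-sample gradients averaged into one step; result is close to [0.9, 1.8].
new_w = batch_update([1.0, 2.0], [[0.5, 1.0], [1.5, 3.0]], alpha=0.1)
```

The normalised values have zero mean by construction, which is the property the text associates with faster convergence. In the paper's setting, `gradients` would hold Q = 5 backpropagated gradients per update.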
accuracies of four proposed CNNs are above 94%. Notably, the overall accuracy of CNN-3 is even above 96%. The high performance and reliability of classification from the proposed CNNs demonstrate that CNNs are powerful tools for classifying 3D pavement images.
In order to explore how the receptive field affects a CNN with respect to training time and classification performance, different sizes of receptive field are adopted in the convolutional layers, while the number of convolutional layers and fully connected layers, and the activation functions, are fixed. The results of our experiments reveal that the size of the receptive field has an important influence on training time. The CNNs with a smaller receptive field require far more iterations to converge. On the other hand, the size of the receptive field has almost no effect on the classification performance of the proposed CNNs. The research shows that the proposed CNN with the receptive field size of 7 × 7 is the optimal one.
Further research should be conducted on how other hyperparameters impact CNNs, such as the number of convolutional layers and the number of nodes in the fully connected layers. Therefore, more experiments are needed.
Acknowledgements
The authors would like to thank Dr. Wang's team for providing pavement 3D data.
Disclosure statement
No potential conflict of interest was reported by the authors.
Funding
Funding for this research was provided by the National Natural Science Foundation of China (Grant numbers U153420027, 51478398). The authors gratefully acknowledge this support.
References
Available from: www.asphaltinstitute.org/asphalt-pavement-distress-summary/ [Accessed 20 April 2017].
Behnke, S., 2003. Hierarchical neural networks for image interpretation. Lecture Notes in Computer Science, 2766 (3), 1345–1346.
Bray, J., et al., 2006. A neural network based technique for automatic classification of road cracks. International Joint Conference on Neural Networks. IEEE, 2006, 907–912.
Canny, J., 1986. A computational approach to edge detection. IEEE Computer Society, 679–698.
Cha, Y.J., Choi, W., and Büyüköztürk, O., 2017. Deep learning-based crack damage detection using convolutional neural networks. Computer-aided Civil & Infrastructure Engineering, 32 (5), 361–378.
Cheng, H., et al., 2001. Novel approach to pavement cracking detection based on neural network. Transportation Research Record Journal of the Transportation Research Board, 1764 (1), 119–127.
Chou, J., O'Neill, W.A., and Cheng, H., 1995. Pavement distress evaluation using fuzzy logic and moment invariants. Transportation Research Record, 39–46.
Downey, A.B. and Koutsopoulos, H.N., 1993. Primitive-based classification of pavement cracking images. Journal of Transportation Engineering, 119 (3), 402–418.
Ferguson, R., Pratt, D., and Macintyre, I., 1998. Automated detection and classification of cracking in road pavements (RoadCrack). 19th ARRB Transport Research Ltd conference, 1998, Sydney, New South Wales, Australia.
Fukushima, K., 1980. Neocognitron: a self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biological Cybernetics, 36 (4), 193–202.
Gao, F., et al., 2017. Dual-branch deep convolution neural network for polarimetric SAR image classification. Applied Sciences, 7 (5), 447.
Garcia, C. and Delakis, M., 2004. Convolutional face finder: a neural architecture for fast and robust face detection. IEEE Transactions on Pattern Analysis & Machine Intelligence, 26 (11), 1408.
He, K., et al., 2016. Identity mappings in deep residual networks. European conference on computer vision, Springer, Cham, 630–645.
Hu, Y. and Zhao, C., 2010. A local binary pattern based methods for pavement crack detection. Journal of Pattern Recognition Research, 1, 1 (20103), 140–147.
Huang, Y. and Xu, B., 2006. Automatic inspection of pavement cracking distress. Journal of Electronic Imaging, 15 (1), 013017.
Jahangiri, A. and Rakha, H.A., 2015. Applying machine learning techniques to transportation mode recognition using mobile phone sensor data. IEEE Transactions on Intelligent Transportation Systems, 16 (5), 2406–2417.
Kirschke, K.R. and Velinsky, S.A., 1992. Histogram-based approach for automated pavement-crack sensing. Journal of Transportation Engineering, 118 (5), 700–710.
Krizhevsky, A., Sutskever, I., and Hinton, G.E., 2012. ImageNet classification with deep convolutional neural networks. International conference on neural information processing systems, 1097–1105.
Lecun, Y., et al., 1998a. Efficient BackProp. Neural Networks Tricks of the Trade, 1524 (1), 9–50.
Lecun, Y., et al., 1998b. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86 (11), 2278–2324.
Lecun, Y., Bengio, Y., and Hinton, G., 2015. Deep learning. Nature, 521 (7553), 436–444.
Li, C., et al., 2017. Deepgait: a learning deep convolutional representation for view-invariant gait recognition using joint Bayesian. Applied Sciences, 7 (3), 15.
Li, Q. and Liu, X., 2008. Novel approach to pavement image segmentation based on neighboring difference histogram method. Congress on image and signal processing – CISP '08, 27–30 May 2008. Sanya, Hainan, China.
Mahler, D.S., et al., 1991. Pavement distress analysis using image processing techniques. Computer-aided Civil & Infrastructure Engineering, 6 (1), 1–14.
MCAG, 2009. M.C.A.o.G., Pavement management Report for Merced County.
Nguyen, T.S., Avila, M., and Begot, S., 2009. Automatic detection and classification of defect on road pavement using anisotropy measure. European Signal Processing Conference. IEEE, 2009, 1–5.
Oh, H., Garrick, N.W., and Achenie, L.E.K., 1998. Segmentation algorithm using iterative clipping for processing noisy pavement images. Second international conference on imaging technologies: techniques and applications in civil engineering, Davos, Switzerland, 138–147.
Oliveira, H. and Lobato Correia, P., 2008. Supervised strategies for cracks detection in images of road pavement flexible surfaces. European signal processing conference. IEEE, 2008, 1–5.
Park, J.K., et al., 2016. Machine learning-based imaging system for surface defect inspection. International Journal of Precision Engineering and Manufacturing-Green Technology, 3 (3), 303–310.
Pedraza, A., et al., 2017. Automated diatom classification (part B): a deep learning approach. Applied Sciences, 7 (5), 460.
Shan, T., et al., 2005. Automatic image enhancement driven by evolution based on Ridgelet frame in the presence of noise. Berlin: Springer, 304–313.
Simard, P.Y., Steinkraus, D., and Platt, J.C., 2003. Best practices for convolutional neural networks applied to visual document analysis. International conference on document analysis & recognition. IEEE Computer Society, 2003, 958.
Smadja, L., Ninot, J., and Gavrilovic, T., 2010. Road extraction and environment interpretation from LIDAR sensors. 38, 281–286.
Wang, K.C.P., et al., 2012. Potential measurement of pavement surface texture based on three-dimensional image data. Transportation Research Board 91st Annual Meeting.
Wang, K.C.P., Gong, W., and Hou, Z., 2008. Automated cracking survey. 6th RILEM international conference on cracking in pavements, Chicago, IL.
Wöhler, C., 2013. Triangulation-based approaches to three-dimensional scene reconstruction, 171–187.
Yan, M., et al., 2007. Pavement crack detection and analysis for high-grade highway. International conference on electronic measurement and instruments, 4-548–4-552.
Zhang, L., et al., 2016. Road crack detection using deep convolutional neural network. IEEE international conference on image processing, 3708–3712.
Zhou, J. and Huang, P.S., 2006. Wavelet-based pavement distress detection and evaluation. Optical Engineering, 45 (2), 409–411.
Zou, Q., et al., 2012. Cracktree: automatic crack detection from pavement images. Pattern Recognition Letters, 33 (3), 227–238.
Copyright of International Journal of Pavement Engineering is the property of Taylor &
Francis Ltd and its content may not be copied or emailed to multiple sites or posted to a
listserv without the copyright holder's express written permission. However, users may print,
download, or email articles for individual use.