A Deep Learning Approach For Traffic Incident Detection in Urban Networks
$W \in \mathbb{R}^{n \times p \times m \times m}$, where p is the sample size, q is the size of the input matrix and n is the number of convolutional filters. The single convolutional layer is formulated as follows:

$$f(x; W, b) = h = \{h_k\}_{k=1,\dots,n}$$

$$h_k = \mathrm{ReLU}(x * W_k + b_k)$$
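As a minimal sketch of the layer formulated above, the following pure-Python function evaluates h_k = ReLU(x * W_k + b_k) for each filter. It rests on several assumptions that the text does not state: a single-channel square input, stride 1, no padding, and the `*` operator implemented as the sliding-window product used by most deep-learning libraries (cross-correlation, without flipping the kernel). The names `relu` and `conv_layer` and the toy values are hypothetical.

```python
def relu(v):
    """Rectified linear unit for a scalar."""
    return v if v > 0.0 else 0.0


def conv_layer(x, W, b):
    """Return h = {h_k}, with h_k = ReLU(x * W_k + b_k) for every filter k.

    x: q-by-q input matrix (list of lists)
    W: list of n filters, each an m-by-m list of lists
    b: list of n scalar biases
    Assumes stride 1 and no padding, so each h_k is (q-m+1)-by-(q-m+1).
    """
    q = len(x)
    m = len(W[0])
    out = q - m + 1
    h = []
    for W_k, b_k in zip(W, b):
        h_k = [[relu(sum(x[i + u][j + v] * W_k[u][v]
                         for u in range(m) for v in range(m)) + b_k)
                for j in range(out)]
               for i in range(out)]
        h.append(h_k)
    return h


# Toy 4x4 input with two 3x3 filters (all ones, all minus ones).
x = [[float(4 * i + j) for j in range(4)] for i in range(4)]
W = [[[1.0] * 3 for _ in range(3)], [[-1.0] * 3 for _ in range(3)]]
b = [0.0, 0.0]
h = conv_layer(x, W, b)
# h[0] == [[45.0, 54.0], [81.0, 90.0]]; h[1] is all zeros after ReLU
```

The second filter shows the effect of the ReLU: every window sum is negative, so the whole feature map is clipped to zero.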
where $\nabla f(\theta_t)$ is the gradient or the derivative of the loss function $L$ with respect to the parameters at time step $t$ and $\epsilon$ is an error term. The learning rate is set to 0.001 with a decay of 0.9 in the CNN model.

To avoid the problem of overfitting [26], a dropout technique is used, which sets the output of each hidden neuron to zero with a fixed probability. The dropped-out neurons therefore do not participate in the optimisation. Consequently, the neural network samples a different architecture each time, and all these architectures share weights [24]. As a neuron cannot depend on the presence of specific other neurons, dropout reduces complex co-adaptations of neurons and hence forces each neuron to learn more robust features.

C. Model Evaluation

A Multilayer Perceptron (MLP) is used as a benchmark for the proposed CNN method. To evaluate the performance of the proposed models comprehensively, four performance indicators commonly used to assess the accuracy of classification problems are adopted. They are False Positive Rate (FPR), Detection Rate (DR), Precision and F-measurements, calculated from True Negative (TN), True Positive (TP), False Negative (FN) and False Positive (FP) counts, as shown in TABLE I.

IV. RESULTS AND ANALYSIS

A. Data Description

The proposed detection methods are tested using traffic flow and traffic incident data from central London. The case study area consists of 158 links, of which 39 were affected by 139 unplanned incident events during the study period from 1st Jan 2015 to 24th Mar 2015. The location of the study area with incident heatmaps is shown in Fig. 2.

The traffic data are recorded by Inductive Loop Detectors (ILDs). ILDs are widely used for providing inputs to the SCOOT traffic control system [27]. They report vehicle presence or absence (0/1 values) sampled at 4 Hz at a fixed location. Traffic variables such as flow and occupancy can be calculated from the reported data; in this paper, traffic flow data are aggregated into 5-min intervals between 00:00 and 23:55 every day for analysis.

Sample flow-occupancy plots and incidents are shown in Fig. 3. Based on the pre-analysis of incident data, most incidents occurred during the weekends with a moderate level of severity.

B. Tensor Extraction

The image was generated using a connectivity matrix in which the x-axis and y-axis represent the ILDs, and the cells represent the traffic flow values of the detectors during time step t. The generated matrix has a dimension of 237×237. The output of the models is a binary value: 1 for the presence of incidents and 0 for the absence of incidents. The dataset was divided into two parts: 80% for training and 20% for testing.

C. Model Configuration

Generally, every layer in a CNN model serves as a detection filter for features present in the input data. The first layer recognises the relatively obvious features, the later layers gradually detect more abstract features, and the last layer of a CNN makes a specific classification based on all features detected by the previous layers.
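Figure 2. Study area and incident heatmaps: (a) study area with incident heatmap; and (b) London incident heatmap (Source: OpenStreetMap)

TABLE I

| Index | Definition |
|---|---|
| Precision | $\mathrm{Pre} = \dfrac{TP}{FP + TP}$ |
| F-measures | $F = \dfrac{2}{\dfrac{1}{\mathrm{precision}} + \dfrac{1}{DR}}$ |
| FPR | $\mathrm{FPR} = \dfrac{FP}{TN + FP}$ |
| DR | $\mathrm{DR} = \dfrac{TP}{FN + TP}$ |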
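The four indices defined in TABLE I follow directly from the confusion-matrix counts. The sketch below is illustrative only: the function name `detection_metrics` and the counts in the usage example are hypothetical and are not taken from the paper's data.

```python
def detection_metrics(tp, fp, tn, fn):
    """Compute DR, FPR, Precision and F-measure from confusion-matrix counts."""
    precision = tp / (fp + tp)   # share of raised alarms that are real incidents
    dr = tp / (fn + tp)          # detection rate (recall)
    fpr = fp / (tn + fp)         # share of incident-free samples flagged
    # Harmonic mean of precision and DR, as in the F definition.
    f_measure = 2.0 / (1.0 / precision + 1.0 / dr)
    return {"DR": dr, "FPR": fpr, "Precision": precision, "F-measure": f_measure}


# Hypothetical counts, for illustration only.
m = detection_metrics(tp=80, fp=10, tn=190, fn=20)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

Because the F-measure is a harmonic mean, it is pulled towards the lower of precision and DR, so a model cannot score well on it by maximising only one of the two.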
Figure 3. Plots of ILD and incident data: (a) relationship between flow and occupancy; and (b) incident occurrences in terms of day of week

There are two types of parameters considered when designing the structure of a CNN: the hyperparameters connected with each layer and the depth of the CNN [19]. The selection of hyperparameters, such as max pooling kernel size and convolutional filter size, relies heavily on expert judgement. As no general rules can be used to determine the optimal hyperparameters, parameters of well-known CNN architectures, such as AlexNet [24] and LeNet [26], can be used as reasonable values. The max pooling size of 2×2, which has been adopted widely in both AlexNet and LeNet-5, is selected in this paper because it is a typical pooling kernel shape that reduces the size of feature maps effectively. Kernel sizes of 3×3 and 5×5 are used for the proposed convolutional layers, as these sizes can generally decrease the number of parameters while allowing multiple applications of ReLU layers [28].

On the other hand, the CNN should be neither too deep nor too shallow, so that it can efficiently learn the complex spatial structure while ensuring convergence in the end [19]. However, as the size is reduced substantially during convolution and pooling, the depth of the CNN in this study is constrained by the input size (237×237) of the image as well as the kernel size (2 × 2) of the pooling layers, which reduce the image by half at each step. The final network therefore contains 7 layers, including 5 convolutional (Conv) layers and 2 Fully-Connected (FC) layers. The detailed CNN architecture with hyperparameters used in this paper is shown in Table II. The Conv kernels are connected to all kernels in the previous layer, and the ReLU activation function is used for the output of every Conv layer and the first FC layer. The final FC layer is fed to a softmax activation function to output the predicted probability of the binary state.

TABLE II. MODEL CONFIGURATION

| No. | Layer | Filter Size, Stride | Output |
|---|---|---|---|
| | Input | - | 237 × 237 × 1* |
| 1 | Conv | 3 × 3, 1 | 236 × 236 × 32 |
| 1 | ReLU | - | 236 × 236 × 32 |
| 1 | Max-Pool | 2 × 2, 1 | 118 × 118 × 32 |
| 2 | Conv | 3 × 3, 1 | 116 × 116 × 64 |
| 2 | ReLU | - | 116 × 116 × 64 |
| 2 | Max-Pool | 2 × 2, 1 | 58 × 58 × 64 |
| 3 | Conv | 3 × 3, 1 | 56 × 56 × 64 |
| 3 | ReLU | - | 56 × 56 × 64 |
| 3 | Max-Pool | 2 × 2, 1 | 28 × 28 × 64 |
| 4 | Conv | 5 × 5, 1 | 24 × 24 × 64 |
| 4 | ReLU | - | 24 × 24 × 64 |
| 4 | Max-Pool | 2 × 2, 1 | 12 × 12 × 64 |
| 5 | Conv | 3 × 3, 1 | 10 × 10 × 64* |
| 5 | ReLU | - | 10 × 10 × 64 |
| 5 | Max-Pool | 2 × 2, 1 | 5 × 5 × 64 |
| 6 | FC | - | 64 |
| 6 | ReLU+Drop | - | 64 |
| 7 | FC | - | 1 |
| 7 | Softmax | - | 1 |

* Zero padding is used to achieve the stated spatial dimension of the output.

The MLP is set up with a hidden layer size of (20, 3), using stochastic gradient methods to minimise the loss function and optimise the weights and biases step by step during the training process.

D. Results and Analysis

This experiment examines the accuracy of the proposed CNN model for automatic incident detection. The results are shown in Table III. Although both the proposed CNN model and the MLP alternative perform well in the classification of traffic incidents, the CNN performs statistically significantly better, with a 2.3 percentage point higher DR, 0.77 point lower FPR, 2.9 point higher precision and 2.6 point higher F-measurements, at p=0.05 for the test statistic

$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}\,(1 - \hat{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}, \qquad \hat{p} = \frac{n_1 \hat{p}_1 + n_2 \hat{p}_2}{n_1 + n_2},$$

where $\hat{p}_i$ is the value of the performance index and $n_i$ is the sample size. These results mean that the CNN has the potential to detect traffic outliers reliably with fewer mistaken flags (i.e., false positives and negatives). Although a large number of parameters must be configured for the CNN, it exhibits effective detection compared to the benchmark. This superior detection suggests that spatial correlations may exist among neighbouring corridors and that the CNN can learn these complex correlations correctly.

TABLE III. RESULTS OF INCIDENT DETECTION MODELS

| Indexes | CNN | MLP | CNN2 |
|---|---|---|---|
| Detection Rate (DR) | 86.6% | 84.3% | 85.2% |
| False Positive Rate (FPR) | 5.12% | 5.89% | 5.63% |
| Precision | 90.4% | 87.5% | 88.1% |
| F-Measurements | 88.5% | 85.9% | 86.6% |
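The significance test used in the results analysis is a pooled two-proportion z-test. The sketch below shows how such a statistic and its two-sided p-value could be computed with the Python standard library. The function names and the example proportions and sample sizes are hypothetical; the paper does not report the sample sizes behind Table III, so no attempt is made to reproduce its test.

```python
import math


def two_proportion_z(p1, n1, p2, n2):
    """Pooled two-proportion z statistic for index values p1, p2 with sample sizes n1, n2."""
    p_pooled = (n1 * p1 + n2 * p2) / (n1 + n2)
    std_err = math.sqrt(p_pooled * (1.0 - p_pooled) * (1.0 / n1 + 1.0 / n2))
    return (p1 - p2) / std_err


def two_sided_p_value(z):
    """Two-sided p-value under the standard normal: 2 * (1 - Phi(|z|))."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical values, for illustration only.
z = two_proportion_z(p1=0.60, n1=1000, p2=0.50, n2=1000)
p_value = two_sided_p_value(z)
print(f"z = {z:.3f}, significant at 0.05: {p_value < 0.05}")
```

With equal proportions the statistic is zero and the p-value is one, which gives a quick sanity check on any implementation.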
TABLE III also shows the results of another CNN model (CNN2) that uses larger Conv kernel sizes (7 × 7 and 11 × 11) with the same CNN configuration, in order to study the capability of larger kernels to extract spatial features. The result shows that the performance decreases in a statistically significant manner (at p=0.05) as the size of the Conv kernel grows, which suggests that the traffic incidents are moderate with small queue sizes and that a kernel size of 3 × 3 or 5 × 5 has enough spatial coverage to capture capacity drops during incidents.

V. CONCLUSION

In this paper, we have presented a network-level incident detection model based on the CNN architecture. The proposed model can be used for incident detection in large urban networks where the spatial correlation is a significant factor. Time-series traffic data are extracted into tensors using the connectivity matrix to retain the spatial correlations in the network before they are input into the proposed model. To evaluate the performance of the new approach, we tested it on traffic and incident data collected from Central London and compared it with an established alternative method, i.e., MLP, using DR, FPR, precision and F-measurements. The results indicate that the proposed deep-learning based detection model outperforms the benchmark, with higher DR, Precision and F-measurement scores and fewer false alarms. This comparison of the CNN method with an established method from the literature indicates that deep neural network methods, such as the CNN algorithm, are capable of achieving accurate network-level traffic incident detection for large-scale networks.

Future studies will focus on precisely locating traffic incidents on the network and classifying traffic incidents based on the level of severity and incident types. Future research will also look at traffic state classification and at distinguishing traffic incidents from other types of traffic anomalies, such as sensor faults and recurrent traffic congestion.

ACKNOWLEDGEMENT

The authors would like to thank Transport for London (TfL), particularly Andy Emmonds and Ashley Turner, for the provision of traffic flow and incident data used in this paper. The authors would also like to thank Dr Aruna Sivakumar from the Centre for Transport Studies of Imperial College London for her valuable comments and suggestions to improve the quality of this paper.

REFERENCES

[1] Y. Gu, Z. Qian, and F. Chen, “From Twitter to detector: Real-time traffic incident detection using social media data,” Transp. Res. Part C Emerg. Technol., vol. 67, pp. 321–342, 2016.
[2] O. Deniz and H. B. Celikoglu, “Overview to some existing incident detection algorithms: a comparative evaluation,” Procedia Soc. Behav. Sci., pp. 1–13, 2011.
[3] F. Ahmed and Y. E. Hawas, “A threshold-based real-time incident detection system for urban traffic networks,” Procedia-Social Behav. Sci., vol. 48, pp. 1713–1722, 2012.
[4] J. A. Barria and S. Thajchayapong, “Detection and classification of traffic anomalies using microscopic traffic variables,” IEEE Trans. Intell. Transp. Syst., vol. 12, no. 3, pp. 695–704, 2011.
[5] R. Rossi, M. Gastaldi, G. Gecchele, and V. Barbaro, “Fuzzy logic-based incident detection system using loop detectors data,” Transp. Res. Procedia, vol. 10, pp. 266–275, 2015.
[6] Y.-S. Jeong, M. Castro-Neto, M. K. Jeong, and L. D. Han, “A wavelet-based freeway incident detection algorithm with adapting threshold parameters,” Transp. Res. Part C Emerg. Technol., vol. 19, no. 1, pp. 1–19, 2011.
[7] R. N. Mussa and J. E. Upchurch, “Simulation assessment of incident detection by cellular phone call-in programs,” Transportation (Amst)., vol. 26, no. 4, pp. 399–416, 1999.
[8] E. Parkany and C. Xie, “A complete review of incident detection algorithms & their deployment: what works and what doesn’t,” 2005.
[9] V. Chandola, A. Banerjee, and V. Kumar, “Anomaly detection: A survey,” ACM Comput. Surv., vol. 41, no. 3, p. 15, 2009.
[10] J. Lu, S. Chen, W. Wang, and H. Van Zuylen, “A hybrid model of partial least squares and neural network for traffic incident detection,” Expert Syst. Appl., vol. 39, no. 5, pp. 4775–4784, 2012.
[11] F. L. Hall, Y. Shi, and G. Atala, “On-line testing of the McMaster incident detection algorithm under recurrent congestion,” Transp. Res. Rec., no. 1394, pp. 1–7, 1993.
[12] S. Yang, K. Kalpakis, and A. Biem, “Detecting road traffic events by coupling multiple timeseries with a nonparametric bayesian method,” IEEE Trans. Intell. Transp. Syst., vol. 15, no. 5, pp. 1936–1946, 2014.
[13] Y. Cheng, M. Zhang, and D. Yang, “Automatic incident detection for urban expressways based on segment traffic flow density,” J. Intell. Transp. Syst., vol. 19, no. 2, pp. 205–213, 2015.
[14] Q. Zeng and H. Huang, “Bayesian spatial joint modeling of traffic crashes on an urban road network,” Accid. Anal. Prev., vol. 67, pp. 105–112, 2014.
[15] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no. 7553, pp. 436–444, 2015.
[16] W. Huang, G. Song, H. Hong, and K. Xie, “Deep architecture for traffic flow prediction: Deep belief networks with multitask learning,” IEEE Trans. Intell. Transp. Syst., vol. 15, no. 5, pp. 2191–2201, 2014.
[17] Y. Duan, Y. Lv, W. Kang, and Y. Zhao, “A deep learning based approach for traffic data imputation,” 17th Int. IEEE Conf. Intell. Transp. Syst., pp. 912–917, 2014.
[18] J. Wang, Q. Gu, J. Wu, G. Liu, and Z. Xiong, “Traffic speed prediction and congestion source exploration: A deep learning method,” Proc. - IEEE Int. Conf. Data Mining, ICDM, pp. 499–508, 2017.
[19] X. Ma, Z. Dai, Z. He, J. Ma, Y. Wang, and Y. Wang, “Learning traffic as images: A deep convolutional neural network for large-scale transportation network speed prediction,” Sensors, vol. 17, no. 4, p. 818, 2017.
[20] K. Fukushima, “Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position,” Biol. Cybern., vol. 36, no. 4, pp. 193–202, 1980.
[21] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet Classification with Deep Convolutional Neural Networks,” Adv. Neural Inf. Process. Syst. 25, pp. 1097–1105, 2012.
[22] A. Karpathy, G. Toderici, S. Shetty, T. Leung, R. Sukthankar, and L. Fei-Fei, “Large-scale video classification with convolutional neural networks,” Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., pp. 1725–1732, 2014.
[23] V. Nair and G. E. Hinton, “Rectified linear units improve restricted boltzmann machines,” Proc. 27th Int. Conf. Mach. Learn., no. 3, pp. 807–814, 2010.
[24] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” Adv. Neural Inf. Process. Syst., pp. 1097–1105, 2012.
[25] T. Tieleman and G. Hinton, “Lecture 6.5-RMSprop: Divide the gradient by a running average of its recent magnitude,” in COURSERA: Neural networks for machine learning, 2012, pp. 26–31.
[26] D. M. Hawkins, “The Problem of Overfitting,” J. Chem. Inf. Comput. Sci., vol. 44, no. 1, pp. 1–12, 2004.
[27] D. I. Robertson and R. D. Bretherton, “Optimizing Networks of Traffic Signals in Real Time-The SCOOT Method,” IEEE Trans. Veh. Technol., 1991.
[28] K. Simonyan and A. Zisserman, “Very Deep Convolutional Networks for Large-Scale Image Recognition,” ImageNet Chall., pp. 1–10, 2014.