A Deep Learning Approach For Traffic Incident Detection in Urban Networks


2018 21st International Conference on Intelligent Transportation Systems (ITSC)

Maui, Hawaii, USA, November 4-7, 2018

Lin Zhu, Fangce Guo, Rajesh Krishnan, and John W. Polak

The authors are with the Centre for Transport Studies, Imperial College London, London, SW7 2AZ, UK (e-mail: lin.zhu14@imperial.ac.uk).

978-1-7281-0323-5/18/$31.00 ©2018 IEEE

Abstract: Incident detection is vital for traffic control and management and is an important prerequisite for quick restoration of smooth traffic flow in urban networks. With accurate and reliable incident detection, a wide range of environmental and economic benefits can be realised by mitigating congestion quickly. This paper proposes a deep learning method, based on Convolutional Neural Networks (CNN), for automatic detection of traffic incidents in urban networks using traffic flow data. The method was evaluated using traffic flow and incident datasets from Central London. Performance indexes, such as False Positive Rate, Detection Rate, Precision and F-measurement, are used to comprehensively evaluate the performance of the proposed method in comparison with a conventional machine learning method, i.e., the Multi-Layer Perceptron. The results demonstrate that the proposed method may be superior to traditional neural networks, with a higher Detection Rate and a lower False Positive Rate. The results also indicate that deep learning-based incident detection may improve the accuracy of incident detection, especially in a large urban network.

I. INTRODUCTION

Urban road networks are often affected by disruptions, such as accidents, inclement weather and road works. With the restricted capacity of the roadway, these anomalies give rise to congestion and traffic delays, which can in turn create a wide range of negative impacts on the economy, environment, safety and security [1]. The causes of congestion and delay have been widely investigated in the past decades. In general, congestion can be categorised into two types, namely recurrent and nonrecurrent congestion. The former refers to the situation when traffic demand exceeds the road capacity, while the latter is caused by unpredictable changes and unexpected occurrences such as incidents, adverse weather and special events [2]. Nonrecurrent congestion varies in its spatial and temporal dimensions, resulting in more disruptive impacts on the economy and society than those of recurrent congestion. Traffic incidents have proven to be important factors that lead to a lack of accessibility and reliability [2]. These factors have resulted in copious amounts of research in the area of traffic incident analysis and management.

In the recent past, incident detection has played a key role in improving the reliability of transport networks by helping to trigger quick incident response. Quick incident response can reduce the severity of congestion and the potential for secondary incidents. Thus, efficient and accurate traffic incident detection and verification are vital in reducing traffic jams, reducing the operational costs of clearing incidents and recovering road network mobility.

While automatic incident detection algorithms have been applied to tunnels as well as motorways, relatively little research has been carried out for urban road systems, where the inherently complex traffic patterns and network structure pose a research challenge [3]. Some challenges and opportunities remain for accurately detecting traffic incidents on urban networks. With recent advances in machine learning, such as Convolutional Neural Networks (CNN), it is possible to develop methods that make use of spatially distributed information from complex urban road networks.

The aim of this paper is to develop a deep learning based method that can provide accurate incident detection on whole road networks with complex traffic patterns rather than on links or corridors.

The remainder of this paper is organised as follows. In Section II, we review relevant papers on traffic incidents, detection methods and deep learning algorithms. Section III introduces the framework of the proposed deep learning-based incident detection method. In Section IV, we present a case study applying the proposed model to a real-world dataset. We also discuss the implications and insights gathered from the experiments. Finally, Section V summarises key findings and discusses the potential application of such a data-driven approach.

II. BACKGROUND

A. Traffic Incidents and Detection

As traffic incidents are main factors causing traffic jams, it is important to detect incidents as early as possible in order to minimise the duration of disruption as well as the associated negative impacts [4].

According to the UK Urban Traffic Management and Control (UTMC) scheme, incidents can be divided into planned and unplanned ones. The former includes planned incidents and road works, while the latter consists of crash, hazard, stationary vehicle, flood, alert and congestion. Data from Inductive Loop Detectors (ILD), consisting of flow and occupancy, are typically used for incident detection. The variation in these traffic variables is indicative of deviation from normal traffic conditions [5].

Existing methods detect incidents by comparing the traffic variable with a pre-defined threshold. Because traffic conditions vary, a fixed threshold value is not suitable for all traffic conditions [6].

In urban networks, traffic incidents in one corridor can impact the neighbouring corridors and even the whole network. It is important to integrate the analysis at the spatial level to improve the robustness and reliability of incident detection. However, most research has been focused on the


causes of incidents on a restricted link rather than on reliable detection across the network. Furthermore, most efforts have been devoted to detecting traffic incidents after they have already impeded normal traffic [7]. Because of these challenges, incident detection remains a difficult problem.

B. Incident Detection Methods

Generally, incident detection is a branch of anomaly detection, whose algorithms can be classified into six groups according to their principles: nearest neighbour-based algorithms, classification-based algorithms, statistical detection algorithms, clustering-based algorithms, information theoretic anomaly detection and spectral anomaly detection algorithms [8]. Among the family of anomaly detection algorithms, classification-based algorithms are used as supervised models with labelled anomaly instances, while the other algorithms can be used for unsupervised learning without labelled instances. Classification-based algorithms and clustering-based algorithms are quite fast in the application phase because each data point is evaluated against a pre-trained model or a small set of clusters, while the rest of the methods are generally slow in both training and testing phases, especially in training the neural networks [9].

There are two types of algorithms that can use ILD data for incident detection. The first type of detection algorithm identifies an incident when the traffic pattern is similar to those observed during past incidents, using pattern recognition or learning algorithms such as neural networks [10]. This type of recognition algorithm also includes incident detection based on fundamental diagrams of traffic flow and occupancy, such as the McMaster algorithm [11]. The second type contains algorithms that seek to identify specific outliers or anomalies. These algorithms compare a measured or estimated traffic state with a reference traffic state and flag an incident if the measurements deviate significantly from the reference situation, or if these differences significantly exceed a pre-defined threshold. The estimation of traffic states and the definition of appropriate thresholds are usually based on a combination of traffic flow theory and statistical techniques [6]. For example, Yang et al. [12] used a coupled Bayesian robust principal component analysis approach to detect road traffic events based on data from loop detectors.

The most common contexts for the development of this type of incident detection algorithm are large urban corridors [13] and interurban road networks [14]. In these contexts, the link-based spatial structure is generally straightforward and accordingly the methods have generally operated at the level of individual links or corridors. However, to date, little effort has been focused on incorporating spatial correlations among corridors into incident detection so that detection algorithms can be used at the network level.

C. Deep Learning for Detection

Deep learning exploits multiple processing layers to learn the relationships within the data by using back-propagation algorithms, and performs well on video, images, audio and speech [15]. Recently, considerable efforts have been focused on the application of deep learning to traffic related prediction and estimation problems. For example, Huang et al. [16] presented a Deep Belief Network architecture for traffic flow prediction. Duan et al. [17] proposed a deep denoising stacked autoencoder to discover the correlations contained in the whole data structure and improved accuracy by fine-tuning for the traffic missing data imputation problem. Wang et al. [18] used an error-feedback recurrent CNN structure for continuous traffic speed prediction because of its ability to learn from prediction errors and so adapt to abrupt traffic anomalies. Ma et al. [19] recently applied a Convolutional Neural Network (CNN) method that learns traffic as an image, with the aim of predicting large-scale, network-wide traffic speed with relatively high accuracy while taking account of spatio-temporal traffic dynamics. Deep learning methods, especially CNN, can construct much deeper and more sophisticated architectures than a conventional method and they can therefore directly capture the spatio-temporal traffic features and correlations in the urban traffic network as a whole on a large scale, rather than on the traditional isolated links or corridors [19].

CNN is a derivative of conventional multilayer neural networks, which includes convolutional layers, sub-sampling layers and fully connected layers [20]. CNN is designed to use a form of 2D input such as images or speech signals, and this 2D input is able to incorporate the spatial information in the raw input [21]. CNN is also easier to train, as the convolution and sub-sampling layers give it fewer parameters than fully connected networks with the same number of hidden units. Thanks to the locally-connected convolutional layers, a CNN can capture complex spatial correlations, while the reduction in parameters achieved by the pooling layers makes CNN potentially applicable to large scale traffic networks [22]. CNN has been used extensively in applications involving complex spatial data, and its ability to deal with large-scale spatial structure suggests that it has the potential to be used in the urban incident detection field to fill the gaps mentioned above.

III. METHODOLOGY

The traffic information should be converted into a 2D feature space as an input for a CNN. The generated 2D feature is treated as a channel of an image (as in an RGB image), where each pixel in the image represents a value in the 2D feature. So before specifying a CNN architecture, a tensor extraction step is necessary to translate this general framework into the specific context of road traffic incident detection. The final results will be evaluated based on the confusion matrix, including Detection Rate (DR), False Positive Rate (FPR), Precision and F-measurements. Fig. 1 shows the methodology framework.

A. Tensor Extraction

In order to implement the CNN, the time-series traffic flow data from ILDs must be converted into a matrix which shares the same pattern as that used to characterise general 2D images. This takes the form of an $N_i \times R_i \times C_i$ matrix, where $N_i$ is the number of colour channels while $R_i$ and $C_i$ are the height and width of the input respectively.

The transformation between the natural formats of traffic data and image data is not straightforward. Wang et al. [18] and Ma et al. [19] converted network traffic into a time-space image where the x-axis and y-axis represent time and space of a

matrix respectively. This transformation, which directly represents traffic variables with time and space, may be straightforward and efficient for corridors with a simple structure, as it is easy to order the locations in space. However, it is not directly applicable in the case of urban traffic networks, where the spatial connections could significantly increase the complexity of organising the traffic data in the form of a matrix. An alternative way to carry out this translation for highly connected urban networks is to use the concept of a connectivity matrix.

Figure 1. Methodology framework

A connectivity matrix is a square matrix used to represent the connection between nodes. Typically, its construction consists of two steps: (1) given the original traffic network, node location and flow direction, each cell representing a direct connection gets an index of 1 while each cell that does not represent a direct connection receives an index of 0; (2) assign the value of traffic flow to each cell with an index of 1 in the connectivity matrix. Each time step corresponds to an updated connectivity matrix with traffic flow values. The sample size is the number of time steps in the whole simulation.

B. Deep Learning Model

After the feature translation using the connectivity matrix, we defined the architecture of the CNN model to extract spatial features in the connectivity matrix. There are three main types of layers used to build a CNN model: convolutional layer, pooling layer and fully-connected layer.

The convolutional layer is the most important layer in a CNN model [21]. The convolutional layer connects the input connectivity matrix, defined as $x \in \mathbb{R}^{p \times q \times q}$, with a set of filters $W \in \mathbb{R}^{n \times p \times m \times m}$, where $p$ is the sample size, $q$ is the size of the input matrix, $n$ is the number of convolutional filters and $m$ is the filter size. The single convolutional layer is formulated as follows:

$$f(x; W, b) = h = \{h_k\}_{k = 1 \ldots n}$$

$$h_k = \mathrm{ReLU}(x * W_k + b_k)$$

where $b \in \mathbb{R}^{n}$ is a bias for each filter output and $*$ is the convolution operator that applies on a single input and filter. The output $h \in \mathbb{R}^{n \times (q - m + 1) \times (q - m + 1)}$ is a set of feature maps extracted by the convolutional layer. ReLU refers to the Rectified Linear Unit, a non-saturating activation function $f(x) = \max(0, x)$ that can capture the nonlinearities of the neuron's output [23]. Deep convolutional neural networks with ReLU can be trained much faster than with other equivalent activation functions, such as the tanh function $f(x) = \tanh(x)$ and the sigmoid function $f(x) = (1 + e^{-x})^{-1}$ [24]. A parameter sharing scheme is used in convolutional layers to control the number of parameters.

Pooling layers in the CNN model summarise the outputs of neighbouring groups of neurons in the same kernel map. Max pooling, the most common pooling strategy, performs a downsampling operation along the spatial dimensions, resulting in a smaller matrix that holds the maximum of the values inside each kernel window. An important function of pooling layers is to progressively reduce the size of the representation by half and filter out the undesirable composition of traffic flow data, and hence reduce overfitting.

The fully-connected layer is employed in the last stage of hidden layers to control the dimension of the final output. The fully-connected layer has full connections to all activations in the previous layers. The activation in this layer consists of a matrix multiplication followed by a bias offset. The output is then transformed according to the specified activation function.

A binary sigmoid cross-entropy loss function is used as the objective function to be minimised during the training phase. The loss function $L$ for a binary classification can be formed as follows:

$$L = -\sum_{i} \left[ \hat{y}_i \log(y_i) + (1 - \hat{y}_i) \log(1 - y_i) \right]$$

RMSprop [25] serves as the optimiser, using the magnitude of recent gradients in order to minimise the loss function. It keeps a moving average of the squared gradient for each weight and updates the weight and bias in each iteration during the optimisation.

Given the decay rate $\gamma$ and the learning rate $\alpha$, the parameters are updated as follows:

$$r_t = \gamma f'(\theta_{t-1})^2 + (1 - \gamma) f'(\theta_t)^2$$

$$\theta_{t+1} = \theta_t - \frac{\alpha}{\sqrt{r_t} + \epsilon} f'(\theta_t)$$

where $f'(\theta_t)$ is the gradient or the derivative of the loss function $L$ with respect to the parameters $\theta$ at time step $t$ and $\epsilon$ is an error term. The learning rate is set to 0.001 with a decay of 0.9 in the CNN model.

In order to avoid the problem of overfitting [26], a dropout technique is used, which consists of setting the output of each hidden neuron to zero with a fixed probability. The dropped-out neurons, therefore, do not participate in the optimisation. Consequently, the neural network samples a different architecture each time, and all these architectures share weights [24]. As a neuron cannot depend on the presence of specific other neurons, dropout reduces complex co-adaptations of neurons and hence forces each neuron to learn more robust features.

C. Model Evaluation

A Multilayer Perceptron (MLP) is used as a benchmark for the proposed CNN method. In order to comprehensively evaluate the performance of the proposed models, four performance indicators commonly used for evaluating the accuracy of classification problems are used. They are False Positive Rate (FPR), Detection Rate (DR), Precision and F-measurements, calculated from True Negative (TN), True Positive (TP), False Negative (FN) and False Positive (FP) counts, as shown in TABLE I.

TABLE I. PERFORMANCE INDEXES

Index            Definition
Precision        $\mathrm{Pre} = \dfrac{TP}{FP + TP}$
F-measurement    $F = \dfrac{2}{1/\mathrm{Precision} + 1/\mathrm{DR}}$
FPR              $\mathrm{FPR} = \dfrac{FP}{TN + FP}$
DR               $\mathrm{DR} = \dfrac{TP}{FN + TP}$

IV. RESULTS AND ANALYSIS

A. Data Description

The proposed detection methods are tested using traffic flow and traffic incident data from central London. The case study area consists of 158 links, of which 39 links were affected by 139 unplanned incident events during the study period from 1st Jan 2015 to 24th Mar 2015. The location of the study area with incident heatmaps is shown in Fig. 2.

Figure 2. Study area and incident heatmaps: (a) study area with incident heatmap; and (b) London incident heatmap (Source: OpenStreetMap)

The traffic data are recorded by Inductive Loop Detectors (ILDs). ILDs are widely used for providing inputs to the SCOOT traffic control system [27]. They report vehicle presence or absence (0/1 values) sampled at 4 Hz at a fixed location. Traffic variables such as flow and occupancy can be calculated from the reported data. In this paper, traffic flow data are aggregated into 5-min intervals between 00:00 and 23:55 every day for analysis.

Sample flow-occupancy plots and incidents are shown in Fig. 3. Based on the pre-analysis of incident data, most incidents occurred during the weekends with a moderate level of severity.

B. Tensor Extraction

The image was generated using a connectivity matrix where the x-axis and y-axis represent the ILDs, and the cells inside represent the traffic flow values of detectors during time step t. The generated matrix has a dimension of 237×237. The output of the models is a binary value: 1 for the presence of incidents and 0 for the absence of incidents. The dataset was divided into two parts: 80% for training and 20% for testing.

C. Model Configuration

Generally, every layer in a CNN model serves as a detection filter for features present in the input data. The first layer recognises the relatively obvious features; the later layers gradually detect more abstract features, while the last layer of a CNN makes a specific classification based on all features detected by previous layers.
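The tensor extraction step of Section IV.B can be illustrated with a short NumPy sketch. It is not the authors' implementation: the function name `build_flow_tensor`, the three-detector toy network and the rule that a connected cell (i, j) carries the flow of detector i are all assumptions made for illustration, since the paper does not state which detector's flow fills each cell.

```python
import numpy as np


def build_flow_tensor(adjacency, flows):
    """Stack one flow-weighted connectivity matrix per time step.

    adjacency: (q, q) array of 0/1 values; adjacency[i, j] == 1 means that
               detector i is directly connected to detector j.
    flows:     (p, q) array; flows[t, i] is the flow at detector i in step t.
    Returns a (p, q, q) array, one q x q "image" per time step.
    """
    adjacency = np.asarray(adjacency, dtype=float)
    flows = np.asarray(flows, dtype=float)
    q = adjacency.shape[0]
    if adjacency.shape != (q, q) or flows.ndim != 2 or flows.shape[1] != q:
        raise ValueError("expected adjacency of shape (q, q) and flows of shape (p, q)")
    # ASSUMPTION: a connected cell (i, j) carries the flow of detector i.
    # Broadcasting (p, q, 1) * (1, q, q) gives the (p, q, q) tensor.
    return flows[:, :, None] * adjacency[None, :, :]


# Hypothetical chain of three detectors 0 -> 1 -> 2 over two time steps.
adjacency = [[0, 1, 0],
             [0, 0, 1],
             [0, 0, 0]]
flows = [[10, 20, 30],
         [11, 21, 31]]
tensor = build_flow_tensor(adjacency, flows)
print(tensor.shape)
print(tensor[0])
```

With the 237 detectors of the case study, the same call would yield one 237×237 matrix per 5-min time step, which could then be split 80/20 into training and test sets.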

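The four performance indexes defined in TABLE I follow directly from the confusion-matrix counts. The sketch below is a minimal illustration; the function name `performance_indexes` and the example counts are hypothetical and are not taken from the paper's experiments.

```python
def performance_indexes(tp, fp, tn, fn):
    """Compute DR, FPR, Precision and F-measurement as defined in TABLE I."""
    if tp + fn == 0 or tn + fp == 0 or tp + fp == 0:
        raise ValueError("each index needs a non-zero denominator")
    dr = tp / (fn + tp)            # Detection Rate
    fpr = fp / (tn + fp)           # False Positive Rate
    precision = tp / (fp + tp)     # Precision
    # F is the harmonic mean of Precision and DR; it is 0 when nothing is detected.
    f = 0.0 if tp == 0 else 2.0 / (1.0 / precision + 1.0 / dr)
    return {"DR": dr, "FPR": fpr, "Precision": precision, "F": f}


# Hypothetical counts, chosen only to exercise the formulas.
indexes = performance_indexes(tp=80, fp=10, tn=190, fn=20)
for name, value in indexes.items():
    print(f"{name}: {value:.3f}")
```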
Figure 3. Plots of ILD and incident data: (a) relationship between flow and occupancy; and (b) incident occurrences in terms of day of week

There are two types of parameters to consider when designing the structure of a CNN: the hyperparameters connected with each layer and the depth of the CNN [19]. The selection of hyperparameters, i.e., max pooling kernel size and convolutional filter size, relies heavily on expert judgement. As no general rules can be used to determine the optimal hyperparameters, parameters of well-known CNN architectures, such as AlexNet [24] and LeNet [26], can be used as reasonable values. The max pooling size of 2×2, which has been adopted widely in both AlexNet and LeNet-5, is selected in this paper because it is a typical pooling kernel shape that reduces the size of feature maps effectively. Kernel sizes of 3×3 and 5×5 are used for the proposed convolutional layers, as these sizes generally reduce the number of parameters and allow multiple applications of ReLU layers [28].

On the other hand, the CNN should be neither too deep nor too shallow, so that it can efficiently learn the complex spatial structure while ensuring convergence in the end [19]. However, as the size will be reduced substantially during convolution and pooling, the depth of the CNN in this study is constrained by the input size (237×237) of the image as well as the kernel size (2 × 2) of the pooling layers, which reduce the image by half at each step. The final net therefore contains 7 layers, including 5 convolutional (Conv) layers and 2 Fully-Connected (FC) layers. The detailed CNN architecture with the hyperparameters used in this paper is shown in Table II. The Conv kernels are connected to all kernels in the previous layer, and the ReLU activation function is used for the output of every Conv layer and the first FC layer. The final FC layer is fed to a softmax activation function to output the predicted probability of the binary state.

TABLE II. MODEL CONFIGURATION

No.   Layer        Filter Size, Stride   Output
      Input        -                     237 × 237 × 1*
1     Conv         3 × 3, 1              236 × 236 × 32
      ReLU         -                     236 × 236 × 32
      Max-Pool     2 × 2, 1              118 × 118 × 32
2     Conv         3 × 3, 1              116 × 116 × 64
      ReLU         -                     116 × 116 × 64
      Max-Pool     2 × 2, 1              58 × 58 × 64
3     Conv         3 × 3, 1              56 × 56 × 64
      ReLU         -                     56 × 56 × 64
      Max-Pool     2 × 2, 1              28 × 28 × 64
4     Conv         5 × 5, 1              24 × 24 × 64
      ReLU         -                     24 × 24 × 64
      Max-Pool     2 × 2, 1              12 × 12 × 64
5     Conv         3 × 3, 1              10 × 10 × 64*
      ReLU         -                     10 × 10 × 64
      Max-Pool     2 × 2, 1              5 × 5 × 64
6     FC           -                     64
      ReLU+Drop    -                     64
7     FC           -                     1
      Softmax      -                     1

*. Using zero padding for achieving spatial dimension of outputs

The MLP is set up with a hidden layer size of (20, 3), using stochastic gradient methods to minimise the loss function and optimise the weights and biases step by step during the training process.

D. Results and Analysis

This experiment examines the accuracy of the proposed CNN model used for automatic incident detection. The results are shown in Table III. Notwithstanding the generally good performance of both the proposed CNN model and the MLP alternative in the classification of traffic incidents, the CNN performed statistically significantly better, with 2.3% higher DR, 0.77% lower FPR, 2.9% higher precision and 2.6% higher F-measurements, at p = 0.05 for the test statistic

$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}, \qquad \hat{p} = \frac{n_1 \hat{p}_1 + n_2 \hat{p}_2}{n_1 + n_2}$$

where $\hat{p}_i$ is the value of the performance index and $n_i$ is the sample size. These results mean that the CNN has potential for reliably detecting traffic outliers with fewer mistaken flags (i.e., false positives and negatives). Although a large number of parameters are needed to configure the CNN, it exhibits effective detection compared to the benchmark. This superior detection suggests that spatial correlations may exist among neighbouring corridors and that the CNN can learn these complex correlations correctly.

TABLE III. RESULTS OF INCIDENT DETECTION MODELS

Indexes                      CNN      MLP      CNN2
Detection Rate (DR)          86.6%    84.3%    85.2%
False Positive Rate (FPR)    5.12%    5.89%    5.63%
Precision                    90.4%    87.5%    88.1%
F-Measurements               88.5%    85.9%    86.6%
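The significance test used in Section IV.D is a pooled two-proportion z-test, which can be sketched in a few lines of Python. The function names are hypothetical, and the sample sizes in the example are placeholders: the paper does not report the values of n1 and n2, so the printed z-value only illustrates the calculation and does not reproduce the authors' test.

```python
import math


def two_proportion_z(p1, n1, p2, n2):
    """Pooled two-proportion z statistic, as in the formula of Section IV.D."""
    pooled = (n1 * p1 + n2 * p2) / (n1 + n2)
    standard_error = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    return (p1 - p2) / standard_error


def two_sided_p_value(z):
    """Two-sided p-value under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# DR of the CNN (86.6%) against the MLP (84.3%) from TABLE III.
# ASSUMPTION: 5000 test samples per model; the true sample sizes are not reported.
z = two_proportion_z(0.866, 5000, 0.843, 5000)
print(f"z = {z:.2f}, two-sided p = {two_sided_p_value(z):.4f}")
```

At the 0.05 level, the difference counts as significant when the absolute value of z exceeds about 1.96, which depends strongly on the sample sizes.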

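The way the input size constrains the depth of the network can be checked with simple arithmetic. The sketch below tracks only the spatial size of the feature maps and rests on assumptions that the paper does not state explicitly: the 237×237 input is zero-padded to 238×238, convolutions are unpadded with stride 1, and each 2×2 pooling step halves the size, as the text describes. Under these assumptions the sequence of spatial sizes in TABLE II (236, 118, 116, 58, 56, 28, 24, 12, 10, 5) is recovered.

```python
def feature_map_sizes(padded_input=238, conv_kernels=(3, 3, 3, 5, 3), pool=2):
    """Spatial size after each Conv and each Max-Pool step.

    ASSUMPTIONS: input zero-padded from 237 to 238, unpadded stride-1
    convolutions, and pooling that divides the size by `pool`.
    """
    size = padded_input
    sizes = []
    for kernel in conv_kernels:
        size = size - kernel + 1   # unpadded convolution, stride 1
        sizes.append(size)
        size = size // pool        # 2 x 2 max pooling halves the size
        sizes.append(size)
    return sizes


sizes = feature_map_sizes()
print(sizes)
```

After five Conv and pooling stages the map is only 5×5, so a further stage of the same kind would leave almost no spatial extent, which is consistent with the statement that the depth is limited by the input size.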
TABLE III also shows the results of another CNN model using larger Conv kernel sizes ([7 × 7, 11 × 11]) with the same CNN configuration, to study their capability to extract spatial features. The result shows that the performance decreases in a statistically significant manner (at p = 0.05) as the size of the Conv kernel increases, which suggests that the traffic incidents are moderate with small queue sizes and that a kernel size of 3 × 3 or 5 × 5 has enough spatial coverage to capture capacity drops during incidents.

V. CONCLUSION

In this paper, we have presented a network-level incident detection model based on the CNN architecture. The proposed model can be used for incident detection in large urban networks where spatial correlation is a significant factor. Time-series traffic data are extracted into tensors using the connectivity matrix to retain the spatial correlations in the network before they are input into the proposed model. In order to evaluate the performance of the new approach, we tested it on detecting traffic incidents with traffic data and incident data collected from Central London, and compared it with an established alternative method, i.e., MLP, using DR, FPR, precision and F-measurements. The results indicate that the proposed deep-learning based detection model outperforms the benchmark, with higher DR, Precision and F-measurement scores and fewer false alarms. This comparison of the CNN method with an established method from the literature indicates that deep neural network methods, such as the CNN algorithm, are capable of achieving accurate network-level traffic incident detection for large-scale networks.

Future studies will focus on precisely locating traffic incidents on the network and classifying traffic incidents based on the level of severity and incident types. Future research will also look at traffic state classification and at distinguishing traffic incidents from other types of traffic anomalies, such as sensor faults and recurrent traffic congestion.

ACKNOWLEDGEMENT

The authors would like to thank Transport for London (TfL), particularly Andy Emmonds and Ashley Turner, for the provision of traffic flow and incident data used in this paper. The authors would also like to thank Dr Aruna Sivakumar from the Centre for Transport Studies of Imperial College London for her valuable comments and suggestions to improve the quality of this paper.

REFERENCES

[1] Y. Gu, Z. Qian, and F. Chen, “From Twitter to detector: Real-time traffic incident detection using social media data,” Transp. Res. Part C Emerg. Technol., vol. 67, pp. 321–342, 2016.
[2] O. Deniz and H. B. Celikoglu, “Overview to some existing incident detection algorithms: a comparative evaluation,” Procedia Soc. Behav. Sci., pp. 1–13, 2011.
[3] F. Ahmed and Y. E. Hawas, “A threshold-based real-time incident detection system for urban traffic networks,” Procedia-Social Behav. Sci., vol. 48, pp. 1713–1722, 2012.
[4] J. A. Barria and S. Thajchayapong, “Detection and classification of traffic anomalies using microscopic traffic variables,” IEEE Trans. Intell. Transp. Syst., vol. 12, no. 3, pp. 695–704, 2011.
[5] R. Rossi, M. Gastaldi, G. Gecchele, and V. Barbaro, “Fuzzy logic-based incident detection system using loop detectors data,” Transp. Res. Procedia, vol. 10, pp. 266–275, 2015.
[6] Y.-S. Jeong, M. Castro-Neto, M. K. Jeong, and L. D. Han, “A wavelet-based freeway incident detection algorithm with adapting threshold parameters,” Transp. Res. Part C Emerg. Technol., vol. 19, no. 1, pp. 1–19, 2011.
[7] R. N. Mussa and J. E. Upchurch, “Simulation assessment of incident detection by cellular phone call-in programs,” Transportation (Amst)., vol. 26, no. 4, pp. 399–416, 1999.
[8] E. Parkany and C. Xie, “A complete review of incident detection algorithms & their deployment: what works and what doesn’t,” 2005.
[9] V. Chandola, A. Banerjee, and V. Kumar, “Anomaly detection: A survey,” ACM Comput. Surv., vol. 41, no. 3, p. 15, 2009.
[10] J. Lu, S. Chen, W. Wang, and H. Van Zuylen, “A hybrid model of partial least squares and neural network for traffic incident detection,” Expert Syst. Appl., vol. 39, no. 5, pp. 4775–4784, 2012.
[11] F. L. Hall, Y. Shi, and G. Atala, “On-line testing of the McMaster incident detection algorithm under recurrent congestion,” Transp. Res. Rec., no. 1394, pp. 1–7, 1993.
[12] S. Yang, K. Kalpakis, and A. Biem, “Detecting road traffic events by coupling multiple timeseries with a nonparametric bayesian method,” IEEE Trans. Intell. Transp. Syst., vol. 15, no. 5, pp. 1936–1946, 2014.
[13] Y. Cheng, M. Zhang, and D. Yang, “Automatic incident detection for urban expressways based on segment traffic flow density,” J. Intell. Transp. Syst., vol. 19, no. 2, pp. 205–213, 2015.
[14] Q. Zeng and H. Huang, “Bayesian spatial joint modeling of traffic crashes on an urban road network,” Accid. Anal. Prev., vol. 67, pp. 105–112, 2014.
[15] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no. 7553, pp. 436–444, 2015.
[16] W. Huang, G. Song, H. Hong, and K. Xie, “Deep architecture for traffic flow prediction: Deep belief networks with multitask learning,” IEEE Trans. Intell. Transp. Syst., vol. 15, no. 5, pp. 2191–2201, 2014.
[17] Y. Duan, Y. Lv, W. Kang, and Y. Zhao, “A deep learning based approach for traffic data imputation,” 17th Int. IEEE Conf. Intell. Transp. Syst., pp. 912–917, 2014.
[18] J. Wang, Q. Gu, J. Wu, G. Liu, and Z. Xiong, “Traffic speed prediction and congestion source exploration: A deep learning method,” Proc. - IEEE Int. Conf. Data Mining, ICDM, pp. 499–508, 2017.
[19] X. Ma, Z. Dai, Z. He, J. Ma, Y. Wang, and Y. Wang, “Learning traffic as images: A deep convolutional neural network for large-scale transportation network speed prediction,” Sensors, vol. 17, no. 4, p. 818, 2017.
[20] K. Fukushima, “Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position,” Biol. Cybern., vol. 36, no. 4, pp. 193–202, 1980.
[21] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet Classification with Deep Convolutional Neural Networks,” Adv. Neural Inf. Process. Syst. 25, pp. 1097–1105, 2012.
[22] A. Karpathy, G. Toderici, S. Shetty, T. Leung, R. Sukthankar, and F. F. Li, “Large-scale video classification with convolutional neural networks,” Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., pp. 1725–1732, 2014.
[23] V. Nair and G. E. Hinton, “Rectified linear units improve restricted boltzmann machines,” Proc. 27th Int. Conf. Mach. Learn., no. 3, pp. 807–814, 2010.
[24] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” Adv. Neural Inf. Process. Syst., pp. 1097–1105, 2012.
[25] T. Tieleman and G. Hinton, “Lecture 6.5-RMSprop: Divide the gradient by a running average of its recent magnitude,” in COURSERA: Neural networks for machine learning, 2012, pp. 26–31.
[26] D. M. Hawkins, “The Problem of Overfitting,” J. Chem. Inf. Comput. Sci., vol. 44, no. 1, pp. 1–12, 2004.
[27] D. I. Robertson and R. D. Bretherton, “Optimizing Networks of Traffic Signals in Real Time - The SCOOT Method,” IEEE Trans. Veh. Technol., 1991.
[28] K. Simonyan and A. Zisserman, “Very Deep Convolutional Networks for Large-Scale Image Recognition,” ImageNet Chall., pp. 1–10, 2014.

