Deep Learning Approach For U.S. Traffic Sign Recognition
Emmanuel B. Nuakoh, Kaushik Roy, Xiaohong Yuan, Albert Esterline
2.1 Traffic Sign Recognition with Deep Learning

[10] developed a committee of a CNN and a multilayer perceptron (MLP) trained on HOG features (HOG3). They trained the best architecture, initialized with weights drawn from a uniform random distribution, with a hyperbolic tangent activation function. The classification achieved a combined classification rate (CCR) of 98.98%, outperforming humans in some instances, as reported by [11] in the GTSRB competition. In the same report, [12] placed second with 98.97% accuracy using multi-layer ConvNets with more sophisticated non-linearities. An improvement on [10], using a CNN with data augmentation and jittering, achieved an accuracy of 99.46% [13].

[14] proposed a Hinge Loss Stochastic Gradient Descent (HLSGD) method for training CNNs. Its cost function is similar to the hinge loss of Support Vector Machines (SVM), and it performed faster than the Stochastic Gradient Descent (SGD) that is usually preferred for training CNNs. They achieved an accuracy of 99.65%, beating the state-of-the-art result of 99.46% reported in [13, 15].

[4] developed OneCNN, a convolutional neural network inspired by [12]. They developed a single network that is deeper and more complex but less computationally costly, used it to classify multiple datasets, in particular GTSRB and BTSC, and introduced a new dataset called rMASTIF. They achieved an accuracy of 99.11% on GTSRB, against the state-of-the-art of 99.65% [14]; 98.17% on BTSC, against the state-of-the-art of 98.77% [15]; and 99.53% on rMASTIF. Their work, like this research, addressed only the classification aspect of the TSRS.

2.2 Traffic Sign Recognition with Traditional Machine Learning

[16] used SVMs with Gaussian kernels to recognize traffic signs from blobs that had been categorized into shape classes. To test the effect of occlusion on recognition, an occlusion mask was placed on the images. Small-, medium- and large-sized masks gave probabilities of 93.24%, 67.85% and 44.90%, respectively, of successfully recognizing the signs. One observation was that a large-sized occlusion mask placed in the middle of the pictogram's inner area gave the worst recognition performance. [17] applied an SVM to classify traffic signs in the GTSRB dataset. They combined local binary patterns (LBP), HOG and Gabor features for feature extraction. LBP alone gave a performance of 93.36%, Gabor alone 93.90% and HOG alone 94.56%. A combination of all three yielded the best performance of 97.04%. HOG and Gabor together reached 97.00%, close to the performance of all three combined. Their proposed algorithm ranked ninth overall compared to the results obtained in the 2011 GTSRB competition; however, it had the best performance for the “Other Prohibitions” and “Mandatory” categories, scoring 99.86% and 99.83%, respectively. EBLearn 2LConvNet and CNN HOG3 were the previous best performers in those categories, scoring 99.80% and 97.89%, respectively.

3. METHODOLOGY AND EXPERIMENTS

3.1 Approach

CNNs have made great strides in image recognition in recent times. They gained recognition in traffic sign recognition during “The German Traffic Sign Recognition Benchmark” (GTSRB) competition, where some of the best methods presented used CNNs for classification [18].

There is limited research on traffic sign detection and recognition using US traffic signs. [6] used an R-CNN algorithm to detect US traffic signs and showed good results on the LISA-TS Extension dataset [19]. The classification was based on the speed limit superclass alone.

Our research tries to extend the boundaries of the current research to include all the signs in the dataset by replicating the work done by [12], adapted for classifying the U.S. traffic signs dataset.

3.2 Experimental Setup

The experimental setup consists of two main parts. The first is data preparation, which spans all the tools used to extract and separate traffic signs into their respective classes. In the second, the extracted data is fed into a deep neural network for classification.

The LISA TS dataset is a publicly available traffic sign dataset for the United States [8]. It offers researchers interested in classifying U.S. traffic signs an avenue to train and test their models. The LISA TS dataset was selected for this work because limited work has been done on U.S. traffic signs.

3.2.1 Preparing the dataset.

The LISA TS dataset comes with a set of Python tools for extracting traffic sign images from annotated frames. Since this research is concerned with the classification aspect of TSRS, these tools were useful. There are 47 classes of traffic signs, named using the convention “0000”, “0001”, …, “0045” and “0046”. An 80/20 split into training and testing sets was obtained using the Python script provided, as “split1.csv” and “split2.csv”, respectively. Traffic sign images under each split set were cropped and copied to their respective class folders. These class folders were then manually copied into two new folders called “Training” and “Testing”, as the training and test sets, respectively.

3.2.2 Exploring the dataset.

The LISA TS dataset consists of 6610 frames with 7855 annotations [8]. The sizes of the traffic sign images range from as little as 6×6 to 167×168 pixels. The cameras used for image collection had resolutions ranging from 640×480 to 1024×522 pixels, with some images being greyscale and others color.

The extracted signs were split into training and test sets with an 80/20 split ratio. Fig. 1 shows the distribution of traffic signs for each class. The most populated classes are “speedLimit40”, “doNotPass”, and “rampSpeedAdvisory50”, in decreasing order. The least frequent traffic signs in the dataset are “rampSpeedAdvisory20”, “speedLimitUrdbl”, “thruMergeLeft”, and “turnLeft”.

Fig. 2 shows samples of traffic sign images in each class. The images shown are the first images in each class. It should be noted that several of these signs have numbers showing speed limits of some sort. This raises concern for the performance of the model, especially with the limited samples in each class.
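
The folder layout described in Section 3.2.1 (47 class folders named “0000” to “0046” under “Training” and “Testing”) lends itself to a quick check of the per-class distribution and of the 80/20 split. The following is a minimal sketch under stated assumptions, not the tooling shipped with the dataset: the function names are hypothetical, and the set of image file suffixes is an assumption, since the paper does not state the image format.

```python
from pathlib import Path

NUM_CLASSES = 47  # number of classes stated in Section 3.2.1
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}  # assumed image formats


def class_folder_names(num_classes=NUM_CLASSES):
    """Return the zero-padded class folder names "0000" ... "0046"."""
    return [f"{index:04d}" for index in range(num_classes)]


def count_images_per_class(split_dir):
    """Count image files in each class folder under one split folder."""
    split_dir = Path(split_dir)
    counts = {}
    for name in class_folder_names():
        folder = split_dir / name
        if folder.is_dir():
            counts[name] = sum(
                1 for path in folder.iterdir()
                if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES
            )
        else:
            counts[name] = 0  # class folder missing from this split
    return counts


def split_fractions(train_counts, test_counts):
    """Return the (training, testing) fractions of all counted images."""
    n_train = sum(train_counts.values())
    n_test = sum(test_counts.values())
    total = n_train + n_test
    if total == 0:
        return 0.0, 0.0
    return n_train / total, n_test / total
```

Calling `count_images_per_class("Training")` and `count_images_per_class("Testing")` and passing both results to `split_fractions` should give values near 0.8 and 0.2 if the folders were populated as described. The per-class counts are the kind of data a distribution plot such as Fig. 1 would be built from.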
Fig. 1. Training Data Distribution per Class for Each Traffic Sign

3.2.3 Deep Neural Network (DNN) Model Architecture.

The VGG layer has as its basic elements a batch normalization layer for faster and better training, a parametric ReLU layer to address the dead rectified linear unit issue during training, a convolution layer with parametric ReLU activation using the Xavier scheme for weight and bias initialization, a fully connected layer with fully connected dense layers also using the Xavier scheme for weight and bias initialization, and a max pooling operation.

The spatial transformer layer consists of a localization layer and an affine transformation layer. This layer consists of a 5x5 convolution filter, followed by a 3x3 convolution filter, a 1x1 convolution filter, a dense layer of size 128, a dense layer of size 64 and 6 identity transformers.

The VGG Network (Fig. 3) is made up of 2 back-to-back convolutions of 2x2 kernel size and a stride of 2. The layer has 1 pooling layer followed by a dropout layer. The model is made up of 4 VGG layers. The first layer extracts 32 feature maps from an input layer of 32x32x3 features, the second layer extracts 64 feature maps from a 16x16x32 input, the third layer extracts 128 feature maps from an 8x8x64 input, and the fourth extracts 256 feature maps from an input of 4x4x128. The model also has 3 fully connected layers, with 1024 hidden units that map to 512 hidden units that further map to the 47 classes in the dataset.

each with 4 GB of memory, installed on a single system, were used to allow multiprocessing of the computations during training.

4. RESULTS

The original research recorded an accuracy of 98.7% on the GTSRB dataset. This work recorded a training accuracy of 99.02% and a validation accuracy of 99.04% on the LISA TS dataset. The model was trained over 600 epochs. The confusion matrices at the 600th epoch are presented in Fig. 4 for normalized and non-normalized samples. 15 out of the 1567 images were wrongly classified. One image in the “keepRight” class, 1% of the class, was misclassified as “zoneAhead45”. One “pedestrianCrossing” sign, amounting to 1% of the class, was misclassified as “stop”. “rampSpeedAdvisoryUrdbl” was classified as “school” 100% of the time; only one sign is in this class. One “rightLaneMustTurn” sign, making up 6% of the class, was misclassified as “stop”. “speedLimit25” was wrongly classified 2% of the time as “speedLimitUrdbl”, which stands for unreadable speed limits. “speedLimit55” was misclassified as “truckSpeedLimit55” 100% of the time. “speedLimit65” was misclassified 6% of the time as “stop”. “speedLimitUrdbl” represents speed limit signs that were too unreadable to be correctly classified by a human. This class was classified 8% of the time as “speedLimit35” and 4% of the time as “speedLimit50”. Finally, “yield” was misclassified 8% of the time as “stop”.

Fig. 5 shows plots of accuracy (left) and loss (right) during training and validation. It was noted that the model could have performed better if it had been trained a little longer. The model does not overfit since the validation accuracy is less than the training accuracy throughout the whole process. The model shows a loss of 7.03% during training and 8.8% during validation. The original VGGNet showed a loss of 7.5% during validation; no value was recorded for training.

5. CONCLUSION AND FUTURE WORK

ADAS helps drivers with perception in a complex environment with a lot of background noise. A computer is trained to pick up things that may escape the human eye. This research investigates using the VGGNet to recognize U.S. traffic signs. Previous work done with the same dataset, but with the speed limit signs only, scored an accuracy of 95.7%, a little under the 97% reported by [6]. This research shows an overall accuracy of 99.04% during validation.
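
The 99.04% validation accuracy can be cross-checked against the misclassification count reported in Section 4 (15 wrong out of 1567 images). The sketch below does that arithmetic and also shows one common way to row-normalize a confusion matrix, which is presumably how the per-class percentages in the normalized matrix of Fig. 4 arise. The function names and the 3-class toy matrix are hypothetical and are not taken from the paper.

```python
import numpy as np


def accuracy_from_errors(total, wrong):
    """Overall accuracy given the number of samples and of errors."""
    return (total - wrong) / total


def row_normalize(confusion):
    """Divide each row by its sum so entries are per-class fractions.

    Rows with no samples are left as zeros instead of dividing by zero.
    """
    confusion = np.asarray(confusion, dtype=float)
    row_sums = confusion.sum(axis=1, keepdims=True)
    normalized = np.zeros_like(confusion)
    np.divide(confusion, row_sums, out=normalized, where=row_sums > 0)
    return normalized


# Counts reported in Section 4: 15 of 1567 images misclassified.
validation_accuracy = accuracy_from_errors(1567, 15)
print(f"{validation_accuracy:.2%}")  # prints 99.04%

# Hypothetical 3-class matrix (rows: true class, columns: predicted).
toy_confusion = [[12, 0, 0],
                 [1, 15, 0],
                 [0, 0, 0]]
toy_normalized = row_normalize(toy_confusion)
```

In the toy matrix, one of the 16 samples of the second class is misclassified, so the normalized row reads 0.0625, 0.9375, 0. With a class that contains a single sample, as the text notes for “rampSpeedAdvisoryUrdbl”, one error already shows up as 100% in the normalized matrix.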
but no known work has been done using TCN for traffic sign recognition.

The dataset will also be used in experiments with modified LeNet, AlexNet, and CUDA ConvNet. The GTSRB and BTSC datasets will be used to test the model performance and to compare the algorithm with other standardized algorithms.

Figure 4. Confusion Matrix of Model after 600 Epochs with and without Normalization

Figure 5. Plot of Accuracy and Loss during Training and Validation

6. REFERENCES

[1] Escalera, A. D. L., Moreno, L., Salichs, M. A., and Armingol, J. M. 1997. Road traffic sign detection and classification.
[2] Timofte, R., Zimmermann, K., and Van Gool, L. 2014. “Multi-View Traffic Sign Detection, Recognition, and 3D Localization”. Machine Vision and Applications, 25(3), 633-647.
[3] Stallkamp, J., Schlipsing, M., Salmen, J., and Igel, C. 2011. “The German traffic sign recognition benchmark: a multi-class classification competition”. In Neural Networks (IJCNN), The 2011 International Joint Conference on (pp. 1453-1460). IEEE.
[4] Jurišić, F., Filković, I., and Kalafatić, Z. 2015. “Multiple-Dataset Traffic Sign Classification with OneCNN”. In Pattern Recognition (ACPR), 2015 3rd IAPR Asian Conference on (pp. 614-618). IEEE.
[5] Zhu, Z., Liang, D., Zhang, S., Huang, X., Li, B., and Hu, S. 2016. “Traffic-Sign Detection and Classification in The Wild”. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 2110-2118).
[6] Li, Y., Møgelmose, A., and Trivedi, M. M. 2016. “Pushing the “Speed Limit”: High-Accuracy US Traffic Sign Recognition with Convolutional Neural Networks”. IEEE Transactions on Intelligent Vehicles, 1(2), 167-176.
[7] Simonyan, K., and Zisserman, A. 2015. “Very Deep Convolutional Networks for Large-Scale Image Recognition”. In International Conference on Learning Representations (ICLR).
[8] Møgelmose, A., Trivedi, M. M., and Moeslund, T. B. 2012. “Vision based Traffic Sign Detection and Analysis for Intelligent Driver Assistance Systems: Perspectives and Survey”. IEEE Transactions on Intelligent Transportation Systems.
[9] Mathias, M., Timofte, R., Benenson, R., and Van Gool, L. 2013. “Traffic Sign Recognition—How Far Are We from The Solution?”. In Neural Networks (IJCNN), The 2013 International Joint Conference on (pp. 1-8). IEEE.
[10] Cireşan, D., Meier, U., Masci, J., and Schmidhuber, J. 2011. “A committee of neural networks for traffic sign classification”. In The 2011 International Joint Conference on Neural Networks, San Jose, CA (pp. 1918-1921).
[11] Stallkamp, J., Schlipsing, M., Salmen, J., and Igel, C. 2011. “The German Traffic Sign Recognition Benchmark: A multi-class classification competition”. In IJCNN 6, 7.
[12] Sermanet, P., and LeCun, Y. 2011. “Traffic Sign Recognition with Multi-Scale Convolutional Networks”. In Neural Networks (IJCNN), The 2011 International Joint Conference on (pp. 2809-2813). IEEE.
[13] Cireşan, D., Meier, U., Masci, J., and Schmidhuber, J. 2012. “Multi-column deep neural network for traffic sign classification”. Neural Networks, 32, 333-338.
[14] Jin, J., Fu, K., and Zhang, C. 2014. “Traffic Sign Recognition with Hinge Loss Trained Convolutional Neural Networks”. IEEE Transactions on Intelligent Transportation Systems, 15(5), 1991-2000.
[15] Zhu, Y., Wang, X., Yao, C., and Bai, X. 2013. “Traffic sign classification using two-layer image representation”. In IEEE International Conference on Image Processing (pp. 3755-3759). IEEE.
[16] Maldonado-Bascón, S., Lafuente-Arroyo, S., Gil-Jimenez, P., Gómez-Moreno, H., and López-Ferreras, F. 2007. “Road-sign Detection and Recognition Based on Support Vector Machines”. IEEE Transactions on Intelligent Transportation Systems, 8(2), 264-278.
[17] Berkaya, S. K., Gunduz, H., Ozsen, O., Akinlar, C., and Gunal, S. 2016. “On circular traffic sign detection and recognition”. Expert Systems with Applications, 48, 67-75.
[18] Stallkamp, J., Schlipsing, M., Salmen, J., and Igel, C. 2012. “Man vs. computer: Benchmarking Machine Learning Algorithms for Traffic Sign Recognition”. Neural Networks, 32, 323-332.
[19] Møgelmose, A., Liu, D., and Trivedi, M. M. 2014. “Traffic Sign Detection for U.S. Roads: Remaining Challenges and a Case for Tracking”. In IEEE Intelligent Transportation Systems Conference (ITSC 2014).
[20] Lea, C., Flynn, M. D., Vidal, R., Reiter, A., and Hager, G. D. 2017. “Temporal Convolutional Networks for Action Segmentation and Detection”. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI (pp. 1003-1012).