2023 International Conference on Distributed Computing and Electrical Circuits and Electronics (ICDCECE)

Bangladeshi Traffic Sign Recognition and Classification using CNN with Different Kinds of Transfer Learning through a new (BTSRB) Dataset

979-8-3503-4745-6/23/$31.00 ©2023 IEEE | DOI: 10.1109/ICDCECE57866.2023.10151254


1st Md. Abu Sayeed
Electronics and Telecommunication Engineering
Chittagong University of Engineering & Technology
Chattogram, Bangladesh
sayeedrikhal357@gmail.com

2nd Md. Saiful Islam
Electronics and Telecommunication Engineering
Chittagong University of Engineering & Technology
Chattogram, Bangladesh
saiful05eee@cuet.ac.bd

3rd Md Babul Islam
School of ICT, DIMES Lab
University of Calabria
Rende, Italy
mdbabul.islam@dimes.unical.it

4th Piyush Kumar Pareek
Artificial Intelligence and Machine Learning
Nitte Meenakshi Institute of Technology
Bengaluru, India
piyush.kumar@nmit.ac.in

5th Tanbin Islam Rohan
ML Researcher, Bahari Data Research House
Dhaka, Bangladesh
anbinislam009@gmail.com

Abstract—An accident is the cry of a lifetime. Traffic signs play a vital role in preventing or mitigating accidents. While driving, traffic signs are often not visible to the driver; if a traffic sign detection and recognition system can alert the driver to important sign guidance ahead, it will help mitigate accidents. Traffic sign recognition plays a major role in expert systems such as driver assistance systems and automatic driving systems. The prime purpose of this paper is to design and develop a computer-based system that can automatically detect the direction indicated by a road sign. For this research work, we created our own dataset, called the Bangladeshi Traffic Sign Recognition Benchmark (BTSRB) dataset. BTSRB was created by capturing images from different angles and under different parameters and conditions. A total of 7320 images were collected to build this comprehensive database, all of them in Bangladesh. In this paper, we used five models: a custom CNN and four models pre-trained on the ImageNet dataset (InceptionV3, MobileNetV2, ResNet50, and VGG16). We then fine-tuned the pre-trained models through transfer learning. The main challenge of this research is collecting data in a country like Bangladesh, where no recognized dataset is available. Compared with the other models, the custom CNN achieves the highest accuracy, above 91%. This paper emphasizes the significance of traffic sign recognition in expert systems and the necessity of a well-established dataset in nations where such resources are not readily available.

Keywords—CNN, TSD, BTSRB, MobileNetV2, VGG16

I. INTRODUCTION

Because of the numerous advantages such a system can provide, traffic sign identification and recognition have grown in importance with advances in image processing. Interest in this area has also increased as a result of recent advancements and the growing acceptance of self-driving automobiles. A technology for automatically detecting and recognizing traffic signs will lead to technologically advanced vehicles and prudent driving [1]. Even with a driver at the wheel, the system can give the driver intelligent information, decreasing the human errors that result in accidents. Vehicle accidents are expected to diminish significantly as this technology is adopted in transport systems. The integration of technology and automotive innovations holds the potential to greatly enhance safety and reduce the costs and loss of life associated with traffic accidents [2]. Automated systems will be capable of managing traffic at intersections as well as on open roadways. The benefits of such a system in saving lives and eliminating costs make clear why such a system is being developed [3].

Therefore, the goal of this work is to create an automatic, deep-learning-based system for identifying and recognizing the traffic signs of the Bangladeshi Traffic Sign Recognition Benchmark (BTSRB). The proposed method uses a deep CNN to identify warning signs in images captured by a camera. Most car accidents are the result of human error, such as drivers failing to notice a sign or driving contrary to what a sign indicates [4] (for example, a sign indicating a speed limit of 100 km/h and a driver traveling faster). The primary goal of this paper is to build a traffic sign detection system for BTSRB, to improve its effectiveness and reliability, and to address related concerns. To prevent road accidents caused by misidentified traffic signs, real-time traffic sign recognition systems should classify traffic signs into multiple groups, such as Speed100Km, Zebra crossing, and Zigzag. Machine learning paradigms such as supervised, unsupervised, semi-supervised, and reinforcement learning offer different approaches to this goal, and with them accurate and efficient traffic sign classification can be achieved in real-time applications. Although there are only a few basic traffic signs, this research deliberately employs deep learning with an unsupervised learning strategy, because the dataset also includes road signs, street name signs, and similar signage. The ultimate goal is to equip automobiles with a technology that can detect and recognize every traffic sign in order to help the driver or aid in the self-driving process. With deep learning algorithms, the system can take unlabeled data and automatically extract features without the need for human input [5].


Authorized licensed use limited to: Sungkyunkwan University. Downloaded on August 28,2023 at 02:56:41 UTC from IEEE Xplore. Restrictions apply.
Today, traffic sign classification and identification are crucial capabilities, particularly for unmanned autonomous driving. Numerous studies have been conducted to identify and categorize traffic and road signs. A Convolutional Neural Network and Support Vector Machine (CNN-SVM) technique for traffic sign analysis and recognition was proposed in [1-6]. This technique converts images to the YCbCr color space, which is fed into a CNN to separate the color channels and extract distinctive features; classification is then performed with an SVM. The technique achieved an accuracy of 98.6% in classifying and recognizing traffic signs. The authors of [1-7] suggest a color-based segmentation system with feature extraction using histograms of oriented gradients (HOG) and classification using support vector machines (SVM). The model uses the sham97 algorithm to extract color appearance information from a segment. The authors of [1-8] presented a scale-aware CNN-based system for classifying and recognizing traffic signs. Their approach comprises two CNNs, one for the classification of each region and the other for region-specific traffic sign proposals. Additionally, scale-invariant detection is accomplished using a fully convolutional network (FCN). The method achieved a reported precision of 99.88%. For feature extraction, the authors of [1-9] suggested using HOG and local binary patterns (LBP), which are then fed into an extreme learning machine network for recognition and classification. The authors of [10] propose an extreme learning machine (ELM)-based traffic sign identification system. Their approach consists of a single classifier trained by ELM on features extracted with a histogram of oriented gradients variant (HOGv).

Many recent researchers are interested in traffic sign detection and recognition. Shahed et al. [11] proposed a technique in which maximally stable extremal regions (MSERs) are first identified as candidate regions in chromatic images. Once the images had been labeled, the detection procedure located the regions of interest (ROIs), from which HOG (Histogram of Oriented Gradients) features were generated and supplied to an SVM (Support Vector Machine) classifier for the identification phase. Although the system performs well, the method has trouble detecting signs that are both very old and obscured. Because of its speed and accuracy, the Single Shot MultiBox Detector (SSD) [12] is currently regarded as a good object detector, but in the research of [13] the dataset is very small and has only 16 classes, compared with the 62 classes of this proposed research. An approach to identifying and detecting traffic signs in video sequences was presented in [14]. The authors began by considering the hue of each pixel and applying segmentation based on it. They then used linear SVMs to detect traffic signs through shape classification. Finally, they used SVMs with a Gaussian kernel to recognize the inner region. Their technique, however, cannot pick up a sign when the background and the traffic sign are somewhat similar. Additionally, their method disregards conservation state, rotational angle, and deformation [16-21].

II. METHODOLOGY

A. System Overview

First, the collected pictures are used as input to the dataset, and the pictures are preprocessed. Feature extraction is then performed, followed by feature addition or, if necessary, feature reduction to prepare each picture for testing. After that, the entire process is completed by the CNN model, which checks whether the predictions for the 62 classes match the training data. The complete system architecture of the proposed model is shown in Fig. 1.

Fig. 1. Proposed CNN architecture model

B. Dataset Preparation

A well-organized dataset is a crucial part of conducting good research. Since our research is focused on Bangladeshi traffic signs and no benchmark dataset was available for them, we created our own dataset. We collected various classes of traffic signs from major cities, namely Dhaka, Chattogram, Cumilla, and Chandpur, 62 different classes in total, and tried to balance the data so that every class has 140 images. The pictures used in this work were gathered from several Bangladeshi districts. In all, the dataset images cover 759 daylight hours and include 32 different types of road signs, such as cross the street, railway crossing ahead, keep left, narrow bridge, no parking, overtaking not allowed, pedestrian crossing, railroad, road hump, school ahead, sharp left or right bend, side road left or right, sharp bend to the right, and U-turn not allowed. A smartphone with a 13 MP camera was used to take all of the pictures. The photos were then downsized to 1280 x 720, and data augmentation was used to increase the number of samples. All 62 classes are listed in Table I.

TABLE I. DESCRIPTION OF 62 CLASSES FOR PROPOSED DATASET

Class Number | Class Name | Class Number | Class Name
1 | Advanced Direction Sign | 32 | Road Work
2 | Bridge Ahead | 33 | National Speed Limit
3 | Bus Stop | 34 | School Ahead
4 | Check Point | 35 | Sharp Bend to the Left
5 | Cycle Round Ahead | 36 | Sharp Bend to the Right
6 | Cross Road Ahead | 37 | Sharp Change Direction Left
7 | Double Bend First Left | 38 | Sharp Change of Direction to the Right
8 | Double Side Dangerous Obstacle | 39 | Side Road Left
9 | Drive Only Right | 40 | Small Round About
10 | Ferry Ahead | 41 | Speed Breaker
11 | Fire Station Ahead | 42 | Speed Limit 10
12 | Hospital Ahead | 43 | Speed Limit 15
13 | Keep Left | 44 | Speed Limit 20
14 | Median Gap | 45 | Speed Limit 25
15 | Mosque Ahead | 46 | Speed Limit 30
16 | Narrow Road Left | 47 | Speed Limit 35
17 | No Heavy Vehicle | 48 | Speed Limit 40
18 | No Horn | 49 | Speed Limit 50
19 | No Motor Vehicles | 50 | Speed Limit 60
20 | No Overtaking | 51 | Steep Hill Downwards
21 | No Parking | 52 | Steep Hill Upwards
22 | No Turn Left | 53 | Stop
23 | No U Turn | 54 | Stopping Prohibited
24 | No Cycle | 55 | T-Junction
25 | Other Danger | 56 | Traffic Merges from Left
26 | No Turn Right | 57 | Traffic Signal
27 | Pedestrian Crossing | 58 | Two Way
28 | Police Ahead | 59 | U-Turn
29 | Railway Crossing Ahead | 60 | U-Turn and Median Gap
30 | Road Closed | 61 | Zebra Crossing
31 | Road Narrow Both Side | 62 | Zigzag Road

Fig. 2. Data management for Training and Testing

Fig. 3. Convolution Neural Network Model
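Section B says the data were balanced to about 140 images for each of the 62 classes. Fig. 2 refers to dividing the data into training and testing sets, but the split ratio is not reported. The following minimal Python sketch shows a per-class (stratified) split, which keeps the class balance identical in both sets. The 80/20 ratio, the seed, and the `(image_path, class_id)` input format are hypothetical choices for illustration, not the authors' procedure.

```python
import random
from collections import defaultdict


def stratified_split(samples, test_fraction=0.2, seed=42):
    """Split (path, label) pairs so every class keeps the same train/test ratio.

    `samples` is a hypothetical list of (image_path, class_id) tuples; the 0.2
    test fraction is an assumption, not a value reported in the paper.
    """
    by_class = defaultdict(list)
    for path, label in samples:
        by_class[label].append(path)

    rng = random.Random(seed)  # fixed seed gives a reproducible split
    train, test = [], []
    for label in sorted(by_class):
        paths = by_class[label][:]
        rng.shuffle(paths)
        n_test = round(len(paths) * test_fraction)
        test.extend((p, label) for p in paths[:n_test])
        train.extend((p, label) for p in paths[n_test:])
    return train, test


if __name__ == "__main__":
    # Hypothetical balanced dataset: 62 classes x 140 images, as described in Section B.
    data = [(f"class_{c:02d}/img_{i:03d}.jpg", c) for c in range(1, 63) for i in range(140)]
    train, test = stratified_split(data)
    print(len(train), len(test))  # 6944 1736 (62 * 112 and 62 * 28)
```

Splitting within each class, rather than over the pooled list, guarantees that every one of the 62 classes appears in the test set in the same proportion.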

C. Data Augmentation

In data analytics, "data augmentation" refers to increasing the amount of data by adding slightly modified copies of existing data or new synthetic data generated from existing data. It acts as a regularizer and helps reduce overfitting when a machine learning model is being trained. In this work, data augmentation is used to extend the training set in order to improve the effectiveness of the CNN model. Fig. 2 illustrates how new photos are created from the captured images. This research applied a variety of augmentation techniques to the training images: shearing, cropping, and zooming; flipping along the vertical and horizontal axes; random rotation by a limited angle; and horizontal and vertical shifting.

Layer (type) | Output Shape | Param #
conv2d (Conv2D) | (None, 32, 3998, 1) | 320
conv2d_1 (Conv2D) | (None, 10, 1332, 120) | 1200
Activation (Activation) | (None, 10, 1332, 120) | 0
conv2d_2 (Conv2D) | (None, 3, 444, 120) | 129720
activation_1 (Activation) | (None, 3, 444, 120) | 0
Dropout (Dropout) | (None, 3, 444, 120) | 0
Flatten (Flatten) | (None, 159840) | 0
Dense (Dense) | (None, 8) | 1278728
Activation_2 (Activation) | (None, 8) | 0
Dropout_1 (Dropout) | (None, 8) | 0
Dense_1 (Dense) | (None, 61) | 549
Activation_3 (Activation) | (None, 61) | 0
===========================================
Total params: 1,410,517
Trainable params: 1,410,517

The input is processed by three Conv2D layers, which extract features from the input data using filters. As Fig. 3 shows, the output shape of the second Conv2D layer is (None, 10, 1332, 120), indicating that the number of filters has increased from 1 to 120. The outputs of the Conv2D layers are passed through Activation layers, which apply activation functions to introduce non-linearity into the model. The output of the Activation layers is then passed through a Dropout layer, which randomly sets a portion of the activations to zero during training to prevent overfitting. The output of the Dropout layer is flattened into a 1D array by the Flatten layer. The flattened output is passed through two Dense layers, which perform classification by computing the dot product between the flattened activations and a set of weights; the outputs of the Dense layers are also passed through Activation layers. The final output has shape (None, 61), indicating that the model makes predictions over 61 classes. All 1,410,517 parameters in the model are trainable. Fig. 4 shows the classification results obtained when the different kinds of transfer learning are applied to the system.

The activation layer plays a crucial role in introducing non-linearity to the model, aiding in the classification of input images into different traffic sign classes. The final activation layer (Activation_3) operates on the 61 outputs of the Dense_1 layer. Its output is a 61-element vector that reflects the likelihood of each class being present in the input image. By selecting the class with the highest probability, the model determines the predicted class for the input image. Activation layers themselves have no trainable weights or biases (0 parameters in the summary); all parameters belong to the Conv2D and Dense layers. Overall, the activation function is a crucial component of a neural network, making the model more expressive and capable of addressing intricate tasks such as traffic sign recognition.
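The output shapes and parameter counts in the model summary can be checked with a few lines of arithmetic. The Python sketch below assumes 3 x 3 kernels, stride 3, and "valid" padding for conv2d_1 and conv2d_2. Those settings are inferred from the listed shapes and are not stated in the paper. The 320 parameters of the first conv2d layer are taken as reported, because that layer's configuration is not given. The sketch also shows the final prediction step, assuming a softmax activation. The paper implies softmax ("likelihood of each class") but does not name it.

```python
import math


def conv2d_out(size, kernel=3, stride=3):
    """Output length along one spatial axis for 'valid' padding."""
    return (size - kernel) // stride + 1


def conv2d_params(in_ch, out_ch, kernel=3):
    """kernel x kernel x in_ch weights per filter, plus one bias per filter."""
    return kernel * kernel * in_ch * out_ch + out_ch


def dense_params(in_units, out_units):
    """One weight per input-output pair, plus one bias per output unit."""
    return in_units * out_units + out_units


def softmax(logits):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_class(logits):
    """Index of the most likely class (argmax of the softmax output)."""
    probs = softmax(logits)
    return max(range(len(probs)), key=probs.__getitem__)


if __name__ == "__main__":
    h, w, c = 32, 3998, 1          # output of conv2d, from the summary
    total = 320                    # conv2d parameters, taken as reported
    for filters in (120, 120):     # conv2d_1, then conv2d_2
        total += conv2d_params(c, filters)
        h, w, c = conv2d_out(h), conv2d_out(w), filters
        print((h, w, c))           # (10, 1332, 120), then (3, 444, 120)
    flat = h * w * c               # Flatten output length
    total += dense_params(flat, 8) + dense_params(8, 61)
    print(flat, total)             # 159840 1410517
```

Under these assumptions, the arithmetic reproduces every non-zero entry in the summary and the total of 1,410,517. The Activation, Dropout, and Flatten layers contribute nothing to the total. For a three-class toy input, `predict_class([0.1, 2.5, -1.0])` returns 1.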
III. RESULT AND PERFORMANCE ANALYSIS

Fig. 4. Recognized traffic signs

Fig. 5. Training and Test Accuracy for the designed CNN Model

Fig. 8. Normalized Confusion Matrix

Fig. 6. Training and Test Accuracy for ResNet50

Training a Convolutional Neural Network (CNN) involves updating the parameters of the model to minimize the difference between its predictions and the actual ground truth. This is achieved by using a labeled dataset, in which each instance is paired with its corresponding label. During training, the model is presented with inputs, and its predictions are compared with the actual labels. Based on the difference between the two, the model adjusts its parameters to reduce the error and improve its accuracy. The training process is repeated for multiple epochs until the error reaches a satisfactory level. As Fig. 5 shows, the test accuracy of the proposed model is higher than that of the pre-trained models.
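The training loop described above has four steps: predict, compare the prediction with the label, adjust the parameters, and repeat for several epochs. The Python sketch below illustrates that loop on a toy problem. It trains a hypothetical one-feature logistic-regression model with plain full-batch gradient descent on made-up data. It shows only the structure of the loop. It is not the paper's CNN, and the paper does not specify its loss function or optimizer in this section.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def bce_loss(w, b, xs, ys):
    """Mean binary cross-entropy between predictions and ground-truth labels."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(w * x + b)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(xs)


def train(xs, ys, lr=0.5, epochs=200):
    """Full-batch gradient descent; returns the learned (w, b) and the loss per epoch."""
    w, b = 0.0, 0.0
    history = []
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y  # prediction minus label
            grad_w += err * x             # d(loss)/dw contribution
            grad_b += err                 # d(loss)/db contribution
        # Step against the mean gradient to reduce the error.
        w -= lr * grad_w / len(xs)
        b -= lr * grad_b / len(xs)
        history.append(bce_loss(w, b, xs, ys))
    return w, b, history


if __name__ == "__main__":
    # Made-up data: the label is 1 when the feature is positive.
    xs = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]
    ys = [0, 0, 0, 0, 1, 1, 1, 1]
    w, b, history = train(xs, ys)
    preds = [1 if sigmoid(w * x + b) >= 0.5 else 0 for x in xs]
    acc = sum(p == y for p, y in zip(preds, ys)) / len(ys)
    print(f"loss {history[0]:.3f} -> {history[-1]:.3f}, accuracy {acc:.2f}")
```

A CNN follows the same loop. The difference is that the gradients are computed by backpropagation through every Conv2D and Dense layer, and the data are usually processed in mini-batches rather than all at once.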
Fig. 7. Training and Test Accuracy for MobileNetV2

Accuracy is a commonly used quantity for evaluating the performance of CNNs. It measures the proportion of instances that the model has correctly classified out of the total number of instances; specifically, accuracy is the number of correct predictions divided by the total number of predictions. Fig. 6 shows that the accuracy of this model is 91%, and in this research Fig. 7 shows an accuracy of more than 91%.

However, accuracy may not always be the best metric, particularly when the dataset is imbalanced, meaning that the number of instances differs significantly between classes. In such cases, alternative metrics such as precision, recall, and F1-score (91.68, 91.54, and 91.53, respectively, for the CNN in Table II) provide a more comprehensive evaluation of the model's performance (Fig. 8(a) and 8(b)).
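Accuracy, precision, recall, and F1-score can all be derived from a confusion matrix. The Python sketch below builds the matrix from lists of true and predicted class indices, then computes accuracy and macro-averaged precision, recall, and F1-score. It also row-normalizes the matrix, which is how a normalized confusion matrix is usually produced. Macro averaging (an unweighted mean over classes) is an assumption, since the paper does not state how the scores in Table II were averaged. The labels in the usage example are made up.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """cm[t][p] counts instances of true class t predicted as class p."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def classification_metrics(cm):
    """Accuracy plus macro-averaged precision, recall and F1 from a confusion matrix."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(n))
    precisions, recalls, f1s = [], [], []
    for k in range(n):
        tp = cm[k][k]
        fp = sum(cm[i][k] for i in range(n)) - tp  # predicted k, actually another class
        fn = sum(cm[k]) - tp                        # actually k, predicted another class
        # TN (everything else) is not needed for per-class precision or recall.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    return {
        "accuracy": correct / total,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


def normalize_rows(cm):
    """Row-normalized matrix: each row (true class) sums to 1."""
    return [[v / sum(row) if sum(row) else 0.0 for v in row] for row in cm]


if __name__ == "__main__":
    # Made-up predictions for 3 classes, only to illustrate the arithmetic.
    y_true = [0, 0, 0, 1, 1, 1, 2, 2, 2, 2]
    y_pred = [0, 0, 1, 1, 1, 2, 2, 2, 2, 0]
    cm = confusion_matrix(y_true, y_pred, 3)
    print(cm)                                      # [[2, 1, 0], [0, 2, 1], [1, 0, 3]]
    print(classification_metrics(cm)["accuracy"])  # 0.7
```

On imbalanced data, a classifier can score high accuracy by favoring large classes. Macro averaging prevents that from hiding poor results, because each small class counts as much as a large one.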

The normalized confusion matrix helps in understanding the behavior of the system class by class.

Table II summarizes the accuracy and results of the proposed system and the compared models. A confusion matrix is a table used to evaluate the performance of a classifier, such as a Convolutional Neural Network (CNN). It visualizes the performance of a model in terms of its ability to predict the true class of a given instance [15]. The matrix contains four types of entries: True Positive (TP), False Positive (FP), True Negative (TN), and False Negative (FN). Based on these entries, different metrics can be calculated to evaluate the performance of the model, such as accuracy, precision, recall, and F1-score. The confusion matrix helps evaluate the strengths and weaknesses of a classifier and identify areas where the CNN model can be improved.

TABLE II. MODEL COMPARISON RESULT

Model | Accuracy | F1-Score | Recall | Precision
CNN | 91.05% | 91.53 | 91.54 | 91.68
VGG16 | 90.31% | 90.32 | 90.31 | 90.66
MobileNetV2 | 89.65% | 89.66 | 89.65 | 89.97
ResNet50 | 89.65% | 89.70 | 89.65 | 89.00
InceptionV3 | 90.005 | 90.53 | 90.54 | 90.68

IV. CONCLUSION

Deep learning architectures for recognizing various types of traffic signs are evaluated and compared. The system will aid driver assistance systems and intelligent vehicles in making speed-related decisions. The BTSRB dataset will help in understanding the Bangladeshi traffic system, where people are not willing to follow the traffic rules. In future work, we will extend our model to further classification and comparison with other models, and will attempt to cover the full set of Bangladeshi traffic signs.

REFERENCES

[1] M. T. Riaz, M. J. Sarwar, M. Imran, R. Hussain, and M. F. Moosa, "The intelligent transportation systems with advanced technology of sensor and network," in 2021 International Conference on Computing, Electronic and Electrical Engineering (ICE Cube), Quetta, Pakistan, Mar. 2021, pp. 1-6.
[2] S. Chen, M. Kuhn, K. Prettner, and D. E. Bloom, "The global macroeconomic burden of road injuries: estimates and projections for 166 countries," The Lancet Planetary Health, vol. 3, no. 9, pp. 390-398, Sep. 2019.
[3] V. Shepelev, S. Zhankaziev, S. Aliukov, V. Varkentin, A. Marusin, A. Marusin, and A. Gritsenko, "Forecasting the passage time of the queue of highly automated vehicles based on neural networks in the services of cooperative intelligent transport systems," Mathematics, vol. 10, no. 2, p. 282, Jan. 2022.
[4] A. J. Khattak, N. Ahmad, B. Wali, and E. Dumbaugh, "A taxonomy of driving errors and violations: Evidence from the naturalistic driving study," Accident Analysis & Prevention, no. 151, p. 105873, Mar. 2021.
[5] L. Yang and G. Cervone, "Analysis of remote sensing imagery for disaster assessment using deep learning: a case study of flooding event," Soft Computing, vol. 23, no. 24, pp. 13393-13408, Dec. 2019.
[6] Y. Yu, Y. Lai, H. Wang, and L. Lin, "Traffic signal recognition with a priori analysis of signal position," in Communication Technologies for Vehicles: 13th International Workshop, Proceedings, Madrid, Spain, May 17-18, 2018, Springer International Publishing, vol. 13, pp. 137-148.
[7] A. Choudhury, H. S. Rana, and T. Bhowmik, "Handwritten Bengali numeral recognition using HOG based feature extraction algorithm," in 2018 5th International Conference on Signal Processing and Integrated Networks (SPIN), Noida, India, 2018, pp. 687-690.
[8] Y. Yang, S. Liu, W. Ma, Q. Wang, and Z. Liu, "Efficient traffic-sign recognition with scale-aware CNN," in BMVC, 2017, pp. 1805-12289.
[9] S. Srinivas, R. K. Sarvadevabhatla, K. R. Mopuri, N. Prabhu, S. S. Kruthiventi, and R. V. Babu, "A taxonomy of deep convolutional neural nets for computer vision," Frontiers in Robotics and AI, vol. 2, p. 36, Jan. 2016.
[10] Y. Yu, Z. Sun, W. Zhu, and J. Gu, "A Homotopy Iterative Hard Thresholding Algorithm With Extreme Learning Machine for Scene Recognition," IEEE Access, vol. 6, pp. 30424-30436, 2018.
[11] M. Shahed, M. A. U. Khan, and S. A. Chowdhury, "Detection and recognition of Bangladeshi road sign based on maximally stable extremal region," in 3rd International Conference on Electrical Information and Communication Technology (EICT), Khulna, Bangladesh, 2017, pp. 1-6.
[12] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, and A. C. Berg, "SSD: Single shot multibox detector," in Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11-14, 2016, Proceedings, Part I, Springer International Publishing, 2016, pp. 21-37.
[13] S. M. M. Ahsan, S. Das, S. Kumar, and Z. La Tasriba, "A detailed study on Bangladeshi road sign detection and recognition," in 4th International Conference on Electrical Information and Communication Technology (EICT), Khulna, Bangladesh, 2019, pp. 1-6.
[14] S. Ahmed, U. Kamal, and M. K. Hasan, "DFR-TSD: A deep learning based framework for robust traffic sign detection under challenging weather conditions," IEEE Transactions on Intelligent Transportation Systems, vol. 23, no. 6, pp. 5150-5162, Jun. 2022.
[15] M. B. Islam, K. S. Islam, A. Noman, J. Ncube, and X. Chen, "A Fiber Wireless Improved 5G Network-Based Virtual Networking System Focused on Equal Bandwidth," in 2021 2nd International Symposium on Computer Engineering and Intelligent Communications (ISCEIC), 2021, pp. 439-445, doi: 10.1109/ISCEIC53685.2021.00098.
[16] M. B. Islam, C. Avornu, P. K. Shukla, and P. K. Shukla, "Cost Reduce: Credit Card Fraud Identification Using Machine Learning," in 2022 7th International Conference on Communication and Electronics Systems (ICCES), 2022, pp. 1192-1198, doi: 10.1109/ICCES54183.2022.9835811.
[17] Md Babul Islam, Khandaker Sajidul Islam, Md Helal Khan, Abdullah MMA Al Omari, and Swarna Hasibunnahar, "Detect deception on banking credit card payment system by machine learning classifiers," in Proc. SPIE 12339, Second International Conference on Cloud Computing and Mechatronic Engineering (I3CME 2022), 1233927, Sep. 28, 2022, doi: 10.1117/12.2655113.
[18] M. B. Islam, S. Hasibunnahar, P. K. Shukla, P. K. Shukla, P. Rawat, and J. Dange, "Twitter Opinion Mining on COVID-19 Vaccinations by Machine Learning Presence," in Proceedings of Third Doctoral Symposium on Computational Intelligence, A. Khanna, D. Gupta, V. Kansal, G. Fortino, and A. E. Hassanien, Eds., Lecture Notes in Networks and Systems, vol. 479. Singapore: Springer, 2023, doi: 10.1007/978-981-19-3148-2_4.
[19] M. B. Islam, S. Hasibunnahar, P. K. Shukla, P. K. Shukla, and V. Jain, "Pandemic Outbreak Time: Evaluation of Public Tweet Opinion by Machine Learning," in 2022 IEEE International Conference on Current Development in Engineering and Technology (CCET), Bhopal, India, 2022, pp. 1-6, doi: 10.1109/CCET56606.2022.10080351.
[20] K. S. Islam, M. B. Islam, C. Avornu, J. Lou, and P. K. Shukla, "Blockchain Based New E-voting Protocol System without Trusted Tallying Authorities," in 2022 Fifth International Conference on Computational Intelligence and Communication Technologies (CCICT), Sonepat, India, 2022, pp. 311-317, doi: 10.1109/CCiCT56684.2022.00064.
[21] R. D. Luce and A. D. Perry, "A method of matrix analysis of group structure," 1949.
