Supervised Traffic Signs Recognition in Digital Images Using Interest Points

Proceedings of XI Workshop de Visão Computacional ‐ October 05th‐07th, 2015
Matheus Gutoski, Gilmário Barbosa dos Santos¹, Chidambaram Chidambaram
Departamento de Sistemas de Informação – São Bento do Sul
¹Departamento de Ciência da Computação – Joinville
Universidade do Estado de Santa Catarina - UDESC
Brasil
matheusgutoski@gmail.com, gilmario.barbosa, chidambaram@udesc.br
Abstract—This paper presents an approach to recognize traffic signs in digital images using features extracted from interest points. In recent decades, local features have been used in many object recognition applications. This work therefore applies SIFT and SURF to digital images that mainly involve changes in scale and rotation. Both SIFT and SURF are applied to the traffic sign recognition problem in a supervised way, using a set of images captured in real-life traffic scenarios. These images were divided into three categories, according to the distance and rotation of the traffic sign. The experimental study was conducted by extracting features from traffic images and matching them with features extracted from templates. The results obtained from the experiments show that interest points can be used for the traffic sign recognition problem.

Keywords—SIFT, SURF, Interest points, Feature Extraction, Traffic sign.

1. INTRODUCTION

In the last decades, the number of automobiles has grown exponentially, increasing the risk of collisions and other accidents that may be caused by human failure. This scenario created a need for new technologies to aid vehicle drivers. Traffic signs serve as a visual language for drivers on public roads and streets, so traffic sign recognition is useful for driver assistance and autonomous navigation. It has therefore become a widely studied subject, due to its great complexity and wide range of applications.

Image variation caused by the environmental conditions in which images are captured is one of the major obstacles in any recognition process. The same problem occurs in traffic sign recognition, since the images are naturally captured from roads and street views. These problems are discussed by Vitabile and colleagues [1] and Miura and colleagues [2]. They present a variety of factors which make traffic sign recognition a difficult task. For instance, climatic conditions, shadows cast by other objects, partial occlusion and the physical condition of the traffic sign are some of the problems which make traffic sign identification a challenge in computer vision and object recognition.

In recent decades, many approaches based on different methods have been proposed to obtain efficient traffic sign recognition systems. According to Ruta and colleagues [5], traffic sign recognition works have been proposed since 1990. In recent years, many papers have been published trying to find efficient solutions to this problem. The safety of drivers, passengers and pedestrians has been receiving growing attention [6]. In this context, driver support systems are important tools to reduce accidents caused by human failure.

Traffic sign recognition has many applications, for example autonomous navigation, as presented by Farag and Abdel-Hakim [7], where a vehicle collects road data using sensors and feeds the data into an intelligent system. Maldonado-Bascón and colleagues [8] suggested extracting information from traffic sign images to determine their physical condition, allowing the maintenance or replacement of traffic signs in poor condition.

Different methods have been proposed to detect traffic signs. Arlicot and colleagues [9] wrote that the most common methods are detection by color and shape. Traffic signs are usually red or blue, with the presence of black and white. As for the shape, traffic signs are usually rectangular, triangular or circular. However, Miura and colleagues [2] point out that colors are sensitive to changes in illumination, and will eventually fade over time. Neural network approaches have also been used in the traffic sign recognition problem. Lorsakul and Suthakorn [10], and Fang and colleagues [11] developed applications using artificial neural networks to detect and recognize traffic signs, obtaining high recognition rates.

To overcome the image problems that appear in different image scenarios, many robust algorithms for object recognition have been proposed. Among these algorithms, SIFT (Scale Invariant Feature Transform) [3] and SURF (Speeded Up Robust Features) [4] have been widely used in various fields of object recognition in recent decades.

Interest points can be an effective way to detect objects in complex scenarios where simple correlation-based methods may be unreliable. To detect interest points, local image features which are invariant to illumination, scale, translation and rotation are identified. Instead of single pixels, which are not representative, interest points gather information from a neighborhood of pixels, providing more information and
describing the local image features like shape, color and texture [12].

In computer vision, interest points have been used in a vast number of applications. For instance, face recognition applications have been developed using invariant features extracted from interest points, achieving good results [12].

Using SIFT, Silva and colleagues [13] and Farag and Abdel-Hakim [7] developed applications to recognize traffic signs. SIFT is considered invariant to changes in illumination, scale and orientation [14]. Höferlin and Heidemann [15] consider SIFT one of the most adequate methods to solve the traffic sign recognition problem.

SURF has also been used in traffic sign recognition. Solanki and Dixit [16] approached the problem using SURF in conjunction with other methods. The SURF algorithm also provides scale and rotation invariance; however, it is considered faster than other feature matching algorithms [4].

Based on the context presented in this section, the main objective of the present work is to evaluate the methods using a variety of real-life images obtained under different conditions. This paper presents a supervised approach to traffic sign recognition using SIFT and SURF on a set of digital images containing traffic signs at different angles and distances. In addition to the application of SIFT and SURF, the contribution of this work also includes the identification of the image conditions under which the methods achieve a good recognition rate. Other complementary methods can then be added to the present approach to improve the results, which can be useful for real applications.

This paper is organized as follows: Section 2 explains the fundamentals of the interest point detectors SIFT and SURF; Section 3 details the methodology of the present work, including a brief description of the image database and how the interest points are extracted from real traffic scene images; Section 4 describes the experimental setup and the results obtained using different types of query images. Final conclusions and future work directions are drawn in Section 5.

2. INTEREST POINT DETECTORS

This section gives a general description of SURF and SIFT.

A. SURF

SURF is an algorithm introduced by Herbert Bay and colleagues, used to detect local invariant features in images. These features have many applications, such as locating objects and faces, camera calibration, 3D reconstruction, and others [4].

SURF was developed aiming for faster performance and higher accuracy than existing algorithms such as SIFT. To increase performance, the SURF descriptor dimension was reduced to 64 elements, instead of 128 elements, while trying to keep the descriptors distinctive and robust to noise, changes in illumination and point of view [4].

Interest points are detected by utilizing a Hessian matrix, which shows good performance regarding computation time and accuracy. Each interest point is then represented by a feature vector of 64 elements which are calculated using the Haar wavelet [4].

Experiments on object recognition performed by the authors have shown that SURF can outperform other methods in both speed and accuracy [4].

B. SIFT

SIFT is a feature detector and descriptor that has been widely used in computer vision applications since its publication by David G. Lowe in 1999 [3].

Interest points detected by this method are invariant to image rotation, translation and scaling, while also showing robustness regarding changes in illumination, addition of noise and 3D viewpoint [14].

In order to detect interest points, SIFT uses a Difference of Gaussians function. The detected interest points are highly distinctive, providing high accuracy when a match occurs [14].

Each interest point contains a feature vector of 128 elements, which are calculated based on gradient magnitude and orientation in a region around the interest point [14].

The author has shown that the algorithm can extract a large number of interest points from an image, detecting even small or highly occluded objects [14].

3. TRAFFIC SIGN RECOGNITION

This paper proposes an approach to recognize traffic signs using images obtained under different conditions. In order to perform this task, a database of real traffic images captured in different scenarios was created. In addition to the database, a set of traffic sign templates was also defined, to be recognized in the traffic images. This section also addresses the other important aspects of the study, such as the definition of thresholds, the ROI (Region of Interest) and the similarity measure.

As defined by the methods, the SIFT experiments were performed using a 128-element feature vector, while in SURF a 64-element feature vector was used.

The first step is to find all the interest points in the traffic image and the template image. Then, each of the interest point descriptors from the template is compared to each of the descriptors from the traffic image, in order to find the two nearest neighbors for each interest point. The nearest neighbor of an interest point is defined as the interest point whose descriptor is at the smallest distance from it.

Once the two nearest neighbors of every interest point of the template image have been found, the closest neighbor is selected as a true match if its distance, compared with the distance to the second closest neighbor, satisfies the pre-defined threshold condition. In addition, an interest point will only be considered as a true match if it lies within the defined
ROI. The ROI definition is used to check whether the proposed approach can really recognize the traffic sign.

The last step is to count how many matches occurred between the traffic image and the template. The result is considered positive or negative based on a second threshold, which defines the number of matches necessary to confirm the presence of the template in the traffic image.

A. Types of Thresholds

In this study, two types of thresholds were established. The first threshold (T1) applies to the distance between the descriptors of two interest points, one from the template and another from the traffic image. This distance determines whether two interest points are similar, in other words, whether both points can be considered a match. The second threshold (T2) is the number of matched interest points necessary within the ROI to decide whether the object has been detected or not. Selected values for each threshold are presented in Section 4.

B. Similarity Measurement

All the interest points that exist in the template are compared with every interest point of the traffic image. For each template point, the two closest matches are selected. Similarity measurement is performed by comparing the distances (D1 and D2) from an interest point to its two closest matches. This condition is used for both SIFT and SURF. A match occurs when equation 1 is true. The closest point is then defined as the matching point.

100 * (D1²) > T1² * (D2²)    (1)

D1 represents the distance between the descriptors of the closest matching interest points, while D2 represents the distance between the descriptors of the next closest matching interest point. In equation 1, D1² is multiplied by 100 to balance the two sides of the equation, as the value of D1² must be bigger than a fraction of the value of D2². This fraction is determined by the threshold value squared, that is, T1²/100.

C. Image Database

The traffic images used in this work were captured specifically to conduct the experiments. In total, 283 traffic images were obtained under different conditions. These images contain 25 different types of traffic signs. The images have a resolution of 2560x1536 pixels and were classified into three different categories. Each category has approximately the same number of images.

Category 1: images were obtained at a short distance of 10 meters or less and present no rotation relative to the y-axis, as shown in “Fig. 1”.

Fig. 1. Sample image of category 1

Some sample images of category 1 are shown in “Fig. 2”.

Fig. 2. Sample images of category 1 in different scenarios

Category 2: images were obtained at a distance of 10 meters or more and present no rotation relative to the y-axis, as shown in “Fig. 3”.

Fig. 3. Sample image of category 2

“Fig. 4” shows category 2 images in different scenarios.

Fig. 4. Sample images of category 2 in different scenarios

Category 3: images contain a traffic sign at a distance similar to category 1 and present a rotation relative to the y-axis (change in viewpoint), as shown in “Fig. 5”.
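The matching steps described at the start of this section can be illustrated with a minimal Python sketch (the paper's own implementation is in C with OpenCV and is not reproduced here). It covers the brute-force search for the two nearest neighbors, which yields D1 and D2, the ROI filter, and the T2 decision; the acceptance test of equation 1 would be applied to D1 and D2 between the first and second step. The names `ROI`, `two_nearest`, `inside_roi` and `sign_detected` are hypothetical, the ROI is assumed to be an axis-aligned rectangle, and "necessary number of matches" is assumed to mean "at least T2".

```python
import math
from typing import List, NamedTuple, Sequence, Tuple


class ROI(NamedTuple):
    """Axis-aligned rectangle around the sign, in traffic-image pixels (assumed layout)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def two_nearest(query: Sequence[float],
                candidates: List[Sequence[float]]) -> Tuple[int, float, float]:
    """Brute-force search: index of the closest candidate descriptor, then D1 and D2.

    Needs at least two candidates; distances are Euclidean.
    """
    ranked = sorted((math.dist(query, c), i) for i, c in enumerate(candidates))
    (d1, index), (d2, _) = ranked[0], ranked[1]
    return index, d1, d2


def inside_roi(point: Tuple[float, float], roi: ROI) -> bool:
    """True when the matched point lies within the manually defined ROI."""
    x, y = point
    return roi.x_min <= x <= roi.x_max and roi.y_min <= y <= roi.y_max


def sign_detected(matched_points: List[Tuple[float, float]], roi: ROI, t2: int) -> bool:
    """Count accepted matches inside the ROI and compare the count with T2.

    Assumption: "necessary" is read as "at least T2 matches".
    """
    in_roi = sum(1 for p in matched_points if inside_roi(p, roi))
    return in_roi >= t2


if __name__ == "__main__":
    # Toy 2-D descriptors; real SIFT/SURF descriptors have 128/64 elements.
    print(two_nearest((0.0, 0.0), [(3.0, 4.0), (1.0, 0.0), (0.0, 2.0)]))
    roi = ROI(1000, 400, 1300, 700)
    points = [(1050, 450), (1200, 600), (1290, 690), (200, 300)]
    print(sign_detected(points, roi, t2=3))
```

In this toy usage, three of the four matched points fall inside the rectangle, so the decision depends only on whether T2 is at most three. Keeping the ROI filter separate from the distance test mirrors the supervised setup, where the ROI serves to verify the result rather than to guide the matching.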
Fig. 5. Sample image of category 3

“Fig. 6” shows category 3 images captured under different traffic scenarios.

Fig. 6. Sample images of category 3 in different scenarios

All templates, with resolutions varying from 280x280 to 320x320, were obtained from images of category 1. A total of 25 templates were used in this study. Images used for the extraction of templates were removed from the experiments. “Fig. 7” shows the templates that were used in this study.

Fig. 7. Traffic sign templates

D. Definition of Region of Interest

This study is conducted in a supervised way. To decide whether the algorithm has successfully detected a traffic sign or not, a ROI was manually defined for all traffic images used in the experiments. This ROI is simply the coordinates of the traffic sign in the image. When performing the matching, the interest points that are within the ROI are considered as true matches.

4. EXPERIMENTS AND RESULTS

All experiments were run on a desktop computer with an AMD Athlon 2.70 GHz processor and 2 GB of memory under the Windows operating system. The algorithm is written in C using library functions of OpenCV.

To conduct the experiments, all traffic images are matched one by one to each of the traffic sign templates. When a traffic image is matched to its corresponding template, the result may be a true positive (TP), in case of recognition. Conversely, the result may be a true negative (TN), in case the object is not detected. In this work, for the moment, only the TP and TN conditions are considered to evaluate the proposed approach. The overall execution time is also not evaluated for comparison purposes.

A. Thresholds Definition

In order to find suitable thresholds for the image database, a set of 36 images, containing 12 images from each category, was selected. The traffic images were matched to each of the templates, as described in Section 3. This experiment was performed multiple times, varying the distance threshold and the number-of-matches threshold to cover different combinations. The distance threshold ranges from 6 to 8, with an increment of 0.2 per experiment. Each of these values was combined with the number-of-matches threshold, which ranges from 3 to 7, with an increment of 1 per experiment. In total, 55 different threshold combinations were tested to find the best set of thresholds. This experiment was performed twice, once for SIFT and once for SURF, since their implementations are not similar.

In order to define the best set of thresholds among the results obtained from the experiments, the combination of thresholds which presented the best average recognition rate was selected. “Table 1” shows the selected combinations for each of the algorithms.

Table 1. Selected thresholds

Method   Threshold 1   Threshold 2   Results            Average
SIFT     6.6           3             TP: 86%, TN: 83%   84.5%
SURF     7.4           3             TP: 77%, TN: 90%   83.5%

Threshold 1 stands for the distance threshold between two interest points and threshold 2 stands for the number of matched interest points necessary within the ROI to determine whether the object exists in the scene or not.
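The threshold sweep of Section 4.A can be sketched in a few lines of Python, as an illustration rather than the authors' code. The sketch enumerates the grid of distance and number-of-matches thresholds and computes the average of the TP and TN rates used to rank combinations. The function names `threshold_grid` and `average_rate` are hypothetical, and the average is assumed to be the plain mean of the two rates, which agrees with the values reported in Table 1.

```python
from itertools import product
from typing import List, Tuple


def threshold_grid(d_start: float = 6.0, d_stop: float = 8.0, d_step: float = 0.2,
                   m_start: int = 3, m_stop: int = 7) -> List[Tuple[float, int]]:
    """All (distance threshold, number-of-matches threshold) pairs, both ends inclusive."""
    n_steps = round((d_stop - d_start) / d_step)
    # Rounding to one decimal avoids floating-point drift such as 6.6000000000000005.
    distance_values = [round(d_start + i * d_step, 1) for i in range(n_steps + 1)]
    match_values = list(range(m_start, m_stop + 1))
    return list(product(distance_values, match_values))


def average_rate(tp_rate: float, tn_rate: float) -> float:
    """Mean of the TP and TN recognition rates (assumed ranking criterion)."""
    return (tp_rate + tn_rate) / 2


if __name__ == "__main__":
    grid = threshold_grid()
    print(len(grid))             # number of combinations tested per method
    print(average_rate(86, 83))  # SIFT row of Table 1
    print(average_rate(77, 90))  # SURF row of Table 1
```

The grid has 11 distance values and 5 match-count values, which gives the 55 combinations stated in the text; each method would be run over the whole grid and the pair with the highest average kept.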
B. Experimental Results

All the experiments were conducted using the set of thresholds defined in Section 4.A. Experiments were performed by matching every traffic image to each of the 25 templates, resulting in 7075 matching experiments.

An example image with matched interest points is shown in “Fig. 8”.

Fig. 8. Matched interest points in category 1 image

Blue lines represent the interest points matched within the region of interest. Red lines represent the interest points matched outside the region of interest, as shown in “Fig. 9”.

Fig. 9. Matched interest points in a different scenario

The results were divided according to the image categories, in order to show the behavior of each method in different scenarios. “Table 2” shows the average recognition rates obtained with each method.

Table 2. Average recognition rates (%) obtained from the three categories of traffic images

Method   Category 1              Category 2              Category 3
SIFT     TP: 88.6%, TN: 79.2%    TP: 80.2%, TN: 93.9%    TP: 64%, TN: 85.7%
SURF     TP: 89.8%, TN: 86.1%    TP: 71.2%, TN: 94.3%    TP: 66%, TN: 94.7%

Category 1 images present a traffic sign at a short distance with no rotation. In this category, the TP recognition rates are close to 90% for both methods. However, a small difference of 6.9 percentage points between SIFT and SURF can be noted in the case of TN.

In image category 2, the size of the traffic signs is reduced relative to category 1, since the image is captured at a distance of more than 10 meters. Under the TP condition, SIFT was affected the least by the change in scale, recognizing about 80% of the traffic signs, while the rate of SURF dropped to 71%.

Rotation was the factor that affected both algorithms the most. Rotation relative to the y-axis (change in viewpoint) proved to be a strong complicating factor in this experiment. Both methods have shown difficulties in recognizing a traffic sign in category 3 images. The robustness of the methods can be found in the results when it comes to x-axis rotation. However, when the rotation occurs in the y-axis, some interest points can no longer be detected, hence the drop in recognition rates.

5. CONCLUSIONS

Recognition of objects in environments with no control over the variables that hinder the process has become a permanent subject of study in the research world. In this direction, among many other applications, traffic sign recognition becomes a key topic to aid vehicle drivers, which may reduce the number of traffic accidents.

The present study aimed to construct a traffic image database containing different real-life scenarios and to present an approach to recognize traffic signs using interest points. Interest points have been shown to be an effective way to recognize objects in complex scenarios like the traffic environment. Hence, two well-known methods that detect interest points, SIFT and SURF, are evaluated in this work.

Both methods, SIFT and SURF, present satisfactory results and allow traffic signs to be recognized under different conditions. However, the results show that both methods are not powerful enough to recognize the traffic images captured under different angles and inclinations. Since this type of image is part of real traffic scenarios, a specific investigation will be required.

As future work, the image database will be expanded to evaluate the methods under a larger variety of image conditions. In this work, the traffic images obtained with rotation in the y-axis clearly reduced the recognition rates relative to the other categories. Therefore, in addition to the methods studied in this work, a new attempt will be made using other feature extraction methods as an additional step to increase the recognition rates.

Other factors that can make the recognition of traffic signs difficult are changes in illumination, poor physical condition and occlusion of the traffic signs. In fact, these conditions are unavoidable in outdoor scenarios. Hence, these problems will also be studied in future works.
REFERENCES

[1] S. Vitabile, A. Gentile, F. Sorbello, “A neural network based automatic road signs recognizer.” In: Neural Networks, pp. 2315-2320, 2002.
[2] J. Miura, T. Kanda, Y. Shirai, “An active vision system for real-time traffic sign recognition.” Intelligent Transportation Systems, pp. 52-57, 2000.
[3] D.G. Lowe, “Object recognition from local scale-invariant features.” The proceedings of the seventh IEEE international conference, pp. 1150-1157, 1999.
[4] H. Bay, A. Ess, T. Tuytelaars, and L. Van Gool, “Speeded-up robust features (SURF).” Computer Vision and Image Understanding, v. 110, n. 3, pp. 346-359, 2008.
[5] A. Ruta, Y. Li, X. Liu, “Real-time traffic sign recognition from video by class-specific discriminative features.” Pattern Recognition, v. 43, n. 1, pp. 416-430, 2010.
[6] M.A. Garrido, M. Ocaña, D.F. Llorca, E. Arroyo, J. Pozuelo, M. Gavilán, “Complete vision-based traffic sign recognition supported by an I2V communication system.” Sensors, v. 12, n. 2, pp. 1148-1169, 2012.
[7] A.A. Farag, A.E. Abdel-Hakim, “Detection, categorization and recognition of road signs for autonomous navigation.” Proceedings of ACIVS, pp. 125-130, 2004.
[8] S. Maldonado-Bascon, S. Lafuente-Arroyo, P. Gil-Jimenez, H. Gomez-Moreno and F. Lopez-Ferreras, “Road-sign detection and recognition based on support vector machines.” Intelligent Transportation Systems, v. 8, n. 2, pp. 264-278, 2007.
[9] A. Arlicot, B. Soheilian, and N. Paparoditis, “Circular Road sign extraction from street level images using colour, shape and texture databases maps.” International Archives of Photogrammetry, Remote Sensing and Spatial Information Sciences, pp. 205-210, 2009.
[10] A. Lorsakul, J. Suthakorn, “Traffic sign recognition for intelligent vehicle/driver assistance system using neural network on OpenCV.” Proceedings of the 4th International Conference on Ubiquitous Robots and Ambient Intelligence, pp. 22-24, 2007.
[11] C.Y. Fang, C.S. Fuh, P.S. Yen, S. Cherng and S.W. Chen, “An automatic road sign recognition system based on a computational model of human recognition processing.” Computer Vision and Image Understanding, v. 96, n. 2, pp. 237-268, 2004.
[12] C. Chidambaram, M.S. Marçal, L.B. Dorini, H.V. Neto, H.S. Lopes, “An Improved ABC Algorithm Approach Using SURF for Face Identification.” Lecture Notes in Computer Science. 1ed: Springer Berlin Heidelberg, 2012, v., p. 143-150.
[13] F.A. Silva, A.O. Artero, M.S.V. de Paiva, and R.L. Barbosa, “Uma Metodologia para Detectar e Reconhecer Placas de Sinalização de Trânsito.” VIII Workshop de Visão Computacional. Goiânia: [s.n.], 2012.
[14] D.G. Lowe, “Distinctive image features from scale-invariant keypoints.” International Journal of Computer Vision, v. 60, n. 2, pp. 91-110, 2004.
[15] B. Höferlin and G. Heidemann, “Selection of an Optimal Set of Discriminative and Robust Local Features with Application to Traffic Sign Recognition.” Proc. WSCG, 18th Int. Conf. in Central Europe on Computer Graphics, Visualization and Computer Vision, v. 18, pp. 9-16, 2010.
[16] D.S. Solanki, G. Dixit, “Traffic Sign Detection and Recognition Using Feature Based and OCR Method.” International Journal for Research in Science Engineering and Technology, v. 2, pp. 32-40, 2012.
