Supervised Traffic Signs Recognition in Digital Images Using Interest Points
Proceedings of XI Workshop de Visão Computacional ‐ October 05th‐07th, 2015

Abstract—This paper presents an approach to recognizing traffic signs in digital images using features extracted from interest points. In recent decades, local features have been used in many object recognition applications. This work is therefore developed using SIFT and SURF on digital images that mainly involve changes in scale and rotation. Both SIFT and SURF were applied to the traffic sign recognition problem in a supervised way, using a set of images captured in real-life traffic scenarios. These images were divided into three categories according to the distance and rotation of the traffic sign. An experimental study was conducted by extracting features from the traffic images and matching them with features extracted from templates. The results obtained from the experiments show that interest points can be used for the traffic sign recognition problem.

Keywords—SIFT, SURF, Interest points, Feature Extraction, Traffic sign.

1. INTRODUCTION

In the last decades, the number of automobiles has grown exponentially, increasing the risk of collisions and other accidents that may be caused by human failure. This scenario created a need for new technologies to aid vehicle drivers. Traffic signs serve as a visual language for drivers on public roads and streets, so traffic sign recognition is useful for driver assistance and autonomous navigation. It has therefore become a widely studied subject, due to its great complexity and wide range of applications.

Image variations caused by the environmental conditions under which images are captured are one of the major obstacles in any recognition process. The same problem occurs in traffic sign recognition, since the images are naturally captured from roads and street views. These problems are discussed by Vitabile and colleagues [1] and Miura and colleagues [2], who present a variety of factors that make traffic sign recognition a difficult task. For instance, climatic conditions, shadows cast by other objects, partial occlusion and the physical condition of the traffic sign are some of the problems that make traffic sign identification a challenge in computer vision and object recognition.

In recent decades, many approaches based on different methods have been proposed to obtain efficient traffic sign recognition systems. According to Ruta and colleagues [5], traffic sign recognition works have been proposed since 1990. In recent years, many papers have been published trying to find efficient solutions to this problem. The safety of drivers, passengers and pedestrians has been receiving growing attention recently [6]. In this context, driver support systems are important tools to reduce accidents caused by human failure.

Traffic sign recognition has many applications, for example autonomous navigation, as presented by Farag and Abdel-Hakim [7], where a vehicle collects road data using sensors and feeds the data into an intelligent system. Maldonado-Bascón and colleagues [8] suggested extracting information from traffic sign images to determine their physical condition, allowing the maintenance or replacement of traffic signs in poor condition.

Different methods have been proposed to detect traffic signs. Arlicot and colleagues [9] wrote that the most common methods are detection by color and shape. Traffic signs are usually red or blue, with the presence of black and white. As for the shape, traffic signs are usually rectangular, triangular or circular. However, Miura and colleagues [2] point out that colors are sensitive to changes in illumination and will eventually fade over time. Neural network approaches have also been used in the traffic sign recognition problem. Lorsakul and Suthakorn [10], and Fang and colleagues [11], developed applications using artificial neural networks to detect and recognize traffic signs, obtaining high recognition rates.

To overcome the image problems that appear in different scenarios, many robust algorithms for object recognition have been proposed. Among these algorithms, SIFT (Scale Invariant Feature Transform) [3] and SURF (Speeded Up Robust Features) [4] have been widely used in various fields of object recognition in recent decades.

Interest points can be an effective way to detect objects in complex scenarios where simple correlation-based methods may be unreliable. To detect interest points, local image features which are invariant to illumination, scale, translation and rotation are identified. Instead of single pixels, which are not representative, interest points gather information from a neighborhood of pixels, providing more information and
describing the local image features like shape, color and texture [12].

In computer vision, interest points have been used in a vast number of applications. For instance, face recognition applications have been developed using invariant features extracted from interest points, achieving good results [12].

Using SIFT, Silva and colleagues [13] and Farag and Abdel-Hakim [7] developed applications to recognize traffic signs. SIFT is considered invariant to changes in illumination, scale and orientation [14]. Hoferlin and Heidemann [15] consider SIFT one of the most adequate methods to solve the traffic sign recognition problem.

SURF has also been used in traffic sign recognition. Solanki and Dixit [16] approached the problem using SURF in conjunction with other methods. The SURF algorithm also provides scale and rotation invariance; however, it is considered faster than other feature matching algorithms [4].

Based on the context presented in this section, the main objective of the present work is to evaluate the methods using a variety of real-life images obtained under different conditions. This paper presents a supervised approach to traffic sign recognition using SIFT and SURF on a set of digital images containing traffic signs at different angles and distances. In addition to the application of SIFT and SURF, the contribution of this work also includes the identification of the image conditions under which the methods achieve a good recognition rate. Other complementary methods can thus be added to the present approach to improve the results, making it useful for real applications.

This paper is organized as follows: Section II explains the fundamentals of the interest point detectors SIFT and SURF; Section III details the methodology of the present work, including a brief description of the image database and how the interest points are extracted from real traffic scene images; Section IV describes the experimental setup and the results obtained using different types of query images. Final conclusions and future work directions are drawn in Section V.

2. INTEREST POINT DETECTORS

In this section, a general description of SURF and SIFT is provided.

A. SURF

SURF is an algorithm introduced by Herbert Bay, used to detect local invariant features in images. These features have many applications, such as locating objects and faces, camera calibration, 3D reconstruction, and others [4].

SURF was developed aiming for faster performance and higher accuracy than existing algorithms such as SIFT. To increase performance, the SURF descriptor dimension was reduced to 64 elements, instead of 128 elements, while trying to keep the descriptors distinctive and robust to noise and to changes in illumination and point of view [4].

Interest points are detected using a Hessian matrix, which shows good performance regarding computation time and accuracy. Each interest point is then represented by a feature vector of 64 elements, which are calculated using Haar wavelets [4].

Experiments on object recognition performed by the authors have shown that SURF can outperform other methods in performance and accuracy [4].

B. SIFT

SIFT is a feature detector and descriptor that has been widely used in computer vision since its publication by David G. Lowe in 1999 [3], owing to its wide range of applications.

Interest points detected by this method are invariant to image rotation, translation and scaling, while also showing robustness to changes in illumination, added noise and 3D viewpoint [14].

To detect interest points, SIFT uses a Difference of Gaussians function. The detected interest points are highly distinctive, providing high accuracy when a match occurs [14].

Each interest point contains a feature vector of 128 elements, which are calculated from the gradient magnitude and orientation in a region around the interest point [14].

The author has shown that the algorithm can extract a large number of interest points from an image, detecting even small or highly occluded objects [14].

3. TRAFFIC SIGN RECOGNITION

This paper proposes an approach to recognize traffic signs using images obtained under different conditions. To perform this task, a database of real traffic images captured in different scenarios was created. In addition to the database, a set of traffic sign templates was defined, to be recognized in the traffic images. This section also addresses other important aspects of the study, such as the definition of thresholds, the ROI (Region of Interest) and the similarity measure.

As defined by the methods, the SIFT experiments were performed using a 128-element feature vector, while the SURF experiments used a 64-element feature vector.

The first step is to find all the interest points in the traffic image and the template image. Then, each interest point descriptor from the template is compared to each descriptor from the traffic image, in order to find the two nearest neighbors for each interest point. A nearest neighbor of a point is defined as the interest point with the smallest distance to it.

Once the two nearest neighbors of every interest point of the template image have been found, the closest neighbor is selected as a true match if its distance, relative to the distance of the second closest neighbor, is below a pre-defined threshold. In addition, an interest point is only considered a true match if it lies within the defined
ROI. The ROI definition is used to check whether the proposed approach can really recognize the traffic sign.

The last step is to count how many matches occurred between the traffic image and the template. The result is considered positive or negative based on a second threshold, which defines the number of matches necessary to confirm the presence of the template in the traffic image.

A. Types of Thresholds

In this study, two types of thresholds were established. The first threshold (T1) applies to the distance between the descriptors of two interest points, one from the template and another from the traffic image. This distance determines whether the two interest points are similar, in other words, whether both points can be considered a match. The second threshold (T2) is the number of matched interest points necessary within the ROI to decide whether the object has been detected or not. The values selected for each threshold are presented in Section IV.

B. Similarity Measurement

Every interest point in the template is matched against every interest point of the traffic image. For each point, the two closest matches are selected. Similarity measurement is performed by comparing the distances (D1 and D2) of the two closest matches to an interest point. This condition is used for both SIFT and SURF. A match occurs when equation (1) is true. The closest point is then defined as the matching point.

100 * (D1²) > T1² * (D2²)    (1)

C. Image Database

The traffic images used in this work were captured specifically to conduct the experiments. In total, 283 traffic images were obtained under different conditions. These images contain 25 different types of traffic signs. The images have a resolution of 2560x1536 pixels and were classified into three different categories. Each category has approximately the same number of images.

Category 1: images were obtained at a short distance of 10 meters or less and present no rotation relative to the y-axis, as shown in Fig. 1.

Fig. 1. Sample image of category 1

Some sample images of category 1 are shown in Fig. 2.

Fig. 2. Sample images of category 1 in different scenarios

Category 2: images were obtained at a distance of 10 meters or more and present no rotation relative to the y-axis, as shown in Fig. 3.

Fig. 4. Sample images of category 2 in different scenarios
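As an illustration of the similarity measurement described in subsection B, the following minimal Python sketch finds the two nearest neighbors of each template descriptor by brute force and applies a distance-ratio test with threshold T1. Several points are assumptions rather than details taken from the paper: D1 is taken to be the distance to the closest neighbor and D2 the distance to the second closest, T1 is taken to be on a 0 to 10 scale (so T1 = 6 corresponds to a ratio of 0.6), and a match is accepted when 100 * D1² < T1² * D2². Equation (1) is printed with ">", so the direction of the inequality should be checked against the authors' convention for D1 and D2. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Descriptor = Sequence[float]


def squared_distance(a: Descriptor, b: Descriptor) -> float:
    """Squared Euclidean distance between two descriptors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def passes_ratio_test(d1_sq: float, d2_sq: float, t1: float) -> bool:
    """Assumed form of the T1 test: 100 * D1^2 < T1^2 * D2^2, with D1 the closest."""
    return 100.0 * d1_sq < (t1 ** 2) * d2_sq


def match_descriptors(
    template: Sequence[Descriptor],
    image: Sequence[Descriptor],
    t1: float,
) -> List[Tuple[int, int]]:
    """Return (template_index, image_index) pairs that pass the ratio test."""
    matches = []
    for i, t_desc in enumerate(template):
        # Squared distances from this template descriptor to every image descriptor.
        dists = sorted((squared_distance(t_desc, d), j) for j, d in enumerate(image))
        if len(dists) < 2:
            continue  # the test needs two nearest neighbors
        (d1_sq, closest), (d2_sq, _) = dists[0], dists[1]
        if passes_ratio_test(d1_sq, d2_sq, t1):
            matches.append((i, closest))
    return matches


# Toy 2-element descriptors; SIFT and SURF would use 128 and 64 elements.
template = [[0, 0], [5, 5]]
image = [[0, 1], [10, 10], [5, 4], [0, -1]]
print(match_descriptors(template, image, t1=6))
```

In this toy example the first template descriptor has two equally close neighbors and is rejected as ambiguous, while the second has one clearly closest neighbor and is kept.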
the interest points that are within the ROI are considered as
true matches.
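The ROI check and the T2 decision can be sketched in a few lines of Python. This is only a sketch under stated assumptions: the ROI is taken to be an axis-aligned rectangle given as (x_min, y_min, x_max, y_max) in pixel coordinates, and the sign is taken to be detected when the number of true matches is greater than or equal to T2. Neither detail is specified in the text above, and the function names are hypothetical.

```python
from typing import Iterable, List, Tuple

Point = Tuple[float, float]
# Assumed ROI representation: (x_min, y_min, x_max, y_max) in pixels.
Roi = Tuple[float, float, float, float]


def inside_roi(point: Point, roi: Roi) -> bool:
    """True if the point lies inside the rectangular ROI (borders included)."""
    x, y = point
    x_min, y_min, x_max, y_max = roi
    return x_min <= x <= x_max and y_min <= y <= y_max


def true_matches(matched_points: Iterable[Point], roi: Roi) -> List[Point]:
    """Keep only the matched interest points that fall within the ROI."""
    return [p for p in matched_points if inside_roi(p, roi)]


def sign_detected(matched_points: Iterable[Point], roi: Roi, t2: int) -> bool:
    """Assumed decision rule: at least T2 true matches inside the ROI."""
    return len(true_matches(matched_points, roi)) >= t2


# Locations in the traffic image of interest points that passed the T1 test.
points = [(120, 60), (199, 149), (250, 60), (90, 100)]
roi = (100, 50, 200, 150)
print(true_matches(points, roi))
print(sign_detected(points, roi, t2=2))
```

Matches outside the ROI are discarded before counting, so a template that matches background clutter does not count toward the T2 decision.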
A. Thresholds Definition