Reliable Object Recognition Using SIFT Features
Florin Alexandru Pavel #1, Zhiyong Wang #*2, David Dagan Feng #*3
# School of Information Technologies, University of Sydney, NSW 2006 Australia
* Department of Electronic and Information Engineering, Hong Kong Polytechnic University, Hong Kong, China
alexandruyo@yahoo.com
zhiyong@it.usyd.edu.au feng@it.usyd.edu.au
Abstract: SIFT (Scale Invariant Feature Transform) features have been one of the most efficient descriptors for object recognition. However, the excessive number of key points and their high dimensionality have limited their capacity in object recognition. In this paper we present a novel method based on SIFT features for reliable object recognition. First, a matching tree is constructed to eliminate non-essential key points. Then, in order to achieve viewpoint independence, a 3D model is constructed for each object in the filtered SIFT feature space. Experimental results on both the Caltech 101 and COIL 100 datasets indicate the effectiveness of our proposed algorithm.

I. INTRODUCTION

SIFT [1] is a local feature detection and description technique. Local image features extracted using this technique are invariant to rotation and scale changes and are robust to other transformations that can appear in an image, as well as to minor viewpoint changes. SIFT has been proven to be one of the most reliable techniques for extracting invariant features from images [5], and a number of object recognition systems have adopted it with great success [7, 8]. A number of improvements to the technique have also been proposed. Most of the research on enhancing the effectiveness of SIFT [2, 3, 4, 6] has revolved around better acquisition and description of low level image features on a per image basis. Other research has combined low level image features with high level information in order to achieve better performance [9]. Local features obtained using SIFT are not always useful for characterizing an object for recognition purposes. In real-world images, features may be generated from background responses or from other objects that are not in focus; if included in the object model, these features increase computation time and degrade recognition performance. Therefore, there is a stringent need to identify and eliminate features that can negatively influence the performance of the recognition task.

In this paper we propose a new technique to reduce the number of key points so as to improve the matching efficiency and recognition performance. In addition, in order to further achieve viewpoint independence for reliable recognition, 3D object models are constructed to accurately capture spatial relationships between features. The filtering step identifies object features that present a higher level of reliability and consistency throughout the training dataset. It also significantly reduces the overall object feature space, which decreases the size of the problem and makes the system more computationally efficient.

Capturing strong spatial relationships between features extracted from different images and constructing a 3D object model improves recognition precision by making it possible to differentiate between a false positive match and a partial occlusion. The measurement of the density of a matching region of features, together with viewpoint approximation, gives, with a certain degree of confidence, a solution to the problem of partial occlusion vs. false positive. Experimental results show that the technique described in this paper achieves an increase in object recognition precision as well as a decrease in processing time.

II. SIFT FEATURE FILTERING

A. SIFT (Scale Invariant Feature Transform)

SIFT (Scale Invariant Feature Transform) [1] is one of several computer vision algorithms aimed at extracting distinctive and invariant features from images. Features extracted using the SIFT algorithm are invariant to image scale and rotation, and partially robust to changing viewpoints and changes in illumination. SIFT applies a four-stage approach to extract local features from an image. The first two stages focus on extracting feature location information and, more importantly, on ensuring repeatable and reliable feature locations. The last two stages focus on creating a feature description. First an orientation is assigned, making the feature invariant to rotation, and then the descriptor is created. A set of 16 histograms with 8 orientation bins each is used in calculating the feature descriptor, resulting in a feature vector 128 elements long (16 x 8 = 128).

The invariance and robustness of the features extracted using this algorithm make it an extremely good candidate for object recognition. It is often used in detection/description schemes, achieving one of the best performance figures of all current feature extraction techniques [5]. However, due to the number of features extracted and the variations present in images, building a reliable object recognition system based on SIFT features is still a challenging task. The SIFT approach transforms an image or an image dataset into a large collection of local feature vectors. Each feature vector possesses invariance to scale, translation, orientation and noise. Usually between 1000 and 2000 features are extracted from an average size image, giving the possibility to recognize objects with substantial levels of occlusion [1, 5], but this also creates a set of problems over large datasets
MMSP’09, October 5-7, 2009, Rio de Janeiro, Brazil.
978-1-4244-4464-9/09/$25.00 ©2009 IEEE.
of images due to the number of features and false positive matches that can be generated.

B. Feature filtering by longest matching trees

The vast majority of real-world images include content that is not related to, or has only a weak relationship with, the objects of interest present in the image. Feature extraction techniques are indiscriminate and have no knowledge of image content; therefore background responses are identified as features and consequently used in the creation of the model. For small datasets of images this proves not to be a major problem, but when larger datasets are used, background responses can cause performance degradation as the size of the dataset grows.

Using SIFT, the number of features generated from an image is in the order of thousands. Furthermore, a large part of the features generated are due to background clutter, and not all of the features can be considered highly descriptive and consistent for the object contained in the image over a large image dataset. In order to address the problems generated by the number and quality of extracted features, we propose a feature filtering technique called Longest Matching Trees, aimed at identifying features that have a higher degree of consistency and distinctiveness throughout the object space.

... generated by background clutter which heavily impact recognition performance.

Experimental results show that only a small percentage of features can in fact be used in describing an object consistently and reliably. Considering that SIFT can generate a few thousand features per image, depending on image size and content, the elimination of background clutter and unrepresentative features over a large image dataset is of great concern, as it significantly affects both recognition precision and computational efficiency.

Creating the object feature space from a large image dataset with cluttered backgrounds poses its own challenges when the aim is minimal interference from the object's surroundings.

The proposed method requires analysing the entire image dataset and has the following steps:

Feature extraction. For each image, SIFT features are extracted and each set of features is annotated with the image id, thereby building a raw feature space.

Matching tree construction for every feature. The matching tree for a feature is constructed from the raw feature space, extracted in the previous step, by performing a top-down and then a bottom-up similarity matching. Each new feature set from a new object representation is matched with the features currently contained in the tree, and features that match are introduced into the tree. A bi-directional approach is used so that all features that share the same level of similarity are included in the tree with the smallest number of iterations.

Filtering. The longest matching trees are selected, where the length of a tree is given by the number of object representations that the features in the tree are part of, and the resulting collection of features is then used in the creation of the object model.
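
The three steps above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the authors' implementation: descriptors are short tuples standing in for 128-element SIFT vectors, two features are taken to match when their Euclidean distance falls below a hypothetical `threshold`, the bi-directional matching is approximated by growing each tree until no further feature matches, and "longest" is expressed through a hypothetical `min_length` cutoff on the number of distinct images a tree spans.

```python
from math import dist  # Euclidean distance, Python 3.8+


def build_matching_tree(seed, features, threshold):
    """Collect every feature that matches `seed` directly or transitively.

    `features` is the raw feature space: a list of (image_id, descriptor)
    pairs. The distance test against `threshold` is an assumed criterion.
    """
    tree = {seed}
    frontier = [seed]
    while frontier:
        current = frontier.pop()
        current_desc = features[current][1]
        for idx, (_, desc) in enumerate(features):
            if idx not in tree and dist(current_desc, desc) < threshold:
                tree.add(idx)
                frontier.append(idx)
    return tree


def tree_length(tree, features):
    """Length of a tree: number of distinct images its features come from."""
    return len({features[idx][0] for idx in tree})


def longest_matching_trees(features, threshold, min_length):
    """Return sorted indices of features whose tree spans >= min_length images."""
    kept, visited = set(), set()
    for seed in range(len(features)):
        if seed in visited:
            continue  # this feature already belongs to a tree built earlier
        tree = build_matching_tree(seed, features, threshold)
        visited |= tree
        if tree_length(tree, features) >= min_length:
            kept |= tree
    return sorted(kept)


# Toy raw feature space: one object feature repeated across three images,
# plus two unrelated background responses.
features = [
    ("img1", (0.0, 0.0)),
    ("img2", (0.1, 0.0)),
    ("img3", (0.0, 0.1)),
    ("img1", (5.0, 5.0)),
    ("img2", (9.0, 1.0)),
]
kept = longest_matching_trees(features, threshold=0.5, min_length=3)
print(kept)
```

In this toy input the first three features form a tree spanning three images and are kept, while the two isolated background features form trees of length one and are discarded. A real system would replace the brute-force distance scan with an approximate nearest-neighbour search.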
Figure 6. COIL dataset – 100 images from each of the 100 object categories
Figure 8. Comparison between two object categories that generated the largest number of false positives

In Figure 9 a comparison between two objects from different categories is presented. We can observe that partially occluding the objects completely eliminates the possibility of distinguishing between them, even from the perspective of human recognition capabilities. The generation of false positives on these object categories does not necessarily affect future performance measurements on other datasets, as it indicates good generalization capabilities. This perspective has been included in future work, as it indicates the possibility of learning an object model from two or more categories and creating an inheritance tree of sub-models to distinguish between objects that share a set of features but are from different categories.

Figure 9. Comparison between two object categories that generated the largest number of false positives (only matching regions are shown).

VII. CONCLUSIONS

We presented a novel approach for reliable object recognition that filters local features to remove irrelevant features and reduce computational complexity, and that utilizes 3D models created from the filtered SIFT feature space to achieve viewpoint independence. Feature filtering allows for the detection of features that can reliably describe an object throughout the overall feature space and can eliminate much of the background clutter. Features that are generated by background clutter or are of low quality can, over large datasets, generate false positives and reduce the certainty of a match. Experimental results show that the proposed filtering strategy is very effective in identifying reliable features of an object and in further improving both the performance of the object recognition system and its computational efficiency. Constructing a 3D space for the object model where features are mapped accurately onto it introduces the possibility of

ACKNOWLEDGEMENT

The work presented in this paper is partially supported by ARC and Hong Kong Polytechnic University grants.

REFERENCES

[1] D. G. Lowe, “Distinctive image features from scale-invariant keypoints,” Int. J. Comput. Vision, vol. 60, no. 2, pp. 91–110, 2004.
[2] A. Suga, K. Fukuda, T. Takiguchi, and Y. Ariki, “Object recognition and segmentation using SIFT and graph cuts,” in 19th International Conference on Pattern Recognition, Dec. 2008, pp. 1–4.
[3] F. Liu and M. Gleicher, “Region enhanced scale-invariant saliency detection,” IEEE International Conference on Multimedia and Expo, vol. 0, pp. 1477–1480, 2006.
[4] Y. Ke and R. Sukthankar, “PCA-SIFT: a more distinctive representation for local image descriptors,” in 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2, 2004, pp. 506–513.
[5] K. Mikolajczyk and C. Schmid, “A performance evaluation of local descriptors,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 27, no. 10, pp. 1615–1630, October 2005.
[6] Y. Cui, N. Hasler, T. Thormählen, and H.-P. Seidel, “Scale invariant feature transform with irregular orientation histogram binning,” in ICIAR ’09: Proceedings of the International Conference on Image Analysis and Recognition. Springer, 2009.
[7] Q. Fan, K. Barnard, A. Amir, A. Efrat, and M. Lin, “Matching slides to presentation videos using SIFT and scene background matching,” in MIR ’06: Proceedings of the 8th ACM international workshop on Multimedia information retrieval. New York, NY, USA: ACM, 2006, pp. 239–248.
[8] J. Chen and Y.-S. Moon, “Using SIFT features in palmprint authentication,” 19th International Conference on Pattern Recognition, ICPR 2008, Dec. 2008, pp. 1–4.
[9] H.-H. Jeon, A. Basso, and P. Driessen, “A global correspondence for scale invariant matching using mutual information and the graph search,” in IEEE International Conference on Multimedia and Expo, July 2006, pp. 1745–1748.
[10] S. Nene, S. K. Nayar, and H. Murase, “Columbia object image library (COIL-100),” Tech. Rep., 1996.
[11] F.-F. Li, F. Rod, and P. Pietro, “Learning generative visual models from few training examples: An incremental Bayesian approach tested on 101 object categories,” in CVPR ’04: Proceedings of the Computer Vision and Pattern Recognition Workshop, 2004, p. 178.
[12] Libsift - scale-invariant feature transform implementation. [Online]. Available: http://user.cs.tu-berlin.de/ nowozin/libsift/
[13] T. Pham and A. Smeulders, “Sparse representation for coarse and fine object recognition,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no. 4, pp. 555–567, April 2006.