
Object recognition using improved image feature extraction

and matching method


Djalalov M.M. (SUE UNICON.UZ), Radjabov T.D. (TUIT)
In this paper, an improvement of the SIFT algorithm for image feature extraction and
matching is presented. We propose a new method for the image matching step of the SIFT
algorithm which is slightly faster for object recognition. For evaluation, a series of simulations
and experiments was conducted, in which our Improved-SIFT algorithm showed better
results.


1. Introduction
Image matching is a fundamental
aspect of many problems in computer
vision, including object or scene recognition,
solving for 3D structure from multiple
images, stereo correspondence, and motion
tracking. This paper describes image
features that have many properties that
make them suitable for matching differing
images of an object or scene. The features
are invariant to image scaling and rotation,
and partially invariant to change in
illumination and 3D camera viewpoint. They
are well localized in both the spatial and
frequency domains, reducing the probability
of disruption by occlusion, clutter, or noise.
Large numbers of features can be extracted
from typical images with efficient algorithms.
In addition, the features are highly
distinctive, which allows a single feature to
be correctly matched with high probability
against a large database of features,
providing a basis for object and scene
recognition [1-3].

2. Original Scale Invariant Feature Transform (SIFT) algorithm
Scale Invariant Feature Transform
(SIFT) was first presented by Lowe [4]. The
SIFT algorithm takes an image and
transforms it into a collection of local feature
vectors. Each of these feature vectors is
supposed to be distinctive and invariant to
any scaling, rotation or translation of the
image.
First, the feature locations are
determined as the local extrema of the
Difference-of-Gaussians (DoG) pyramid, as
given by (3). To build the DoG pyramid, the
input image is convolved iteratively with a
Gaussian kernel (2). This procedure is
repeated as long as down-sampling is
possible. Each collection of images of the
same size is called an octave. Together, all
octaves build the so-called Gaussian
pyramid (1), which is represented by a
3D function L(x, y, σ):
L(x, y, σ) = G(x, y, σ) ∗ I(x, y),  (1)

G(x, y, σ) = (1 / (2πσ²)) e^(−(x² + y²) / (2σ²)),  (2)

D(x, y, σ) = (G(x, y, kσ) − G(x, y, σ)) ∗ I(x, y) = L(x, y, kσ) − L(x, y, σ).  (3)
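As a rough illustration of (1)–(3), the following minimal Python/NumPy sketch builds one octave of the Gaussian and DoG pyramids. It is not the authors' Matlab implementation; the number of scales per octave and the value of k are illustrative assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def dog_octave(image, sigma=1.6, num_scales=5):
    """Build one octave of blurred images and their DoG differences.

    sigma and num_scales are illustrative choices, not values from the paper.
    """
    k = 2 ** (1.0 / (num_scales - 2))  # scale multiplier between levels (assumption)
    # L(x, y, sigma) = G(x, y, sigma) * I(x, y) -- equation (1)
    gaussians = [gaussian_filter(image.astype(float), sigma * k ** i)
                 for i in range(num_scales)]
    # D(x, y, sigma) = L(x, y, k*sigma) - L(x, y, sigma) -- equation (3)
    dogs = [gaussians[i + 1] - gaussians[i] for i in range(num_scales - 1)]
    return gaussians, dogs
```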

The local extrema (maxima or minima) of the DoG function are detected by comparing each
pixel with its 26 neighbours in the scale space (8 neighbours in the same scale, 9 corresponding
neighbours in the scale above and 9 in the scale below). The search for extrema excludes
the first and the last image in each octave, because they do not have a scale above and a scale
below, respectively. To increase the number of extracted features, the input image is doubled
before it is processed by the SIFT algorithm, which however increases the computational time
significantly.
Scale-space extrema detection produces too many keypoint candidates, some of which
are unstable. The next step in the algorithm is to perform a detailed fit to the nearby data for
accurate location, scale, and ratio of principal curvatures. This information allows points to be
rejected that have low contrast (and are therefore sensitive to noise) or are poorly localized
along an edge.
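A minimal sketch of the 26-neighbour test described above, written for clarity rather than speed (the level index must stay away from the first and last DoG images, matching the exclusion noted earlier):

```python
import numpy as np

def is_extremum(dogs, i, y, x):
    """dogs: list of same-size DoG images; requires 1 <= i <= len(dogs) - 2
    and an interior pixel (y, x), matching the exclusions in the text."""
    value = dogs[i][y, x]
    # 3x3 patches from the scale below, the same scale, and the scale above:
    # 27 values including the candidate itself, i.e. its 26 neighbours.
    cube = np.stack([d[y - 1:y + 2, x - 1:x + 2] for d in dogs[i - 1:i + 2]])
    return value == cube.max() or value == cube.min()
```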

Figure 1. Diagram showing the blurred images at different scales, and the computation of the
difference-of-Gaussian images.
For each candidate keypoint, interpolation of nearby data is used to accurately determine
its position. The initial approach was simply to locate each keypoint at the location and scale of
the candidate keypoint. The new approach calculates the interpolated location of the extremum,
which substantially improves matching and stability. The interpolation is done using the quadratic
Taylor expansion of the Difference-of-Gaussian scale-space function D(x, y, σ), with the
candidate keypoint as the origin. This Taylor expansion is given by (4):

D(x) = D + (∂D/∂x)ᵀ x + (1/2) xᵀ (∂²D/∂x²) x,  (4)

where D and its derivatives are evaluated at the candidate keypoint and x = (x, y, σ)ᵀ is the offset
from this point.
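Setting the derivative of (4) to zero gives the offset of the interpolated extremum, x̂ = −(∂²D/∂x²)⁻¹ (∂D/∂x). A minimal sketch, assuming the gradient and Hessian are estimated by central finite differences over the (scale, y, x) cube:

```python
import numpy as np

def refine_offset(D, i, y, x):
    """D: 3D array of DoG values indexed as D[scale, y, x].
    Returns the sub-pixel offset (dx, dy, dsigma) of the fitted extremum."""
    # Gradient of D by central differences
    g = 0.5 * np.array([D[i, y, x + 1] - D[i, y, x - 1],
                        D[i, y + 1, x] - D[i, y - 1, x],
                        D[i + 1, y, x] - D[i - 1, y, x]])
    # Hessian of D by central differences
    dxx = D[i, y, x + 1] - 2 * D[i, y, x] + D[i, y, x - 1]
    dyy = D[i, y + 1, x] - 2 * D[i, y, x] + D[i, y - 1, x]
    dss = D[i + 1, y, x] - 2 * D[i, y, x] + D[i - 1, y, x]
    dxy = 0.25 * (D[i, y + 1, x + 1] - D[i, y + 1, x - 1]
                  - D[i, y - 1, x + 1] + D[i, y - 1, x - 1])
    dxs = 0.25 * (D[i + 1, y, x + 1] - D[i + 1, y, x - 1]
                  - D[i - 1, y, x + 1] + D[i - 1, y, x - 1])
    dys = 0.25 * (D[i + 1, y + 1, x] - D[i + 1, y - 1, x]
                  - D[i - 1, y + 1, x] + D[i - 1, y - 1, x])
    H = np.array([[dxx, dxy, dxs],
                  [dxy, dyy, dys],
                  [dxs, dys, dss]])
    return -np.linalg.solve(H, g)  # x_hat = -H^{-1} g
```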
In the next step, each keypoint is assigned one or more orientations based on local image
gradient directions. This is the key step in achieving invariance to rotation, as the keypoint
descriptor can be represented relative to this orientation and therefore achieve invariance to
image rotation. First, the Gaussian-smoothed image L(x, y, σ) at the keypoint's scale is taken,
so that all computations are performed in a scale-invariant manner. For an image sample L(x, y)
at scale σ, the gradient magnitude, m(x, y), and orientation, θ(x, y), are precomputed using pixel
differences as in (5) and (6):
m(x, y) = √((L(x+1, y) − L(x−1, y))² + (L(x, y+1) − L(x, y−1))²),  (5)

θ(x, y) = tan⁻¹((L(x, y+1) − L(x, y−1)) / (L(x+1, y) − L(x−1, y))).  (6)
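A minimal sketch of (5) and (6) on the blurred image L; np.arctan2 is used in place of a bare tan⁻¹ so the orientation covers the full circle, which is an implementation choice on our part:

```python
import numpy as np

def gradient_mag_ori(L, y, x):
    """L: 2D Gaussian-blurred image at the keypoint's scale; (y, x) interior."""
    dx = float(L[y, x + 1]) - float(L[y, x - 1])  # L(x+1, y) - L(x-1, y)
    dy = float(L[y + 1, x]) - float(L[y - 1, x])  # L(x, y+1) - L(x, y-1)
    m = np.hypot(dx, dy)          # equation (5)
    theta = np.arctan2(dy, dx)    # equation (6), quadrant-aware arctangent
    return m, theta
```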

Previous steps found keypoint locations at particular scales and assigned orientations to
them. This ensured invariance to image location, scale and rotation. Now we need to compute a
descriptor vector for each keypoint such that the descriptor is highly distinctive and partially
invariant to the remaining variations such as illumination, 3D viewpoint, etc.
3. Improved SIFT algorithm
From the algorithm description given above, it is evident that, in general, the SIFT
algorithm can be understood as a local image operator which takes an input image and
transforms it into a collection of local features.
Feature matching between the SIFT descriptors of two images involves
computing the Euclidean distance between each descriptor of the first image and each
descriptor of the second image [5]. According to the nearest-neighbour procedure, for
each feature aᵢ in the model image feature set, the corresponding feature bᵢ is looked
for in the test image feature set. The corresponding feature is the one with the smallest
Euclidean distance to aᵢ. A pair of corresponding features (aᵢ, bᵢ) is called a match M(aᵢ, bᵢ) [6].
If the ratio of the nearest neighbour's Euclidean distance to the second nearest
neighbour's Euclidean distance exceeds a predefined threshold, the matched features
are discarded.
Euclidean distance here means the distance between keypoints in feature space, given by (7).
All keypoints (features) from the two images are transformed into a multi-dimensional space
based on their gradients, orientations, magnitudes, locations, brightness, etc. Each feature
in feature space is represented by a feature vector:

D(a, b) = √((a₁ − b₁)² + (a₂ − b₂)² + … + (aₙ − bₙ)²) = √(Σᵢ₌₁ⁿ (aᵢ − bᵢ)²),  (7)

where D(a, b) stands for the Euclidean distance between feature vectors a and b;
matched points are eliminated if D(a, b) is larger than the set threshold.
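For illustration, a minimal sketch of this baseline matching step, assuming the descriptors are rows of 2D NumPy arrays; the threshold value is an illustrative assumption, not the paper's setting:

```python
import numpy as np

def match_euclidean(desc_a, desc_b, threshold=0.6):
    """Nearest-neighbour matching by the Euclidean distance of equation (7)."""
    matches = []
    for i, a in enumerate(desc_a):
        d = np.sqrt(((desc_b - a) ** 2).sum(axis=1))  # D(a, b) for every b
        j = int(np.argmin(d))                         # nearest neighbour
        if d[j] < threshold:                          # discard distant pairs
            matches.append((i, j))
    return matches
```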
Calculating the distance between all feature points is computationally expensive. We
suggest instead computing the dot products of the feature vectors (8), which is faster and
more robust than computing distances: the distances between features can be similar, so
mismatching may occur, whereas the angles between the vectors differ. The dot product is
computed and the inverse cosine is taken between the feature vectors, as in (9):
a · b = Σᵢ₌₁ⁿ aᵢbᵢ = a₁b₁ + a₂b₂ + … + aₙbₙ,  (8)

θ = arccos((a · b) / (|a| |b|)).  (9)

Then we check whether the angle to the nearest neighbour is less than a predefined ratio:

θ < pred.ratio.  (10)

In the previous SIFT, only the nearest-neighbour distance is compared with the other
distances and the smallest value is taken. In Improved-SIFT we compare the angles between
feature vectors. We also use a distance ratio for outlier rejection to reduce false matching,
taking only positive values:

A₁ / A₂ < DisRatio.  (11)
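A minimal sketch of the matching variant as we read (8)–(11): angles between normalized feature vectors replace Euclidean distances, and the ratio of the nearest to the second-nearest angle rejects outliers. Treating A₁ and A₂ in (11) as those two angles, and the dis_ratio value, are our assumptions:

```python
import numpy as np

def match_by_angle(desc_a, desc_b, dis_ratio=0.8):
    """Angle-based matching with ratio-test outlier rejection."""
    # Normalise rows so each dot product is the cosine of an angle -- eqs (8), (9)
    a = desc_a / np.linalg.norm(desc_a, axis=1, keepdims=True)
    b = desc_b / np.linalg.norm(desc_b, axis=1, keepdims=True)
    angles = np.arccos(np.clip(a @ b.T, -1.0, 1.0))  # all pairwise angles
    matches = []
    for i, row in enumerate(angles):
        order = np.argsort(row)
        a1, a2 = row[order[0]], row[order[1]]  # nearest / second-nearest angle
        if a1 / a2 < dis_ratio:                # outlier rejection, eq (11) (assumed form)
            matches.append((i, int(order[0])))
    return matches
```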

4. Simulation results
In order to verify whether our proposed method works, we conducted a series of
simulations for object recognition using the template matching method. We ran the matching
simulations under different conditions, with images scaled, rotated, shifted, etc. The
simulations were done in Matlab on a PC with a Pentium Dual processor running at 2.20 GHz.
The algorithms were tested on 20 images in different cases. We modified the original SIFT
matching Matlab code into Improved-SIFT with the dot product and outlier rejection steps.
Figure 2 illustrates the results for the scaled image matching case.


Figure 2. a) Previous SIFT; b) Improved-SIFT; c) After outlier rejection. Simulation results for
matching two images, where the 2nd image is a scaled version of the 1st.
In the first image (Fig. 2a) the previous SIFT result is shown, with many mismatches.
In the second image (Fig. 2b) our Improved-SIFT result is shown, with fewer mismatches
than the previous one, because we calculate the dot product. The third image (Fig. 2c) is the
result of applying outlier rejection to our Improved-SIFT, where all mismatches are removed
and only the correct matching points are displayed.
Figure 3 illustrates the results for the scaled and rotated image matching case.


Figure 3. a) Previous SIFT; b) Improved-SIFT; c) After outlier rejection. Simulation results for
matching two images, where the 2nd image is a rotated and scaled version of the 1st.
In the scaled and rotated case the results were the same as before: our proposed
method gives better matching results than the existing SIFT.
5. Conclusion
SIFT is a well-known algorithm for image feature extraction, but in image matching it
sometimes does not work well. In this paper we proposed an Improved-SIFT feature
extraction and matching method for the object recognition concept. As seen in the
simulation results, our method recognizes and matches images more accurately than
the previous SIFT. In the future we will apply our Improved-SIFT method in many other
fields, such as human face and facial expression recognition, panoramic image stitching, etc.
References
1. Liang Cheng, Jianya Gong, Xiaoxia Yang, "Robust Affine Invariant Feature Extraction for Image Matching," IEEE Geoscience and Remote Sensing Letters, April 2008.
2. Liang Cheng, "A new method for remote sensing image matching by integrating affine invariant feature extraction and RANSAC," Image and Signal Processing (CISP), 2010 3rd International Congress, pp. 1605-1609, 2010.
3. Madzin, H., Zainuddin, R., "Feature Extraction and Image Matching of 3D Lung Cancer Cell Image," Soft Computing and Pattern Recognition (SOCPAR '09), International Conference, 2009.
4. David G. Lowe, "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision, 60, 2 (2004), pp. 91-110.
5. R. Jiayuan, W. Yigang, D. Yun, "Study on eliminating wrong match pairs of SIFT," Signal Processing (ICSP), 2010 IEEE 10th International Conference, pp. 992-995, 2010.
6. Omercevic, D., Drbohlav, O., Leonardis, A., "High-Dimensional Feature Matching: Employing the Concept of Meaningful Nearest Neighbors," Computer Vision (ICCV 2007), IEEE 11th International Conference, pp. 1-8, Oct. 2007.

It is advisable to read:
An Introduction to Object Recognition: Selected Algorithms for a Wide Variety of Applications
Author: Marco Treiber
Publisher: Springer, 2010
The book presents an overview of the diverse applications for object recognition (OR) and highlights important algorithm classes, presenting representative example algorithms for each class. The presentation of each algorithm describes the basic algorithm flow in detail, complete with graphical illustrations. Pseudocode implementations are also included for many of the methods, and definitions are supplied for terms which may be unfamiliar to the novice reader. Supporting a clear and intuitive tutorial style, the usage of mathematics is kept to a minimum.
