
Feature description & Extraction

FAST (Features from Accelerated Segment Test)


FAST is an algorithm proposed for identifying interest points in an image. Interest points have high local
information content and should ideally be repeatable between different images [2]. The motivation behind
the FAST algorithm was to develop an interest point detector for use in real-time frame-rate
applications, such as SLAM on a mobile robot, which have limited computational resources.

The algorithm is as follows:


● Select a pixel "p" in the image with intensity Ip. This is the pixel to be classified as an interest point
or not.
● Set a threshold intensity value T.
● Consider a circle of 16 pixels surrounding the pixel p.
● For p to be detected as an interest point, N contiguous pixels out of the 16 must all be brighter
than Ip + T or all darker than Ip - T.
● To make the algorithm fast, first compare the intensities of pixels 1, 5, 9 and 13 of the circle with
Ip. At least three of these four pixels must satisfy the threshold criterion for an interest point to exist.
● If at least three of the four pixel intensities I1, I5, I9, I13 are neither above Ip + T nor below Ip - T,
then p is not an interest point (corner), and we reject the pixel p as a candidate. Otherwise, if at least
three of them are above Ip + T or below Ip - T, check all 16 pixels.
● Repeat the procedure for all the pixels in the image.
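The steps above can be sketched in plain Python. This is a minimal illustrative check for a single candidate pixel (the helper name and the N = 12 setting are assumptions, not the original implementation); `circle` holds the intensities of the 16 pixels on the circle around p, in order.

```python
def is_fast_corner(ip, circle, t, n=12):
    """Return True if n contiguous circle pixels are all brighter than
    ip + t or all darker than ip - t."""
    assert len(circle) == 16
    # High-speed test: pixels 1, 5, 9 and 13 (indices 0, 4, 8, 12).
    quick = [circle[i] for i in (0, 4, 8, 12)]
    brighter = sum(v > ip + t for v in quick)
    darker = sum(v < ip - t for v in quick)
    if brighter < 3 and darker < 3:
        return False  # cannot have n contiguous qualifying pixels
    # Full test: look for n contiguous qualifying pixels on the circle.
    doubled = circle + circle  # duplicate to handle wrap-around
    for sign in (+1, -1):      # +1: brighter arc, -1: darker arc
        run = 0
        for v in doubled:
            if sign * (v - ip) > t:
                run += 1
                if run >= n:
                    return True
            else:
                run = 0
    return False

# A bright arc of 12 contiguous pixels is detected as a corner.
print(is_fast_corner(100, [200] * 12 + [100] * 4, 20))  # True
print(is_fast_corner(100, [100] * 16, 20))              # False
```

Note that the three-of-four shortcut is only valid for large N (e.g. N = 12); for smaller N the full 16-pixel test must always run.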
Machine Learning Approach

● Select a set of images for training and run the FAST algorithm to detect the interest points.
● For every pixel "p", store the 16 pixels surrounding it as a vector, and repeat this for all pixels.
● This gives the vector P which contains all the data for training.
● Each value in the vector can take one of three states: darker than p, similar to p, or brighter than p.
● Depending on these states, the entire vector P is subdivided into three subsets: Pd, Ps, Pb.
● Define a variable Kp which is true if p is an interest point and false otherwise.
● Use the ID3 algorithm (a decision tree classifier) to query each subset using the variable Kp
for the knowledge about the true class.
● The ID3 algorithm works on the principle of entropy minimization: query the 16 pixels in
such a way that the true class (interest point or not) is found with the minimum number of
queries. In other words, select the pixel x which carries the most information about the
candidate pixel p.
● Recursively apply this entropy minimization to all three subsets.
● Terminate the process when the entropy of a subset is zero.
● The order of querying learned by the decision tree can then be used for faster detection
in other images as well.
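The entropy that ID3 minimizes at each split can be sketched as follows (an illustrative helper, not the original training code); a subset with entropy zero is pure and terminates the recursion.

```python
import math

def entropy(labels):
    """Shannon entropy of a list of boolean Kp labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)  # fraction of true interest points
    if p in (0.0, 1.0):
        return 0.0  # pure subset: stop splitting here
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

print(entropy([True, False, True, False]))  # 1.0 (maximally mixed)
print(entropy([True, True, True]))          # 0.0 (pure subset)
```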

Non-Maximal Suppression for removing adjacent corners


Detection of multiple interest points adjacent to one another is another problem of the initial
version of the algorithm. This can be dealt with by applying non-maximal suppression after detecting the
interest points. We compute a score function V for each of the detected points, defined as "the sum of
the absolute differences between the pixels in the contiguous arc and the centre pixel". We then
compare the V values of adjacent points and discard the one with the lower value.
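A simplified sketch of the score function and the pairwise suppression (hypothetical helper names; real implementations suppress over a 2-D neighbourhood, here reduced to 1-D positions for brevity):

```python
def score_v(ip, arc):
    """Sum of absolute differences between the contiguous-arc pixels
    and the centre pixel intensity ip."""
    return sum(abs(v - ip) for v in arc)

def suppress_adjacent(corners):
    """corners: list of (position, V). Keep a corner only if its V is
    not lower than that of any immediately adjacent corner."""
    kept = []
    for pos, v in corners:
        neighbours = [c for c in corners if abs(c[0] - pos) == 1]
        if all(v >= nv for _, nv in neighbours):
            kept.append((pos, v))
    return kept

print(score_v(100, [150, 160, 90]))                       # 120
print(suppress_adjacent([(10, 50), (11, 80), (13, 40)]))  # [(11, 80), (13, 40)]
```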

BRIEF (Binary Robust Independent Elementary Features)
BRIEF provides a shortcut to obtain binary strings directly, without first computing a full floating-point
descriptor. It takes a smoothened image patch and selects a set of (x, y) location pairs in a unique way
(explained in the paper). Pixel intensity comparisons are then done on these location pairs. For example,
let the first location pair be (p, q): if I(p) < I(q), the result is 1, else it is 0. This is applied to all the
location pairs to get an n-dimensional bitstring, where n can be 128, 256 or 512. Once we have this
bitstring, we can use the Hamming distance to match these descriptors.
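Hamming-distance matching reduces to an XOR and a popcount, which is why binary descriptors are so cheap to compare. A minimal sketch, with descriptors held as Python ints (illustrative values):

```python
def hamming(a, b):
    """Number of differing bits between two binary descriptors."""
    return bin(a ^ b).count("1")

d1 = 0b10110010
d2 = 0b10010110
print(hamming(d1, d2))  # 2
print(hamming(d1, d1))  # 0 (identical descriptors)
```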
BRIEF in OpenCV

SIFT (Scale-Invariant Feature Transform)


It is a technique for detecting salient, stable feature points in an image. For every such point it provides a
set of features which are invariant to rotation and scale.

There are four steps of the SIFT algorithm:


• Determine the approximate location and scale of salient feature points (also called keypoints).
• Refine their location and scale.
• Determine orientation(s) for each keypoint.
• Determine descriptors for each keypoint.

Approximate Location
The SIFT algorithm uses the Difference of Gaussians (DoG), which is an approximation of the Laplacian
of Gaussian (LoG). This process is done for different octaves of the image in the Gaussian pyramid. Once
the DoG is found, the images are searched for local extrema over scale and space. A local extremum is a
potential keypoint; it essentially means that the keypoint is best represented at that scale.

Keypoint Localization

Once potential keypoint locations are found, they have to be refined to get more accurate results. A
Taylor series expansion of the scale space is used to get a more accurate location of each extremum, and
if the intensity at the extremum is less than a threshold value (0.03 as per the paper), it is rejected. This
threshold is called contrastThreshold in OpenCV.

DoG has a higher response for edges, so edges also need to be removed. For this, a concept similar to the
Harris corner detector is used: a 2x2 Hessian matrix (H) is used to compute the principal curvatures. For
an edge, one principal curvature is much larger than the other, so the ratio of the squared trace to the
determinant of H is checked; if this ratio is greater than a threshold (edgeThreshold in OpenCV), the
keypoint is discarded. This eliminates low-contrast keypoints and edge keypoints, and what remains are
strong interest points.
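The edge check can be sketched as follows, using the trace-squared-over-determinant test from the SIFT paper (helper name and input values are illustrative; r = 10 is the threshold suggested in the paper):

```python
def is_edge(dxx, dyy, dxy, r=10.0):
    """Discard keypoints whose principal-curvature ratio exceeds r.
    dxx, dyy, dxy are second derivatives of the DoG image at the keypoint."""
    tr = dxx + dyy
    det = dxx * dyy - dxy * dxy
    if det <= 0:
        return True  # curvatures have opposite signs: reject
    # tr^2/det grows as the two curvatures become more unequal.
    return tr * tr / det >= (r + 1) ** 2 / r

print(is_edge(10.0, 10.0, 0.0))   # False: equal curvatures, corner-like
print(is_edge(100.0, 1.0, 0.0))   # True: one dominant curvature, edge-like
```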

Assigning orientations

Now an orientation is assigned to each keypoint to achieve invariance to image rotation. A neighbourhood
is taken around the keypoint location depending on the scale, and the gradient magnitude and direction are
calculated in that region. An orientation histogram with 36 bins covering 360 degrees is created. It is
weighted by the gradient magnitude and a Gaussian-weighted circular window with σ equal to 1.5 times
the scale of the keypoint. The highest peak in the histogram is taken, and any peak above 80% of it is also
considered when calculating the orientation. This creates keypoints with the same location and scale but
different directions, which contributes to the stability of matching.
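A simplified sketch of the histogram and peak selection (the Gaussian weighting is omitted for brevity; angles are in degrees and weights are gradient magnitudes, with hypothetical helper names):

```python
def orientation_histogram(angles, magnitudes, bins=36):
    """36-bin orientation histogram weighted by gradient magnitude."""
    hist = [0.0] * bins
    width = 360 // bins  # 10 degrees per bin
    for a, m in zip(angles, magnitudes):
        hist[int(a % 360 // width)] += m
    return hist

def dominant_orientations(hist, bins=36):
    """Peak bin plus any bin above 80% of the peak."""
    peak = max(hist)
    return [i * (360 // bins) for i, v in enumerate(hist) if v >= 0.8 * peak]

hist = orientation_histogram([5, 95, 97, 100], [1.0, 2.0, 2.0, 0.5])
print(dominant_orientations(hist))  # [90]
```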

Descriptors for each key-point


Now the keypoint descriptor is created. A 16x16 neighbourhood around the keypoint is taken and divided
into 16 sub-blocks of 4x4 size. For each sub-block, an 8-bin orientation histogram is created, giving a
total of 16 x 8 = 128 bin values. This is represented as a vector to form the keypoint descriptor. In
addition, several measures are taken to achieve robustness against illumination changes, rotation, etc.

Application: Matching SIFT descriptors

Keypoints between two images are matched by identifying their nearest neighbours. In some cases,
however, the second-closest match may be very near to the first, which can happen due to noise or other
reasons. In that case, the ratio of the closest distance to the second-closest distance is taken; if it is
greater than 0.8, the match is rejected. This eliminates around 90% of false matches while discarding
only 5% of correct matches.
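The ratio test above is trivial to sketch (distances are illustrative; 0.8 is the threshold from the paper):

```python
def ratio_test(d_closest, d_second, threshold=0.8):
    """Accept a match only if the closest distance is clearly smaller
    than the second-closest distance."""
    return d_closest / d_second < threshold

print(ratio_test(0.3, 0.9))  # True: unambiguous match, accepted
print(ratio_test(0.8, 0.9))  # False: ambiguous match, rejected
```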
SIFT in OpenCV

SURF (Speeded-Up Robust Features)

There are two stages in obtaining a SURF descriptor: first detecting a SURF point, and then extracting the
descriptor at that point. The detection of SURF points makes use of scale-space theory. For detection, a
Fast-Hessian matrix is used: the determinant of the Hessian matrix decides whether a point is chosen as
an interest point or not. In an image I, the Hessian matrix H(X, σ) at point X and scale σ is defined by:

H(X, σ) = | Lxx(X, σ)  Lxy(X, σ) |
          | Lxy(X, σ)  Lyy(X, σ) |

where Lxx(X, σ) is the convolution of the Gaussian second-order derivative with the image I at point X,
and similarly for Lxy and Lyy. The Gaussian second-order derivatives need to be discretized before
performing the convolution with the image; Dxx, Dyy and Dxy represent the convolutions of the
corresponding box filters with the image. These approximated second-order Gaussian derivative
calculations are made fast by using integral images.
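The integral-image trick can be sketched in plain Python (hypothetical helper names): after one pass over the image, the sum of any axis-aligned rectangle, and hence any box-filter response, costs only four lookups regardless of the filter size.

```python
def integral_image(img):
    """ii[y][x] = sum of img over the rectangle [0..y) x [0..x)."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row = 0
        for x in range(w):
            row += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row
    return ii

def box_sum(ii, y0, x0, y1, x1):
    """Sum of img[y0:y1][x0:x1] from four integral-image lookups."""
    return ii[y1][x1] - ii[y0][x1] - ii[y1][x0] + ii[y0][x0]

img = [[1, 2], [3, 4]]
ii = integral_image(img)
print(box_sum(ii, 0, 0, 2, 2))  # 10: sum of the whole image
print(box_sum(ii, 1, 0, 2, 2))  # 7: sum of the bottom row
```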

The scale space of the image is analyzed by changing the size of the box filter. The box filter begins
with a default size of 9x9, which corresponds to a Gaussian derivative with σ = 1.2. The filter is later
up-scaled to sizes of 15x15, 21x21, 27x27, etc. The approximated determinant of the Hessian matrix is
calculated at each scale, and non-maximum suppression in a 3x3x3 neighbourhood is applied to find the
maxima. The SURF point's location and scale s are obtained from these maxima.

Orientation and descriptor extraction


Orientation for the obtained SURF point is assigned using the Haar-wavelet response. In the
neighbourhood of the SURF point, i.e. within a radius of 6s, the Haar-wavelet response is calculated in
both the x and y directions. Using these responses, a dominant orientation direction is determined. In the
direction of the dominant orientation, a square of size 20s centred at the SURF point is constructed. This
is divided into 4x4 subregions. In each of these subregions, the horizontal and vertical Haar-wavelet
responses dx and dy are calculated at 5x5 regularly placed sample points. These responses are summed
up over each subregion to get Σdx and Σdy; the absolute values of the responses are also summed up,
giving Σ|dx| and Σ|dy|. Using these values, a 4-dimensional feature vector v = (Σdx, Σdy, Σ|dx|, Σ|dy|) is
constructed for each subregion. Thus, each extracted SURF point is associated with a 4x(4x4), i.e.
64-dimensional, descriptor, which is used for performing the matching operation.
ORB (Oriented FAST and Rotated BRIEF)
ORB is basically a fusion of the FAST keypoint detector and the BRIEF descriptor with many
modifications to enhance performance. First it uses FAST to find keypoints, then applies the
Harris corner measure to find the top N points among them. It also uses pyramids to
produce multiscale features.
Algorithm of ORB:
It computes the intensity-weighted centroid of the patch with the corner located at the centre. The direction of the
vector from this corner point to the centroid gives the orientation. To improve rotation invariance, the moments are
computed with x and y restricted to a circular region of radius r, where r is the size of the patch. For descriptors,
ORB uses BRIEF descriptors. BRIEF performs poorly under rotation, so ORB "steers" BRIEF according to the
orientation of the keypoints. For any feature set of n binary tests at locations (xi, yi), define a 2xn matrix S which
contains the coordinates of these pixels. Then, using the orientation of the patch θ, its rotation matrix Rθ is found
and used to rotate S to get the steered (rotated) version Sθ. However, once BRIEF is oriented along the keypoint
direction, its bit tests lose their high variance and mean near 0.5, becoming more distributed and thus less
discriminative. To recover these properties, ORB runs a greedy search among all possible binary tests to find the
ones that have both high variance and means close to 0.5, as well as being uncorrelated. The result is called
rBRIEF. For descriptor matching, multi-probe LSH, which improves on the traditional LSH, is used.
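The intensity-centroid orientation can be sketched as follows (simplified: the moments are taken over the whole square patch rather than a circular region, and the helper name is hypothetical):

```python
import math

def patch_orientation(patch):
    """Angle from the patch centre to its intensity-weighted centroid.
    Image moments: m00 = sum(I), m10 = sum(x*I), m01 = sum(y*I)."""
    m00 = m10 = m01 = 0.0
    for y, row in enumerate(patch):
        for x, v in enumerate(row):
            m00 += v
            m10 += x * v
            m01 += y * v
    cx, cy = m10 / m00, m01 / m00  # intensity-weighted centroid
    h, w = len(patch), len(patch[0])
    return math.degrees(math.atan2(cy - (h - 1) / 2, cx - (w - 1) / 2))

# A patch brighter on the right pulls the centroid right: angle ~ 0 degrees.
patch = [[0, 0, 9], [0, 0, 9], [0, 0, 9]]
print(round(patch_orientation(patch)))  # 0
```

Note that with image coordinates (y grows downward), a patch brighter at the top gives a negative angle.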

ORB in OpenCV:
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.405.9932&rep=rep1&type=pdf
http://www.willowgarage.com/sites/default/files/orb_final.pdf
https://pdfs.semanticscholar.org/2d86/64b7ef983cc5529cf10caf4cb623098724de.pdf?_ga=2.166235966.1645444858.1586974232-375200504.1586460489
http://homepages.inf.ed.ac.uk/rbf/CVonline/LOCAL_COPIES/AV1011/AV1FeaturefromAcceleratedSegmentTest.pdf
http://mi.eng.cam.ac.uk/~er258/work/fast.html
http://mi.eng.cam.ac.uk/~cipolla/publications/inproceedings/2010-BMVC-action.pdf
https://www.cs.ubc.ca/~lowe/525/papers/calonder_eccv10.pdf

https://www.epfl.ch/labs/cvlab/research/descriptors-and-keypoints/research-detect-brief/

https://link.springer.com/chapter/10.1007/978-3-642-15561-1_56

https://www.isprs.org/proceedings/xxxviii/part3/b/pdf/7_XXXVIII-part3B.pdf

https://www.cse.iitb.ac.in/~ajitvr/CS763/SIFT.pdf

http://www.cse.iitm.ac.in/~vplab/courses/CV_DIP/PDF/Feature.pdf

http://people.ee.ethz.ch/~surf/eccv06.pdf
http://www.cs.jhu.edu/~misha/ReadingSeminar/Papers/Bay08.pdf
http://www.cim.mcgill.ca/~siddiqi/COMP-558-2008/AnqiGaurav.pdf
https://www.ijser.org/researchpaper/Implementation-of-High-Performance-Speeded-Up-Robust-features-Detection.pdf
