DLCV Day2
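1. Bag-of-Words (BoW)
The Bag-of-Words model represents an image as a histogram of visual words, discarding the
spatial arrangement of its local features.
Detailed Process: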
1. Feature Extraction:
o Detect and describe features: Use feature detectors like SIFT (Scale-
Invariant Feature Transform), SURF (Speeded-Up Robust Features), ORB
(Oriented FAST and Rotated BRIEF), or others to detect key points and
extract local feature descriptors from images. These descriptors capture
important information about local image patches.
o Descriptors: Each keypoint is described by a vector whose length depends on the
algorithm used (e.g., 128 values for SIFT, 64 for standard SURF, 32 bytes for ORB).
2. Codebook Generation:
o Clustering: Collect feature descriptors from a set of training images and
cluster them using a clustering algorithm such as k-means. Each cluster center
is considered a "visual word."
o Vocabulary: The set of all cluster centers forms the vocabulary or codebook,
where each cluster center represents a visual word in the dictionary.
3. Image Representation:
o Histogram Construction: For a given image, assign each feature descriptor to
the nearest visual word in the codebook. Construct a histogram of visual
words where each bin corresponds to a visual word and its value represents the
frequency of that word in the image.
4. Matching:
o Distance Metrics: Compare the histograms of different images using distance
metrics like Euclidean distance, Manhattan distance, or chi-squared distance to
find similar images.
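The four steps above can be sketched end to end with NumPy alone. This is a minimal illustration, not a reference implementation: random vectors stand in for real SIFT/SURF/ORB descriptors, and the helper names (kmeans, bow_histogram, chi2_distance) and parameter values are assumptions made for the example.

```python
import numpy as np

def kmeans(descriptors, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns the (k, d) array of cluster centers."""
    rng = np.random.default_rng(seed)
    centers = descriptors[rng.choice(len(descriptors), size=k, replace=False)].copy()
    for _ in range(iters):
        # Squared distance from every descriptor to every center: shape (n, k).
        d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = descriptors[labels == j]
            if len(members):  # an empty cluster keeps its previous center
                centers[j] = members.mean(axis=0)
    return centers

def bow_histogram(descriptors, codebook):
    """Assign each descriptor to its nearest visual word and count the words."""
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    # Normalize so images with different numbers of features are comparable.
    return hist / hist.sum()

def chi2_distance(h1, h2, eps=1e-10):
    """Chi-squared distance between two histograms (0 means identical)."""
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))

# Synthetic 8-dimensional "descriptors" (an assumption for the demo).
rng = np.random.default_rng(0)
train = rng.normal(size=(500, 8))
codebook = kmeans(train, k=16)                      # the visual vocabulary
hist_a = bow_histogram(rng.normal(size=(60, 8)), codebook)
hist_b = bow_histogram(rng.normal(size=(80, 8)), codebook)
print(chi2_distance(hist_a, hist_b))
```

Normalizing each histogram to sum to 1 is a design choice of this sketch; raw counts or TF-IDF weighting are also used in practice.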
Applications:
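2. VLAD (Vector of Locally Aggregated Descriptors)
VLAD improves upon the Bag-of-Words model by aggregating, for each visual word, the
residuals between the local descriptors and that word. It records how descriptors are
distributed around each cluster center rather than only how many fall there, which produces
a more discriminative representation. Like Bag-of-Words, it does not encode the spatial
layout of the features.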
Detailed Process:
1. Feature Extraction:
o Detect and describe features: Extract local features from images using
methods like SIFT, SURF, or ORB.
2. Codebook Generation:
o Clustering: Use k-means clustering to generate a set of cluster centers (visual
words) from a large set of training feature descriptors.
3. Residual Calculation:
o Assignment: For each feature descriptor in an image, find the nearest cluster
center (visual word).
o Residuals: Compute the residual vector, which is the difference between the
feature descriptor and its assigned cluster center.
4. Aggregation:
o Aggregate Residuals: For each cluster center, sum the residuals of all
descriptors assigned to it.
o Concatenation: Concatenate these summed residuals to form the VLAD
descriptor.
5. Normalization:
o Normalization: Normalize the aggregated vector so that the descriptor is less
sensitive to the number of features and to a few frequently repeated features.
A common choice is power normalization (signed square root of each component)
followed by L2 normalization.
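Steps 3 to 5 can be sketched as a single function. This is a minimal illustration under stated assumptions: the cluster centers are taken as given (for example from k-means), random vectors stand in for real descriptors, and the name vlad_encode and the exponent alpha=0.5 are choices made for this example.

```python
import numpy as np

def vlad_encode(descriptors, centers, alpha=0.5):
    """Return the normalized VLAD vector of length k * d."""
    k, d = centers.shape
    # Assignment: index of the nearest cluster center for each descriptor.
    d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)
    # Aggregation: sum the residuals (descriptor - center) per cluster.
    vlad = np.zeros((k, d))
    for j in range(k):
        assigned = descriptors[nearest == j]
        if len(assigned):
            vlad[j] = (assigned - centers[j]).sum(axis=0)
    vlad = vlad.ravel()                              # concatenation
    vlad = np.sign(vlad) * np.abs(vlad) ** alpha     # power normalization
    norm = np.linalg.norm(vlad)
    return vlad / norm if norm > 0 else vlad         # L2 normalization

# Synthetic data: 4 visual words and 50 descriptors of dimension 8.
rng = np.random.default_rng(0)
centers = rng.normal(size=(4, 8))
descriptors = rng.normal(size=(50, 8))
v = vlad_encode(descriptors, centers)
print(v.shape, np.linalg.norm(v))
```

The descriptor length is k times d, so with 128-dimensional SIFT descriptors even a small codebook yields a long vector; this is why small values of k are typical for VLAD.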
Advantages:
Applications:
3. RANSAC (Random Sample Consensus)
RANSAC is an iterative method for robustly fitting models to data sets that contain a
significant number of outliers.
Detailed Process:
1. Model Hypothesis:
o Random Sampling: Randomly select a minimal subset of data points needed
to estimate the model parameters (e.g., two points for a line, four point
correspondences for a homography).
o Model Estimation: Fit the model to these points.
2. Model Verification:
o Inliers Determination: Determine which data points from the entire set are
consistent with the estimated model within a predefined tolerance.
o Inlier Count: Count the number of inliers.
3. Iteration:
o Repeat: Repeat the process of random sampling and model estimation for a
fixed number of iterations or until a satisfactory model with a high number of
inliers is found.
4. Final Model:
o Best Model: Select the model with the highest number of inliers.
o Refinement: Optionally, refine the model parameters using all inliers to
improve accuracy.
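The loop above can be sketched for the simplest case, fitting a 2D line. This is an illustrative sketch, not a definitive implementation: the function name ransac_line, the tolerance, the iteration count, and the synthetic data are all assumptions, and the final least-squares refinement assumes the line is not vertical.

```python
import numpy as np

def ransac_line(points, n_iters=200, tol=0.1, seed=0):
    """Fit y = m*x + c to 2D points containing outliers."""
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(points), dtype=bool)
    for _ in range(n_iters):
        # Model hypothesis: two distinct points define a line.
        i, j = rng.choice(len(points), size=2, replace=False)
        p, q = points[i], points[j]
        direction = q - p
        length = np.linalg.norm(direction)
        if length == 0:
            continue
        # Verification: perpendicular distance of every point to the line.
        normal = np.array([-direction[1], direction[0]]) / length
        dist = np.abs((points - p) @ normal)
        inliers = dist < tol
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # Refinement: least-squares fit using all inliers of the best model.
    m, c = np.polyfit(points[best_inliers, 0], points[best_inliers, 1], 1)
    return m, c, best_inliers

# 100 noisy points on y = 2x + 1 plus 40 uniformly scattered outliers.
rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=100)
line_pts = np.column_stack([x, 2.0 * x + 1.0 + rng.normal(scale=0.02, size=100)])
outliers = np.column_stack([rng.uniform(0, 10, size=40),
                            rng.uniform(-10, 30, size=40)])
points = np.vstack([line_pts, outliers])
m, c, inliers = ransac_line(points)
print(m, c, inliers.sum())
```

A fixed iteration count keeps the sketch short. Practical implementations often derive the number of iterations from the expected inlier ratio and a desired success probability.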
Applications:
• Homography Estimation: Estimate the homography matrix for aligning two images.
• Fundamental Matrix Estimation: Estimate the fundamental matrix in stereo vision.
• Object Recognition and Pose Estimation: Fit geometric models to data in the
presence of noise and outliers.
4. Hough Transform
The Hough Transform is a technique for detecting geometric shapes such as lines, circles, and
ellipses in an image by transforming the problem into a parameter space.
Detailed Process:
1. Edge Detection:
o Detect Edges: Use an edge detection algorithm like Canny edge detection to
find edges in the image.
2. Hough Space Transformation:
o Parameter Space: Map each edge point to the set of points in parameter space
that describe shapes passing through it. Each point in parameter space
represents one candidate shape.
o Lines: For line detection, use the polar representation (ρ, θ) where ρ is the
distance from the origin to the line, and θ is the angle of the normal to the line,
so that every point (x, y) on the line satisfies ρ = x cos θ + y sin θ.
o Circles: For circle detection, use parameters (a, b, r) where (a, b) is the center
of the circle, and r is the radius.
3. Voting:
o Accumulator Array: Use an accumulator array to keep track of votes in
parameter space. Each edge point votes for all possible shapes that could pass
through it.
o Peak Detection: Identify peaks in the accumulator array, which correspond to
the parameters of the detected shapes.
4. Shape Detection:
o Extract Shapes: Extract the parameters of the shapes corresponding to the
peaks in the accumulator array.
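The voting scheme for lines can be sketched with a NumPy accumulator. This is a minimal illustration under assumptions: the edge map is a synthetic binary image rather than the output of a Canny detector, the function name hough_lines is hypothetical, and ρ is quantized to 1 pixel and θ to 1 degree.

```python
import numpy as np

def hough_lines(edges, n_theta=180):
    """Vote in (rho, theta) space for every non-zero pixel of a binary edge map."""
    h, w = edges.shape
    diag = int(np.ceil(np.hypot(h, w)))               # largest possible |rho|
    thetas = np.deg2rad(np.arange(n_theta) * 180.0 / n_theta)   # [0, pi)
    rhos = np.arange(-diag, diag + 1)
    acc = np.zeros((len(rhos), n_theta), dtype=np.int64)
    ys, xs = np.nonzero(edges)
    for x, y in zip(xs, ys):
        # One vote per theta: rho = x cos(theta) + y sin(theta).
        rho = np.round(x * np.cos(thetas) + y * np.sin(thetas)).astype(int)
        acc[rho + diag, np.arange(n_theta)] += 1
    return acc, rhos, thetas

# Synthetic edge map containing a single horizontal line at y = 20.
edges = np.zeros((50, 50), dtype=bool)
edges[20, :] = True
acc, rhos, thetas = hough_lines(edges)

# Peak detection: the strongest cell gives the line parameters.
r_idx, t_idx = np.unravel_index(acc.argmax(), acc.shape)
print(rhos[r_idx], np.rad2deg(thetas[t_idx]), acc.max())
```

For the horizontal line in this demo, the normal points along the y axis, so the peak is expected at θ = 90 degrees with ρ equal to the row index of the line.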
Applications:
• Line Detection: Detect lines in images, useful in applications like lane detection in
autonomous driving.
• Circle Detection: Detect circles, useful in identifying circular objects like coins or
eyes in images.
• General Shape Detection: Detect other geometric shapes by extending the Hough
Transform to different parameter spaces.
5. Pyramid Matching
Pyramid Matching is a technique for matching sets of features by comparing them at multiple
resolutions, making it robust to variations in scale and translation.
Detailed Process:
1. Feature Extraction:
o Detect and describe features: Extract features from images using methods
like SIFT, SURF, or ORB.
2. Pyramid Construction:
o Multi-Resolution Pyramid: Construct a pyramid of feature representations at
multiple resolutions by progressively down-sampling the image and extracting
features at each level.
3. Matching:
o Coarse-to-Fine Matching: Begin by matching features at the coarsest level of
the pyramid. Use these matches to constrain and refine the matching process at
finer levels.
o Feature Correspondence: Determine correspondences between features at
each level, ensuring that matches are consistent across levels.
4. Scoring:
o Similarity Score: Compute a similarity score based on the number and quality
of matches at each pyramid level.
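One common formulation of pyramid matching scores two feature sets with histogram intersections over progressively finer bins in feature space, rather than by down-sampling the image. The sketch below follows that histogram-based variant, so it is a swapped-in illustration of the multi-resolution scoring idea and not the exact procedure listed above. For brevity it assumes 1-D feature values in [0, 1); the function name pyramid_match and the weights are assumptions of this example.

```python
import numpy as np

def pyramid_match(x, y, levels=3):
    """Weighted count of matches between two sets of values in [0, 1).

    Level 0 is the coarsest (one bin); level `levels` is the finest.
    Matches that first appear at a coarser level receive a smaller weight.
    """
    intersections = []
    for level in range(levels + 1):
        bins = 2 ** level
        # Quantize each value to its bin index at this resolution.
        ix = np.minimum((np.asarray(x) * bins).astype(int), bins - 1)
        iy = np.minimum((np.asarray(y) * bins).astype(int), bins - 1)
        hx = np.bincount(ix, minlength=bins)
        hy = np.bincount(iy, minlength=bins)
        # Histogram intersection = number of features matched at this level.
        intersections.append(np.minimum(hx, hy).sum())
    # Matches found at the finest level count fully.
    score = float(intersections[levels])
    for level in range(levels):
        # Matches that are new at this coarser level, down-weighted by 2^-(L - level).
        new_matches = intersections[level] - intersections[level + 1]
        score += new_matches / 2 ** (levels - level)
    return score

a = np.array([0.10, 0.35, 0.80])
print(pyramid_match(a, a))                                    # identical sets
print(pyramid_match(np.array([0.1]), np.array([0.9]), levels=2))
```

Identical sets match entirely at the finest level, so their score equals the number of features. Two features that share a bin only at the coarsest level contribute a small fraction of a match.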
Advantages:
Applications:
6. Optical Flow
Optical Flow refers to the apparent motion of objects in a visual scene caused by the relative
motion between the observer and the scene. Optical flow estimation computes a motion vector
for each pixel (dense flow) or for selected keypoints (sparse flow) between consecutive
frames of an image sequence.
Detailed Process:
1. Feature Detection:
o Interest Points: Detect points of interest (keypoints) in the image using
methods like Harris corner detection or FAST.
2. Flow Computation:
o Local Methods: Use local methods like the Lucas-Kanade method, which
assumes small motion and a constant flow within a local window, and
estimates the motion vector of that window by least squares.
o Global Methods: Use global methods like the Horn-Schunck method, which
assumes smoothness of the flow field and uses global optimization to estimate
the motion vectors.
3. Flow Field Representation:
o Velocity Vectors: Represent the motion as a flow field, where each pixel in
the image is associated with a velocity vector indicating its movement
between frames.
Algorithms:
• Lucas-Kanade Method:
o Assumes small displacements and a constant flow within each local neighborhood.
o Solves for the optical flow by least-squares minimization of the
brightness-constancy error over that neighborhood.
• Horn-Schunck Method:
o Assumes smoothness of the flow field.
o Uses a global optimization approach to minimize both the error in image
intensity and the smoothness constraint.
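The Lucas-Kanade idea can be sketched for a single window. Under brightness constancy and small motion, each pixel in the window gives one linear equation Ix*u + Iy*v = -It, and the flow (u, v) is the least-squares solution. This is a single-scale, single-window illustration with a synthetic image pair; the function name lucas_kanade_window, the window size, and the test pattern are assumptions of this example.

```python
import numpy as np

def lucas_kanade_window(frame1, frame2, cx, cy, half=7):
    """Estimate the flow (u, v) of the window centered at column cx, row cy."""
    f1 = frame1.astype(float)
    f2 = frame2.astype(float)
    # np.gradient returns the derivative along axis 0 (rows, y) first.
    Iy, Ix = np.gradient(f1)
    It = f2 - f1                                   # temporal derivative
    win = (slice(cy - half, cy + half + 1), slice(cx - half, cx + half + 1))
    # One equation per pixel: Ix * u + Iy * v = -It.
    A = np.stack([Ix[win].ravel(), Iy[win].ravel()], axis=1)
    b = -It[win].ravel()
    flow, *_ = np.linalg.lstsq(A, b, rcond=None)
    return flow

def pattern(x, y):
    """Smooth synthetic texture with intensity variation in both directions."""
    return np.sin(0.3 * x) + np.sin(0.25 * y)

ys, xs = np.mgrid[0:64, 0:64]
frame1 = pattern(xs, ys)
# The second frame shows the same pattern moved by u = 0.4, v = -0.2 pixels.
frame2 = pattern(xs - 0.4, ys + 0.2)
u, v = lucas_kanade_window(frame1, frame2, cx=32, cy=32)
print(u, v)
```

The estimate is only approximate because the equations come from a first-order linearization. Larger motions are usually handled by applying the same step coarse-to-fine on an image pyramid.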
Applications:
These techniques form the foundation of many computer vision applications and are often
used in combination to achieve robust and accurate results.