Feature Detection in Computer Vision
Uploaded by Giang Nguyeenx

Understanding Feature Detection and Description

Computer Vision Lectures

September 23, 2025

Contents

1 What Are Features?
  1.1 Defining an Image Feature
  1.2 Types of Image Regions: Flat, Edge, and Corner

2 The Harris Corner Detector
  2.1 The Mathematics of Corner Detection
  2.2 Interpreting the Structure Tensor with Eigenvalues
  2.3 The Harris Response Function

3 SIFT: Scale-Invariant Feature Transform
  3.1 The SIFT Detector: Scale-Space Extrema
  3.2 The SIFT Descriptor: A Histogram of Gradients

1 What Are Features?
1.1 Defining an Image Feature
In computer vision, a feature is an "interesting" part of an image. It is a local, meaningful,
and detectable pattern. The goal of a feature detector is to find the same points of interest in
different images of the same object or scene, regardless of changes in scale, rotation, illumination,
or viewpoint.
What makes a good feature?
• Repeatable and Precise: A detector should find the same feature in different images,
and its location should be precise.
• Distinctive: The region around the feature should have a unique texture or pattern,
making it easy to distinguish from other features.
• Local: A feature occupies a small area of the image, making it robust to clutter and
occlusion.

Figure 1: Matching Features

1.2 Types of Image Regions: Flat, Edge, and Corner


We can classify local image regions into three main types based on how their appearance changes
when we shift a small window over them.
• Flat Region: Shifting a window in any direction results in very little change. This is a
poor feature because it’s not distinctive.
• Edge: Shifting along the edge results in little change, but shifting perpendicular to the edge
causes a large change. This is better, but the exact location along the edge is ambiguous
(the "aperture problem").
• Corner: Shifting in any direction causes a significant change. Corners are well-localized
in two directions and make for excellent, stable features.
Because of these properties, corner detection is a fundamental task in feature detection.
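The window-shift intuition above can be checked numerically. The sketch below (plain NumPy; the three synthetic patches and the window size are illustrative choices, not part of the lecture) measures the SSD between a window and its one-pixel-shifted copies on a flat patch, a step edge, and a corner:

```python
import numpy as np

def ssd_changes(patch, center=10, half=3):
    """SSD between the window at `center` and windows shifted by 1 px
    in each of the 8 directions; returns (min, max) over directions."""
    y0, y1 = center - half, center + half + 1
    win = patch[y0:y1, y0:y1].astype(float)
    ssds = []
    for du in (-1, 0, 1):          # row shift
        for dv in (-1, 0, 1):      # column shift
            if du == 0 and dv == 0:
                continue
            shifted = patch[y0 + du:y1 + du, y0 + dv:y1 + dv].astype(float)
            ssds.append(np.sum((shifted - win) ** 2))
    return min(ssds), max(ssds)

n = 21
flat = np.full((n, n), 100.0)
edge = np.full((n, n), 100.0); edge[:, n // 2:] = 200.0       # vertical step edge
corner = np.full((n, n), 100.0); corner[n // 2:, n // 2:] = 200.0  # one bright quadrant

for name, img in [("flat", flat), ("edge", edge), ("corner", corner)]:
    lo, hi = ssd_changes(img)
    print(f"{name:6s} min SSD = {lo:8.0f}  max SSD = {hi:8.0f}")
```

As expected, the flat patch changes in no direction, the edge changes only perpendicular to the boundary (shifts along it cost nothing, which is the aperture problem), and the corner changes in every direction.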

Figure 2: Types of Image Regions: Flat, Edge, and Corner

2 The Harris Corner Detector


The Harris detector formalizes the idea of a "corner" by analyzing the local changes in image
intensity.

2.1 The Mathematics of Corner Detection


The core idea is to measure the Sum of Squared Differences (SSD) between an image patch I(x, y) and a shifted version I(x + u, y + v):

    E(u, v) = \sum_{x,y} w(x, y) \, [I(x + u, y + v) - I(x, y)]^2

Here, w(x, y) is a window function (e.g., a Gaussian) that gives more weight to pixels near the center. Using a Taylor series approximation for small shifts, this error function can be expressed in a quadratic form:

    E(u, v) \approx \begin{pmatrix} u & v \end{pmatrix} M \begin{pmatrix} u \\ v \end{pmatrix}

where M is the structure tensor or second-moment matrix, computed from the image gradients (Ix, Iy) within the window:

    M = \sum_{x,y} w(x, y) \begin{pmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{pmatrix}
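As a concrete sketch (not from the lecture), M can be assembled directly from image gradients. For brevity this uses central differences and a uniform window in place of the Gaussian weighting w(x, y):

```python
import numpy as np

def structure_tensor(img, y, x, half=2):
    """Structure tensor M at (y, x), summing central-difference
    gradients over a (2*half+1)^2 uniform window."""
    img = img.astype(float)
    Iy, Ix = np.gradient(img)  # derivatives along rows, then columns
    ys = slice(y - half, y + half + 1)
    xs = slice(x - half, x + half + 1)
    gx, gy = Ix[ys, xs], Iy[ys, xs]
    return np.array([[np.sum(gx * gx), np.sum(gx * gy)],
                     [np.sum(gx * gy), np.sum(gy * gy)]])

# Synthetic image with one bright quadrant -> a corner at (10, 10)
img = np.full((21, 21), 100.0)
img[10:, 10:] = 200.0

M = structure_tensor(img, 10, 10)
l1, l2 = np.linalg.eigvalsh(M)  # both large at a corner
print("M =\n", M)
print("eigenvalues:", l1, l2)
```

At this synthetic corner both diagonal entries of M are large, which is exactly what the eigenvalue analysis in the next section formalizes.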

2.2 Interpreting the Structure Tensor with Eigenvalues


The matrix M describes the intensity structure around a point. Its eigenvalues, λ1 and λ2 , tell
us the magnitude of intensity change along the principal gradient directions.

• Flat Region: M has two small eigenvalues (λ1 ≈ 0, λ2 ≈ 0). The error E(u, v) is small for all shifts.

• Edge: M has one large and one small eigenvalue (λ1 ≫ λ2 ≈ 0). The error is large for shifts perpendicular to the edge but small for shifts along it.

• Corner: M has two large eigenvalues (λ1, λ2 ≫ 0). The error is large for shifts in any direction.

Figure 3: Interpreting the Structure Tensor with Eigenvalues
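These three cases can be verified numerically. The sketch below (uniform window and central-difference gradients, simplifications chosen for brevity) computes λ1 and λ2 at the center of three synthetic patches:

```python
import numpy as np

def eigenvalues_at(img, y, x, half=2):
    """Eigenvalues (ascending) of the structure tensor at (y, x),
    using a uniform window and central-difference gradients."""
    Iy, Ix = np.gradient(img.astype(float))
    ys, xs = slice(y - half, y + half + 1), slice(x - half, x + half + 1)
    gx, gy = Ix[ys, xs], Iy[ys, xs]
    M = np.array([[np.sum(gx * gx), np.sum(gx * gy)],
                  [np.sum(gx * gy), np.sum(gy * gy)]])
    return np.linalg.eigvalsh(M)  # (lambda_min, lambda_max)

n = 21
flat = np.full((n, n), 100.0)
edge = np.full((n, n), 100.0); edge[:, n // 2:] = 200.0
corner = np.full((n, n), 100.0); corner[n // 2:, n // 2:] = 200.0

for name, img in [("flat", flat), ("edge", edge), ("corner", corner)]:
    l_min, l_max = eigenvalues_at(img, n // 2, n // 2)
    print(f"{name:6s} lambda_min = {l_min:8.1f}  lambda_max = {l_max:8.1f}")
```

The flat patch gives two near-zero eigenvalues, the edge gives one large and one near-zero, and the corner gives two large ones, matching the bullets above.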

2.3 The Harris Response Function


Calculating eigenvalues for every pixel is computationally expensive. Harris & Stephens proposed a response function R that avoids this calculation but still captures the underlying principle:

    R = det(M) − k (trace(M))²

In terms of eigenvalues, this is:

    R = λ1 λ2 − k (λ1 + λ2)²

Here, k is a sensitivity parameter, typically in the range [0.04, 0.06].
• If R is large and positive, we have a corner (both eigenvalues large).
• If R is large and negative, we have an edge (one large eigenvalue).
• If |R| is small, we have a flat region.
A corner is detected if a pixel’s response R is a local maximum above a certain threshold.
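Putting the pieces together, a from-scratch Harris response map can be sketched in a few lines of NumPy (uniform window, no non-maximal suppression; a simplified stand-in for the real detector, not OpenCV's implementation):

```python
import numpy as np

def harris_response(img, k=0.04, half=1):
    """Harris R = det(M) - k * trace(M)^2 at every interior pixel,
    using a uniform (2*half+1)^2 window."""
    Iy, Ix = np.gradient(img.astype(float))
    Ixx, Ixy, Iyy = Ix * Ix, Ix * Iy, Iy * Iy
    R = np.zeros_like(img, dtype=float)
    h, w = img.shape
    for y in range(half, h - half):
        for x in range(half, w - half):
            ys, xs = slice(y - half, y + half + 1), slice(x - half, x + half + 1)
            a, b, c = Ixx[ys, xs].sum(), Ixy[ys, xs].sum(), Iyy[ys, xs].sum()
            det, tr = a * c - b * b, a + c
            R[y, x] = det - k * tr * tr
    return R

# Synthetic corner: one bright quadrant meeting at (10, 10)
img = np.full((21, 21), 100.0)
img[10:, 10:] = 200.0
R = harris_response(img)
y, x = np.unravel_index(np.argmax(R), R.shape)
print("strongest response at", (y, x))  # near the corner (10, 10)
```

On this image the strongest (positive) response lands at the corner, pixels on the straight parts of the boundary get negative R (edge), and flat areas get R = 0, matching the three cases above.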

Programming Exercise

Task: Implement the Harris corner detector on an image.

Input: An image file (e.g., `your_image.jpg`). Output: The original image with detected corners marked.
Hints (OpenCV):

• Convert the image to grayscale and float32 format.

• Use `cv2.cornerHarris(gray_image, blockSize, ksize, k)` to compute the Harris response map. `blockSize` is the neighborhood size, `ksize` is the Sobel kernel size, and `k` is the Harris parameter.

• Threshold the response map to get strong corners. A good starting point is `response > 0.01 * response.max()`.

• Mark the locations of the detected corners on the original color image.

import cv2
import numpy as np

# Load image
image = cv2.imread('your_image.jpg')  # Replace with a suitable image
if image is None:
    print("Could not open or find the image")
    exit(0)

gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
gray_float = np.float32(gray)

# Apply Harris Corner Detector
# blockSize: Neighborhood size for corner detection
# ksize: Aperture parameter for the Sobel operator
# k: Harris detector free parameter
harris_response = cv2.cornerHarris(gray_float, blockSize=2, ksize=3, k=0.04)

# Dilate the corner response map to enhance corner points
harris_response = cv2.dilate(harris_response, None)

# Threshold for an optimal value; it may vary depending on the image.
# We mark corners where the response is greater than 1% of the max response.
image[harris_response > 0.01 * harris_response.max()] = [0, 0, 255]  # Mark in red

# Display the result
cv2.imshow('Harris Corners', image)
cv2.waitKey(0)
cv2.destroyAllWindows()

Additional Practice Exercises

1. Shi-Tomasi "Good Features to Track": The Shi-Tomasi detector uses a different response function: R = min(λ1, λ2). Implement this detector using OpenCV's `cv2.goodFeaturesToTrack()`. Compare the quality and distribution of the detected points with those from the Harris detector on the same image.

2. Implement Non-Maximal Suppression: The raw output of the Harris detector often includes clusters of corner responses. Write a simple non-maximal suppression algorithm that iterates through the response map and, for each corner above the threshold, keeps it only if it is the maximum value in its local neighborhood (e.g., a 3 × 3 or 5 × 5 window).

3. Parameter Sensitivity: Experiment with the `blockSize` and `k` parameters of the `cv2.cornerHarris` function. Describe how changing `blockSize` affects the scale of the detected corners. Explain what happens when you use a very small or a very large value for `k`.

3 SIFT: Scale-Invariant Feature Transform


Once we have detected keypoints, we need to describe the local image region around them in a
way that is robust to geometric and photometric changes. The Scale-Invariant Feature Transform
(SIFT) is a powerful algorithm that provides both a feature detector and a descriptor.

Figure 4: The SIFT Detector: Scale-Space Extrema

3.1 The SIFT Detector: Scale-Space Extrema


The Harris detector is not scale-invariant. A corner might disappear or turn into an edge if you
zoom in or out. SIFT solves this by searching for stable features across multiple scales.

1. Constructing a Scale-Space: The image is progressively blurred using Gaussian filters of increasing sigma (σ). This creates a "pyramid" of images at different scales.

2. Difference-of-Gaussians (DoG): To efficiently find scale-stable keypoints, SIFT uses the Difference-of-Gaussians, calculated by subtracting one blurred image from the next in the pyramid. The DoG is a good approximation of the Laplacian of Gaussian (LoG), which is excellent for finding blobs.

3. Finding Local Extrema: A pixel is selected as a keypoint candidate if it is a local maximum or minimum compared to its 26 neighbors: the 8 surrounding pixels in its own DoG image plus the 3 × 3 regions at the adjacent (upper and lower) scales.

4. Keypoint Refinement: The candidate points are refined to sub-pixel accuracy. Low-contrast points and points along edges are discarded, as they are not stable.
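The first two steps can be sketched with a separable Gaussian blur in pure NumPy. The σ schedule and the blob test image below are illustrative choices, not SIFT's exact octave layout:

```python
import numpy as np

def gaussian_blur(img, sigma):
    """Separable Gaussian blur with reflected borders (pure NumPy)."""
    r = int(3 * sigma + 0.5)
    x = np.arange(-r, r + 1)
    g = np.exp(-x**2 / (2 * sigma**2))
    g /= g.sum()
    pad = np.pad(img.astype(float), r, mode='reflect')
    # Filter rows, then columns (Gaussian kernels are separable)
    tmp = np.apply_along_axis(lambda m: np.convolve(m, g, mode='valid'), 1, pad)
    return np.apply_along_axis(lambda m: np.convolve(m, g, mode='valid'), 0, tmp)

def dog_stack(img, sigma0=1.6, k=2**0.5, levels=4):
    """Blur with increasing sigma and subtract adjacent levels."""
    blurred = [gaussian_blur(img, sigma0 * k**i) for i in range(levels)]
    return [b2 - b1 for b1, b2 in zip(blurred, blurred[1:])]

# A bright blob on a dark background responds strongly in the DoG
img = np.zeros((32, 32))
yy, xx = np.mgrid[:32, :32]
img[(yy - 16)**2 + (xx - 16)**2 <= 9] = 255.0

dogs = dog_stack(img)
for i, d in enumerate(dogs):
    print(f"DoG level {i}: extremum magnitude {np.abs(d).max():.1f} at "
          f"{np.unravel_index(np.abs(d).argmax(), d.shape)}")
```

Step 3 would then scan each DoG level against its neighbors above and below to pick out the 26-neighbor extrema.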

3.2 The SIFT Descriptor: A Histogram of Gradients


For each stable keypoint, SIFT creates a descriptor that is invariant to rotation and illumination.

1. Orientation Assignment: To achieve rotation invariance, a dominant orientation is assigned to each keypoint. This is done by computing the gradient magnitudes and orientations in the keypoint's neighborhood and creating an orientation histogram (36 bins). The peak of this histogram defines the keypoint's orientation.

2. Descriptor Creation:

Figure 5: The SIFT Descriptor

• A 16 × 16 pixel neighborhood around the keypoint is selected. This neighborhood is rotated to align with the keypoint's assigned orientation.

• This 16 × 16 region is divided into a 4 × 4 grid of cells, each cell covering 4 × 4 pixels.

• For each cell, an 8-bin orientation histogram is created from the gradients within it.

• These 16 histograms (of 8 bins each) are concatenated into a single feature vector of 16 × 8 = 128 dimensions.

3. Normalization: The 128-D vector is normalized to unit length. This makes the descriptor robust to affine illumination changes (brightness and contrast).
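A minimal sketch of this descriptor layout is shown below (plain NumPy). It deliberately omits the rotation alignment, Gaussian weighting, trilinear interpolation, and contrast clamping of real SIFT, keeping only the 4 × 4 grid of 8-bin magnitude-weighted histograms and the final normalization:

```python
import numpy as np

def sift_like_descriptor(patch):
    """128-D descriptor from a 16x16 patch: a 4x4 grid of cells,
    one 8-bin gradient-orientation histogram per cell (weighted by
    gradient magnitude), concatenated and normalized to unit length."""
    assert patch.shape == (16, 16)
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.arctan2(gy, gx) % (2 * np.pi)            # orientation in [0, 2*pi)
    bins = (ang / (2 * np.pi) * 8).astype(int) % 8    # 8 orientation bins
    desc = np.zeros(128)
    for cy in range(4):
        for cx in range(4):
            ys, xs = slice(4 * cy, 4 * cy + 4), slice(4 * cx, 4 * cx + 4)
            hist = np.bincount(bins[ys, xs].ravel(),
                               weights=mag[ys, xs].ravel(), minlength=8)
            cell = (cy * 4 + cx) * 8
            desc[cell:cell + 8] = hist
    norm = np.linalg.norm(desc)
    return desc / norm if norm > 0 else desc

rng = np.random.default_rng(0)
patch = rng.uniform(0, 255, size=(16, 16))
d = sift_like_descriptor(patch)
print(d.shape, round(float(np.linalg.norm(d)), 3))  # (128,) 1.0
```

Because the vector is normalized, multiplying the patch by a constant (a contrast change) leaves the descriptor unchanged, which is the illumination robustness claimed in step 3.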

Programming Exercise

Task: Detect SIFT features in two images of the same scene and match them.
Input: Two images with some overlap (e.g., `img1.jpg`, `img2.jpg`). Output: An image showing the two input images side-by-side with lines drawn between matching keypoints.
Hints (OpenCV):

• SIFT is included in the main `opencv-python` package since OpenCV 4.4.0; for older versions, install `opencv-contrib-python` with `pip install opencv-contrib-python`.

• Create a SIFT object: `sift = cv2.SIFT_create()`.

• Detect keypoints and compute descriptors for both images: `kp1, des1 = sift.detectAndCompute(img1, None)`.

• Use a matcher like `cv2.BFMatcher()` (Brute-Force Matcher).

• Use `bf.knnMatch(des1, des2, k=2)` to find the two best matches for each descriptor.

• Apply Lowe's ratio test to filter for good matches: keep a match if `m.distance < 0.75 * n.distance`.

• Use `cv2.drawMatches()` to visualize the results.

import cv2
import numpy as np

# Load the images
img1 = cv2.imread('img1.jpg', cv2.IMREAD_GRAYSCALE)  # Query image
img2 = cv2.imread('img2.jpg', cv2.IMREAD_GRAYSCALE)  # Scene image

if img1 is None or img2 is None:
    print("Could not open one of the images")
    exit(0)

# 1. Initialize SIFT detector
sift = cv2.SIFT_create()

# 2. Find the keypoints and descriptors with SIFT
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

# 3. Create a BFMatcher object with distance measurement cv2.NORM_L2
bf = cv2.BFMatcher(cv2.NORM_L2, crossCheck=False)

# 4. Find the k-best matches for each descriptor
matches = bf.knnMatch(des1, des2, k=2)

# 5. Apply Lowe's ratio test to find good matches
good_matches = []
for m, n in matches:
    if m.distance < 0.75 * n.distance:
        good_matches.append(m)

# 6. Draw the matches
# cv2.drawMatches takes a flat list of DMatch objects.
match_img = cv2.drawMatches(img1, kp1, img2, kp2, good_matches, None,
                            flags=cv2.DrawMatchesFlags_NOT_DRAW_SINGLE_POINTS)

# Display the result
cv2.imshow('SIFT Matches', match_img)
cv2.waitKey(0)
cv2.destroyAllWindows()

Additional Practice Exercises


1. Explore Other Descriptors: SIFT can be slow. OpenCV offers faster alternatives like ORB (`cv2.ORB_create()`) and AKAZE (`cv2.AKAZE_create()`). Replace the SIFT detector in the exercise code with ORB. How do the number of keypoints and the quality of matches compare? Note that ORB uses binary descriptors, so you will need to use `cv2.NORM_HAMMING` for the `BFMatcher`.

2. Investigate the Ratio Test: Lowe’s ratio test is crucial for filtering out ambiguous
matches. In the programming exercise, the ratio is set to 0.75. Experiment by
changing this value (e.g., to 0.6, 0.8, 0.9). How does a stricter (lower) ratio affect
the number and quality of matches? How does a more lenient (higher) ratio change
the result? Explain the trade-off.
