Computer Vision – Detailed Course Notes
These notes provide an in-depth and structured explanation of core Computer Vision concepts.
They are designed for undergraduate and postgraduate courses, competitive exams, and research
preparation. Mathematical intuition, algorithmic steps, and applications are emphasized throughout.
1. Image Formation
Image formation studies how a three-dimensional scene is mapped onto a two-dimensional image.
The most common abstraction is the pinhole camera model, which explains perspective projection.
In the pinhole model, light rays pass through a single point (the camera center) and intersect the
image plane. This results in an inverted image. Despite its simplicity, this model captures the
essential geometry of real cameras.
Real-world cameras include lenses, sensors, and apertures. Lenses help focus light and control
blur, while sensors convert photons into electrical signals.
Illumination plays a crucial role in image formation. The observed intensity depends on the light
source, surface reflectance, and viewing direction. Common reflectance models include Lambertian
reflection.
Understanding image formation is critical for tasks such as camera calibration, 3D reconstruction,
and photometric analysis.
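The perspective projection described above can be sketched numerically. This is a minimal NumPy illustration of the pinhole equations x = fX/Z, y = fY/Z (the function name and point values are illustrative, not from the notes):

```python
import numpy as np

def project(points_3d, f):
    """Pinhole perspective projection of camera-frame 3D points.

    Each point (X, Y, Z) maps to (f*X/Z, f*Y/Z): division by depth Z
    is what produces perspective foreshortening.
    """
    points_3d = np.asarray(points_3d, dtype=float)
    X, Y, Z = points_3d[:, 0], points_3d[:, 1], points_3d[:, 2]
    return np.stack([f * X / Z, f * Y / Z], axis=1)

# The same point twice as far away projects to half the image-plane offset.
pts = np.array([[1.0, 2.0, 4.0],
                [1.0, 2.0, 8.0]])
uv = project(pts, f=1.0)
```

Doubling Z halves the projected coordinates, which is exactly why distant objects appear smaller.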
2. Geometric Primitives and Transformations
Geometric primitives are the basic elements used to represent shapes in images. These include
points, line segments, curves, and regions.
Points are represented by coordinates, while lines can be represented parametrically or implicitly.
Curves may be defined analytically or discretely using pixels.
Geometric transformations describe how primitives change position or orientation. Basic
transformations include translation, rotation, scaling, and reflection.
Homogeneous coordinates allow transformations to be represented using matrix multiplication. This
unified representation is essential in computer vision pipelines.
Affine and projective transformations are widely used in image alignment, mosaicing, and
perspective correction.
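The matrix form of these transformations can be made concrete. Below is a small sketch of 2D homogeneous coordinates: translation, rotation, and scaling each become a 3x3 matrix, so composite transforms are just matrix products (helper names are my own):

```python
import numpy as np

def translation(tx, ty):
    return np.array([[1, 0, tx], [0, 1, ty], [0, 0, 1]], dtype=float)

def rotation(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]], dtype=float)

def scaling(sx, sy):
    return np.array([[sx, 0, 0], [0, sy, 0], [0, 0, 1]], dtype=float)

def apply(T, p):
    """Apply a 3x3 homogeneous transform to a 2D point (x, y)."""
    ph = T @ np.array([p[0], p[1], 1.0])
    return ph[:2] / ph[2]  # divide out the homogeneous coordinate

# Rotate (1, 0) by 90 degrees, then translate by (2, 3).
T = translation(2, 3) @ rotation(np.pi / 2)
q = apply(T, (1, 0))
```

Note the order: the rightmost matrix is applied first, so `T` rotates and then translates.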
3. Image Processing: Point Operators
Point operators process each pixel independently of its neighbors. They are computationally
efficient and easy to implement.
Brightness adjustment adds or subtracts a constant value from pixel intensities. Contrast
enhancement scales intensity differences.
Thresholding converts grayscale images into binary images and is widely used in segmentation
tasks.
Point operations are often the first step in image preprocessing pipelines.
Despite their simplicity, point operators significantly influence the visual quality of images.
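The three point operations above take only a line or two each. A minimal sketch for 8-bit images (the mid-gray pivot of 128 in the contrast function is one common convention, not the only one):

```python
import numpy as np

def adjust_brightness(img, delta):
    """Add a constant to every pixel; clip to the valid 8-bit range."""
    return np.clip(img.astype(int) + delta, 0, 255).astype(np.uint8)

def adjust_contrast(img, gain):
    """Scale intensity differences around mid-gray (128)."""
    return np.clip(128 + gain * (img.astype(float) - 128), 0, 255).astype(np.uint8)

def threshold(img, t):
    """Binarize: 255 where intensity exceeds t, else 0."""
    return np.where(img > t, 255, 0).astype(np.uint8)

img = np.array([[10, 120],
                [130, 250]], dtype=np.uint8)
bright = adjust_brightness(img, 20)   # 250 + 20 saturates at 255
binary = threshold(img, 125)
```

Casting to a wider type before the arithmetic avoids uint8 wrap-around, which is a common bug in naive implementations.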
4. Linear Filtering
Linear filtering modifies an image by convolving it with a kernel or mask. Each output pixel is a
weighted sum of neighboring input pixels.
Smoothing filters such as mean and Gaussian filters reduce noise but may blur edges.
Sharpening filters emphasize intensity transitions and enhance fine details.
The kernel size and coefficients determine the filter's frequency response, and hence which spatial frequencies of the image are attenuated or amplified.
Linear filters are fundamental in both spatial and frequency domain analysis.
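The weighted-sum definition of convolution can be written out directly. This is a deliberately simple zero-padded implementation (real code would use an optimized library routine); the mean and sharpening kernels shown are standard examples:

```python
import numpy as np

def convolve2d(img, kernel):
    """Direct 2D convolution with zero padding ("same" output size)."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(img.astype(float), ((ph, ph), (pw, pw)))
    flipped = kernel[::-1, ::-1]  # true convolution flips the kernel
    out = np.zeros(img.shape, dtype=float)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            # each output pixel is a weighted sum of input neighbors
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * flipped)
    return out

mean_kernel = np.full((3, 3), 1 / 9)           # smoothing: averages neighbors
sharpen = np.array([[ 0, -1,  0],
                    [-1,  5, -1],
                    [ 0, -1,  0]], dtype=float)  # emphasizes transitions

img = np.zeros((5, 5))
img[2, 2] = 9.0                 # a single bright impulse
smoothed = convolve2d(img, mean_kernel)
```

Convolving an impulse with the mean kernel spreads its energy over the 3x3 neighborhood, which is precisely the blurring behavior noted above.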
5. Intensity Transformation Functions
Intensity transformation functions map input pixel values to output values.
Negative transformation inverts image intensities (s = L - 1 - r for an L-level image) and makes details in dark regions easier to see.
Logarithmic transformations expand dark regions while compressing bright regions.
Power-law (gamma) transformations correct display-related distortions.
Histogram equalization improves global contrast by redistributing intensity values.
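The first three mappings are one-line functions of the input intensity. A minimal sketch for 8-bit images (the scaling constant in the log transform is chosen so that 255 maps to 255):

```python
import numpy as np

def negative(img):
    """s = 255 - r for 8-bit images."""
    return 255 - img

def log_transform(img, c=255 / np.log(256)):
    """s = c * log(1 + r): expands dark regions, compresses bright ones."""
    return c * np.log1p(img.astype(float))

def gamma_transform(img, gamma):
    """Power-law: s = 255 * (r/255)^gamma; gamma < 1 brightens mid-tones."""
    return 255 * (img.astype(float) / 255) ** gamma

img = np.array([0, 64, 128, 255], dtype=np.uint8)
neg = negative(img)
gam = gamma_transform(img, 0.5)   # gamma < 1 lifts mid-gray above 128
```

Gamma values below 1 push mid-tones upward, which is why gamma correction is used to compensate for display nonlinearity.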
6. Neighborhood Operators
Neighborhood operators compute output pixels based on a local window around each pixel.
Median filtering is effective for removing salt-and-pepper noise.
Edge detection operators such as Sobel, Prewitt, and Roberts detect intensity gradients.
Neighborhood size influences noise suppression and edge localization.
These operators form the foundation for higher-level vision algorithms.
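The median filter's effect on salt-and-pepper noise is easy to demonstrate. A straightforward (unoptimized) sliding-window implementation, with the standard Sobel x-kernel shown alongside for reference:

```python
import numpy as np

def median_filter(img, size=3):
    """Replace each pixel with the median of its size x size neighborhood."""
    r = size // 2
    padded = np.pad(img, r, mode='edge')   # replicate borders
    out = np.empty_like(img)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = np.median(padded[i:i + size, j:j + size])
    return out

# Standard Sobel kernel for the horizontal intensity gradient.
SOBEL_X = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)

img = np.full((5, 5), 100, dtype=np.uint8)
img[2, 2] = 255                 # a single "salt" pixel
clean = median_filter(img)      # the outlier is replaced by its neighbors' median
```

Unlike a mean filter, the median discards the outlier entirely instead of smearing it over the neighborhood, which is why it preserves edges better.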
7. Points, Patches, Edges, and Contours
Interest points are distinctive locations that can be reliably detected.
Patches represent local neighborhoods used for feature description.
Edges correspond to sharp intensity changes and often indicate object boundaries.
Contours are continuous curves formed by connecting edge pixels.
These features enable object recognition and scene understanding.
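Patch-based description can be illustrated with the simplest possible descriptor: the raw pixel neighborhood, matched by sum-of-squared differences (SSD). The helper names and brute-force search below are a didactic sketch, not a production matcher:

```python
import numpy as np

def extract_patch(img, y, x, size=3):
    """Use the raw size x size neighborhood around (y, x) as a descriptor."""
    r = size // 2
    return img[y - r:y + r + 1, x - r:x + r + 1].astype(float)

def match_patch(patch, img, size=3):
    """Brute-force search for the location minimizing SSD against `patch`."""
    r = size // 2
    best, best_ssd = None, np.inf
    for y in range(r, img.shape[0] - r):
        for x in range(r, img.shape[1] - r):
            ssd = np.sum((extract_patch(img, y, x, size) - patch) ** 2)
            if ssd < best_ssd:
                best_ssd, best = ssd, (y, x)
    return best

rng = np.random.default_rng(0)
img = rng.integers(0, 255, size=(12, 12))
target = extract_patch(img, 5, 7)
loc = match_patch(target, img)   # recovers the patch's own location
```

Raw-pixel SSD is sensitive to illumination and viewpoint changes; practical descriptors normalize the patch or use gradient statistics instead.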
8. Contour Tracking and Applications
Contour tracking follows object boundaries across images or video frames.
Active contour models (snakes) evolve curves based on energy minimization.
Because the contour model imposes smoothness constraints, contour tracking can tolerate moderate shape deformation and partial occlusion.
Applications include medical imaging, surveillance, and gesture recognition.
Reliable contour tracking enables accurate motion and shape analysis.
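The energy-minimization idea behind snakes can be sketched with a greedy approximation: each contour point moves to the neighboring pixel that best trades off smoothness (staying near the midpoint of its neighbors) against edge attraction (high gradient magnitude). This is a simplified stand-in for the full variational formulation, with illustrative parameter names:

```python
import numpy as np

def greedy_snake_step(img, contour, alpha=1.0, beta=1.0):
    """One greedy iteration of a simplified active contour.

    Each point moves to the 8-neighborhood pixel minimizing
      alpha * (distance to midpoint of its contour neighbors)^2   (internal)
    - beta  * gradient magnitude                                  (external)
    """
    gy, gx = np.gradient(img.astype(float))
    edge = np.hypot(gx, gy)                 # edge-strength map
    n = len(contour)
    new = contour.copy()
    for k in range(n):
        prev, nxt = contour[(k - 1) % n], contour[(k + 1) % n]
        mid = (prev + nxt) / 2.0
        y0, x0 = contour[k]
        best, best_e = contour[k], np.inf
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                y, x = y0 + dy, x0 + dx
                if not (0 <= y < img.shape[0] and 0 <= x < img.shape[1]):
                    continue
                e = alpha * ((y - mid[0]) ** 2 + (x - mid[1]) ** 2) \
                    - beta * edge[y, x]
                if e < best_e:
                    best_e, best = e, np.array([y, x])
        new[k] = best
    return new

# A vertical step edge between columns 4 and 5 attracts nearby points.
img = np.zeros((10, 10))
img[:, 5:] = 100.0
contour = np.array([[2, 3], [4, 3], [6, 3], [8, 3]])
moved = greedy_snake_step(img, contour, alpha=0.0, beta=1.0)
```

Iterating this step and balancing `alpha` against `beta` is what lets the curve settle smoothly onto object boundaries.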
9. Lines, Vanishing Points, and RANSAC
Line detection identifies linear structures using methods such as the Hough Transform.
Vanishing points arise from the projection of parallel 3D lines.
They provide cues about camera orientation and scene geometry.
RANSAC is a robust estimator that fits models in the presence of outliers.
RANSAC variants improve efficiency and accuracy in real-world vision tasks.
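RANSAC's hypothesize-and-verify loop fits in a few lines for line fitting: repeatedly fit y = mx + b to two random points, count inliers within a residual threshold, and keep the best model. The parameters and data below are illustrative:

```python
import numpy as np

def ransac_line(points, n_iters=200, thresh=0.1, rng=None):
    """Robustly fit y = m*x + b to (x, y) points despite outliers."""
    rng = np.random.default_rng(rng)
    best_inliers, best_model = None, None
    for _ in range(n_iters):
        # 1. Hypothesize: a minimal sample of two points defines a line.
        i, j = rng.choice(len(points), size=2, replace=False)
        (x1, y1), (x2, y2) = points[i], points[j]
        if x1 == x2:
            continue                      # skip vertical samples
        m = (y2 - y1) / (x2 - x1)
        b = y1 - m * x1
        # 2. Verify: count points within `thresh` of the hypothesized line.
        residuals = np.abs(points[:, 1] - (m * points[:, 0] + b))
        inliers = residuals < thresh
        if best_inliers is None or inliers.sum() > best_inliers.sum():
            best_inliers, best_model = inliers, (m, b)
    return best_model, best_inliers

# 20 points on y = 2x + 1 plus two gross outliers.
x = np.linspace(0, 1, 20)
pts = np.column_stack([x, 2 * x + 1])
pts = np.vstack([pts, [[0.5, 10.0], [0.2, -5.0]]])
(m, b), inliers = ransac_line(pts, rng=0)
```

Because each hypothesis uses only a minimal sample, a few gross outliers cannot corrupt the fit the way they would in least squares; common refinements refit the model to all inliers of the best hypothesis.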