Computer Vision Fundamentals and Techniques

Uploaded by Dnyanda Thorat
Unit I: Introduction


● Image Processing
● Computer Vision - Low-level, Mid-level, High-level
● Fundamentals of Image Formation
● Transformation: Orthogonal, Euclidean, Affine, Projective
● Fourier Transform
● Convolution and Filtering
● Image Enhancement
● Restoration
● Histogram Processing
Image Processing
Computer Vision

Computer Vision (CV) is a branch of Artificial Intelligence (AI) that enables
computers to interpret and understand visual information much as humans do. It
builds on key concepts such as Image Processing, Feature Extraction, Object
Detection, Image Segmentation, and other core techniques in CV.
Mathematical Prerequisites for Computer Vision
1. Linear Algebra

● Vectors
● Matrices and Tensors
● Eigenvalues and Eigenvectors
● Singular Value Decomposition (SVD)
2. Probability and Statistics

● Probability Distributions
● Bayesian Inference and Bayes' Theorem
● Markov Chains
● Kalman Filters
3. Signal Processing

● Image Filtering and Convolution
● Discrete Fourier Transform (DFT)
● Fast Fourier Transform (FFT)
● Principal Component Analysis (PCA)
Key Concepts in Computer Vision

1. Image Transformation

● Geometric Transformations
● Fourier Transform
● Intensity Transformation
2. Image Enhancement

● Histogram Equalization
● Contrast Enhancement
● Image Sharpening
● Color Correction
3. Noise Reduction Techniques

● Median Filtering
● Bilateral Filtering
● Wavelet Denoising
4. Morphological Operations

● Erosion and Dilation
● Opening
● Closing
● Morphological Gradient
2. Feature Extraction
1. Edge Detection Techniques

● Canny Edge Detector
● Sobel Operator
● Laplacian of Gaussian (LoG)

2. Corner and Interest Point Detection

● Harris Corner Detection


3. Feature Descriptors

● SIFT (Scale-Invariant Feature Transform)
● SURF (Speeded-Up Robust Features)
● ORB (Oriented FAST and Rotated BRIEF)
● HOG (Histogram of Oriented Gradients)
How Does Computer Vision Work?

1. Computer Vision works much like the human eye and brain. First, our eyes capture the
image and send the visual data to our brain. The brain then processes this information and
transforms it into a meaningful interpretation, recognizing and categorizing the object based
on its properties.
2. In a similar way, Computer Vision uses a camera (acting like the human eye) to capture
images. The visual data is then processed by algorithms to recognize and identify the
objects based on patterns it has learned. However, before the system can recognize objects
in new images, it needs to be trained on a large dataset of labeled images. This training
enables the system to identify and associate various patterns with their corresponding
labels.
What are the main steps in a typical Computer Vision Pipeline?
1. Image Acquisition

The first step in a computer vision pipeline is image acquisition. This involves
capturing images or videos using sensors or cameras. The quality and resolution
of the images significantly impact the performance of the subsequent steps.

● Devices Used: Cameras, smartphones, drones, satellite imagery, and medical
imaging devices.
● Considerations: Lighting conditions, focus, frame rate, and resolution.
2. Preprocessing
Preprocessing involves preparing the raw image data for further analysis. This step includes
several techniques to enhance image quality and normalize the data.

● Noise Reduction: Applying filters (e.g., Gaussian filter) to remove noise from the
image.
● Normalization: Adjusting the intensity values to a common scale, often between 0 and 1.
● Image Scaling: Resizing images to a fixed dimension required by the model.
● Data Augmentation: Techniques like rotation, flipping, cropping, and color
adjustments to artificially expand the dataset.
3. Image Segmentation
Image segmentation is the process of partitioning an image into multiple segments or regions to
simplify its analysis. This step is crucial for identifying objects and their boundaries.

● Thresholding: Simple method that converts grayscale images to binary images based on a
threshold value.
● Edge Detection: Using algorithms like Canny, Sobel, or Laplacian to detect edges within an
image.
● Region-Based Segmentation: Techniques like Region Growing or Watershed to segment an
image based on the similarity of pixels.
● Semantic Segmentation: Assigning a label to each pixel of the image using deep learning
models like U-Net or Fully Convolutional Networks (FCNs).
4. Feature Extraction

Feature extraction involves identifying and extracting relevant features from the image that
can be used for further analysis or classification.

● Keypoint Detection: Identifying key points of interest in the image, such as corners
or blobs, using algorithms like SIFT, SURF, or ORB.
● Descriptors: Creating feature descriptors that represent the local neighborhood of
key points.
● Deep Learning Features: Using convolutional neural networks (CNNs) to
automatically learn and extract features from images.
5. Object Detection
Object detection is the task of identifying and locating objects within an image. This step often
involves bounding box regression and object classification.

● Classical Methods: Techniques like Histogram of Oriented Gradients (HOG)
combined with Support Vector Machines (SVM).
● Deep Learning Methods: Models like Faster R-CNN, YOLO (You Only Look Once),
and SSD (Single Shot Multibox Detector) for real-time object detection.
6. Object Recognition and Classification

After detecting objects, the next step is to recognize and classify them into
predefined categories.

● Classification Algorithms: Using traditional machine learning algorithms like
SVM, k-NN, or deep learning models like CNNs.
● Transfer Learning: Fine-tuning pre-trained models like VGG,
ResNet, or Inception for specific classification tasks.
7. Post-Processing
Post-processing involves refining the results obtained from the previous steps to enhance
accuracy and usability.

● Non-Maximum Suppression: Used in object detection to eliminate redundant
bounding boxes.
● Result Aggregation: Combining results from multiple frames in video analysis to
improve stability and reduce false positives.
● Refinement: Techniques like conditional random fields (CRFs) for improving
segmentation boundaries.
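Non-maximum suppression is simple enough to implement from scratch. The following is a greedy sketch: keep the highest-scoring box, drop any remaining box whose Intersection over Union (IoU) with it exceeds a threshold, and repeat:

```python
import numpy as np

def iou(a, b):
    # Intersection over Union of two [x1, y1, x2, y2] boxes.
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / float(area_a + area_b - inter)

def nms(boxes, scores, iou_thresh=0.5):
    # Greedy NMS: keep the best box, suppress heavily overlapping ones.
    order = np.argsort(scores)[::-1]
    keep = []
    while len(order) > 0:
        best = order[0]
        keep.append(best)
        order = [i for i in order[1:] if iou(boxes[best], boxes[i]) <= iou_thresh]
    return keep

boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
# The second box heavily overlaps the first and is suppressed: kept == [0, 2]
```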
8. Visualization and Interpretation
The final step in the computer vision pipeline is visualizing and interpreting the results. This step
is crucial for understanding the performance and making decisions based on the visual data.

● Overlaying Results: Displaying bounding boxes, segmentation masks, and key points on
the original images.
● Metrics and Evaluation: Using metrics like accuracy, precision, recall, F1-score, and
Intersection over Union (IoU) to evaluate model performance.
● User Interface: Developing interactive dashboards or applications to visualize and
interpret the results in real-time.
Popular Libraries for Computer Vision
To implement computer vision tasks effectively, various libraries are used:

1. OpenCV: The most widely used open-source library for computer vision tasks like image
processing, video capture, and real-time applications.
2. TensorFlow: A popular deep learning framework that includes tools for building and training
computer vision models.
3. PyTorch: Another deep learning library that offers great flexibility for computer vision
research and development.
4. scikit-image: A part of the broader SciPy ecosystem, this library provides algorithms for
image processing and computer vision.
Deep Learning for Computer Vision
1. Convolutional Neural Networks (CNNs)

Convolutional Neural Networks are designed to learn spatial hierarchies of
features from images. Their key components include:

● Convolutional Layers
● Pooling Layers
● Fully Connected Layers
2. Generative Adversarial Networks (GANs)
A GAN consists of two networks (a generator and a discriminator) that work against each
other to create realistic images. There are various types of GANs, each designed for specific
tasks and improvements:

● Deep Convolutional GAN (DCGAN)
● Conditional GAN (cGAN)
● Cycle-Consistent GAN (CycleGAN)
● Super-Resolution GAN (SRGAN)
● StyleGAN
3. Variational Autoencoders (VAEs)

VAEs are the probabilistic version of autoencoders: the model is forced to learn a distribution
over the latent space rather than a fixed point. Other autoencoders used in computer vision
include:

● Autoencoders
● Denoising Autoencoders (DAE)
● Convolutional Autoencoders (CAE)
4. Vision Transformers (ViT)

Inspired by transformer models, they treat an image as a sequence of patches and
process the patches using self-attention mechanisms. Common vision transformers
include:

● Vision Transformer (ViT)
● Swin Transformer
● CvT (Convolutional Vision Transformer)
Computer Vision Tasks

1. Image Classification

It involves analyzing an image and assigning it a specific label or category based on its content,
such as identifying whether an image contains a cat, dog, or car.
Its techniques are as follows:

● Image Classification using Support Vector Machine (SVM)
● Image Classification using RandomForest
● Image Classification using CNN
● Image Classification using TensorFlow
● Image Classification using PyTorch Lightning
Computer Vision - Low-level, Mid-level, High-level

1. Low-Level Vision
Operates directly on raw image data (pixels). Focuses on extracting basic features.
Examples:
● Image Preprocessing: noise removal, smoothing, filtering

● Edge Detection: Canny, Sobel, LoG

● Color Space Conversion: RGB to HSV, Grayscale

● Thresholding: Binary, Otsu’s method

● Gradient Computation: intensity or color changes

● Corner Detection: Harris, Shi-Tomasi


2. Mid-Level Vision

Involves grouping and interpreting low-level features into meaningful structures.


Examples:
● Segmentation: dividing image into regions (e.g., watershed, superpixels)

● Object Proposals / Contour Grouping

● Motion Estimation: optical flow

● Depth Estimation: stereo vision, structure from motion

● Feature Matching: SIFT, SURF, ORB

● Tracking: Kalman filter, Mean-Shift, Optical flow tracking


3. High-Level Vision
Involves semantic understanding — interpreting scenes and recognizing objects.
Examples:
● Object Recognition & Classification (e.g., ResNet, YOLO)

● Face Detection & Recognition

● Scene Understanding: indoor vs outdoor, activity recognition

● Image Captioning: describing an image in natural language

● Pose Estimation

● Visual Question Answering


Sobel Edge Detection:

What is Sobel Edge Detection?

Sobel edge detection is one of the most widely used approaches in image
processing and computer vision for detecting edges in an image. It computes
the gradient of the image intensity at each pixel, which gives both the
direction and the rate of change of intensity. The Sobel operator consists of
two 3x3 convolution kernels: one detects changes in the horizontal direction
and the other in the vertical direction.
Features of Sobel Edge detection
● Directional Sensitivity: Sobel edge detection uses two different kernels, so it can detect edges in both the
horizontal and vertical directions.
● Noise Reduction: The operator incorporates smoothing (such as a Gaussian blur) that helps remove noise, making it
less sensitive to minor changes in the image.
● Gradient Magnitude Calculation: It computes the gradient per pixel, which helps enhance edges by quantifying
the change in intensity.
● Simple Implementation: The Sobel operator is easy to implement and can run in real-time software systems.
● Edge Orientation: Edges are not only detected but also annotated with their orientation, which is very useful
for later stages of image processing.
The basic steps of the Canny edge detection algorithm, which builds on gradient operators such as Sobel, are:

● Noise reduction using Gaussian filter

● Gradient calculation along the horizontal and vertical axis

● Non-Maximum suppression of false edges

● Double thresholding for segregating strong and weak edges

● Edge tracking by hysteresis


[Figure: input image and the corresponding edge-detected output image]
Contours

Contours are the edges or outlines of objects in an image. They are used in
image processing to identify shapes, detect objects, or measure their size.
We use OpenCV's findContours() function, which works best on binary
images.
There are three important arguments of this function:

● Source Image: This is the image from which we want to find the contours.
● Contour Retrieval Mode: This determines how contours are retrieved.
● Contour Approximation Method: This decides how much detail to keep when
storing the contours.
In OpenCV 4, the function gives us two outputs (OpenCV 3 also returned the image):

● Contours: A list of contours. Each contour is made up of the (x, y)
coordinates that outline a shape in the image.
● Hierarchy: Extra information about the contours, such as which ones
are inside others.
1. Importing Necessary Libraries

First, we need to import libraries like NumPy and OpenCV that help us
process images.
import cv2

import numpy as np
2. Reading the Image

Now, we load the image we want to work with. We use cv2.imread() to read
the image, and cv2.waitKey(0) pauses the program until you press a key.
image = cv2.imread('./input.png')  # placeholder path; use your own image
cv2.waitKey(0)
3. Converting the Image to Grayscale

To make it easier to process the image, we convert it from color (BGR) to
grayscale. Grayscale images are simpler to work with for tasks like detecting
edges.
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
4. Edge Detection Using Canny

Next, we apply Canny edge detection, which highlights the edges of
objects in the image. This helps us find the boundaries of shapes and objects
easily.
edged = cv2.Canny(gray, 30, 200)
cv2.waitKey(0)
5. Finding Contours

We then find the contours, which are the boundaries of objects in the image.
This helps us detect the shapes in the image. We focus on the external
contours.
contours, hierarchy = cv2.findContours(edged,
cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
6. Displaying Canny Edges After Contouring

Now, we show the edges that we found using Canny edge
detection. This gives us a visual idea of where the edges of the
objects are.
cv2.imshow('Canny Edges After Contouring', edged)
cv2.waitKey(0)
7. Printing Number of Contours Found

print("Number of Contours Found = " + str(len(contours)))

Output - 3
8. Drawing Contours on the Original Image

We draw the contours on the original image to visualize the shapes we found.
The contours are drawn in green, and we display the updated image.

cv2.drawContours(image, contours, -1, (0, 255, 0), 3)
cv2.imshow('Contours', image)
cv2.waitKey(0)
cv2.destroyAllWindows()
Affine Transformation

A transformation that can be expressed as a matrix multiplication (linear
transformation) followed by a vector addition (translation). It can represent:
1. Rotations (linear transformation)
2. Translations (vector addition)
3. Scale operations (linear transformation)
How do we get an Affine Transformation?

1. We mentioned that an Affine Transformation is basically a relation between two images.
The information about this relation can come, roughly, in two ways:
1. We know both X and T, and we also know that they are related. Then our task is to find M.
2. We know M and X. To obtain T we only need to apply T = M · X. Our information for M may be
explicit (i.e. we have the 2-by-3 matrix) or it can come as a geometric relation between
points.
2. Let's explain this in a better way (b). Since M relates two images, we can analyze the simplest
case in which it relates three points in both images. Look at the figure below:
Projective Transformations (Perspective Transformations)

In mathematics, a linear transformation is a function that maps one vector space into another
and is often implemented by a matrix. A mapping is considered a linear transformation if it
preserves vector addition and scalar multiplication. To apply a linear transformation to a
vector (i.e., the coordinates of one point, in our case the x and y values of a pixel), you
multiply this vector by a matrix representing the linear transform. The output is a vector
with transformed coordinates.


Linear algebra for computer vision
● Vector spaces and linear transformations
● Eigendecomposition and singular value decomposition (SVD)
● Matrix factorizations and linear least squares
Fourier Transform

In image processing, the Fourier transform is an important tool used to
decompose an image into the frequency domain. The input to the Fourier
transform is the image in the spatial domain (x, y); the output represents
the same image in the frequency domain.
Fast Fourier Transform in Image Processing

Fast Fourier Transform (FFT) is a mathematical algorithm widely used in
image processing to transform images between the spatial domain and the
frequency domain. (It is like a special translator for images.)

● Spatial domain: Each pixel in an image has a color or brightness value,
and together these values form the image you see. This is the spatial
domain: the image described by its pixels.
● Frequency domain: Now imagine describing the same image in a different way, not
by the pixels directly, but by how patterns of light and dark change across the image.
For example:
○ Low frequencies represent smooth, gradual changes (like large shapes
or blurry areas).
○ High frequencies capture sharp changes (like edges or fine details).
The frequency domain shows how much of these patterns (or frequencies) are present in the
image.
Implementing Fast Fourier Transform in Image Processing

The Fast Fourier Transform (FFT) works in three main steps:

1. Forward FFT (Spatial to Frequency Domain): FFT converts pixel values from the spatial domain
into sine and cosine waves, mapping low frequencies to the center and high frequencies to the edges.
2. Filtering in the Frequency Domain: Filters are applied to modify certain frequency ranges for
purposes like noise removal (by eliminating high frequencies) or sharpening (by enhancing high
frequencies).
3. Inverse FFT (Frequency to Spatial Domain): The modified frequency data is transformed back into
the spatial domain, resulting in a processed version of the original image.
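The three steps above can be sketched with NumPy's FFT routines; the circular low-pass mask radius (8) is an arbitrary illustrative choice:

```python
import numpy as np

# Forward FFT: spatial domain -> frequency domain, low frequencies shifted to the center.
img = np.random.default_rng(0).random((64, 64))
F = np.fft.fftshift(np.fft.fft2(img))

# Filtering in the frequency domain: keep only a small disc of low frequencies
# around the center (a low-pass filter; removing them instead would sharpen).
yy, xx = np.indices(F.shape)
dist = np.hypot(yy - 32, xx - 32)
F_filtered = F * (dist <= 8)

# Inverse FFT: back to the spatial domain; the result is a blurred version of the input.
smoothed = np.real(np.fft.ifft2(np.fft.ifftshift(F_filtered)))
```

Removing high-frequency energy can only reduce the image's variance, which is why the result looks smoother than the input.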
Image Filtering Using Convolution in OpenCV
2-D Convolution
The most fundamental operation in image processing is convolution. It is
performed using kernels. A kernel is a matrix that is generally smaller than
the image; the center of the kernel coincides with the pixel being processed.
In a 2-D convolution, the kernel is a square A x B matrix, where both A and B
are odd integers.
Kernel
A kernel is a small matrix used to apply effects like blurring, sharpening, or
edge detection. Kernels perform mathematical operations on images,
modifying pixel values to achieve various effects.
The kernel is slid across the input image, and at each position, the kernel's values are
multiplied with the corresponding pixel values in the input image. The results are then
summed up, and the final value is assigned to the corresponding pixel in the output
image.
Examples of Kernels:
● Box (averaging) kernel: all elements equal; blurs the image by averaging each
pixel with its neighbors.
● Gaussian kernel: weights follow a Gaussian function; smooths the image while
preserving structure better than a box blur.
● Sharpening kernel: boosts the center pixel relative to its neighbors to
enhance fine detail.
● Sobel kernels: approximate horizontal and vertical intensity gradients for
edge detection.
Identity Kernel

The Identity Kernel is the simplest and most basic kernel operation
that can be performed. The output image produced is exactly like
the image that is given as the input; it does not change the input
image. It is a square matrix with the center element equal to 1 and
all the other elements equal to 0.
Image Enhancement Techniques using OpenCV - Python

Image enhancement is the process of improving the quality and appearance of an


image. It can be used to correct flaws or defects in an image, or to simply make
an image more visually appealing. Image enhancement techniques can be applied
to a wide range of images, including photographs, scans, and digital images.
Some common goals of image enhancement include increasing contrast,
sharpness, and colorfulness; reducing noise and blur; and correcting distortion
and other defects. Image enhancement techniques can be applied manually using
image editing software, or automatically using algorithms and computer
programs such as OpenCV.
Image Restoration

Image Restoration Using Spatial Filtering

Spatial filtering is the method of filtering out noise from images using a
specific choice of spatial filters. Spatial filtering is defined as the
technique of modifying a digital image by performing an operation on small
regions or subsets of the original image pixels directly. Frequently, we use
a mask to encompass the region of the image where this predefined operation
is performed.
Histogram Processing

The histogram of a digital image with gray levels in the range [0, L-1] is the
discrete function h(r_k) = n_k, where r_k is the k-th gray level and n_k is
the number of pixels in the image having gray level r_k.
Points about Histograms:

● The histogram of an image provides a global description of the
appearance of the image.
● The information obtained from the histogram is very useful for image analysis.
● The histogram of an image represents the relative frequency of
occurrence of the various gray levels in the image.
Explanation of 4x4 Matrix
Each value represents the intensity of a pixel, typically ranging from 0
(black) to 255 (white) in an 8-bit grayscale image. In this example,
values are simplified and range from 1 to 8.

A histogram shows the frequency of pixel intensities. It helps analyze the
brightness and contrast of an image.
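The original 4x4 example matrix is not reproduced here, but a hypothetical one with intensities 1 to 8 shows how such a histogram is computed:

```python
import numpy as np

# A hypothetical 4x4 block of pixel intensities in the range 1..8.
img = np.array([[1, 2, 3, 4],
                [5, 6, 7, 8],
                [1, 2, 3, 4],
                [5, 6, 7, 8]])

# Count how often each gray level occurs: this is the (unnormalized) histogram.
levels, counts = np.unique(img, return_counts=True)
# Every level 1..8 appears exactly twice in this example.
```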