Computer Vision
INM460/IN3060
Lecture 2
Digital images and image processing
Dr Giacomo Tarroni
Slides credits: Giacomo Tarroni, Sepehr Jalali
Recap from the previous lecture
Human visual system:
• How the eye works
• Different theories for colour perception
• How the signal is transmitted from the eye to the visual cortex
• Different functions of the visual cortex parts
• Simple/complex cells
• Advantages & limitations of the human visual system
• Gestalt laws: interpreting how perception is created
• Depth and motion perception
Overview of today’s lecture
• Digital images:
◦ Formation
◦ Digital representation
• Digital image processing:
◦ Brightness and colour transformations
◦ Geometric transformations
◦ Filtering (next lecture)
Images and more images
• Images are produced at an extremely high (and growing) rate
Credits: [Link]
Imaging devices
Digital cameras, webcams, 360° cameras, smartphones, home surveillance, action cameras, camera arrays
What is an image?
“An image is a multi-dimensional signal that measures a physical quantity”
• Multi-dimensional signal:
◦ 2D: Image (a function of 𝑥 and 𝑦)
◦ 3D: Image (a function of 𝑥, 𝑦, and 𝑧, e.g. CT scan) or Video (a function of 𝑥, 𝑦, and 𝑡)
• Physical quantity:
◦ Typically visible light for standard photographs (called “natural images”)
◦ Many other possibilities (temperature, acoustic properties, etc.)
Digital photo Thermal image Ultrasound video
2D Digital image
A 2D digital image is described by a multi-dimensional function (mapping)
between spatial coordinates (𝑥 and 𝑦) and image intensity (for greyscale
images) or colour channel values (for colour images)
Greyscale image
𝐼(𝑥, 𝑦): ℤ² → ℤ
with
• 𝑥, 𝑦 spatial coordinates
• 𝐼 image intensity (typical range: [0, 255])
Colour image
𝐼(𝑥, 𝑦): ℤ² → ℤ³
with
• 𝑥, 𝑦 spatial coordinates
• 𝐼 = (𝐼_R, 𝐼_G, 𝐼_B) colour channel values (e.g. a single-channel read-out 𝐼(200, 400) = 178)
Digital camera
Typical components of modern DSLR (i.e. digital single-lens reflex) and
mirrorless cameras (also see [Link])
Credits: [Link]
From light to pixels
• The light impinging on the image sensor is a spatially continuous signal
• This signal is converted to a digital signal (i.e. a matrix of pixel values)
through spatial sampling and quantisation
Spatial sampling
Sampling converts the domain of a continuous signal into a discrete one:
• The light impinging the image plane is a continuous signal in 𝑥 and 𝑦
• In a digital camera, this signal is sampled on a regular 2D grid
• One pixel per grid cell
Let’s examine the actual process focusing on one dimension:
Spatial sampling
[Figure: output signal intensity (continuous values) along the sensor 𝑥 axis, sampled once per pixel width]
Image resolution
• Image resolution measures the detail an image holds
• It can be described in different ways:
◦ Pixel resolution: image dimensions in pixels (e.g. 640x480, 0.3 MP)
◦ Spatial resolution: size of each pixel (for each dimension) in length (e.g.
1.4x1.4 mm). Used in specific fields (e.g. medical imaging)
◦ Pixel density (for sensor/screens), pixels per inch (to be used when
printing the image), etc.
Credits: [Link]
Colour filter array and demosaicing
• Light captured by digital cameras is binned into separate red, green, and blue
values using a colour filter array (CFA)
• The Bayer pattern is the most common CFA design, based on GRBG: it has twice as
many G bins as R or B because the human eye is more sensitive to green light
• A demosaicing algorithm interpolates the data so that each pixel has a red,
green, and blue (RGB) value
◦ Final pixel resolution preserved!
◦ A raw file contains the data collected before demosaicing
• More detailed explanation: [Link]
Mosaic
Demosaiced image
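The mosaic-then-interpolate idea can be sketched with numpy alone. This is a toy illustration, not a real demosaicing algorithm: it simulates a GRBG mosaic on a random 4×4 image and then "demosaics" it with crude nearest-neighbour copying (real cameras use much smarter interpolation):

```python
import numpy as np

# Toy illustration (not a real demosaicing algorithm): simulate a GRBG
# Bayer mosaic from a small RGB image, then fill the missing channels with
# crude nearest-neighbour copying inside each 2x2 block.
rng = np.random.default_rng(0)
rgb = rng.integers(0, 256, size=(4, 4, 3), dtype=np.uint8)

# Mosaic: each pixel keeps only the channel its colour filter passes
# (GRBG: G R / B G over every 2x2 block)
mosaic = np.zeros((4, 4), dtype=np.uint8)
mosaic[0::2, 0::2] = rgb[0::2, 0::2, 1]  # G
mosaic[0::2, 1::2] = rgb[0::2, 1::2, 0]  # R
mosaic[1::2, 0::2] = rgb[1::2, 0::2, 2]  # B
mosaic[1::2, 1::2] = rgb[1::2, 1::2, 1]  # G

# "Demosaic": copy each block's single R, G (top-left) and B sample to all
# four pixels of the block, so pixel resolution is preserved
demosaiced = np.zeros((4, 4, 3), dtype=np.uint8)
for dy in (0, 1):
    for dx in (0, 1):
        demosaiced[dy::2, dx::2, 0] = mosaic[0::2, 1::2]  # R samples
        demosaiced[dy::2, dx::2, 1] = mosaic[0::2, 0::2]  # G samples
        demosaiced[dy::2, dx::2, 2] = mosaic[1::2, 0::2]  # B samples
```

Note that every pixel ends up with a full RGB triple even though the sensor measured only one value per pixel.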
Quantisation
Quantisation limits the values a pixel can have to a finite set
Let’s examine the actual process focusing on one dimension:
[Figure: output pixel intensity (discrete values) along the sensor 𝑥 axis]
Quantisation
Quantisation limits the values a pixel can have to a finite set
• In a greyscale image, if we use one byte (8 bits) per pixel, then we can
represent 28 = 256 different intensities (range [0, 255])
8 bits per pixel: 2⁸ = 256 shades | 4 bits per pixel: 2⁴ = 16 shades | 1 bit per pixel: 2¹ = 2 shades (binary image)
Credits: [Link]
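Requantisation to fewer bits can be sketched in numpy by discarding the low-order bits; quantise below is an illustrative helper, not a library function:

```python
import numpy as np

# Requantise an 8-bit image to fewer bits by zeroing the low-order bits.
# quantise is an illustrative helper, not a library function.
def quantise(img, bits):
    step = 2 ** (8 - bits)       # width of each quantisation interval
    return (img // step) * step  # snap every value to its interval floor

img = np.arange(256, dtype=np.uint8)  # all 256 possible intensities
img4 = quantise(img, 4)               # 2**4 = 16 distinct shades
img1 = quantise(img, 1)               # 2**1 = 2 shades (binary image)
```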
Quantisation
• In standard colour photographic images, we use 8 bits for each colour
channel (red, green, blue), i.e. 24 bits per pixel (24-bit colour)
• That means there are 2²⁴ possible colours for a pixel: ~16.7 million colours!
• The human eye can distinguish roughly ~10 million colours
Colour
• Colour images typically have three colour channels corresponding to the
amount of red, green, and blue present at each pixel
• Combining them together produces the final colour
Credits: [Link]
Colour theory: human retina
• Trichromatic theory of vision:
◦ Human eyes perceive colour through the stimulation of three different types
of cones (S, M, L) in the retina
◦ The three types have peak sensitivities roughly corresponding to red, green and blue
• The RGB colour model was defined based on the physiology of the human eye
• The spectral responses of the colour filters in a CFA are quite similar to those of the
LMS cones
[Figure: spectral sensitivity curves of the S, M and L cones]
RGB colour model
• First introduced in the 1800s, now used in displays
• Additive mixing property of light: red, green, blue lights are summed to form
the final colour
• RGB colour model:
◦ One axis per colour channel
◦ Range from 0 (no light) to 255 (full intensity) along each axis
◦ A colour is a point in this space and is represented as a vector: (𝐼_R, 𝐼_G, 𝐼_B)
Credits: [Link]
RGB colour model
• Additive colour mixing: RGB describes what kind of light needs to
be emitted to produce a given colour
• Light is added together to move from black to white
Colour components and common colour names:

R    G    B    Common colour name
0    0    0    Black
0    0    255  Blue
0    255  0    Green
0    255  255  Cyan
255  0    0    Red
255  0    255  Magenta
255  255  0    Yellow
255  255  255  White
255  127  0    Orange
Credits: [Link]
CMY(K) colour model
• Developed for printing
• Subtractive mixing property of ink: cyan, magenta, yellow (and black) inks
are combined to form the final colour. Each ink subtracts a portion of the light
that would otherwise be reflected from a white background
• RGB colours are subtracted from white light (W):
W − R = G + B = C;  W − G = R + B = M;  W − B = R + G = Y
• Black (K) helps to create additional colours (mixing C, M and Y alone produces a
muddy dark colour instead of true black)
Credits: [Link]
[Link]
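With channel values scaled to [0, 1], the subtractive relationship above reduces to complementation, which can be sketched directly (rgb_to_cmy is an illustrative helper, not a library function):

```python
import numpy as np

# With channel values scaled to [0, 1], CMY is the complement of RGB:
# C = 1 - R, M = 1 - G, Y = 1 - B (rgb_to_cmy is an illustrative helper)
def rgb_to_cmy(rgb):
    return 1.0 - np.asarray(rgb, dtype=float)

red_cmy = rgb_to_cmy([1.0, 0.0, 0.0])    # no cyan; magenta + yellow make red
white_cmy = rgb_to_cmy([1.0, 1.0, 1.0])  # white paper needs no ink at all
```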
HSV colour model
• More intuitive and more perceptually relevant than RGB
• Cylindrical coordinate system with
◦ Hue (H): discernible colour based on the dominant wavelength
◦ Saturation (S): vividness of the colour (zero saturation means a greyscale
colour in the centre of the cylinder)
◦ Value (V): brightness
• Example: a bright red colour has a red hue, full saturation, and high value
“How to halve the saturation in RGB space?”
Credits: [Link]
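As a sketch of the question above for a single colour, the standard-library colorsys module can do the round trip RGB → HSV → RGB (for whole images, skimage's rgb2hsv/hsv2rgb would be the natural choice); halve_saturation is an illustrative helper:

```python
import colorsys  # standard library; works on single (r, g, b) triples in [0, 1]

# One answer to the question above: leave RGB space, halve S in HSV, go back.
# halve_saturation is an illustrative helper, not a library function.
def halve_saturation(r, g, b):
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    return colorsys.hsv_to_rgb(h, s / 2, v)

bright_red = (1.0, 0.0, 0.0)  # red hue, full saturation, high value
paler_red = halve_saturation(*bright_red)  # (1.0, 0.5, 0.5)
```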
LAB colour model
• RGB and HSV are not perceptually uniform:
the distance between colours in the colour space does not match human
visual perception of colour difference
• CIE LAB attempts to be perceptually uniform
• It is based on opponent process theory: observation that for humans,
retinal colour stimuli are translated into distinctions between
◦ blue versus yellow
◦ red versus green
◦ black (dark) versus white (light)
• Larger gamut than both RGB and CMYK
Other common image types
• RGBD
In addition to a colour image (RGB), acquires a depth image (D), which
represents the distance of each pixel from the camera
𝐼(𝑥, 𝑦): ℤ² → ℤ⁴
• Volumetric images
◦ Image data stored as voxels (i.e. volume elements: small cubes or 3D pixels)
◦ Example: computed tomography (CT)
𝐼(𝑥, 𝑦, 𝑧): ℤ³ → ℤ
• Video
A sequence of images over time
𝐼(𝑥, 𝑦, 𝑡): ℤ³ → ℤ³
Image file formats
• BitMap (BMP): uncompressed, therefore large and lossless. If 24-bit colour
format, then:
◦ a 12MP image requires 36MB of storage
◦ a 30 fps video generates 30 × 36MB = 1.08GB per second
◦ a 2 hour video (image only) requires 7776GB (7.78TB) of storage
• JPEG: Joint Photographic Experts Group is a lossy compression method
• TIFF: Tagged Image File Format is a flexible format (both lossy and lossless)
• GIF: Graphics Interchange Format is usually limited to an 8-bit palette (256
colours). The GIF format is most suitable for storing graphics with few
colours, such as simple diagrams, shapes, logos and cartoon style images
• PNG: the PNG (Portable Network Graphics) file format was created as a free,
open-source alternative to GIF
Images in Python
• Some of the most common Python packages for handling images are
scikit-image, OpenCV (with its Python bindings), Pillow, and SciPy (with its
ndimage submodule)
• For most common tasks, we will use scikit-image (skimage), which is simple
to use and has a large collection of image processing algorithms
• For plotting images, we will instead use matplotlib
from [Link] import imread
import [Link] as plt
img = imread('[Link]')
[Link](img)
[Link]()
Credits: "Surinamese peppers" by Daveness_98 licensed under CC BY 2.0
Images in Python
What is the img variable? It’s a numpy ndarray object with:
• Shape:
print([Link])
>> (777, 1024, 3)
This means the image has a height of 777 (number of rows of the ndarray), width
of 1024 (number of columns), and 3 colour channels 𝐼𝑅 , 𝐼𝐺 , 𝐼𝐵
• Data type:
print([Link])
>> uint8
This means that each value in each colour channel is an unsigned integer with
8 bits of precision. Therefore there are 2⁸ = 256 possible values, in the range
[0, 255], for each channel
Credits: "Surinamese peppers" by Daveness_98 licensed under CC BY 2.0
Image data types in skimage
• Images are typically uint8 (range [0, 255]) when loaded
• Performing operations on uint8 images can generate unwanted results:
# Trying to increase red channel
# brightness
img_r = img[:, :, 0]
img_r_mod = img_r+100
fig, ax = [Link](1, 2)
ax[0].imshow(img_r, cmap='gray')
ax[1].imshow(img_r_mod, cmap='gray')
[Link]()
Values greater than 255 overflow: uint8 addition wraps around modulo 256
• In addition, some image processing operations (e.g. filtering) provide
accurate results only when using floating point data types
• Thus it is often required to convert images to float (range [0, 1] or [-1, 1])
• skimage has functions for data type conversions, e.g. img_as_float and
img_as_ubyte. These functions expect the input to be in the correct range
• Never use numpy’s astype!
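What those conversion helpers do can be sketched in plain numpy: work in float, clip to the valid range, and only then return to uint8. This manual round trip is roughly the bookkeeping img_as_float/img_as_ubyte handle for you, which is why plain astype (which does no rescaling or clipping) is discouraged:

```python
import numpy as np

# A uint8 image whose values we want to brighten by 100 without overflow
img = np.array([[200, 250],
                [10, 100]], dtype=np.uint8)

img_f = img / 255.0                                    # uint8 -> float in [0, 1]
img_f_mod = np.clip(img_f + 100 / 255.0, 0.0, 1.0)     # brighten, clip overflow
img_back = (img_f_mod * 255).round().astype(np.uint8)  # safe: range is known
```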
Image data types in skimage
Additional notes:
• The float data type is not restricted to the [-1, 1] range, but you should make sure
your images stay within it to avoid issues in later conversions
• Some skimage functions may return different data types for convenience.
You can use the conversion functions on the output if needed
• Some functions (e.g. filtering) can generate negative values, and that is
normal. However if you want to reconvert to an unsigned type you should first
rescale to the [0, 1] range, otherwise negative values will be clipped to 0
• Rescaling of intensities can be performed using the
exposure.rescale_intensity function
• Have a look at [Link]/docs/stable/user_guide/data_types.html for more details on these
aspects
Image plotting with matplotlib
The main plotting function that we will use is [Link]:
• For one-channel (height, width) inputs, the data is rescaled to the [0, 1]
range and then mapped to a chosen colormap
• For RGB (height, width, channels) inputs, the data has to be either:
◦ Float type in the [0, 1] range
◦ uint8 type ([0, 255] range)
• Out of range RGB values are clipped:
# Note: syntax not recommended!
img_r = img_as_float(img[:, :, 0])
img_r_mod = img_r+100
fig, ax = [Link](1, 2)
ax[0].imshow(img_r, cmap='gray')
ax[1].imshow(img_r_mod, cmap='gray')
[Link]()
One-channel images are rescaled before plotting
• [Link]
• [Link]
Colour space conversions
The color submodule provides many functions for colour space conversions
• rgb2gray converts from RGB to grayscale, reducing the channels from 3 to 1
and converting the data type from uint8 to float ([0, 1] range)
from [Link] import imread
from [Link] import rgb2gray
import [Link] as plt
img = imread('[Link]')
img_gray = rgb2gray(img)
fig, ax = [Link](1, 2)
ax[0].imshow(img)
ax[1].imshow(img_gray, cmap='gray')
[Link]()
Conversion to grayscale
print(img_gray.dtype)
>> float64
There are functions for the most common conversions:
• rgb2hsv and hsv2rgb
• rgb2lab and lab2rgb
Imperfections in images
• Low resolution
• Noise
• Bloom (i.e. light bleeding onto a darker background)
Imperfections in images
• Motion blur
• Poor contrast
• Compression artefacts
• Lens distortion
Digital image processing
• Digital image processing is the use of computer algorithms to transform an
image, sometimes trying to fix imperfections
• Example transformations:
◦ Brightness and colour transformations
◦ Geometric transformations
◦ Filtering (next lecture)
Brightness transformations
• Brightness transformations are position-independent and take the form
𝐽 = 𝑓(𝐼)
where
• 𝐼(𝑥, 𝑦) is the intensity (or colour channel value) of the original image
• 𝐽(𝑥, 𝑦) is the intensity (or colour channel value) of the transformed image
• 𝑓 describes the transformation between them (e.g. 𝐽 = 𝐼 + 100)
• The function 𝑓 is independent of position in the image (therefore the (𝑥, 𝑦)
arguments are dropped in the equation above): the result of 𝑓 depends on
the intensity of each pixel and not on its position
• Additional notes:
◦ Care must be taken to ensure data types and their ranges are respected
◦ Most transformations will be introduced for grayscale images. For colour
images, either apply the transformation channel-by-channel or convert to
HSV and apply to V channel
Negative
The negative of a 24-bit colour image can be formed simply as
𝐽 = 255 − 𝐼
[imports]
img = imread('[Link]')
img_neg = 255 - img
fig, ax = [Link](1, 2)
ax[0].imshow(img)
ax[1].imshow(img_neg)
[Link]()
Original image Negative
Tinting
Tinting (and colour balancing) applies an adjustment to the colours, normally
through a multiplication
𝐼_R′ = 𝑠_R · 𝐼_R
𝐼_G′ = 𝑠_G · 𝐼_G
𝐼_B′ = 𝑠_B · 𝐼_B
where 𝑠_R, 𝑠_G, and 𝑠_B scale the red, green, and blue colour of each pixel
Tinting
Example: increasing redness of an image
𝑠𝑅 = 2, 𝑠𝐺 = 1, and 𝑠𝐵 = 1:
[imports]
s_r, s_g, s_b = 2, 1, 1
# Load and convert to float
img = img_as_float(imread('[Link]'))
img_tint = np.empty_like(img)
img_r = img[:, :, 0]
img_g = img[:, :, 1]
img_b = img[:, :, 2]
img_tint[:, :, 0] = s_r * img_r
img_tint[:, :, 1] = s_g * img_g
img_tint[:, :, 2] = s_b * img_b
img_tint[img_tint > 1] = 1  # Clip values > 1
fig, ax = [Link](2, 1)
ax[0].imshow(img)
ax[1].imshow(img_tint)
[Link]()
[Figures: original image vs transformed image]
Histogram
• A histogram gives the count of pixel intensities in an image
• The full available range is divided into bins for easier interpretation
• Can be computed both for each colour channel or grayscale version
• It can be computed with matplotlib’s hist function (remember to flatten
your input!) or with numpy’s histogram function
• Note: the dynamic range of an image is easily defined using the histogram as
the difference between the min and max intensity values in an image
[Figure: image and its histogram; min 𝐼 = 103 and max 𝐼 = 210 delimit the dynamic range]
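A minimal numpy version of the histogram computation described above, on a tiny made-up image:

```python
import numpy as np

# Histogram of a tiny image: one bin per possible uint8 intensity
img = np.array([[103, 160, 160],
                [210, 160, 103]], dtype=np.uint8)

counts, bin_edges = np.histogram(img.ravel(), bins=256, range=(0, 256))
dynamic_range = int(img.max()) - int(img.min())  # 210 - 103 = 107
```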
Histogram
• A histogram can be normalised using the total number of pixels
• The result is a probability density function (pdf) representing the probability
that a random pixel has a particular intensity
• [Link] with density = True
• In the example in this slide, we could say that a pixel has a higher probability
of having an intensity of 160 than an intensity of 200
Contrast stretching
• If most of the intensities are clustered in the [100, 200] range, the image looks
“washed out” (lack of darker and brighter values)
• Contrast stretching is a linear transformation that aims at extending the
dynamic range of the image
• We can transform the values using a function
𝐽 = 𝛼𝐼 + 𝛽
with 𝛼 and 𝛽 user-defined parameters
• skimage function: exposure.rescale_intensity (where instead of 𝛼 and 𝛽, the
user defines the intensity values to be linearly stretched)
[Figures: original image vs after contrast stretching]
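A minimal numpy sketch of the linear stretch (stretch is an illustrative helper; exposure.rescale_intensity is the skimage equivalent). Here the observed [min, max] range is mapped onto the full [0, 255] range, i.e. 𝛼 = 255/(max − min) and 𝛽 = −𝛼·min:

```python
import numpy as np

# Linear contrast stretching: map the observed [min, max] onto [0, 255]
# (J = alpha * I + beta with alpha = 255/(max - min), beta = -alpha * min).
def stretch(img):
    lo, hi = float(img.min()), float(img.max())
    out = (img - lo) * 255.0 / (hi - lo)  # rescale first...
    return out.astype(np.uint8)           # ...then it is safe to cast

img = np.array([[100, 150],
                [175, 200]], dtype=np.uint8)  # "washed out": range [100, 200]
stretched = stretch(img)                      # now spans the full [0, 255]
```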
Gamma correction
• Gamma correction applies a non-linear transformation of pixel values
• It tries to take advantage of the non-linear manner in which humans perceive
light (greater sensitivity to differences between darker tones than lighter ones)
• Non-linear: relative distances between pixel values can both increase and
decrease for different ranges
• The mathematical form is
𝐽 = 𝐴·𝐼^𝛾
where 𝛾 is a parameter and usually 𝐴 = 255^(1−𝛾) for an image with range [0, 255]
• skimage function: exposure.adjust_gamma
[Figures: original image vs after gamma correction]
Credits: [Link]
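A sketch of the gamma mapping on a float image in [0, 1] (so 𝐴 = 1); exposure.adjust_gamma implements the same idea:

```python
import numpy as np

# Gamma correction J = A * I**gamma on a float image in [0, 1], so A = 1.
# gamma < 1 brightens darker tones; gamma > 1 darkens them.
img = np.array([[0.0, 0.25],
                [0.5, 1.0]])

brighter = img ** 0.5  # gamma = 0.5
darker = img ** 2.0    # gamma = 2.0
```

Note that 0 and 1 are fixed points of the mapping: only the tones in between move.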
Cumulative distribution function
• The cumulative distribution function (cdf) gives, for a given intensity value,
the percentage of pixels in the image that have that value or a lower one
• It can be easily computed from the normalised histogram (pdf) by adding, for
each bin, the values of all previous bins (a cumulative sum)
• [Link] with density = True and cumulative = True
[Figure: cdf]
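The pdf-to-cdf step is a cumulative sum, e.g. with numpy on a tiny 4-intensity image:

```python
import numpy as np

# pdf -> cdf on a tiny 4-intensity image: the cdf is just a running sum
img = np.array([[0, 1, 1, 3]], dtype=np.uint8)

counts, _ = np.histogram(img.ravel(), bins=4, range=(0, 4))
pdf = counts / counts.sum()  # normalised histogram
cdf = np.cumsum(pdf)         # P(intensity <= value); ends at 1.0
```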
Histogram equalization
• Histogram equalisation applies a non-linear transformation that tries to
uniformly distribute the pixel values in the dynamic range
• Pixels with different intensities can be assigned to the same bin as a
consequence of this transformation
• Histogram equalisation leverages the information in the cdf. The
output’s cdf will be roughly a straight line
• skimage functions:
◦ exposure.equalize_hist
◦ exposure.equalize_adapthist
• [Link]ns/26818568/whats-the-difference-between-histeq-and-adapthisteq
[Figures: original image vs after histogram equalisation]
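A minimal numpy sketch of the idea (equalise is an illustrative helper; the skimage functions above are the production versions): each pixel is mapped through the image's own cdf:

```python
import numpy as np

# Minimal histogram equalisation sketch: map each pixel through the image's
# own cdf so that the output cdf becomes roughly a straight line.
def equalise(img):
    counts, _ = np.histogram(img.ravel(), bins=256, range=(0, 256))
    cdf = np.cumsum(counts) / img.size        # cdf in [0, 1]
    return (cdf[img] * 255).astype(np.uint8)  # look up each pixel's cdf value

img = np.array([[100, 100],
                [150, 200]], dtype=np.uint8)
eq = equalise(img)
```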
Geometric transformations
• Geometric transformations change the spatial position of pixels in the
image
• Geometric transformations (generically called image warpings) have a variety
of practical uses, including
◦ Registration: aligning different (but similar) images
◦ Removing distortion
◦ Simplifying further processing
[Figures: distorted image vs corrected image]
Geometric transformations
• The positions of pixels in the image are transformed
• Mathematically, this is expressed as
𝒙′ = 𝑇(𝒙)
where:
◦ 𝒙 = (𝑥, 𝑦) is the position of a point in the distorted image 𝐼
◦ 𝒙′ = (𝑥′, 𝑦′) is the position of a point in the corrected image 𝐽
◦ 𝑇 𝒙 is a mapping function
• The easiest way to implement an image warping from 𝐼 to 𝐽 is as follows:
For every pixel position 𝒙′ in the corrected image:
◦ Using 𝑇 −1 , determine 𝒙 (i.e. where 𝒙′ came from in the distorted image)
◦ Interpolate a value from 𝐼(𝒙) to produce 𝐽(𝒙′)
Geometric transformations
• You may notice this is applied somewhat backwards: rather than using a
(forward) mapping 𝑇 to transform pixels from the distorted image to the
corrected image, we use the (inverse) transform 𝑇 −1
𝑇 (forward mapping) may result in gaps
𝑇⁻¹ (inverse mapping) ensures no gaps
Interpolation
• This ensures that all the pixels in the corrected image will be filled
• However, it’s often necessary to interpolate pixels from the distorted image
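The backward-mapping loop can be sketched for the simplest case, a pure translation with nearest-neighbour sampling (warp_translate is an illustrative helper, not skimage's warp):

```python
import numpy as np

# Backward mapping sketch: for every output pixel x', use the inverse
# transform to find its source position x, then sample the input image.
# The transform here is a pure translation by (tx, ty); "interpolation"
# is nearest-neighbour, and out-of-bounds pixels get a fill value.
def warp_translate(I, tx, ty, fill=-1):
    h, w = I.shape
    J = np.full((h, w), fill, dtype=I.dtype)
    for yp in range(h):
        for xp in range(w):
            x, y = xp - tx, yp - ty  # inverse mapping: T^-1(x') = x' - t
            if 0 <= x < w and 0 <= y < h:
                J[yp, xp] = I[y, x]
    return J

I = np.arange(9).reshape(3, 3)
J = warp_translate(I, 1, 0)  # shift the image one pixel to the right
```

Because every output pixel is visited exactly once, the result has no gaps, exactly as argued above.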
Affine transformation
• Affine transformations are transformations that preserve (among others):
◦ Collinearity (i.e. aligned points will remain aligned)
◦ Parallelism (i.e. parallel lines will remain parallel)
• An affine transformation can be fully expressed through a matrix product:
𝒙′ = 𝐴𝒙

[x′]   [a11 a12 a13]   [x]   [a11·x + a12·y + a13]
[y′] = [a21 a22 a23] · [y] = [a21·x + a22·y + a23]
[1 ]   [ 0   0   1 ]   [1]   [         1         ]
• In skimage, you can define an affine transformation using
[Link], either:
◦ Passing the matrix parameters
A = [Link]([[a11, a12, a13], [a21, a22, a23], [0, 0, 1]])
tform = [Link](A)
◦ Passing the explicit transformation parameters (see next slides)
• To apply the transformation to an image, you can use [Link]
Special cases: translation
• Translation
[x′]   [1 0 tx]   [x]   [x + tx]
[y′] = [0 1 ty] · [y] = [y + ty]
[1 ]   [0 0 1 ]   [1]   [  1   ]
tform = [Link](translation=(tx, ty))
from [Link] import imread
from skimage import transform
import [Link] as plt
img = imread('[Link]')
# Define affine transformation
tform = [Link](translation=(100, 0))
# Apply to image (note: inverse transformation!)
img_transformed = [Link](img, [Link])
fig, ax = [Link](2, 1)
ax[0].imshow(img)
ax[1].imshow(img_transformed)
[Link]()
Special cases: rotation
• Rotation
[x′]   [cos 𝜃  −sin 𝜃  0]   [x]   [x·cos 𝜃 − y·sin 𝜃]
[y′] = [sin 𝜃   cos 𝜃  0] · [y] = [x·sin 𝜃 + y·cos 𝜃]
[1 ]   [  0       0    1]   [1]   [        1        ]
tform = [Link](rotation=theta)
from [Link] import imread
from skimage import transform
import [Link] as plt
from math import radians
img = imread('[Link]')
# Define affine transformation
tform = [Link](rotation=radians(30))
# Apply to image (note: inverse transformation!)
img_transformed = [Link](img, [Link])
fig, ax = [Link](2, 1)
ax[0].imshow(img)
ax[1].imshow(img_transformed)
[Link]()
Special cases: scaling
• Scaling
[x′]   [sx  0  0]   [x]   [sx·x]
[y′] = [ 0 sy  0] · [y] = [sy·y]
[1 ]   [ 0  0  1]   [1]   [  1 ]
tform = [Link](scale=(sx, sy))
from [Link] import imread
from skimage import transform
import [Link] as plt
img = imread('[Link]')
# Define affine transformation
tform = [Link](scale=(0.8, 0.5))
# Apply to image (note: inverse transformation!)
img_transformed = [Link](img, [Link])
fig, ax = [Link](2, 1)
ax[0].imshow(img)
ax[1].imshow(img_transformed)
[Link]()
Special cases: shear
• Shear (or skew)
[x′]   [1  −sin 𝜃  0]   [x]   [x − y·sin 𝜃]
[y′] = [0   cos 𝜃  0] · [y] = [  y·cos 𝜃  ]
[1 ]   [0     0    1]   [1]   [     1     ]
tform = [Link](shear=theta)
from [Link] import imread
from skimage import transform
import [Link] as plt
from math import radians
img = imread('[Link]')
# Define affine transformation
tform = [Link](shear=radians(30))
# Apply to image (note: inverse transformation!)
img_transformed = [Link](img, [Link])
fig, ax = [Link](2, 1)
ax[0].imshow(img)
ax[1].imshow(img_transformed)
[Link]()
Additional notes
Regarding transformations:
• You can always access the affine matrix of an AffineTransform object by
using [Link]
• Different tform objects can be combined sequentially: see
[Link]/docs/stable/auto_examples/transform/plot_transform_types.html
• For rotation around the image centre and simple rescaling and resizing
operations, there are dedicated functions in the transform submodule called
rotate, rescale, resize. More details at
[Link]/docs/stable/auto_examples/transform/plot_rescale.html
Regarding [Link]:
• Interpolation is usually needed, the type of which can be specified
• How to deal with points outside the boundaries can also be specified
• In line with what was said in the previous slides, in transformations the source
image is considered the output, and the destination the input. As a
consequence, tform must be inverted before warping
Exercise
Test: manually transform image in order to have well-aligned digits
from [Link] import imread
from skimage import transform
import [Link] as plt
from math import radians
img = imread('[Link]')
# Rotate around centre
img_tran = [Link](img, -15)
# Define affine transformation (translation + shear)
tform = [Link](translation=(-30, 0),
shear=radians(-17))
# Apply to image
img_tran2 = [Link](img_tran, [Link])
fig, ax = [Link](3, 1)
ax[0].imshow(img)
ax[1].imshow(img_tran)
ax[2].imshow(img_tran2)
[Link]()
Common CV tasks
Image matching/alignment/registration
• Identify the transformation which aligns two or more images
• Used for multi-modal comparison, image stitching, etc.
M. Brown. Automatic panoramic image stitching using invariant features, IJCV 2007
Image matching
Test: find the affine transformation that aligns two images
• The goal is to determine the six coefficients in the 𝐴 matrix
• This can be achieved by finding at least 3 correspondences (e.g. plate
corners) in the two images
[Figure: correspondence points 𝒑1–𝒑3 and 𝒒1–𝒒3 marked on the two images]
Let’s assume that we have manually selected the following correspondences:
𝒑1 = (18, 47)ᵀ    𝒒1 = (48, 50)ᵀ
𝒑2 = (15, 100)ᵀ   𝒒2 = (48, 100)ᵀ
𝒑3 = (178, 6)ᵀ    𝒒3 = (212, 50)ᵀ
Image matching
For each pair of corresponding points 𝒑–𝒒 we must have

[x′]   [a11 a12 a13]   [x]   [a11·x + a12·y + a13]
[y′] = [a21 a22 a23] · [y] = [a21·x + a22·y + a23]
[1 ]   [ 0   0   1 ]   [1]   [         1         ]
Then for three correspondences we can write
[x1′]   [x1 y1 1  0  0  0]   [a11]
[y1′]   [ 0  0 0 x1 y1  1]   [a12]
[x2′] = [x2 y2 1  0  0  0] · [a13]
[y2′]   [ 0  0 0 x2 y2  1]   [a21]
[x3′]   [x3 y3 1  0  0  0]   [a22]
[y3′]   [ 0  0 0 x3 y3  1]   [a23]
Or in matrix form
𝒒 = 𝑀𝒂
• The 6 coefficients of the affine transformation can be found with 𝒂 = 𝑀⁻¹𝒒
• If we had more than 3 correspondences, we could have solved this using
least squares
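Solving the 6×6 system for the correspondences given above can be sketched with numpy (with more than 3 correspondences, np.linalg.lstsq would replace solve):

```python
import numpy as np

# Correspondences from the slides: p_i in the source image, q_i in the target
P = [(18, 47), (15, 100), (178, 6)]
Q = [(48, 50), (48, 100), (212, 50)]

# Build the 6x6 system q = M a, two rows per correspondence
M = np.zeros((6, 6))
q = np.zeros(6)
for i, ((x, y), (xp, yp)) in enumerate(zip(P, Q)):
    M[2 * i] = [x, y, 1, 0, 0, 0]
    M[2 * i + 1] = [0, 0, 0, x, y, 1]
    q[2 * i], q[2 * i + 1] = xp, yp

a = np.linalg.solve(M, q)  # a = M^-1 q; use np.linalg.lstsq for > 3 pairs

# Assemble the full 3x3 affine matrix
A = np.array([[a[0], a[1], a[2]],
              [a[3], a[4], a[5]],
              [0.0, 0.0, 1.0]])
```

As a sanity check, applying A to each 𝒑 in homogeneous coordinates reproduces the corresponding 𝒒 exactly.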
Image matching
skimage method: [Link]
[imports]
img = imread('[Link]')
# Define arrays of point coordinates
P_points = [Link]([[18, 47], [15, 100], [178, 6]])
Q_points = [Link]([[48, 50], [48, 100], [212, 50]])
# Create empty tform object
tform = [Link]()
# Estimate tform parameters using point correspondences
[Link](Q_points, P_points)
# Apply to image
img_aligned = [Link](img, tform)
Projective transformation
• Images normally acquired by photographic cameras are formed by
perspective projection
• A rectangular surface not parallel to the image plane will be projected into a
trapezoid
• If we want to transform it into a rectangle, an affine transformation will not be
enough
• Instead, we must use a projective transformation:
[x′]   [p11 p12 p13]   [x]
[y′] = [p21 p22 p23] · [y]
[w′]   [p31 p32  1 ]   [1]
where the final pixel coordinates are obtained by dividing by 𝑤′: (𝑥′/𝑤′, 𝑦′/𝑤′)
• To perform image matching with a projective transformation, at least 4
correspondences are needed
• This is due to the eight unknowns 𝑝_ij, which correspond to the degrees of
freedom (DOF) of this type of projective transformation
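Estimating the eight coefficients can be sketched the same way as in the affine case, after expanding the division by 𝑤′ into linear equations. The four correspondences below are made up for illustration (a trapezoid mapped onto the unit square):

```python
import numpy as np

# Hypothetical correspondences: trapezoid corners (src) to unit square (dst)
src = [(0.1, 0.0), (0.9, 0.0), (1.0, 1.0), (0.0, 1.0)]
dst = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]

# Each pair (x, y) -> (x', y') gives two linear equations in the eight p_ij:
# p11 x + p12 y + p13 - x'(p31 x + p32 y) = x'   (and similarly for y')
M = np.zeros((8, 8))
b = np.zeros(8)
for i, ((x, y), (xp, yp)) in enumerate(zip(src, dst)):
    M[2 * i] = [x, y, 1, 0, 0, 0, -xp * x, -xp * y]
    M[2 * i + 1] = [0, 0, 0, x, y, 1, -yp * x, -yp * y]
    b[2 * i], b[2 * i + 1] = xp, yp

p = np.linalg.solve(M, b)
H = np.array([[p[0], p[1], p[2]],
              [p[3], p[4], p[5]],
              [p[6], p[7], 1.0]])

# Applying H: the homogeneous result must be divided by w'
v = H @ np.array([0.1, 0.0, 1.0])
mapped = v[:2] / v[2]  # first trapezoid corner lands on (0, 0)
```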
Projective transformation
Test: transform the pinball surface into a square
Original Affine Projective
Hierarchy of 2D transformations
Transformation (DOF) | Matrix | Properties preserved
Projective (8 DOF) | [p11 p12 p13; p21 p22 p23; p31 p32 1] | Collinearity
Affine (6 DOF) | [a11 a12 a13; a21 a22 a23; 0 0 1] | + Parallelism of lines
Rigid (3 DOF) | [cos 𝜃 −sin 𝜃 t_x; sin 𝜃 cos 𝜃 t_y; 0 0 1] | + Lengths, angles, areas
with DOF meaning “degrees of freedom”
Credits: Marc Pollefeys
Non-linear transformations
• Non-linear transformations cannot be represented as a matrix multiplication
• For them, we need to use the general function that transforms pixel locations
independently:
𝒙′ = 𝑇(𝒙)
• Examples of common non-linear transformations include:
Radial lens distortion Non-rigid transformation
Overview of next week’s lecture
• Image filtering
• Linear filtering:
◦ Convolution
◦ Common filters: moving average, Gaussian, sharpening
• Non-linear filtering: median filter
• Edge detection
◦ Gradient-based
◦ Laplacian-based
◦ Non-maximum suppression