Digital Image Processing Overview

The lecture covers digital images and image processing, including the formation and representation of digital images, as well as transformations and filtering techniques. Key topics include the human visual system, image resolution, color models (RGB, CMY, HSV, LAB), and common image file formats. The lecture also introduces Python packages for image handling and processing, emphasizing the importance of data types in image manipulation.


Computer Vision

INM460/IN3060

Lecture 2
Digital images and image processing

Dr Giacomo Tarroni
Slides credits: Giacomo Tarroni, Sepehr Jalali
Recap from the previous lecture
Human visual system:
• How the eye works
• Different theories for colour perception
• How the signal is transmitted from the eye to the visual cortex
• Different functions of the visual cortex parts
• Simple/complex cells
• Advantages & limitations of the human visual system
• Gestalt laws: interpreting how perception is created
• Depth and motion perception
Overview of today’s lecture
• Digital images:
◦ Formation
◦ Digital representation
• Digital image processing:
◦ Brightness and colour transformations
◦ Geometric transformations
◦ Filtering (next lecture)
Images and more images
• Images are produced at an extremely high (and growing) rate
(Figure: number of images produced, 1990s vs. today)

Credits: [Link]
Imaging devices

Digital cameras Home surveillance


Webcams Action cameras

360° cameras Camera arrays


Smartphones
What is an image?
“An image is a multi-dimensional signal that measures a physical quantity”
• Multi-dimensional signal:
◦ 2D: Image (a function of 𝑥 and 𝑦)
◦ 3D:
• Image (a function of 𝑥, 𝑦, and 𝑧, e.g. CT scan)
• Video (a function of 𝑥, 𝑦 and 𝑡)
• Physical quantity:
◦ Typically visible light for standard photographs (called “natural images”)
◦ Many other possibilities (temperature, acoustic properties, etc.)

Digital photo Thermal image Ultrasound video


2D Digital image
A 2D digital image is described by a multi-dimensional function (mapping)
between spatial coordinates (𝑥 and 𝑦) and image intensity (for greyscale
images) or colour channel values (for colour images)

Greyscale image
𝐼(𝑥, 𝑦): ℤ² → ℤ
with
• 𝑥, 𝑦 spatial coordinates
• 𝐼 image intensity (typical range: [0, 255])

Colour image
𝐼(𝑥, 𝑦): ℤ² → ℤ³
with
• 𝑥, 𝑦 spatial coordinates
• 𝐼 = (𝐼𝑅, 𝐼𝐺, 𝐼𝐵) colour channel values (e.g. 𝐼(200, 400) = 178 for one channel)
Digital camera
Typical components of modern DSLR (i.e. digital single-lens reflex) and
mirrorless cameras (also see [Link])

Credits: [Link]
From light to pixels
• The light impinging on the image sensor is a spatially continuous signal
• This signal is converted to a digital signal (i.e. a matrix of pixel values)
through spatial sampling and quantisation

(Diagram: spatial sampling and quantisation of the incoming light signal)
Spatial sampling
Sampling converts the domain of a continuous signal into a discrete one:
• The light impinging the image plane is a continuous signal in 𝑥 and 𝑦
• In a digital camera, this signal is sampled on a regular 2D grid
• One pixel per grid cell

Let’s examine the actual process focusing on one dimension:

(Figure: continuous output signal intensity along the sensor 𝑥 axis, sampled once per pixel width)
Image resolution
• Image resolution measures the detail an image holds
• It can be described in different ways:
◦ Pixel resolution: image dimensions in pixels (e.g. 640x480, 0.3 MP)
◦ Spatial resolution: size of each pixel (for each dimension) in length (e.g.
1.4x1.4 mm). Used in specific fields (e.g. medical imaging)
◦ Pixel density (for sensor/screens), pixels per inch (to be used when
printing the image), etc.
(Figure: a 640×480 image whose pixels each measure 1.4 × 1.4 mm)
Credits: [Link]
Colour filter array and demosaicing
• Light captured by digital cameras is binned into separate red, green, and blue
values using a colour filter array (CFA)
• The Bayer pattern is the most common CFA design, based on GRBG: it uses twice as
many G bins as R or B because of the human eye’s higher sensitivity to green light
• A demosaicing algorithm interpolates the data so that each pixel has a red,
green, and blue (RGB) value
◦ Final pixel resolution preserved!
◦ A raw file stores the data collected before demosaicing
• More detailed explanation: [Link]

Mosaic
Demosaiced image
Quantisation
Quantisation limits the values a pixel can have to a finite set

Let’s examine the actual process focusing on one dimension:

(Figure: discrete output pixel intensity along the sensor 𝑥 axis)
Quantisation
Quantisation limits the values a pixel can have to a finite set
• In a greyscale image, if we use one byte (8 bits) per pixel, then we can
represent 2⁸ = 256 different intensities (range [0, 255])

8 bits per pixel: 2⁸ = 256 shades
4 bits per pixel: 2⁴ = 16 shades
1 bit per pixel: 2¹ = 2 shades (binary image)
Credits: [Link]
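The shade counts above can be checked with a few lines of numpy (a minimal sketch of uniform requantisation; the quantise helper and the synthetic ramp image are made up for the demo):

```python
import numpy as np

def quantise(img, bits):
    """Reduce an 8-bit greyscale image to 2**bits intensity levels."""
    step = 256 // (2 ** bits)      # width of each quantisation bin
    return (img // step) * step    # snap every value to its bin's lowest level

ramp = np.arange(256, dtype=np.uint8)        # synthetic greyscale ramp "image"
print(len(np.unique(quantise(ramp, 8))))     # → 256 shades
print(len(np.unique(quantise(ramp, 4))))     # → 16 shades
print(len(np.unique(quantise(ramp, 1))))     # → 2 shades (binary image)
```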
Quantisation
• In standard colour photographic images, we use 8 bits for each colour
channel (red, green, blue), or 24 bits per pixel (i.e. 24-bit colour)
• That means there are 2²⁴ possible colours for a pixel: ~16.7 million colours!
• The human eye can distinguish roughly ~10 million colours
Colour
• Colour images typically have three colour channels corresponding to the
amount of red, green, and blue present at each pixel
• Combining them together produces the final colour

Credits: [Link]
Colour theory: human retina
• Trichromatic theory of vision:
◦ Human eyes perceive colour through stimulation of three different types
of cones (S, M, L) in the retina
◦ The three types have peak sensitivities roughly around red, green and blue
• The RGB colour model was defined based on the physiology of the human eye
• The spectral responses of the colour filters in a CFA are quite similar to
those of the L, M and S cones

(Figure: spectral sensitivity curves of the S, M and L cones)
RGB colour model
• First introduced in the 1800s, now used in displays
• Additive mixing property of light: red, green, blue lights are summed to form
the final colour
• RGB colour model:
◦ One axis per colour channel
◦ Range from 0 (no light) to 255 (full intensity) along each axis
◦ A colour is a point in this space and is represented as a vector: 𝐼𝑅 , 𝐼𝐺 , 𝐼𝐵

Credits: [Link]
RGB colour model
• Additive colour mixing: RGB describes what kind of light needs to
be emitted to produce a given colour
• Light is added together to move from black to white

R    G    B     Common colour name
0    0    0     Black
0    0    255   Blue
0    255  0     Green
0    255  255   Cyan
255  0    0     Red
255  0    255   Magenta
255  255  0     Yellow
255  255  255   White
255  127  0     Orange

Credits: [Link]
CMY(K) colour model
• Developed for printing
• Subtractive mixing property of ink: cyan, magenta, yellow (and black) inks
are summed to form the final colour. Each ink subtracts a portion of the light
that would otherwise be reflected from a white background
• RGB colours are subtracted from white light (W):
W − R = G + B = C;  W − G = R + B = M;  W − B = R + G = Y
• Black (K) ink helps to create additional dark colours and a true black
(mixing the three coloured inks alone yields a muddy dark colour instead of black)

Credits: [Link]
[Link]
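The subtractive relationship above can be written directly as a conversion on normalised channel values (a minimal sketch; the rgb_to_cmy helper and example pixels are made up for the demo):

```python
import numpy as np

# Subtractive mixing: each ink value is what must be removed from white.
# Channel values are normalised to [0, 1].
def rgb_to_cmy(rgb):
    return 1.0 - np.asarray(rgb, dtype=float)

print(rgb_to_cmy([1, 0, 0]))   # red   → [0. 1. 1.]: magenta + yellow ink, no cyan
print(rgb_to_cmy([1, 1, 1]))   # white → [0. 0. 0.]: no ink at all
print(rgb_to_cmy([0, 0, 0]))   # black → [1. 1. 1.]: all three inks (in practice, K ink)
```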
HSV colour model
• More intuitive and more perceptually relevant than RGB
• Cylindrical coordinate system with
◦ Hue (H): discernible colour based on the dominant wavelength
◦ Saturation (S): vividness of the colour (zero saturation means a greyscale
colour in the centre of the cylinder)
◦ Value (V): brightness
• Example: a bright red colour has a red hue, full saturation, and high value

“How to halve the saturation in RGB space?”

Credits: [Link]
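The slide’s question (“How to halve the saturation in RGB space?”) has no one-line RGB answer, but in HSV it is a single operation on the S coordinate. A minimal sketch using Python’s standard-library colorsys (skimage’s rgb2hsv performs the same per-pixel conversion for whole images):

```python
import colorsys

r, g, b = 1.0, 0.0, 0.0                     # bright red (RGB in [0, 1])
h, s, v = colorsys.rgb_to_hsv(r, g, b)      # hue = 0, saturation = 1, value = 1
faded = colorsys.hsv_to_rgb(h, s / 2, v)    # halve only the saturation
print(faded)                                # → (1.0, 0.5, 0.5), a washed-out red
```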
LAB colour model
• RGB and HSV are not perceptually uniform:
the distance between colours in the colour space does not match human
visual perception of colour difference
• CIE LAB attempts to be perceptually uniform
• It is based on opponent process theory: observation that for humans,
retinal colour stimuli are translated into distinctions between
◦ blue versus yellow
◦ red versus green
◦ black (dark) versus white (light)
• Larger gamut than both RGB and CMYK
Other common image types
• RGBD
In addition to a colour image (RGB), acquires
a depth image (D), which represents the
distance of each scene point from the camera
𝐼(𝑥, 𝑦): ℤ² → ℤ⁴

• Volumetric images
• Image data stored as voxels (i.e. volume
elements: small cubes or 3D pixels)
• Example: computed tomography (CT)
𝐼(𝑥, 𝑦, 𝑧): ℤ³ → ℤ

• Video
A sequence of images over time
𝐼(𝑥, 𝑦, 𝑡): ℤ³ → ℤ³
Image file formats
• BitMap (BMP): uncompressed, therefore large but lossless. With 24-bit colour:
◦ a 12 MP image requires 12,000,000 × 3 bytes = 36 MB of storage
◦ a 30 fps video generates 30 × 36 MB = 1080 MB ≈ 1.08 GB per second
◦ a 2-hour video (images only) requires 7200 × 1.08 GB = 7776 GB (≈ 7.78 TB) of storage
• JPEG: Joint Photographic Experts Group is a lossy compression method
• TIFF: Tagged Image File Format is a flexible format (both lossy and lossless)
• GIF: Graphics Interchange Format is usually limited to an 8-bit palette (256
colours). The GIF format is most suitable for storing graphics with few
colours, such as simple diagrams, shapes, logos and cartoon style images
• PNG: the PNG (Portable Network Graphics) file format was created as a free,
open-source alternative to GIF
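The BMP storage figures above can be verified with simple arithmetic (a back-of-the-envelope check, assuming decimal MB/GB):

```python
bytes_per_pixel = 3                      # 24-bit colour
frame = 12_000_000 * bytes_per_pixel     # 12 MP image → 36 MB
per_second = 30 * frame                  # 30 fps → 1.08 GB per second
two_hours = 2 * 3600 * per_second        # → 7776 GB ≈ 7.78 TB

print(frame / 1e6, per_second / 1e9, two_hours / 1e9)   # → 36.0 1.08 7776.0
```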
Images in Python
• Some of the most common Python packages for handling images are
scikit-image, OpenCV (with its Python bindings), Pillow, and SciPy (with its
ndimage submodule)
• For most common tasks, we will use scikit-image (skimage), which is simple
to use and has a large collection of image processing algorithms
• For plotting images, we will instead use matplotlib

from skimage.io import imread
import matplotlib.pyplot as plt

img = imread('[Link]')
plt.imshow(img)
plt.show()

Credits: "Surinamese peppers" by Daveness_98 licensed under CC BY 2.0


Images in Python
What is the img variable? It’s a numpy ndarray object with:
• Shape:
print(img.shape)
>> (777, 1024, 3)

This means the image has a height of 777 (number of rows of the ndarray), a width
of 1024 (number of columns), and 3 colour channels (𝐼𝑅, 𝐼𝐺, 𝐼𝐵)

• Data type:
print(img.dtype)
>> uint8

This means that each value in each colour channel is an unsigned integer with
8 bits of precision. Therefore there are 2⁸ = 256 possible values, in the
range [0, 255], for each channel

Credits: "Surinamese peppers" by Daveness_98 licensed under CC BY 2.0


Image data types in skimage
• Images are typically uint8 (range [0, 255]) when loaded
• Performing operations on uint8 images can generate unwanted results:

# Trying to increase red channel brightness
img_r = img[:, :, 0]
img_r_mod = img_r + 100

fig, ax = plt.subplots(1, 2)
ax[0].imshow(img_r, cmap='gray')
ax[1].imshow(img_r_mod, cmap='gray')
plt.show()

Sums greater than 255 wrap around: the stored value is the result modulo 256

• In addition, some image processing operations (e.g. filtering) provide
accurate results only when using floating point data types
• Thus it is often required to convert images to float (range [0, 1] or [-1, 1])
• skimage has functions for data type conversions, e.g. img_as_float and
img_as_ubyte. These functions expect the input to be in the correct range
• Never use numpy’s astype!
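A toy numpy demo of the wraparound problem above (the 1×2 “image” is made up for illustration; dividing by 255 mimics what img_as_float does for uint8 inputs):

```python
import numpy as np

img_r = np.array([[100, 200]], dtype=np.uint8)
print(img_r + 100)               # → [[200  44]]: 300 wraps around to 300 % 256 = 44

# Converting to float first avoids the wrap (out-of-range sums can then be clipped):
img_f = img_r / 255.0            # true division yields float64 in [0, 1]
print(np.clip(img_f + 100 / 255.0, 0.0, 1.0))
```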
Image data types in skimage
Additional notes:
• The float data type is not restricted to the [-1, 1] range, but you should
make sure your images stay within it to avoid issues in future conversions
• Some skimage functions may return different data types for convenience.
You can use the conversion functions on the output if needed
• Some functions (e.g. filtering) can generate negative values, and that is
normal. However if you want to reconvert to an unsigned type you should first
rescale to the [0, 1] range, otherwise negative values will be clipped to 0
• Rescaling of intensities can be performed using the
exposure.rescale_intensity function
• Have a look at [Link]/docs/stable/user_guide/data_types.html for more
details on these aspects
Image plotting with matplotlib
The main plotting function that we will use is plt.imshow:
• For one-channel (height, width) inputs, the data is rescaled to the [0, 1]
range and then mapped to a chosen colormap
• For RGB (height, width, channels) inputs, the data has to be either:
◦ Float type in the [0, 1] range
◦ uint8 ([0, 255] range)
• Out-of-range RGB values are clipped:

# Note: syntax not recommended!
img_r = img_as_float(img[:, :, 0])
img_r_mod = img_r + 100

fig, ax = plt.subplots(1, 2)
ax[0].imshow(img_r, cmap='gray')
ax[1].imshow(img_r_mod, cmap='gray')
plt.show()

One-channel images are rescaled before plotting

• [Link]
• [Link]
Colour space conversions
The color submodule provides many functions for colour space conversions
• rgb2gray converts from RGB to grayscale, reducing the channels from 3 to 1
and converting the data type from uint8 to float ([0, 1] range)

from skimage.io import imread
from skimage.color import rgb2gray
import matplotlib.pyplot as plt

img = imread('[Link]')
img_gray = rgb2gray(img)

fig, ax = plt.subplots(1, 2)
ax[0].imshow(img)
ax[1].imshow(img_gray, cmap='gray')
plt.show()

(Figure: conversion to grayscale)

print(img_gray.dtype)
>> float64

There are functions for the most common conversions:


• rgb2hsv and hsv2rgb
• rgb2lab and lab2rgb
Imperfections in images

Low resolution · Noise · Bloom (i.e. light bleeding onto a darker background)
Imperfections in images

Motion blur Poor contrast

Compression artefacts Lens distortion


Digital image processing
• Digital image processing is the use of computer algorithms to transform an
image, sometimes trying to fix imperfections
• Example transformations:
◦ Brightness and colour transformations
◦ Geometric transformations
◦ Filtering (next lecture)
Brightness transformations
• Brightness transformations are position-independent and take the form

𝐽 = 𝑓(𝐼)

where
• 𝐼(𝑥, 𝑦) is the intensity (or colour channel value) of the original image
• 𝐽(𝑥, 𝑦) is the intensity (or colour channel value) of the transformed image
• 𝑓 describes the transformation between them (e.g. 𝐽 = 𝐼 + 100)

• The function 𝑓 is independent of position in the image (therefore the (𝑥, 𝑦)
arguments are dropped in the equation above): the result of 𝑓 depends on
the intensity of each pixel and not on its position

• Additional notes:
◦ Care must be taken to ensure data types and their ranges are respected
◦ Most transformations will be introduced for grayscale images. For colour
images, either apply the transformation channel-by-channel or convert to
HSV and apply to V channel
Negative
The negative of a 24-bit colour image can be formed simply as
𝐽 = 255 − 𝐼

[imports]

img = imread('[Link]')
img_neg = 255 - img

fig, ax = plt.subplots(1, 2)
ax[0].imshow(img)
ax[1].imshow(img_neg)
plt.show()

(Figures: original image and its negative)
Tinting
Tinting (and colour balancing) applies an adjustment to the colours, normally
through a multiplication:

𝐼𝑅′ = 𝑠𝑅 𝐼𝑅
𝐼𝐺′ = 𝑠𝐺 𝐼𝐺
𝐼𝐵′ = 𝑠𝐵 𝐼𝐵

where 𝑠𝑅, 𝑠𝐺, and 𝑠𝐵 scale the red, green, and blue colour of each pixel
Tinting
Example: increasing redness of an image
𝑠𝑅 = 2, 𝑠𝐺 = 1, and 𝑠𝐵 = 1:

(Figures: original image and transformed image)

[imports]
s_r, s_g, s_b = 2, 1, 1

# Load and convert to float
img = img_as_float(imread('[Link]'))
img_tint = np.empty_like(img)

img_r = img[:, :, 0]
img_g = img[:, :, 1]
img_b = img[:, :, 2]

img_tint[:, :, 0] = s_r * img_r
img_tint[:, :, 1] = s_g * img_g
img_tint[:, :, 2] = s_b * img_b
img_tint[img_tint > 1] = 1  # Clip values > 1

fig, ax = plt.subplots(2, 1)
ax[0].imshow(img)
ax[1].imshow(img_tint)
plt.show()
Histogram
• A histogram gives the count of pixel intensities in an image
• The full available range is divided into bins for easier interpretation
• It can be computed either for each colour channel or for the grayscale version
• It can be computed with matplotlib’s hist function (remember to flatten
your input!) or with numpy’s histogram function

• Note: the dynamic range of an image is easily read off the histogram as
the difference between the min and max intensity values in the image

(Figure: image and its histogram; min 𝐼 = 103, max 𝐼 = 210, dynamic range = 107)
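The histogram and dynamic range can be computed directly with numpy (a minimal sketch; the synthetic image below mimics the slide’s [103, 210] intensity range):

```python
import numpy as np

rng = np.random.default_rng(0)
img = rng.integers(103, 211, size=(100, 100))    # intensities drawn from [103, 210]

hist, bin_edges = np.histogram(img, bins=256, range=(0, 256))
dynamic_range = img.max() - img.min()
print(hist.sum())        # → 10000: one count per pixel
print(dynamic_range)     # at most 210 − 103 = 107
```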
Histogram
• A histogram can be normalised using the total number of pixels
• The result is a probability density function (pdf) representing the probability
that a random pixel has a particular intensity
• plt.hist with density=True

• In the example in this slide, we could say that a pixel has a higher probability
of having an intensity of 160 than an intensity of 200
Contrast stretching
• If most of the intensities are clustered in the [100, 200] range, the image
looks “washed out” (lack of darker and brighter values)
• Contrast stretching is a linear transformation that aims at extending the
dynamic range of the image
• We can transform the values using the function

𝐽 = 𝛼𝐼 + 𝛽

with 𝛼 and 𝛽 user-defined parameters
• skimage function: exposure.rescale_intensity (where instead of 𝛼 and 𝛽,
the user defines the intensity values to be linearly stretched)

(Figures: original image vs. after contrast stretching)
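The linear stretch can be sketched in plain numpy (a minimal illustration of 𝐽 = 𝛼𝐼 + 𝛽 with 𝛼 = 255/(hi − lo), 𝛽 = −255·lo/(hi − lo); the stretch helper and toy pixel values are made up for the demo — in practice use exposure.rescale_intensity):

```python
import numpy as np

def stretch(img, lo, hi):
    """Linearly map intensities in [lo, hi] onto the full [0, 255] range."""
    out = (img.astype(float) - lo) * 255.0 / (hi - lo)   # explicit rescale, then convert
    return np.clip(out, 0, 255).astype(np.uint8)

washed_out = np.array([100, 150, 200], dtype=np.uint8)   # toy "washed out" pixels
print(stretch(washed_out, 100, 200))                     # → [  0 127 255]
```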
Gamma correction
• Gamma correction applies a non-linear transformation to pixel values
• It tries to take advantage of the non-linear manner in which humans perceive
light (greater sensitivity to differences between darker tones than lighter ones)
• Non-linear: relative distances between pixel values can both increase and
decrease over different ranges
• The mathematical form is

𝐽 = 𝐴𝐼^𝛾

where 𝛾 is a parameter and usually 𝐴 = 255^(1−𝛾) for an image with range [0, 255]
• skimage function: exposure.adjust_gamma

(Figures: original image vs. after gamma correction)
Credits: [Link]
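The formula above can be sketched directly (a minimal illustration of 𝐽 = 𝐴𝐼^𝛾 with 𝐴 = 255^(1−𝛾); the adjust_gamma helper and toy pixels are made up for the demo — skimage’s exposure.adjust_gamma implements an equivalent rescaled version):

```python
import numpy as np

def adjust_gamma(img, gamma):
    """J = A * I**gamma with A = 255**(1 - gamma), for images in [0, 255]."""
    A = 255.0 ** (1.0 - gamma)
    return A * img.astype(float) ** gamma

img = np.array([0, 64, 255], dtype=np.uint8)
print(adjust_gamma(img, 0.5).round())   # → [  0. 128. 255.]: dark tones brightened, endpoints fixed
```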
Cumulative distribution function
• The cumulative distribution function (cdf) gives, for a given intensity value,
the percentage of pixels in the image that have that value or a lower one
• It can be easily computed from the normalised histogram (pdf) by adding to
each bin the values of all previous bins
• plt.hist with density=True and cumulative=True

(Figure: cdf of the example image)
Histogram equalization
• Histogram equalisation applies a non-linear transformation that tries to
uniformly distribute the pixel values over the dynamic range
• Pixels with different intensities can be assigned to the same bin as a
consequence of this transformation
• Histogram equalisation leverages the information in the cdf. The output’s
cdf will be roughly a straight line
• skimage functions:
◦ exposure.equalize_hist
◦ exposure.equalize_adapthist

(Figures: original image vs. after histogram equalisation)

• See “What’s the difference between histeq and adapthisteq?”: [Link]
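The cdf-based mapping can be sketched in a few lines of numpy (a minimal illustration of the idea behind exposure.equalize_hist; the equalise helper and toy 2×2 image are made up for the demo):

```python
import numpy as np

def equalise(img):
    """Histogram-equalise a uint8 image by mapping each intensity to its cdf value."""
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum() / img.size         # fraction of pixels <= each intensity
    return (cdf[img] * 255).astype(np.uint8)

img = np.array([[100, 150],
                [150, 200]], dtype=np.uint8)   # toy image clustered in [100, 200]
print(equalise(img))    # → [[ 63 191]
                        #    [191 255]]: values spread over the full range
```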
Geometric transformations
• Geometric transformations change the spatial position of pixels in the
image
• Geometric transformations (generically called image warpings) have a variety
of practical uses, including
◦ Registration: aligning different (but similar) images
◦ Removing distortion
◦ Simplifying further processing

Corrected image
Distorted image
Geometric transformations
• The positions of pixels in the image are transformed
• Mathematically, this is expressed as

𝒙′ = 𝑇(𝒙)

where:
◦ 𝒙 = (𝑥, 𝑦) is the position of a point in the distorted image 𝐼
◦ 𝒙′ = (𝑥′, 𝑦′) is the position of a point in the corrected image 𝐽
◦ 𝑇 is the mapping function

• The easiest way to implement an image warping from 𝐼 to 𝐽 is as follows:
For every pixel position 𝒙′ in the corrected image:
◦ Using 𝑇⁻¹, determine 𝒙 (i.e. where 𝒙′ came from in the distorted image)
◦ Interpolate a value from 𝐼(𝒙) to produce 𝐽(𝒙′)
Geometric transformations
• You may notice this is applied somewhat backwards: rather than using a
(forward) mapping 𝑇 to transform pixels from the distorted image to the
corrected image, we use the (inverse) transform 𝑇⁻¹
• Forward mapping with 𝑇 may result in gaps; inverse mapping with 𝑇⁻¹
(followed by interpolation) ensures no gaps
• This ensures that all the pixels in the corrected image will be filled
• However, it’s often necessary to interpolate pixel values from the distorted image
Affine transformation
• Affine transformations are transformations that preserve (among others):
◦ Collinearity (i.e. aligned points will remain aligned)
◦ Parallelism (i.e. parallel lines will remain parallel)
• An affine transformation can be fully expressed through a matrix product
in homogeneous coordinates:

𝒙′ = 𝐴𝒙

𝑥′   𝑎11 𝑎12 𝑎13   𝑥   𝑎11𝑥 + 𝑎12𝑦 + 𝑎13
𝑦′ = 𝑎21 𝑎22 𝑎23 ∙ 𝑦 = 𝑎21𝑥 + 𝑎22𝑦 + 𝑎23
1    0   0   1     1   1

• In skimage, you can define an affine transformation using
transform.AffineTransform, either:
◦ Passing the matrix parameters
A = np.array([[a11, a12, a13], [a21, a22, a23], [0, 0, 1]])
tform = transform.AffineTransform(matrix=A)
◦ Passing the explicit transformation parameters (see next slides)
• To apply the transformation to an image, you can use transform.warp
Special cases: translation
• Translation

𝑥′   1 0 𝑡𝑥   𝑥   𝑥 + 𝑡𝑥
𝑦′ = 0 1 𝑡𝑦 ∙ 𝑦 = 𝑦 + 𝑡𝑦
1    0 0 1    1   1

tform = transform.AffineTransform(translation=(tx, ty))

from skimage.io import imread
from skimage import transform
import matplotlib.pyplot as plt

img = imread('[Link]')

# Define affine transformation
tform = transform.AffineTransform(translation=(100, 0))

# Apply to image (note: inverse transformation!)
img_transformed = transform.warp(img, tform.inverse)

fig, ax = plt.subplots(2, 1)
ax[0].imshow(img)
ax[1].imshow(img_transformed)
plt.show()
Special cases: rotation
• Rotation

𝑥′   cos𝜃 −sin𝜃 0   𝑥   𝑥cos𝜃 − 𝑦sin𝜃
𝑦′ = sin𝜃  cos𝜃 0 ∙ 𝑦 = 𝑥sin𝜃 + 𝑦cos𝜃
1    0     0    1   1   1

tform = transform.AffineTransform(rotation=theta)

from skimage.io import imread
from skimage import transform
import matplotlib.pyplot as plt
from math import radians

img = imread('[Link]')

# Define affine transformation
tform = transform.AffineTransform(rotation=radians(30))

# Apply to image (note: inverse transformation!)
img_transformed = transform.warp(img, tform.inverse)

fig, ax = plt.subplots(2, 1)
ax[0].imshow(img)
ax[1].imshow(img_transformed)
plt.show()
Special cases: scaling
• Scaling

𝑥′   𝑠𝑥 0  0   𝑥   𝑠𝑥𝑥
𝑦′ = 0  𝑠𝑦 0 ∙ 𝑦 = 𝑠𝑦𝑦
1    0  0  1   1   1

tform = transform.AffineTransform(scale=(sx, sy))

from skimage.io import imread
from skimage import transform
import matplotlib.pyplot as plt

img = imread('[Link]')

# Define affine transformation
tform = transform.AffineTransform(scale=(0.8, 0.5))

# Apply to image (note: inverse transformation!)
img_transformed = transform.warp(img, tform.inverse)

fig, ax = plt.subplots(2, 1)
ax[0].imshow(img)
ax[1].imshow(img_transformed)
plt.show()
Special cases: shear
• Shear (or skew)

𝑥′   1 −sin𝜃 0   𝑥   𝑥 − 𝑦sin𝜃
𝑦′ = 0  cos𝜃 0 ∙ 𝑦 = 𝑦cos𝜃
1    0  0    1   1   1

tform = transform.AffineTransform(shear=theta)

from skimage.io import imread
from skimage import transform
import matplotlib.pyplot as plt
from math import radians

img = imread('[Link]')

# Define affine transformation
tform = transform.AffineTransform(shear=radians(30))

# Apply to image (note: inverse transformation!)
img_transformed = transform.warp(img, tform.inverse)

fig, ax = plt.subplots(2, 1)
ax[0].imshow(img)
ax[1].imshow(img_transformed)
plt.show()
Additional notes
Regarding transformations:
• You can always access the affine matrix of an AffineTransform object via
tform.params
• Different tform objects can be combined sequentially: see
[Link]/docs/stable/auto_examples/transform/plot_transform_types.html
• For rotation around the image centre and simple rescaling and resizing
operations, there are dedicated functions in the transform submodule called
rotate, rescale and resize. More details at
[Link]/docs/stable/auto_examples/transform/plot_rescale.html

Regarding [Link]:
• Interpolation is usually needed, the type of which can be specified
• How to deal with points outside the boundaries can also be specified
• In line with what was said in the previous slides, in these transformations
the source image is considered the output and the destination the input. As a
consequence, tform must be inverted before warping
Exercise
Test: manually transform the image in order to have well-aligned digits

from skimage.io import imread
from skimage import transform
import matplotlib.pyplot as plt
from math import radians

img = imread('[Link]')

# Rotate around centre
img_tran = transform.rotate(img, -15)

# Define affine transformation (translation + shear)
tform = transform.AffineTransform(translation=(-30, 0),
                                  shear=radians(-17))

# Apply to image
img_tran2 = transform.warp(img_tran, tform.inverse)

fig, ax = plt.subplots(3, 1)
ax[0].imshow(img)
ax[1].imshow(img_tran)
ax[2].imshow(img_tran2)
plt.show()
Common CV tasks
Image matching/alignment/registration
• Identify the transformation which aligns two or more images
• Used for multi-modal comparison, image stitching, etc.

Matching

M. Brown. Automatic panoramic image stitching using invariant features, IJCV 2007
Image matching
Test: find the affine transformation that aligns two images
• The goal is to determine the six coefficients in the 𝐴 matrix
• This can be achieved by finding at least 3 correspondences (e.g. plate
corners) in the two images

(Figure: corresponding points 𝒑1, 𝒑2, 𝒑3 and 𝒒1, 𝒒2, 𝒒3 marked in the two images)

Let’s assume that we have manually selected the following correspondences:

𝒑1 = (18, 47)ᵀ    𝒒1 = (48, 50)ᵀ
𝒑2 = (15, 100)ᵀ   𝒒2 = (48, 100)ᵀ
𝒑3 = (178, 6)ᵀ    𝒒3 = (212, 50)ᵀ
Image matching
For each pair of corresponding points 𝒑 = (𝑥, 𝑦) and 𝒒 = (𝑥′, 𝑦′) we must have

𝑥′ = 𝑎11𝑥 + 𝑎12𝑦 + 𝑎13
𝑦′ = 𝑎21𝑥 + 𝑎22𝑦 + 𝑎23

Then for three correspondences we can write

𝑥1′   𝑥1 𝑦1 1  0  0  0   𝑎11
𝑦1′   0  0  0  𝑥1 𝑦1 1   𝑎12
𝑥2′   𝑥2 𝑦2 1  0  0  0   𝑎13
𝑦2′ = 0  0  0  𝑥2 𝑦2 1 ∙ 𝑎21
𝑥3′   𝑥3 𝑦3 1  0  0  0   𝑎22
𝑦3′   0  0  0  𝑥3 𝑦3 1   𝑎23

Or in matrix form
𝒒 = 𝑀𝒂
• The 6 coefficients of the affine transformation can be found as 𝒂 = 𝑀⁻¹𝒒
• If we had more than 3 correspondences, we could solve the overdetermined
system using least squares
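With the slide’s actual point coordinates, the 6×6 system 𝒒 = 𝑀𝒂 can be solved directly with numpy (a sketch of the manual route; in practice tform.estimate performs this, and least squares, for you):

```python
import numpy as np

# The three correspondences from the slide
P = np.array([[18, 47], [15, 100], [178, 6]], dtype=float)    # points p_i
Q = np.array([[48, 50], [48, 100], [212, 50]], dtype=float)   # points q_i

# Build the 6x6 system M a = q, two rows per correspondence
M = np.zeros((6, 6))
q = np.zeros(6)
for i, ((x, y), (xp, yp)) in enumerate(zip(P, Q)):
    M[2 * i]     = [x, y, 1, 0, 0, 0]
    M[2 * i + 1] = [0, 0, 0, x, y, 1]
    q[2 * i], q[2 * i + 1] = xp, yp

a = np.linalg.solve(M, q)                      # a = M^-1 q
A = np.vstack([a.reshape(2, 3), [0, 0, 1]])    # full homogeneous affine matrix

# Sanity check: A maps each p_i onto the corresponding q_i
for (x, y), (xp, yp) in zip(P, Q):
    assert np.allclose(A @ [x, y, 1], [xp, yp, 1])
```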
Image matching
skimage method: tform.estimate

[imports]

img = imread('[Link]')

# Define arrays of point coordinates
P_points = np.array([[18, 47], [15, 100], [178, 6]])
Q_points = np.array([[48, 50], [48, 100], [212, 50]])

# Create empty tform object
tform = transform.AffineTransform()

# Estimate tform parameters using point correspondences
tform.estimate(Q_points, P_points)

# Apply to image
img_aligned = transform.warp(img, tform)
Projective transformation
• Images normally acquired by photographic cameras are formed by
perspective projection
• A rectangular surface not parallel to the image plane will be projected into a
trapezoid
• If we want to transform it into a rectangle, an affine transformation will not be
enough
• Instead, we must use a projective transformation:

𝑥′   𝑝11 𝑝12 𝑝13   𝑥
𝑦′ = 𝑝21 𝑝22 𝑝23 ∙ 𝑦
𝑤′   𝑝31 𝑝32 1     1

(the final pixel coordinates are obtained by dividing: 𝑥′/𝑤′ and 𝑦′/𝑤′)

• To perform image matching with a projective transformation, at least 4
correspondences are needed
• This is due to the eight unknowns 𝑝𝑖𝑗, which correspond to the degrees of
freedom (DOF) of this type of projective transformation
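A small numpy sketch of why the projective case is not affine: coordinates are homogeneous, so after the matrix product the result must be divided by 𝑤′ (the matrix values below are hypothetical, with 𝑝31 ≠ 0 giving pure perspective foreshortening):

```python
import numpy as np

Pm = np.array([[1.0,   0.0, 0.0],
               [0.0,   1.0, 0.0],
               [0.001, 0.0, 1.0]])   # hypothetical projective matrix, p31 = 0.001

xp, yp, wp = Pm @ [100.0, 50.0, 1.0]   # w' = 0.001 * 100 + 1 = 1.1
print(xp / wp, yp / wp)                # → 90.909... 45.454...: points farther along x shrink more
```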
Projective transformation
Test: transform the pinball surface into a square

Original Affine Projective


Hierarchy of 2D transformations
Transformation       Matrix                                     Properties preserved
Projective (8 DOF)   [𝑝11 𝑝12 𝑝13; 𝑝21 𝑝22 𝑝23; 𝑝31 𝑝32 1]      Collinearity
Affine (6 DOF)       [𝑎11 𝑎12 𝑎13; 𝑎21 𝑎22 𝑎23; 0 0 1]          + Parallelism of lines
Rigid (3 DOF)        [cos𝜃 −sin𝜃 𝑡𝑥; sin𝜃 cos𝜃 𝑡𝑦; 0 0 1]        + Lengths, angles, areas

with DOF meaning “degrees of freedom”

Credits: Marc Pollefeys


Non-linear transformations
• Non-linear transformations cannot be represented as a matrix multiplication
• For them, we need to use the general function that transforms pixel locations
independently:

𝒙′ = 𝑇(𝒙)

• Examples of common non-linear transformations include:

Radial lens distortion Non-rigid transformation


Overview of next week’s lecture
• Image filtering
• Linear filtering:
◦ Convolution
◦ Common filters: moving average, Gaussian, sharpening
• Non-linear filtering: median filter
• Edge detection
◦ Gradient-based
◦ Laplacian-based
◦ Non-maximum suppression
