Image Classification Methods in Remote Sensing

The document discusses methods for image classification in remote sensing, focusing on supervised and unsupervised pixel-based classification. It describes algorithms such as the minimum distance classifier and the maximum likelihood classifier, explaining their functioning and applications. The goal is to translate the colors of the pixels into thematic classes to create land cover maps.
UNIVERSITY OF TOLIARA

************
FACULTY OF SCIENCES
************
FIELD: SCIENCES AND TECHNOLOGIES
************
GEOSCIENCES
*********
MASTER
*********
LEVEL M1/ S7

REMOTE SENSING
METHODS OF IMAGE CLASSIFICATION

An assignment written by:


LAHINIRIKO Bruno Sitraka Alifa
HAJANIRINA FANOMEZANA Bertho

Assigned by:
Doctor Tsiorisoa HAREMPAHASOAVANA
As part of the module entitled:
Remote sensing

Academic Year: 2023-2024


Table of contents

1. INTRODUCTION
2. SUPERVISED PIXEL-BASED CLASSIFICATION
   2.1 Minimum Distance Classifier
   2.2 Maximum Likelihood Classifier
   2.3 Non-parametric Classifiers (here, the Decision Tree)
3. UNSUPERVISED PIXEL-BASED CLASSIFICATION
4. SUMMARY
1. INTRODUCTION

While the visual interpretation of a satellite image may be sufficient for some uses, it also has significant shortcomings. For example, if you want to find all the urban areas in an image:

a) it will take a lot of time, even for an experienced image analyst, to examine an entire image and determine which pixels are "urban" and which are not, and

b) the resulting map of urban areas will necessarily be somewhat subjective, as it is based on the individual analyst's interpretation of what "urban" means and how it is likely to appear in the image.

A very common solution is to use a classification algorithm to translate the color observed in each pixel into a thematic class that describes its dominant land cover, thereby transforming the image into a land cover map. This process is called IMAGE CLASSIFICATION.

We can distinguish two categories of approaches to image classification:

The traditional and simplest method consists of examining each pixel individually and determining the thematic class corresponding to its color. This is called pixel-based classification, and it is what we will look at below.

A newer and increasingly popular method involves dividing the image into homogeneous segments, then determining the thematic class corresponding to the attributes of each segment. These attributes can be the color of the segment as well as other elements such as its shape, size, texture, and location. This is generally referred to as object-based image analysis.

Within the pixel-based classification category, two different approaches are available:

Supervised classification, in which the image analyst "supervises" the classification by providing additional information in the early stages, and

Unsupervised classification, in which an algorithm does most of the work (almost) unaided, and the image analyst only has to intervene at the end to finish the job.

Each of these methods has advantages and disadvantages, which are described below.

2. SUPERVISED PIXEL-BASED CLASSIFICATION

The idea behind supervised classification is that the image analyst provides the computer with certain information that is used to calibrate a classification algorithm. This algorithm is then applied to each pixel of the image to produce the required map.

The best way to explain how this works is with an example. The image in Figure 1 comes from California, and we want to translate it into a classification with the following three (main) classes: "Urban", "Vegetation", and "Water".

The number of classes and the definition of each class can have an important impact on the success of the classification. In our example, we do not take into account the fact that a large part of the area consists of bare soil (which therefore does not really fall into any of our three classes). We can also see that the water in the image has a very different color depending on its degree of turbidity, and that some urban areas are very bright while others are a darker gray, but we will ignore these questions for now.

Figure 1 - True-color composite showing part of a Landsat image of California. An urban area with turbid water is visible in the upper right corner, bordered by a mix of vegetated and barren areas and a lake with clear water. A largely mountainous, wooded area is visible in the lower left part of the image, with some bright areas near the center left. By Anders Knudby, CC BY 4.0, based on a Landsat 5 image (USGS).

The "supervision" in supervised classification almost always takes the form of a calibration dataset, consisting of a set of points and/or polygons that are known (or believed) to belong to each class.

In Figure 2, such a dataset has been provided in the form of three polygons:

The red polygon defines an area known to be "urban". Similarly, the blue polygon is "water" and the green polygon is "vegetation".

Figure 2 - The same image as in Figure 1, with three overlaid polygons. The red polygon represents an area known by the image analyst to be "urban", the blue polygon is "water", and the green polygon is "vegetation". By Anders Knudby, CC BY 4.0.

Note:
The example in Figure 2 is not a good-practice example. It is better to have more numerous and smaller polygons for each class, distributed across the entire image, as this ensures that the polygons cover only pixels of the desired class, and also captures spatial variations in, for example, vegetation density, water quality, and so on.

Let us now look at how these polygons help us transform the image into a map of the three classes.

Basically, the polygons tell the computer: "look at the pixels under the red polygon - this is what 'urban' pixels look like", and the computer can then find all the other pixels in the image that also look like this and label them "urban". And so on for the other classes.

However, since some pixels may look a bit like "urban" and a bit like "vegetation", we need a mathematical way to determine which class each pixel resembles most. We need a classification algorithm.

If we take the values of all the pixels in bands 3 and 4 of the Landsat image and show them on a scatter plot, we get something like Figure 3. The image has a radiometric resolution of 8 bits, so the values in each band can theoretically range from 0 to 255, but note that the smallest values in the image are greater than 0. The band 3 values are shown on the x-axis, and the band 4 values on the y-axis.

Figure 3 - Scatter plot showing all pixel values in bands 3 and 4 for the image in Figure 1. Scatter plot created using ENVI software. By Anders Knudby, CC BY 4.0.

Now, if we color all the points that come from pixels under the red polygon (that is, the pixels we "know" to be "urban"), and do the same with the pixels under the blue and green polygons, we get something like Figure 4.

There are a few important things to note in Figure 4:

All the blue points ("Water") are located in the bottom left corner of the figure, under the yellow circle, with low values in band 3 and low values in band 4. This is indeed typical of water, because water absorbs incoming radiation very efficiently in the red (band 3) and near-infrared (band 4) wavelengths, so very little is reflected back to be detected by the sensor.
The green points ("Vegetation") form a long area on the left side of the figure, with low values in band 3 and moderate to high values in band 4. Again, this seems reasonable, because vegetation effectively absorbs incoming radiation in the red band (using it for photosynthesis) while reflecting incoming radiation in the near-infrared band.

The red points ("Urban") form a larger area near the center of the figure and cover a much wider range of values than either of the other two classes. While their values are similar to those of "Vegetation" in band 4, they are generally higher in band 3.

Figure 4 - Like Figure 3, but with each point colored according to the polygon under which it is located. Scatter plot created using ENVI software. By Anders Knudby, CC BY 4.0.

What we want the supervised classification algorithm to do now is to take all the other pixels in the image (that is, all the white points on the scatter plot) and assign each of them to one of the three classes based on its color.

For example, to what class do you think the white points in the yellow circle in Figure 4 should be assigned? Water, probably. And what about those in the light brown circle? Vegetation, probably. But what about those in the light blue circle? That is not so easy to determine.
Note:
The classification algorithm can use all the bands in the Landsat image, as well as any other information we provide for the entire image (such as a digital elevation model), but since it is easier to keep representing the situation in two dimensions using only bands 3 and 4, we will continue to do so. Keep in mind that the scatter plot is really an n-dimensional graph, where n equals the number of bands (and other data layers) we want to use in the classification.

One way to estimate which class each pixel belongs to is to calculate the "distance" between the pixel and the center of all the pixels known to belong to each class, and then assign it to the closest class. This is the Minimum Distance Classifier.

2.1 Minimum Distance Classifier

By "distance", we mean here the distance in "feature space", in which the dimensions are defined by each of the variables we consider (in our case, bands 3 and 4), as opposed to physical distance. Our feature space is therefore two-dimensional, and distances can be calculated using the standard Euclidean distance.

For example, for the points in Figure 5, we calculated the mean values of all the green, red, and blue pixels in bands 3 and 4, and indicated them with large dots. Let's say they have the following values:

Table 1 - The mean values in bands 3 and 4 for the classes "Urban", "Vegetation", and "Water" shown in Figure 5.

Class              Band 3 mean    Band 4 mean
Urban (red)            100            105
Vegetation (green)      40            135
Water (blue)            35             20

Let's say that the pixel indicated by the yellow point in Figure 5 has a value of 55 in band 3, and 61 in band 4. We can then calculate the Euclidean distance between this point and the mean value of each class:

Distance to the red mean: √((100 − 55)² + (105 − 61)²) = 62.9
Distance to the green mean: √((40 − 55)² + (135 − 61)²) = 75.5
Distance to the blue mean: √((35 − 55)² + (20 − 61)²) = 45.6

Since the distance to the blue mean is the smallest, the minimum distance classifier will assign this particular point to the "blue" class.
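The worked calculation above can be sketched in Python. This is a minimal illustration (not from the original document), using the class means from the example: Urban (100, 105), Vegetation (40, 135), Water (35, 20).

```python
import numpy as np

# Class means in (band 3, band 4) feature space, from the worked example
class_means = {
    "Urban": np.array([100.0, 105.0]),
    "Vegetation": np.array([40.0, 135.0]),
    "Water": np.array([35.0, 20.0]),
}

def minimum_distance_classify(pixel):
    """Assign a pixel to the class whose mean is closest in feature space."""
    distances = {name: float(np.linalg.norm(pixel - mean))
                 for name, mean in class_means.items()}
    return min(distances, key=distances.get), distances

# The yellow point from the worked example: band 3 = 55, band 4 = 61
label, distances = minimum_distance_classify(np.array([55.0, 61.0]))
print(label)  # Water (distance 45.6, versus 62.9 for Urban and 75.5 for Vegetation)
```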

Figure 5 - The minimum distance classifier assigns each pixel to the class whose center is closest (in feature space). The mean value of all the red points, in bands 3 and 4, is indicated by the large red dot, and similarly for the green and blue points. The yellow point indicates a pixel that we want to assign to one of the three classes. Scatter plot created using ENVI software. By Anders Knudby, CC BY 4.0.

Although the minimum distance classifier is very simple and fast and often gives good results, this example illustrates an important weakness:

In our example, the spread of values for the "Water" class is very small - water is generally dark and blue-green, and even turbid water containing a lot of algae still looks dark and blue-green. The spread of values for the "Vegetation" class is much larger, especially in band 4, because some vegetation is dense and some is not, some vegetation is healthy and some is not, and some vegetation may be mixed with dark soils, bright soils, or even urban elements such as a road. The same goes for the "Urban" class, which shows a wide spread of values in both bands 3 and 4.

In reality, the yellow point in Figure 5 is probably not water, because water with such high values in bands 3 and 4 does not exist. It is much more likely to be an unusual type of vegetation, an unusual urban area, or (more likely) a mixture of those two classes.

There is a classifier that explicitly takes into account the distribution of values in each class to remedy this problem: the Maximum Likelihood Classifier.

2.2 Maximum Likelihood Classifier

Until about ten years ago, the maximum likelihood classifier was the reference algorithm for image classification. It is still popular, implemented in all serious remote sensing software, and it generally ranks among the best-performing algorithms for a given task.

The mathematical descriptions of how it works may seem complicated, because they rely on Bayesian statistics applied in several dimensions, but the principle is relatively simple: instead of calculating the distance to the center of each class (in feature space) and finding the closest class, we calculate the probability that the pixel belongs to each class, and find the most probable class.

For this calculation to work, certain assumptions must be made:

We will assume that, before knowing the color of the pixel, the probability that it belongs to one class is the same as the probability that it belongs to any other class. This seems quite reasonable (although in our image there is clearly much more "Vegetation" than "Water", so one could argue that a pixel of unknown color is more likely to be vegetation than water... this can be incorporated into the classifier, but it rarely is, and we will ignore it for now).

We will assume that the distribution of values in each band, for each class, is Gaussian, that is, it follows a normal distribution (a bell curve).

To start with a one-dimensional example, our situation could look like Figure 6 if we had only two (02) classes:

Figure 6 - One-dimensional example of maximum likelihood classification with two classes. By Anders Knudby, CC BY 4.0.

In Figure 6, the x-axis represents the values in one band of the image, and the y-axis shows the number of pixels in each class that have a given value in that band.

It is clear that class A generally has low values (in that band) and class B high values, but the spread of values in each class is large enough for there to be some overlap between the two.

Since both distributions are Gaussian, we can calculate the mean and standard deviation of each class, and then calculate the z-score (the number of standard deviations separating a point from the mean).

In Figure 6, the two classes have the same standard deviation (the "bells" have the same "width"), and because the point is located slightly closer to the mean of class B than to that of class A, its z-score would be lowest for class B and it would be assigned to that class.

A slightly more realistic example is provided in Figure 7 below, where we have two dimensions and three classes.

Figure 7 - Two-dimensional example of the maximum likelihood classification situation, with three classes that have unequal standard deviations. By Anders Knudby, CC BY 4.0.

The standard deviations in band 4 (x-axis) and band 3 (y-axis) are represented as equiprobability contours. The challenge for the maximum likelihood classifier is to find, for each pixel, the class for which the pixel lies within the equiprobability contour closest to the class center.

For example, note that the contours of classes A and B overlap, and that the standard deviations of class A are larger than those of class B. As a result, the red point is closer (in feature space) to the center of class B than to the center of class A, but it lies on the third equiprobability contour of class B and only on the second contour of class A.

The minimum distance classifier would therefore classify this point as "class B" on the basis of the shorter Euclidean distance, whereas the maximum likelihood classifier would classify it as "class A" because of its higher probability of belonging to that class (given the assumptions used).
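This disagreement can be illustrated numerically. The statistics below are hypothetical (not taken from Figure 7) and one-dimensional for simplicity: class A is given a wide spread and class B a narrow one, and the pixel value is chosen so that the two classifiers disagree, mirroring the situation described above.

```python
import math

# Hypothetical one-dimensional class statistics (all values assumed):
# class A has a much wider spread than class B
stats = {"A": (30.0, 20.0), "B": (60.0, 5.0)}  # class: (mean, standard deviation)
x = 48.0  # the pixel value to classify

def gaussian_density(x, mean, std):
    """Probability density of x under a normal distribution (equal priors assumed)."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

min_distance_choice = min(stats, key=lambda c: abs(x - stats[c][0]))
max_likelihood_choice = max(stats, key=lambda c: gaussian_density(x, *stats[c]))

print(min_distance_choice)    # B: x is only 12 units from B's mean, but 18 from A's
print(max_likelihood_choice)  # A: x is 0.9 SDs from A's mean, but 2.4 SDs from B's
```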

Which is more likely to be correct? Most comparisons between these classifiers suggest that the maximum likelihood classifier tends to produce more accurate results, but this does not guarantee that it is always superior.

2.3 Non-parametric Classifiers (here, the Decision Tree)

In recent years, remote sensing scientists have increasingly turned to the field of machine learning to adopt new classification techniques. The idea of classification is fundamentally very generic: you have data about something (in our case, the band values of a pixel) and you want to determine what it is (in our case, what the land cover is).

A problem can hardly be more generic, so much so that versions of it can be found everywhere:

A bank has certain information about a customer (age, sex, address, income, loan repayment history) and wants to know whether the customer should be considered a "low risk", "medium risk", or "high risk" for a new loan of 100,000 dollars.

A meteorologist has information about the current weather ("rain, 5 °C") and about atmospheric variables ("1003 mb, 10 m/s wind") and must determine whether or not it will rain in three hours.

A computer has certain information about fingerprints found at a crime scene (length, curvature, relative position of each ridge) and must determine whether they are yours or someone else's.

Because the task is generic, and because users outside the field of remote sensing have large amounts of money and can use classification algorithms to generate profits, computer scientists have developed many techniques to solve this generic task, and some of these techniques have been adopted in remote sensing. We will examine a single example here, but remember that there are many other generic classification algorithms that can be used in remote sensing.

The one we will examine is called a DECISION TREE CLASSIFIER.

Like other classifiers, the decision tree classifier works in a two-step process:
a) calibrate the classification algorithm, and
b) apply it to every pixel in the image.

A decision tree classifier is calibrated by recursively splitting the entire dataset (all the pixels under the polygons in Figure 2) so as to maximize the homogeneity of the two resulting parts (called nodes).

A small illustration: let's say we have seven data points (you should never have only seven data points when calibrating a classifier; this small number is used here only for illustration!).

Table 2 - Seven data points used to develop a decision tree classifier.

The first task is to find a value, in either band 1 or band 2, that can be used to split the dataset into two nodes so that, as far as possible, all the points of class A end up in one node and all the points of class B in the other.

Algorithmically, this is done by testing all possible values and quantifying the homogeneity of the resulting nodes. For example, we can observe that the smallest value in band 1 is 10, and the largest is 45. If we split the dataset according to the rule "all points where band 1 is less than 11 go to node X, and all the others to node Y", we obtain the points split as follows:

Table 3 - Points split according to the threshold value 11 in band 1. By Anders Knudby, CC BY 4.0.

As we can see, this leaves us with a single "A" point in one node (X), and two "A" points and four "B" points in the other node (Y). To see if we can do better, we can try using the value 12 instead of 11 (which gives the same result), 13 (still the same), and so on; and once we have tested all the values of band 1, we continue with all the values of band 2.

Ultimately, we will find that using the value 31 in band 1 gives the following result:

Table 4 - Points split according to the threshold value 31 in band 1. By Anders Knudby, CC BY 4.0.

This is almost perfect, except that we have a single "B" in node X. But still, it is pretty good for a first split.
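The exhaustive threshold search can be sketched in Python. The seven data points below are hypothetical (the values in Table 2 are not reproduced here); they are chosen so that the best split reproduces the one just described, with three "A" points and one "B" point in node X and a pure "B" node Y. One common way to quantify node homogeneity is the Gini impurity, used here; note that any threshold between the two nodes' band 1 values (such as the 31 in the text) defines the same split, and the search below reports the largest observed value in the left node.

```python
# Hypothetical seven data points in the spirit of Table 2 (values assumed):
# (band 1, band 2, class)
points = [
    (10, 20, "A"), (20, 30, "A"), (25, 15, "A"),
    (15, 50, "B"), (35, 25, "B"), (40, 10, "B"), (45, 30, "B"),
]

def gini(labels):
    """Gini impurity: 0 for a pure node, 0.5 for a 50/50 two-class mix."""
    if not labels:
        return 0.0
    p = labels.count("A") / len(labels)
    return 1.0 - p * p - (1.0 - p) * (1.0 - p)

def best_split(data):
    """Try every observed value in each band as a threshold and keep the split
    with the lowest size-weighted impurity of the two resulting nodes."""
    best = None
    for band in (0, 1):
        for threshold in sorted({p[band] for p in data}):
            left = [p[2] for p in data if p[band] <= threshold]
            right = [p[2] for p in data if p[band] > threshold]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(data)
            if best is None or score < best[0]:
                best = (score, band, threshold)
    return best

score, band, threshold = best_split(points)
print(band, threshold)  # band index 0 ("band 1"): node Y (band 1 > 25) is pure "B"
```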

We can represent this in the form of a "tree", as follows:

Figure 8 - The "tree" structure that emerges from splitting the data on the threshold value 31 in band 1. Each set of data points is called a node. The "root node" contains all the data points. The "leaf nodes", also called "terminal nodes", are the endpoints. By Anders Knudby, CC BY 4.0.

The node containing data points 5, 6, and 7 (which all have band 1 values greater than 31) is now what is called a "pure node": it is composed of data points from a single class, so we no longer need to split it. Nodes that are endpoints are also called "leaves". The node containing data points 1, 2, 3, and 4 is not "pure", because it contains a mix of class A and class B points.

We therefore start testing all the different possible values that can be used as a threshold to split this node (and only this node), in both bands. The class B point in this node happens to have a band 2 value higher than all the other points, so a threshold value of 45 works well, and we can update the tree as follows:

Figure 9 - The final "tree" structure. All the leaf nodes (terminal subsets of the dataset) are now pure. By Anders Knudby, CC BY 4.0.

Once the "tree" is in place, we can take all the other pixels in the image and "drop" them through the tree to see which leaf they land in.

For example, a pixel with a value of 35 in band 1 and 25 in band 2 will "go right" at the first test and will therefore land in the leaf containing data points 5, 6, and 7. Since all these points belong to class B, the pixel will be classified as class B. And so on.
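The finished tree can be written directly as code: a pixel "falls through" the thresholds until it reaches a leaf. This is a minimal sketch of the tree in Figures 8 and 9:

```python
def decision_tree_classify(band1, band2):
    """Drop a pixel through the calibrated tree of Figures 8 and 9."""
    if band1 > 31:
        return "B"  # the pure leaf containing data points 5, 6, and 7
    if band2 > 45:
        return "B"  # the leaf containing the single class B point
    return "A"      # the pure leaf containing the remaining class A points

print(decision_tree_classify(35, 25))  # B, as in the worked example above
```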

Note that the leaves do not necessarily have to be "pure"; some trees stop splitting nodes when they fall below a certain size, or according to some other criterion. In that case, a pixel landing in such a leaf is assigned the class that has the most points in that leaf (and the fact that the leaf was not pure can even be used to indicate that the classification of that particular pixel is subject to some uncertainty).

The decision tree classifier is just one of many possible non-parametric classifiers. It is rarely used directly in the form presented above, but it forms the basis of some of the most effective classification algorithms in use today.

Other popular non-parametric classification algorithms include neural networks and support vector machines, both of which are implemented in many remote sensing software packages.

3. UNSUPERVISED PIXEL-BASED CLASSIFICATION

What do we do if we do not have the data needed to calibrate a classification algorithm? If we do not have the polygons shown in Figure 2, or the data points shown in Table 2? What then? We use an unsupervised classification instead!

Unsupervised classification consists of letting an algorithm divide the pixels of an image into "natural clusters", that is, into combinations of band values that occur frequently in the image. Once these natural clusters have been identified, the image analyst can label them, generally based on a visual analysis of where these clusters are located in the image. The clustering is largely automatic, although the analyst provides a few initial parameters. One of the algorithms most commonly used to find natural clusters in an image is the K-Means algorithm, which works as follows:

a) The analyst determines the desired number of classes. Basically, if you want a map with a high level of thematic detail, you define a large number of classes. Note also that classes can be combined later, so it is often wise to set the desired number of classes slightly higher than what you expect the final number to be. A number of "seed points" equal to the desired number of classes is then placed randomly in the feature space (Figure 10).

b) Clusters are then generated around these seed points by assigning all the other points to the closest seed (Figure 11).

c) The centroid (geographic center) of the points in each cluster becomes the new seed (Figure 12).

Figure 10 - K-Means classification, step 1. A number of seed points (colored dots) are randomly distributed in the feature space. The gray points represent the pixels to be clustered. Modified from K Means Example Step 1 by [Link], Wikimedia Commons, CC BY-SA 3.0.

Figure 11 - A cluster is formed around each seed by assigning all points to the closest seed. Modified from K Means Example Step 2 by [Link], Wikimedia Commons, CC BY-SA 3.0.

Figure 12 - The seeds are moved to the centroid of each cluster. The centroid is calculated as the geographic center of each cluster; that is, it is located at the mean x value of all the points in the cluster, and at the mean y value of all the points in the cluster. Modified from K Means Example Step 3 by [Link], Wikimedia Commons, CC BY-SA 3.0.

d) Steps "b" and "c" are repeated until a stopping criterion is reached. The stopping criterion can be that no point moves to another cluster, that the centroid of each cluster moves by less than a prespecified distance, or that a certain number of iterations have been completed.
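Steps a) through d) can be sketched as follows. This is a minimal illustration (not from the original document): the seeds are passed in explicitly rather than placed randomly, so that the run is reproducible, and a fixed number of iterations serves as the stopping criterion.

```python
import numpy as np

def k_means(points, seeds, n_iter=20):
    """Minimal K-Means: assign points to the nearest seed, move each seed to
    the centroid of its cluster, and repeat."""
    seeds = np.asarray(seeds, dtype=float)
    for _ in range(n_iter):
        # step b: assign every point to the closest seed
        dists = np.linalg.norm(points[:, None, :] - seeds[None, :, :], axis=2)
        labels = np.argmin(dists, axis=1)
        # step c: the centroid of each cluster becomes the new seed
        # (an empty cluster keeps its previous seed)
        seeds = np.array([points[labels == i].mean(axis=0) if np.any(labels == i)
                          else seeds[i] for i in range(len(seeds))])
    return labels, seeds

# Two artificial clusters in a 2-D feature space (hypothetical band values)
rng = np.random.default_rng(42)
pts = np.vstack([rng.normal(20, 2, (50, 2)), rng.normal(80, 2, (50, 2))])
labels, centers = k_means(pts, seeds=[[0, 0], [100, 100]])
print(centers.round())  # one center near (20, 20), the other near (80, 80)
```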

Other unsupervised classification algorithms perform the clustering slightly differently. For example, a popular algorithm called ISODATA also allows large clusters to be split during the clustering process and, similarly, small nearby clusters to be merged.

In any case, the result of the clustering algorithm is that every pixel in the image belongs to a cluster. The hope is that each cluster represents a type of land cover that the image analyst can identify, for example by superimposing the locations of all the pixels in the cluster on the original image to visually determine what the cluster corresponds to.
This is the final step of unsupervised classification: labeling each of the clusters that have been produced. This is the stage at which it can be useful to merge clusters, if, for example, you have one cluster corresponding to turbid water and another corresponding to clear water. Unless you are specifically interested in water quality, differentiating the two is probably not important, and merging them will give a clearer map product. Similarly, you may simply have two clusters that both seem to correspond to healthy broadleaf forest. Even if you work for a forestry service, unless you can determine with certainty what the difference between these two clusters is, you can merge them into one and call it "broadleaf forest".

For example, the image below shows the original image in the background, with the central pixels colored according to the product of an unsupervised classification. It is clear that the "blue" zone corresponds to water-covered pixels, and that the green zone largely corresponds to vegetation. A more detailed analysis of the image would be needed to label each area, especially the red and gray ones, appropriately.

Figure 13 - Example of the correspondence between the original image and the clusters formed in an unsupervised classification process. By Anders Knudby, CC BY 4.0.

One of the typical flaws of image classification systems that work at the pixel-by-pixel level is that images are noisy, and the land cover maps created from them inherit that noise.

Another, more fundamental flaw is that there is information in an image beyond what is found in the individual pixels. An image is a perfect illustration of the saying "the whole is greater than the sum of its parts", because images have structure, and that structure is not taken into account when each pixel is examined independently of the context provided by its neighboring pixels. For example, even without knowing the color of a pixel, if I know that all its neighboring pixels are classified as "water", I can say with great confidence that the pixel in question is also "water". I will occasionally be wrong, but I will be right most of the time.

This is where a technique called Object-Based Image Analysis (OBIA) comes into play: it takes context into account when generating image classifications. This advantage often allows it to outperform more traditional, pixel-by-pixel classification methods.
4. SUMMARY

Classification and image analysis operations are used to identify and digitally classify the pixels of an image. Classification is usually performed on multispectral data sets, and this process assigns each pixel of an image to a particular class or theme based on the statistical characteristics of the pixel's intensity values. There are a variety of approaches to producing a digital classification:

a. Supervised classification
Uses homogeneous and representative samples (training areas) of the different types of surface (the polygons in Figure 2). The selection of these areas is based on the analyst's knowledge of and familiarity with the regions studied. Specific software is then used to define classes corresponding to the training areas according to the digital information of each pixel: a special program (algorithm) determines the numerical properties of each class. Finally, these classes are applied to the entire study area according to their similar properties. The training areas must be as homogeneous as possible, and field verification should be carried out on the day of the satellite pass.

b. Unsupervised classification
Based solely on the digital information; no classes are established in advance. The analyst simply specifies the desired number of classes. Then, using a specific algorithm, an automatic classification process groups pixels with similar properties to define what are called spectral classes. These are then associated with useful classes corresponding to the real objects on the ground. If the classification is not satisfactory, the analyst may have to reapply the classification algorithm with a different number of classes, in order to combine or further separate the spectral classes.
