Image Classification

Prepared by
Dr. Sao Hone Pha
RS & GIS Lab
Yangon Technological University
Objective

The overall objective of image classification procedures is to
automatically categorize all pixels in an image into land
cover classes or themes (e.g. urban, forest, lake, agriculture).
Image Classification
 The spectral pattern present within the data for each pixel
is used as the numerical basis for categorization.
 Different feature types manifest different combinations of
DNs based on their inherent spectral reflectance and
emittance properties.

Spectral Pattern Recognition

It refers to the family of classification procedures that utilizes
this pixel-by-pixel spectral information as the basis for
automated land cover classification.
Spatial Pattern Recognition

It involves the categorization of image pixels on the basis of
their spatial relationship with the pixels surrounding them.

 Spatial classifiers might consider such aspects as image
texture, pixel proximity, feature size, shape, directionality,
repetition, and context.

 Much more complex and computationally intensive than
spectral pattern recognition procedures.
Temporal Pattern Recognition

It uses time as an aid in feature identification.

Example
Distinct spectral and spatial changes during growing season
can permit discrimination on multidate imagery that would be
impossible given any single date.

Figure: Imagery before the tsunami and after the tsunami.
Information Class vs. Spectral Class

Figure: An information class (forest) compared with a spectral class
(forest).
Information Class and Spectral Class

 It is also important for the analyst to realize that there is a
fundamental difference between information classes and
spectral classes.

 Information classes are those that human beings define.

 Spectral classes are those that are inherent in the remote
sensor data and must be identified and then labeled by the
analyst.
Information Class

An information class is a class specified by the image analyst.

It refers to the information to be extracted. These classes are
those categories of interest that the analyst actually tries to
identify in the imagery, such as different kinds of crops,
different forest types or tree species, different geologic units
or rock types, etc.
Spectral Class

A spectral class is a class which includes similar grey-level
vectors in the multispectral space.

Spectral classes are groups of pixels that are uniform (or
near-similar) with respect to their brightness values in the
different spectral channels of the data.

The objective is to match the spectral classes in the data to
the information classes of interest.
Approach to Classification

 We need some form of automated (rule-based) classification
algorithm to allow us to distinguish one surface type from
another.

 Supervised Classification

 Unsupervised Classification
Supervised Classification
Figure: Selected multispectral scanner measurements made along one scan
line. The sensor covers spectral bands 1 (blue), 2 (green), 3 (red),
4 (NIR), and 5 (thermal IR).
Supervised Classification

In supervised classification, the image analyst “supervises” the
pixel categorization process by specifying, to the computer
algorithm, numerical descriptors of the various land cover
types present in a scene.
Basic Steps in Supervised Classification

Training Stage
The analyst identifies representative training areas and develops
a numerical description of the spectral attributes of each land
cover type of interest in the scene.

Classification Stage
Each pixel in the image data set is categorized into the land cover
class it most closely resembles. If the pixel is insufficiently
similar to any training data set, it is usually labeled “unknown”.
The category label assigned to each pixel in this process is then
recorded in the corresponding cell of an interpreted data set
(an “output image”).
Basic Steps in Supervised Classification

Output Stage
After the entire data set has been categorized, the results are
presented in the output stage.

Three typical forms of output product:

 Thematic maps
 Tables of full-area statistics
 Digital data files
Supervised Classification

Steps in supervised classification:
• Identification of sample areas (training areas)
• Partitioning of the feature space

A class sample
• is a number of training pixels
• forms a cluster in feature space

A cluster
• is the representative for a class
• includes a minimum number of observations (30*n)
• is distinct
Decision Rules for Supervised Classification

 Parametric Decision Rules
1. Minimum Distance to Mean
2. Maximum Likelihood
3. Linear Discriminant

 Non-parametric Decision Rules
1. Parallelepiped
2. Feature Space
Parametric Decision Rule

Based on statistics (mean, variance/covariance).

A parametric decision rule is trained by parametric
signatures. These signatures are defined by the mean vector
and covariance matrix for the data file values of the pixels in
the signatures.

When a parametric decision rule is used, every pixel is
assigned to a class, since the parametric decision space is
continuous.
Non-parametric Decision Rule
A non-parametric decision rule is not based on statistics;
therefore, it is independent of the properties of the data.

If a pixel is located within the boundary of a non-parametric
signature, then the decision rule assigns the pixel to the signature’s
class. Basically, a non-parametric decision rule determines
whether or not the pixel is located inside the non-parametric
signature boundary or inside the specified spectral limit.

Non-parametric techniques are sometimes termed “robust”,
because they can be applied to a wide variety of classifications, if
class signatures are distinct to begin with (Schowengerdt, 1997;
Joseph, 2005).
Classification Stage

Figure: Pixel observations from selected training sites plotted on a
scatter diagram (Band 3 digital number vs. Band 4 digital number).
Known cover types selected from training sites:
W = Water, F = Forest, H = Hay, S = Sand, U = Urban, C = Corn.


Feature Space
 Each feature vector is a point in the so-called feature space.
 Similar objects yield similar measurement results (feature
vectors); that is, nearby points in feature space correspond
to similar objects.
 Distance in feature space is related to dissimilarity.
 Points that belong to the same class form a cloud in feature
space.
Feature Space

 N = the number of bands = dimensions
.... an n-dimensional data (feature) space
 Features can be
 Raw bands
 Derived images

Figure: A two-dimensional feature space (Band A vs. Band B), showing a
measurement (feature) vector v = [v1, v2, v3, ..., vn]ᵀ and the mean
vector µ = [µ1, µ2, µ3, ..., µn]ᵀ.
Figure:
A. 1-dimensional (image histogram): e.g. water and land peaks along one
band (0-255).
B. 2-dimensional (scatter plot): Band x vs. Band y (0-255).
C. 3-dimensional (feature space): clusters such as vegetation, soil, and
urban.
Dimensionality of Data

 Spectral dimensionality is determined by the number of
sets of values being used in a process.

 In image processing, each band of data is a set of values.
An image with four bands of data is said to be
four-dimensional (Jensen, 1996).
Measurement Vector

 The measurement vector of a pixel is the set of data file values for
one pixel in all n bands.
 Although image data files are stored band-by-band, it is often
necessary to extract the measurement vectors for individual pixels.

Measurement Vector

If i is a particular band and Vi is the data file value of the pixel in
band i, then the measurement vector for this pixel is

V = [V1, V2, V3, ..., Vn]ᵀ
Mean Vector

 When the measurement vectors of several pixels are
analyzed, a mean vector is often calculated.

 This is the vector of the means of the data file values
in each band. It has n elements:

µ = [µ1, µ2, µ3, ..., µn]ᵀ
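As a quick illustration of the mean vector, the per-band means of a handful of training pixels can be computed as follows (a minimal sketch; the pixel values and band count are hypothetical, and NumPy is assumed):

```python
import numpy as np

# Hypothetical training sample: rows are pixels, columns are the n bands
training_pixels = np.array([
    [30, 60, 90],
    [32, 58, 95],
    [28, 62, 88],
], dtype=float)

# The mean vector has one element per band (n elements)
mean_vector = training_pixels.mean(axis=0)
print(mean_vector)  # [30. 60. 91.]
```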
Image Space

Figure: A single-band image and a multi-band image.

 Image space (col, row)
 An array of elements corresponding to the reflected or emitted
energy from the IFOV
 The spatial arrangement of the measurements of the reflected or
emitted energy
Feature Space

 A feature space image is simply a graph of the data file values
of one band of data against the values of another band.

Analyzing patterns in multispectral data (three-band measurement
vectors):
Pixel A: 34, 25, 117
Pixel B: 34, 24, 119
Pixel C: 11, 77, 51
One Dimensional Feature Space

Figure: Histograms of an input layer: in one case there is no distinction
between classes; in the other, a distinction between classes is visible.
Multi-dimensional Feature Space

Figure: Feature vectors plotted in a multi-dimensional feature space.
Feature Space (Scattergram)
Figure: A feature space scattergram, with high-frequency and
low-frequency regions.

Two/three-dimensional graph (or scatter diagram):

 Clusters of points form, representing the DN values in
two or three spectral bands.
 Each cluster of points corresponds to a certain cover type on the
ground.
Distances and Clusters in Feature Space

Figure: Feature space (Band x vs. Band y, in units of 5 DN), showing the
Euclidean distance between two points and a cluster bounded by
min x, max x, min y, and max y.
Spectral Distance
Euclidean spectral distance is distance in n-dimensional
spectral space. It is a number that allows two measurement
vectors to be compared for similarity. The spectral distance
between two pixels can be calculated as follows:

D = sqrt( Σ (from i = 1 to n) (di - ei)² )

Where:
D = spectral distance
n = number of bands (dimensions)
i = a particular band
di = data file value of pixel d in band i
ei = data file value of pixel e in band i

This is the equation for Euclidean distance. In two dimensions (when n = 2),
it can be simplified to the Pythagorean theorem (c² = a² + b²), or in this
case: D² = (di - ei)² + (dj - ej)²
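The spectral distance formula above is straightforward to compute; a minimal sketch in Python (reusing the hypothetical three-band pixel values from the feature-space example, and assuming NumPy; the function name is illustrative):

```python
import numpy as np

def spectral_distance(d, e):
    """Euclidean spectral distance between two pixel measurement vectors."""
    d = np.asarray(d, dtype=float)
    e = np.asarray(e, dtype=float)
    return float(np.sqrt(np.sum((d - e) ** 2)))

# Pixels A and B differ by only a few DNs -> small distance (similar)
print(spectral_distance([34, 25, 117], [34, 24, 119]))  # ~2.24
# Pixel C is spectrally different -> much larger distance
print(spectral_distance([34, 25, 117], [11, 77, 51]))
```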
Classifiers

Minimum Distance to Means Classifier

Figure: Training-site pixel observations (Band 3 digital number vs.
Band 4 digital number) with the category mean (+) of each class and
unknown pixels (•) to be classified.
W = Water, F = Forest, H = Hay, S = Sand, U = Urban, C = Corn.
Minimum Distance to Means Classifier

First, the mean (or average) spectral value in each band for
each category is determined. These values comprise the mean
vector.

A pixel of unknown identity may be classified by computing
the distance between the value of the unknown pixel and each of the
category means. After computing the distances, the unknown
pixel is assigned to the closest class.

If the pixel is farther than an analyst-defined distance from
any category mean, it would be classified as “unknown”.
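The procedure above can be sketched in a few lines of Python (a hedged illustration, not production code: the class mean vectors, the threshold, and the function name are hypothetical; NumPy is assumed):

```python
import numpy as np

def min_distance_classify(pixel, class_means, threshold=None):
    """Assign a pixel to the class with the nearest mean vector; return
    'unknown' if the nearest mean exceeds the analyst-defined threshold."""
    pixel = np.asarray(pixel, dtype=float)
    labels = list(class_means)
    dists = [np.linalg.norm(pixel - np.asarray(class_means[c], dtype=float))
             for c in labels]
    best = int(np.argmin(dists))
    if threshold is not None and dists[best] > threshold:
        return "unknown"
    return labels[best]

# Hypothetical two-band category mean vectors
means = {"water": [20, 10], "forest": [40, 60], "urban": [90, 85]}
print(min_distance_classify([42, 58], means))                  # forest
print(min_distance_classify([200, 200], means, threshold=50))  # unknown
```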
Minimum Distance to Means Classifier

Advantages
 Mathematically simple and computationally efficient.
 Since every pixel is spectrally closer to one sample
mean or another, there are no unclassified pixels.

Disadvantages
 It is insensitive to different degrees of variance in the spectral
response data.

 Because of such problems, this classifier is not widely used in
applications where spectral classes are close to one another in the
measurement space and have high variance.
Minimum Distance to Means Classifier

Figure: The same scatter diagram with category means (+) and unknown
pixels (•). Unknown pixel 2 would be assigned by the distance-to-mean
classifier to the “sand” category, in spite of the fact that the greater
variability in the “urban” category suggests that “urban” would be
a more appropriate class assignment.
Minimum Distance to Means Classifier

Figure: Euclidean spectral distance between the points (92, 153) and
(180, 85): Xd = 180 - 92, Yd = 85 - 153, distance = sqrt(Xd² + Yd²) ≈ 111.2.
Parallelepiped Classifier

 It introduces sensitivity to category variance by considering
the range of values in each category training set.
 The range may be defined by the highest and lowest DN
values in each band and appears as a rectangular area in
our two-channel scatter diagram.

An unknown pixel is classified according to the category range,
or decision region, in which it lies, or as “unknown” if it lies
outside all regions.
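The decision rule above reduces to box-membership tests; a minimal sketch (the per-class min/max DN limits are hypothetical, and the “not sure” handling for overlapping regions follows the description on the next slide):

```python
import numpy as np

def parallelepiped_classify(pixel, class_ranges):
    """Classify by min/max box membership: 'unknown' outside all boxes,
    'not sure' where decision regions overlap."""
    pixel = np.asarray(pixel, dtype=float)
    hits = [label for label, (lo, hi) in class_ranges.items()
            if np.all(pixel >= np.asarray(lo)) and np.all(pixel <= np.asarray(hi))]
    if not hits:
        return "unknown"
    return hits[0] if len(hits) == 1 else "not sure"

# Hypothetical per-class (min, max) DN limits in two bands
ranges = {
    "water":  ([10, 5], [30, 20]),
    "forest": ([25, 40], [55, 75]),
}
print(parallelepiped_classify([15, 10], ranges))  # water
print(parallelepiped_classify([90, 90], ranges))  # unknown
```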
Parallelepiped Classifier

Figure: The training-site scatter diagram (Band 3 digital number vs.
Band 4 digital number) with rectangular decision regions around each
class. Some classes lack covariance (ranging from more variable to
highly repeatable boxes); others show positive covariance (slanting
upward to the right) or negative covariance (slanting downward to the
right), which rectangular boxes describe poorly.
W = Water, F = Forest, H = Hay, S = Sand, U = Urban, C = Corn.
Parallelepiped Classifier

 Unknown pixel observations that occur in the overlap
areas will be classed as “not sure” or be arbitrarily placed
in one (or both) of the two overlapping classes.

 Overlap is caused largely because category distributions
exhibiting correlation or high covariance are poorly
described by the rectangular decision regions.

 Covariance is the tendency of spectral values to vary
similarly in two bands, resulting in elongated, slanted
clouds of observations on the scatter diagram.
Advantages
 Very fast and computationally efficient.
 Gives a broad classification, thus narrowing down the number
of possible classes to which each pixel can be assigned before
more time-consuming calculations are made.
 Not dependent on normal distributions.
 Sensitive to variance or spread.

Disadvantages
 Since a parallelepiped has corners, pixels that are actually
quite far, spectrally, from the mean of the signature may still be
classified; the method is INSENSITIVE TO COVARIANCE.
 Regions overlap.
Parallelepiped Classifier

Stepped Parallelepipeds

Figure: Simple boxes defined by the min/max limits of each training class
overlap, so stepped boxes are used instead. But overlaps may remain.
Gaussian Maximum Likelihood Classifier

 The Maximum Likelihood Classifier applies probability theory to
the classification task. (It assumes the data in a class follow a unimodal
Gaussian (normal) distribution.)

 From the training set classes, the method determines the class
centres and the variability in raster values in each input band
for the classes.

 The probability depends upon the distance from the cell to the
class centre, and the size and shape of the class in spectral space.

 The maximum likelihood method computes all of the class
probabilities for each raster cell and assigns the cell to the class
with the highest probability value.
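Under the unimodal Gaussian assumption stated above, the per-class probability reduces to evaluating a multivariate normal density with the class mean vector and covariance matrix. A sketch (the class statistics are hypothetical, and the log of the density is used for numerical stability):

```python
import numpy as np

def gaussian_log_likelihood(x, mean, cov):
    """Log of the multivariate normal density used by the MLC."""
    x, mean, cov = np.asarray(x, float), np.asarray(mean, float), np.asarray(cov, float)
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = diff @ np.linalg.solve(cov, diff)  # squared Mahalanobis distance
    return -0.5 * (len(mean) * np.log(2 * np.pi) + logdet + maha)

def mlc_classify(pixel, classes):
    """Assign the pixel to the class with the highest probability."""
    scores = {c: gaussian_log_likelihood(pixel, m, s) for c, (m, s) in classes.items()}
    return max(scores, key=scores.get)

# Hypothetical per-class mean vectors and covariance matrices (two bands);
# the elongated "urban" covariance gives tilted equiprobability contours
classes = {
    "water": ([20.0, 10.0], [[4.0, 0.0], [0.0, 4.0]]),
    "urban": ([80.0, 85.0], [[60.0, 30.0], [30.0, 50.0]]),
}
print(mlc_classify([22, 12], classes))  # water
```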
Probability Density Functions defined by the MLC

PDFs (ellipsoidal “equiprobability contours”) are used to classify an
undefined pixel by computing the probability of the pixel value
belonging to each category.

After evaluating the probability in each category, the pixel would be assigned to the
most likely class (highest probability value) or be labeled “unknown” if the probability
values are all below a threshold set by the analyst.
Equiprobability Contours defined by MLC

The shapes of the equiprobability contours express the
sensitivity of the likelihood classifier to covariance.
Gaussian Maximum Likelihood Classifier

Advantages
 The most accurate of the classifiers (if the input samples have a
normal distribution), because it takes the most variables into account.
 Takes the variability of classes into account.

Disadvantages
 An extensive equation; takes a long time to compute.
 It is parametric.
 Tends to overclassify signatures with relatively large
values in the covariance matrix.
Training Stage
The overall objective of the training process is to assemble a
set of statistics that describe the spectral response pattern for
each land cover type to be classified in an image.
Training Stage

 The quality of the training process determines the success
of the classification stage and, therefore, the value of the
information generated from the entire classification effort.

 To yield acceptable classification results, training data must
be both representative and complete. This means that the
image analyst must develop training statistics for all
spectral classes constituting each information class to be
discriminated by the classifier.
Training Stage

 The point that must be emphasized is that all spectral classes
constituting each information class must be adequately
represented in the training set statistics used to classify an image.

 When using any statistically based classifier (such as the MLC),
the theoretical lower limit of the number of pixels that
must be contained in a training set is (n + 1), where n is
the number of spectral bands.

In practice, a minimum of 10n to 100n pixels is used,
since the estimates of the mean vectors and covariance
matrices improve as the number of pixels in the training set
increases.
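The (n + 1) lower limit and the practical 10n to 100n range can be made concrete with a trivial helper (the function name is hypothetical, shown only to illustrate the rule of thumb):

```python
def training_pixel_bounds(n_bands):
    """Theoretical lower limit (n + 1) and practical range (10n to 100n)
    for the number of pixels in a training set."""
    return n_bands + 1, (10 * n_bands, 100 * n_bands)

# For a four-band image: at least 5 pixels, in practice 40 to 400
print(training_pixel_bounds(4))  # (5, (40, 400))
```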
Training Set Refinements

Trade-off
 Sufficient sample size to ensure the accurate
determination of the statistical parameters used by the
classifier
 Avoiding redundant spectral classes
Training Set Refinements

1. Graphical representation of the spectral response patterns

2. Quantitative expression of category separation

3. Self-classification of training set data
Graphical Representation of the Spectral Response Patterns

Figure: Sample histograms for the data points included in the training
areas for cover type “hay”. Most bands show a normal distribution, but
in one band the distribution is bimodal: the training data set chosen by
the analyst to represent “hay” is composed of two subclasses with slightly
different spectral characteristics. Classification accuracy will be
improved if each of the subclasses is treated as a separate category.
Figure: Coincident spectral plots for training data obtained in five
bands for six cover types.

 The plot indicates the overlap between category response patterns.

 “Hay” and “Corn” response patterns overlap in all spectral bands.

 Example: Band 3 and Band 5 should be used for “Hay” and “Corn”
separation.
Correlation Between different bands of LISS – II
Poanta Image

Figure: Scatter diagrams of two different bands at a time (Bands 1-4).
Graphical Separability Methods

 Cospectral feature space plots
 Training sites are plotted as ellipses or rectangles on
scatter plots or feature space plots.
 For classification, the ellipses must be spectrally distinct.
Figure: Band 1 and Band 2 histograms with the scatter diagram of
Band 1 and Band 2; the two bands are highly correlated.

Figure: Band 3 and Band 4 histograms with the scatter diagram of
Band 4 and Band 3; the two bands are less correlated.

Figure: The Band 1, Band 2, Band 3, and Band 4 images.
Quantitative Expressions of Category Separation

 A measure of the statistical separation between category
response patterns can be computed for all pairs of classes
and can be represented in the form of a matrix.

 One statistical parameter commonly used for this purpose
is transformed divergence, a covariance-weighted distance
between training patterns.

The larger the transformed divergence, the greater the “statistical
distance” between patterns, and the higher the probability of correct
classification of classes.
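Transformed divergence can be computed from the class mean vectors and covariance matrices. The sketch below uses the standard divergence formulation and the scaling TD = 2000[1 - exp(-D/8)] associated with Jensen (1996); the class statistics in the example are hypothetical:

```python
import numpy as np

def divergence(m1, c1, m2, c2):
    """Divergence between two Gaussian class signatures (means m, covariances c)."""
    m1, m2 = np.asarray(m1, float), np.asarray(m2, float)
    c1, c2 = np.asarray(c1, float), np.asarray(c2, float)
    i1, i2 = np.linalg.inv(c1), np.linalg.inv(c2)
    dm = (m1 - m2).reshape(-1, 1)
    term1 = 0.5 * np.trace((c1 - c2) @ (i2 - i1))
    term2 = 0.5 * np.trace((i1 + i2) @ dm @ dm.T)
    return term1 + term2

def transformed_divergence(m1, c1, m2, c2):
    """Transformed divergence: saturates at 2000 for well-separated classes."""
    return 2000.0 * (1.0 - np.exp(-divergence(m1, c1, m2, c2) / 8.0))

eye = [[1.0, 0.0], [0.0, 1.0]]
print(transformed_divergence([0, 0], eye, [0, 0], eye))      # 0.0 (identical signatures)
print(transformed_divergence([0, 0], eye, [100, 100], eye))  # 2000.0 (fully separable)
```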
Quantitative Expressions of Category Separation
Portion of a Divergence Matrix Used to Evaluate Training Class Spectral Separability
Spectral Class W1 W2 W3 C1 C2 C3 C4 H1 H2 …
W1 0
W2 1185 0
W3 1410 680 0
C1 1997 2000 1910 0
C2 1953 1890 1874 860 0
C3 1980 1953 1930 1340 1353 0
C4 1992 1997 2000 1700 1810 1749 0
H1 2000 1839 1911 1410 1123 860 1712 0
H2 1995 1967 1935 1563 1602 1197 1621 721 0
...

W = Water, C = Corn, H = Hay. Maximum possible divergence = 2000.
Values < 1500 indicate spectrally similar classes.
Transformed Divergence

Figure: A signature separability listing based on transformed divergence.
Jeffries-Matusita Distance

The range of JM is between 0 and 1414. The JM distance has a saturating
behavior with increasing class separation, like transformed divergence.
However, it is not as computationally efficient as transformed
divergence (Jensen, 1996).
Unsupervised Classification
Unsupervised Image Classification

 Unsupervised classifiers do not utilize training data as
the basis for classification.

 This family of classifiers involves algorithms that
examine the unknown pixels in an image and aggregate
them into a number of classes based on the natural
groupings or clusters present in the image values.
Unsupervised Image Classification

 The classes that result from unsupervised classification are
spectral classes.
 Because they are based solely on the natural groupings in
the image values, the identity of the spectral classes will
not be initially known.
 The analyst must compare the classified data with some
form of reference data to determine the identity and
informational value of the spectral classes.
Unsupervised Image Classification

 Clustering algorithm
 User defined cluster parameters
 Class mean vectors are arbitrarily
set by algorithm (iteration 0)
 Class allocation of feature vectors
 Compute new class mean vectors
 Class allocation (iteration 2)
 Re-compute class mean vectors
 Iterations continue until convergence
threshold has been reached
 Final class allocation
 Cluster statistics reporting

Clustering
“K-means” Clustering (How does it work?)
K-means accepts from the analyst the number of clusters to be
located in the data. The algorithm then arbitrarily “seeds,” or
locates, that number of cluster centers in the
multidimensional measurement space. Each pixel in the image is
then assigned to the cluster whose arbitrary mean vector is closest.
After all pixels have been classified in this manner, revised mean
vectors are computed for each of the clusters in the image data. The
procedure continues until there is no significant change in the location
of class mean vectors between successive iterations of the algorithm.
Once this point is reached, the analyst determines the land cover
identity of each spectral class.
Unsupervised Classification: K- means
• A large number of clustering algorithms exist.

• K-means
– input the number of clusters desired
– the algorithm is typically initiated with arbitrarily-located “seeds” for
cluster means
– each pixel is then assigned to the closest cluster mean
– revised mean vectors are then computed for each cluster
– repeat until some convergence criterion is met (e.g. cluster
means don’t move between iterations)
– computationally expensive because it is iterative
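The steps above map directly onto a minimal k-means sketch (illustrative only: the seeding, convergence test, and two-band data are simplified and hypothetical; NumPy is assumed):

```python
import numpy as np

def kmeans(pixels, k, n_iter=20, seed=0):
    """Minimal k-means: arbitrary seeds, assign to nearest mean, revise means."""
    rng = np.random.default_rng(seed)
    pixels = np.asarray(pixels, dtype=float)
    # Arbitrarily "seed" the cluster centers at k distinct pixels
    means = pixels[rng.choice(len(pixels), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each pixel to the cluster whose mean vector is closest
        dists = np.linalg.norm(pixels[:, None, :] - means[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Revise the mean vectors (keep the old mean if a cluster is empty)
        new_means = np.array([pixels[labels == j].mean(axis=0)
                              if np.any(labels == j) else means[j]
                              for j in range(k)])
        if np.allclose(new_means, means):  # means stopped moving -> converged
            break
        means = new_means
    return labels, means

# Two well-separated hypothetical spectral clusters (2 bands)
data = [[10, 12], [11, 10], [9, 11], [80, 82], [82, 79], [78, 81]]
labels, means = kmeans(data, k=2)
print(labels)  # first three pixels in one cluster, last three in the other
```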
“K-means” Clustering

Figure: A typical example of k-means convergence to a local minimum. In
this example, the result of k-means clustering (the right figure)
contradicts the obvious cluster structure of the data set. The small
circles are the data points; the four-ray stars are the centroids (means).
The initial configuration is in the left figure. The algorithm converges
after five iterations, presented in the figures from left to right
(initial condition, then iterations 1 through 5).
ISODATA Clustering

 This algorithm permits the number of clusters to change
from one iteration to the next, by merging, splitting, and
deleting clusters.

 In each iteration, following the assignment of pixels to the
clusters, the statistics describing each cluster are evaluated.

 If the distance between the mean points of two clusters is less
than some predefined minimum distance, the two clusters are
merged together.
ISODATA Clustering
 If a single cluster has a standard deviation (in any one
dimension) that is greater than a predefined maximum value,
the cluster is split in two.
 Clusters with fewer than the specified minimum number
of pixels are deleted.
 Finally, as with k-means, all pixels are then reclassified
into the revised set of clusters, and the process repeats until
either there is no significant change in the cluster statistics or
some maximum number of iterations is reached.
ISODATA Clustering

 Step 1: Select the number and centres of the clusters or classes.

 Step 2: Each point in the feature space is classified, or
labelled, to the closest centre (minimum distance to mean).

 Step 3: The mean is calculated for each cluster.

 Step 4: Reclassify each point in the feature space using the new mean.

 Iterate (repeat steps 2-4) until the means do not change
within an acceptable value, or until an acceptable percentage of
pixels doesn’t change between clusters.
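A heavily simplified sketch of the ISODATA bookkeeping (delete, split, merge) between reassignment passes is shown below. All thresholds (`min_dist`, `max_std`, `min_pixels`) and the splitting rule (at the median of the widest band) are hypothetical simplifications; a real ISODATA implementation interleaves these checks with full reclassification:

```python
import numpy as np

def isodata_adjust(clusters, min_dist=10.0, max_std=15.0, min_pixels=3):
    """One ISODATA-style bookkeeping pass over clusters (arrays of pixels):
    delete small clusters, split high-variance ones, merge close pairs."""
    # Delete clusters with fewer than the minimum number of pixels
    clusters = [c for c in clusters if len(c) >= min_pixels]
    # Split any cluster whose standard deviation in some band is too large
    out = []
    for c in clusters:
        stds = c.std(axis=0)
        if stds.max() > max_std:
            band = int(stds.argmax())
            median = np.median(c[:, band])
            out += [c[c[:, band] <= median], c[c[:, band] > median]]
        else:
            out.append(c)
    # Merge the first pair of clusters whose means are closer than min_dist
    means = [c.mean(axis=0) for c in out]
    for i in range(len(out)):
        for j in range(i + 1, len(out)):
            if np.linalg.norm(means[i] - means[j]) < min_dist:
                merged = np.vstack([out[i], out[j]])
                return [out[k] for k in range(len(out)) if k not in (i, j)] + [merged]
    return out

# Hypothetical clusters: one too small (deleted), one stretched in band 1 (split)
tiny = np.array([[0.0, 0.0], [1.0, 1.0]])
wide = np.array([[0, 0], [1, 0], [2, 0], [50, 0], [51, 0], [52, 0]], dtype=float)
adjusted = isodata_adjust([tiny, wide])
print(len(adjusted))  # 2
```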
ISODATA Example: 2 Classes, 2 Bands

Figure: Initial cluster means a and b are placed in the DN(B1) vs. DN(B2)
feature space; each pixel is assigned to the nearest mean (pixel 1 to
cluster a, pixel 2 to cluster b, etc.), the cluster means move towards
their pixels, and the statistics are updated. If the standard deviation
of a cluster is too large, it is split in two and the means are
recalculated; the process repeats, with the cluster means taking new
positions each iteration.
Example: ISODATA

Figure: Band 1 vs. Band 2 scatter plots at three stages.
1. The data is clustered, but the blue cluster is very stretched in band 1.
2. Some clusters (e.g. the green cluster) have only 2 or fewer pixels, so
they will be removed.
3. Either assign the outliers to the nearest cluster, or mark them as
unclassified.