Birla Institute of Technology and Science Pilani, Hyderabad Campus
27.09.2024
BITS F464: Machine Learning (1st Sem 2024-25)
LINEAR DISCRIMINANT ANALYSIS
Chittaranjan Hota, Sr. Professor
Dept. of Computer Sc. and Information Systems
hota@[Link]
Linear Discriminant Functions: Applications
• Fisher’s Linear Discriminant Analysis for reducing the number of features
required for Face Recognition.
• Classifying a patient’s disease state as Mild, Moderate, or Severe.
• Identifying the type of customers who might buy a particular product.
Linear Discriminant Functions
• Used to discriminate between two or more classes based on
a set of predictor variables.
x = [x1, …, xD]T → Ck (feature vector mapped to a class label)
Discriminant Function
• Learns the mapping between feature vector and class labels.
• Does it not create a decision boundary?
Hyperplanes
(1-D: a point/threshold; 2-D: a line; 3-D: a plane)
Logistic Regression may be unstable for well-separated classes and few examples. Why?
Two-class Linear Discriminant Functions (K=2)
• y(x) = wT x + w0 = Σi wi xi + w0
Where wT is the weight vector and w0 is the bias. The negative of the bias (i.e. -w0) is sometimes called the threshold.
(Figure: geometry of an LDF in 2 dimensions. The region with y(x) > 0 is assigned to C1 and the region with y(x) < 0 to C2; w0 determines the location of the decision surface and w its orientation.)
Question: if w0 = 0 in 3 dimensions, the hyperplane will pass through what?
Distance of Origin to Decision Surface (Bias: w0)
• Let xA and xB be points that lie on the decision surface: y(xA) = y(xB) = 0.
• Then wT xA + w0 = wT xB + w0 = 0, so wT (xA - xB) = 0.
• xA - xB is an arbitrary vector parallel to the decision surface, hence w is orthogonal to every vector lying on the decision surface: w determines its orientation.
• If x is a point on the decision surface, y(x) = 0, so wT x = -w0. The normal distance from the origin to the decision surface is therefore wT x / ||w|| = -w0 / ||w||: the bias w0 determines its location.
Distance of a point ‘x’ to the Decision surface ( r )
Let ‘x’ be an arbitrary point and x⊥ be its orthogonal projection on the decision surface. By vector addition:
x = x⊥ + r (w / ||w||)
The second term is a unit vector normal to the decision surface, collinear with w; since ||w / ||w|| || = 1, we scale it by r.
Multiplying both sides by wT and adding w0, and using y(x⊥) = 0 and wT w = ||w||²:
y(x) = wT x + w0 = wT x⊥ + w0 + r (wT w / ||w||) = 0 + r ||w||
Hence r = y(x) / ||w||.
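A minimal NumPy check of r = y(x) / ||w||, using hypothetical values for w and w0 (the same numbers as the worked example later in these slides):

import numpy as np

w = np.array([2.0, 3.0])   # hypothetical weight vector
w0 = -15.0                 # hypothetical bias

def signed_distance(x, w, w0):
    # Signed normal distance from x to the hyperplane w^T x + w0 = 0
    return (w @ x + w0) / np.linalg.norm(w)

x = np.array([3.0, 4.0])
print(signed_distance(x, w, w0))  # 3 / sqrt(13) ≈ 0.832; positive: x is on the C1 side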
Multi-class Linear Discriminant Functions (K>2)
Approach 1: Combine a number of two-class discriminant functions.
• One-versus-the-rest: K - 1 classifiers, each one separating points in a particular class Ck from points not in that class.
• One-versus-one (the alternative): K(K - 1)/2 classifiers, one for every possible pair of classes; each point is classified according to a majority vote.
(Both constructions leave ambiguous regions of input space, marked ‘?’ in the figure.)
Another Example…
For a training point xp belonging to class c, we want:
bc + xpT wc > 0 and bj + xpT wj < 0, for j = 1, …, C, j ≠ c
A new point is classified by taking the global maximum over the discriminants:
y = argmax j=1,…,C (bj + xT wj)
Solution: Using K-discriminant functions
• Building a single K-class discriminant comprising K linear functions of the form:
• yk(x) = wkT x + wk0
• Then, assigning a point x to class Ck if yk(x) > yj(x), ∀ j ≠ k (see the sketch below).
• The decision boundary between Ck and Cj : yk(x) = yj(x)
• Defined by: (wk - wj)T x + (wk0 - wj0) = 0 (same form as the 2-class case)
• Hence, same geometrical properties apply.
Decision regions of such discriminants are always singly
connected and convex.
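A short sketch of this argmax decision rule; the array shapes and names are assumptions, not the lecture’s code:

import numpy as np

def predict(X, W, w0):
    # X: (N, D) inputs, W: (D, K) matrix with w_k as columns, w0: (K,) biases
    scores = X @ W + w0               # (N, K) discriminant values y_k(x) = w_k^T x + w_k0
    return np.argmax(scores, axis=1)  # assign each point to the class with the largest y_k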
Proof of Convexity Next…
Proof of Convexity of Decision Region
Consider two points xA and xB that both lie inside the decision region Rk, and any point x on the line segment joining them:
x = λ xA + (1 - λ) xB, where 0 ≤ λ ≤ 1
From the linearity of the discriminant functions:
yk(x) = λ yk(xA) + (1 - λ) yk(xB)
As xA and xB lie inside Rk, yk(xA) > yj(xA) and yk(xB) > yj(xB) for all j ≠ k, hence yk(x) > yj(x): x must also lie in Rk.
Therefore Rk is singly connected and convex.
(Figure: decision regions Ri, Rj, Rk, with the segment from xA to xB lying inside Rk.)
Multi-class Classification using LDA (sklearn)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
lda.fit(X_train, y_train)
Features: alcohol, magnesium, hue, proline, …
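A complete, runnable version of this sketch, assuming the listed features (alcohol, magnesium, hue, proline, …) refer to sklearn’s built-in Wine dataset:

from sklearn.datasets import load_wine
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split

# Assumption: the slide's X, y come from the Wine dataset (13 features, 3 classes)
X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

lda = LinearDiscriminantAnalysis()
lda.fit(X_train, y_train)
print(lda.score(X_test, y_test))  # mean accuracy on the held-out 30% split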
Least Squares for Classification
• Straightforward way to adapt regression techniques for
classification tasks.
• How do we compute y(x) and the weights w0, w1, w2, …, wD?
• Each class Ck is described by its own linear model:
• yk(x) = wkT x + wk0 (where x and w have D dimensions each)
• We can group these together using vector notation:
y(x) = W̃T x̃
where x̃ = (1, xT)T is the augmented input vector and W̃ is the parameter matrix whose kth column is the (D+1)-dimensional vector w̃k = (wk0, wkT)T.
• A new input x is then assigned to the class for which the output yk(x) = w̃kT x̃ is largest. W̃ is obtained by minimizing the sum-of-squares error.
An example classification using LDF
• Suppose we have a dataset of two classes: Class A and Class B. Each data point has two
features, x1 and x2. Our task is to classify new data points into either Class A or Class B using
a linear discriminant function.
• Class A (positive class): XA = {(2,3), (3,3), (4,5), (5,6)} & Class B (negative class): XB =
{(1,1),(2,2),(3,1),(4,2)}
• Step 1: Define the linear discriminant function:
g(x) = w1 x1 + w2 x2 + w0. Decision rule: if g(x) > 0, classify as Class A, else Class B.
• Step 2: Train the classifier (e.g., by least squares):
Suppose after training we obtain the LDF: g(x) = 2x1 + 3x2 - 15
• Step 3: Classify new points:
x = (3, 4): g(3, 4) = 2×3 + 3×4 - 15 = 3
As g(3, 4) = 3 > 0, we classify it as Class A.
Which class will the point (2, 1) belong to? (See the sketch below.)
• Decision boundary:
g(x) = 0 ⇒ x2 = (-2/3) x1 + 5
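A few lines of Python reproducing this worked example (coefficients taken from Step 2):

def g(x1, x2):
    # Trained LDF from Step 2: g(x) = 2*x1 + 3*x2 - 15
    return 2 * x1 + 3 * x2 - 15

print(g(3, 4))  #  3 > 0  -> Class A
print(g(2, 1))  # -8 < 0  -> Class B (answer to the question above)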
Minimizing sum-of-squares error func
Let there be a training dataset {xn, tn}, where n = 1, …, N.
Define a matrix T whose nth row is the vector tnT, and a matrix X̃ whose nth row is x̃nT.
Then the sum-of-squares error function can be written as:
E(W̃) = (1/2) Tr{ (X̃W̃ - T)T (X̃W̃ - T) }
Multiplying a matrix by its transpose results in a square matrix; taking the trace of this square matrix (the sum of the elements on the main diagonal) gives the scalar error.
LSE Computation: An Example
Minimizing Sum-of-Squares
To minimize the error, set the derivative with respect to W̃ equal to zero and solve:
W̃ = (X̃T X̃)^-1 X̃T T = X̃† T
where X̃† is the pseudoinverse of X̃ (the pseudoinverse solution).
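A minimal NumPy sketch of this pseudoinverse solution, assuming one-of-K (one-hot) target rows in T; the helper names are illustrative, not from the lecture:

import numpy as np

def lsq_fit(X, y, K):
    # Augmented design matrix: nth row is x_tilde_n^T = [1, x_n^T]
    X_tilde = np.hstack([np.ones((X.shape[0], 1)), X])
    T = np.eye(K)[y]                    # (N, K) one-hot target matrix (y holds integer labels)
    return np.linalg.pinv(X_tilde) @ T  # W_tilde = pinv(X_tilde) @ T

def lsq_predict(X, W_tilde):
    X_tilde = np.hstack([np.ones((X.shape[0], 1)), X])
    return np.argmax(X_tilde @ W_tilde, axis=1)  # class with the largest output wins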
Least-squares: highly sensitive to outliers
Magenta: Least squares, Green: Logistic regression
Least squares: more severe problems
Least squares is unsuitable for certain datasets (here: 3 classes in a 2-D space, synthetic data)
(Least-squares classification) (Logistic Regression classification)
The region of input space assigned to the green class is too small and so
most of the points from this class are misclassified.
Fisher’s Linear Discriminant: Motivation
• Why do we need it?
Question: How difficult are these transformations to figure out?
Image source: [Link]
Fisher’s Linear Discriminant
Ronald A. Fisher
• View classification in terms of dimensionality reduction
• Project D-dimensional input vector x into one
dimension using: y = wTx
• Place a threshold on y: classify y >= -w0 as class C1, else as class C2
• We get a standard linear classifier
• Classes well separated in D-dimensional space may strongly overlap in 1 dimension
• Adjust the components of the weight vector w:
• Select the projection that maximizes class separation
FLD seeks to maximize the ratio of between-class variance to within-class
variance, thus maximizing class discrimination.
An illustration of Fisher’s LDF
(Projection onto the line joining (Projection based on Fisher’s
the class means) Linear discriminant function)
What is the degree of class overlap? Is the class separation improved?
Maximizing Mean Separation
• Let us consider a two-class problem with N1 points of C1
class and N2 points of C2 class
• Mean vectors: m1 = (1/N1) Σ n∈C1 xn and m2 = (1/N2) Σ n∈C2 xn
• Choose w to best separate the class means:
• Maximize m2 - m1 = wT(m2 - m1), where mk = wT mk is the mean of the projected data from class Ck
• This can be made arbitrarily large simply by increasing the magnitude of w:
• So constrain w to be of unit length, i.e. Σi wi² = 1
• Using a Lagrange multiplier to maximize subject to this constraint, we find w ∝ (m2 - m1)
There is still a problem with this approach…
Illustration of the problem
Image source: [Link]
After projection, the data exhibit some class overlap, shown by the yellow ellipse in the plot.
This difficulty arises from the strongly non-diagonal co-variances of the
class distributions.
Minimizing Variance and Optimizing
• Project D-dimensional input vector x into one dimension
using: yn = wTxn
• The within-class variance of the transformed data from class Ck is given by:
sk² = Σ n∈Ck (yn - mk)²
• Total within-class variance for the whole dataset is: s1² + s2²
• Fisher’s criterion: J(w) = (m2 - m1)² / (s1² + s2²)
Rewriting to make the dependence on w explicit:
J(w) = (wT SB w) / (wT SW w)
where SB = (m2 - m1)(m2 - m1)T is the between-class covariance matrix and
SW = Σ n∈C1 (xn - m1)(xn - m1)T + Σ n∈C2 (xn - m2)(xn - m2)T is the within-class covariance matrix.
Differentiating with respect to w, J(w) is maximized when:
(wT SB w) SW w = (wT SW w) SB w
Dropping scalar factors, noting that SB w is in the same direction as (m2 - m1), and multiplying both sides by SW⁻¹:
w ∝ SW⁻¹ (m2 - m1)    (Fisher’s linear discriminant)
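A minimal two-class NumPy sketch of w ∝ SW⁻¹ (m2 - m1), solving the linear system instead of forming an explicit inverse; the function name is illustrative:

import numpy as np

def fisher_direction(X1, X2):
    # X1: (N1, D) samples from C1, X2: (N2, D) samples from C2
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    # Within-class covariance S_W: sum over both classes of (x_n - m_k)(x_n - m_k)^T
    S_W = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
    w = np.linalg.solve(S_W, m2 - m1)  # w ∝ S_W^{-1} (m2 - m1)
    return w / np.linalg.norm(w)       # unit length for convenience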
Optimization of J(w)
Applying the quotient rule and setting the derivative of J(w) to zero, to maximize J(w):
SB w = λ SW w, i.e. SW⁻¹ SB w = λ w
This is a (generalized) eigenvalue problem, where λ is the eigenvalue and w is the eigenvector.
Problems: non-linear models; small sample sizes.
Illustration with Fisher’s LDF
Image source: [Link]
Fisher’s Linear Discriminant Functions: LDA on IRIS Dataset
Assignment 3
Ref: [Link]
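A starter sketch for this demonstration, assuming sklearn’s built-in Iris dataset (illustrative only, not the official assignment solution):

from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)
lda = LinearDiscriminantAnalysis(n_components=2)  # project the 4-D iris features to 2-D
X_proj = lda.fit_transform(X, y)                  # Fisher projection maximizing class separation
print(X_proj.shape)                               # (150, 2)
print(lda.score(X, y))                            # training accuracy of the LDA classifier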
Thank you!