Support Vector Machine Overview
• Support Vector Machine (SVM) is a supervised machine learning algorithm used for both classification and
regression.
• The goal of the SVM algorithm is to create the best line or decision boundary that segregates n-dimensional
space into classes, so that new data points can easily be placed in the correct category in the future. This
best decision boundary is called a hyperplane.
• SVM chooses the extreme points/vectors that help in creating the hyperplane. These extreme cases are
called support vectors, and hence the algorithm is termed Support Vector Machine.
Two types:
• Linear SVM: used for linearly separable data. If a dataset can be classified into two classes with a single
straight line, it is termed linearly separable data, and the classifier used is called the Linear SVM classifier.
• Non-linear SVM: used for non-linearly separable data. If a dataset cannot be classified with a straight line,
it is termed non-linear data, and the classifier used is called the Non-linear SVM classifier.
[Link] Govindarajan
Hyperplane:
• There can be multiple lines/decision boundaries that segregate the classes in n-dimensional space, but we need to
find the best decision boundary for classifying the data points. This best boundary is known as the hyperplane
of SVM.
• The dimensionality of the hyperplane depends on the number of features in the dataset: with 2 features, the
hyperplane is a straight line; with 3 features, it is a 2-dimensional plane.
• We always create the hyperplane with the maximum margin, i.e., the maximum distance to the nearest data
points of each class; this is the Maximal Margin Hyperplane.
Support Vectors:
• The data points or vectors that are closest to the hyperplane and that affect its position are termed
support vectors. Since these vectors "support" the hyperplane, they are called support vectors.
Linear SVM:
• The working of the SVM algorithm can be understood with an example. Suppose we have a dataset with two
tags (green and blue) and two features, x1 and x2. We want a classifier that assigns each coordinate
pair (x1, x2) to either green or blue.
• Since this is a 2-D space, we can separate the two classes with a single straight line, but multiple lines
could separate them.
• The SVM algorithm finds the best line or decision boundary; this best boundary is called a hyperplane. The
algorithm finds the points of both classes closest to the boundary; these points are called support vectors.
The distance between these vectors and the hyperplane is called the margin, and the goal of SVM is to
maximize it. The hyperplane with the maximum margin is called the optimal hyperplane.
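As a minimal sketch of the idea (assuming scikit-learn and a made-up toy dataset), a linear SVM can be fit and its support vectors inspected directly:

```python
# Minimal sketch: fitting a linear SVM on a toy 2-D dataset (assumes scikit-learn).
import numpy as np
from sklearn.svm import SVC

# Two linearly separable tags (blue = 0, green = 1) with features x1, x2.
X = np.array([[1.0, 2.0], [2.0, 3.0], [2.0, 1.0],
              [6.0, 5.0], [7.0, 7.0], [8.0, 6.0]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)

# The points closest to the maximal-margin hyperplane:
print(clf.support_vectors_)
print(clf.predict([[2.0, 2.0], [7.0, 6.0]]))  # new points -> classes
```

Only the support vectors enter the decision function; the remaining training points could be removed without changing the learned boundary.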
Non-Linear SVM:
• If the data is linearly arranged, we can separate it with a straight line, but for non-linear data we cannot
draw a single straight line.
• To separate these data points, we need to add one more dimension. For linear data we used two dimensions,
x and y; for non-linear data we add a third dimension, z. By adding the third dimension, the sample space
becomes three-dimensional.
• SVM then divides the dataset into classes with a boundary that, in this 3-D space, looks like a plane
parallel to the x-axis.
• If we convert it back to 2-D space with z = 1, the decision boundary appears as a circle around the data.
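The "add a dimension" idea above can be sketched with synthetic data: points on two concentric circles are not linearly separable in (x, y), but an RBF-kernel SVM separates them by working implicitly in a higher-dimensional space (data and parameters below are illustrative assumptions, not from the slides):

```python
# Two concentric circles: not separable by a line in (x, y), but an RBF SVM
# implicitly maps to a higher-dimensional space (assumes scikit-learn).
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
theta = rng.uniform(0, 2 * np.pi, 200)
r = np.concatenate([np.full(100, 1.0), np.full(100, 3.0)])  # inner / outer circle
X = np.column_stack([r * np.cos(theta), r * np.sin(theta)])
y = np.concatenate([np.zeros(100, int), np.ones(100, int)])

clf = SVC(kernel="rbf", gamma="scale").fit(X, y)
print(clf.score(X, y))  # near-perfect separation on this toy data
```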
Advantages of SVM
• SVM is memory-efficient, since it uses only a subset of the training points, called support vectors, in the
decision function.
Example: Suppose we see a strange cat that also has some features of dogs. If we want a model that can
accurately identify whether it is a cat or a dog, such a model can be created using the SVM algorithm.
We first train the model with many images of cats and dogs so that it learns the distinguishing features of
each, and then test it on this strange creature. The SVM creates a decision boundary between the two classes
(cat and dog) using the extreme cases (support vectors), and on the basis of these support vectors it
classifies the creature as a cat. The SVM algorithm can be used for face detection, image classification,
text categorization, etc.
Kernel Optimization
Kernel optimization in machine learning focuses on finding the best kernel function and parameters to improve
model performance, particularly in tasks involving non-linear data and high-dimensional spaces.
This involves techniques like cross-validation, hyperparameter tuning, and exploring different kernel types to
achieve optimal accuracy and efficiency.
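One common way to carry out this search (cross-validation over kernel types and hyperparameters) is a grid search; the sketch below assumes scikit-learn and uses the built-in iris dataset purely for illustration:

```python
# Kernel optimization sketch: grid-search over kernel type and hyperparameters
# with 5-fold cross-validation (assumes scikit-learn; toy iris data).
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
param_grid = {
    "kernel": ["linear", "rbf", "poly"],   # candidate kernel functions
    "C": [0.1, 1, 10],                     # regularization strength
    "gamma": ["scale", "auto"],            # RBF/poly kernel coefficient
}
search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```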
Kernel Methods :
• Kernel methods are a class of machine learning algorithms that leverage the "kernel trick" to perform
complex, non-linear operations in high-dimensional spaces without explicitly mapping the data.
• They are particularly useful for algorithms like Support Vector Machines (SVMs).
• The kernel function acts as a similarity measure between data points, allowing the algorithm to find patterns
and relationships in a higher-dimensional space.
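The kernel trick can be verified numerically on a small example: the degree-2 polynomial kernel (x·y)² gives the same value as an inner product in an explicit higher-dimensional feature space, without ever computing that mapping (the feature map below is the standard one for 2-D inputs):

```python
# Kernel-trick sketch: (x.y)^2 equals an inner product in an explicit
# higher-dimensional feature space, computed without mapping the data.
import numpy as np

def phi(v):
    """Explicit degree-2 feature map for a 2-D input."""
    x1, x2 = v
    return np.array([x1 * x1, x2 * x2, np.sqrt(2) * x1 * x2])

x = np.array([1.0, 2.0])
y = np.array([3.0, 0.5])

implicit = (x @ y) ** 2        # kernel trick: work in input space
explicit = phi(x) @ phi(y)     # same value via the explicit mapping
print(implicit, explicit)
```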
Need of Kernel Optimization:
Non-Linear Data:
• Many real-world datasets are not linearly separable, and kernel methods provide a powerful way to handle such
data by implicitly mapping it to a higher-dimensional space where it might be linearly separable.
Computational Efficiency:
• The kernel trick allows algorithms to operate in high-dimensional spaces without the computational cost
of explicitly mapping the data, making them efficient.
Flexibility:
• Kernel methods are versatile and can be applied to various types of data and tasks, as long as a suitable
kernel function can be defined.
Neural networks learning
Neural network learning refers to how artificial neural networks (ANNs) adjust their internal parameters (weights
and biases) to improve performance on a given task. The learning process is typically based on optimization
algorithms and data-driven training.
How it works:
1. Forward Propagation
• The input data passes through multiple layers of neurons.
• Each neuron applies a weighted sum of its inputs, followed by an activation function (e.g., ReLU, Sigmoid).
ReLU (Rectified Linear Unit) is a popular activation function in deep learning that outputs the input directly
if it is positive, and zero otherwise.
• The final output is produced based on the learned parameters.
2. Loss Calculation
• The output of the network is compared to the actual target values using a loss function (e.g., Mean Squared
Error for regression, Cross-Entropy Loss for classification).
• The loss function quantifies how far the predictions are from the true labels.
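Steps 1 and 2 can be sketched in a few lines of numpy; the weights and biases below are illustrative values only, not learned parameters:

```python
# One forward pass plus loss calculation (illustrative weights, numpy only).
import numpy as np

def relu(z):
    return np.maximum(0.0, z)          # ReLU: pass positives, zero out negatives

x = np.array([0.5, -1.0])              # one sample with two features
W1 = np.array([[0.2, -0.3], [0.4, 0.1]])   # hidden-layer weights (assumed)
b1 = np.array([0.1, 0.0])
W2 = np.array([[0.7, -0.5]])               # output-layer weights (assumed)
b2 = np.array([0.05])

h = relu(W1 @ x + b1)                  # weighted sum + activation
y_hat = W2 @ h + b2                    # network output
y_true = np.array([0.3])
mse = np.mean((y_hat - y_true) ** 2)   # loss quantifies prediction error
print(y_hat, mse)
```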
3. Backpropagation & Gradient Descent
• Backpropagation: The gradient of the loss function is computed with respect to each weight using the chain
rule of calculus.
• Gradient Descent: The network updates the weights in the opposite direction of the gradient to reduce the
loss.
4. Iterative Optimization
• Steps 1–3 are repeated for multiple epochs (full passes through the dataset).
• The network gradually learns patterns from data and improves predictions.
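The full loop of steps 1-3 repeated over epochs can be sketched with a single linear neuron trained by gradient descent (learning rate and data are arbitrary choices for illustration):

```python
# Iterative optimization sketch: forward pass, loss, gradients, weight update,
# repeated over many epochs (numpy only; a single linear neuron).
import numpy as np

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = 2.0 * X[:, 0] + 1.0                # target relation: y = 2x + 1
w, b, lr = 0.0, 0.0, 0.05

for epoch in range(2000):
    y_hat = w * X[:, 0] + b            # forward pass
    err = y_hat - y
    loss = np.mean(err ** 2)           # loss calculation (MSE)
    dw = 2 * np.mean(err * X[:, 0])    # gradients via the chain rule
    db = 2 * np.mean(err)
    w -= lr * dw                       # update opposite the gradient direction
    b -= lr * db

print(round(w, 3), round(b, 3))        # parameters approach 2 and 1
```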
Non-Linear hypothesis
Methods for Non-linear Hypothesis Testing:
• Wald Test:
A statistical test that uses the estimated parameters and their standard errors to test a hypothesis.
• Bayesian Methods:
Methods that use Bayes' theorem to update prior beliefs about a hypothesis based on observed data.
• Surrogate Data Testing:
A method for testing nonlinearity in time series data by generating surrogate data sets that are consistent with
a linear process and comparing them to the original data.
Software and Tools:
Stata:
The testnl command in Stata can be used to test non-linear hypotheses after estimation.
R:
The hypothesis function in the brms package in R can be used for non-linear hypothesis testing in Bayesian models.
Example:
A classic example of a non-linear hypothesis in machine learning is using a polynomial feature expansion in a
linear regression model to fit a curve that a straight line cannot capture, or using decision trees or neural
networks to classify data that is not linearly separable.
Neural Networks:
Neural networks are powerful models that can learn complex, non-linear relationships in data.
They consist of interconnected "neurons" that process information and make predictions.
The neurons use activation functions (like sigmoid or ReLU) that introduce non-linearity, allowing the
network to model complex patterns.
By adjusting the weights and biases of the connections between neurons, the network can learn to map
inputs to outputs in a non-linear way.
Perceptrons
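The slide content here appears to be figures; as a minimal illustration, the classic perceptron learning rule can be sketched on the linearly separable AND function (data and learning rate are standard textbook choices, not from the slides):

```python
# Perceptron learning rule on the AND function (numpy only).
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 0, 0, 1])                # AND is linearly separable
w = np.zeros(2); b = 0.0; lr = 0.1

for _ in range(20):                       # a few passes suffice here
    for xi, ti in zip(X, y):
        pred = int(w @ xi + b > 0)        # threshold (step) activation
        w += lr * (ti - pred) * xi        # update weights only on mistakes
        b += lr * (ti - pred)

preds = [int(w @ xi + b > 0) for xi in X]
print(preds)
```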
Backpropagation Algorithm
Backpropagation is a crucial algorithm in machine learning, particularly for training artificial neural
networks: it drives the learning process by minimizing error through forward and backward passes, adjusting
weights and biases to improve accuracy.
What it is:
• Backpropagation is a supervised learning method used to train artificial neural networks. It's
essentially a method for fine-tuning the weights of a neural network based on the error rate (or loss)
obtained in the previous epoch (or iteration).
How it works:
• Forward Pass: Input data flows through the network, and predictions are generated.
• Backward Pass: The error (the difference between predicted and actual outputs) is calculated and
propagated backward through the network.
• Weight Adjustment: The algorithm uses this error to adjust the weights and biases of the network,
aiming to minimize the error in future predictions.
Explanation with an exemplar
• Once the gradients are known, one can update all the weights.
• Initially, the total error on the network is 0.298371109 when the inputs 0.05 and 0.1 are fed forward.
After the first round of backpropagation, the total error is down to 0.291027924.
• After repeating this process 10,000 times, the total error is down to 0.0000351085. At this point, the
output neurons generate 0.015912196 and 0.984065734, i.e., close to the target values, when 0.05 and 0.1
are fed forward.
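The numbers above match a well-known step-by-step backpropagation worked example; the sketch below reproduces it in numpy, assuming the standard starting weights (0.15-0.55), biases (0.35, 0.60), targets (0.01, 0.99), and learning rate 0.5, since the slide does not list them:

```python
# 2-2-2 sigmoid network trained by backpropagation (assumed initial values).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.05, 0.10])
t = np.array([0.01, 0.99])                      # assumed targets
W1 = np.array([[0.15, 0.20], [0.25, 0.30]])     # assumed initial weights
b1 = 0.35
W2 = np.array([[0.40, 0.45], [0.50, 0.55]])
b2 = 0.60
lr = 0.5
errors = []

for _ in range(10000):
    h = sigmoid(W1 @ x + b1)                    # forward pass
    o = sigmoid(W2 @ h + b2)
    errors.append(0.5 * np.sum((t - o) ** 2))   # total error
    delta_o = (o - t) * o * (1 - o)             # output-layer error (chain rule)
    delta_h = (W2.T @ delta_o) * h * (1 - h)    # hidden-layer error
    W2 -= lr * np.outer(delta_o, h)             # gradient-descent updates
    W1 -= lr * np.outer(delta_h, x)             # (biases held fixed, as in the example)

print(errors[0], errors[1], errors[-1])
```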
Why it's important
• Training Neural Networks: Backpropagation is a fundamental algorithm for training neural networks, enabling
them to learn from data.
• Gradient Descent: It enables the use of gradient descent algorithms to update network weights, which is how deep
learning models "learn".
• Efficiency: It's efficient in training multi-layered networks and handling non-linear relationships, making it
suitable for complex tasks like image recognition and language processing.
Key Concepts
• Loss Function: Measures the difference between predicted and actual outputs.
• Gradient: Indicates the direction of the steepest increase in the loss function.
• Chain Rule: Backpropagation is an efficient application of the chain rule of calculus to compute gradients.
Advantages:
Scalability: Backpropagation is scalable and can be used with large networks.
Automation: It automates the calculation of gradients, simplifying the training process.
Generalization: Trained networks using backpropagation can generalize well to unseen data.
Limitations:
Bayesian networks
• Bayesian networks, a type of probabilistic graphical model, are used in machine learning to represent and
reason about uncertainty, capturing probabilistic relationships between variables using nodes and edges,
enabling inference and learning from data.
• A Bayesian Network is a directed acyclic graph and:
- its vertices (or nodes) are random variables
- each of its arrows corresponds to a conditional dependency relation: an arrow B → A indicates that A
depends on B
- moreover, we attach to each node A the conditional probability distribution of the corresponding random
variable A given its parents (i.e. given the nodes B for which there is an arrow B → A).
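The definition above can be made concrete with a tiny hand-coded network (the Rain → WetGrass structure and probability values below are hypothetical, chosen only to illustrate the factorization and Bayes'-rule inference):

```python
# Tiny Bayesian network: Rain -> WetGrass (hypothetical CPT values).
p_rain = {True: 0.2, False: 0.8}                       # P(Rain)
p_wet_given_rain = {True: {True: 0.9, False: 0.1},     # P(Wet | Rain)
                    False: {True: 0.2, False: 0.8}}

def joint(rain, wet):
    """Factorize over the DAG: P(Rain, Wet) = P(Rain) * P(Wet | Rain)."""
    return p_rain[rain] * p_wet_given_rain[rain][wet]

# Inference by enumeration: P(Rain | Wet) via Bayes' rule.
p_wet = joint(True, True) + joint(False, True)
p_rain_given_wet = joint(True, True) / p_wet
print(round(p_rain_given_wet, 4))
```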
Applications:
Unsupervised learning : clustering
• In unsupervised learning, clustering is a technique used to group unlabeled data points based on their
similarities, revealing underlying patterns and structures within the data.
• Clustering is a fundamental unsupervised machine learning task where algorithms identify groups or clusters
within a dataset without any prior knowledge or labels.
How it works:
• Clustering algorithms analyze the data to find similarities or differences between data points, grouping
those that are more alike into the same cluster.
Unsupervised nature:
• Unlike supervised learning, where algorithms learn from labeled data, unsupervised learning, including
clustering, operates on unlabeled data, allowing it to discover hidden patterns and structures.
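A minimal sketch of this (assuming scikit-learn and a made-up 2-D dataset): k-means receives only the points, no labels, and groups similar ones together:

```python
# k-means grouping unlabeled 2-D points into two clusters (assumes scikit-learn).
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
              [8.0, 8.0], [8.2, 7.9], [7.8, 8.1]])
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)            # points in the same group share a label
print(km.cluster_centers_)
```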
Cluster Analysis
• Clustering analyzes data objects without consulting class labels. Clustering can be used to generate class
labels for a group of data. The objects are clustered or grouped based on the principle of maximizing the
intraclass similarity and minimizing the interclass similarity.
Clustering is a powerful and versatile tool in machine learning, especially in unsupervised learning,
where it helps uncover hidden patterns in data. By grouping similar data points together, clustering
enables better data exploration and decision-making in a variety of fields like finance, marketing,
healthcare, and beyond.
Spectral clustering
The three major steps involved in Spectral Clustering Algorithm are: constructing a similarity graph, projecting data onto a lower-
dimensional space, and clustering the data. Given a set of points S in a higher-dimensional space, it can be elaborated as follows:
1. Construct a distance matrix for the points in S.
2. Transform the distance matrix into an affinity matrix A.
3. Compute the degree matrix D and the Laplacian matrix L = D − A.
4. Find the eigenvalues and eigenvectors of L.
5. Form a matrix whose columns are the eigenvectors corresponding to the k smallest eigenvalues from the
previous step.
6. Normalize the row vectors of this matrix.
7. Cluster the data points in the resulting k-dimensional space.
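These steps can be sketched directly in numpy on a trivially separable toy dataset. Assumptions: a Gaussian affinity with bandwidth σ = 2 (chosen arbitrarily), the unnormalized Laplacian L = D − A (for which the smallest-eigenvalue eigenvectors are used), and a simple sign split on the second eigenvector in place of the final k-means step:

```python
# Spectral clustering pipeline sketch (numpy only; k = 2).
import numpy as np

X = np.array([[0.0, 0.0], [0.1, 0.1], [0.0, 0.2],
              [5.0, 5.0], [5.1, 4.9], [5.0, 5.2]])

dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # distance matrix
A = np.exp(-dist**2 / (2 * 2.0**2))                            # affinity matrix
np.fill_diagonal(A, 0.0)
D = np.diag(A.sum(axis=1))                                     # degree matrix
L = D - A                                                      # graph Laplacian

vals, vecs = np.linalg.eigh(L)              # eigenvalues/eigenvectors of L
embedding = vecs[:, :2]                     # 2 smallest-eigenvalue eigenvectors
labels = (embedding[:, 1] > 0).astype(int)  # sign split on the Fiedler vector
print(labels)
```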
An affinity matrix, also known as a similarity matrix or kernel, is a square, symmetric matrix that
represents the pairwise similarities between objects in a dataset. Affinity matrices capture the relationships
between data points, highlighting which points are similar or dissimilar.
The degree matrix of an undirected graph is a diagonal matrix which contains information about the degree of
each vertex—that is, the number of edges attached to each vertex.
The Laplacian matrix, also called the graph Laplacian, admittance matrix, Kirchhoff matrix or discrete Laplacian,
is a matrix representation of a graph.
An eigenvector of a matrix A is a non-zero vector v such that when multiplied by A, the resulting vector is a
scalar multiple of the original vector v.
An eigenvalue (also known as a characteristic value) is a scalar value associated with an eigenvector.
Spectral Clustering Matrix Representation
Subspace clustering
Subspace clustering is a machine learning technique used to identify clusters within a dataset by analyzing
specific subsets of dimensions, or subspaces, of the data. It's particularly useful for high-dimensional data
where traditional clustering methods can struggle due to the "curse of dimensionality". The core idea is to find
clusters that exist within a subset of relevant features, rather than requiring agreement across all features.
How Subspace Clustering Works:
1. Identifying Subspaces:
Subspace clustering algorithms aim to find the relevant subspaces within the data.
2. Clustering within Subspaces:
Once the subspaces are identified, clustering algorithms can be applied to the data points within those subspaces.
3. Combining Results:
The results from clustering within different subspaces can be combined to obtain a final clustering of the entire dataset.
Types of Subspace Clustering Algorithms
• Algebraic Methods:
These methods use linear algebra techniques to find the underlying subspaces, such as finding the
eigenvectors of a matrix representing the data.
• Iterative Methods:
These methods iteratively refine the subspace representation by updating the projection matrices or cluster
assignments.
• Statistical Methods:
These methods use statistical models to describe the data distribution within subspaces.
Challenges:
• Computational complexity: Some algorithms can be computationally expensive, especially for very large
datasets.
• Parameter tuning: Some algorithms require tuning of parameters, which can be challenging.
• Finding the right number of subspaces: Determining the optimal number of subspaces can be difficult.
Dimensionality Reduction
Dimensionality reduction is a method for representing a given dataset using a lower number of features (that is,
dimensions) while still capturing the original data’s meaningful properties. This amounts to removing irrelevant
or redundant features, or simply noisy data, to create a model with a lower number of variables.
Dimensionality reduction covers an array of feature selection and data compression methods used during
preprocessing. While dimensionality reduction methods differ in operation, they all transform high-
dimensional spaces into low-dimensional spaces through variable extraction or combination.
• When working with machine learning models, datasets with too many features can cause issues like slow
computation and overfitting. Dimensionality reduction helps by reducing the number of features while
retaining key information.
• Techniques like principal component analysis (PCA), singular value decomposition (SVD) and linear
discriminant analysis (LDA) project data onto a lower-dimensional space, preserving important details.
Advantages of Dimensionality Reduction
•Faster Computation: With fewer features, machine learning algorithms can process data more quickly. This results
in faster model training and testing, which is particularly useful when working with large datasets.
•Better Visualization: As we saw in the earlier figure, reducing dimensions makes it easier to visualize data,
revealing hidden patterns.
•Prevent Overfitting: With fewer features, models are less likely to memorize the training data and overfit. This
helps the model generalize better to new, unseen data, improving its ability to make accurate predictions.
Disadvantages of Dimensionality Reduction
•Data Loss & Reduced Accuracy: Some important information may be lost during dimensionality reduction,
potentially affecting model performance.
•Choosing the Right Components: Deciding how many dimensions to keep is difficult, as keeping too few may lose
valuable information, while keeping too many can lead to overfitting.
Data Compression
• Data compression is the process of encoding, restructuring or otherwise modifying data in order to
reduce its size. Fundamentally, it involves re-encoding information using fewer bits than the
original representation.
• Compression is done by a program that uses functions or an algorithm to effectively discover how to
reduce the size of the data.
• A good example of this often occurs with image compression. When a sequence of colors, like 'blue,
red, red, blue', recurs throughout the image, the algorithm can replace that data string with a much
shorter code while still maintaining the underlying information.
• Text compression typically works by removing unnecessary characters, inserting a single reference
character in place of a string of repeated characters, and substituting shorter bit strings for more
common ones. With proper techniques, data compression can shrink a text file by 50% or more, greatly
reducing its overall size.
• For data transmission, compression can be run on the content or on the entire transmission. When
information is sent or received via the internet, larger files, on their own or as part of an archive,
may be transmitted in one of many compressed formats, such as ZIP, RAR, or 7z.
Lossy vs Lossless
• Lossless compression: Removes bits by locating and removing statistical redundancies. Because of this
technique, no information is actually removed. Lossless compression will often have a smaller compression
ratio, with the benefit of not losing any data in the file. This is often very important when needing to maintain
absolute quality, as with database information or professional media files. Formats such as FLAC (audio) and
PNG offer lossless compression options.
• Lossy compression: Lowers size by deleting unnecessary information, and reducing the complexity of
existing information. Lossy compression can achieve much higher compression ratios, at the cost of possible
degradation of file quality. JPEG offers lossy compression options, and MP3 is based on lossy compression.
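Lossless compression is easy to demonstrate with Python's standard zlib module: redundant data compresses well, and decompression recovers the original bytes exactly:

```python
# Lossless compression round-trip with zlib (Python standard library).
import zlib

data = b"blue red red blue " * 200          # highly redundant input
packed = zlib.compress(data, level=9)
assert zlib.decompress(packed) == data       # no information is lost
print(len(data), len(packed))                # compressed size is much smaller
```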
• Compression reduces the cost of storage, increases the speed of algorithms, and reduces the transmission cost.
Compression is achieved by removing redundancy, that is repetition of unnecessary data. Coding redundancy refers to
the redundant data caused due to suboptimal coding techniques.
Variable-Length Codes:
Frequently occurring characters get shorter codes, while rare ones get longer codes. This reduces the average
number of bits used per symbol.
Fixed-length: A=00, B=01, C=10, D=11. Variable-length (Huffman): A=0, B=10, C=110, D=111.
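The saving from the variable-length code above can be computed directly; the symbol frequencies below are assumed (more frequent symbols get shorter codes):

```python
# Average bits per symbol: fixed- vs variable-length coding (assumed frequencies).
freq = {"A": 0.5, "B": 0.25, "C": 0.125, "D": 0.125}   # assumed probabilities
fixed = {"A": "00", "B": "01", "C": "10", "D": "11"}
huffman = {"A": "0", "B": "10", "C": "110", "D": "111"}

avg_fixed = sum(freq[s] * len(fixed[s]) for s in freq)    # 2.0 bits/symbol
avg_var = sum(freq[s] * len(huffman[s]) for s in freq)    # 1.75 bits/symbol
print(avg_fixed, avg_var)
```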
Coding techniques are related to the concepts of entropy and information content, which are studied in a field
called information theory. Information theory also deals with the uncertainty present in a message, which is
called its information content.
•Faster Training & Inference: Smaller datasets mean quicker I/O and computation.
•Reduced Storage: Essential when working with large-scale datasets.
•Better Generalization: Compressed representations can reduce overfitting by removing noise or irrelevant features.
•Enables Edge Deployment: Compressed models/data can run on mobile devices or IoT systems.
Advantages and Disadvantages:
• The main advantages of compression are reductions in storage hardware, data transmission time, and
communication bandwidth. This can result in significant cost savings. Compressed files require significantly
less storage capacity than uncompressed files, meaning a significant decrease in expenses for storage. A
compressed file also requires less time for transfer while consuming less network bandwidth. This can also help
with costs, and also increases productivity.
• The main disadvantage of data compression is the increased use of computing resources to apply
compression to the relevant data. Because of this, compression vendors prioritize speed and resource
efficiency optimizations in order to minimize the impact of intensive compression tasks.
Principal Components Analysis
Principal components analysis (PCA; also called the Karhunen-Loève, or K-L, method) searches for k
n-dimensional orthogonal vectors that can best be used to represent the data, where k ≤ n. The original data
are thus projected onto a much smaller space, resulting in dimensionality reduction.
1. The input data are normalized, so that each attribute falls within the same range. This step helps ensure that
attributes with large domains will not dominate attributes with smaller domains.
2. PCA computes k orthonormal vectors that provide a basis for the normalized input data. These are unit
vectors that each point in a direction perpendicular to the others. These vectors are referred to as the
principal components. The input data are a linear combination of the principal components.
These vectors represent the directions (principal components) where the data has the most variance, and PCA finds the
orthonormal basis that maximizes this variance.
Orthogonal vectors are perpendicular to each other, meaning their dot product is zero. In PCA, this implies that
different principal components are uncorrelated.
• The principal components are sorted in order of decreasing “significance” or strength. The principal
components essentially serve as a new set of axes for the data, providing important information about variance.
That is, the sorted axes are such that the first axis shows the most variance among the data, the second axis
shows the next highest variance, and so on.
• For example, a figure might show the first two principal components, Y1 and Y2, for a given set of data
originally mapped to the axes X1 and X2. This information helps identify groups or patterns within the data.
• Because the components are sorted in decreasing order of “significance,” the data size can be reduced by
eliminating the weaker components, that is, those with low variance. Using the strongest principal
components, it should be possible to reconstruct a good approximation of the original data.
• PCA can be applied to ordered and unordered attributes. Multidimensional data of more than two dimensions
can be handled by reducing the problem to two dimensions. Principal components may be used as inputs to
multiple regression and cluster analysis.
Geometrically speaking, principal components
represent the directions of the data that explain
a maximal amount of variance, that is to say, the
lines that capture most information of the data.
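The steps above (normalize, find orthonormal directions of maximal variance, sort, project) can be sketched in numpy on synthetic data; the dataset below is an assumed example with one dominant direction:

```python
# PCA via eigendecomposition of the covariance matrix (numpy only).
import numpy as np

rng = np.random.default_rng(0)
# 100 points stretched along the direction (3, 1), plus small noise.
X = rng.normal(size=(100, 1)) @ np.array([[3.0, 1.0]]) \
    + rng.normal(scale=0.3, size=(100, 2))

Xc = (X - X.mean(axis=0)) / X.std(axis=0)   # step 1: normalize the attributes
cov = np.cov(Xc, rowvar=False)
vals, vecs = np.linalg.eigh(cov)            # orthonormal principal directions
order = np.argsort(vals)[::-1]              # sort by decreasing variance
vals, vecs = vals[order], vecs[:, order]

k = 1
Z = Xc @ vecs[:, :k]                        # project onto the top k components
print(vals / vals.sum())                    # fraction of variance explained
```

Dropping the weaker component here loses little information, since the first component carries most of the variance.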
Core Assumptions of LDA
• Equal Covariance Matrices: Covariance matrices of the different classes should be equal.
• Linear Separability: A linear decision boundary should be sufficient to separate the classes.