Understanding Support Vector Machines

Support Vector Machine (SVM) is a supervised learning algorithm used for classification and regression, known for its effectiveness with complex datasets and high-dimensional spaces. It works by finding the optimal hyperplane that maximizes the margin between classes, utilizing techniques like kernel tricks for non-linear data. SVM has advantages such as robustness to overfitting and versatility with different kernels, but it can be slow with large datasets and requires careful tuning of parameters.


Support Vector Machine (SVM)
SVM (Support Vector Machine)
• SVM (Support Vector Machine) is a supervised algorithm, effective for
both regression and classification, though it excels in classification
tasks.
• Popular since the 1990s, it performs well on smaller or complex
datasets with minimal tuning.
What is a Support Vector
Machine (SVM)?
• A Support Vector Machine (SVM) is a machine learning algorithm
used for classification and regression.
• It finds the best line (or hyperplane) that separates the data into groups.
• It can handle non-linear data by using kernels to transform it into
higher dimensions.
Types of Support Vector
Machine (SVM) Algorithms
• Linear SVM: Only when the data is perfectly linearly separable
can we use Linear SVM. Perfectly linearly separable means
that the data points can be classified into two classes using a
single straight line (in 2D).
• Non-Linear SVM: When the data is not linearly separable, we
use Non-Linear SVM. This happens when the data points cannot
be separated into two classes by a straight line (in 2D). In such
cases, we use advanced techniques like the kernel trick to classify
them.
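The two cases above can be sketched on toy data; this is a minimal illustration assuming scikit-learn is installed, with the dataset generators and kernel choices being my own picks for demonstration:

```python
# Sketch: linear vs non-linear SVM on toy data (assumes scikit-learn).
from sklearn.datasets import make_blobs, make_circles
from sklearn.svm import SVC

# Two well-separated blobs: a linear kernel suffices.
X_lin, y_lin = make_blobs(n_samples=100, centers=2, random_state=0)
linear_svm = SVC(kernel="linear").fit(X_lin, y_lin)
print("linear kernel accuracy on blobs:", linear_svm.score(X_lin, y_lin))

# Concentric circles cannot be split by any straight line,
# so a non-linear (RBF) kernel is needed.
X_circ, y_circ = make_circles(n_samples=100, factor=0.3, noise=0.05, random_state=0)
rbf_svm = SVC(kernel="rbf").fit(X_circ, y_circ)
print("RBF kernel accuracy on circles:", rbf_svm.score(X_circ, y_circ))
```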
Support Vector Machine (SVM)
• Maximizes the Margin:
• SVM focuses on finding the decision boundary that maximizes the margin (the distance
between the boundary and the closest data points of each class). This makes it more
robust to new data.
• Handles Non-Linear Data:
• SVM can handle non-linear data using the “kernel trick,” which transforms the data into a
higher-dimensional space where it becomes easier to separate.
• Effective in High Dimensions:
• SVM works well even when the number of features (dimensions) is much larger than the
number of samples, making it suitable for complex datasets.
• Robust to Overfitting:
• By focusing on the points closest to the boundary (support vectors), SVM is less likely to
overfit, especially in smaller datasets.
• Requires Tuning:
• SVM requires careful tuning of parameters (like the choice of kernel and regularization)
to achieve optimal performance, which can be time-consuming.
Applications of support vector
machine:
• Face detection,
• Intrusion detection,
• Classification of emails, news articles and web pages,
• Classification of genes, and
• Handwriting recognition.
How Does Support Vector
Machine Algorithm Work?
• SVM is defined in terms of the support vectors only: the margin is
made using the points closest to the hyperplane (the support
vectors), so we do not have to worry about the other observations.
In logistic regression, by contrast, the classifier is defined over all
the points.
• Hence SVM enjoys some natural speed-ups.
Working of SVM
• The best hyperplane is the one that has the maximum distance
from both classes, and finding it is the main aim of SVM. This is done
by considering different hyperplanes that classify the labels correctly
and choosing the one that is farthest from the data points, i.e. the
one with the maximum margin.
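The claim that only the closest points matter can be checked directly; a small sketch, assuming scikit-learn, where the blob data and parameters are illustrative choices:

```python
# Sketch: a fitted SVM depends only on its support vectors (assumes scikit-learn).
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=200, centers=2, random_state=42)
clf = SVC(kernel="linear", C=1.0).fit(X, y)

# Only a handful of the 200 points define the maximum-margin hyperplane.
print("support vectors per class:", clf.n_support_)
print("first support vectors:\n", clf.support_vectors_[:3])
```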
Advantages of Support Vector
Machine
1. Works well with complex data: SVM is great for datasets where the
separation between categories is not clear. It can handle both linear
and non-linear data effectively.
2. Effective in high-dimensional spaces: SVM performs well even when
there are more features (dimensions) than samples, making it
useful for tasks like text classification or image recognition.
3. Avoids overfitting: SVM focuses on finding the best decision
boundary (margin) between classes, which helps in reducing the
risk of overfitting, especially in high-dimensional data.
Advantages of Support Vector
Machine
4. Versatile with kernels: By using different kernel functions (like linear,
polynomial, or radial basis function), SVM can adapt to various
types of data and solve complex problems.
5. Robust to outliers: SVM is less affected by outliers because it
focuses on the support vectors (data points closest to the margin),
which helps in creating a more generalized model.
Disadvantages of Support
Vector Machine
1. Slow with large datasets: SVM can be computationally expensive and
slow to train, especially when the dataset is very large.
2. Difficult to tune: Choosing the right kernel and parameters (like C and
gamma) can be tricky and often requires a lot of trial and error.
3. Not suitable for noisy data: If the dataset has too many overlapping
classes or noise, SVM may struggle to perform well because it tries to
find a perfect separation.
4. Hard to interpret: Unlike some other algorithms, SVM models are not
easy to interpret or explain, especially when using non-linear kernels.
5. Memory-intensive: SVM requires storing the support vectors, which can
take up a lot of memory, making it less efficient for very large datasets.
Margin in Support Vector
Machine
• We all know the equation of a hyperplane is w·x + b = 0, where w is a
vector normal to the hyperplane and b is an offset.
• To classify a point as negative or positive we need to define a decision
rule. We can define the decision rule as: predict the positive class if
w·x + b ≥ 0 and the negative class otherwise, i.e. ŷ = sign(w·x + b).
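The decision rule ŷ = sign(w·x + b) can be sketched in a few lines; the values of w and b below are made up purely for illustration:

```python
# Sketch of the SVM decision rule y_hat = sign(w·x + b).
import numpy as np

w = np.array([2.0, -1.0])  # normal vector to the hyperplane (illustrative values)
b = -0.5                   # offset (illustrative value)

def decide(x):
    # Positive class if the point lies on the positive side of w·x + b = 0.
    return 1 if np.dot(w, x) + b >= 0 else -1

print(decide(np.array([1.0, 0.0])))  # 2*1 - 1*0 - 0.5 = 1.5  -> positive class
print(decide(np.array([0.0, 2.0])))  # 2*0 - 1*2 - 0.5 = -2.5 -> negative class
```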
Soft Vs Hard Margin
• In Support Vector Machines (SVM), a hard margin aims to perfectly
separate data into distinct classes with a hyperplane that maximizes
the margin, while a soft margin allows for some misclassifications by
introducing slack variables to handle non-linearly separable data or
outliers.
• Parameter C:
• A hyperparameter C controls the trade-off between maximizing the
margin and minimizing the misclassifications. A larger C means a
narrower margin and fewer misclassifications, while a smaller C
means a wider margin but more misclassifications.
Soft Vs Hard Margin

| Feature            | Hard Margin SVM                | Soft Margin SVM                                               |
| Separability       | Linearly separable data only   | Linearly separable and non-separable data                     |
| Misclassifications | No misclassifications allowed  | Some misclassifications are allowed                           |
| Slack Variables    | Not used                       | Used to handle misclassifications                             |
| Parameter C        | Not applicable                 | Controls the trade-off between margin and misclassifications  |
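The effect of C can be seen by counting support vectors on overlapping data; a sketch assuming scikit-learn, with the blob parameters chosen just to create class overlap:

```python
# Sketch: a smaller C tolerates more margin violations, so it typically
# keeps more support vectors than a large C (assumes scikit-learn).
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# cluster_std=2.0 makes the two classes overlap.
X, y = make_blobs(n_samples=200, centers=2, cluster_std=2.0, random_state=0)

sv_counts = {}
for C in (0.01, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    sv_counts[C] = int(clf.n_support_.sum())
    print(f"C={C}: support vectors = {sv_counts[C]}")
```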
Kernels in Support Vector Machine
• The most interesting feature of SVM is that it can work even with a
non-linear dataset; for this we use the “Kernel Trick,” which makes it
easier to classify the points.

• For such data we cannot draw a single line (or hyperplane) that
classifies the points correctly.
Kernels
• What we do is convert this lower-dimensional space into a higher-
dimensional space using functions (for example, quadratic functions)
that allow us to find a decision boundary which clearly divides the
data points.
• The functions that help us do this are called kernels, and which
kernel to use is determined by hyperparameter tuning.
Different Kernel Functions
1. Polynomial Kernel
2. Sigmoid Kernel
3. RBF Kernel
4. Bessel function kernel
5. Anova Kernel
Polynomial Kernel
• The formula for the polynomial kernel is:

• K(x, z) = (x·z + 1)^d

• Here d is the degree of the polynomial, which we need to specify
manually.
• Suppose we have two features X1 and X2 and output variable Y;
using the polynomial kernel with d = 2, the expansion introduces
terms such as X1², X2², and X1·X2.
Polynomial Kernel
• So we basically need to find X1², X2², and X1·X2, and now we can see
that 2 dimensions got converted into 5 dimensions.
Sigmoid Kernel
• We can use it as a proxy for neural networks. The equation is
K(x, z) = tanh(γ·x·z + r), where γ and r are kernel parameters.
RBF Kernel
• What it actually does is create non-linear combinations of our
features to lift the samples onto a higher-dimensional feature space,
where we can use a linear decision boundary to separate the classes.
It is the most used kernel in SVM classification. The following
formula explains it mathematically:

• K(X₁, X₂) = exp(−‖X₁ − X₂‖² / (2σ²))

• where,
• σ is our hyperparameter controlling the kernel width (σ² is the variance)
• ‖X₁ − X₂‖ is the Euclidean distance between the two points X₁ and X₂
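The formula above can be computed by hand and checked against scikit-learn's implementation, using the relation gamma = 1/(2σ²); the point values and σ below are arbitrary:

```python
# Sketch: RBF kernel exp(-||x1 - x2||^2 / (2*sigma^2)) computed manually
# and compared with scikit-learn's rbf_kernel (gamma = 1 / (2*sigma^2)).
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

x1 = np.array([1.0, 2.0])
x2 = np.array([2.0, 4.0])
sigma = 1.5

manual = np.exp(-np.sum((x1 - x2) ** 2) / (2 * sigma ** 2))
sklearn_value = rbf_kernel(x1.reshape(1, -1), x2.reshape(1, -1),
                           gamma=1 / (2 * sigma ** 2))[0, 0]
print(manual, sklearn_value)  # the two values agree
```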
Bessel function kernel
Anova Kernel
• It performs well on multidimensional regression problems.
How to Choose the Right Kernel?
• It is necessary to choose a good kernel function because the
performance of the model depends on it.
Here are the points to consider when choosing the right kernel:
• Kernel selection depends on the dataset type.
• For linearly separable data, use a linear kernel:
• It is simple and has lower complexity compared to other kernels.
• Start by assuming your data is linearly separable and try the linear kernel first.
• Move to more complex kernels if needed.
How to Choose the Right Kernel?
• Commonly used kernels:
• Linear and RBF (Radial Basis Function) are widely used.
• Polynomial kernels are rarely used due to poor efficiency.
• If both linear and RBF kernels give similar results:
• Choose the simpler option, which is the linear kernel.
Here are some points you should go
through:
• The complexity of the RBF kernel increases with the size of the
training data.
• Preparing the RBF kernel is computationally expensive.
• The kernel matrix must be stored and maintained, which requires
additional memory.
• Projection into the “infinite” higher-dimensional space (where data
becomes linearly separable) is costly, especially during prediction.
• Using a linear kernel on a non-linear dataset results in very low
accuracy and is not suitable.
Grid Search for Hyperparameter
Tuning
• Grid Search is an optimization method for finding the best parameters for the
SVM model. It systematically evaluates combinations of hyperparameters, such
as C and gamma, and selects the best-performing combination based on
cross-validation.
• Parameters to Optimize
• C: Regularization parameter controlling the trade-off between maximizing the
margin and minimizing the classification error.
• High C: Low bias and high variance (overfitting).
• Low C: High bias and low variance (underfitting).
• Gamma (γ): Defines how far the influence of a single training example reaches. A
low value of gamma results in a smoother decision boundary, while a high value
causes the boundary to be more complex and fit the data tightly.
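The grid search described above is a one-liner with scikit-learn's GridSearchCV; a sketch where the iris dataset and the particular C/gamma grid are illustrative choices:

```python
# Sketch: grid search over C and gamma with 5-fold cross-validation
# (assumes scikit-learn; dataset and grid values are illustrative).
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
param_grid = {"C": [0.1, 1, 10, 100],
              "gamma": [0.001, 0.01, 0.1, 1]}

search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X, y)
print("best parameters:", search.best_params_)
print("best cross-validation score:", round(search.best_score_, 3))
```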