Understanding Support Vector Machines

Support Vector Machine (SVM) is a versatile supervised machine learning algorithm used for classification and regression, focused on finding the optimal hyperplane that maximizes the margin between classes. It can handle both linear and nonlinear data through the use of kernel functions, and its soft-margin formulation makes it robust to outliers. Additionally, the document discusses Bayesian classification based on Bayes' Theorem and various clustering methods, including partitioning, hierarchical, density-based, grid-based, and model-based approaches.


2) Support Vector Machine (SVM) is a powerful machine learning algorithm used for linear or nonlinear classification, regression, and even outlier detection. SVMs can be applied to a variety of tasks, such as text classification, image classification, spam detection, handwriting identification, gene expression analysis, face detection, and anomaly detection. SVMs are adaptable and efficient across applications because they can handle high-dimensional data and nonlinear relationships.
SVM algorithms are very effective because they find the maximum-margin separating hyperplane between the different classes of the target feature.
Support Vector Machine
Support Vector Machine (SVM) is a supervised machine learning algorithm used for both classification and regression. Although it can handle regression problems, it is best suited to classification. The main objective of the SVM algorithm is to find the optimal hyperplane in an N-dimensional space that separates the data points of different classes in the feature space. The hyperplane is chosen so that the margin between the closest points of different classes is as large as possible. The dimension of the hyperplane depends on the number of features: if the number of input features is two, the hyperplane is just a line; if the number of input features is three, the hyperplane becomes a 2-D plane. It becomes difficult to visualize when the number of features exceeds three.
Let’s consider two independent variables x1 and x2, and one dependent variable whose label is either a blue circle or a red circle.

Linearly Separable Data points

From the figure above it is clear that there are multiple lines (our hyperplane here is a line because we are considering only two input features, x1 and x2) that segregate our data points, i.e., classify the red and blue circles. So how do we choose the best line, or in general the best hyperplane, to segregate our data points?
How does SVM work?
One reasonable choice as the best hyperplane is the one that represents the largest
separation or margin between the two classes.
Multiple hyperplanes separate the data from two classes

So we choose the hyperplane whose distance to the nearest data point on each side is maximized. If such a hyperplane exists, it is known as the maximum-margin hyperplane (hard margin). So from the above figure, we choose L2. Let’s consider a scenario like the one shown below.

Here we have one blue ball within the boundary of the red balls. So how does SVM classify the data? Simple: the blue ball among the red ones is an outlier of the blue class. The SVM algorithm ignores such outliers and finds the best hyperplane that maximizes the margin; in this sense SVM is robust to outliers.
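As a hedged sketch of this behavior, the toy trainer below minimizes the regularized hinge loss by sub-gradient descent. This is illustrative only, not the QP/SMO solver used by real SVM libraries, and the data, learning rate, and C value are all assumptions chosen for the example.

```python
# Minimal soft-margin linear SVM trained by sub-gradient descent on the
# regularized hinge loss. Illustrative sketch only; the toy data includes
# one blue outlier sitting among the red balls, as in the figure above.
def train_linear_svm(points, labels, C=1.0, lr=0.01, epochs=2000):
    """points: list of (x1, x2); labels: +1 (blue) or -1 (red)."""
    w = [0.0, 0.0]
    b = 0.0
    n = len(points)
    for _ in range(epochs):
        for x, y in zip(points, labels):
            margin = y * (w[0] * x[0] + w[1] * x[1] + b)
            # Regularization sub-gradient: shrink ||w|| to widen the margin.
            gw = [w[0] / n, w[1] / n]
            gb = 0.0
            if margin < 1:  # point violates the margin: hinge loss is active
                gw[0] -= C * y * x[0]
                gw[1] -= C * y * x[1]
                gb -= C * y
            w[0] -= lr * gw[0]
            w[1] -= lr * gw[1]
            b -= lr * gb
    return w, b

def predict(w, b, x):
    return 1 if w[0] * x[0] + w[1] * x[1] + b >= 0 else -1

# Blue balls near (0, 0), red balls near (4, 4), plus one blue outlier
# at (4, 3.5). The soft margin lets the fit tolerate that outlier instead
# of distorting the boundary to accommodate it.
pts = [(0, 0), (1, 0), (0, 1), (4, 3.5), (4, 4), (5, 4), (4, 5)]
ys  = [1, 1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(pts, ys)
```

With a moderate C, the learned boundary separates the two main clusters and effectively ignores the lone blue ball among the red ones.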

Support Vector Machine Terminology


1. Hyperplane: The hyperplane is the decision boundary used to separate the data
points of different classes in a feature space. In the case of linear classification, it is
the linear equation wx + b = 0.
2. Support Vectors: Support vectors are the data points closest to the hyperplane, which
play a critical role in deciding the hyperplane and the margin.
3. Margin: The margin is the distance between the support vectors and the hyperplane. The main
objective of the support vector machine algorithm is to maximize the margin; a
wider margin indicates better classification performance.
4. Kernel: The kernel is a mathematical function used in SVM to map the original
input data points into a high-dimensional feature space, so that the hyperplane can be
found even if the data points are not linearly separable in the original input
space. Some common kernel functions are linear, polynomial, radial basis
function (RBF), and sigmoid.
5. Hard Margin: The maximum-margin hyperplane or the hard margin hyperplane is a
hyperplane that properly separates the data points of different categories without any
misclassifications.
6. Soft Margin: When the data is not perfectly separable or contains outliers, SVM
permits a soft-margin technique. The soft-margin SVM formulation introduces a
slack variable for each data point, which relaxes the strict margin requirement and permits
certain misclassifications or violations. It finds a compromise between increasing
the margin and reducing violations.
7. C: The regularisation parameter C in SVM balances margin maximisation against
misclassification penalties. It decides the penalty for violating the margin or
misclassifying data items. A greater value of C imposes a stricter penalty, which
results in a smaller margin and perhaps fewer misclassifications.
8. Hinge Loss: Hinge loss is a typical loss function in SVMs. It penalises incorrect
classifications and margin violations. The SVM objective function is frequently
formed by combining it with the regularisation term.
9. Dual Problem: SVM can be solved through the dual of its optimisation problem, which
involves finding the Lagrange multipliers associated with the support vectors. The dual
formulation enables the use of the kernel trick and more efficient computation.
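The hinge loss from item 8 is simple enough to compute by hand. A minimal sketch, where the score stands for the decision value w·x + b:

```python
# Hinge loss: max(0, 1 - y * score), where y is the true label (+1 or -1)
# and score is the signed value of the decision function w·x + b.
def hinge_loss(y, score):
    return max(0.0, 1.0 - y * score)

# Correct classification, outside the margin: no loss.
assert hinge_loss(+1, 2.5) == 0.0
# Correct classification but inside the margin: small penalty.
assert hinge_loss(+1, 0.5) == 0.5
# Misclassification: the penalty grows linearly with the violation.
assert hinge_loss(-1, 0.5) == 1.5
```

Note that points classified correctly and beyond the margin contribute zero loss, which is why only the support vectors end up shaping the solution.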
Types of Support Vector Machine
Based on the nature of the decision boundary, Support Vector Machines (SVM) can be
divided into two main parts:
 Linear SVM: Linear SVMs use a linear decision boundary to separate the data points
of different classes. When the data can be precisely linearly separated, linear SVMs are
very suitable. This means that a single straight line (in 2D) or a hyperplane (in higher
dimensions) can entirely divide the data points into their respective classes. A
hyperplane that maximizes the margin between the classes is the decision boundary.
 Non-Linear SVM: A non-linear SVM can be used to classify data when it cannot be
separated into two classes by a straight line (in the 2D case). By using kernel
functions, non-linear SVMs can handle non-linearly separable data. These kernel
functions transform the original input data into a higher-dimensional feature space
where the data points can be linearly separated; a linear boundary found in this
transformed space corresponds to a non-linear decision boundary in the original space.
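The lifting idea can be sketched in a few lines. The feature map phi and the toy 1-D data below are illustrative assumptions, not part of the document:

```python
import math

# 1-D points: class A at -2 and 2, class B at 0. No single threshold on x
# separates them, but the polynomial-style feature map phi(x) = (x, x^2)
# lifts them into 2-D, where the horizontal line x2 = 2 separates the classes.
def phi(x):
    return (x, x * x)

lifted_a = [phi(x) for x in (-2.0, 2.0)]   # second coordinate: 4.0
lifted_b = [phi(x) for x in (0.0,)]        # second coordinate: 0.0
assert all(p[1] > 2 for p in lifted_a)
assert all(p[1] < 2 for p in lifted_b)

# The kernel trick computes inner products in the lifted space directly.
# The RBF kernel k(x, z) = exp(-gamma * ||x - z||^2) corresponds to an
# infinite-dimensional feature map that is never computed explicitly.
def rbf_kernel(x, z, gamma=0.5):
    return math.exp(-gamma * (x - z) ** 2)

assert rbf_kernel(1.0, 1.0) == 1.0                   # identical points: maximal similarity
assert rbf_kernel(0.0, 3.0) < rbf_kernel(0.0, 1.0)   # similarity decays with distance
```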

3) Bayesian classification

Bayesian classification is based on Bayes' Theorem. Bayesian classifiers
are statistical classifiers. Bayesian classifiers can predict class
membership probabilities, such as the probability that a given tuple
belongs to a particular class.

Bayes' Theorem
Bayes' Theorem is named after Thomas Bayes. There are two types of
probabilities −

 Posterior Probability [P(H|X)]
 Prior Probability [P(H)]

where X is a data tuple and H is some hypothesis.

According to Bayes' Theorem,

P(H|X) = P(X|H) P(H) / P(X)
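A short worked example of the theorem, with made-up numbers purely for illustration (the hypothesis, evidence, and all probabilities are assumptions, not from the document):

```python
# Worked example of Bayes' Theorem with illustrative (made-up) numbers:
# H = "tuple belongs to class 'spam'", X = "message contains the word 'offer'".
p_h = 0.2          # prior P(H): 20% of messages are spam (assumed)
p_x_given_h = 0.6  # likelihood P(X|H): 60% of spam contains 'offer' (assumed)
p_x = 0.15         # evidence P(X): 15% of all messages contain 'offer' (assumed)

# Posterior P(H|X) = P(X|H) * P(H) / P(X) = 0.6 * 0.2 / 0.15 = 0.8
p_h_given_x = p_x_given_h * p_h / p_x
assert abs(p_h_given_x - 0.8) < 1e-9
```

Seeing the word raises the spam probability from the 20% prior to an 80% posterior, which is exactly the update a Bayesian classifier performs per class.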


Bayesian Belief Network
Bayesian Belief Networks specify joint conditional probability distributions.
They are also known as Belief Networks, Bayesian Networks, or
Probabilistic Networks.

 A Belief Network allows class conditional independencies to be
defined between subsets of variables.
 It provides a graphical model of causal relationships on which
learning can be performed.
 We can use a trained Bayesian Network for classification.

There are two components that define a Bayesian Belief Network −

 Directed acyclic graph
 A set of conditional probability tables


Directed Acyclic Graph


 Each node in a directed acyclic graph represents a random variable.
 These variables may be discrete or continuous valued.
 These variables may correspond to actual attributes given in the
data.

Directed Acyclic Graph Representation


The following diagram shows a directed acyclic graph for six Boolean
variables.
The arcs in the diagram allow the representation of causal knowledge. For
example, lung cancer is influenced by a person's family history of lung
cancer, as well as by whether or not the person is a smoker. It is worth noting
that the variable PositiveXray is independent of whether the patient has a
family history of lung cancer or is a smoker, given that we
know the patient has lung cancer.

Conditional Probability Table


The conditional probability table for the values of the variable LungCancer
(LC), showing each possible combination of the values of its parent nodes,
FamilyHistory (FH) and Smoker (S), is as follows −
4) Cluster Analysis: The process of grouping a set of physical or abstract objects into classes of
similar objects is called clustering. A cluster is a collection of data objects that are similar to one
another within the same cluster and dissimilar to the objects in other clusters. A cluster of data
objects can be treated collectively as one group and so may be considered a form of data
compression. Cluster analysis tools based on k-means, k-medoids, and several other methods have also
been built into many statistical analysis software packages or systems, such as S-Plus, SPSS, and SAS.

Major Clustering Methods:

 Partitioning Methods
 Hierarchical Methods
 Density-Based Methods
 Grid-Based Methods
 Model-Based Methods

4.2.1 Partitioning Methods: A partitioning method constructs k partitions of the data, where each
partition represents a cluster and k ≤ n (n being the number of objects). That is, it classifies the data
into k groups, which together satisfy the following requirements: each group must contain at least one
object, and each object must belong to exactly one group. A partitioning method creates an initial
partitioning and then uses an iterative relocation technique that attempts to improve the partitioning
by moving objects from one group to another. The general criterion of a good partitioning is that
objects in the same cluster are close or related to each other, whereas objects in different clusters are
far apart or very different.
4.2.2 Hierarchical Methods: A hierarchical method creates a hierarchical decomposition of the given
set of data objects. A hierarchical method can be classified as being either agglomerative or divisive,
based on how the hierarchical decomposition is formed.

 The agglomerative approach, also called the bottom-up approach, starts with each object forming a
separate group. It successively merges the objects or groups that are close to one another, until all of
the groups are merged into one or until a termination condition holds.
 The divisive approach, also called the top-down approach, starts with all of the objects in the same
cluster. In each successive iteration, a cluster is split into smaller clusters, until eventually each
object is in its own cluster, or until a termination condition holds.

Hierarchical methods suffer from the fact that once a step (merge or split) is done, it can never be
undone. This rigidity is useful in that it leads to smaller computation costs by not having to worry
about a combinatorial number of different choices. There are two approaches to improving the
quality of hierarchical clustering:

 Perform careful analysis of object "linkages" at each hierarchical partitioning, such as in
Chameleon, or
 Integrate hierarchical agglomeration and other approaches by first using a hierarchical
agglomerative algorithm to group objects into microclusters, and then performing macroclustering
on the microclusters using another clustering method such as iterative relocation.
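The bottom-up merging can be sketched as single-linkage agglomerative clustering on a handful of 1-D points (the data and the choice of single linkage are assumptions for the example):

```python
# Sketch of bottom-up (agglomerative) clustering with single linkage:
# start with every object in its own group and repeatedly merge the two
# closest groups until the desired number of clusters remains.
def single_linkage(points, k):
    clusters = [[p] for p in points]  # each object starts as its own group
    while len(clusters) > k:
        best = None
        # Find the pair of clusters with the smallest single-link distance
        # (the distance between their closest members).
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min((a - b) ** 2 for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge; this step is never undone
        del clusters[j]
    return clusters

groups = single_linkage([0.0, 0.2, 0.3, 5.0, 5.1], k=2)
```

Stopping at k clusters plays the role of the termination condition mentioned above; letting the loop run to a single cluster would complete the full hierarchy.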

4.2.3 Density-Based Methods:

 Most partitioning methods cluster objects based on the distance between objects. Such methods can
find only spherical-shaped clusters and encounter difficulty discovering clusters of arbitrary shapes.
 Other clustering methods have been developed based on the notion of density. Their general idea is
to continue growing the given cluster as long as the density in the neighborhood exceeds some
threshold; that is, for each data point within a given cluster, the neighborhood of a given radius has
to contain at least a minimum number of points. Such a method can be used to filter out noise
(outliers) and discover clusters of arbitrary shape.
 DBSCAN and its extension, OPTICS, are typical density-based methods that grow clusters according
to a density-based connectivity analysis. DENCLUE is a method that clusters objects based on the
analysis of the value distributions of density functions.
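The grow-while-dense idea can be sketched as a compact DBSCAN-style routine; the eps/min_pts values and toy points are assumptions, and real DBSCAN implementations add indexing and other refinements:

```python
# Compact DBSCAN-style sketch: grow a cluster from each unvisited core
# point (one whose eps-neighborhood holds at least min_pts points),
# absorbing density-reachable neighbors; sparse points are labeled noise.
def dbscan(points, eps, min_pts):
    labels = [None] * len(points)  # None = unvisited, -1 = noise
    def neighbors(i):
        return [j for j, q in enumerate(points)
                if (points[i][0] - q[0]) ** 2 + (points[i][1] - q[1]) ** 2 <= eps ** 2]
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # not dense enough: noise (may be claimed later)
            continue
        labels[i] = cluster_id
        while seeds:
            j = seeds.pop()
            if labels[j] == -1:
                labels[j] = cluster_id  # border point claimed by the cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            more = neighbors(j)
            if len(more) >= min_pts:  # j is itself a core point: keep growing
                seeds.extend(more)
        cluster_id += 1
    return labels

pts = [(0, 0), (0.5, 0), (0, 0.5), (5, 5), (5.5, 5), (5, 5.5), (20, 20)]
labels = dbscan(pts, eps=1.0, min_pts=3)
```

The isolated point at (20, 20) never reaches the minimum neighborhood count, so it is filtered out as noise rather than forced into a cluster.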

4.2.4 Grid-Based Methods:

 Grid-based methods quantize the object space into a finite number of cells that form a grid
structure.
 All of the clustering operations are performed on the grid structure, i.e., on the quantized space.
The main advantage of this approach is its fast processing time, which is typically independent of
the number of data objects and dependent only on the number of cells in each dimension of the
quantized space.
 STING is a typical example of a grid-based method. WaveCluster applies wavelet transformation
for clustering analysis and is both grid-based and density-based.

4.2.5 Model-Based Methods:

 Model-based methods hypothesize a model for each of the clusters and find the best fit of the data
to the given model.
 A model-based algorithm may locate clusters by constructing a density function that reflects the
spatial distribution of the data points.
 It also leads to a way of automatically determining the number of clusters based on standard
statistics, taking "noise" or outliers into account and thus yielding a robust clustering method.
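A common instance of the model-based idea is a Gaussian mixture fitted by EM: hypothesize a model (here, two 1-D Gaussians), then find the best fit of the data to it. The sketch below is illustrative; the data, starting means, and step count are assumptions:

```python
import math

# Two-component 1-D Gaussian mixture fitted by EM (model-based clustering):
# E-step computes each component's responsibility for each point; M-step
# re-estimates means, variances, and weights from those responsibilities.
def gaussian_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_two_gaussians(data, mu=(0.0, 1.0), var=(1.0, 1.0), w=(0.5, 0.5), steps=50):
    for _ in range(steps):
        # E-step: responsibility of component 0 for each point.
        resp = []
        for x in data:
            p0 = w[0] * gaussian_pdf(x, mu[0], var[0])
            p1 = w[1] * gaussian_pdf(x, mu[1], var[1])
            resp.append(p0 / (p0 + p1))
        # M-step: re-estimate the model from the weighted data.
        n0 = sum(resp)
        n1 = len(data) - n0
        mu = (sum(r * x for r, x in zip(resp, data)) / n0,
              sum((1 - r) * x for r, x in zip(resp, data)) / n1)
        var = (sum(r * (x - mu[0]) ** 2 for r, x in zip(resp, data)) / n0 + 1e-6,
               sum((1 - r) * (x - mu[1]) ** 2 for r, x in zip(resp, data)) / n1 + 1e-6)
        w = (n0 / len(data), n1 / len(data))
    return mu, var, w

data = [0.0, 0.1, -0.1, 0.2, 5.0, 5.1, 4.9, 5.2]
mu, var, w = em_two_gaussians(data, mu=(0.0, 4.0))
```

The fitted density function reflects the spatial distribution of the points, and comparing such fits across different component counts is one route to choosing the number of clusters automatically.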
