Dr K N MADHAVI LATHA
ASSOCIATE PROFESSOR
DEPT OF CSE
SIRCRRCOE
MACHINE LEARNING UNIT 3 (PART A)
SYLLABUS:
Support Vector Machine:
Linear SVM Classification, Nonlinear SVM Classification, SVM
Regression, Naïve Bayes Classifiers.
Support Vector Machine or SVM:
Support Vector Machine or SVM is one of the most popular Supervised Learning
algorithms, which is used for Classification as well as Regression problems. However,
primarily, it is used for Classification problems in Machine Learning.
The goal of the SVM algorithm is to create the best line or decision boundary that can
segregate n-dimensional space into classes so that we can easily put the new data point
in the correct category in the future. This best decision boundary is called a hyperplane.
SVM chooses the extreme points/vectors that help in creating the hyperplane. These
extreme cases are called support vectors, and hence the algorithm is termed a Support
Vector Machine. Consider the below diagram, in which two different categories
are classified using a decision boundary or hyperplane:
The SVM algorithm can be used for face detection, image classification, text
categorization, etc.
Types of SVM:
SVM can be of two types:
➢ Linear SVM: Linear SVM is used for linearly separable data, which means that if a
dataset can be classified into two classes by using a single straight line, then such
data is termed linearly separable data, and the classifier used is called a Linear
SVM classifier.
➢ Non-linear SVM: Non-linear SVM is used for non-linearly separable data,
which means that if a dataset cannot be classified by using a straight line, then such
data is termed non-linear data, and the classifier used is called a Non-linear SVM
classifier.
Hyperplane and Support Vectors in the SVM algorithm:
Hyperplane: There can be multiple lines/decision boundaries to segregate the classes in
n-dimensional space, but we need to find out the best decision boundary that helps to
classify the data points. This best boundary is known as the hyperplane of SVM.
The dimensions of the hyperplane depend on the features present in the dataset: if there
are 2 features (as shown in the image), then the hyperplane will be a straight line, and if
there are 3 features, then the hyperplane will be a 2-dimensional plane.
We always create the hyperplane that has the maximum margin, i.e., the maximum
distance between the hyperplane and the nearest data points of each class.
Support Vectors:
The data points or vectors that are closest to the hyperplane and which affect the
position of the hyperplane are termed support vectors. Since these vectors support the
hyperplane, they are called support vectors.
Linear SVM Classification:
The working of the SVM algorithm can be understood by using an example. Suppose
we have a dataset that has two tags (green and blue), and the dataset has two features, x1
and x2. We want a classifier that can classify the pair (x1, x2) of coordinates as either
green or blue. Consider the below image:
Since this is a 2-D space, we can easily separate these two classes by just using a
straight line. But there can be multiple lines that can separate these classes. Consider the
below image:
Hence, the SVM algorithm helps to find the best line or decision boundary; this best
boundary or region is called a hyperplane. The SVM algorithm finds the closest points
from both classes. These points are called support vectors. The distance
between the vectors and the hyperplane is called the margin, and the goal of SVM is to
maximize this margin. The hyperplane with the maximum margin is called the optimal
hyperplane.
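As a sketch of the idea above, the following fits Scikit-Learn's SVC with a linear kernel on a small made-up two-class dataset (the points, labels, and C value here are illustrative assumptions, not from the text):

```python
# Minimal linear SVM sketch on synthetic 2-D data
# ("blue" = class 0, "green" = class 1; points are made up).
import numpy as np
from sklearn.svm import SVC

X = np.array([[1, 2], [2, 3], [2, 1],      # class 0 cluster
              [6, 5], [7, 7], [8, 6]],     # class 1 cluster
             dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)

print(clf.support_vectors_)        # points closest to the separating hyperplane
print(clf.predict([[3, 2], [7, 6]]))
```

The `support_vectors_` attribute exposes exactly the extreme points the text describes; only these points determine where the hyperplane lies.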
Nonlinear SVM Classification:
If data is linearly arranged, then we can separate it by using a straight line, but for
non-linear data, we cannot draw a single straight line. Consider the below image:
So, to separate these data points, we need to add one more dimension. For linear data we
have used two dimensions, x and y, so for non-linear data we will add a third dimension,
z. It can be calculated as:

z = x² + y²
By adding the third dimension, the sample space will become as below image:
So now, SVM will divide the datasets into classes in the following way. Consider the
below image:
Since we are in 3-D space, the decision boundary looks like a plane parallel to the x-y
plane. If we convert it back to 2-D space with z = 1, then it becomes:
Hence we get a circle of radius 1 in the case of non-linear data.
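The z = x² + y² mapping above can be sketched in code: we generate synthetic circular data (an assumption, since the text's figure is not available), add the third dimension by hand, and fit a plain linear SVM in the new 3-D space:

```python
# Sketch of the manual feature mapping z = x1^2 + x2^2 on synthetic
# ring-shaped data: an inner cluster (class 0) and an outer ring (class 1).
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
theta = rng.uniform(0, 2 * np.pi, 40)
# Inner points have radius < 0.7, outer points radius > 1.5
r = np.concatenate([rng.uniform(0.0, 0.7, 20), rng.uniform(1.5, 2.0, 20)])
X = np.column_stack([r * np.cos(theta), r * np.sin(theta)])
y = np.concatenate([np.zeros(20), np.ones(20)]).astype(int)

# Add the third dimension z = x1^2 + x2^2
z = (X ** 2).sum(axis=1, keepdims=True)
X3 = np.hstack([X, z])

clf = SVC(kernel="linear").fit(X3, y)
print(clf.score(X3, y))
```

In the lifted space the two rings sit at clearly different z values, so a flat (linear) hyperplane separates them; projected back to 2-D, that hyperplane is the circle described above.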
SVM Regression:
The SVM algorithm is quite versatile: not only does it support linear and nonlinear
classification, but it also supports linear and nonlinear regression. The trick is to reverse
the objective: instead of trying to fit the largest possible street between two classes while
limiting margin violations, SVM Regression tries to fit as many instances as possible on
the street while limiting margin violations (i.e., instances off the street). The width of the
street is controlled by a hyperparameter ϵ. Figure 1 shows two linear SVM Regression
models trained on some random linear data, one with a large margin (ϵ = 1.5) and the
other with a small margin (ϵ = 0.5).
Figure 1: SVM Regression
Adding more training instances within the margin does not affect the model’s
predictions; thus, the model is said to be ϵ-insensitive.
We can use Scikit-Learn’s LinearSVR class to perform linear SVM Regression. The
following code produces the model represented on the left of Figure 1 (the training data
should be scaled and centered first):
from sklearn.svm import LinearSVR
svm_reg = LinearSVR(epsilon=1.5)
svm_reg.fit(X, y)
To tackle nonlinear regression tasks, you can use a kernelized SVM model. For
example, Figure 2 shows SVM Regression on a random quadratic training set, using a
2nd-degree polynomial kernel. There is little regularization on the left plot (i.e., a large
C value), and much more regularization on the right plot (i.e., a small C value).
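A minimal sketch of the kernelized regression model behind Figure 2, assuming a random quadratic training set (the data-generating code and the C/epsilon values here are illustrative; only the 2nd-degree polynomial kernel comes from the text):

```python
# Nonlinear SVM Regression sketch: 2nd-degree polynomial kernel on
# synthetic quadratic data y = 0.5 x^2 + noise.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(42)
X = rng.uniform(-1, 1, (100, 1))
y = 0.5 * X.ravel() ** 2 + rng.normal(0, 0.1, 100)

# Large C = little regularization (like the left plot of Figure 2)
svm_poly_reg = SVR(kernel="poly", degree=2, C=100, epsilon=0.1)
svm_poly_reg.fit(X, y)

print(svm_poly_reg.predict([[0.5]]))  # should be near 0.5 * 0.25 = 0.125
```

Shrinking C would regularize the model more, flattening the fitted curve as in the right plot of Figure 2.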
Figure 2. SVM regression using a 2nd-degree polynomial kernel
Naive Bayes Classifiers:
The Naïve Bayes algorithm is a supervised learning algorithm based on Bayes'
theorem and used for solving classification problems. It is mainly used in text
classification, which involves high-dimensional training datasets. The Naïve Bayes
classifier is one of the simplest and most effective classification algorithms, and it helps
build fast machine learning models that can make quick predictions. It is a probabilistic
classifier, which means it predicts on the basis of the probability of an object. Some
popular applications of the Naïve Bayes algorithm are spam filtering, sentiment analysis,
and classifying articles.
The Naïve Bayes algorithm comprises two words, Naïve and Bayes, which can be
described as:
Naïve: It is called Naïve because it assumes that the occurrence of a certain feature is
independent of the occurrence of other features. For example, if a fruit is identified on
the basis of color, shape, and taste, then a red, spherical, and sweet fruit is recognized as
an apple. Hence each feature individually contributes to identifying it as an apple,
without depending on the others.
Bayes: It is called Bayes because it depends on the principle of Bayes' Theorem.
Bayes' Theorem:
➢ Bayes' theorem is also known as Bayes' Rule or Bayes' law, which is used to
determine the probability of a hypothesis with prior knowledge. It depends on
the conditional probability.
➢ The formula for Bayes' theorem is given as:

P(A|B) = P(B|A) × P(A) / P(B)

Where,
P(A|B) is Posterior probability: Probability of hypothesis A on the observed event B.
P(B|A) is Likelihood probability: Probability of the evidence given that the hypothesis
A is true.
P(A) is Prior Probability: Probability of hypothesis before observing the evidence.
P(B) is Marginal Probability: Probability of Evidence.
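The four quantities above can be combined in a tiny worked example (the probabilities below are made-up numbers for illustration, not taken from the dataset in this unit):

```python
# Bayes' theorem on illustrative numbers: suppose we play on 60% of days,
# it is sunny on 40% of days, and it is sunny on half the days we play.
p_a = 0.6          # prior P(A): we play
p_b = 0.4          # marginal/evidence P(B): it is sunny
p_b_given_a = 0.5  # likelihood P(B|A): sunny, given that we play

p_a_given_b = p_b_given_a * p_a / p_b  # posterior P(A|B)
print(p_a_given_b)  # 0.75
```

So observing "sunny" raises the probability of playing from the prior 0.6 to the posterior 0.75.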
To solve a problem, we need to follow the below steps:
1. Convert the given dataset into frequency tables.
2. Generate Likelihood table by finding the probabilities of given features.
3. Now, use Bayes theorem to calculate the posterior probability.
Example:
Suppose we have a dataset of weather conditions and a corresponding target variable
"Play". Using this dataset, we need to decide whether or not we should play on a
particular day, according to the weather conditions.
To normalize the values,
VNB(Yes) = VNB(Yes) / (VNB(Yes) + VNB(No)) = 0.0053 / (0.0053 + 0.0206) = 0.205
VNB(No) = VNB(No) / (VNB(Yes) + VNB(No)) = 0.0206 / (0.0053 + 0.0206) = 0.795
Naïve Bayes classifiers are probabilistic classifiers: they predict the output on the basis
of the probability of an object, choosing the class with the higher posterior probability.
Predicted Output: Play Tennis = NO
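The normalization step above can be checked directly in code, reusing the unnormalized scores 0.0053 and 0.0206 from the text:

```python
# Reproduce the Naive Bayes normalization step from the worked example.
v_yes, v_no = 0.0053, 0.0206  # unnormalized scores from the text

p_yes = v_yes / (v_yes + v_no)
p_no = v_no / (v_yes + v_no)

print(round(p_yes, 3), round(p_no, 3))  # 0.205 0.795
print("Play Tennis =", "YES" if p_yes > p_no else "NO")  # Play Tennis = NO
```

Since p_no > p_yes, the classifier outputs "NO", matching the predicted output above.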