Machine Learning:

Supervised Learning

Department of Computer
Science
Haramaya University

2025
Topics
• Introduction to Supervised Learning

• Classification
o Decision tree
o Bayes classification
o SVM
o KNN
o ANN
• Regression-continuous value prediction

Supervised Learning
• A supervised scenario is characterized by the concept of
a teacher or supervisor, whose main task is to provide
the agent with a precise measure of its error (directly
comparable with the output values).
• The goal is to infer a function or mapping from labeled
training data.
• The training data consist of input vectors X and output
vectors Y of labels or tags.
• Based on the training set, the algorithm generalizes to
respond correctly to all possible inputs; this is called
learning from examples.

Supervised Learning
• A data set is denoted in the form D = {(x_i, y_i)},
where the inputs are x_i, the outputs are y_i, and
i = 1 … N, with N the number of observations.
• Generalization: the algorithm should produce sensible
outputs for inputs that were not encountered during
learning.
Supervised learning is categorized into two tasks:
o Classification: data is classified into one of two or more classes
o Regression: the task of predicting a continuous quantity

Classification…
Classification:
o It is a supervised learning model: the classifier already has a
set of classified examples and, from these examples, it learns
to assign new, unseen examples to a class.
• Example:
o assigning a given email to the spam or non-spam category
o eye color classification into: blue, brown, or green
• Widely used classifiers: decision tree, support vector
machine, naïve Bayes, neural network, K-nearest
neighbors, etc.

Decision Tree (DT)
• A decision tree (DT) is a statistical model that
builds classification models in the form of a
tree structure.
• This model classifies data in a dataset by
flowing through a query structure from the
root until it reaches a leaf, which
represents one class.
• The root represents the attribute that plays
the main role in classification, and each leaf
represents a class.
o Given an input, at each node, a test is applied
and one of the branches is taken depending on
the outcome.
Decision Tree…
Classify whether a person will buy a laptop.

Age    | Income | Buys Laptop?
Young  | High   | No
Young  | Low    | Yes
Middle | High   | Yes
Old    | Medium | ?

            Age?
          /   |    \
     Young  Middle  Old
      /  \     |      |
   High  Low  High  Medium
    No   Yes  Yes    Yes

Since:
• "High" income tends to lead to Yes (Middle-aged case),
• "Low" income tends to lead to Yes (Young case),
• while only "High" income in the "Young" case gave No,
we can infer that "Medium" income generally leans toward Yes.
• Predicted classification: Old, Medium income → Buys Laptop = Yes
Decision Tree…
• DT learning is supervised, because it constructs the DT from
class-labeled training tuples.
• Two DT algorithms are ID3 (Iterative Dichotomiser 3) and C4.5
(the successor of ID3).
• The statistical measures used to select the attribute that best
splits the dataset in terms of the given classes are
information gain and gain ratio.
• Both measures have a close relationship with another
concept called entropy.

Decision Tree…
• Entropy: Measures uncertainty or randomness in the
dataset. Lower entropy means more pure (better)
classification.
• Information Gain: Measures how much an attribute
reduces entropy. Higher information gain means a better
attribute for splitting. (used in ID3, Iterative
Dichotomiser 3)
• Gain Ratio: An improved version of information gain that
adjusts for attribute bias (C4.5 ,Successor of ID3)
• Gini Index: Measures impurity in a dataset. Lower Gini
Index means a better attribute for classification. Used in
CART (Classification and Regression Trees)
Algorithm for decision tree learning
• Basic algorithm (a greedy divide-and-conquer algorithm)
o Assume attributes are categorical (continuous attributes can be handled
too)
o Tree is constructed in a top-down recursive manner/no backtracking
o At start, all the training examples are at the root
o Examples are partitioned recursively based on selected attributes
o Attributes are selected on the basis of an impurity function (e.g.,
information gain)
• Conditions for stopping partitioning
o All examples for a given node belong to the same class
o There are no remaining attributes for further partitioning – majority
class is the leaf
o There are no examples left

Choose an attribute to partition data
• The key to building a decision tree:
1. Choose an attribute from the dataset.
2. Calculate the significance of the attribute in splitting the data.
3. Split the data based on the value of the best attribute.
4. Go to step 1.
• The objective is to reduce impurity or uncertainty in the data as much as possible.
o A subset of data is pure if all instances belong to the same class.
• The best attribute is selected for splitting the training examples using a
goodness function: a mathematical function used to evaluate how good a split is.
• The best attribute:
o separates the classes of the training examples fastest, and
o yields the smallest tree.

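The greedy top-down procedure above can be sketched in Python. This is a minimal illustration only — the function names (`build_tree`, `entropy`) and the nested-dict tree representation are our own, and only information gain is used as the impurity function:

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy of a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def build_tree(rows, labels, attrs):
    """Greedy divide-and-conquer: pick the highest-gain attribute,
    partition the examples on its values, and recurse (no backtracking)."""
    if len(set(labels)) == 1:          # all examples belong to the same class
        return labels[0]
    if not attrs:                      # no attributes left: majority class leaf
        return Counter(labels).most_common(1)[0][0]

    def gain(a):                       # information gain of attribute a
        parts = {}
        for row, y in zip(rows, labels):
            parts.setdefault(row[a], []).append(y)
        return entropy(labels) - sum(len(p) / len(labels) * entropy(p)
                                     for p in parts.values())

    best = max(attrs, key=gain)
    branches = {}
    for value in {row[best] for row in rows}:
        idx = [i for i, r in enumerate(rows) if r[best] == value]
        branches[value] = build_tree([rows[i] for i in idx],
                                     [labels[i] for i in idx],
                                     [a for a in attrs if a != best])
    return {best: branches}

# The labeled rows of the earlier laptop example:
rows = [{"age": "Young", "income": "High"},
        {"age": "Young", "income": "Low"},
        {"age": "Middle", "income": "High"}]
labels = ["No", "Yes", "Yes"]
tree = build_tree(rows, labels, ["age", "income"])
print(tree["age"]["Middle"])                    # Yes
print(tree["age"]["Young"]["income"]["High"])   # No
```

The recursion mirrors the stopping conditions on the previous slide: pure node, no attributes left (majority vote), or no further split needed.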
Decision Tree…
Entropy: a metric used to measure the impurity of the data
• It is the degree of randomness in the data
• It is a measure of the impurity or uncertainty of the data
• The higher the entropy, the higher the information content

Entropy(D) = - Σ (i = 1 to m) p_i log2(p_i)

Where
• i is the identifier for each class in the dataset,
• m is the number of class labels,
• p_i is the nonzero probability that an arbitrary tuple in D belongs to class C_i,
estimated by p_i = |C_i,D| / |D|,
• |C_i,D| is the number of tuples (data points) that belong to class C_i in dataset D,
• |D| is the total number of tuples in the dataset.
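The formula above can be sketched directly in Python (the function name and the example class counts are illustrative):

```python
import math

def entropy(counts):
    """Entropy of a class distribution, given the per-class tuple counts."""
    total = sum(counts)
    ent = 0.0
    for c in counts:
        if c > 0:                    # the formula is defined for nonzero p_i only
            p = c / total
            ent -= p * math.log2(p)
    return ent

print(entropy([4, 0]))              # 0.0 : pure node, no uncertainty
print(entropy([2, 2]))              # 1.0 : maximally mixed (two classes)
print(round(entropy([6, 4]), 3))    # 0.971 : mixed, but not completely
```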
Decision Tree…
• Example: a dataset with a 6 : 4 class split gives
Entropy(D) = -(0.6) log2(0.6) - (0.4) log2(0.4) ≈ 0.97
• Entropy ≈ 0.97 means the data are mixed (not completely pure).
Decision Tree…
Entropy: measures the level of disorder or uncertainty in a
given dataset or system

Decision Tree
• Suppose a set D contains a total of N examples, of which
p are positive and n are negative. The entropy is given by:

Entropy(D) = -(p/N) log2(p/N) - (n/N) log2(n/N)

• Some useful properties of the entropy:
o Entropy(D) = 0 means that all the examples are in the same class.
o Entropy(D) = 1 means that half the examples are of one class and half
are of the opposite class.
Decision tree…
Information Gain
• We want to determine which attribute in a given set of
training feature vectors is most useful for discriminating
between the classes to be learned.
• Information gain tells us how important a given attribute
of the feature vectors is.
• We use it to decide the ordering of attributes in the
nodes of a decision tree
Information Gain = Entropy before splitting -
Entropy after splitting

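The "entropy before minus entropy after" definition can be sketched as follows (the helper names and the example split are illustrative — a parent node with 6/4 class counts split by some attribute into a pure subset and a mixed one):

```python
import math

def entropy(counts):
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def information_gain(parent_counts, child_counts_list):
    """Entropy before splitting minus the weighted entropy after splitting."""
    n = sum(parent_counts)
    before = entropy(parent_counts)
    after = sum(sum(child) / n * entropy(child) for child in child_counts_list)
    return before - after

# Parent: 6 yes / 4 no. The attribute splits it into [4 yes, 0 no] and [2 yes, 4 no].
print(round(information_gain([6, 4], [[4, 0], [2, 4]]), 3))  # 0.42
```

A useless attribute that leaves the distribution unchanged has zero gain, which is why the attribute with the highest gain is preferred for the split.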
Decision tree…
Information Gain:
• Select the attribute with the highest information gain,
i.e. the one that creates the smallest average disorder:
o First, compute the disorder using entropy: the expected
information needed to classify objects into classes.
o Second, measure the information gain: by how much the
disorder of a set would be reduced by knowing the value
of a particular attribute.
• The attribute with the highest information gain is
considered the best.
Gain Ratio
• Gain Ratio (or Uncertainty Coefficient) normalizes the
information gain of an attribute by the entropy of that
attribute's own split, called the split information:

Gain Ratio(A) = Information Gain(A) / Split Info(A)

where Split Info(A) is the entropy of the partition sizes
produced by splitting on A.
• From the formula: if the split information is very small, the
gain ratio will be high, and vice versa.
• First, determine the information gain of all the attributes, then
compute the average information gain.
• Second, calculate the gain ratio of all the attributes whose
information gain is greater than or equal to the computed average,
and pick the attribute with the highest gain ratio to split.
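A sketch of the normalization (function names are illustrative; the information-gain value is assumed to have been computed already, as in the previous slides):

```python
import math

def split_info(subset_sizes):
    """Entropy of the partition itself: how evenly the attribute splits the data."""
    n = sum(subset_sizes)
    return -sum((s / n) * math.log2(s / n) for s in subset_sizes if s > 0)

def gain_ratio(info_gain, subset_sizes):
    si = split_info(subset_sizes)
    return info_gain / si if si > 0 else 0.0

# An attribute that splits 10 tuples into two subsets of 5 has split info of
# exactly 1 bit, so its gain ratio equals its information gain:
print(gain_ratio(0.42, [5, 5]))              # 0.42
# A many-valued attribute (e.g. a unique ID) has large split info, which
# shrinks its ratio — this is the bias correction C4.5 adds over ID3:
print(round(gain_ratio(0.42, [1] * 10), 3))  # 0.126
```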
Gini Index
• The Gini index can also be used for feature selection.
• The tree chooses the feature that minimizes the Gini impurity;
a higher Gini index indicates higher impurity. ("Gini Index"
and "Gini Impurity" are used interchangeably.)
• It performs only binary splits. For categorical variables, it
gives the results in terms of "success" or "failure".
• The Gini index is calculated from the formula below, where c is
the number of classes and p_i is the probability associated with
the i-th class:

Gini(D) = 1 - Σ (i = 1 to c) p_i^2
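The formula translates almost line-for-line into Python (the function name and the example counts are illustrative):

```python
def gini(counts):
    """Gini index of a class distribution: 1 - sum of squared probabilities.
    0 means a pure node; 0.5 is the worst case for two classes."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

print(gini([4, 0]))   # 0.0 : pure node
print(gini([2, 2]))   # 0.5 : maximally mixed (two classes)
```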
Example-Decision tree
The problem of "Sunburn": you want to predict whether a
person is likely to get sunburned if they go back to the beach.
How can you do this? Data collected: predict based on the
observed properties of the people.

Name  | Hair   | Height  | Weight  | Lotion | Result
Sarah | Blonde | Average | Light   | No     | Sunburned
Dana  | Blonde | Tall    | Average | Yes    | None
Alex  | Brown  | Short   | Average | Yes    | None
Annie | Blonde | Short   | Average | No     | Sunburned
Emily | Red    | Average | Heavy   | No     | Sunburned
Pete  | Brown  | Tall    | Heavy   | No     | None
John  | Brown  | Average | Heavy   | No     | None
Kate  | Blonde | Short   | Light   | Yes    | None

Decision Tree
• Expected information (weighted entropy after the split) for each attribute:

Attribute | Expected information
Hair      | 0.50
Height    | 0.69
Weight    | 0.94
Lotion    | 0.61
Decision tree
• Information gain of each attribute

Gain(hair)   = 0.954 - 0.50 = 0.454
Gain(height) = 0.954 - 0.69 = 0.264
Gain(weight) = 0.954 - 0.94 = 0.014
Gain(lotion) = 0.954 - 0.61 = 0.344
Which decision variable maximises the information gain? → Hair

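The slide's figures can be reproduced from the table (helper names are ours; the last digits differ slightly for height, weight and lotion because the slide rounds the expected information to two decimals before subtracting):

```python
import math

def entropy(counts):
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

data = [  # (hair, height, weight, lotion, result) from the table above
    ("Blonde", "Average", "Light",   "No",  "Sunburned"),
    ("Blonde", "Tall",    "Average", "Yes", "None"),
    ("Brown",  "Short",   "Average", "Yes", "None"),
    ("Blonde", "Short",   "Average", "No",  "Sunburned"),
    ("Red",    "Average", "Heavy",   "No",  "Sunburned"),
    ("Brown",  "Tall",    "Heavy",   "No",  "None"),
    ("Brown",  "Average", "Heavy",   "No",  "None"),
    ("Blonde", "Short",   "Light",   "Yes", "None"),
]

def expected_info(attr_index):
    """Weighted entropy of the partitions induced by one attribute."""
    groups = {}
    for row in data:
        groups.setdefault(row[attr_index], []).append(row[-1])
    n = len(data)
    return sum(len(g) / n * entropy([g.count("Sunburned"), g.count("None")])
               for g in groups.values())

base = entropy([3, 5])   # 3 sunburned, 5 none  ->  ~0.954
for name, idx in [("hair", 0), ("height", 1), ("weight", 2), ("lotion", 3)]:
    print(name, round(base - expected_info(idx), 3))
# hair 0.454, height 0.266, weight 0.016, lotion 0.348 — hair wins
```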
The best decision tree?

is_sunburned
         Hair colour
        /     |     \
   blonde    red    brown
      ?   Sunburned  None

Blonde branch: Sunburned = Sarah, Annie; None = Dana, Kate

• Once we have finished with hair colour, we then need to
calculate the remaining branches of the decision tree.
• Which attribute is best to classify the remaining examples
(the blonde branch)?
The best Decision Tree
• This is the simplest and optimal tree possible, and
it makes a lot of sense.
• It classifies 4 of the people on hair colour alone.

is_sunburned
         Hair colour
        /     |     \
   blonde    red    brown
      |   Sunburned  None
 Lotion used
    /    \
   no    yes
Sunburned None
Avoid overfitting in classification
• Overfitting: a tree may overfit the training data
o Good accuracy on training data but poor on test data
o Symptoms: tree too deep with too many branches, some of which may
reflect anomalies due to noise or outliers
• Two approaches to avoid overfitting
o Pre-pruning: stop the tree from growing
once it meets certain conditions.
o Post-pruning: remove branches or sub-trees from a "fully grown"
tree.
• First grow the tree completely.
• Then remove branches that don't improve accuracy on a validation
dataset.
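Both pruning styles map onto scikit-learn's DecisionTreeClassifier, assuming scikit-learn is available; the dataset and parameter values below are purely illustrative:

```python
from sklearn.tree import DecisionTreeClassifier

# A tiny toy dataset (illustrative only).
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 1, 1, 0]

# Pre-pruning: stop growth early via depth / sample-count limits.
pre = DecisionTreeClassifier(max_depth=2, min_samples_leaf=1)
pre.fit(X, y)

# Post-pruning: grow fully, then apply minimal cost-complexity pruning;
# ccp_alpha > 0 removes branches whose accuracy gain does not justify
# their added complexity.
post = DecisionTreeClassifier(ccp_alpha=0.01)
post.fit(X, y)

print(pre.get_depth(), post.get_depth())
```

In practice `ccp_alpha` (and the pre-pruning limits) would be tuned on a validation set, matching the validation-accuracy criterion described above.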
Decision Tree

 You can view the decision tree as IF-THEN-ELSE
rules that tell us whether someone will suffer from
sunburn:

IF (hair-colour = "red") THEN
    return (sunburned = yes)
ELSE IF (hair-colour = "blonde" AND lotion-used = "no") THEN
    return (sunburned = yes)
ELSE
    return (sunburned = no)

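Those rules transcribe directly into runnable Python (the function name is ours):

```python
def predict_sunburn(hair_colour, lotion_used):
    """The sunburn tree's decision rules as plain IF-THEN-ELSE logic."""
    if hair_colour == "red":
        return "Sunburned"
    if hair_colour == "blonde" and lotion_used == "no":
        return "Sunburned"
    return "None"

print(predict_sunburn("blonde", "no"))   # Sunburned (e.g. Sarah)
print(predict_sunburn("blonde", "yes"))  # None (e.g. Dana)
print(predict_sunburn("brown", "no"))    # None (e.g. Pete)
```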
Decision Tree
• The benefits of decision trees are as follows:
o They do not require any domain knowledge.
o They are easy to understand.
o The learning and classification steps of a
decision tree are simple and fast.

Thank you!
?
