Topic 7

Decision Trees
What are trees?

2
Decision Trees
• Example: classifying lemons vs. apples

Images from [Link]
3
Decision Trees
• Root node
• Branches
• Leaves

Images from [Link]
4
Rules for classifying data using attributes
• The tree consists of decision nodes and leaf
nodes.
• A decision node has two or more branches,
each representing values for the attribute
tested.
• A leaf node produces a homogeneous result
(all examples in one class), which requires no
further classification testing.

5
• Each internal node: tests one feature Xi
• Each branch from a node: selects one value for Xi
• Each leaf node: prediction for Y

• Features can be discrete, continuous or categorical

6
Images from [Link]
Read
• Features can be discrete, continuous or categorical
• Each internal node: tests some set of features {Xi}
• Each branch from a node: selects a set of values for {Xi}
• Each leaf node: prediction for Y

7
Example: What to do this Weekend?
• If my parents are visiting
– We’ll go to the cinema
• If not
– Then, if it’s sunny I’ll play tennis
– But if it’s windy and I’m rich, I’ll go shopping
– If it’s windy and I’m poor, I’ll go to the cinema
– If it’s rainy, I’ll stay in

8
Written as a Decision Tree

Root of tree

Leaves

9
Using the Decision Tree
(No parents on a Sunny Day)

10
Using the Decision Tree
(No parents on a Sunny Day)

11
From Decision Trees to Logic
• Read from the root to every tip
– If this and this and this … and this, then do this

• In our example:
– If no_parents and sunny_day, then play_tennis
– no_parents ∧ sunny_day → play_tennis

12
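As an illustration (not part of the original slides), the rules read off this tree can be written directly as nested conditionals; the attribute names parents, weather and money below are assumptions:

```python
def weekend_decision(parents: bool, weather: str, money: str) -> str:
    """Hypothetical sketch of the weekend decision tree read as if/then rules."""
    if parents:                       # parents visiting -> cinema
        return "cinema"
    if weather == "sunny":            # no parents and sunny -> tennis
        return "tennis"
    if weather == "windy":            # no parents and windy -> depends on money
        return "shopping" if money == "rich" else "cinema"
    return "stay in"                  # rainy -> stay in

# no_parents and sunny_day -> play_tennis
print(weekend_decision(parents=False, weather="sunny", money="rich"))  # tennis
```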
How to design a decision tree
• Decision tree can be seen as rules for performing a
categorisation
– E.g., “what kind of weekend will this be?”

• Remember that we’re learning from examples


– Not turning thought processes into decision trees

• The major question in decision tree learning is


– Which nodes to put in which positions
– Including the root node and the leaf nodes

13
Training and Visualizing a Decision Tree
(Try the Iris decision tree in the Jupyter notebook)

• Decision trees require very little data preparation.
• Gini impurity is a measure of node impurity: a pure node
(all samples of one class) has gini = 0.
• Gini impurity of node i: G_i = 1 − Σ_k p_{i,k}^2, where p_{i,k} is the
proportion of class-k samples among the node's training samples.
• The tree shown is trained with max_depth=2.
• Decision trees are a white box model – easy to interpret –
not a black box model like neural networks.
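A minimal scikit-learn sketch of how such a tree could be trained and inspected; using the two petal features and max_depth=2 follows the slide, while the rest (random_state, the text export) is illustrative:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
X = iris.data[:, 2:]   # petal length and petal width (assumed, as in the figure)
y = iris.target

tree_clf = DecisionTreeClassifier(max_depth=2, random_state=42)
tree_clf.fit(X, y)

# Text view of the learned tree: each node shows the tested feature and threshold.
print(export_text(tree_clf, feature_names=iris.feature_names[2:]))
```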
Decision Tree Boundaries
• For max_depth=2, the decision boundaries partition the
petal length / petal width plane.
• max_depth=3 adds the vertical dotted lines.
CART Training Algorithm
• Classification And Regression Tree algorithm.
• Splits the training set into two subsets using a single feature k and a
threshold t_k – in this case, petal length ≤ 2.45.
• Searches for the (k, t_k) combination that produces the purest subsets,
weighted by their size. The cost function is:
J(k, t_k) = (m_left / m) · G_left + (m_right / m) · G_right
where G_left/right is the impurity of the left/right subset and
m_left/right is the number of instances in it.
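A hedged sketch of one level of this split search – trying every (feature, threshold) pair and keeping the one that minimizes the weighted Gini cost above; all names here are illustrative, not scikit-learn internals:

```python
import numpy as np

def gini(y):
    """Gini impurity of a label array: 1 - sum_k p_k^2."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_split(X, y):
    """Return the (feature k, threshold t_k) pair minimizing the weighted Gini cost."""
    m, n_features = X.shape
    best_k, best_t, best_cost = None, None, float("inf")
    for k in range(n_features):
        for t in np.unique(X[:, k]):
            left, right = y[X[:, k] <= t], y[X[:, k] > t]
            if len(left) == 0 or len(right) == 0:
                continue
            cost = len(left) / m * gini(left) + len(right) / m * gini(right)
            if cost < best_cost:
                best_k, best_t, best_cost = k, t, cost
    return best_k, best_t
```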
Decision Tree Regularisation
• Recall from the data pre-processing lecture that a decision tree gave a model with
0 error – decision trees have a high tendency to overfit. Hence regularisation!

• max_depth: maximum depth of the tree in terms of layers
• max_features: maximum number of features that are evaluated for splitting at each node
• max_leaf_nodes: maximum number of leaf nodes
• min_samples_split: minimum number of samples a node must have before it can be split
• min_samples_leaf: minimum number of samples a leaf node must have to be created
• min_weight_fraction_leaf: same as min_samples_leaf but expressed as a fraction of the
total number of weighted instances
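For instance, a couple of these scikit-learn hyperparameters in use; this is only a sketch, and the moons dataset and parameter values are illustrative rather than taken from the slides:

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=1000, noise=0.3, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Unrestricted tree: grows until every leaf is pure and tends to overfit.
free_tree = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)

# Regularized tree: every leaf must contain at least 4 training samples.
reg_tree = DecisionTreeClassifier(min_samples_leaf=4, random_state=42).fit(X_train, y_train)

print(free_tree.score(X_test, y_test), reg_tree.score(X_test, y_test))
```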
Non-regularized vs. Regularized Tree
• Test accuracy:
– No restrictions: 0.898
– Restricted: 0.92
Regression with Decision Trees
• Instead of predicting a class, it predicts a value.
• Example: x_new = 0.2
Regression with Decision Trees
• The predicted value for each region is the average target value of the
instances in that region.
CART for Regression
• Instead of trying to minimize impurity, CART splits the training data
in order to minimize the MSE.
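A minimal sketch of tree regression with scikit-learn; the noisy quadratic toy data and the x_new = 0.2 query are illustrative:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(42)
X = rng.random((200, 1))                                  # one feature in [0, 1]
y = (X[:, 0] - 0.5) ** 2 + rng.normal(0, 0.05, size=200)  # noisy quadratic target

tree_reg = DecisionTreeRegressor(max_depth=2, random_state=42)
tree_reg.fit(X, y)

# The prediction is the average target value of the training instances
# that fall in the same leaf (region) as x_new.
print(tree_reg.predict([[0.2]]))
```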
Importance of Regularization in Regression
Sensitivity to Axis Orientation
• Decision trees love orthogonal decision boundaries.
• Rotate the data by 45° and note the convoluted boundary.
High Variance in Decision Trees
• Decision trees have high variance.
• Small changes in hyperparameters lead to very different models.
• Since the training algorithm used by scikit-learn is stochastic in
nature, retraining the same model on the same data can produce a very
different model.
• By averaging predictions over many trees, it is possible to reduce the
variance. Such an ensemble of trees is called a random forest.
• The next slide shows an example of the same dataset and two different
tree configurations.
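A sketch of variance reduction by averaging many trees, using scikit-learn's RandomForestClassifier; the dataset and parameters are illustrative:

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=1000, noise=0.3, random_state=42)

single_tree = DecisionTreeClassifier(random_state=42)
forest = RandomForestClassifier(n_estimators=200, random_state=42)  # averages 200 trees

# Cross-validated accuracy: the ensemble typically scores higher and varies less.
print(cross_val_score(single_tree, X, y, cv=5).mean())
print(cross_val_score(forest, X, y, cv=5).mean())
```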
High Variance
The ID3 Algorithm

• Invented by J. Ross Quinlan in 1979


• ID3 uses a measure called Information Gain
– Used to choose which node to put next
• Node with the highest information gain is chosen
– When there are no choices, a leaf node is put on
• Builds the tree from the top down, with no
backtracking
• Information Gain is used to select the most useful
attribute for classification
15
Entropy – General Idea
• From Tom Mitchell’s book:
– “In order to define information gain precisely, we begin by
defining a measure commonly used in information theory,
called entropy that characterizes the (im)purity of an
arbitrary collection of examples”

• A notion of impurity in data


• A formula to calculate the homogeneity of a sample
• A completely homogeneous sample has entropy of 0
• An equally divided sample has entropy of 1

16
Entropy - Formulae

• Given a set of examples, S

• For example, in a binary categorization:
Entropy(S) = −p+ log2(p+) − p− log2(p−)
– Where p+ is the proportion of positives
– And p− is the proportion of negatives

• For examples belonging to classes c1 to cn:
Entropy(S) = −Σ_n p_n log2(p_n)
– Where p_n is the proportion of examples in c_n

17
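A small helper consistent with these formulas, reused in the worked examples below (an illustrative sketch, not from the slides):

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy(S) = -sum_i p_i * log2(p_i), over the classes present in labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

print(entropy(["+", "-"]))            # equally divided binary sample -> 1.0
print(entropy(["+", "+", "+", "+"]))  # completely homogeneous sample -> 0.0
```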
Entropy Example

18
(Hand-written working: a 14-example set with 9 positives and 5 negatives, split on Outlook)

Outlook branches: Sunny, Overcast, Rain
E(Sunny) = 0.97, E(Overcast) = 0, E(Rain) = 0.97

Gain(S, Outlook) = 0.94 − (5/14)·0.97 − (4/14)·0 − (5/14)·0.97
                 = 0.94 − 0.69 = 0.25

Exercise: find the information gain for the other attributes:
1) Temperature  2) Humidity  3) Wind
Entropy Example

Entropy(S) =
- (9/14) Log2 (9/14) - (5/14) Log2 (5/14)
= 0.940

19
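Checking this value with the entropy helper sketched earlier:

```python
S = ["+"] * 9 + ["-"] * 5          # 14 examples: 9 positive, 5 negative
print(round(entropy(S), 3))        # 0.94
```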
Information Gain (IG)
• Information gain is based on the decrease in entropy after a dataset is
split on an attribute.
• Which attribute creates the most homogeneous branches?

• First the entropy of the total dataset is calculated


• The dataset is then split on different attributes
• The entropy for each branch is calculated. Then it is added
proportionally, to get total entropy for the split
• The resulting entropy is subtracted from the entropy before the split

• The result is the Information Gain, or decrease in entropy


• The attribute that yields the largest IG is chosen for the decision node
• It is the expected reduction in the entropy of the target variable Y
for a data sample S, due to sorting on an attribute.

20
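A sketch of information gain built on the same entropy helper; the representation of examples as dicts mapping attribute names to values is an assumption for illustration:

```python
from collections import defaultdict

def information_gain(examples, labels, attribute):
    """Gain(S, A) = Entropy(S) - sum_v (|S_v|/|S|) * Entropy(S_v)."""
    total = entropy(labels)
    branches = defaultdict(list)
    for example, label in zip(examples, labels):
        branches[example[attribute]].append(label)   # group labels by attribute value
    remainder = sum(len(subset) / len(labels) * entropy(subset)
                    for subset in branches.values())
    return total - remainder
```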
Information Gain (cont’d)
• A branch set with entropy of 0 is a leaf node.
• Otherwise, the branch needs further splitting to classify its
dataset.

• The ID3 algorithm is run recursively on the non-leaf branches,


until all the data is classified.

21
Information Gain (cont’d)

• Calculate Gain(S,A)
– Estimate the reduction in entropy we obtain if we know
the value of attribute A for the examples in S

22
An Example Calculation of
Information Gain
• Suppose we have a set of examples
– S = {s1, s2, s3, s4}
– In a binary categorization
• With one positive example and three negative examples
• The positive example is s1
• And attribute A
– Which takes values v1, v2, v3
• s1 takes value v2 for A, s2 takes value v2 for A
• s3 takes value v3 for A, s4 takes value v1 for A

23
First Calculate Entropy(S)
• Recall that
Entropy(S) = -p+log2(p+) – p-log2(p-)

• From binary categorisation, we know that


p+ = ¼ and p- = ¾

• Hence, Entropy(S) = -(1/4)log2(1/4) – (3/4)log2(3/4)


= 0.811

24
Calculate Gain for each Value of A
• Remember that
Gain(S, A) = Entropy(S) − Σ_v (|Sv|/|S|) · Entropy(Sv)
• And that Sv = {set of examples with value v for A}
– So, Sv1 = {s4}, Sv2 = {s1, s2}, Sv3 = {s3}

• Now, (|Sv1|/|S|) * Entropy(Sv1)
= (1/4) * (−(0/1)*log2(0/1) − (1/1)*log2(1/1))
= (1/4) * (0 − (1)*log2(1)) = (1/4)(0 − 0) = 0

• Similarly, (|Sv2|/|S|) * Entropy(Sv2) = (2/4) * 1 = 0.5
– since Entropy(Sv2) = −(1/2)log2(1/2) − (1/2)log2(1/2) = 1/2 + 1/2 = 1
• And (|Sv3|/|S|) * Entropy(Sv3) = (1/4) * 0 = 0

25
Final Calculation

• So, we add up the three calculations and take them


from the overall entropy of S:

• Final answer for information gain:


– Gain(S,A) = 0.811 – (0.25*0 + 0.5*1 + 0.25*0) = 0.311

26
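Reproducing this number with the information_gain sketch above:

```python
examples = [{"A": "v2"}, {"A": "v2"}, {"A": "v3"}, {"A": "v1"}]  # s1..s4
labels = ["+", "-", "-", "-"]                                     # only s1 is positive
print(round(information_gain(examples, labels, "A"), 3))          # 0.311
```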
A Worked Example

Weekend  Weather  Parents  Money  Decision (Category)
W1       Sunny    Yes      Rich   Cinema
W2       Sunny    No       Rich   Tennis
W3       Windy    Yes      Rich   Cinema
W4       Rainy    Yes      Poor   Cinema
W5       Rainy    No       Rich   Stay in
W6       Rainy    Yes      Poor   Cinema
W7       Windy    No       Poor   Cinema
W8       Windy    No       Rich   Shopping
W9       Windy    Yes      Rich   Cinema
W10      Sunny    No       Rich   Tennis

27
(Hand-written working: Entropy(S) for the 10 weekend examples
– 6 Cinema, 2 Tennis, 1 Shopping, 1 Stay in)

Entropy(S) = −0.6·log2(0.6) − 0.2·log2(0.2) − 0.1·log2(0.1) − 0.1·log2(0.1)
           = 0.4422 + 0.4644 + 0.3322 + 0.3322
           = 1.571
Information Gain for All of S
• S = {W1,W2,…,W10}
• Firstly, we need to calculate:
– Entropy(S) = … = 1.571

• Next, we need to calculate information gain


– For all the attributes we currently have available
• (which is all of them at the moment)
– Gain(S, weather) = 0.7
– Gain(S, parents) = 0.61
– Gain(S, money) = 0.2816

28
(Hand-written working: Gain(S, Money))

Money = Rich for 7 examples (3 Cinema, 2 Tennis, 1 Stay in, 1 Shopping);
Money = Poor for 3 examples (all Cinema).

Entropy(S_rich) = −(3/7)log2(3/7) − (2/7)log2(2/7) − 2·(1/7)log2(1/7)
                = 0.5239 + 0.5164 + 0.8021 = 1.842
Entropy(S_poor) = 0

Gain(S, Money) = 1.571 − (7/10)·1.842 − (3/10)·0
               = 1.571 − 1.289 ≈ 0.2816
The ID3 Algorithm
• Given a set of examples, S
– Described by a set of attributes Ai
– Categorised into categories cj
1. Choose the root node to be attribute A
– Such that A scores highest for information gain
• Relative to S, i.e., gain(S,A) is the highest over all
attributes
2. For each value v that A can take
– Draw a branch and label each with corresponding v

29
The ID3 Algorithm
• For each branch you’ve just drawn (for value v)
– If Sv only contains examples in category c
• Then put that category as a leaf node in the tree
– If Sv is empty
• Then find the default category (which contains the most
examples from S)
– Put this default category as a leaf node in the tree
– Otherwise
• Remove A from attributes which can be put into nodes
• Replace S with Sv
• Find new attribute A scoring best for Gain(S, A)
• Start again at part 2
• Make sure you replace S with Sv

30
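A hedged sketch of this recursion, built on the entropy and information_gain helpers above; it returns the tree as nested dicts, and all names are illustrative:

```python
from collections import Counter

def id3(examples, labels, attributes):
    """Recursively build a decision tree as nested dicts: {attribute: {value: subtree}}."""
    # If all examples are in one category, that category becomes a leaf node.
    if len(set(labels)) == 1:
        return labels[0]
    # If no attributes are left, use the default (most common) category as the leaf.
    if not attributes:
        return Counter(labels).most_common(1)[0][0]

    # Choose the attribute scoring highest for information gain as this node.
    best = max(attributes, key=lambda a: information_gain(examples, labels, a))
    tree = {best: {}}

    # One branch per value of the chosen attribute seen in the examples
    # (a value with no examples would get the default category in full ID3).
    for value in {ex[best] for ex in examples}:
        idx = [i for i, ex in enumerate(examples) if ex[best] == value]
        sub_examples = [examples[i] for i in idx]
        sub_labels = [labels[i] for i in idx]
        tree[best][value] = id3(sub_examples, sub_labels,
                                [a for a in attributes if a != best])
    return tree
```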
Explanatory Diagram

31
Information Gain for All of S
• S = {W1,W2,…,W10}
• Firstly, we need to calculate:
– Entropy(S) = … = 1.571
• Next, we need to calculate information gain
– For all the attributes we currently have available
• (which is all of them at the moment)
– Gain(S, weather) = … = 0.7
– Gain(S, parents) = … = 0.61
– Gain(S, money) = … = 0.2816
• Hence, the weather is the first attribute to split on
– Because this gives us the biggest information gain

33
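Reproducing these gains with the helpers sketched earlier:

```python
weekend_data = [
    ("Sunny", "Yes", "Rich", "Cinema"),  ("Sunny", "No",  "Rich", "Tennis"),
    ("Windy", "Yes", "Rich", "Cinema"),  ("Rainy", "Yes", "Poor", "Cinema"),
    ("Rainy", "No",  "Rich", "Stay in"), ("Rainy", "Yes", "Poor", "Cinema"),
    ("Windy", "No",  "Poor", "Cinema"),  ("Windy", "No",  "Rich", "Shopping"),
    ("Windy", "Yes", "Rich", "Cinema"),  ("Sunny", "No",  "Rich", "Tennis"),
]
examples = [{"weather": w, "parents": p, "money": m} for w, p, m, _ in weekend_data]
labels = [d for _, _, _, d in weekend_data]

print(round(entropy(labels), 3))                    # 1.571
for attr in ("weather", "parents", "money"):
    print(attr, round(information_gain(examples, labels, attr), 2))
# weather ≈ 0.70, parents ≈ 0.61, money ≈ 0.28 (the slides' 0.2816, up to rounding)
```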
Top of the Tree
• So, this is the top of our tree:
• Now, we look at each branch in turn
– In particular, we look at the examples with the attribute
prescribed by the branch
• Ssunny = {W1,W2,W10}
– Categorisations are cinema, tennis and tennis for W1,W2
and W10
– What does the algorithm say?
• Set is neither empty, nor a single category
• So we have to replace S by Ssunny and start again

34
Getting to the leaf nodes
• If it’s sunny and the parents have turned up
– Then, looking at the table in the previous slide
• There’s only one answer: go to cinema
• If it’s sunny and the parents haven’t turned up
– Then, again, there’s only one answer: play tennis
• Hence our decision tree looks like this:

36
What is the optimal Tree Depth?
• We need to be careful to pick an appropriate
tree depth.
• If the tree is too deep, we can overfit.
• If the tree is too shallow, we underfit
• Max depth is a hyper-parameter that should
be tuned on the data. An alternative strategy is to
create a very deep tree, and then to prune it.

37
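One common way to tune this hyper-parameter is a cross-validated grid search; a sketch, with an illustrative dataset and parameter grid:

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=1000, noise=0.3, random_state=42)

search = GridSearchCV(
    DecisionTreeClassifier(random_state=42),
    param_grid={"max_depth": [2, 3, 4, 5, 6, None]},  # None = grow until leaves are pure
    cv=5,
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```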
Control the size of the tree
• If we stop early, not all
training samples would
be classified correctly.
• How do we classify a new
instance:
– We label the leaves of this
smaller tree with the
majority of training
samples’ labels
38
Summary of learning classification
trees
• Advantages:
– Easily interpretable by human (as long as the tree is not too big)
– Computationally efficient
– Handles both numerical and categorical data
– It is parametric, thus compact: unlike nearest-neighbour
classification, we do not have to carry our training instances
around
– Building block for various ensemble methods (more on
this later)
• Disadvantages
– Heuristic training techniques
– Finding the partition of space that minimizes empirical
error is NP-hard.
– We resort to greedy approaches with limited
theoretical underpinning.
39
Feature Space
• Suppose that we have p explanatory variables
X1, . . . , Xp and n observations.

– a numeric variable: n − 1 possible splits
– an ordered factor with k levels: k − 1 possible splits
– an unordered factor with k levels: 2^(k−1) − 1 possible splits

41
Measures of Impurity
• At each node i of a classification tree, we have a
probability distribution p_{ik} over k classes.

• Deviance:
• Entropy: −Σ_k p_{ik} log(p_{ik})
• Gini index: 1 − Σ_k p_{ik}^2
• Residual sum of squares

42
