Decision Tree Learning in AI

The document discusses decision tree induction as a method of inductive learning, focusing on its application in classification and regression tasks. It explains how decision trees operate by testing attributes to reach a decision, and outlines the process of inducing decision trees from examples, including the selection of attributes based on information gain. Additionally, it touches on the challenges of handling noise in data and the importance of choosing effective attributes to minimize the depth of the decision tree.


Module-V

LEARNING
Chapter 18: Learning from Examples

Department of CSE, GIT ECS302: AI 1


18.3 Learning Decision Trees
Decision tree induction serves as a good introduction to the area of inductive learning, and is easy to implement.

Decision trees as performance elements:

 A decision tree takes as input an object or situation described by a set of attributes and returns a "decision" -- the
predicted output value for that input.
 The input attributes can be discrete or continuous.
For now, we assume discrete inputs.
 The output value can also be discrete or continuous.

 Learning a discrete-valued function is called classification learning.


 Learning a continuous function is called regression.

 We will concentrate on Boolean classification,


wherein each example is classified as true (positive) or false (negative).



Continued…
 A decision tree reaches its decision by performing a sequence of tests.

 Each internal node in the tree corresponds to a test of the value of one of the properties,
and the branches from the node are labeled with the possible values of the test.

 Each leaf node in the tree specifies the value to be returned if that leaf is reached.

Ex: The problem of whether to wait for a table at a restaurant.


The aim here is to learn a definition for the goal predicate WillWait.

The list of attributes:


1. Alternate: whether there is a suitable alternative restaurant nearby.
2. Bar: whether the restaurant has a comfortable bar area to wait in.
3. Fri/Sat: true on Fridays and Saturdays.
4. Hungry: whether we are hungry.
5. Patrons: how many people are in the restaurant (values are None, Some, and Full).



Continued…
6. Price: the restaurant's price range ($, $$, $$$).
7. Raining: whether it is raining outside.
8. Reservation: whether we made a reservation.
9. Type: the kind of restaurant (French, Italian, Thai, or burger).
10. WaitEstimate: the wait estimated by the host (0-10 minutes, 10-30, 30-60, >60).



Continued…
 The tree does not use the Price and Type attributes (since they are irrelevant).
 Examples are processed by the tree starting at the root and following the appropriate branch until a leaf is
reached.
For instance, an example with Patrons = Full and WaitEstimate = 0-10
will be classified as positive (i.e., yes, we will wait for a table).
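Traced as code, such a classification is just a chain of attribute tests. The sketch below (hypothetical Python; only the branches stated in the text are filled in, and the remaining WaitEstimate branches are elided) shows how an example is routed from root to leaf:

```python
def will_wait(x):
    """Route an example from root to leaf through a chain of attribute tests.
    Only branches stated in the text are implemented; the rest are elided."""
    if x["Patrons"] == "None":
        return False          # leaf: No -- the restaurant is empty
    if x["Patrons"] == "Some":
        return True           # leaf: Yes -- we will be seated soon
    # Patrons == "Full": follow that branch and test the next attribute
    if x["WaitEstimate"] == "0-10":
        return True           # leaf: Yes -- a short wait is acceptable
    raise NotImplementedError("further tests (Hungry, Alternate, ...) elided")

print(will_wait({"Patrons": "Full", "WaitEstimate": "0-10"}))  # True
```

The example from the text, Patrons = Full and WaitEstimate = 0-10, follows two tests and reaches a positive leaf.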

Expressiveness of decision trees:


Logically speaking, any particular decision tree hypothesis for the WillWait goal predicate can be seen as an
assertion of the form:

∀s WillWait(s) ⇔ (P1(s) ∨ P2(s) ∨ ….. ∨ Pn(s))

where each condition Pi(s) is a conjunction of tests corresponding to a path from the root of the tree to a leaf with a positive outcome.



Continued…
• Although this looks like a first-order sentence, it is, in a sense, propositional,
because it contains just one variable and all the predicates are unary.

• The decision tree is really describing a relationship between WillWait and some logical combination of attribute
values.

• Decision trees can express any function of the input attributes. For Boolean functions, each row of the truth table
corresponds to a path to a leaf.

• If the function is the parity function, which returns 1 if and only if an even number of inputs are 1, then an
exponentially large decision tree will be needed. It is also difficult to use a decision tree to represent a majority
function, which returns 1 if more than half of its inputs are 1.

• The truth table has 2^n rows, because each input case is described by n attributes.
We can consider the "answer" column of the table as a 2^n-bit number that defines the function.
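Since the answer column is a 2^n-bit number, there are 2^(2^n) distinct Boolean functions of n attributes, and this counting argument can be checked directly:

```python
# Number of truth-table rows and of distinct Boolean functions of n attributes.
for n in (2, 5):
    rows = 2 ** n             # one row per combination of attribute values
    functions = 2 ** rows     # one function per possible "answer" column
    print(n, rows, functions)
# n=2 gives 4 rows and 16 functions; n=5 already gives 2**32 functions.
```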



Continued…
Inducing decision trees from examples :
Ex: An example for a Boolean decision tree consists of a vector of input attributes, X, and
a single Boolean output value y.
A set of examples (X1, y1), ….. , (X12, y12) is shown in Figure 18.3.



Continued…
• The positive examples are the ones in which the goal WillWait is true (X1, X3, …..).
• The negative examples are the ones in which it is false (X2, X5,……).
• The complete set of examples is called the training set.

A trivial solution for the problem of finding a decision tree that agrees with the training set:

• Construct a decision tree that has one path to a leaf for each example,
where the path tests each attribute in turn and follows the value for the example and the leaf has the
classification of the example.

• When given the same example again, the decision tree will come up with the right classification.

• Unfortunately, it will not have much to say about any other cases!



Continued…
Figure 18.4 shows how the algorithm gets started.



Continued…
• We are given 12 training examples, which we classify into positive and negative sets.

• We then decide which attribute to use as the first test in the tree.

• Figure 18.4(a) shows that Type is a poor attribute, because it leaves us with four possible outcomes,
each of which has the same number of positive and negative examples.

• On the other hand, in Figure 18.4(b) we see that Patrons is a fairly important attribute, because if the value is
None or Some, then we are left with example sets for which we can answer definitively (No and Yes,
respectively).

• If the value is Full, we are left with a mixed set of examples.

• In general, after the first attribute test splits up the examples, each outcome is a new decision tree learning
problem in itself, with fewer examples and one fewer attribute.



Continued…
There are four cases to consider for these recursive problems:

1. If there are some positive and some negative examples, then choose the best attribute to split them.
(Figure 18.4(b) shows Hungry being used to split the remaining examples.)

2. If all the remaining examples are positive (or all negative), then we are done: we can answer Yes or No.
(Figure 18.4(b) shows examples of this in the None and Some cases.)

3. If there are no examples left, it means that no such example has been observed, and we return a default value
calculated from the majority classification at the node's parent.

4. If there are no attributes left, but both positive and negative examples, we have a problem.
 It means that these examples have exactly the same description, but different classifications.
 This happens when some of the data are incorrect; we say there is noise in the data.
 It also happens either when the attributes do not give enough information to describe the situation fully, or when
the domain is truly nondeterministic. One simple way out of the problem is to use a majority vote.
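The four cases map directly onto a recursive procedure. Below is a minimal Python sketch; the choose_attribute parameter stands in for the information-gain selection described later, and the dict-of-dicts tree representation is an assumption for illustration:

```python
from collections import Counter

def majority_value(examples):
    """Most common classification -- the fallback for cases 3 and 4."""
    return Counter(label for _, label in examples).most_common(1)[0][0]

def decision_tree_learning(examples, attributes, default, choose_attribute, values):
    """examples: list of (attribute->value dict, label) pairs;
    values: dict mapping each attribute to its possible values."""
    if not examples:                          # case 3: no examples left
        return default
    labels = {label for _, label in examples}
    if len(labels) == 1:                      # case 2: all positive or all negative
        return labels.pop()
    if not attributes:                        # case 4: noise -- take a majority vote
        return majority_value(examples)
    best = choose_attribute(attributes, examples)   # case 1: split on the best attribute
    subtree_of = {}
    remaining = [a for a in attributes if a != best]
    for v in values[best]:
        subset = [(x, y) for x, y in examples if x[best] == v]
        subtree_of[v] = decision_tree_learning(
            subset, remaining, majority_value(examples), choose_attribute, values)
    return {best: subtree_of}
```

For instance, with a single attribute A taking values 0 and 1 and the examples [({'A': 0}, 'No'), ({'A': 1}, 'Yes')], the call returns {'A': {0: 'No', 1: 'Yes'}}.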



Continued…
Decision Tree Learning Algorithm (Figure 18.5):



Continued…
The Decision Tree induced (Figure 18.6):



Continued…
• The learning algorithm looks at the examples, not at the correct function, and in fact, its hypothesis (see Figure
18.6) not only agrees with all the examples, but is considerably simpler than the original tree.

• The learning algorithm has no reason to include tests for Raining and Reservation, because it can classify all the
examples without them.

4. Choosing attribute tests:


• The scheme used in decision tree learning for selecting attributes is designed to minimize the depth of the final
tree.
• The idea is to pick the attribute that goes as far as possible toward providing an exact classification of the
examples.

• A perfect attribute divides the examples into sets that are all positive or all negative.
-- The Patrons attribute is not perfect, but it is fairly good.
-- A really useless attribute, such as Type, leaves the example sets with roughly the same proportion of
positive and negative examples as the original set.



Continued…
• We need a formal measure of "fairly good" and "really useless" so that we can implement the CHOOSE-ATTRIBUTE
function of Figure 18.5.

• The measure should have its maximum value when the attribute is perfect and its minimum value when the
attribute is of no use at all.

• One suitable measure is the expected amount of information provided by the attribute.
Ex: Whether a coin will come up heads.
-- The amount of information contained in the answer depends on one's prior knowledge.
-- The less you know, the more information is provided.

• Information theory measures information content in bits.

• One bit of information is enough to answer a yes/no question about which one has no idea, such as the flip of a
fair coin.



Continued…
• In general, if the possible answers vi have probabilities P(vi),

then the information content I of the actual answer is given by

I( P(v1), ….. , P(vn) ) = Σi −P(vi) log2 P(vi)

To check this equation, for the tossing of a fair coin, we get

I(1/2, 1/2) = −(1/2) log2 (1/2) − (1/2) log2 (1/2) = 1 bit

• If the coin is loaded to give 99% heads, we get I(1/100, 99/100) = 0.08 bits, and
as the probability of heads goes to 1, the information of the actual answer goes to 0.
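These numbers are easy to verify directly (a small Python sketch; the function name is ours):

```python
import math

def information_content(probs):
    """I(P(v1), ..., P(vn)) = sum over i of -P(vi) * log2 P(vi), in bits.
    Zero-probability terms contribute nothing (p * log p -> 0 as p -> 0)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(information_content([0.5, 0.5]))              # fair coin: 1.0 bit
print(round(information_content([0.01, 0.99]), 2))  # loaded coin: 0.08 bits
```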



Continued…
• A correct decision tree can answer the question “what is the correct classification?”

• An estimate of the probabilities of the possible answers before any of the attributes have been tested is given by
the proportions of positive and negative examples in the training set.

• Suppose the training set contains p positive examples and n negative examples.
• Then an estimate of the information contained in a correct answer is:

I( p/(p+n), n/(p+n) )

• The restaurant training set in Figure 18.3 has p = n = 6, so we need 1 bit of information.

• Now a test on a single attribute A will not usually tell us this much information, but it will give us some of it.

• We can measure exactly how much by looking at how much information we still need after the attribute test.



Continued…
• Any attribute A divides the training set E into subsets E1, ….. , Ev according to their values for A,
where A can have v distinct values.
• Each subset Ei has pi positive examples and ni negative examples,
so if we go along that branch, we will need an additional

I( pi/(pi+ni), ni/(pi+ni) )

bits of information to answer the question.

• A randomly chosen example from the training set has the ith value for the attribute
with probability (pi + ni)/(p + n).
 So on average, after testing attribute A, we will need

Remainder(A) = Σi=1..v (pi + ni)/(p + n) × I( pi/(pi+ni), ni/(pi+ni) )

bits of information to classify the example.


Continued…
• The information gain from the attribute test is
the difference between the original information requirement and the new requirement:

Gain(A) = I( p/(p+n), n/(p+n) ) − Remainder(A)

• The heuristic used in the CHOOSE-ATTRIBUTE function is just to choose the attribute with the largest gain.
• Returning to the attributes considered in Figure 18.4, we have

Gain(Patrons) = 1 − [ (2/12) I(0, 1) + (4/12) I(1, 0) + (6/12) I(2/6, 4/6) ] ≈ 0.541 bits
Gain(Type) = 1 − [ (2/12) I(1/2, 1/2) + (2/12) I(1/2, 1/2) + (4/12) I(2/4, 2/4) + (4/12) I(2/4, 2/4) ] = 0 bits

confirming our intuition that Patrons is a better attribute to split on.


 In fact, Patrons has the highest gain of any of the attributes and would be chosen by the decision-tree learning
algorithm as the root.
***
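These two gains can be reproduced from the example counts in Figure 18.4 (a Python sketch; the function and variable names are ours):

```python
import math

def I(p, n):
    """Information content of a set with p positive and n negative examples."""
    return -sum(x / (p + n) * math.log2(x / (p + n)) for x in (p, n) if x > 0)

def gain(splits, p, n):
    """Gain(A) = I(p, n) - Remainder(A), where splits lists (pi, ni) per value."""
    remainder = sum((pi + ni) / (p + n) * I(pi, ni) for pi, ni in splits)
    return I(p, n) - remainder

# (positive, negative) counts per branch for the 12 restaurant examples.
patrons = [(0, 2), (4, 0), (2, 4)]          # None, Some, Full
type_ = [(1, 1), (1, 1), (2, 2), (2, 2)]    # French, Italian, Thai, burger
print(round(gain(patrons, 6, 6), 3))  # 0.541
print(round(gain(type_, 6, 6), 3))    # 0.0
```

Patrons has the largest gain of any attribute, which is why it ends up at the root of the learned tree.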



13.7 The Wumpus world revisited

 Uncertainty arises in the wumpus world because the agent's sensors give only partial, local information about the
world.



Continued...

 Figure 13.6 shows a situation in which each of the three reachable squares, [1,3], [2,2], and [3,1], might contain a
pit.
 Pure logical inference can conclude nothing about which square is most likely to be safe,
so a logical agent might be forced to choose randomly.
 A probabilistic agent can do much better than the logical agent.



Continued...
Aim: To calculate the probability that each of the three squares contains a pit.
(For the purposes of this example, we will ignore the wumpus and the gold.)

The relevant properties of the wumpus world are that


(1) a pit causes breezes in all neighboring squares, and
(2) each square other than [1,1] contains a pit with probability 0.2.

The first step is to identify the set of random variables we need:

• Pi,j is true if and only if square [i, j] actually contains a pit.

• Bi,j is true if and only if square [i, j] is breezy.

• The Bi,j variables are included only for the observed squares--[1,1], [1,2], and [2,1].



Continued...
Now specify the full joint distribution: P(P1,1, ….. , P4,4, B1,1, B1,2, B2,1)

Applying the product rule, we have: [ Product rule: P (a Ʌ b) = P (a | b) P (b) ]

P(P1,1, ….. , P4,4, B1,1, B1,2, B2,1) = P(B1,1, B1,2, B2,1 | P1,1, ….. , P4,4) P(P1,1, ….. , P4,4)

• The first term (on the RHS) : Conditional probability of a breeze configuration, given a pit configuration
-- this is 1 if the breezes are adjacent to the pits and 0 otherwise.
• The second term : Prior probability of a pit configuration.
-- Each square contains a pit with probability 0.2 independently of the other squares.
Hence,

P(P1,1, ….. , P4,4) = Π(i,j) P(Pi,j)

For a configuration with n pits, this is just 0.2^n × 0.8^(16−n).



Continued...
In the situation in Figure 13.6(a), the evidence consists of --
the observed breeze (or its absence) in each square that is visited, combined with the fact that
each such square contains no pit.

We'll abbreviate these facts as :


b = ¬b1,1 Ʌ b1,2 Ʌ b2,1 and known = ¬p1,1 Ʌ ¬p1,2 Ʌ ¬p2,1

We are interested in answering queries such as : P(P1,3 | known, b).


(how likely is it that [1, 3] contains a pit, given the observation so far?)

To answer this query, we can follow the standard approach suggested by the equation

P(X | e) = α P(X, e) = α Σy P(X, e, y) --- (4)

namely summing over entries from the full joint distribution.



Continued...
• Let Unknown be a composite variable consisting of the Pi,j variables for squares other than the Known squares and
the query square [1,3].
• Then by equation (4) we have

P(P1,3 | known, b) = α Σunknown P(P1,3, unknown, known, b)
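This sum is small enough to evaluate by brute force. The sketch below (hypothetical Python; the 4x4 grid and square coordinates follow the figure) enumerates every pit configuration over the unknown squares, keeps those consistent with known and b, and normalizes:

```python
from itertools import product

def neighbors(i, j):
    """Orthogonal neighbors of square (i, j) inside the 4x4 grid (1-indexed)."""
    return [(a, b) for a, b in [(i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)]
            if 1 <= a <= 4 and 1 <= b <= 4]

# Visited squares, known to be pit-free; every other square has pit prior 0.2.
known_clear = {(1, 1), (1, 2), (2, 1)}
unknown = [(i, j) for i in range(1, 5) for j in range(1, 5)
           if (i, j) not in known_clear]

def posterior_pit(query):
    """P(query has a pit | known squares clear, no breeze at [1,1],
    breeze at [1,2] and [2,1]), by summing the full joint distribution."""
    num = den = 0.0
    for bits in product([True, False], repeat=len(unknown)):
        pits = dict(zip(unknown, bits))
        for sq in known_clear:
            pits[sq] = False
        # A square is breezy iff some neighboring square contains a pit.
        breezy = lambda sq: any(pits[nb] for nb in neighbors(*sq))
        if breezy((1, 1)) or not breezy((1, 2)) or not breezy((2, 1)):
            continue  # inconsistent with the observed breezes b
        n_pits = sum(pits[sq] for sq in unknown)
        weight = (0.2 ** n_pits) * (0.8 ** (len(unknown) - n_pits))
        den += weight
        if pits[query]:
            num += weight
    return num / den  # normalization plays the role of alpha

print(round(posterior_pit((1, 3)), 2))  # 0.31
print(round(posterior_pit((2, 2)), 2))  # 0.86
```

The squares beyond the frontier cancel out in the normalization, which is why only [1,3], [2,2], and [3,1] actually influence the answer.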


Continued...
Summing over the unknown squares and normalizing gives P(P1,3 | known, b) = α ⟨0.31, 0.69⟩.
That is, [1,3] (and, by symmetry, [3,1]) contains a pit with probability about 0.31,
while the corresponding calculation for [2,2] gives a probability of about 0.86.
