
Decision Tree Induction in DWDM

The document outlines the concepts and methodologies related to data classification, particularly focusing on decision tree induction. It discusses the significance of classification in various fields, the general approach to building classifiers, and the algorithms used for decision tree induction, including attribute selection measures and tree pruning techniques. Additionally, it addresses scalability issues when dealing with large datasets that do not fit into memory.

Uploaded by

phani kumar


UNIT –III: Syllabus

Classification: Basic Concepts, General Approach to solving a classification problem, Decision Tree Induction:
Attribute Selection Measures, Tree Pruning, Scalability and Decision Tree Induction, Visual Mining for
Decision Tree Induction.
DATA CLASSIFICATION
Classification is a form of data analysis that extracts models describing important data
classes. Such models, called classifiers, predict categorical (discrete, unordered) class labels.
For example, we can build a classification model to categorize bank loan applications as either
safe or risky. Such analysis can help provide us with a better understanding of the data at large.
Many classification methods have been proposed by researchers in machine learning, pattern
recognition, and statistics.
Why Classification?
A bank loans officer needs analysis of her data to learn which loan applicants are “safe”
and which are “risky” for the bank. A marketing manager at AllElectronics needs data analysis
to help guess whether a customer with a given profile will buy a new computer.
A medical researcher wants to analyze breast cancer data to predict which one of three
specific treatments a patient should receive. In each of these examples, the data analysis task
is classification, where a model or classifier is constructed to predict class (categorical) labels,
such as “safe” or “risky” for the loan application data; “yes” or “no” for the marketing data; or
“treatment A,” “treatment B,” or “treatment C” for the medical data.
Suppose that the marketing manager wants to predict how much a given customer will
spend during a sale at AllElectronics. This data analysis task is an example of numeric
prediction, where the model constructed predicts a continuous-valued function, or ordered
value, as opposed to a class label. This model is a predictor.
Regression analysis is a statistical methodology that is most often used for numeric
prediction; hence the two terms tend to be used synonymously, although other methods for
numeric prediction exist. Classification and numeric prediction are the two major types of
prediction problems.
General Approach for Classification:
Data classification is a two-step process, consisting of a learning step (where a
classification model is constructed) and a classification step (where the model is used to predict
class labels for given data).
• In the first step, a classifier is built describing a predetermined set of data classes or
concepts. This is the learning step (or training phase), where a classification algorithm
builds the classifier by analyzing or “learning from” a training set made up of database
tuples and their associated class labels.
• Each tuple/sample is assumed to belong to a predefined class, as determined by the class
label attribute.
• In the second step, the model is used for classification. First, the predictive accuracy of the
classifier is estimated. If we were to use the training set to measure the classifier’s accuracy,
this estimate would likely be optimistic, because the classifier tends to overfit the
data. Therefore, a test set of class-labeled tuples that is independent of the training set is used
instead. The accuracy rate is the percentage of test set samples that are correctly classified by the
model.
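The two steps can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy loan data, the helper names holdout_split and accuracy, the 30% test fraction, and the deliberately trivial "majority class" model that stands in for a real classifier.

```python
import random
from collections import Counter

def holdout_split(labeled, test_fraction=0.3, seed=0):
    """Shuffle (tuple, label) pairs and split them into a training set and a test set."""
    rng = random.Random(seed)
    shuffled = list(labeled)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def accuracy(classifier, labeled):
    """Fraction of (tuple, label) pairs for which the classifier predicts the label."""
    return sum(1 for x, y in labeled if classifier(x) == y) / len(labeled)

# Hypothetical loan data: one numeric attribute (income) and a class label.
data = [({"income": i}, "safe" if i >= 50 else "risky") for i in range(0, 100, 5)]
train, test = holdout_split(data)

# Learning step: a trivial model that always predicts the majority class
# seen in the training set (a placeholder for a real learning algorithm).
majority = Counter(y for _, y in train).most_common(1)[0][0]
classifier = lambda x: majority

# Classification step: estimate accuracy on tuples not used for learning.
print("training accuracy:", accuracy(classifier, train))
print("test accuracy:", accuracy(classifier, test))
```

The point of the split is only that the test tuples play no part in building the model, so the test accuracy is not inflated by overfitting.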
Data Warehousing and Data Mining

Fig: Learning Step

Fig: Classification Step

Decision Tree Induction:

Decision tree induction is the learning of decision trees from class-labeled training
tuples. A decision tree is a flowchart-like tree structure, where each internal node (non leaf
node) denotes a test on an attribute, each branch represents an outcome of the test, and each
leaf node (or terminal node) holds a class label. The topmost node in a tree is the root node.
Internal nodes are denoted by rectangles, and leaf nodes are denoted by ovals.
“How are decision trees used for classification?” Given a tuple, X, for which the
associated class label is unknown, the attribute values of the tuple are tested against the decision
tree. A path is traced from the root to a leaf node, which holds the class prediction for that tuple.
Decision trees can easily be converted to classification rules.
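As a rough sketch of both ideas, the code below traces a tuple from the root to a leaf and then lists one IF-THEN rule per root-to-leaf path. The nested-dictionary representation, the function names classify and to_rules, and the small example tree are assumptions chosen for illustration, not a prescribed format.

```python
# Internal node: {"attribute": name, "branches": {outcome: subtree}}.
# Leaf node: a plain string holding the class label.
tree = {
    "attribute": "age",
    "branches": {
        "youth": {"attribute": "student", "branches": {"no": "no", "yes": "yes"}},
        "middle_aged": "yes",
        "senior": {"attribute": "credit_rating",
                   "branches": {"fair": "yes", "excellent": "no"}},
    },
}

def classify(node, x):
    """Follow the branch matching x's attribute value until a leaf is reached."""
    while isinstance(node, dict):
        node = node["branches"][x[node["attribute"]]]
    return node

def to_rules(node, conditions=()):
    """Return one IF-THEN rule per root-to-leaf path."""
    if not isinstance(node, dict):
        lhs = " AND ".join(f"{a}={v}" for a, v in conditions) or "TRUE"
        return [f"IF {lhs} THEN class={node}"]
    rules = []
    for outcome, child in node["branches"].items():
        rules += to_rules(child, conditions + ((node["attribute"], outcome),))
    return rules

x = {"age": "youth", "student": "yes", "credit_rating": "fair"}
print(classify(tree, x))
for rule in to_rules(tree):
    print(rule)
```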
“Why are decision tree classifiers so popular?” The construction of decision tree
classifiers does not require any domain knowledge or parameter setting, and therefore is
appropriate for exploratory knowledge discovery. Decision trees can handle multidimensional
data. Their representation of acquired knowledge in tree form is intuitive and generally easy to
assimilate by humans. The learning and classification steps of decision tree induction are
simple and fast.
Decision tree induction algorithms have been used for classification in many
application areas such as medicine, manufacturing and production, financial analysis,
astronomy, and molecular biology. Decision trees are the basis of several commercial rule
induction systems.

During tree construction, attribute selection measures are used to select the attribute
that best partitions the tuples into distinct classes. When decision trees are built, many of the
branches may reflect noise or outliers in the training data. Tree pruning attempts to identify
and remove such branches, with the goal of improving classification accuracy on unseen data.

• During the late 1970s and early 1980s, J. Ross Quinlan, a researcher in machine learning,
developed a decision tree algorithm known as ID3 (Iterative Dichotomiser).
• This work expanded on earlier work on concept learning systems, described by E. B. Hunt, J.
Marin, and P. T. Stone. Quinlan later presented C4.5 (a successor of ID3), which became a
benchmark to which newer supervised learning algorithms are often compared.
• In 1984, a group of statisticians (L. Breiman, J. Friedman, R. Olshen, and C. Stone) published the
book Classification and Regression Trees (CART), which described the generation of binary
decision trees.

Decision Tree Algorithm:

Algorithm: Generate_decision_tree. Generate a decision tree from the training tuples of data
partition, D.
Input:
• Data partition, D, which is a set of training tuples and their associated class labels;
• attribute_list, the set of candidate attributes;
• Attribute_selection_method, a procedure to determine the splitting criterion that “best”
partitions the data tuples into individual classes. This criterion consists of a splitting
attribute and, possibly, either a split-point or splitting subset.
Output: A decision tree.
Method:
1) create a node N;
2) if tuples in D are all of the same class, C, then
3)     return N as a leaf node labeled with the class C;
4) if attribute_list is empty then
5)     return N as a leaf node labeled with the majority class in D; // majority voting
6) apply Attribute_selection_method(D, attribute_list) to find the “best” splitting criterion;
7) label node N with splitting criterion;
8) if splitting attribute is discrete-valued and multiway splits allowed then // not restricted to binary trees
9)     attribute_list ← attribute_list − splitting attribute; // remove splitting attribute
10) for each outcome j of splitting criterion // partition the tuples and grow subtrees for each partition
11)     let Dj be the set of data tuples in D satisfying outcome j; // a partition
12)     if Dj is empty then
13)         attach a leaf labeled with the majority class in D to node N;
14)     else attach the node returned by Generate_decision_tree(Dj, attribute_list) to node N;
    endfor
15) return N;
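The method above can be sketched in Python under some simplifying assumptions: all attributes are discrete-valued, multiway splits are allowed, and information gain plays the role of the attribute selection method. The function names, the nested-dictionary tree format, and the domains argument (the known values of each attribute) are illustrative choices rather than part of the algorithm as stated.

```python
from collections import Counter
from math import log2

def entropy(rows, target):
    """Expected information needed to classify a tuple in rows."""
    counts = Counter(r[target] for r in rows)
    n = len(rows)
    return -sum(c / n * log2(c / n) for c in counts.values())

def info_gain(rows, attr, target):
    """Entropy of rows minus the weighted entropy after splitting on attr."""
    n = len(rows)
    remainder = 0.0
    for value in {r[attr] for r in rows}:
        part = [r for r in rows if r[attr] == value]
        remainder += len(part) / n * entropy(part, target)
    return entropy(rows, target) - remainder

def majority(rows, target):
    return Counter(r[target] for r in rows).most_common(1)[0][0]

def generate_tree(rows, attributes, target, domains):
    classes = {r[target] for r in rows}
    if len(classes) == 1:                       # steps 2-3: all one class
        return classes.pop()
    if not attributes:                          # steps 4-5: majority voting
        return majority(rows, target)
    best = max(attributes, key=lambda a: info_gain(rows, a, target))  # step 6
    node = {"attribute": best, "branches": {}}  # step 7
    remaining = [a for a in attributes if a != best]                  # step 9
    for value in domains[best]:                 # step 10: one branch per outcome
        part = [r for r in rows if r[best] == value]                  # step 11
        if not part:                            # steps 12-13: empty partition
            node["branches"][value] = majority(rows, target)
        else:                                   # step 14: recurse
            node["branches"][value] = generate_tree(part, remaining, target, domains)
    return node                                 # step 15

# Hypothetical toy data in which "student" alone determines the class.
rows = [
    {"student": "yes", "credit": "fair", "buys": "yes"},
    {"student": "yes", "credit": "excellent", "buys": "yes"},
    {"student": "no", "credit": "fair", "buys": "no"},
    {"student": "no", "credit": "excellent", "buys": "no"},
]
domains = {"student": ["yes", "no"], "credit": ["fair", "excellent"]}
print(generate_tree(rows, ["student", "credit"], "buys", domains))
```

Leaves are plain class labels and internal nodes are dictionaries, which keeps the sketch short; a split-point or splitting subset (needed for continuous attributes or binary trees) is not handled here.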


Methods for selecting best test conditions

Decision tree induction algorithms must provide a method for expressing an attribute
test condition and its corresponding outcomes for different attribute types.

Binary Attributes: The test condition for a binary attribute generates two potential
outcomes.

Nominal Attributes: These can have many values, so the test condition can be expressed in two ways:
as a multiway split with one branch per distinct value, or as a binary split that groups the values
into two subsets.

Ordinal Attributes: These can also produce binary or multiway splits. The values can be grouped as
long as the grouping does not violate the order property of the attribute values.
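To make the difference between nominal and ordinal groupings concrete, the sketch below enumerates the candidate binary splits for each. The function names and the example values (marital status, income level) are assumptions for illustration only.

```python
from itertools import combinations

def nominal_binary_splits(values):
    """All ways to divide nominal values into two non-empty groups."""
    values = list(values)
    first, rest = values[0], values[1:]
    splits = []
    # Keep the first value on the left side so each split is listed only once.
    for k in range(len(rest)):          # k < len(rest) keeps the right side non-empty
        for combo in combinations(rest, k):
            left = {first, *combo}
            splits.append((left, set(values) - left))
    return splits

def ordinal_binary_splits(ordered_values):
    """Binary groupings that respect the order: cut the sorted list at each position."""
    return [(ordered_values[:i], ordered_values[i:])
            for i in range(1, len(ordered_values))]

for left, right in nominal_binary_splits(["single", "married", "divorced"]):
    print(sorted(left), "|", sorted(right))
for left, right in ordinal_binary_splits(["low", "medium", "high"]):
    print(left, "|", right)
```

A nominal attribute with k values has 2^(k-1) - 1 binary groupings, while an ordinal attribute has only k - 1, because a grouping such as {low, high} versus {medium} would violate the order.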


Attribute Selection Measures

• An attribute selection measure is a heuristic for selecting the splitting criterion that
“best” separates a given data partition, D, of class-labeled training tuples into individual
classes.
• If we were to split D into smaller partitions according to the outcomes of the splitting
criterion, ideally each partition would be pure (i.e., all the tuples that fall into a given
partition would belong to the same class).
• Conceptually, the “best” splitting criterion is the one that most closely results in such a
scenario. Attribute selection measures are also known as splitting rules because they
determine how the tuples at a given node are to be split.
• The attribute selection measure provides a ranking for each attribute describing the given
training tuples. The attribute having the best score for the measure is chosen as the
splitting attribute for the given tuples.
• If the splitting attribute is continuous-valued or if we are restricted to binary trees, then,
respectively, either a split point or a splitting subset must also be determined as part of the
splitting criterion.
• The tree node created for partition D is labeled with the splitting criterion, branches are
grown for each outcome of the criterion, and the tuples are partitioned accordingly.
• There are three popular attribute selection measures: information gain, gain ratio, and
Gini index.

Information Gain
ID3 uses information gain as its attribute selection measure. Let node N represent or
hold the tuples of partition D. The attribute with the highest information gain is chosen as the
splitting attribute for node N. This attribute minimizes the information needed to classify the
tuples in the resulting partitions and reflects the least randomness or “impurity” in these
partitions. Such an approach minimizes the expected number of tests needed to classify a given
tuple and guarantees that a simple (but not necessarily the simplest) tree is found.

The expected information needed to classify a tuple in D is given by

$$\mathrm{Info}(D) = -\sum_{i=1}^{m} p_i \log_2(p_i)$$

where p_i is the nonzero probability that an arbitrary tuple in D belongs to class C_i and is estimated
by |C_{i,D}|/|D|, and m is the number of classes. A log function to the base 2 is used because the
information is encoded in bits. Info(D) is also known as the entropy of D.
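A small sketch of the entropy computation follows. It takes the class counts |C_{i,D}| directly; the function name info and the choice to skip zero counts (treating 0 log 0 as 0) are assumptions of this sketch.

```python
from math import log2

def info(class_counts):
    """Entropy, in bits, of a partition described by its per-class tuple counts."""
    n = sum(class_counts)
    # Classes with zero tuples contribute nothing (0 * log2(0) is taken as 0).
    return -sum(c / n * log2(c / n) for c in class_counts if c > 0)

print(round(info([9, 5]), 3))   # 9 tuples of one class, 5 of the other
print(round(info([7, 7]), 3))   # evenly mixed partition
```

A pure partition has entropy 0, and an evenly mixed two-class partition has the maximum of 1 bit.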

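The information still needed to classify a tuple after using attribute A to split D into v partitions
D_1, ..., D_v is

$$\mathrm{Info}_A(D) = \sum_{j=1}^{v} \frac{|D_j|}{|D|} \times \mathrm{Info}(D_j)$$

Information gain is defined as the difference between the original information requirement (i.e.,
based on just the proportion of classes) and the new requirement (i.e., obtained after
partitioning on A). That is,

$$\mathrm{Gain}(A) = \mathrm{Info}(D) - \mathrm{Info}_A(D)$$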

The attribute A with the highest information gain, Gain(A), is chosen as the
splitting attribute at node N. This is equivalent to saying that we want to partition on the
attribute A that would do the “best classification,” so that the amount of information still
required to finish classifying the tuples is minimal.
Gain Ratio
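Information gain is biased toward tests with many outcomes; that is, it prefers attributes having a
large number of values. C4.5, a successor of ID3, uses an extension to information gain known as gain ratio,
which attempts to overcome this bias. It applies a kind of normalization to information gain
using a “split information” value defined analogously with Info(D) as

$$\mathrm{SplitInfo}_A(D) = -\sum_{j=1}^{v} \frac{|D_j|}{|D|} \log_2\left(\frac{|D_j|}{|D|}\right)$$

This value represents the potential information generated by splitting the training data set, D, into
v partitions, corresponding to the v outcomes of a test on attribute A. Note that, for each
outcome, it considers the number of tuples having that outcome with respect to the total number
of tuples in D. It differs from information gain, which measures the information with respect
to classification that is acquired based on the same partitioning. The gain ratio is defined as

$$\mathrm{GainRatio}(A) = \frac{\mathrm{Gain}(A)}{\mathrm{SplitInfo}_A(D)}$$

The attribute with the maximum gain ratio is selected as the splitting attribute.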
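As a sketch, the gain ratio can be computed as the information gain divided by the split information, where the split information is the entropy formula applied to the partition sizes instead of the class counts. The function names and the input format (one list of class counts per outcome of the test on A) are assumptions; the counts in the usage lines are those of the income attribute in the buys_computer example used in these notes.

```python
from math import log2

def info(counts):
    """Entropy, in bits, of a list of counts (zero counts are skipped)."""
    n = sum(counts)
    return -sum(c / n * log2(c / n) for c in counts if c > 0)

def gain_ratio(partitions):
    """partitions: one [count per class] list for each outcome of the test on A."""
    sizes = [sum(p) for p in partitions]
    total = sum(sizes)
    class_totals = [sum(col) for col in zip(*partitions)]
    info_a = sum(s / total * info(p) for s, p in zip(sizes, partitions))
    gain = info(class_totals) - info_a
    split_info = info(sizes)  # entropy of the partition sizes themselves
    # Note: split_info is 0 when every tuple falls into one partition;
    # this sketch does not guard against that division by zero.
    return gain / split_info

# income: high -> [2 yes, 2 no], medium -> [4 yes, 2 no], low -> [3 yes, 1 no]
print(round(gain_ratio([[2, 2], [4, 2], [3, 1]]), 3))
```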

Gini Index
The Gini index is used in CART. Using the notation previously described, the Gini
index measures the impurity of D, a data partition or set of training tuples, as

$$\mathrm{Gini}(D) = 1 - \sum_{i=1}^{m} p_i^2$$

where p_i is the nonzero probability that an arbitrary tuple in D belongs to class C_i and is
estimated by |C_{i,D}|/|D|. The sum is computed over m classes.
Note: The Gini index considers a binary split for each attribute.
When considering a binary split, we compute a weighted sum of the impurity of
each resulting partition. For example, if a binary split on A partitions D into D1 and D2, the Gini
index of D given that partitioning is

$$\mathrm{Gini}_A(D) = \frac{|D_1|}{|D|}\mathrm{Gini}(D_1) + \frac{|D_2|}{|D|}\mathrm{Gini}(D_2)$$

• For each attribute, each of the possible binary splits is considered. For a discrete-valued
attribute, the subset that gives the minimum Gini index for that attribute is selected as its
splitting subset.
• For continuous-valued attributes, each possible split-point must be considered. The strategy
is similar to that described earlier for information gain, where the midpoint between each
pair of (sorted) adjacent values is taken as a possible split-point.
• The reduction in impurity that would be incurred by a binary split on a discrete- or
continuous-valued attribute A is

$$\Delta\mathrm{Gini}(A) = \mathrm{Gini}(D) - \mathrm{Gini}_A(D)$$

The attribute that maximizes the reduction in impurity (or, equivalently, has the minimum Gini
index) is selected as the splitting attribute.
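The Gini computations can be sketched as follows, taking the Gini index of a partition as one minus the sum of squared class proportions, and the Gini index of a binary split as the size-weighted sum over its two sides; the reduction in impurity is then the difference between the two. The function names and the input format are assumptions; the counts in the usage lines are those of the income attribute in the buys_computer example used in these notes.

```python
from itertools import combinations

def gini(counts):
    """Gini impurity of a partition given its per-class tuple counts."""
    n = sum(counts)
    return 1 - sum((c / n) ** 2 for c in counts)

def gini_split(left, right):
    """Weighted Gini index of a binary split into class-count lists left and right."""
    n = sum(left) + sum(right)
    return sum(left) / n * gini(left) + sum(right) / n * gini(right)

def best_binary_subset(value_counts):
    """value_counts: {attribute value: [count per class]}.
    Returns (splitting subset, Gini index) for the binary split with minimum Gini."""
    values = sorted(value_counts)
    n_classes = len(value_counts[values[0]])
    first, rest = values[0], values[1:]
    best = None
    for k in range(len(rest)):          # k < len(rest) keeps the other side non-empty
        for combo in combinations(rest, k):
            subset = (first, *combo)
            left = [sum(value_counts[v][i] for v in subset) for i in range(n_classes)]
            right = [sum(value_counts[v][i] for v in values if v not in subset)
                     for i in range(n_classes)]
            score = gini_split(left, right)
            if best is None or score < best[1]:
                best = (set(subset), score)
    return best

income = {"high": [2, 2], "medium": [4, 2], "low": [3, 1]}  # [yes, no] counts
subset, score = best_binary_subset(income)
print(subset, round(score, 3))
print("reduction in impurity:", round(gini([9, 5]) - score, 3))
```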


Tree Pruning:
• When a decision tree is built, many of the branches will reflect anomalies in the training
data due to noise or outliers.
• Tree pruning methods address this problem of overfitting the data. Such methods
typically use statistical measures to remove the least-reliable branches.
• Pruned trees tend to be smaller and less complex and, thus, easier to comprehend.
• They are usually faster and better at correctly classifying independent test data (i.e., of
previously unseen tuples) than unpruned trees.
“How does tree pruning work?” There are two common approaches to tree pruning:
prepruning and postpruning.
• In the prepruning approach, a tree is “pruned” by halting its construction early. Upon
halting, the node becomes a leaf. The leaf may hold the most frequent class among the
subset tuples or the probability distribution of those tuples.
• If partitioning the tuples at a node would result in a split whose goodness measure (e.g.,
statistical significance, information gain, or Gini index) falls below a prespecified
threshold, then further partitioning of the given subset is halted. There are difficulties,
however, in choosing an appropriate threshold: high thresholds could result in oversimplified
trees, whereas low thresholds could result in very little simplification.
• The postpruning approach removes subtrees from a “fully grown” tree. A subtree at a given
node is pruned by removing its branches and replacing it with a leaf. The leaf is labeled
with the most frequent class among the subtree being replaced.

Fig: Unpruned and Pruned Trees

• The cost complexity pruning algorithm used in CART is an example of the postpruning
approach.
• This approach considers the cost complexity of a tree to be a function of the number of
leaves in the tree and the error rate of the tree (where the error rate is the percentage
of tuples misclassified by the tree). It starts from the bottom of the tree.
• For each internal node, N, it computes the cost complexity of the subtree at N, and the
cost complexity of the subtree at N if it were to be pruned (i.e., replaced by a leaf node).
• The two values are compared. If pruning the subtree at node N would result in a smaller
cost complexity, then the subtree is pruned. Otherwise, it is kept.
• A pruning set of class-labeled tuples is used to estimate cost complexity.

• This set is independent of the training set used to build the unpruned tree and of any test
set used for accuracy estimation.
• The algorithm generates a set of progressively pruned trees. In general, the smallest
decision tree that minimizes the cost complexity is preferred.
• C4.5 uses a method called pessimistic pruning, which is similar to the cost
complexity method in that it also uses error rate estimates to make decisions regarding
subtree pruning.
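The bottom-up comparison used in postpruning can be sketched as below. This is a simplified illustration, not CART's actual procedure: the cost is assumed to be the error rate on the pruning set plus a hypothetical penalty alpha per leaf, and each internal node is assumed to store the majority class of its training tuples under a "majority" key. The tree, the pruning tuples, and all names are made up for the example.

```python
def classify(node, x):
    """Follow branches until a leaf (a plain class label) is reached."""
    while isinstance(node, dict):
        node = node["branches"][x[node["attribute"]]]
    return node

def count_leaves(node):
    if not isinstance(node, dict):
        return 1
    return sum(count_leaves(child) for child in node["branches"].values())

def cost(node, rows, target, alpha):
    """Assumed cost complexity: error rate on rows plus alpha times the leaf count."""
    errors = sum(1 for r in rows if classify(node, r) != r[target])
    rate = errors / len(rows) if rows else 0.0
    return rate + alpha * count_leaves(node)

def prune(node, rows, target, alpha):
    """Prune bottom-up using the pruning tuples (rows) that reach each node."""
    if not isinstance(node, dict):
        return node
    for value, child in list(node["branches"].items()):
        reaching = [r for r in rows if r[node["attribute"]] == value]
        node["branches"][value] = prune(child, reaching, target, alpha)
    leaf = node["majority"]  # label the replacement leaf would carry
    if cost(leaf, rows, target, alpha) < cost(node, rows, target, alpha):
        return leaf          # pruning gives the smaller cost complexity
    return node

tree = {"attribute": "student", "majority": "yes", "branches": {
    "yes": "yes",
    "no": {"attribute": "credit", "majority": "no",
           "branches": {"fair": "no", "excellent": "yes"}},  # noisy branch
}}
pruning_set = [
    {"student": "yes", "credit": "fair", "buys": "yes"},
    {"student": "no", "credit": "fair", "buys": "no"},
    {"student": "no", "credit": "excellent", "buys": "no"},
    {"student": "yes", "credit": "excellent", "buys": "yes"},
]
pruned = prune(tree, pruning_set, "buys", alpha=0.01)
print(pruned)
```

In this example the credit subtree misclassifies one of the two pruning tuples that reach it, so replacing it with a leaf lowers the cost, while the root split is kept because it separates the classes correctly.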
Scalability of Decision Tree Induction:
“What if D, the disk-resident training set of class-labeled tuples, does not fit in
memory? In other words, how scalable is decision tree induction?” The efficiency of existing
decision tree algorithms, such as ID3, C4.5, and CART, has been well established for relatively
small data sets. Efficiency becomes an issue of concern when these algorithms are applied to
the mining of very large real-world databases. The pioneering decision tree algorithms that we
have discussed so far have the restriction that the training tuples should reside in memory.
In data mining applications, very large training sets of millions of tuples are common.
Most often, the training data will not fit in memory! Therefore, decision tree construction
becomes inefficient due to swapping of the training tuples in and out of main and cache
memories. More scalable approaches, capable of handling training data that are too large to fit
in memory, are required. Earlier strategies to “save space” included discretizing continuous-
valued attributes and sampling data at each node. These techniques, however, still assume that
the training set can fit in memory.

Several scalable decision tree induction methods have been introduced in recent studies.
RainForest, for example, adapts to the amount of main memory available and applies to any
decision tree induction algorithm. The method maintains an AVC-set (where “AVC” stands
for “Attribute-Value, Class_label”) for each attribute, at each tree node, describing the training
tuples at the node. The AVC-set of an attribute A at node N gives the class label counts for each
value of A for the tuples at N. The set of all AVC-sets at a node N is the AVC-group of N. The
size of an AVC-set for attribute A at node N depends only on the number of distinct values of
A and the number of classes in the set of tuples at N. Typically, this size should fit in memory,
even for real-world data. RainForest also has techniques, however, for handling the case where
the AVC-group does not fit in memory. Therefore, the method has high scalability for decision
tree induction in very large data sets.
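The idea of an AVC-set can be sketched with nested counters filled in a single scan of the tuples at a node. This is only an illustration of the data structure, not the RainForest implementation; the function names and the small generator standing in for a disk-resident scan are assumptions.

```python
from collections import defaultdict

def build_avc_group(tuple_stream, attributes, target):
    """Scan the tuples at a node once and return {attribute: {value: {class: count}}}."""
    group = {a: defaultdict(lambda: defaultdict(int)) for a in attributes}
    for row in tuple_stream:  # rows can be read one at a time, e.g. from disk
        for a in attributes:
            group[a][row[a]][row[target]] += 1
    return group

def rows_from_source():
    """Stand-in for a disk-resident scan; yields a few hypothetical tuples."""
    yield {"age": "youth", "student": "no", "buys_computer": "no"}
    yield {"age": "youth", "student": "yes", "buys_computer": "yes"}
    yield {"age": "senior", "student": "yes", "buys_computer": "yes"}
    yield {"age": "youth", "student": "no", "buys_computer": "no"}

avc_group = build_avc_group(rows_from_source(), ["age", "student"], "buys_computer")
for attribute, avc_set in avc_group.items():
    for value, class_counts in avc_set.items():
        print(attribute, value, dict(class_counts))
```

Only the counts are held in memory, never the tuples themselves, and those counts are all that measures such as information gain or the Gini index need in order to score a split.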

Fig: AVC Sets for dataset

Example of Decision Tree Construction and Classification Rules:
Construct a decision tree for the following dataset:

age          income  student  credit_rating  buys_computer
youth        high    no       fair           no
youth        high    no       excellent      no
middle_aged  high    no       fair           yes
senior       medium  no       fair           yes
senior       low     yes      fair           yes
senior       low     yes      excellent      no
middle_aged  low     yes      excellent      yes
youth        medium  no       fair           no
youth        low     yes      fair           yes
senior       medium  yes      fair           yes
youth        medium  yes      excellent      yes
middle_aged  medium  no       excellent      yes
middle_aged  high    yes      fair           yes
senior       medium  no       excellent      no
Solution:
Here the class label attribute is buys_computer, with values yes and no. We construct the decision
tree using the ID3 algorithm, so we calculate the information gain attribute selection measure
for each attribute.

CLASS  buys_computer  COUNT
P      yes            9
N      no             5
TOTAL                 14

$$\mathrm{Info}(D) = I(9,5) = -\frac{9}{14}\log_2\frac{9}{14} - \frac{5}{14}\log_2\frac{5}{14} = 0.940$$

Age          P  N  TOTAL  I(P,N)
youth        2  3  5      I(2,3)
middle_aged  4  0  4      I(4,0)
senior       3  2  5      I(3,2)

$$I(2,3) = -\frac{2}{5}\log_2\frac{2}{5} - \frac{3}{5}\log_2\frac{3}{5} = 0.971$$

$$I(4,0) = -\frac{4}{4}\log_2\frac{4}{4} - \frac{0}{4}\log_2\frac{0}{4} = 0 \quad (\text{taking } 0 \log_2 0 = 0)$$

$$I(3,2) = -\frac{3}{5}\log_2\frac{3}{5} - \frac{2}{5}\log_2\frac{2}{5} = 0.971$$


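Age          P  N  TOTAL  I(P,N)
youth        2  3  5      I(2,3) = 0.971
middle_aged  4  0  4      I(4,0) = 0
senior       3  2  5      I(3,2) = 0.971

$$\mathrm{Info}_{Age}(D) = \frac{5}{14} I(2,3) + \frac{4}{14} I(4,0) + \frac{5}{14} I(3,2) = 0.6935 \approx 0.694$$

$$\mathrm{Gain}(Age) = \mathrm{Info}(D) - \mathrm{Info}_{Age}(D) = 0.9403 - 0.6935 = 0.2468 \approx 0.247$$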
Similarly,
Gain(Income) = 0.029
Gain (Student) = 0.151
Gain (credit_rating) = 0.048
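These values can be checked with a short script that recomputes the information gain of every attribute from the 14 training tuples. The helper names info and gain are assumptions of this sketch; the data are the tuples of the example. The results agree with the figures above up to rounding in the third decimal place.

```python
from collections import Counter
from math import log2

COLUMNS = ["age", "income", "student", "credit_rating", "buys_computer"]
TABLE = """
youth high no fair no
youth high no excellent no
middle_aged high no fair yes
senior medium no fair yes
senior low yes fair yes
senior low yes excellent no
middle_aged low yes excellent yes
youth medium no fair no
youth low yes fair yes
senior medium yes fair yes
youth medium yes excellent yes
middle_aged medium no excellent yes
middle_aged high yes fair yes
senior medium no excellent no
"""
rows = [dict(zip(COLUMNS, line.split())) for line in TABLE.strip().splitlines()]

def info(labels):
    """Entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def gain(rows, attr, target="buys_computer"):
    """Information gain of splitting rows on attr."""
    remainder = 0.0
    for value in {r[attr] for r in rows}:
        part = [r[target] for r in rows if r[attr] == value]
        remainder += len(part) / len(rows) * info(part)
    return info([r[target] for r in rows]) - remainder

gains = {a: gain(rows, a) for a in COLUMNS[:-1]}
for attribute, g in gains.items():
    print(f"Gain({attribute}) = {g:.3f}")
print("splitting attribute:", max(gains, key=gains.get))
```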

Because age has the highest information gain among the attributes, it is selected as the splitting
attribute. Node N is labeled with age, and branches are grown for each of the attribute’s values.
The tuples are then partitioned accordingly.


Fig: Tree after splitting branches

Fig: Tree after tree pruning

Finally, the classification rules are:

• IF age=youth AND student=yes THEN buys_computer=yes
• IF age=middle_aged THEN buys_computer=yes
• IF age=senior AND credit_rating=fair THEN buys_computer=yes



No ratings yet
Understanding Decision Tree Learning
21 pages
Decision Trees and Random Forests Guide
No ratings yet
Decision Trees and Random Forests Guide
31 pages
Real-Time Data Pruning for Smart PMUs
No ratings yet
Real-Time Data Pruning for Smart PMUs
6 pages
Decision Trees in AI and ML
No ratings yet
Decision Trees in AI and ML
22 pages
Classification and Clustering Overview
No ratings yet
Classification and Clustering Overview
38 pages
Understanding Decision Trees Basics
No ratings yet
Understanding Decision Trees Basics
14 pages
Overview of Machine Learning Algorithms
No ratings yet
Overview of Machine Learning Algorithms
61 pages
Learning Decision Trees Explained
No ratings yet
Learning Decision Trees Explained
11 pages
Candidate Splits in Regression Trees
No ratings yet
Candidate Splits in Regression Trees
101 pages
Ensemble Learning: Combining Multiple Learners
No ratings yet
Ensemble Learning: Combining Multiple Learners
65 pages
Understanding Negative Residuals in Regression
No ratings yet
Understanding Negative Residuals in Regression
42 pages
Understanding the AO* Algorithm in AI
No ratings yet
Understanding the AO* Algorithm in AI
17 pages
Learning Decision Trees in ML
No ratings yet
Learning Decision Trees in ML
4 pages
Key ML Concepts for Viva Questions
No ratings yet
Key ML Concepts for Viva Questions
25 pages
Decision Tree Learning Overview
No ratings yet
Decision Tree Learning Overview
6 pages
Implementing ID3, C4.5, and CART Algorithms
No ratings yet
Implementing ID3, C4.5, and CART Algorithms
21 pages
Decision Tree Classification Basics
No ratings yet
Decision Tree Classification Basics
42 pages
Pruning Techniques in CART Decision Trees
No ratings yet
Pruning Techniques in CART Decision Trees
3 pages
Pruning Techniques in Decision Trees
No ratings yet
Pruning Techniques in Decision Trees
17 pages

Decision Tree Induction in DWDM

UNIT –III: Syllabus

Fig: Learning Step

Fig: Classification Step

Decision Tree Induction:

Decision Tree Algorithm:

Methods for selecting best test conditions

Attribute Selection Measures

The expected information needed to classify a tuple in D is given by

Information needed after using A to split D into V partitions.
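
In the usual notation, these two quantities can be written as below. This is a sketch under stated assumptions: $p_i$ is taken to be the proportion of tuples in D that belong to class $C_i$ (out of $m$ classes), and $D_j$ is the $j$-th of the $v$ partitions produced by splitting D on attribute A. These symbols are assumed here and are not taken from the text above.

```latex
\mathrm{Info}(D) = -\sum_{i=1}^{m} p_i \log_2 p_i

\mathrm{Info}_A(D) = \sum_{j=1}^{v} \frac{|D_j|}{|D|}\,\mathrm{Info}(D_j)
```

A smaller $\mathrm{Info}_A(D)$ means the partitions are purer, so less information is still needed to classify a tuple after the split.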

Fig: Unpruned and Pruned Trees

Fig: AVC Sets for dataset

age income student credit_rating buys_computer

Age P N TOTAL I(P,N)

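$I(2,3) = -\frac{2}{5}\log_2\frac{2}{5} - \frac{3}{5}\log_2\frac{3}{5} \approx 0.971$

$I(3,2) = -\frac{3}{5}\log_2\frac{3}{5} - \frac{2}{5}\log_2\frac{2}{5} \approx 0.971$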

Age P N TOTAL I(P,N)

$\mathrm{Info}_{Age}(D) = \frac{5}{14} I(2,3) + \frac{4}{14} I(4,0) + \frac{5}{14} I(3,2) \approx 0.694$

$\mathrm{Gain}(Age) = \mathrm{Info}(D) - \mathrm{Info}_{Age}(D)$
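
The arithmetic for the age attribute can be checked with a short script. This is a minimal sketch, not the notes' own code: the helper name `info` and the variable names are hypothetical, and the only data used are the (P, N) counts per age partition that appear in the worked example, namely (2, 3), (4, 0) and (3, 2).

```python
from math import log2


def info(*counts):
    """Expected information (entropy, in bits) for a tuple of class counts."""
    total = sum(counts)
    # Zero counts are skipped, following the convention 0 * log2(0) = 0.
    return -sum(c / total * log2(c / total) for c in counts if c > 0)


# (P, N) counts for each age partition, as used in the worked example.
age_partitions = [(2, 3), (4, 0), (3, 2)]

total = sum(p + n for p, n in age_partitions)  # number of tuples in D
pos = sum(p for p, _ in age_partitions)        # positive-class tuples in D
neg = sum(n for _, n in age_partitions)        # negative-class tuples in D

# Information needed before the split.
info_d = info(pos, neg)

# Weighted information still needed after splitting on age.
info_age = sum((p + n) / total * info(p, n) for p, n in age_partitions)

# Information gain obtained by splitting on age.
gain_age = info_d - info_age

print(f"Info(D)     = {info_d:.3f}")
print(f"Info_Age(D) = {info_age:.3f}")
print(f"Gain(Age)   = {gain_age:.3f}")
```

The same helper can be reused for the other attributes by swapping in their per-partition counts. The attribute with the largest gain is the one chosen as the splitting attribute.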

The Tree after splitting branches is

The Tree after Tree Pruning,

Finally, The Classification Rules are,

