
Department of Computer Science and Engineering (Data Science)

Subject: Machine Learning – I (DJS23DSL402)

AY: 2024-25

Experiment 6

(AdaBoost)

Aim: Evaluate the performance of the boosting algorithm (AdaBoost) with different base learners and hyperparameter tuning.

Theory:

Boosting algorithms improve prediction power by combining a number of weak learners into a strong learner. The principle behind boosting is to first build a model on the training dataset, then build a second model that rectifies the errors of the first. This procedure continues until the errors are minimized and the dataset is predicted correctly. To illustrate, suppose you build a decision tree on the Titanic dataset and obtain an accuracy of 80%. You then try other algorithms and the accuracy comes out to be 75% for KNN and 70% for logistic regression. The accuracy differs when we build different models on the same dataset. But what if we use a combination of all these algorithms to make the final prediction? By aggregating the results of these models we get more accurate results, and we can increase the prediction power in this way.

AdaBoost, short for Adaptive Boosting, is an ensemble method in machine learning. The most common base learner used with AdaBoost is a decision tree with only one level, i.e., a tree with a single split. Such trees are called Decision Stumps.
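
To make this concrete, here is a minimal sketch of AdaBoost with decision stumps on a synthetic dataset (assuming scikit-learn version 1.2 or later, where the base-learner argument is named 'estimator'; older versions call it 'base_estimator'):

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# max_depth=1 gives the classic decision stump as the weak learner.
clf = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=1),
    n_estimators=50,
    learning_rate=1.0,
    random_state=42,
)
clf.fit(X_train, y_train)
print("Test accuracy:", clf.score(X_test, y_test))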

The algorithm first builds a model in which every data point has equal weight. It then assigns higher weights to the points that were wrongly classified, so those points are given more importance in the next model. It keeps training models until the error becomes sufficiently low.


Step 1 – The image below shows our dataset. Since the target column is binary, this is a classification problem. First, each data point is assigned a weight; initially, all the weights are equal.

The formula for the initial sample weights is:

w(x_i) = 1/N

where N is the total number of data points. Since we have 5 data points, each sample weight is 1/5 = 0.2.
Step 2 – We start by seeing how well "Gender" classifies the samples, and likewise how the other variables (Age, Income) classify them. We create a decision stump for each feature and calculate the Gini index of each tree. The tree with the lowest Gini index becomes our first stump. In our dataset, suppose Gender has the lowest Gini index, so it is chosen as the first stump.

Step 3 – Calculate the "Amount of Say" (also called "Importance" or "Influence") of this classifier using the formula:

alpha = (1/2) × ln((1 − Total Error) / Total Error)

The total error is simply the sum of the sample weights of the misclassified data points. In our dataset, assume there is 1 wrong prediction, so the total error is 1/5 = 0.2, and alpha (the performance of the stump) is:

alpha = (1/2) × ln((1 − 0.2) / 0.2) = (1/2) × ln(4) ≈ 0.69

Note: The total error always lies between 0 and 1; 0 indicates a perfect stump and 1 indicates a horrible stump.

From the plot of alpha against the total error we can see that when there is no misclassification (Total Error = 0), the amount of say (alpha) is a large positive number. When the classifier predicts half right and half wrong (Total Error = 0.5), the importance (amount of say) of the classifier is 0. If all the samples are classified incorrectly, the error is very high (close to 1) and the alpha value is a large negative number.
Step 4 – We need to update the weights, because if the same weights were applied to the next model, its output would be the same as that of the first model. The wrong predictions are given more weight, whereas the weights of the correct predictions are decreased; when we build the next model after updating the weights, more preference is given to the points with higher weights. After finding the importance of the classifier and the total error, we update the weights using the formula:

new sample weight = old sample weight × e^(±alpha)

The sign of alpha in the exponent is negative when the sample is correctly classified and positive when the sample is misclassified. In our example there are four correctly classified samples and 1 wrong one; the sample weight of each data point is 1/5 and the amount of say (performance) of the Gender stump is 0.69.

For the wrongly classified sample the updated weight is 0.2 × e^(0.69) ≈ 0.3988, and for each correctly classified sample it is 0.2 × e^(−0.69) ≈ 0.1004.

Note: watch the sign of alpha when substituting the values. Alpha is negative when the data point is correctly classified, and this decreases the sample weight from 0.2 to 0.1004; it is positive when there is misclassification, and this increases the sample weight from 0.2 to 0.3988.


We know that the sample weights must sum to 1, but if we add up all the new sample weights we get 0.8004. To bring the sum back to 1, we normalize these weights by dividing each of them by the total sum of the updated weights, 0.8004. After normalizing, the sample weights sum to 1 again.
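
The weight update and normalization can be verified numerically. A small sketch (the position of the misclassified point in the array is an assumption made for illustration):

import numpy as np

alpha = 0.69                                         # amount of say of the Gender stump
w = np.full(5, 1 / 5)                                # initial sample weights
correct = np.array([True, True, True, True, False])  # 4 right, 1 wrong (assumed order)

# Decrease the weights of correct points, increase the weight of the wrong one.
w_new = np.where(correct, w * np.exp(-alpha), w * np.exp(alpha))
print(w_new)        # ~[0.1003 0.1003 0.1003 0.1003 0.3987]
print(w_new.sum())  # ~0.8000 (the 0.1004, 0.3988 and 0.8004 in the text reflect rounding)

# Normalize so the weights sum to 1 again.
w_norm = w_new / w_new.sum()
print(w_norm, w_norm.sum())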

Step 5 – Now we make a new dataset to see whether the errors decrease. For this we drop the old "sample weights" column and, based on the normalized "new sample weights", divide the interval from 0 to 1 into buckets, one per data point, with widths proportional to the weights.

Step 6 – We are almost done. The algorithm now draws random numbers between 0 and 1. Since incorrectly classified records have higher sample weights, the probability of selecting those records is high. Suppose the 5 random numbers the algorithm draws are 0.38, 0.26, 0.98, 0.40 and 0.55. We then see which bucket each random number falls into and, accordingly, build the new dataset shown below.

This becomes our new dataset, and we see that the data point which was wrongly classified has been selected 3 times because it has a higher weight.
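
The bucket procedure is equivalent to sampling with replacement, with probabilities proportional to the normalized weights. A sketch using NumPy (the weight values are the normalized weights computed above):

import numpy as np

rng = np.random.default_rng(0)
w_norm = np.array([0.1254, 0.1254, 0.1254, 0.1254, 0.4983])

# Weighted sampling with replacement reproduces the bucket procedure:
# points with larger weights are picked more often.
indices = rng.choice(len(w_norm), size=len(w_norm), p=w_norm / w_norm.sum())
print(indices)  # the misclassified point (index 4) tends to appear repeatedly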
Step 7 – This now acts as our new dataset, and we repeat all the above steps, i.e.:
1. Assign equal weights to all the data points.
2. Find the stump that does the best job of classifying the new collection of samples, by computing the Gini index of each candidate and selecting the one with the lowest.
3. Calculate the "Amount of Say" and the total error to update the previous sample weights.
4. Normalize the new sample weights.


Iterate through these steps until the training error is sufficiently low. Suppose that for our dataset we have constructed 3 decision trees (DT1, DT2, DT3) in this sequential manner. If we now send in our test data, it passes through all the decision trees; we then see which class receives the majority of the votes, and based on that we make the predictions for our test dataset.
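
For reference, here is a compact from-scratch sketch of the whole loop (labels are assumed to be in {-1, +1}; the bucket resampling of Steps 5 and 6 is replaced by the equivalent reweighting via 'sample_weight', which scikit-learn's trees support directly):

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, n_rounds=50):
    # y is assumed to be a NumPy array with labels in {-1, +1}.
    n = len(y)
    w = np.full(n, 1 / n)                  # Step 1: equal sample weights
    stumps, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)   # Step 2: best stump on the weighted data
        pred = stump.predict(X)
        err = np.clip(w[pred != y].sum(), 1e-10, 1 - 1e-10)  # Step 3: total error
        alpha = 0.5 * np.log((1 - err) / err)                # amount of say
        w = w * np.exp(-alpha * y * pred)  # Step 4: e^(-alpha) if right, e^(+alpha) if wrong
        w = w / w.sum()                    # normalize
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, alphas

def adaboost_predict(X, stumps, alphas):
    # Weighted majority vote: sign of the alpha-weighted sum of stump outputs.
    agg = sum(a * s.predict(X) for s, a in zip(stumps, alphas))
    return np.sign(agg)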

Lab Assignments to complete in this session:

Use the given dataset and perform the following tasks:


Dataset 1: Synthetic dataset
Dataset 2: [Link]: The dataset contains transactions made by credit cards in September 2013 by European cardholders. It presents transactions that occurred over two days, with 492 frauds out of 284,807 transactions. The dataset is highly unbalanced: the positive class (frauds) accounts for 0.172% of all transactions. It contains only numerical input variables, which are the result of a PCA transformation. Unfortunately, due to confidentiality issues, the original features and more background information about the data are not provided. Features V1, V2, …, V28 are the principal components obtained with PCA; the only features which have not been transformed with PCA are 'Time' and 'Amount'. The feature 'Time' contains the seconds elapsed between each transaction and the first transaction in the dataset. The feature 'Amount' is the transaction amount; this feature can be used for example-dependent cost-sensitive learning. The feature 'Class' is the response variable; it takes the value 1 in case of fraud and 0 otherwise.

1. Implement a Decision Tree classifier and Logistic Regression on Dataset 1 using K-fold cross-validation, and compare the results with an AdaBoost classifier whose base learner is a decision tree, and then logistic regression.
2. Check whether there is a class imbalance problem in Dataset 2. Compare the results of a decision tree classifier and an AdaBoost classifier on Dataset 2 and write your analysis.
3. Implement AdaBoost with a decision tree base learner on Dataset 2 using K-fold cross-validation. Perform hyperparameter tuning using (a) different depths, (b) different learning rates and (c) grid search CV. Show your results using a boxplot.

A possible starting point for these tasks is sketched below.
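
This sketch uses a synthetic imbalanced stand-in for the real datasets; the parameter grid and the F1 scoring choice are assumptions, and scikit-learn 1.2 or later is assumed for the 'estimator' parameter name:

import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic imbalanced dataset standing in for the fraud data (90%/10% split).
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=42)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

models = {
    "DecisionTree": DecisionTreeClassifier(random_state=42),
    "LogReg": LogisticRegression(max_iter=1000),
    "Ada+Tree": AdaBoostClassifier(
        estimator=DecisionTreeClassifier(max_depth=1), random_state=42),
    "Ada+LogReg": AdaBoostClassifier(
        estimator=LogisticRegression(max_iter=1000), random_state=42),
}
scores = {name: cross_val_score(m, X, y, cv=cv, scoring="f1")
          for name, m in models.items()}

# Hyperparameter tuning over stump depth and learning rate with GridSearchCV.
grid = GridSearchCV(
    AdaBoostClassifier(estimator=DecisionTreeClassifier(), random_state=42),
    param_grid={"estimator__max_depth": [1, 2, 3],
                "learning_rate": [0.1, 0.5, 1.0]},
    cv=cv, scoring="f1")
grid.fit(X, y)
print("Best params:", grid.best_params_)

# Boxplot of the per-fold cross-validation scores for each model.
names = list(scores)
plt.boxplot([scores[n] for n in names])
plt.xticks(range(1, len(names) + 1), names)
plt.ylabel("F1 score (5-fold CV)")
plt.show()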

Write Ups:
1. Write the algorithm of AdaBoost.
