INSTITUTE OF AERONAUTICAL ENGINEERING
AN ASSIGNMENT REPORT OF
Machine Learning Techniques and Practices
COURSE CODE: ACAD20
BY
STUDENT NAME: [Link]
ROLL NUMBER: 23951A0444
ELECTRONICS AND COMMUNICATION
ENGINEERING
INSTITUTE OF AERONAUTICAL ENGINEERING DUNDIGAL,
HYDERABAD-500 043, TELANGANA, INDIA.
1. Overfitting and Underfitting in Supervised Learning
Overfitting and underfitting are two fundamental problems in supervised learning that
directly impact the predictive performance of machine learning models.
Overfitting occurs when a model learns not only the underlying patterns in the training data
but also the noise and random fluctuations. As a result, the model performs extremely well
on training data but poorly on unseen test data. This typically happens when the model is
too complex relative to the amount of available data. For example, a high-degree polynomial
regression model may perfectly fit training points but fail to generalize. Overfitting is
characterized by low bias and high variance. Common causes include small datasets,
excessive features, very complex models, and insufficient regularization.
Solutions to overfitting include:
1. Cross-validation to evaluate model generalization.
2. Regularization techniques such as L1 and L2.
3. Pruning in decision trees.
4. Early stopping in neural networks.
5. Increasing training data.
6. Dropout in deep learning.
7. Feature selection and dimensionality reduction.
Underfitting occurs when a model is too simple to capture the underlying structure of the
data. It results in poor performance on both training and test data. This is characterized by
high bias and low variance. For example, fitting a linear model to nonlinear data causes
underfitting.
Solutions to underfitting include:
1. Increasing model complexity.
2. Adding more relevant features.
3. Reducing regularization.
4. Using nonlinear models.
5. Training longer in iterative algorithms.
Balancing bias and variance is key. Techniques such as bias-variance tradeoff analysis,
cross-validation, and model comparison help in selecting appropriate models.
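The contrast above can be seen numerically. The following is a minimal sketch on synthetic data (the sine curve, noise level, and polynomial degrees are illustrative assumptions, not part of the assignment): a degree-1 fit underfits, while a degree-9 fit through 10 noisy training points memorizes them, giving near-zero training error but a larger held-out error.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 20)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, x.shape)
x_tr, y_tr = x[::2], y[::2]      # 10 training points
x_te, y_te = x[1::2], y[1::2]    # 10 held-out points

def train_test_mse(degree):
    # Least-squares polynomial fit on the training half only.
    coeffs = np.polyfit(x_tr, y_tr, degree)
    err = lambda xs, ys: float(np.mean((ys - np.polyval(coeffs, xs)) ** 2))
    return err(x_tr, y_tr), err(x_te, y_te)

tr_lo, te_lo = train_test_mse(1)   # underfit: both errors stay high
tr_hi, te_hi = train_test_mse(9)   # overfit: interpolates the noisy training points
```

On this data the degree-9 model's training error is essentially zero while its held-out error is larger, which is exactly the low-bias, high-variance signature of overfitting described above.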
2. Supervised Learning and Importance of Labeled Data
(a) The dataset described is used for supervised learning because it contains input features
along with corresponding output labels. In supervised learning, the model learns a mapping
function from inputs to outputs based on labeled examples.
(b) Classification problems require labeled data because the goal is to assign inputs to
predefined categories. Without labels, the algorithm cannot learn the relationship between
input features and target classes. Labels serve as ground truth. During training, the model
compares predicted labels with actual labels using a loss function. The difference (error)
guides weight updates.
For example, in spam detection, emails are labeled as 'spam' or 'not spam.' The model learns
patterns such as certain keywords or sender characteristics. Without labels, it would be
impossible to measure error or improve predictions.
Labeled data ensures:
1. Supervised training with clear objectives.
2. Quantifiable performance evaluation.
3. Model validation and tuning.
4. Ability to compute metrics such as accuracy, precision, recall, and F1-score.
Thus, labeled data is essential for classification because it provides the learning signal.
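The spam example can be sketched in a few lines. This is a toy illustration, not a production classifier; the mini-corpus and the add-one word-count scoring are assumptions made here to show how labels supply the learning signal.

```python
from collections import Counter

# Labeled training examples: the label is the ground truth the model learns from.
labeled = [
    ("win free prize now", "spam"),
    ("free offer click now", "spam"),
    ("meeting agenda attached", "not spam"),
    ("project status report", "not spam"),
]

# Estimate per-class word frequencies from the labels.
counts = {"spam": Counter(), "not spam": Counter()}
for text, label in labeled:
    counts[label].update(text.split())

def classify(text):
    # Score each class by how often it has seen the email's words (add-one).
    scores = {c: sum(counts[c][w] + 1 for w in text.split()) for c in counts}
    return max(scores, key=scores.get)

print(classify("free prize now"))  # prints "spam"
```

Without the `label` field there would be nothing to populate `counts` with, and no way to measure whether `classify` is right, which is the point made above.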
3. Model Performs Well on Training but Poor on Test Data
(a) The problem is overfitting.
(b) The model performs well on training data but poorly on test data because it has
memorized the training examples instead of learning general patterns. This may be due to
excessive model complexity, insufficient training samples, or noisy data.
Reasons include:
1. High variance model.
2. Too many parameters.
3. Small dataset.
4. Noise in training data.
Solutions:
1. Apply regularization (L1/L2).
2. Use cross-validation.
3. Simplify the model.
4. Collect more data.
5. Use dropout (for neural networks).
6. Prune decision trees.
The goal is to improve generalization performance.
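Of the fixes listed, L2 regularization has a convenient closed form that can be shown directly. The synthetic problem below is an assumption for illustration; the formula w = (XᵀX + λI)⁻¹Xᵀy is standard ridge regression, and increasing λ shrinks the coefficient vector, trading a little bias for less variance.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 5))
# Only the first feature actually matters; the rest invite overfitting.
y = X @ np.array([1.0, 0, 0, 0, 0]) + rng.normal(0, 0.1, 20)

def ridge(lmbda):
    # Closed-form ridge solution: w = (X^T X + lambda*I)^-1 X^T y
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lmbda * np.eye(d), X.T @ y)

w_ols = ridge(0.0)   # no penalty: ordinary least squares
w_reg = ridge(10.0)  # penalized: coefficients shrink toward zero
```

The norm of `w_reg` is smaller than that of `w_ols`, which is how the penalty keeps a high-variance model from chasing noise.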
4. Concept and Types of Machine Learning
Machine Learning is a branch of artificial intelligence that enables systems to learn patterns
from data and improve performance without explicit programming. It relies on algorithms
that learn from experience.
Types of Machine Learning:
1. Supervised Learning – Uses labeled data (classification, regression).
2. Unsupervised Learning – No labels (clustering, dimensionality reduction).
3. Semi-supervised Learning – Combination of labeled and unlabeled data.
4. Reinforcement Learning – Learning through rewards and penalties.
Supervised Learning in Detail:
Supervised learning uses labeled datasets. It consists of:
- Classification (categorical output)
- Regression (continuous output)
Example 1: House price prediction (Regression)
Example 2: Email spam detection (Classification)
Key components:
- Input features
- Target variable
- Loss function
- Optimization algorithm
Algorithms:
- Linear Regression
- Logistic Regression
- Decision Trees
- KNN
- SVM
- Neural Networks
Advantages:
- Clear objective
- High accuracy with sufficient data
Disadvantages:
- Requires labeled data
- Risk of overfitting
Supervised learning is widely used in finance, healthcare, marketing, and image recognition.
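The house-price regression example can be made concrete. The sizes and prices below are made-up toy values chosen to lie exactly on the line price = 2 × size + 10, so ordinary least squares recovers that mapping from the labeled pairs.

```python
sizes  = [50, 80, 100, 120, 150]    # input feature (square metres)
prices = [110, 170, 210, 250, 310]  # target labels (exactly 2*size + 10 here)

n = len(sizes)
mean_x = sum(sizes) / n
mean_y = sum(prices) / n

# Ordinary least squares for a 1-D linear model y = slope*x + intercept.
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, prices)) \
        / sum((x - mean_x) ** 2 for x in sizes)
intercept = mean_y - slope * mean_x

print(round(slope, 2), round(intercept, 2))  # prints 2.0 10.0
```

The model has learned the input-to-output mapping from labeled examples, which is the defining property of supervised learning stated above.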
5. Entropy and Information Gain Calculation
Entropy measures impurity in a dataset. It is calculated as:
Entropy(S) = -Σ p_i log2(p_i)
Where p_i is the probability of class i.
Information Gain (IG) measures reduction in entropy after splitting:
IG = Entropy(before split) - Entropy(after split)
Given:
Entropy before split = 0.94
Entropy after split = 0.55
IG = 0.94 - 0.55 = 0.39
Thus, the information gain is 0.39.
A higher information gain indicates a better attribute for splitting in decision trees.
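The entropy formula and the worked numbers above can be checked in a few lines. The 9-positive / 5-negative split used as a sanity check is the classic 14-example play-tennis dataset, assumed here for illustration.

```python
import math

def entropy(probs):
    # Entropy(S) = -sum p_i * log2(p_i); terms with p = 0 contribute 0.
    return -sum(p * math.log2(p) for p in probs if p > 0)

# The worked information gain from the text:
ig = 0.94 - 0.55
print(round(ig, 2))  # prints 0.39

# Sanity check: a 9-positive / 5-negative split has entropy of about 0.94 bits.
print(round(entropy([9 / 14, 5 / 14]), 2))  # prints 0.94
```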
6. Ensemble Learning
Ensemble learning combines multiple models to improve predictive performance.
(a) Ensemble methods are used to:
- Improve accuracy
- Reduce variance
- Reduce bias
- Increase robustness
(b) Types of ensemble techniques:
1. Bagging (Bootstrap Aggregating)
2. Boosting
3. Stacking
4. Voting
5. Random Forest
Ensembles typically outperform individual weak learners by aggregating their predictions.
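Hard voting, the simplest of the techniques listed, can be sketched directly. The three base models and their predictions below are hypothetical; the point is only the aggregation step.

```python
from collections import Counter

def majority_vote(predictions_per_model):
    # predictions_per_model: one list of per-sample predictions per base model.
    n_samples = len(predictions_per_model[0])
    combined = []
    for i in range(n_samples):
        votes = [preds[i] for preds in predictions_per_model]
        combined.append(Counter(votes).most_common(1)[0][0])  # majority class
    return combined

preds = [
    ["spam", "ham", "spam"],   # model 1
    ["spam", "spam", "ham"],   # model 2
    ["ham",  "ham", "spam"],   # model 3
]
print(majority_vote(preds))  # prints ['spam', 'ham', 'spam']
```

Each individual model makes one mistake, but the majority vote is correct on all three samples, which is the robustness gain point (a) describes.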
7. Gradient Boosting Algorithm
Gradient Boosting is an ensemble method that builds models sequentially.
(a) Loss Function:
Common loss functions include:
- Mean Squared Error (regression)
- Log Loss (classification)
(b) Sequential Learning Process:
1. Initialize model with constant prediction.
2. Compute residual errors.
3. Train weak learner on residuals.
4. Update model predictions.
5. Repeat iteratively.
Each new model corrects the errors of the models before it. The learning rate controls how much each new model contributes to the final prediction.
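The five steps above can be sketched for squared-error loss, where the residuals are exactly the negative gradient. The toy 1-D dataset and the depth-1 "stump" weak learner below are assumptions for illustration.

```python
# Toy data: two clusters of targets around 1.0 and 3.0.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.0, 1.2, 0.9, 3.0, 3.2, 2.9]

def fit_stump(xs, residuals):
    # Weak learner: try each threshold, predict the mean residual on each side.
    best = None
    for t in xs:
        left  = [r for xi, r in zip(xs, residuals) if xi <= t]
        right = [r for xi, r in zip(xs, residuals) if xi > t]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    _, t, lm, rm = best
    return lambda xi: lm if xi <= t else rm

pred = [sum(y) / len(y)] * len(y)          # 1. initialize with a constant
lr = 0.5                                   # learning rate
for _ in range(20):                        # 5. repeat iteratively
    residuals = [yi - pi for yi, pi in zip(y, pred)]   # 2. residual errors
    stump = fit_stump(x, residuals)                    # 3. weak learner on residuals
    pred = [pi + lr * stump(xi) for pi, xi in zip(pred, x)]  # 4. update

mse = sum((yi - pi) ** 2 for yi, pi in zip(y, pred)) / len(y)
print(round(mse, 3))
```

After 20 rounds the training MSE drops well below the error of the initial constant model, showing how each stump chips away at the remaining residuals.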
8. Bagging and Unstable Classifiers
Bagging improves unstable classifiers by reducing variance.
Steps:
1. Create bootstrap samples.
2. Train base learners independently.
3. Aggregate predictions (average or majority vote).
Bagging stabilizes models like decision trees and reduces overfitting. Random Forest is a
popular example.
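The three steps can be sketched with a deliberately simple base learner (a slope-through-the-origin estimator, chosen here only so the example stays short; the synthetic data with true slope 2 is likewise an assumption).

```python
import random

random.seed(0)
# Synthetic data: y = 2x plus Gaussian noise.
data = [(x, 2.0 * x + random.gauss(0, 1.0)) for x in range(20)]

def fit_slope(sample):
    # Base learner: least-squares slope through the origin.
    num = sum(x * y for x, y in sample)
    den = sum(x * x for x, _ in sample)
    return num / den

def bagged_slope(n_models=25):
    slopes = []
    for _ in range(n_models):
        boot = [random.choice(data) for _ in data]  # 1. bootstrap sample
        slopes.append(fit_slope(boot))              # 2. independent base learner
    return sum(slopes) / len(slopes)                # 3. aggregate (average)

print(round(bagged_slope(), 2))
```

Each bootstrap model sees a slightly different resampling of the data; averaging their estimates smooths out the individual fluctuations, which is the variance reduction the section describes.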
9. Kernel Functions in SVM
SVM performance depends on kernel choice.
(a) Linear Kernel:
Suitable for linearly separable data. Fast and efficient.
(b) Polynomial Kernel:
Captures polynomial relationships. Degree controls complexity.
(c) RBF Kernel:
Handles nonlinear data effectively. Its gamma parameter controls how far the influence of each training point reaches.
Kernel selection impacts bias-variance tradeoff and computational complexity.
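The three kernels are just different similarity functions between feature vectors, which can be evaluated directly. The vectors and hyperparameter values below are illustrative assumptions, not outputs of a trained SVM.

```python
import math

def linear_kernel(a, b):
    # K(a, b) = a . b
    return sum(ai * bi for ai, bi in zip(a, b))

def poly_kernel(a, b, degree=3, c=1.0):
    # K(a, b) = (a . b + c)^degree; degree controls complexity.
    return (linear_kernel(a, b) + c) ** degree

def rbf_kernel(a, b, gamma=0.5):
    # K(a, b) = exp(-gamma * ||a - b||^2); gamma controls locality.
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-gamma * sq_dist)

x1, x2 = [1.0, 2.0], [2.0, 1.0]
print(linear_kernel(x1, x2))          # prints 4.0
print(poly_kernel(x1, x2))            # prints 125.0, i.e. (4 + 1)^3
print(round(rbf_kernel(x1, x2), 3))   # prints 0.368, i.e. exp(-1)
```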
10. K-Nearest Neighbors Algorithm
KNN is a supervised learning algorithm that classifies a data point based on the majority class of its nearest neighbors.
Steps:
1. Choose K.
2. Calculate distance.
3. Select K nearest neighbors.
4. Assign majority class.
Effect of K:
Small K:
- Low bias
- High variance
- Sensitive to noise
Large K:
- High bias
- Low variance
- More stable but may underfit
Choosing optimal K requires cross-validation.
KNN is simple but computationally expensive for large datasets.
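The four steps and the effect of K can both be shown on a toy example (the 2-D points and labels below are made up for illustration):

```python
import math
from collections import Counter

# Tiny labeled training set: class "A" near (1, 1.5), class "B" near (5.5, 5.5).
train = [((1.0, 1.0), "A"), ((1.5, 2.0), "A"),
         ((5.0, 5.0), "B"), ((6.0, 5.5), "B"), ((5.5, 6.0), "B")]

def knn_predict(query, k=3):
    # 2. Calculate the distance from the query to every training point.
    dists = sorted((math.dist(query, p), label) for p, label in train)
    # 3. Select the K nearest neighbors; 4. assign the majority class.
    top = [label for _, label in dists[:k]]
    return Counter(top).most_common(1)[0][0]

print(knn_predict((5.2, 5.1)))        # prints "B": all 3 nearest points are B
print(knn_predict((1.2, 1.5)))        # prints "A": the 2 A points dominate
print(knn_predict((1.2, 1.5), k=5))   # prints "B": too-large K underfits
```

With K = 5 the vote includes the entire training set, so the majority class "B" wins even for a point sitting inside the "A" cluster, illustrating how a large K over-smooths the decision boundary.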