Supervised vs Unsupervised Learning
Batch learning involves training the model using the entire dataset at once, which is suitable for static datasets. Once trained, the model is not updated unless it is retrained from scratch, making it appropriate for situations where data does not change frequently. Online learning, on the other hand, incrementally updates the model with new data as it arrives, making it suitable for dynamic or large datasets. It is ideal in scenarios where data continuously flows, such as stock price prediction or real-time recommendation systems.
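As an illustrative sketch of online learning in scikit-learn, an SGDRegressor can be updated batch by batch via partial_fit; the data stream here is synthetic and the hyperparameters are arbitrary choices for the example:

```python
import numpy as np
from sklearn.linear_model import SGDRegressor

# Hypothetical data stream: y is roughly 3*x plus noise, arriving in batches.
rng = np.random.RandomState(0)
model = SGDRegressor(learning_rate="constant", eta0=0.01, random_state=0)

for _ in range(300):                      # each iteration simulates a new batch
    X_batch = rng.rand(10, 1)
    y_batch = 3 * X_batch.ravel() + rng.randn(10) * 0.01
    model.partial_fit(X_batch, y_batch)   # incremental update; no retraining from scratch

pred = model.predict([[1.0]])[0]          # drifts toward 3.0 as data flows in
```

A batch learner would instead call fit on the full accumulated dataset each time, which is exactly the retrain-from-scratch cost online learning avoids.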
Grid search enhances model performance by systematically searching through a specified grid of hyperparameters to find the best model configuration. The key components include defining a parameter grid, such as various 'n_estimators' and 'max_features' for a RandomForestRegressor, and using GridSearchCV to automate this exploration with cross-validation. By evaluating combinations using a scoring function, like 'neg_mean_squared_error', grid search identifies hyperparameter sets that minimize error, optimizing model performance on the given data.
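A minimal sketch of this setup, using a small synthetic regression dataset and made-up grid values purely for illustration:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic regression data (illustrative only).
rng = np.random.RandomState(42)
X = rng.rand(100, 3)
y = X @ np.array([2.0, -1.0, 0.5]) + rng.randn(100) * 0.1

# Parameter grid: every combination below is tried with cross-validation.
param_grid = {"n_estimators": [10, 30], "max_features": [1, 2, 3]}

grid_search = GridSearchCV(
    RandomForestRegressor(random_state=42),
    param_grid,
    scoring="neg_mean_squared_error",  # higher (less negative) is better
    cv=3,
)
grid_search.fit(X, y)

best_params = grid_search.best_params_  # the winning combination
```

Because scoring is negated MSE, grid_search.best_score_ is at most zero, and best_params holds the combination that minimized cross-validated error.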
The FIND-S algorithm finds a maximally specific hypothesis that fits the given positive training examples, starting with the most specific hypothesis possible and generalizing it only when necessary to cover each new positive example. However, it considers only positive examples and ignores negative ones, which limits its ability to explore the hypothesis space effectively. In contrast, the Candidate Elimination algorithm considers both positive and negative examples, maintaining a general boundary and a specific boundary for hypothesis space exploration, thus providing a more comprehensive hypothesis search and potentially better generalization.
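The generalization step can be sketched in a few lines; the toy data below is in the style of the classic "EnjoySport" example, not taken from the text:

```python
def find_s(examples):
    """FIND-S: start with the most specific hypothesis and generalize
    only as far as needed to cover each positive example."""
    hypothesis = None
    for attributes, label in examples:
        if label != "yes":            # negative examples are ignored entirely
            continue
        if hypothesis is None:        # first positive example: copy it exactly
            hypothesis = list(attributes)
        else:
            for i, value in enumerate(attributes):
                if hypothesis[i] != value:
                    hypothesis[i] = "?"   # generalize the mismatched attribute

    return hypothesis

# Toy training set: (attribute tuple, label)
training = [
    (("sunny", "warm", "normal", "strong"), "yes"),
    (("sunny", "warm", "high", "strong"), "yes"),
    (("rainy", "cold", "high", "strong"), "no"),
]
h = find_s(training)
```

Here the two positive examples disagree only on the third attribute, so it is generalized to '?'; the negative example has no effect, which is precisely the weakness Candidate Elimination addresses by also shrinking a general boundary.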
Data preparation involves several crucial steps, including data cleaning, feature engineering, scaling, and encoding, each essential for improving model performance and accuracy. Data cleaning addresses missing values using techniques like dropping them or imputing them with median values via SimpleImputer. Feature engineering creates new features that better capture relationships within the data. Scaling and encoding ensure numerical and categorical consistency using tools like StandardScaler and OneHotEncoder. Automating these transformations with tools like ColumnTransformer ensures consistency across datasets, directly improving the model's ability to learn meaningful patterns.
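A minimal sketch of these steps wired together; the column names and values are invented for the example:

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical dataset: a missing numeric value and a categorical column.
df = pd.DataFrame({
    "rooms": [3.0, np.nan, 5.0, 2.0],
    "area": [70.0, 120.0, 150.0, 50.0],
    "ocean_proximity": ["inland", "coastal", "coastal", "inland"],
})

num_pipeline = Pipeline([
    ("impute", SimpleImputer(strategy="median")),  # fill NaN with the median
    ("scale", StandardScaler()),                   # zero mean, unit variance
])

preprocess = ColumnTransformer([
    ("num", num_pipeline, ["rooms", "area"]),
    ("cat", OneHotEncoder(), ["ocean_proximity"]),
])

X = preprocess.fit_transform(df)  # 2 scaled numeric + 2 one-hot columns
```

The same fitted preprocess object can later transform new data with identical imputation and scaling statistics, which is where the consistency guarantee comes from.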
Different learning algorithms are used for classification and regression to address the distinct nature of the tasks—classification requires assigning input data to discrete categories, while regression predicts continuous variables. Algorithms are designed with these task-specific requirements in mind to optimize performance. Logistic regression exemplifies a shared approach that primarily performs classification by predicting probabilities rather than exact class labels, often expressed in percentage terms (e.g., a 20% chance that an email is spam). Despite its classification focus, it operates similarly to regression in linking features to a linear decision boundary, demonstrating its dual-task character.
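The probability-versus-label distinction can be seen directly in scikit-learn; the one-feature spam-style data below is invented for illustration:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy data: one feature (e.g. a suspicious-word count), binary label.
X = np.array([[0], [1], [2], [8], [9], [10]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = LogisticRegression().fit(X, y)

# predict_proba returns probabilities (the "20% spam"-style output);
# predict thresholds them at 0.5 to produce the discrete class label.
proba = clf.predict_proba([[9]])[0, 1]   # probability of class 1
label = clf.predict([[9]])[0]            # hard classification decision
```

Internally the model computes a linear combination of features, just like linear regression, then squashes it through a sigmoid to get a probability.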
In a machine learning pipeline for real estate investments, the processing stage includes several critical transformations to prepare raw data for model training. These transformations involve filling missing values to ensure dataset completeness, scaling features for consistency in units, and encoding categorical variables to numerical format for algorithm compatibility. Each of these transformations enables the subsequent model to effectively process input data, leading to more accurate predictions of housing prices, which are foundational for sound investment decisions.
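One way this stage might feed a model end to end, sketched with a tiny invented real-estate table (column names, values, and the choice of LinearRegression are all assumptions for the example):

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical raw records: a missing value and mixed column types.
data = pd.DataFrame({
    "sqft": [700.0, np.nan, 1500.0, 900.0, 1200.0],
    "district": ["north", "south", "north", "south", "north"],
})
prices = np.array([100.0, 150.0, 220.0, 130.0, 180.0])  # in thousands

model = Pipeline([
    ("prep", ColumnTransformer([
        ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                          ("scale", StandardScaler())]), ["sqft"]),
        ("cat", OneHotEncoder(), ["district"]),
    ])),
    ("reg", LinearRegression()),   # any regressor can slot in here
])

model.fit(data, prices)
pred = model.predict(data.iloc[[0]])[0]  # a housing-price estimate
```

Because the transformations live inside the pipeline, raw rows go in and price predictions come out, with no manually prepared intermediate dataset.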
The choice between k-Nearest Neighbors (k-NN) and Support Vector Machines (SVMs) involves several trade-offs. k-NN is a straightforward algorithm that classifies data points based on the majority class of their neighbors, making it simple to implement and requiring no explicit training phase. However, it can become computationally expensive on large datasets and is sensitive to feature scaling. SVMs, on the other hand, are powerful for clear margin separation in high-dimensional spaces and generally perform well when the number of features exceeds the number of samples. They are less prone to overfitting due to their ability to maximize the margin. Yet, SVMs can be complex and demanding in terms of computation time and resources, particularly with non-linear kernels on large datasets. The choice depends largely on the specific dataset characteristics and problem requirements.
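A sketch comparing the two on the same synthetic data (the dataset and hyperparameters are arbitrary); note both are piped behind a scaler, since k-NN in particular is scale-sensitive:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic binary classification problem.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# k-NN: no real training phase, but all work is deferred to prediction time.
knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
# SVM: margin maximization; the RBF kernel handles non-linear boundaries.
svm = make_pipeline(StandardScaler(), SVC(kernel="rbf"))

knn_acc = knn.fit(X_train, y_train).score(X_test, y_test)
svm_acc = svm.fit(X_train, y_train).score(X_test, y_test)
```

On a small set like this both do well; the trade-offs in the paragraph show up at scale, where k-NN's per-query cost and the SVM's kernel computation dominate.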
Integrating pipelines with machine learning algorithms streamlines and automates the entire data processing and model deployment workflow, significantly enhancing efficiency in dynamic environments. Pipelines ensure consistent data transformations, such as cleaning, scaling, and encoding, across varying datasets, which is critical when data characteristics are constantly changing. They also facilitate easy adaptation and updating of models with new data streams, as seen in online learning contexts, thereby maintaining high model performance over time. By automating these processes, pipelines reduce manual intervention, allow for rapid iteration, and ensure robustness in model deployment, essential for responsive and resource-efficient solutions in dynamic real-world applications.
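A small sketch of the update workflow, with synthetic data standing in for a new data stream (the label rule and hyperparameters are invented for the example):

```python
import numpy as np
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(1)
pipe = make_pipeline(StandardScaler(), SGDClassifier(random_state=1))

# Initial fit on the first batch of (synthetic) data.
X0 = rng.randn(100, 4)
y0 = (X0[:, 0] > 0).astype(int)   # toy rule: label follows feature 0's sign
pipe.fit(X0, y0)

# When new data arrives, refitting the pipeline re-applies the identical
# scaling step before the model, so no transformation is re-coded by hand.
X1 = rng.randn(100, 4)
y1 = (X1[:, 0] > 0).astype(int)
pipe.fit(np.vstack([X0, X1]), np.concatenate([y0, y1]))

preds = pipe.predict(X1)
```

The key point is that one object carries both the transformations and the model, so updating or redeploying is a single fit/predict call rather than a chain of manual steps.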
Supervised learning requires labeled data, meaning that the training data includes the desired output or solution (labels), allowing the algorithm to learn and predict based on examples. Typical tasks include classification and regression, like a spam filter trained with labeled emails for classification and car price prediction for regression. Unsupervised learning, in contrast, works with unlabeled data, where the system learns without explicit supervision. It includes tasks like clustering, dimensionality reduction, anomaly detection, novelty detection, and association rule learning, such as grouping blog visitors by preferences or discovering purchasing patterns.
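The contrast can be made concrete on the same toy data: a classifier is given labels, while a clustering algorithm must discover the groups itself (the two-blob data here is synthetic):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(0)
# Two well-separated blobs of points.
X = np.vstack([rng.randn(50, 2) + [0, 0], rng.randn(50, 2) + [5, 5]])

# Supervised: labels are provided, and the model learns to predict them.
y = np.array([0] * 50 + [1] * 50)
clf = LogisticRegression().fit(X, y)
sup_pred = clf.predict([[5, 5]])[0]

# Unsupervised: no labels at all; KMeans discovers the two groups on its own.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
cluster_ids = set(km.labels_)     # two cluster ids, assigned arbitrarily
```

Note that KMeans's cluster ids carry no inherent meaning; mapping them to real-world categories is exactly the information labels provide in the supervised case.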
Clustering and association rule learning extract patterns and insights from unstructured, unlabeled data, offering significant real-world value. Clustering groups similar data points, enhancing understanding of data organization; for instance, segmenting blog visitors by behavior aids targeted marketing strategies. Association rule learning uncovers relationships between attributes, such as correlating purchases in retail, leading to effective cross-selling strategies, like promoting items frequently bought together. These tasks empower businesses to make data-driven decisions, optimize operations, and personalize customer interactions without requiring labeled datasets.
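As a toy illustration of the retail case, a plain co-occurrence count captures the core idea (pair support) behind association rule mining; this is a hand-rolled sketch with invented baskets, not a full Apriori implementation:

```python
from collections import Counter
from itertools import combinations

# Hypothetical transaction data: items bought together per customer visit.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "chips"},
    {"bread", "butter"},
    {"beer", "chips", "milk"},
]

# Count how often each item pair co-occurs (the "support" of the pair).
pair_counts = Counter()
for basket in transactions:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

# The most frequently co-purchased pair is a cross-selling candidate.
top_pair, top_count = pair_counts.most_common(1)[0]
```

Real association rule learners (e.g. Apriori) extend this by pruning rare itemsets and scoring rules with confidence and lift, but the discovered relationships are of exactly this "frequently bought together" form.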