Introduction to Machine Learning Concepts

The document provides an overview of machine learning (ML), defining it as a subset of artificial intelligence that enables computers to learn from data and improve performance without explicit programming. It discusses various types of ML, including supervised, unsupervised, and reinforcement learning, along with their advantages, disadvantages, and applications. Additionally, it covers key concepts such as training error, generalization error, overfitting, underfitting, and the bias-variance trade-off.

Uploaded by Kunal Patil
© All Rights Reserved

MACHINE LEARNING - CSC701
MODULE 1 - INTRODUCTION TO ML
Machine learning is a growing technology that enables computers to learn automatically from past data. Machine learning uses various algorithms to build mathematical models and make predictions using historical data or information. Currently, it is used for tasks such as image recognition, speech recognition, email filtering, Facebook auto-tagging, recommender systems, and many more.
What is Machine Learning?

In the real world, we are surrounded by humans who can learn from their experiences, and we have computers or machines that work on our instructions. But can a machine also learn from experiences or past data as a human does? This is where Machine Learning comes in.
Machine Learning is a subset of artificial intelligence that is mainly concerned with the development of algorithms that allow a computer to learn from data and past experiences on its own. The term "machine learning" was first introduced by Arthur Samuel in 1959.
WE CAN DEFINE IT IN A SUMMARIZED WAY AS:

"MACHINE LEARNING ENABLES A MACHINE TO AUTOMATICALLY LEARN FROM DATA, IMPROVE PERFORMANCE FROM EXPERIENCE, AND PREDICT THINGS WITHOUT BEING EXPLICITLY PROGRAMMED."
Bloom’s Taxonomy
HOW DOES MACHINE LEARNING WORK?

A machine learning system learns from historical data, builds prediction models, and, whenever it receives new data, predicts the output for it. The accuracy of the predicted output depends on the amount of data: a large amount of data helps build a better model that predicts more accurately.
MACHINE LEARNING MODEL

Training Data → Train Machine Learning algorithm → Trained Model → Test the model with new input → Is the model performing correctly?

If yes: the Machine Learning model is ready. If no: return to training.
DATA FORMATS

Structured data is stored in a predefined format and is highly specific.

Unstructured data is a collection of many varied data types stored in their native formats.

Semi-structured data does not follow the tabular data structure of relational databases or other data tables.
DIKW PYRAMID

Wisdom: understanding
Knowledge: meaning
Information: content
Data: events, records, and transactions
CATEGORIES OF DATA ANALYTICS
TYPES OF MACHINE LEARNING

Based on the methods and way of learning, machine learning is divided into the following types:

Supervised Machine Learning

Unsupervised Machine Learning

Reinforcement Learning
Supervised Machine Learning

Supervised machine learning is based on supervision: we train the machine using a "labeled" dataset, and based on that training, the machine predicts the output. Here, "labeled" means that some of the inputs are already mapped to their outputs.
Categories of Supervised Machine Learning

Supervised machine learning can be classified into two types of problems:

Classification

Regression
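As a rough illustration of the supervised setting, here is a from-scratch 1-nearest-neighbour classifier, one of the simplest classification algorithms. The tiny labeled dataset below is hypothetical, invented only for this sketch.

```python
# Minimal supervised-learning sketch: a 1-nearest-neighbour classifier built
# from scratch. The tiny labeled dataset is hypothetical, invented only to show
# how input->output examples let the machine predict outputs for new inputs.
import math

# Labeled training data: (feature vector, class label)
training_data = [
    ((1.0, 1.2), "small"),
    ((1.1, 0.9), "small"),
    ((5.0, 5.1), "large"),
    ((4.8, 5.3), "large"),
]

def predict(x):
    """Classification: return the label of the closest labeled example."""
    return min(training_data, key=lambda item: math.dist(x, item[0]))[1]

print(predict((1.05, 1.0)))   # falls in the "small" region
print(predict((5.2, 4.9)))    # falls in the "large" region
```

Regression follows the same pattern, but predicts a continuous number (for example by averaging the neighbours' values) instead of a class label.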
ADVANTAGES

Since supervised learning works with a labeled dataset, we have an exact idea about the data.

These algorithms are helpful in predicting the output on the basis of prior experience.
DISADVANTAGES

These algorithms are not able to solve complex tasks.

They may predict the wrong output if the test data differs from the training data.

Training the algorithm requires a lot of computational time.
APPLICATIONS

Image Segmentation

Medical Diagnosis

Fraud Detection

Spam Detection

Speech Recognition
UNSUPERVISED MACHINE LEARNING

The main aim of an unsupervised learning algorithm is to group or categorize an unsorted/unlabeled dataset according to similarities, patterns, and differences.

Machines are instructed to find the hidden patterns in the input dataset.
CATEGORIES OF UNSUPERVISED MACHINE LEARNING

Unsupervised learning can be further classified into two types:

Clustering

Association
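Clustering can be sketched with a from-scratch k-means algorithm on unlabeled one-dimensional points. No labels are given; the algorithm groups the points purely by similarity. The data and initial centers below are hypothetical illustrations.

```python
# Minimal unsupervised-learning sketch: k-means clustering on unlabeled 1-D
# data. The data and initial centers are hypothetical illustrations.

data = [1.0, 1.2, 0.8, 8.0, 8.3, 7.9]            # unlabeled dataset

def kmeans(points, centers, iterations=10):
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

centers, clusters = kmeans(data, centers=[0.0, 10.0])
print(centers)    # two centers, one near 1.0 and one near 8.07
print(clusters)   # the points grouped by similarity
```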
ADVANTAGES:

These algorithms can be used for more complicated tasks than the supervised ones, because they work on unlabeled datasets.

Unsupervised algorithms are preferable for many tasks, as obtaining an unlabeled dataset is easier than obtaining a labeled one.
DISADVANTAGES:

The output of an unsupervised algorithm can be less accurate, as the dataset is not labeled and the algorithm is not trained on the exact output in advance.

Working with unsupervised learning is more difficult, as it works with unlabeled data that does not map to a known output.
APPLICATIONS

Network Analysis

Recommendation Systems

Anomaly Detection

Singular Value Decomposition


REINFORCEMENT LEARNING

Reinforcement learning works on a feedback-based process in which an AI agent (a software component) automatically explores its surroundings by trial and error: taking actions, learning from experience, and improving its performance.

The agent gets rewarded for each good action and punished for each bad action; hence, the goal of a reinforcement learning agent is to maximize the rewards.
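This reward-and-punishment loop can be sketched with tabular Q-learning, a standard RL algorithm, on a hypothetical 5-state corridor; all constants and the environment below are illustrative assumptions, not from the slides.

```python
# Minimal reinforcement-learning sketch: tabular Q-learning on a hypothetical
# 5-state corridor. The agent starts at state 0; reaching state 4 earns a
# reward of +1, and every other step is punished with -0.01, so the agent
# learns by trial and error to move right. All constants are illustrative.
import random

N_STATES = 5
ACTIONS = [-1, +1]                       # move left / move right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1    # learning rate, discount, exploration
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

random.seed(0)
for _ in range(200):                     # episodes
    s = 0
    while s != N_STATES - 1:
        # Explore occasionally (trial and error); otherwise exploit the best-known action.
        a = (random.choice(ACTIONS) if random.random() < EPSILON
             else max(ACTIONS, key=lambda b: Q[(s, b)]))
        s_next = min(max(s + a, 0), N_STATES - 1)
        r = 1.0 if s_next == N_STATES - 1 else -0.01     # reward or punishment
        # Q-update: move the estimate toward reward + discounted future value.
        Q[(s, a)] += ALPHA * (r + GAMMA * max(Q[(s_next, b)] for b in ACTIONS) - Q[(s, a)])
        s = s_next

policy = [max(ACTIONS, key=lambda b: Q[(s, b)]) for s in range(N_STATES - 1)]
print(policy)   # the learned greedy policy: move right in every state
```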
ADVANTAGES

It helps in solving complex real-world problems that are difficult to solve with general techniques.

The learning model of RL is similar to the learning of human beings; hence, highly accurate results can be found.

It helps in achieving long-term results.

DISADVANTAGES

RL algorithms are not preferred for simple problems.

RL algorithms require huge amounts of data and computation.

Too much reinforcement can lead to an overload of states, which can weaken the results.
APPLICATIONS

Video Games

Resource Management

Robotics

Text Mining
ISSUES IN MACHINE LEARNING

Inadequate training data

Poor quality of data

Massive training data

Overfitting and underfitting

Monitoring and maintenance

Getting bad recommendations

Lack of skilled resources

Limited possibilities to reuse a model

Data bias
APPLICATIONS OF MACHINE LEARNING
STEPS IN DEVELOPING A MACHINE LEARNING APPLICATION
Collect Data

Prepare the input data

Analyze the input data

Train the algorithm

Test the algorithm

Use the algorithm

Periodic Revisit
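The steps above can be sketched as a skeleton pipeline. Every function body below is a hypothetical placeholder (a toy threshold "model"); real projects would fill each stage in with actual data handling and learning code.

```python
# The development steps above as a skeleton pipeline; all bodies are toy
# placeholders standing in for real data handling and learning code.

def collect_data():
    """Step 1: gather raw examples (files, sensors, databases, web scraping)."""
    return [(0.9, "small"), (1.1, "small"), (5.0, "large"), (5.2, "large")]

def prepare_and_analyze(raw):
    """Steps 2-3: clean and format the input data, dropping bad records."""
    return [(x, label) for x, label in raw if x is not None]

def train(dataset):
    """Step 4: fit a trivial model - a threshold halfway between the class means."""
    small = [x for x, label in dataset if label == "small"]
    large = [x for x, label in dataset if label == "large"]
    return (sum(small) / len(small) + sum(large) / len(large)) / 2

def evaluate(threshold, dataset):
    """Step 5: test the model's accuracy before putting it to use."""
    hits = sum(("large" if x > threshold else "small") == label
               for x, label in dataset)
    return hits / len(dataset)

data = prepare_and_analyze(collect_data())       # Steps 1-3
threshold = train(data)                          # Step 4
print(evaluate(threshold, data))                 # Step 5; Steps 6-7: use, then revisit
```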
TRAINING ERROR

Training error is the error a model makes on the data it was trained on; it can arise, for example, when the dataset is handled inappropriately during preprocessing or feature selection.
GENERALIZATION ERROR

In supervised learning applications in machine learning and statistical learning theory, generalization error (also known as out-of-sample error) is a measure of how accurately an algorithm is able to predict outcome values for previously unseen data.

Notice that the gap between predictions and observed data is induced by model inaccuracy, sampling error, and noise. Some of these errors are reducible, but some are not. Choosing the right algorithm and tuning its parameters can improve model accuracy, but we will never be able to make our predictions 100% accurate.
TRAINING ERROR AND GENERALIZATION ERROR
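The gap between the two errors can be seen in a small experiment: a 1-nearest-neighbour classifier memorises its training set, so its training error is zero, yet it still errs on previously unseen data. The noisy one-dimensional dataset below is a hypothetical illustration.

```python
# Training error vs. generalization error: 1-nearest-neighbour memorises the
# training set (zero training error) but still errs on unseen data because of
# noise and sampling error. The dataset is a hypothetical illustration.
import random

random.seed(1)

def make_point():
    x = random.uniform(0, 10)
    label = "high" if x > 5 else "low"
    if random.random() < 0.2:                    # 20% label noise
        label = "low" if label == "high" else "high"
    return x, label

train_set = [make_point() for _ in range(50)]
test_set = [make_point() for _ in range(200)]    # previously unseen data

def predict(x):
    return min(train_set, key=lambda p: abs(p[0] - x))[1]

def error(dataset):
    return sum(predict(x) != label for x, label in dataset) / len(dataset)

print("training error:      ", error(train_set))   # 0.0 - the model memorised it
print("generalization error:", error(test_set))    # > 0: noise and sampling error remain
```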
OVERFITTING

A statistical model is said to be overfitted when it does not make accurate predictions on testing data. When a model is trained on too much data, it starts learning from the noise and inaccurate entries in the dataset, and testing on test data then results in high variance. The model then fails to categorize the data correctly, because of too many details and noise.
REASONS FOR OVERFITTING:

High variance and low bias.

The model is too complex.

The size of the training data.

Bias is the difference between the actual or expected values and the predicted values; it is also known as bias error or error due to bias.

Low bias: fewer assumptions are made to build the target function. In this case, the model will closely match the training dataset.

High bias: more assumptions are made to build the target function. In this case, the model will not match the training dataset closely.
Variance is a measure of the spread of data from its mean position. In machine learning, variance is the amount by which the performance of a predictive model changes when it is trained on different subsets of the training data.

Low variance: the model is less sensitive to changes in the training data and can produce consistent estimates of the target function from different subsets of data drawn from the same distribution.

High variance: the model is very sensitive to changes in the training data, and its estimate of the target function can change significantly when trained on different subsets of data from the same distribution.
EXAMPLE

Actual: 9.2
Predicted: 8.9, 12.2, 7.2, 7.8

Bias = |actual - predicted|

Low bias: 0.3 (for the prediction 8.9)

High bias: 2.0 (for the prediction 7.2)

Variance = variety in the predicted outputs

Low variance: 7.2 and 7.8 (close together)

High variance: 8.9 and 12.2 (far apart)
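The worked example above can be checked in a few lines of code: per-prediction bias as |actual - predicted|, and variance as the spread of the predictions around their own mean.

```python
# The worked example above in code: bias per prediction as |actual - predicted|,
# and variance as the spread of the predictions around their own mean.
actual = 9.2
predicted = [8.9, 12.2, 7.2, 7.8]

biases = [round(abs(actual - p), 1) for p in predicted]
print(biases)               # 8.9 has the lowest bias (0.3); 7.2 has a bias of 2.0

mean = sum(predicted) / len(predicted)
variance = sum((p - mean) ** 2 for p in predicted) / len(predicted)
print(round(variance, 2))   # spread of the predictions around their mean
```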
TECHNIQUES TO REDUCE OVERFITTING

Increase the training data.

Reduce model complexity.

UNDERFITTING

A machine learning algorithm is said to underfit when it cannot capture the underlying trend of the data, i.e., it performs poorly on the training data as well as on the testing data.
REASONS FOR UNDERFITTING

High bias and low variance.

The size of the training dataset is not large enough.

The model is too simple.

The training data is not cleaned and contains noise.
TECHNIQUES TO REDUCE UNDERFITTING

Increase model complexity.

Increase the number of features by performing feature engineering.

Remove noise from the data.
BIAS-VARIANCE TRADE-OFF

Bias is the difference between the prediction of values by the machine learning model and the correct values. High bias gives a large error on both training and testing data.

The variability of model predictions for a given data point, which tells us the spread of our data, is called the variance of the model. A model with high variance fits the training data in a very complex way and thus cannot fit accurately the data it hasn't seen before. As a result, such models perform very well on training data but have high error rates on test data.
If the algorithm is too simple (e.g., a hypothesis with a linear equation), it may be in a high-bias, low-variance condition and thus error-prone. If the algorithm fits too complex a model (e.g., a hypothesis with a high-degree equation), it may be in a high-variance, low-bias condition, and new entries will not perform well. There is a sweet spot between these two conditions, known as the Bias-Variance Trade-off.
BIAS-VARIANCE TRADE-OFF
