Machine Learning Performance Metrics Guide

The document discusses various techniques for evaluating machine learning model performance, including cross-validation, precision, recall, and ROC curves. It covers k-fold cross-validation and bootstrapping for evaluating classifier performance on limited datasets. Underfitting and overfitting are explained as well as the importance of training, validation, and test sets for accurate assessment of model generalization.

Performance Evaluation

Module-II
EC 19-203-0811
Introduction to Machine Learning

Course Outcomes
1. To understand various machine learning techniques.
2. To acquire knowledge about classification techniques.
3. To understand dimensionality reduction techniques and decision trees.
4. To understand unsupervised machine learning techniques.

7-Performance Evaluation Metrics 1:35 PM 2


Syllabus
Module I
Introduction: Machine Learning, Applications, Supervised Learning -Classification, Regression,
Unsupervised Learning, Reinforcement Learning, Supervised Learning: Learning a Class from Examples,
Vapnik - Chervonenkis (VC) Dimension, Probably Approximately Correct (PAC) Learning, Noise, Learning
Multiple Classes, Regression, Model Selection and Generalization, Dimensions of a Supervised Machine
Learning Algorithm

Module II
Multilayer Perceptrons: Introduction, The Perceptron, Training a Perceptron, Learning Boolean
Functions, Multilayer Perceptrons, Backpropagation Algorithm, Training Procedures. Classification: Cross-validation and re-sampling methods, K-fold cross validation, Bootstrapping. Measuring classifier
performance: Precision, recall, ROC curves. Bayes Theorem, Bayesian classifier, Maximum Likelihood
Estimation, Density Functions.




Syllabus
Module III
Dimensionality Reduction: Introduction, Subset Selection, Principal Components
Analysis, Factor Analysis, Multidimensional Scaling, Linear Discriminant Analysis,
Isomap, Locally Linear Embedding, Decision Trees: Introduction, Univariate Trees,
Pruning, Rule Extraction from Trees, Learning Rules from Data, Multivariate Trees,
Introduction to Linear Discrimination, Generalizing the Linear Model.

Module IV
Clustering: Introduction, Mixture Densities, k-Means Clustering, Expectation-
Maximization Algorithm, Mixtures of Latent Variable Models, Supervised Learning after
Clustering, Hierarchical Clustering, Choosing the Number of Clusters.



References

1. Stephen Marsland, "Machine Learning: An Algorithmic Perspective", 2nd Edition, CRC Press, 2015. [Ch. 2]
2. Christopher M. Bishop, "Pattern Recognition and Machine Learning", Springer, 2006.
3. Ethem Alpaydin, "Introduction to Machine Learning", 2nd Edition, 2010. [Ch. 11]
4. Images from various websites and NPTEL course slides.



Contents

• Cross validation and re-sampling methods
  • K-fold cross validation
  • Bootstrapping
• Measuring classifier performance
  • Precision
  • Recall
  • ROC curves





Purpose of Learning
• The goal of learning is to get better at predicting the outputs, be they class labels or continuous regression values.
• One way to assess this is to compare the predictions with known target labels, i.e., the error that the algorithm makes on the training set.
• However, the algorithm must generalise to examples that were not seen in the training set, so we need some different data: a test set.
• The test set does not modify the weights or other parameters; we use it only to decide how well the algorithm has learnt.
• The only problem with this is that it reduces the amount of data we have available for training.



Overfitting
• We need to make sure that enough training is done so that the algorithm generalises well.
• But there is at least as much danger in over-training as there is in under-training.
• The number of degrees of variability in most machine learning algorithms is huge: for a neural network there are lots of weights, and each of them can vary.
• This is usually far more variation than there is in the underlying function.
• If we train for too long, we will overfit the data, which means that we have learnt about the noise and inaccuracies in the data as well as the actual function. The model that we learn will therefore be much too complicated and won't be able to generalise.



Overfitting

[Figure: left, finding the generating function; right, overfitting, where the network matches the inputs perfectly, including their noise, which reduces the chance of generalisation.]
Underfitting

• A machine learning algorithm is said to underfit when it cannot capture the underlying trend of the data.

• I.e., it performs poorly on the training data itself, and consequently performs poorly on testing data as well.



Underfitting

• The model is too simple for the data.

• E.g., the data is quadratic but the model is linear.

[Link]



Underfitting

[Link]



Variance

• The difference between the error rate on training data and on testing data is called variance.

• If the difference is high, the model is said to have high variance; when the difference in errors is low, it has low variance.

• A well-generalised model usually has low variance.

[Link]



Training Set

• The sample of data used to fit the model.


• The actual dataset that we use to train the model (weights and
biases in the case of a Neural Network).
• The model sees and learns from this data.



Validation Set
• is a set of data, separate from the training set, that is used to validate our
model performance during training.

• This validation process gives information that helps us tune the model’s
hyperparameters and configurations accordingly.

• It is like a critic telling us whether the training is moving in the right


direction or not.

• The model is trained on the training set, and, simultaneously, the model
evaluation is performed on the validation set after every epoch.



Test Set

• The test set is a separate set of data used to test the model after training is complete.

• It provides an unbiased final measure of model performance in terms of accuracy, precision, etc.

• Simply put, it answers the question "How well does the model perform?"



Visualisation of the Splits

[Link]



How to split data?

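As a minimal sketch in plain Python: the helper below shuffles the data and cuts it into the three sets discussed on the previous slides. The 60/20/20 proportions and the function name are illustrative assumptions, not something the slides prescribe.

```python
import random

def train_val_test_split(data, val_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle the data, then cut it into train/validation/test parts."""
    items = list(data)
    random.Random(seed).shuffle(items)  # fixed seed for reproducibility
    n = len(items)
    n_test = int(n * test_frac)
    n_val = int(n * val_frac)
    test = items[:n_test]
    val = items[n_test:n_test + n_val]
    train = items[n_test + n_val:]
    return train, val, test

# 100 samples -> 60 for training, 20 for validation, 20 for testing
train, val, test = train_val_test_split(range(100))
```

The three sets are disjoint, so the test set never influences training.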


Resampling Techniques

• Machine learning models often fail to generalize well on data they have not been trained on.

• Sometimes they fail miserably; sometimes they perform only somewhat better than that.

• To be sure that the model can perform well on unseen data, we use a re-sampling technique called Cross-Validation.



Cross Validation
• is a technique used to evaluate the performance of a model on unseen
data.

• It involves dividing the available data into multiple folds or subsets,


using one of these folds as a validation set, and training the model on
the remaining folds.

• This process is repeated multiple times, each time using a different fold
as the validation set.

• Finally, the results from each validation step are averaged to produce a
more robust estimate of the model’s performance



k- fold Cross Validation

[Link]



k- fold Cross Validation

• In K-Fold CV, we have a parameter 'k'.

• This parameter decides into how many folds the dataset is divided.

• Every fold appears in the training set (k-1) times and in the validation set exactly once, which ensures that every observation in the dataset is used for both training and validation, enabling the model to learn the underlying data distribution better.

• The value of 'k' used is generally 5 or 10.

[Link]

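The fold construction described above can be sketched in a few lines of plain Python (no library assumed); the function name is an illustrative choice:

```python
def k_fold_indices(n, k):
    """Yield (train_idx, val_idx) for each of the k folds.

    Each sample lands in the validation set exactly once and in the
    training set k-1 times, as the slides describe."""
    # distribute any remainder across the first n % k folds
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = [i for i in range(n) if i < start or i >= start + size]
        yield train_idx, val_idx
        start += size

# 10 samples, k = 5: five folds of 2 validation samples each
folds = list(k_fold_indices(10, 5))
```

In each iteration the model would be trained on `train_idx` and evaluated on `val_idx`, with the k scores averaged at the end.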


Examples of k- fold CV

[Link]



Leave-One-Out Cross-Validation

Extreme case of k-fold CV, with k equal to the number of samples: the algorithm is validated on just one piece of data, training on all of the rest.

[Link]

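A minimal sketch of this extreme case (the function name is an illustrative assumption): each sample in turn becomes a one-element validation set.

```python
def leave_one_out(n):
    """k-fold CV with k = n: each sample is the validation set exactly once."""
    for i in range(n):
        train_idx = [j for j in range(n) if j != i]  # everything except sample i
        yield train_idx, [i]

# with 4 samples there are 4 splits, each holding out one sample
splits = list(leave_one_out(4))
```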


Bootstrapping

• A statistic is calculated on multiple bags of random samples drawn with replacement.

• Bootstrapping can be used to infer population-level results for machine learning models trained on random samples drawn with replacement.

[Link]

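A minimal sketch of the resampling step, using only the standard library; the toy data and function name are illustrative assumptions:

```python
import random

def bootstrap_statistic(sample, stat, n_resamples=1000, seed=0):
    """Recompute `stat` on many bags drawn with replacement from `sample`."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    n = len(sample)
    results = []
    for _ in range(n_resamples):
        # draw n items with replacement: some repeat, some are left out
        bag = [sample[rng.randrange(n)] for _ in range(n)]
        results.append(stat(bag))
    return results

data = [2, 4, 4, 4, 5, 5, 7, 9]  # toy sample, mean 5.0
means = bootstrap_statistic(data, lambda xs: sum(xs) / len(xs))
```

The spread of `means` estimates how much the statistic would vary across samples, which is the population-level inference the slide refers to.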


Confusion matrix

[Link]

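The four cells of the binary confusion matrix can be counted directly from label lists; the toy labels below are chosen arbitrarily for illustration:

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, FN, TN for binary labels (1 = positive class)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    return tp, fp, fn, tn

y_true = [1, 1, 0, 0, 1, 0]  # actual labels (toy example)
y_pred = [1, 0, 0, 1, 1, 0]  # model predictions
counts = confusion_counts(y_true, y_pred)
```

All the metrics on the following slides (accuracy, error rate, precision, recall) are ratios of these four counts.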


Accuracy
The higher the accuracy, the better the model.

Accuracy = (TP + TN) / (TP + FP + FN + TN)

(100 + 150) / (100 + 20 + 30 + 150) = 0.83

This means that the machine learning algorithm is 83% accurate in its predictions.

[Link]

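The slide's worked example, reproduced as a two-line check using the same counts:

```python
# counts from the slide's confusion-matrix example
TP, FP, FN, TN = 100, 20, 30, 150
accuracy = (TP + TN) / (TP + FP + FN + TN)  # 250 / 300 ≈ 0.83
```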


Misclassification Rate
• Also referred to as the error rate.

• The misclassification rate defines how often the model makes incorrect predictions.

Error rate = (FP + FN) / (TP + FP + FN + TN)

(20 + 30) / (100 + 20 + 30 + 150) = 0.17

• Hence, the machine learning algorithm is 17% inaccurate in its predictions.

[Link]



Precision

Precision = TP / (TP + FP) = 100 / (100 + 20) = 0.83

This means that out of all the positive predictions, 83% were true.

[Link]



Recall

Recall = TP / (TP + FN) = 100 / (100 + 30) ≈ 0.77

This means that out of all the actual positive cases, only 77% were predicted correctly.

[Link]
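Both ratios, computed from the same counts used in the slides' worked examples:

```python
# counts from the slides' confusion-matrix example
TP, FP, FN, TN = 100, 20, 30, 150
precision = TP / (TP + FP)  # fraction of positive predictions that are correct
recall = TP / (TP + FN)     # fraction of actual positives that are found
```

Note the trade-off: precision penalises false positives, recall penalises false negatives.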
F1-score

[Link]

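The F1-score is the harmonic mean of precision and recall; with the counts from the previous slides it works out to exactly 0.8:

```python
# counts from the slides' confusion-matrix example
TP, FP, FN = 100, 20, 30
precision = TP / (TP + FP)                       # 100 / 120
recall = TP / (TP + FN)                          # 100 / 130
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean = 0.8
```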


Receiver Operating Characteristics

[Link]

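An ROC curve is traced by sweeping a decision threshold over the classifier's scores and recording the false-positive rate against the true-positive rate at each threshold. A minimal sketch (the function name and toy scores are illustrative assumptions):

```python
def roc_points(y_true, scores):
    """Sweep a threshold over the scores and record (FPR, TPR) pairs."""
    thresholds = sorted(set(scores), reverse=True)
    pos = sum(y_true)            # number of actual positives
    neg = len(y_true) - pos      # number of actual negatives
    points = [(0.0, 0.0)]        # threshold above every score
    for th in thresholds:
        tp = sum(1 for t, s in zip(y_true, scores) if t == 1 and s >= th)
        fp = sum(1 for t, s in zip(y_true, scores) if t == 0 and s >= th)
        points.append((fp / neg, tp / pos))
    return points

# two negatives and two positives with toy scores
points = roc_points([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

A perfect classifier passes through (0, 1); a random one follows the diagonal from (0, 0) to (1, 1).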


Evaluating Regression Models

[Link]





Conclusion

• Performance Evaluation Metrics
  • Training – Validation – Test Split
  • Overfitting – Underfitting
  • Confusion Matrix
  • Receiver Operating Characteristics



Thank You
