Cross Validation
Contents
Understanding Underfitting and Overfitting:
Overfit Model
Overfitting occurs when a statistical model or
machine learning algorithm captures the
noise of the data. Intuitively, overfitting
occurs when the model or the algorithm fits
the data too well.
An overfit model yields good accuracy on the
training data set but poor results on new
data sets. Such a model is of little use in
the real world, as it is not able to predict
outcomes for new cases.
Underfit Model
Underfitting occurs when a statistical model or machine
learning algorithm cannot capture the underlying trend
of the data.
Intuitively, underfitting occurs when the model or the
algorithm does not fit the data well enough.
Underfitting is often a result of an excessively simple
model or poor data preparation, for example:
Missing data is not handled properly.
No outlier treatment is applied.
Relevant features, i.e. features that do contribute to
the target variable, are removed.
How to tackle the problem of Overfitting:
The answer is Cross Validation
A key challenge with overfitting, and with machine
learning in general, is that we can’t know how well our
model will perform on new data until we actually test it.
There are different types of Cross Validation
techniques, but the overall concept remains the same:
• Partition the data into a number of subsets
• Hold out one subset at a time and train the model on
the remaining subsets
• Test the model on the held-out subset
How K-fold works
Divide your training data into K equal-sized
“folds.”
The algorithm iterates through each fold, treating
that fold as holdout data, training a model on
all the other K-1 folds, and evaluating the
model’s performance on the one holdout fold.
This results in K different models,
each with an out-of-sample accuracy
score on a different holdout set.
The average of these K models’ out-of-sample
scores is the model’s cross-validation score.
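The fold-splitting step above can be sketched in pure Python. The helper names `make_folds` and `k_fold_splits` are illustrative, not from any library:

```python
# Minimal sketch of k-fold splitting (pure Python, no libraries).

def make_folds(n_samples, k):
    """Split indices 0..n_samples-1 into k roughly equal folds."""
    indices = list(range(n_samples))
    fold_size, remainder = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        # The first `remainder` folds get one extra sample.
        size = fold_size + (1 if i < remainder else 0)
        folds.append(indices[start:start + size])
        start += size
    return folds

def k_fold_splits(n_samples, k):
    """Yield (train_indices, holdout_indices) for each of the k folds."""
    folds = make_folds(n_samples, k)
    for i, holdout in enumerate(folds):
        train = [idx for j, f in enumerate(folds) if j != i for idx in f]
        yield train, holdout
```

For 10 samples and k=5, each of the 5 splits holds out 2 samples and trains on the other 8.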
What is Cross Validation?
Cross-validation is a technique for evaluating ML models by
training several ML models on subsets of the available input
data and evaluating them on the complementary subset of the
data. In k-fold cross-validation, you split the input data
into k subsets of data (also known as folds).
Here are the steps involved in cross validation:
1. You reserve a sample data set
2. Train the model using the remaining part of the dataset
3. Use the reserved sample as the test (validation) set. This will
help you gauge the effectiveness of your model’s
performance. If your model delivers a positive result on
validation data, go ahead with the current model. It rocks!
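The three steps above can be sketched with a toy majority-class "model" (all names here are illustrative, not a library API):

```python
# Reserve a validation sample, train on the rest, score on the reserve.

def train_majority_model(labels):
    """'Train' by memorizing the most common label in the training set."""
    return max(set(labels), key=labels.count)

def accuracy(model_label, labels):
    """Fraction of labels the majority-class model predicts correctly."""
    return sum(1 for y in labels if y == model_label) / len(labels)

data = [0, 0, 1, 0, 1, 0, 0, 1, 0, 0]
reserved = data[:3]        # step 1: reserve a sample data set
training = data[3:]        # step 2: train on the remaining part
model = train_majority_model(training)
score = accuracy(model, reserved)   # step 3: gauge effectiveness
```

Here the "model" is deliberately trivial; in practice any estimator with fit/predict behaviour slots into the same three steps.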
Why to use Cross Validation?
Cross Validation is a very useful technique for
assessing the effectiveness of your model,
particularly in cases where you need to
mitigate over-fitting.
The process of cross validation in general
Types of Cross Validation
K-Fold Cross Validation
Stratified K-fold Cross Validation
Leave One Out Cross Validation
k-Fold Cross Validation:
The procedure has a single parameter called k that refers to the number of groups that a given data sample is to be split into. As such, the procedure is often called k-fold cross-validation. When a specific value for k is chosen, it may be used in place of k in the reference to the model, such as k=10 becoming 10-fold cross-validation.
If k=5 the dataset will be divided into 5 equal parts and the below process will run 5 times, each time with a different holdout set.
1. Take one group as the holdout or test data set
2. Take the remaining groups as a training data set
3. Fit a model on the training set and evaluate it on the test set
4. Retain the evaluation score and discard the model
At the end of the above process, summarize the skill of the model using the sample of model evaluation scores.
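The four-step loop can be sketched in pure Python with a toy majority-class model (illustrative names, not a library API):

```python
# k-fold cross-validation loop: hold out each fold once, fit on the rest,
# score on the holdout, and average the retained scores.

def cross_val_score(labels, k):
    n = len(labels)
    fold_size, rem = divmod(n, k)
    scores, start = [], 0
    for i in range(k):
        size = fold_size + (1 if i < rem else 0)
        holdout = labels[start:start + size]            # step 1: holdout set
        train = labels[:start] + labels[start + size:]  # step 2: training set
        majority = max(set(train), key=train.count)     # step 3: fit and ...
        acc = sum(1 for y in holdout if y == majority) / len(holdout)
        scores.append(acc)                              # step 4: retain score
        start += size
    return sum(scores) / k  # summarize: the cross-validation score

labels = [0, 0, 1, 0, 1, 0, 1, 0, 0, 0]
cv_score = cross_val_score(labels, k=5)  # average of 5 holdout accuracies
```

Note that each fitted "model" (here just the majority label) is discarded after scoring; only the evaluation scores are kept.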
How to decide the value of k?
The value for k is chosen such that each train/test group
of data samples is large enough to be statistically
representative of the broader dataset.
A value of k=10 is very common in the field of applied
machine learning, and is recommended if you are struggling
to choose a value for your dataset.
If a value for k is chosen that does not evenly split the
data sample, then some groups will contain a remainder of
the examples. It is preferable to split the data sample into
k groups with the same number of samples, so that the
model skill scores are all comparable.
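A quick sketch of how an uneven split is commonly handled, spreading the remainder over the first folds (`fold_sizes` is an illustrative helper, not a library function):

```python
# Fold sizes for n_samples split into k folds: when k does not divide
# n_samples evenly, the first folds each take one extra sample.

def fold_sizes(n_samples, k):
    """Return the size of each of the k folds."""
    base, rem = divmod(n_samples, k)
    return [base + (1 if i < rem else 0) for i in range(k)]

fold_sizes(10, 3)  # 10 samples cannot split evenly into 3 folds
fold_sizes(10, 5)  # 10 samples split evenly into 5 folds of 2
```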
Stratified k-Fold Cross Validation:
Same as K-Fold Cross Validation, with just a slight difference:
The splitting of data into folds may be governed by criteria such as ensuring that each fold has the same proportion of observations with a given categorical value, such as the class outcome value. This is called stratified cross-validation.
For example, a stratified k-fold split can be made on the basis of Gender (M or F), so that each fold preserves the same proportion of M and F observations.
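A minimal sketch of a stratified split, assuming a simple round-robin deal of each class's indices into the folds (`stratified_folds` is an illustrative name, not a library function):

```python
# Stratified fold assignment: indices of each class are dealt round-robin
# into k folds, so every fold keeps roughly the same class proportions.

def stratified_folds(labels, k):
    folds = [[] for _ in range(k)]
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    for indices in by_class.values():
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)  # deal each class across the folds
    return folds

# e.g. 6 "M" and 4 "F" labels split into 2 folds of 3 M + 2 F each
labels = ["M"] * 6 + ["F"] * 4
folds = stratified_folds(labels, 2)
```

A plain (unstratified) split of this data could easily put all four F samples into one fold; the stratified split cannot.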
Leave One Out Cross Validation (LOOCV):
This approach leaves 1 data point out of the training data, i.e. if there are n data points in the original sample then n-1 points are used to train the model and the 1 remaining point is used as the validation set. This is repeated for every way
the original sample can be separated like this, and then the error is averaged over all trials to give the overall effectiveness.
The number of possible combinations is equal to the number of data points in the original sample, or n.
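The LOOCV procedure can be sketched with a toy "model" that predicts the mean of the training points (all names here are illustrative):

```python
# Leave-one-out cross-validation: each of the n points is held out once,
# the model is fit on the other n-1 points, and the errors are averaged.

def loocv_scores(values):
    """For each point, train on the other n-1 values and record the
    squared error of predicting the held-out point with the training mean."""
    errors = []
    for i, held_out in enumerate(values):
        train = values[:i] + values[i + 1:]   # the n-1 training points
        prediction = sum(train) / len(train)  # toy model: training mean
        errors.append((held_out - prediction) ** 2)
    return errors

values = [1.0, 2.0, 3.0, 4.0]
mean_error = sum(loocv_scores(values)) / len(values)  # average over all n trials
```

With n=4 points there are exactly n=4 trials, matching the combination count stated above.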