Understanding Classification Metrics

The document outlines the process of classification in machine learning, detailing data preprocessing, model training, and evaluation using metrics such as confusion matrix, accuracy, precision, recall, and F1-score. It emphasizes the importance of understanding true positives, true negatives, false positives, and false negatives in evaluating model performance, particularly in balanced versus imbalanced datasets. Additionally, it includes a case study on heart disease prediction, highlighting the necessary steps for data preparation and model evaluation.

Classification Metrics

Lecture # 6
Addressing a Classification Task
▪ Data preprocessing
-Data cleaning
-Handling categorical data
-Data scaling
▪ Visualize data
-Observe general trends
-Spot outliers
▪ Separate the feature (X) and target (Y) columns in the dataset
▪ Split the data into train and test sets
-Now we have four sets: X_train, Y_train, X_test, Y_test
▪ Train the classification model on the train data (X_train and Y_train)
▪ Use the model to make predictions (Y_pred) for the test data (X_test only)
▪ Evaluate the model by comparing the predicted results (Y_pred) with the actual/expected values (Y_test)
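A minimal sketch of this workflow with scikit-learn. The dataset (a built-in toy dataset) and the model (logistic regression) are illustrative assumptions, not choices made in the slides:

```python
# Illustrative sketch of the classification workflow above.
# Dataset and model are assumptions for demonstration only.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X, Y = load_breast_cancer(return_X_y=True)   # features (X) and target (Y)

# Split into four sets: X_train, Y_train, X_test, Y_test
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.2, random_state=42)

# Data scaling (fit the scaler on train data only, to avoid leakage)
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

model = LogisticRegression(max_iter=1000).fit(X_train, Y_train)
Y_pred = model.predict(X_test)               # predictions for test data
print(accuracy_score(Y_test, Y_pred))        # compare Y_pred with Y_test
```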
Classification Evaluation - The Confusion Matrix
▪ Confusion Matrix is a performance measurement for machine learning classification.
▪ It summarizes the performance of a classification algorithm.
-The numbers of correct and incorrect predictions are summarized with count values and broken down by each class. This is the key to the confusion matrix.

▪ The confusion matrix shows the ways in which your classification model is confused when it
makes predictions.

▪ It gives insight not only into the errors being made by the classifier but more importantly the
types of errors that are being made.

▪ A binary classifier can have four classification results: true positives (TP), true negatives (TN),
false positives (FP), and false negatives (FN).
-The first two are correct classifications.
The Confusion Matrix
True Positive: TP is the number of true positives.
▪ You predicted positive and it is true.
▪ You predicted that a man has cancer and he actually has it.

True Negative: TN is the number of true negatives.
▪ You predicted negative and it is true.
▪ You predicted that a man does not have cancer and he actually does not have it.

False Positive: FP is the number of false positives.
▪ Also called a type I error.
▪ You predicted positive and it is false.
▪ You predicted that a man has cancer but he actually does not have it.

False Negative: FN is the number of false negatives.
▪ Also called a type II error.
▪ You predicted negative and it is false.
▪ You predicted that a man does not have cancer but he actually has it.
The Confusion Matrix
Constructing a Confusion Matrix
▪ Make a prediction for each row in your test/validation dataset.
▪ From the expected outcomes and the predictions, count the number of correct and incorrect predictions for each class, organized by the class that was predicted.
▪ These numbers are then organized into a table, or a matrix, as follows:
-Each row of the matrix corresponds to an actual class.
-Each column of the matrix corresponds to a predicted class.
▪ The counts of correct and incorrect classifications are then filled into the table.

                 Predicted Positive               Predicted Negative
Actual Positive  True positives                   False negatives (type II error)
Actual Negative  False positives (type I error)   True negatives

The diagonal (TP and TN) holds the accurate results; the aim of classification is to reduce type I and type II errors!
The Confusion Matrix
Example
A classification algorithm is used to identify an email as Spam.
Consider the following predictions from the classification algorithm:

Expected    Predicted
Non-Spam    Spam
Spam        Spam
Non-Spam    Non-Spam
Spam        Spam
Spam        Non-Spam
Non-Spam    Non-Spam
Non-Spam    Non-Spam
Spam        Spam
Non-Spam    Spam
Non-Spam    Non-Spam

The resulting confusion matrix will be:

                 Predicted Positive   Predicted Negative
Actual Positive  TP = 3               FN = 1
Actual Negative  FP = 2               TN = 4
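This matrix can be reproduced with scikit-learn's `confusion_matrix`; the label lists below encode the spam example above:

```python
# Reproducing the spam example with scikit-learn's confusion_matrix.
from sklearn.metrics import confusion_matrix

# "Spam" is the positive class.
expected  = ["Non-Spam", "Spam", "Non-Spam", "Spam", "Spam",
             "Non-Spam", "Non-Spam", "Spam", "Non-Spam", "Non-Spam"]
predicted = ["Spam", "Spam", "Non-Spam", "Spam", "Non-Spam",
             "Non-Spam", "Non-Spam", "Spam", "Spam", "Non-Spam"]

# labels=["Spam", "Non-Spam"] puts the positive class in the first row/column,
# so the matrix reads [[TP, FN], [FP, TN]].
cm = confusion_matrix(expected, predicted, labels=["Spam", "Non-Spam"])
tp, fn = cm[0]
fp, tn = cm[1]
print(tp, fn, fp, tn)  # 3 1 2 4
```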
Evaluation Metrics Based on Confusion Matrix
Balanced and Imbalanced Datasets
▪ A balanced dataset is a dataset where each output class is represented by the same or almost the same number of input samples; otherwise it is an unbalanced or biased dataset.
▪ For example, for 2 classes, Class1 and Class2:
-50% of samples belong to Class1 and 50% to Class2 – balanced.
-60% of samples belong to Class1 and 40% to Class2 – roughly balanced.
-80% of samples belong to Class1 and 20% to Class2 – unbalanced.
▪ Feeding imbalanced data to your classifier can make it biased in favor of the majority class, simply because it did not have enough data to learn about the minority class.
Accuracy
▪ Classification accuracy is the ratio of correct predictions to the total predictions made.
▪ Accuracy = (Number of correct predictions) / (Total number of predictions) = (TP + TN) / (TP + FP + FN + TN)
▪ The appeal of accuracy is that it has an intuitive, plain-English explanation.
▪ It works well for balanced classes.
▪ In the presence of imbalanced classes, accuracy suffers from a paradox: a model can be highly accurate yet lack predictive power.
▪ For example, suppose we predict the presence of a very rare cancer that occurs in 0.1% of the population, and after training, the model's accuracy is 95%. Since 99.9% of people do not have the cancer, a naive model that simply "predicted" that nobody has that form of cancer would be 4.9 percentage points more accurate, yet it clearly cannot predict anything.
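The accuracy paradox can be demonstrated in a few lines; the 0.1% prevalence mirrors the rare-cancer example above:

```python
# Demonstrating the accuracy paradox on an imbalanced dataset.
from sklearn.metrics import accuracy_score

n = 100_000
y_true = [1] * 100 + [0] * (n - 100)    # 0.1% positives (has cancer)

# A "naive" model that always predicts the majority class (no cancer)
y_naive = [0] * n
print(accuracy_score(y_true, y_naive))  # 0.999 - yet it never detects a case
```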
Evaluation Metrics Based on Confusion Matrix
Precision
▪ Also called positive predictive value.
▪ Precision is the ratio of correct positive predictions to all predicted positives.
-Out of the total predicted positives, how many are correct.
-When something is predicted positive, how likely it is to be right.
▪ Precision measures exactness or quality.

        PP   PN
   AP   TP   FN
   AN   FP   TN

▪ Precision = (Number of correct positive predictions) / (Total predicted positives) = TP / (TP + FP)
▪ To increase precision, try to minimize type I errors (FP).
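Computed from the counts of the spam example (TP = 3, FP = 2):

```python
# Precision from confusion-matrix counts: of the 5 predicted
# positives in the spam example, 3 were actually positive.
tp, fp = 3, 2
precision = tp / (tp + fp)
print(precision)  # 0.6
```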
Evaluation Metrics Based on Confusion Matrix
Recall Score
▪ Also called sensitivity or true positive rate (TPR).
▪ Recall score is the ratio of correct positive predictions to all the actual positives.
-Out of the total actual positives, how many were correctly predicted.
-How many positives were caught and how many were missed.
▪ Recall measures completeness.

        PP   PN
   AP   TP   FN
   AN   FP   TN

▪ Recall Score = (Number of correct positive predictions) / (Total actual positives) = TP / (TP + FN)
▪ To increase recall, try to minimize type II errors (FN).
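Computed from the counts of the spam example (TP = 3, FN = 1):

```python
# Recall from confusion-matrix counts: of the 4 actual positives
# in the spam example, 3 were caught and 1 was missed.
tp, fn = 3, 1
recall = tp / (tp + fn)
print(recall)  # 0.75
```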
Evaluation Metrics Based on Confusion Matrix
Precision or Recall Score?
▪ It depends on the use case.
-See which type of error is more harmful or has the greater impact: FP or FN.

        PP   PN
   AP   TP   FN
   AN   FP   TN

▪ In a spam filter, where positive means that an email is spam, FP is more harmful, as the user might miss an important email if it is wrongly classified as spam.
-To reduce FP, use precision as the evaluation metric.
▪ In a cancer detector, where positive means that the person has cancer, FN is more harmful, as a person with cancer might go untreated because it was not detected.
-To reduce FN, use recall as the evaluation metric.
Evaluation Metrics Based on Confusion Matrix
F1-score
▪ Used when both FP and FN are almost equally important.
▪ Also useful when precision and recall alone are less intuitive to compare.
▪ F1 represents a balance between recall and precision, where the relative contributions of both are equal.
▪ The F1-score is the harmonic mean of precision and recall (a kind of average used for ratios).
▪ F1 score = 2 × (Precision × Recall) / (Precision + Recall)
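Why a harmonic mean? Unlike the arithmetic mean, it is pulled toward the smaller value, so a model cannot hide a poor recall behind a high precision (the values below are illustrative):

```python
# F1 as the harmonic mean of precision and recall.
def f1(precision, recall):
    return 2 * precision * recall / (precision + recall)

print(f1(0.6, 0.75))  # ~0.667
print(f1(1.0, 0.1))   # ~0.182, while the arithmetic mean would say 0.55
```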
Evaluation Metrics Based on Confusion Matrix
Revisiting the earlier spam example:

                 Predicted Positive   Predicted Negative
Actual Positive  TP = 3               FN = 1
Actual Negative  FP = 2               TN = 4

Accuracy = (3 + 4) / 10 = 7/10 ≡ 70%
Precision = 3 / (3 + 2) = 3/5 ≡ 60%
Recall Score = 3 / (3 + 1) = 3/4 ≡ 75%
F1 score = 2 × (0.6 × 0.75) / (0.6 + 0.75) = 2 × 0.45 / 1.35 ≈ 66.7%
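These numbers can be checked with scikit-learn; the label lists below reproduce the spam example's counts TP=3, FN=1, FP=2, TN=4 (with 1 = Spam):

```python
# Verifying the worked example with scikit-learn's metric functions.
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score)

y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]

print(accuracy_score(y_true, y_pred))   # 0.7
print(precision_score(y_true, y_pred))  # 0.6
print(recall_score(y_true, y_pred))     # 0.75
print(f1_score(y_true, y_pred))         # ~0.667
```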
The Confusion Matrix
▪ This matrix can easily be extended to a multiclass classifier by adding more rows and columns to the confusion matrix.

Example 2
Consider a classification problem trained on the following 10 classes:
Plane, Car, Bird, Cat, Deer, Dog, Frog, Horse, Ship and Truck.
▪ True positives for class Car?
-The diagonal value in the Car row/column.
▪ False positives for class Car?
-The sum of the other values in the Car column.
▪ False negatives for class Car?
-The sum of the other values in the Car row.
▪ True negatives for class Car?
-All the remaining values.
▪ Size of the validation set?
-TP + FP + FN + TN
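These per-class quantities can be read off programmatically. The 3×3 matrix below is a made-up example (rows = actual, columns = predicted) for just three of the classes, Plane, Car and Bird:

```python
# Per-class TP/FP/FN/TN from a multiclass confusion matrix.
import numpy as np

cm = np.array([[5, 1, 0],    # actual Plane
               [2, 7, 1],    # actual Car
               [0, 1, 8]])   # actual Bird

car = 1                       # index of class Car
tp = cm[car, car]             # diagonal value
fp = cm[:, car].sum() - tp    # rest of the Car column
fn = cm[car, :].sum() - tp    # rest of the Car row
tn = cm.sum() - tp - fp - fn  # everything else
print(tp, fp, fn, tn)         # 7 2 3 13
```

Note that TP + FP + FN + TN equals the total number of validation samples, as stated above.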
Case Study - The Heart Disease Dataset
▪ Check out these two datasets:
▪ Dataset 1: [Link]
▪ Dataset 2: [Link]
Case Study - The Heart Disease Dataset
Dataset 2
▪ Dataset size: 303 rows
▪ Classification: binary (1 class label with values 0 and 1)
▪ Features: 13 features
1. age: Age in years
2. sex: Gender (1 = male; 0 = female)
3. cp: Chest pain type (Value 1: typical angina, Value 2: atypical angina, Value 3: non-anginal pain, Value 4: asymptomatic)
4. trestbps: Resting blood pressure (in mm Hg on admission to the hospital)
5. chol: Serum cholesterol in mg/dl
6. fbs: Fasting blood sugar > 120 mg/dl (1 = true; 0 = false)
7. restecg: Resting electrocardiographic results
8. thalach: Maximum heart rate achieved
9. exang: Exercise induced angina (1 = yes; 0 = no)
10. oldpeak: ST depression induced by exercise relative to rest
11. slope: The slope of the peak exercise ST segment
12. ca: Number of major vessels (0-3) colored by fluoroscopy
13. thal: 3 = normal; 6 = fixed defect; 7 = reversible defect
14. target: 1 or 0 (num), the predicted attribute (not a feature) — diagnosis of heart disease (angiographic disease status); Value 0: < 50% diameter narrowing, Value 1: > 50% diameter narrowing
Case Study - The Heart Disease Dataset
▪ Data preprocessing
-Data cleaning
-Handling categorical data
-Data scaling
▪ Visualize data
-Observe general trends
-Spot outliers
▪ Separate the feature (X) and target (Y) columns in the dataset
▪ Split the data into train and test data
-Now we have four sets: X_train, Y_train, X_test, Y_test
▪ Train the classification model on the train data (X_train and Y_train)
▪ Use the model to make predictions (Y_pred) for the test data (X_test only)
▪ Evaluate the model by comparing the predicted results (Y_pred) with the actual/expected values (Y_test)
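The case-study steps above could be sketched as follows. Since the dataset links are not reproduced here, a small synthetic DataFrame stands in for the real 303-row data (with the real files you would start from `pd.read_csv(...)` instead), so the columns and target rule below are assumptions for illustration only:

```python
# End-to-end sketch for the heart disease case study, on synthetic
# stand-in data (real dataset not bundled with these slides).
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, recall_score

rng = np.random.default_rng(0)
n = 303
df = pd.DataFrame({
    "age": rng.integers(29, 78, n),
    "chol": rng.integers(126, 565, n),
    "thalach": rng.integers(71, 203, n),
})
# Synthetic target loosely tied to max heart rate (illustrative only)
df["target"] = (df["thalach"] + rng.normal(0, 30, n) < 140).astype(int)

X, Y = df.drop(columns=["target"]), df["target"]
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.2, random_state=42, stratify=Y)

scaler = StandardScaler().fit(X_train)   # scale using train data only
model = LogisticRegression(max_iter=1000).fit(
    scaler.transform(X_train), Y_train)
Y_pred = model.predict(scaler.transform(X_test))

print(confusion_matrix(Y_test, Y_pred))
# FN are costly in disease detection, so recall matters most here
print(recall_score(Y_test, Y_pred))
```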
