C1W2 Ungraded Lab: misclassification calculation

In C1W2_Ungraded_Lab_Birds_Cats_Dogs.ipynb, in the cell right below the confusion matrix, I don’t think the misclassification rates are computed the right way.

Let’s take the misclassification rate of birds as an example. In my view, we submit birds to the model (one by one) and count the wrong predictions (i.e., real birds predicted as cats or dogs). The number of wrong predictions (numerator) divided by the total number of birds submitted for classification (denominator) should be the misclassification rate of birds, IMHO.

In Python:

((y_true == 0) & ((y_pred_imbalanced == 2) | (y_pred_imbalanced == 1))).sum() / (y_true == 0).sum()

The formula in the notebook instead considers all predictions of birds and counts how many times the ground truth was a cat or a dog rather than a bird.

In Python:

((y_pred_imbalanced == 0) & ((y_true == 2) | (y_true == 1))).sum() / (y_pred_imbalanced == 0).sum()

Discussion: In my view, the misclassification rate is the false negative rate (FNR) of each class. The formula in the notebook calculates the false discovery rate (FDR) instead.
Regarding the terminology, see Wikipedia.
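To make the distinction concrete, here is a minimal sketch (my own illustration, not from the notebook) using a made-up 3x3 confusion matrix with rows as true labels and columns as predicted labels, which is how scikit-learn builds it:

import numpy as np

# Toy confusion matrix (illustrative numbers): rows = true class, columns = predicted class
cm = np.array([[3, 0, 1],
               [0, 1, 3],
               [1, 1, 3]])

row_totals = cm.sum(axis=1)   # number of true instances of each class
col_totals = cm.sum(axis=0)   # number of predictions of each class
correct = np.diag(cm)         # correctly classified instances per class

fnr = (row_totals - correct) / row_totals   # per-class false negative rate (what I call the misclassification rate)
fdr = (col_totals - correct) / col_totals   # per-class false discovery rate (what the notebook cell computes)

print("FNR per class:", fnr)
print("FDR per class:", fdr)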

Your view, guys?

Thanks for pointing this out. I’ve asked the staff to fix it.

Please use this snippet:

# Row i of the confusion matrix holds the true class i, so the off-diagonal entries
# of each row divided by the row total give the per-class misclassification rate
misclassified_birds = (imbalanced_cm[0, 1] + imbalanced_cm[0, 2])/np.sum(imbalanced_cm, axis=1)[0]
misclassified_cats = (imbalanced_cm[1, 0] + imbalanced_cm[1, 2])/np.sum(imbalanced_cm, axis=1)[1]
misclassified_dogs = (imbalanced_cm[2, 0] + imbalanced_cm[2, 1])/np.sum(imbalanced_cm, axis=1)[2]

print(f"Proportion of misclassified birds: {misclassified_birds*100:.2f}%")
print(f"Proportion of misclassified cats: {misclassified_cats*100:.2f}%")
print(f"Proportion of misclassified dogs: {misclassified_dogs*100:.2f}%")

Notebook has been updated, thanks to @zbynekb for flagging and @balaji.ambresh for coming up with the solution :slight_smile:

And doesn’t the code for the confusion matrix also have to be changed? The confusion matrix image seems inconsistent with the new snippet above.

It would seem to me that the arguments in the code:

imbalanced_cm = confusion_matrix(y_true, y_pred_imbalanced)

should be reversed in order to align:

imbalanced_cm = confusion_matrix(y_pred_imbalanced, y_true)

The code provided in the notebook is correct according to the latest scikit-learn API, version 1.5.1 (see confusion_matrix): confusion_matrix(y_true, y_pred) places the true labels along the rows and the predicted labels along the columns, so it already lines up with the row-wise snippet above and the arguments should not be reversed.

Here’s an example:

import numpy as np
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

class_labels = ['birds', 'cats', 'dogs']
y_true = ["dogs", "dogs", "dogs", "dogs", "dogs",
          "cats", "cats", "cats", "cats", 
          "birds", "birds", "birds", "birds"]
y_pred = ["dogs", "dogs", "cats", "birds", "dogs",
         "cats", "dogs", "dogs", "dogs",
         "birds", "birds", "birds", "dogs"]

imbalanced_cm = confusion_matrix(y_true, y_pred, labels=class_labels)
cmd = ConfusionMatrixDisplay(imbalanced_cm, display_labels=class_labels)
cmd.plot()

# from notebook
misclassified_birds = (imbalanced_cm[0, 1] + imbalanced_cm[0, 2])/np.sum(imbalanced_cm, axis=1)[0]
misclassified_cats = (imbalanced_cm[1, 0] + imbalanced_cm[1, 2])/np.sum(imbalanced_cm, axis=1)[1]
misclassified_dogs = (imbalanced_cm[2, 0] + imbalanced_cm[2, 1])/np.sum(imbalanced_cm, axis=1)[2]

print(f"Proportion of misclassified birds: {misclassified_birds*100:.2f}%")
print(f"Proportion of misclassified cats: {misclassified_cats*100:.2f}%")
print(f"Proportion of misclassified dogs: {misclassified_dogs*100:.2f}%")

Proportion of misclassified birds: 25.00%
Proportion of misclassified cats: 75.00%
Proportion of misclassified dogs: 40.00%

# I prefer this vectorized version
total_instances = imbalanced_cm.sum(axis=1)        # number of true instances per class (row sums)
correct_classifications = np.diag(imbalanced_cm)   # correctly classified instances per class
misclassifications = total_instances - correct_classifications
print(misclassifications * 100 / total_instances)  # per-class misclassification rate in %

[25. 75. 40.]
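Another option, if you prefer (just a thought, not something the notebook needs): confusion_matrix also accepts normalize='true', which divides each row by its total, so one minus the diagonal gives the same per-class rates directly. Using the y_true, y_pred and class_labels from the example above:

# normalize='true' makes each row sum to 1, so the diagonal holds per-class recall
normalized_cm = confusion_matrix(y_true, y_pred, labels=class_labels, normalize='true')
print((1 - np.diag(normalized_cm)) * 100)  # [25. 75. 40.]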

Am I missing something?

This example was very helpful. The issue was on my end: I misinterpreted the phrase “Proportion of misclassified birds” (etc.) to mean the proportion of animals misclassified as birds, rather than the proportion of birds misclassified as other animals. As a result, I was reading off the confusion matrix columns rather than the rows.
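For anyone who trips over the same thing, here is a quick sketch (my own addition) of what the column-wise reading computes on the toy imbalanced_cm from the example above: the false discovery rate per predicted class, which generally differs from the row-wise rates (it only happens to coincide for birds in this example).

# Column j holds everything the model predicted as class j, so this is the
# proportion of "bird"/"cat"/"dog" predictions that were wrong (FDR),
# not the proportion of actual birds/cats/dogs that were misclassified (FNR)
col_totals = imbalanced_cm.sum(axis=0)
fdr_per_class = (col_totals - np.diag(imbalanced_cm)) * 100 / col_totals
print(fdr_per_class)  # approximately [25. 50. 57.14] for the toy example above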