C2_M1_Lab_1_tuning_and_metrics: Request for the consistent "Accuracy = Recall" behavior clarification

Hello,

I noticed that in ALL the metric measurements throughout the entire lab, the Recall metric is always(!) equal to the Accuracy metric. I believe this is somewhat expected for perfectly balanced datasets (all classes have the same number of samples) in multi-class classification with a macro-averaged Recall metric (am I right? Please confirm). But this behavior (Accuracy = Recall) persists even for the imbalanced dataset (see the “(Optional) Further Exploration: Batch Size Optimization on an Imbalanced Dataset” segment of the lab), which I suppose is also possible, perhaps by chance, but it seems suspicious to me. Could you please explain these cases and tell me what I am missing in my understanding of these metrics and their (strange to me) behavior in the lab?

Thank you!

@DAResaid

Can you share the screenshots where you are noticing this behaviour?

Remember that Recall, by definition, measures the ability to detect the true representatives of a class (the true positives). If the model identifies these true positive cases about as reliably as it classifies everything else, the two values can end up matching.

That said, there is no standard rule that accuracy must always equal recall:

Accuracy measures the overall correctness of the model across all classes: (true positives + true negatives) / total predictions. It is a good general measure, but it can be misleading in cases of class imbalance, where one class is much more common than the other.

Recall (also known as sensitivity), on the other hand, measures the model’s ability to find all relevant cases. It focuses only on the positive class: (true positives) / (true positives + false negatives).
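In fact, on a perfectly balanced dataset the equality is exact rather than coincidental: accuracy is the class-size-weighted mean of the per-class recalls, and equal class sizes turn that into the plain (macro) mean. Here is a minimal pure-Python sketch with made-up toy labels (not the lab’s data):

```python
from collections import defaultdict

def accuracy(y_true, y_pred):
    """Overall fraction of correct predictions."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_recall(y_true, y_pred):
    """Per-class recall (correct-in-class / class size), averaged equally over classes."""
    correct, total = defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        correct[t] += (t == p)
    return sum(correct[c] / total[c] for c in total) / len(total)

# Perfectly balanced toy labels: 3 classes, 2 samples each (made up for illustration)
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]   # arbitrary mistakes

print(accuracy(y_true, y_pred), macro_recall(y_true, y_pred))  # identical values
```

Whatever mistakes the model makes, as long as the classes have equal sizes the two numbers coincide exactly.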

Regards
DP

Thank you for the clarification!

I have attached the entire Jupyter notebook of the lab with ALL the outputs I got while doing it, for your reference, as well as the screenshots that reflect the behavior I described in my initial message.

Picture 1 C2_M1_Lab_1_tuning_and_metrics.jpg shows the output of the second cell of the “Exercise: Implementing Metrics in PyTorch” segment.

Picture 2 C2_M1_Lab_1_tuning_and_metrics.jpg shows the outputs of the second cell of the “How other metrics change with learning rate” segment.

Picture 3 C2_M1_Lab_1_tuning_and_metrics.jpg shows the outputs of the first cell of the “(Optional) Further Exploration: Batch Size Optimization on an Imbalanced Dataset” segment.

Thank you!

C2_M1_Lab_1_tuning_and_metrics.ipynb (175.1 KB)


Thank you for the images.

In a binary classification problem (two classes), if your dataset has exactly the same number of samples in each class, it is possible for recall and accuracy to be the same.

As mentioned above, Accuracy is the overall proportion of correct predictions, (TP+TN)/Total, whereas Recall (also known as sensitivity or True Positive Rate) is the proportion of actual positive instances that were correctly identified, TP/(TP+FN).

So if your model’s ability to classify the positive class equals its ability to classify the negative class (that is, sensitivity equals specificity: TP/P = TN/N), and the number of positive samples (P) equals the number of negative samples (N), then accuracy and recall will be mathematically identical.
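Plugging made-up counts into those two formulas shows why both conditions are needed (the numbers below are purely illustrative):

```python
# Hypothetical confusion counts for a balanced binary problem:
# P = N = 100 samples per class, and the model is equally good on
# both classes (sensitivity = specificity = 0.8).
TP, P = 80, 100          # true positives out of all positives
TN, N = 80, 100          # true negatives out of all negatives

accuracy = (TP + TN) / (P + N)   # (TP + TN) / Total
recall = TP / P                  # FN = P - TP, so TP / (TP + FN) = TP / P

print(accuracy, recall)          # 0.8 0.8
```

Change either condition, say TN = 90 while TP stays 80, or N = 200, and the two values separate.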

To find the exact reason, the usual steps to check are:

  • Check class distribution: verify whether your dataset is perfectly balanced.
  • Look at the decimal places: although the values of accuracy and recall match most of the time, you may notice small variations in some training cycles.
  • Specify the averaging method: if you are using a library like scikit-learn, ensure you are using the correct averaging method (macro, weighted, or micro) for your specific needs. The default is often micro, which in single-label multi-class problems makes recall identical to accuracy.
  • Lastly, introduce imbalance: intentionally create an imbalanced dataset (for example, 80% one class and 20% another) and retrain the model. The metrics should diverge if the equality was due to the balanced dataset.
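To illustrate the averaging point with scikit-learn (toy labels made up for this example, not the lab’s data): micro-averaged recall always matches accuracy in a single-label multi-class problem, while macro-averaged recall diverges under imbalance:

```python
from sklearn.metrics import accuracy_score, recall_score

# Hypothetical imbalanced 3-class labels: class 0 dominates
y_true = [0]*8 + [1]*2 + [2]*2
y_pred = [0]*8 + [0, 1] + [2, 0]   # majority class predicted well, minorities poorly

acc = accuracy_score(y_true, y_pred)
micro = recall_score(y_true, y_pred, average="micro")
macro = recall_score(y_true, y_pred, average="macro")

print(acc, micro)   # identical: micro recall is just total correct / total samples
print(macro)        # lower: minority-class mistakes pull the macro average down
```

So if your "recall" metric tracks accuracy exactly even on imbalanced data, micro averaging somewhere in the pipeline is a prime suspect.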

Hope this clears your doubt.

Regards
DP

Thank you, but since the lab (and its datasets) was created (selected) by your team, what is the reason for the described behavior in this particular case?

I am asking because, as you can see in the provided pictures, neither of your expectations holds for this lab: there is no decimal-place variation between accuracy and recall in any training cycle, and the metrics do not diverge on the imbalanced dataset either.

Thank you!

@DAResaid

I am not staff but a volunteer mentor for the course, so honestly I don’t know the reason behind the selection of the dataset.

But (guessing) the idea of this exercise is to learn and understand how learning rate variability affects the metrics, rather than to look deeper into why accuracy and recall have the same values, since the exercise uses a simple CNN architecture where the distribution might not be as complex as what one sees with more complex model architectures.

Got it! Thank you for the explanation and help!

Is it possible to forward my request to somebody, who actually built the course (especially the lab/exercise part of it)?


@Mubsi

The learner is looking for staff input on the choice of dataset and on the matching values of accuracy and recall. So kindly look into this once you are back from holiday.

Regards
DP


@DAResaid
Sure. I just informed the learning technologist of this course, but his response might be delayed since, as you know, DLAI staff are on holiday until January 2nd for Christmas and New Year.

Thank you.

Regards
DP


Yep, I know about the holidays. :wink: So no rush.

And thank you again for your help!


This could be the reason :backhand_index_pointing_down: just to let you know why that behaviour or pattern happens; it is completely normal.

Anyway, wait for @Mubsi’s response.

Hi all,

Thanks for catching this! The code implementation for the accuracy metric was contradicting the definitions in the text, which is why Accuracy and Recall were outputting identical values.

I have updated the notebook to resolve this:

  • The Technical Fix: The accuracy_metric was previously initialized with average="macro" (which calculates Balanced Accuracy). I have changed this to average="micro" to correctly calculate Standard Accuracy. The other metrics (Precision, Recall, F1) remain as macro to properly handle class imbalance.
  • Markdown: I have also tweaked the markdown to clarify that Standard Accuracy is a micro-averaged metric, while we use macro-averaging for the others to prevent the majority class from dominating the score.
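For anyone curious about the arithmetic behind the fix, here is a plain-Python sketch (toy labels made up for illustration, not the lab’s data): a macro-averaged “accuracy” is the unweighted mean of per-class recalls, i.e. Balanced Accuracy, which is exactly what macro Recall computes, while micro-averaging recovers Standard Accuracy:

```python
from collections import defaultdict

def per_class_recall(y_true, y_pred):
    """Fraction of each class's samples that were predicted correctly."""
    correct, total = defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        correct[t] += (t == p)
    return {c: correct[c] / total[c] for c in total}

# Hypothetical imbalanced labels: class 0 dominates
y_true = [0]*8 + [1]*2 + [2]*2
y_pred = [0]*8 + [0, 1] + [2, 0]

recalls = per_class_recall(y_true, y_pred)

# average="macro": mean of per-class recalls (Balanced Accuracy) == macro Recall
macro_acc = sum(recalls.values()) / len(recalls)
# average="micro": total correct / total samples == Standard Accuracy
micro_acc = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

print(macro_acc)  # matches macro Recall by construction
print(micro_acc)  # standard accuracy; diverges from macro under imbalance
```

This is why the old notebook showed Accuracy == Recall on every run: both metrics were computing the same macro-averaged quantity.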

While the results are not drastically different, the behavior is now technically sound. You will now see the expected divergence between Accuracy and Recall, particularly in the imbalanced dataset section.

Best,
Mubsi
