Estimating variance

Hello guys, I am now finishing week 3, specifically the part on skewed datasets. My question is as follows:

  • Is precision/recall (confusion matrix) a good way to estimate overfitting?

Precision, recall, and the confusion matrix can hint at overfitting, but only when you compare them between the training set and a validation or test set; on their own they are not tools for estimating it. Their main purpose is to evaluate a classifier's performance, particularly on imbalanced (skewed) datasets, where accuracy alone can be misleading.

Overfitting (high variance) occurs when a model performs well on the training data but poorly on unseen data. The key to identifying it is to compute the same metric on both sets and look at the gap. If your model has much higher precision, recall, or accuracy on the training data than on the validation or test data, that is a strong indicator of overfitting. For a more direct assessment, you may want to use learning curves or cross-validation.
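To make the comparison concrete, here is a minimal sketch in plain Python. The helper names (`precision_recall`, `train_validation_gap`) and the toy labels are made up for illustration and are not from the course; in practice you would plug in your own model's predictions on each set.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Return (precision, recall) for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Guard against division by zero when there are no predicted/actual positives.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def train_validation_gap(train_true, train_pred, val_true, val_pred):
    """Difference (train minus validation) for precision and recall."""
    p_tr, r_tr = precision_recall(train_true, train_pred)
    p_va, r_va = precision_recall(val_true, val_pred)
    return {"precision_gap": p_tr - p_va, "recall_gap": r_tr - r_va}


# Toy, hand-made labels (assumption): the model is perfect on the training
# set but makes several mistakes on the validation set.
train_true = [1, 0, 1, 0, 0, 1, 0, 0]
train_pred = [1, 0, 1, 0, 0, 1, 0, 0]
val_true = [1, 0, 1, 0, 0, 1, 0, 0]
val_pred = [1, 1, 0, 0, 1, 0, 0, 0]

gaps = train_validation_gap(train_true, train_pred, val_true, val_pred)
print(gaps)
```

A large positive gap like the one in this toy example suggests the model fits the training data much better than unseen data. How large a gap counts as "too large" depends on the problem, so treat it as a signal to investigate rather than a hard threshold.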


Thanks a lot, this was really helpful and insightful.
