While reading about evaluation metrics for imbalanced classification problems, I learned that the metrics can be broadly classified into three categories: threshold metrics, ranking metrics, and probability metrics. The “threshold metrics” category includes accuracy, error, precision, recall, sensitivity, etc. One of the documents stated that “threshold metrics assume full knowledge of the conditions under which the classifier will be deployed. In particular, they assume that the class imbalance present in the training set is the one that will be encountered throughout the operating life of the classifier, which is not often the case, so they can mislead you.”
I don’t understand how metrics such as recall can mislead. Whatever the ratio of positive to negative examples, wouldn’t these metrics still give a measure of how accurately the positive and negative classes were predicted?
I presume you are referring to this post. If I understand it correctly, the issue is the difference between the class distribution of the training set and that of the real data encountered during application: the threshold metrics are assumed to be calculated only on the training set, so they reflect the training-set imbalance rather than the one seen in deployment.
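To make this concrete, here is a minimal numeric sketch (not from the linked post) with a hypothetical classifier whose per-class rates stay fixed at 80% recall and 90% specificity. Because recall conditions on the positive class, it is unaffected by prevalence, but accuracy and precision are computed across classes and shift substantially as the deployed positive-class prevalence drops:

```python
# Sketch: how a fixed classifier's threshold metrics shift when class
# prevalence changes between training and deployment. Recall and
# specificity are per-class rates, so they stay fixed; accuracy and
# precision mix the classes, so they depend on prevalence.

def threshold_metrics(recall, specificity, prevalence, n=100_000):
    """Derive accuracy and precision from per-class rates and prevalence."""
    pos = n * prevalence          # actual positives
    neg = n - pos                 # actual negatives
    tp = recall * pos             # true positives
    tn = specificity * neg        # true negatives
    fp = neg - tn                 # false positives
    accuracy = (tp + tn) / n
    precision = tp / (tp + fp)
    return accuracy, precision

# Hypothetical classifier: recall=0.8, specificity=0.9, unchanged at deployment.
for prevalence in (0.5, 0.1, 0.01):   # balanced training set vs. rarer deployed positives
    acc, prec = threshold_metrics(recall=0.8, specificity=0.9, prevalence=prevalence)
    print(f"prevalence={prevalence:.2f}  accuracy={acc:.3f}  precision={prec:.3f}")

# prevalence=0.50  accuracy=0.850  precision=0.889
# prevalence=0.10  accuracy=0.890  precision=0.471
# prevalence=0.01  accuracy=0.899  precision=0.075
```

So a precision of 0.889 estimated on a balanced training set would badly mislead you if positives are rare in deployment, even though the classifier itself (and its recall) has not changed at all.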
Thanks @reinoudbosch. I believe this is exactly what the author meant in that post. It makes perfect sense now!