Steps after finding that the F1 score is bad for skewed data

Thanks Andrew for talking about using precision, recall, and the F1 score to evaluate model performance on skewed data. But what is the next step? Suppose we find that accuracy on the minority label is very bad. The training process itself optimizes the loss, which to my understanding corresponds more closely to average accuracy. So can we customize the loss function so that training optimizes the metric we actually care about, say the F1 score? I know that adding more weight to the minority label is one way to do this, but in my personal experience the results are not always good.
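To make the weighting idea concrete, here is roughly what I mean (a minimal Keras sketch; the toy data, model, and weight values are made up for illustration, not taken from the course):

```python
import numpy as np
import tensorflow as tf

# Toy skewed dataset: ~5% positive (minority) labels.
X = np.random.randn(1000, 20).astype("float32")
y = (np.random.rand(1000) < 0.05).astype("float32")

model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu", input_shape=(20,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

model.compile(
    optimizer="adam",
    loss="binary_crossentropy",
    # Track precision and recall during training, not just accuracy.
    metrics=[tf.keras.metrics.Precision(), tf.keras.metrics.Recall()],
)

# Up-weight the minority class (label 1) in the loss; here the weight
# is roughly the inverse of the class frequency (5% positives -> ~19x).
model.fit(X, y, epochs=5, class_weight={0: 1.0, 1: 19.0})
```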

Please jump in and share your thoughts.


Hi @jack_01234 ,
I hope I understood your question. You can certainly modify the loss function or oversample the minority class, but as you say, this can hurt performance on the other classes. A more elegant approach, which should give better results, is data augmentation on the minority class; see the sketch below. Or even better, get more real examples of the minority class if that is possible!
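As a starting point before any real augmentation, plain oversampling of the minority class can be as simple as this (a NumPy sketch; `oversample_minority` is just an illustrative name, not a library function):

```python
import numpy as np

def oversample_minority(X, y, minority_label=1, seed=0):
    """Duplicate minority-class examples until the classes are balanced."""
    rng = np.random.default_rng(seed)
    minority_idx = np.where(y == minority_label)[0]
    majority_idx = np.where(y != minority_label)[0]
    # Sample minority indices with replacement up to the majority count.
    resampled = rng.choice(minority_idx, size=len(majority_idx), replace=True)
    idx = np.concatenate([majority_idx, resampled])
    rng.shuffle(idx)
    return X[idx], y[idx]
```

For images, you would typically replace the exact duplicates with augmented copies (random flips, small rotations, crops, added noise), so the network sees varied minority examples instead of memorizing repeats.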

More info about data augmentation comes in one of the later videos.

Did that help?

But can’t we somehow embed precision / recall (the F1 score) into our cost function, to encourage the model to maximize these indicators directly? Or should we simply look at them manually and decide whether the model is good enough for our purpose?
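One trick I have seen outside the course (so take it as an assumption, not course material) is a “soft” F1 loss: the hard F1 score is not differentiable because it thresholds predictions, but if the true/false positive and negative counts are computed from the predicted probabilities themselves, the result is a differentiable surrogate that gradient descent can minimize. A minimal sketch, assuming a binary classifier with a sigmoid output:

```python
import tensorflow as tf

def soft_f1_loss(y_true, y_pred):
    """Differentiable surrogate for (1 - F1): counts are computed
    from probabilities instead of thresholded 0/1 predictions."""
    y_true = tf.cast(y_true, tf.float32)
    tp = tf.reduce_sum(y_pred * y_true)          # "soft" true positives
    fp = tf.reduce_sum(y_pred * (1.0 - y_true))  # "soft" false positives
    fn = tf.reduce_sum((1.0 - y_pred) * y_true)  # "soft" false negatives
    soft_f1 = 2.0 * tp / (2.0 * tp + fp + fn + 1e-7)
    return 1.0 - soft_f1  # minimizing this maximizes the soft F1

# Usage with a hypothetical model:
# model.compile(optimizer="adam", loss=soft_f1_loss)
```

Whether this actually trains better than weighted cross-entropy seems to be an empirical question for each dataset.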