How can I improve the F1 score on an imbalanced dataset?


I wanted to ask for some suggestions about a binary classification task I am facing at work. The biggest problem is that I have an imbalanced dataset: it has 9 features and a total of 13,000 data points, and the class imbalance ratio is slightly worse than 1:10. The issue I am running into is a low F1 score.

So far, I have tried the following:

  1. Tested different types of models (e.g. logistic regression, ridge, SVM, XGBoost, random forest, etc.).
  2. Used the random undersampling/oversampling algorithms from the imbalanced-learn library.
  3. Augmented the minority class in the training set only, with extra data points from another dataset, and checked performance on the test set (which contains points only from my original dataset); i.e. I changed the training distribution slightly.
  4. Changed the class weights, i.e. gave higher weights to the minority-class points so that the model would "understand" that those points are important.
  5. Tried outlier detection techniques (isolation forest, one-class SVM, local outlier factor, etc.).
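For step 2, here is roughly what I did with imbalanced-learn (a minimal sketch on synthetic data with the same shape as mine, since I can't share the real dataset):

```python
# Sketch of step 2: random oversampling with imbalanced-learn.
# Synthetic stand-in for my data: 13,000 points, 9 features, ~1:10 classes.
from collections import Counter

from imblearn.over_sampling import RandomOverSampler
from sklearn.datasets import make_classification

X, y = make_classification(
    n_samples=13000, n_features=9, weights=[0.9, 0.1], random_state=0
)
print(Counter(y))  # roughly a 9:1 class ratio

# RandomOverSampler duplicates minority samples until the classes match.
# I resampled only the training split, so the test set stays untouched.
X_res, y_res = RandomOverSampler(random_state=0).fit_resample(X, y)
print(Counter(y_res))  # classes now equal in size
```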
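For step 4, the class weighting was done via scikit-learn's `class_weight` option (again a sketch on synthetic data, not my actual pipeline):

```python
# Sketch of step 4: class weighting with scikit-learn.
from sklearn.datasets import make_classification
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(
    n_samples=13000, n_features=9, weights=[0.9, 0.1], random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# class_weight="balanced" scales each class inversely to its frequency,
# so at ~1:10 imbalance a minority-class error costs roughly 10x more.
clf = SVC(class_weight="balanced", C=1.0).fit(X_tr, y_tr)
print(f1_score(y_te, clf.predict(X_te)))
```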
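For step 5, I framed the minority class as anomalies; here is a sketch with an isolation forest (synthetic data, with `contamination` set to the approximate minority fraction):

```python
# Sketch of step 5: minority class treated as outliers via IsolationForest.
from sklearn.datasets import make_classification
from sklearn.ensemble import IsolationForest
from sklearn.metrics import f1_score

X, y = make_classification(
    n_samples=13000, n_features=9, weights=[0.9, 0.1], random_state=0
)

# contamination ~ expected outlier fraction; ~0.1 matches a 1:10 imbalance.
iso = IsolationForest(contamination=0.1, random_state=0).fit(X)

# IsolationForest labels inliers +1 and outliers -1; map outliers to class 1.
y_pred = (iso.predict(X) == -1).astype(int)
print(f1_score(y, y_pred))
```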

The best performance I have managed to get is around 80% accuracy with a 20% F1 score, using a regularized SVM. I am happy with the accuracy; however, the F1 score is still really low. Can someone suggest more things I could try?
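For context on why the accuracy looks fine while the F1 score is poor: with roughly 1:10 imbalance, even a trivial majority-class predictor scores high accuracy and zero F1, so accuracy alone says little here. A quick synthetic check (illustrative numbers, not my real data):

```python
# Why ~80% accuracy can coexist with a low F1 at ~1:10 imbalance:
# compare against a trivial always-majority baseline.
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

y_true = np.array([0] * 11800 + [1] * 1200)  # 13,000 points, roughly 1:10
y_pred = np.zeros_like(y_true)               # always predict the majority class

print(accuracy_score(y_true, y_pred))        # high accuracy, ~0.91
print(f1_score(y_true, y_pred, zero_division=0))  # 0.0: no minority point found
```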

Thanks in advance! I am very grateful for the help 🙂
Best wishes,
