Thanks for your thoughts and valuable input!
Have you thought about quantifying the uncertainty of your model and exploiting this information?
Active learning can help you determine:
- which label is expected to provide the most benefit, and
- when a sufficient amount of data has been used to train your model.
Active learning is used when measuring labels is expensive, either in cost or in time. Here is a simplified visualization of what it does in a regression task, where the model's uncertainty decides which label to measure next:
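The selection step for the regression case can be sketched in a few lines of Python. This is only a minimal illustration under my own assumptions, not a reference implementation: it takes a 1-D regression problem and uses the disagreement of a bootstrap committee of straight-line fits as a stand-in for model uncertainty (query by committee). The function names (`fit_line`, `fit_committee`, `next_query`) and the toy data are made up for this example.

```python
import random
import statistics


def fit_line(points):
    """Least-squares line through (x, y) pairs.

    Falls back to a constant model if all x values are equal,
    which can happen in a bootstrap resample.
    """
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        return 0.0, mean_y
    slope = sum((x - mean_x) * (y - mean_y) for x, y in points) / sxx
    return slope, mean_y - slope * mean_x


def fit_committee(labeled, n_models, rng):
    """Fit one line per bootstrap resample of the labeled data."""
    committee = []
    for _ in range(n_models):
        resample = [rng.choice(labeled) for _ in labeled]
        committee.append(fit_line(resample))
    return committee


def next_query(labeled, candidates, n_models=50, seed=0):
    """Return the candidate x where the committee disagrees the most."""
    rng = random.Random(seed)
    committee = fit_committee(labeled, n_models, rng)

    def disagreement(x):
        predictions = [slope * x + intercept for slope, intercept in committee]
        return statistics.pvariance(predictions)

    return max(candidates, key=disagreement)


# Toy data (assumed): four measured labels and three unlabeled candidates.
labeled = [(0.0, 0.1), (0.5, 0.9), (1.0, 0.8), (1.5, 1.6)]
candidates = [0.25, 0.75, 10.0]
query_x = next_query(labeled, candidates)
```

In a real loop you would measure the label at `query_x`, append it to `labeled`, and repeat. Watching how the committee disagreement shrinks over the iterations can give a hint about when enough data has been collected.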
Specifically for your problem, I believe the approach could be suitable for measuring labels in the important areas close to the decision boundary, so that you get a good model even with limited (but the right) data:
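For the classification case, picking labels close to the decision boundary is often done with uncertainty sampling. Here is a rough sketch, assuming a binary problem with a single feature and a tiny hand-written logistic regression; the helper names (`sigmoid`, `train_logistic`, `most_uncertain`) and the data are hypothetical and only meant to show the idea.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_logistic(points, lr=0.5, epochs=2000):
    """Fit p(y=1|x) = sigmoid(w*x + b) on (x, y) pairs by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(points)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in points:
            err = sigmoid(w * x + b) - y
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def most_uncertain(pool, w, b):
    """Return the unlabeled x whose predicted probability is closest to 0.5."""
    return min(pool, key=lambda x: abs(sigmoid(w * x + b) - 0.5))


# Toy data (assumed): a few labeled points and a pool of unlabeled ones.
labeled = [(-2.0, 0), (-1.0, 0), (1.0, 1), (2.0, 1)]
pool = [-3.0, -0.1, 2.5]
w, b = train_logistic(labeled)
query_x = most_uncertain(pool, w, b)  # the point nearest the decision boundary
```

The points far from the boundary are already classified with high confidence, so labeling them would tell the model little. The point the model is least sure about is the one worth paying for.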
If you are interested in sources and applications, feel free to check out:
- https://proceedings.neurips.cc/paper/2018/file/b197ffdef2ddc3308584dce7afa3661b-Paper.pdf
- GitHub - nihalsid/ViewAL: [CVPR'20] Implementation for the paper "ViewAL: Active Learning with Viewpoint Entropy for Semantic Segmentation"
- [1911.11789] ViewAL: Active Learning with Viewpoint Entropy for Semantic Segmentation
Best regards
Christian