What to do if it is not possible to collect additional data and you are overfitting the dev set?
If we overfit the dev set, the usual advice is to get a bigger dev set.
But in some cases no more data is available: whatever you have used is all you have. In some research projects, for instance, no additional data will become available for a long time.
What should one do in such cases?
Would you consider moving part of the training set into the dev set to make it bigger?
Would you try to ignore/forget all your tuning decisions and restart from scratch?
Any other ideas or suggestions are appreciated.
Do you mean that overfitting is visible on the validation set (high loss) while the loss on the training set is small?
(But you still have your final test set untouched in reserve?)
Something like in this plot?
See also this thread.
Have you sufficiently tried measures such as the following?
- reducing the feature space to get a better ratio of samples to features (e.g. PCA, PLS, or feature selection)?
- limiting model complexity with regularization or dropout?
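A minimal sketch of the first two measures, assuming a small tabular regression problem with many more features than samples and numpy available: PCA (via SVD) shrinks the feature space, and ridge regression adds an L2 penalty on the weights. The function names and the synthetic data are hypothetical and only for illustration; dropout is not shown. Note that the PCA mean and components are fit on the training data only and then reused for the dev data, so no dev information leaks into the preprocessing.

```python
import numpy as np

def pca_fit(X, n_components):
    """Fit PCA on training data only; return (mean, components)."""
    mean = X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]

def pca_transform(X, mean, components):
    """Project data onto the components learned from the training set."""
    return (X - mean) @ components.T

def ridge_fit(X, y, alpha):
    """Closed-form ridge regression: L2 penalty on weights, intercept unpenalized."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    n_features = X.shape[1]
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(n_features), Xc.T @ yc)
    b = y_mean - x_mean @ w
    return w, b

def ridge_predict(X, w, b):
    return X @ w + b

# Hypothetical data: 40 training samples, 200 features.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(40, 200))
y_train = X_train[:, 0] + 0.1 * rng.normal(size=40)
X_dev = rng.normal(size=(20, 200))

mean, comps = pca_fit(X_train, n_components=10)
Z_train = pca_transform(X_train, mean, comps)
Z_dev = pca_transform(X_dev, mean, comps)  # reuse the training-set projection
w, b = ridge_fit(Z_train, y_train, alpha=1.0)
dev_pred = ridge_predict(Z_dev, w, b)
```

Both `n_components` and `alpha` are hyperparameters, so choosing them by repeatedly checking the same small dev set can itself overfit that dev set.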
Have you considered cross-validation?
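A minimal k-fold cross-validation sketch, assuming numpy and a toy 1-D regression task (the data and the helper names `kfold_indices` and `cv_mse` are hypothetical). Instead of relying on one fixed, small dev set, every sample is used for validation exactly once, and the hyperparameter (here the polynomial degree, chosen with `np.polyfit`/`np.polyval`) is selected by the average validation error across folds.

```python
import numpy as np

def kfold_indices(n_samples, k, seed=0):
    """Shuffle indices once and split them into k (nearly) equal folds."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    return np.array_split(idx, k)

def cv_mse(x, y, degree, k=5, seed=0):
    """Mean validation MSE of a degree-`degree` polynomial fit over k folds."""
    folds = kfold_indices(len(x), k, seed)
    errors = []
    for i, val_idx in enumerate(folds):
        # Train on all folds except fold i, validate on fold i.
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        coeffs = np.polyfit(x[train_idx], y[train_idx], degree)
        pred = np.polyval(coeffs, x[val_idx])
        errors.append(np.mean((pred - y[val_idx]) ** 2))
    return float(np.mean(errors))

# Hypothetical data: 30 noisy samples of sin(3x).
rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, size=30)
y = np.sin(3 * x) + 0.1 * rng.normal(size=30)

scores = {d: cv_mse(x, y, d) for d in range(1, 10)}
best_degree = min(scores, key=scores.get)
```

The CV score of the selected setting is still optimistically biased because it was used for selection; nested cross-validation (an outer loop that only evaluates, an inner loop that tunes) gives a less biased estimate, and a separate untouched test set remains the final check.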
It would be great if you could also outline the application, the model, and some background information.