Overfitting dev set and no additional data available

Cahit_Bagdelen · December 27, 2022, 9:35pm

Dear all,
What should you do if it is not possible to collect additional data and you are overfitting the dev set?
If we overfit the dev set, the usual suggestion is to get a bigger dev set.
But in some cases no more data is available: what you have already used is all you have. In some research projects, for instance, no additional data will become available, at least not for a long time.
What should we do in such cases?
Would you consider moving part of the training set into the dev set to make it bigger?
Would you try to ignore/forget all your tuning decisions and restart from scratch?
Any other ideas/suggestions are appreciated.
Thanks,
Cahit

Christian_Simonis · December 27, 2022, 9:50pm

Hi there,

Do you mean that overfitting is visible on the validation set (high loss) while the loss on the training set is small?
(And do you still have your final test set untouched and in reserve?)

Something like what is shown in this plot?

See also this thread.

Have you sufficiently tried measures such as the following?

reducing the feature space to get a more suitable ratio of data to features (e.g. PCA, PLS, or feature selection)?
reducing model complexity with regularization or dropout?

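To illustrate what the regularization point does, here is a minimal sketch of L2 regularization (ridge regression) on a one-feature linear model without intercept. The function name `fit_ridge_1d` and the toy data are hypothetical, chosen only for illustration. The penalty `lam` shrinks the learned weight toward zero, trading a bit of training fit for lower variance.

```python
def fit_ridge_1d(xs, ys, lam):
    """Closed-form ridge weight for y ~ w * x (no intercept).

    Minimizes sum((y - w*x)**2) + lam * w**2, whose minimizer is
    w = sum(x*y) / (sum(x*x) + lam).
    """
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

# Hypothetical toy data: y is roughly 2*x plus some noise.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

# Larger lam -> stronger shrinkage of the weight toward 0.
for lam in (0.0, 1.0, 10.0):
    print(lam, fit_ridge_1d(xs, ys, lam))
```

In a neural network the same idea usually appears as an L2 term in the loss (weight decay); dropout is a different mechanism and is not shown here.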
Have you considered cross-validation?
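A minimal k-fold cross-validation sketch in plain Python, assuming a one-feature ridge model as a stand-in for your real model; `k_fold_indices`, `fit_ridge_1d`, `cv_mse` and the data are all hypothetical. Each example is used for validation exactly once, so a hyperparameter choice is judged on all available data rather than on one small dev split. The final test set should stay outside this loop.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 once and deal them into k folds (assumes n >= k)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def fit_ridge_1d(xs, ys, lam):
    """Closed-form ridge weight for y ~ w * x (no intercept)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

def cv_mse(xs, ys, lam, k=5, seed=0):
    """Average validation MSE over k folds for one regularization strength."""
    folds = k_fold_indices(len(xs), k, seed)
    fold_errors = []
    for val_idx in folds:
        val_set = set(val_idx)
        train_idx = [i for i in range(len(xs)) if i not in val_set]
        # Fit only on the training folds, evaluate on the held-out fold.
        w = fit_ridge_1d([xs[i] for i in train_idx], [ys[i] for i in train_idx], lam)
        err = sum((ys[i] - w * xs[i]) ** 2 for i in val_idx) / len(val_idx)
        fold_errors.append(err)
    return sum(fold_errors) / len(fold_errors)

# Hypothetical data set: y ~ 2*x plus reproducible Gaussian noise.
rng = random.Random(42)
xs = [float(i) for i in range(1, 21)]
ys = [2.0 * x + rng.gauss(0.0, 1.0) for x in xs]

candidates = [0.0, 1.0, 10.0, 100.0]
scores = {lam: cv_mse(xs, ys, lam) for lam in candidates}
best_lam = min(scores, key=scores.get)
print(scores, best_lam)
```

With very little data, repeating the k-fold split with different seeds (or using leave-one-out) reduces the variance of the estimate; if you also want an unbiased performance estimate after tuning, nested cross-validation keeps the selection step from leaking into it.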

It would be great if you could also outline the application, the model, and some background information.

Best regards
Christian
