Size and distribution of TRAIN/DEV/TEST sets

How reasonable is it to do hyperparameter optimization without the full TRAIN set? For example, we use 10% of the real TRAIN set to optimize hyperparameters (like network topology and others), and only after we get the top solutions do we train with the full TRAIN set. Is this OK? What would be the consequences?

Hi there, welcome to the course!

I can offer an opinion on this situation, but hopefully others in the community will also chime in and add their perspective.

From what I’ve seen in my own experimentation, your problem will probably dictate the size of your training set. By using only a small subset of the training data to tune hyperparameters, you risk two things: you may have too little data, resulting in a sub-optimal model, and/or there may be variations in the data that your model never sees because certain data points were not included in your subset.
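
To make the risk of missing variations in the data a bit more concrete, here is a minimal sketch in plain Python. The label counts and the helper names `random_subset` and `stratified_subset` are made up for illustration and are not from the course. It compares a plain random 10% subset with one drawn per class, so that rare classes still show up in the data used for tuning.

```python
import random
from collections import Counter, defaultdict


def random_subset(labels, fraction, seed=0):
    """Return sorted indices of a plain random subset (no stratification)."""
    rng = random.Random(seed)
    k = max(1, int(len(labels) * fraction))
    return sorted(rng.sample(range(len(labels)), k))


def stratified_subset(labels, fraction, seed=0):
    """Return sorted indices sampled per class, keeping at least one per class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    chosen = []
    for indices in by_class.values():
        # Take roughly `fraction` of each class, but never drop a class entirely.
        k = max(1, round(len(indices) * fraction))
        chosen.extend(rng.sample(indices, k))
    return sorted(chosen)


# Hypothetical, heavily imbalanced TRAIN labels (assumed for illustration).
labels = ["common"] * 950 + ["uncommon"] * 45 + ["rare"] * 5

random_idx = random_subset(labels, 0.10)
strat_idx = stratified_subset(labels, 0.10)

# Class counts in each 10% subset; the random one may contain few or no "rare" examples.
print("random    :", Counter(labels[i] for i in random_idx))
print("stratified:", Counter(labels[i] for i in strat_idx))
```

With only 5 "rare" examples out of 1000, a plain random 10% subset can easily end up with none of them, while the per-class version always keeps at least one. This only addresses class coverage. It does not guarantee that hyperparameters tuned on 10% of the data (for example, network size or regularization strength) remain the best choice once you train on the full TRAIN set.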

Hopefully someone else in the community can offer you a more technical explanation of why this is or is not a good idea!