Once we have trained different models with the train/dev sets and selected the best one:
- should we measure its performance on the test set after retraining the model on the train and dev sets merged?
- or just evaluate the model trained on the train set, but use the test set instead of the dev set?
If you have the resources, I recommend the 1st approach. If not, go with the 2nd approach.
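For concreteness, here is a minimal sketch of the two options in scikit-learn. The toy dataset, the logistic regression model, and `best_params` are just placeholders for whatever you actually selected on the dev set, not something from this thread.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Toy data split 60/20/20 into train/dev/test (illustrative sizes).
X, y = make_classification(n_samples=2000, random_state=0)
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.4, random_state=0)
X_dev, X_test, y_dev, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=0)

best_params = {"C": 1.0, "max_iter": 1000}  # hyperparameters already chosen on the dev set

# 1st approach: retrain on train + dev merged, then report accuracy on the test set.
model_1 = LogisticRegression(**best_params)
model_1.fit(np.concatenate([X_train, X_dev]), np.concatenate([y_train, y_dev]))
print("1st approach, test acc:", accuracy_score(y_test, model_1.predict(X_test)))

# 2nd approach: keep the model trained on the train set only, and report its test accuracy.
model_2 = LogisticRegression(**best_params).fit(X_train, y_train)
print("2nd approach, test acc:", accuracy_score(y_test, model_2.predict(X_test)))
```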
I’m not sure I understand your first approach. We never “merge” the train and dev sets, right? The point is that we train on the training set and then use the dev set to evaluate the performance. If we have (e.g.) overfitting or underfitting problems, then we modify our choices of hyperparameters and try again: train on the training set and evaluate on the dev set.
Once we have done as well as we can on the iterative tuning of hyperparameters as described above, then we check the model accuracy on the test set, because that is a fair way to evaluate the model: it will show how the model performs on data that it has not been previously trained on (has never “seen” before).
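In code, that loop might look roughly like the sketch below. The toy dataset, the logistic regression model, and the small grid of C values are only illustrative choices, not a prescription from the course.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, random_state=1)
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.4, random_state=1)
X_dev, X_test, y_dev, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=1)

# Iterative tuning: train on the training set, evaluate each candidate on the dev set.
best_c, best_dev_acc = None, -1.0
for c in [0.01, 0.1, 1.0, 10.0]:
    model = LogisticRegression(C=c, max_iter=1000).fit(X_train, y_train)
    dev_acc = accuracy_score(y_dev, model.predict(X_dev))
    if dev_acc > best_dev_acc:
        best_c, best_dev_acc = c, dev_acc

# Only after tuning is finished do we touch the test set, once, for a fair estimate
# of performance on data the model has never "seen" before.
final_model = LogisticRegression(C=best_c, max_iter=1000).fit(X_train, y_train)
print("dev acc:", best_dev_acc)
print("test acc:", accuracy_score(y_test, final_model.predict(X_test)))
```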
Hey @Pablo_Ferrer_Gonzale,
Welcome to the community. Just to add to @paulinpaloalto Sir’s explanation: the approach Paul Sir described is indeed the best possible approach you can adopt for any application.
However, in some extreme cases, you can train your model on the train set, tune its hyperparameters using only the dev set, and take the dev set performance as the performance of the model. Of course, in this scenario you won’t get a reliable estimate of the real-world performance of your model, but as Ethan Hunt said, “desperate times call for desperate measures”, and sometimes this is the only way to go.
One of the plausible scenarios for adopting this approach is Anomaly Detection, in which you may not have enough anomalous examples to form a separate test set, so you have to make do with just the train and dev sets. This is discussed by Prof Andrew in Course 3 of the Machine Learning Specialization. A rough sketch of this setup is below; I hope this helps.
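Just to be clear, this is only an illustration on synthetic data, assuming the per-feature Gaussian density approach; the threshold search over epsilon and the F1 criterion are choices I am making for the sketch, not the only option.

```python
import numpy as np
from scipy.stats import norm
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)
X_train = rng.normal(0.0, 1.0, size=(1000, 2))           # normal examples only
X_dev = np.vstack([rng.normal(0.0, 1.0, size=(95, 2)),   # mostly normal examples ...
                   rng.normal(6.0, 1.0, size=(5, 2))])    # ... plus a few anomalies
y_dev = np.array([0] * 95 + [1] * 5)                      # 1 = anomaly

# Fit a per-feature Gaussian on the (normal-only) training set.
mu, sigma = X_train.mean(axis=0), X_train.std(axis=0)
p_dev = norm.pdf(X_dev, mu, sigma).prod(axis=1)           # density of each dev example

# Pick the threshold epsilon that maximizes F1 on the dev set;
# candidate thresholds are the observed dev densities themselves.
best_eps, best_f1 = None, -1.0
for eps in np.sort(p_dev)[1:]:
    f1 = f1_score(y_dev, (p_dev < eps).astype(int))
    if f1 > best_f1:
        best_eps, best_f1 = eps, f1

# With no separate test set, this dev-set F1 is taken as the performance estimate.
print("dev F1:", best_f1, "epsilon:", best_eps)
```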
Regards,
Elemento