Hi, I am slightly confused by their differences in definition and use. My understanding is that
the dev set is a set of data used to tune hyperparameters and to compare the algorithms/models we trained (to see which performs better)? If we are comparing 2 algorithms, does that mean we would split the train set further into 2 parts to train them, or should we train both on the same training data? Prof Ng mentioned that dev sets are used for hold-out cross validation. What does that mean?
How are the dev and test sets different? Is the test set the final set we run only on a finalized model to evaluate its performance, at which point we no longer tune the hyperparameters or change the algorithm?
You are right about the use of a dev set for hyperparameter tuning and model comparison.
The train set doesn’t need to be split: you can use the same train set to train different models with different hyperparameters and evaluate them with the same dev set, through an iterative process that lets you tune your model until you are satisfied.
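Here is a minimal sketch of that idea, assuming scikit-learn is available; the dataset from make_classification, the specific models, and the hyperparameter values are just placeholders for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# placeholder data standing in for your real dataset
X, y = make_classification(n_samples=1000, random_state=0)
X_train, X_dev, y_train, y_dev = train_test_split(X, y, test_size=0.2, random_state=0)

# several candidate models / hyperparameter settings
candidates = {
    "logreg_C=1.0": LogisticRegression(C=1.0, max_iter=1000),
    "logreg_C=0.1": LogisticRegression(C=0.1, max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

for name, model in candidates.items():
    model.fit(X_train, y_train)  # same training data for every candidate
    dev_acc = accuracy_score(y_dev, model.predict(X_dev))  # same dev set for every candidate
    print(f"{name}: dev accuracy = {dev_acc:.3f}")
```

Whichever candidate scores best on the dev set is the one you keep tuning or finally select.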
Cross validation (Cross-validation (statistics) - Wikipedia) is a technique that lets you use your train + dev data segmented in different ways to validate your model multiple times, giving you a more robust estimate of its performance.
For example, in k-fold cross-validation you segment the data into k subsets, each of which is used once as the dev set while the rest are used for training. You end up with k different performance estimates for the same model, from which you can compute the mean and the variability.
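A rough illustration of k-fold cross-validation, again assuming scikit-learn and using placeholder data from make_classification:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=1000, random_state=0)  # placeholder data

kf = KFold(n_splits=5, shuffle=True, random_state=0)
scores = []
for train_idx, dev_idx in kf.split(X):
    model = LogisticRegression(max_iter=1000)
    model.fit(X[train_idx], y[train_idx])  # train on the other k-1 folds
    scores.append(accuracy_score(y[dev_idx], model.predict(X[dev_idx])))  # evaluate on the held-out fold

print(f"mean accuracy = {np.mean(scores):.3f}, std = {np.std(scores):.3f}")
```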
The test set is different in the sense that it should be kept separate from the whole training / tuning process until you decide which model is your final one; only then do you evaluate it with the test set. The reason is to get a completely independent evaluation of the model on data it has never seen before.
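One way to set this up, sketched with scikit-learn (the split fractions and the logistic regression "final model" are just examples, not a recommendation):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=1000, random_state=0)  # placeholder data

# carve off the test set first and never touch it during tuning
X_trainval, X_test, y_trainval, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
# split the remainder into train and dev for the iterative tuning loop
X_train, X_dev, y_train, y_dev = train_test_split(X_trainval, y_trainval, test_size=0.25, random_state=0)

# ... tune / compare models using only X_train and X_dev ...

final_model = LogisticRegression(max_iter=1000).fit(X_train, y_train)  # hypothetical final choice
test_acc = accuracy_score(y_test, final_model.predict(X_test))  # evaluated exactly once, at the end
print(f"test accuracy = {test_acc:.3f}")
```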
Hi, kampamocha
I’m still confused about the dev/test sets: why don’t we just use the test set like a dev set, tuning hyperparameters on the test set to choose the best model?
We want the performance on the test set to be an estimate of the model’s future performance on actual data that the model hasn’t seen in any way, i.e. a measure of how well it generalizes.
When we tweak our model based on the test set, we are giving the model hints about what the test data looks like (data leakage), so the performance on the test set is no longer a real estimate of performance on unseen data.
Hope that helps, you will learn more about train/dev/test sets in course 3.