In the train/dev/test video at 9:10, Prof. Andrew Ng says:
"Make sure the dev/test set comes from the same distribution."
I understand that if the algorithm performs well on the dev set during the training phase, then, after we select our best model, it should perform well on the test set as well, provided the dev and test sets have the same distribution.
But I wonder why he doesn't say anything about the train and dev set distributions. I guess that if the train and dev distributions are not the same (or similar), we can get a false impression that our model is well trained just because it performs well on that particular dev set, while it might not generalize. Is it possible that it then fails to generalize to the test set, even though the test and dev sets share the same distribution, or that it starts performing badly on real data?
For example, is it possible that the train and dev sets have somewhat different distributions (say, both normal but with different means, or with the same mean but different variances)? Do we use that kind of data for training?
Note: usually I carve the dev set out of the training data only. I understand it will have a similar distribution in that case, but I don't know whether that is always the practice, in research as well.
How do you decide on these train/dev/test splits in practice? It would be great if you could throw some light on this or point me to papers or blog posts on the strategy.
I have studied Course 3, but I did not find an answer there either.
Sorry, but I think Prof Ng does cover all of this in DLS C3. It's been a while since I went through the lectures there, but according to my notes there are several lectures in Week 1 that are relevant, and the material with the most detail on this point is in Week 2. E.g., start with this lecture and watch the next two after it as well.
The general guideline is to split the whole pile of data once, at maybe 70/15/15 percent.
You train on the training set.
You adjust the model hyperparameters (for example, the regularization) using information from the model's cost on the validation set. Each time you adjust a hyperparameter, you train again using the training set.
You use the test set only once, at the end, to verify that your model gives good-enough results on new data (data that wasn't used in training or validation).
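To make that workflow concrete, here is a minimal sketch in Python. The 70/15/15 ratios, the toy dataset from make_classification, and the choice of logistic regression's regularization strength C as the tuned hyperparameter are illustrative assumptions on my part, not something prescribed in the lectures:

```python
# Minimal sketch of the split-once / tune-on-dev / test-once workflow.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Toy data standing in for "the whole pile of data".
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

# Split once: 70% train, 15% dev (validation), 15% test.
X_train, X_hold, y_train, y_hold = train_test_split(X, y, test_size=0.30, random_state=0)
X_dev, X_test, y_dev, y_test = train_test_split(X_hold, y_hold, test_size=0.50, random_state=0)

# Tune a hyperparameter on the dev set: retrain on the training set for each
# candidate value, then score the resulting model on the dev set.
best_c, best_dev_acc = None, -1.0
for c in [0.01, 0.1, 1.0, 10.0]:
    model = LogisticRegression(C=c, max_iter=1000).fit(X_train, y_train)
    dev_acc = model.score(X_dev, y_dev)
    if dev_acc > best_dev_acc:
        best_c, best_dev_acc = c, dev_acc

# The test set is touched exactly once, at the very end, with the chosen model.
final_model = LogisticRegression(C=best_c, max_iter=1000).fit(X_train, y_train)
print("best C:", best_c, "dev acc:", best_dev_acc, "test acc:", final_model.score(X_test, y_test))
```

The point to notice is that the test accuracy is computed exactly once, after the dev set has already been used up choosing C.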
Does that mean we do the improving and tuning on the validation set, then test again on the training set to make sure we get almost the same results as on the validation set, and then do the final test with the test data?