Hey guys! While watching the videos in C3 - W2, I see that Prof. Ng says you shuffle the training set and carve out a piece for a training-dev set. He also says that the NN will train on the training set but not on the training-dev set. So what does the training-dev set do? Does it do the same thing as the dev set?
He explains all of that in the same lecture where he introduces the concept. If you missed the full explanation, you should watch the lecture again. The point is that it is used as a stand-in for the “dev set” in the case that the dev and test sets are from a different distribution than the training set. Note that he makes a big point of the fact that the dev and test sets need to be from the same distribution, but that the training set does not. In the case that the training set and the dev/test sets are from different distributions, you train on the training set and then use the training-dev set to assess how well the training worked, whether you have overfitting or underfitting, and to guide all of that hyperparameter tuning. Once you’ve done that, you then compare the performance on the dev set. Seriously, he explains it all in the lectures. With my “mini-explanation” above in mind, please watch the relevant lectures again and I bet it will make sense this time.
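Here is a minimal sketch of how those splits might be laid out in code. The array names and sizes are made up purely for illustration; this is not code from the course, just the idea of carving a training-dev set out of the shuffled training-distribution data:

```python
import numpy as np

# Pretend X_web/y_web is plentiful data from the distribution that is easy to
# collect, and X_app/y_app is scarce data from the distribution you actually
# care about. (All names and sizes here are made up for illustration.)
rng = np.random.default_rng(0)
X_web, y_web = rng.normal(size=(20000, 64)), rng.integers(0, 2, 20000)
X_app, y_app = rng.normal(size=(2000, 64)), rng.integers(0, 2, 2000)

# Shuffle the training-distribution data and carve out a training-dev set.
perm = rng.permutation(len(X_web))
X_web, y_web = X_web[perm], y_web[perm]
X_train, y_train = X_web[:19000], y_web[:19000]          # trained on
X_train_dev, y_train_dev = X_web[19000:], y_web[19000:]  # same distribution, NOT trained on

# Dev and test both come from the target distribution,
# so they are from the same distribution as each other.
X_dev, y_dev = X_app[:1000], y_app[:1000]
X_test, y_test = X_app[1000:], y_app[1000:]
```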
Oh yes, thanks! Sorry, I asked this question while I was watching the lecture and hadn’t finished it yet. After finishing the lecture I started to understand, and with your answer it does make sense. Thanks for your time. Sorry…
So, let’s say I have these results from my training -
HLP (human-level performance) - 0%
training set error - 1%
training-dev error - 1.5%
dev set error - 10%
With this, I can say that my bias and variance are both low.
And I can also say that there is likely an issue with the dev set being from a different distribution.
I verify that the dev set reflects the user scenario very accurately.
Now, in this case, how do I improve my training so that my algorithm performs well on the real-world data?
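To make my reasoning explicit, this is how I’m reading the gaps between those numbers (just arithmetic on the figures above, using the terminology from the lectures):

```python
# Error figures from above, as fractions rather than percentages.
human_level     = 0.000   # HLP, used as a proxy for Bayes error
train_error     = 0.010
train_dev_error = 0.015
dev_error       = 0.100

avoidable_bias = train_error - human_level       # 0.010 -> bias looks low
variance       = train_dev_error - train_error   # 0.005 -> variance looks low
data_mismatch  = dev_error - train_dev_error     # 0.085 -> this is the big gap

print(f"avoidable bias: {avoidable_bias:.3f}")
print(f"variance:       {variance:.3f}")
print(f"data mismatch:  {data_mismatch:.3f}")
```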
It’s been several years since I took this course, but Prof Ng has 3 lectures at the beginning of Week 2 of DLS C3 that discuss this specific scenario: the training dataset is from a different statistical distribution than the dev/test sets. There are a total of 38 minutes of material in those three lectures. Have you watched all of them?
I don’t have time to watch all of them now, but I skimmed the interactive written transcript of the third of those lectures, and I think he does give some suggestions for how to address that kind of problem, although he does say there’s no simple “cut and dried” approach. It requires thought, analysis of your datasets, and experimentation.
Yes, I did check those, and various conversations on this portal. Mostly, they talk about ways to identify data mismatch and, once identified, how to address it.
My question is: if we are not able to address it, is there still something that can be done, based on the different techniques we’ve learned, to make sure that we can train the algorithm to work well with the data we have in the dev set?
This is my understanding of the steps we would take if we see high error on the dev set:
If there is a big difference between the training error and the dev error, there are 2 possibilities:
- high variance
- dev data is from a different distribution.
With the training-dev set, say we are able to identify that variance is not the problem, but data mismatch is.
Now, we should see if the dev data reflects real-world data.
- If not, then try to get better dev data that reflects the real world.
- If it does, then what?
It means the training data probably doesn’t capture the real-world data well. Options:
- get more training data (real or synthesized) that covers the real-world data (see the synthesis sketch at the end of this post).
- But if you don’t have enough real-world data to bolster the training data, what can you do?
Can you still use the methods to reduce variance, even though we don’t see high variance on the training-dev data?
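For reference, this is the kind of “synthesized” training data I have in mind, loosely inspired by the speech example in the lectures where clean audio is mixed with background noise. The function and the SNR handling are my own rough sketch, not anything from the course:

```python
import numpy as np

def synthesize_noisy_audio(clean_clip, noise_clip, snr_db=10.0):
    """Mix a clean clip with background noise at a chosen signal-to-noise ratio.
    Rough illustration of artificial data synthesis, not course code."""
    clean_power = np.mean(clean_clip ** 2)
    noise_power = np.mean(noise_clip ** 2) + 1e-12
    # Scale the noise so that the mix has roughly the requested SNR.
    scale = np.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10)))
    return clean_clip + scale * noise_clip

# Made-up example clips; in practice these would be real recordings.
clean = np.random.randn(16000)   # 1 second of "speech" at 16 kHz
noise = np.random.randn(16000)   # e.g. car noise
augmented = synthesize_noisy_audio(clean, noise, snr_db=5.0)
```

And by “methods to reduce variance” I mean things like L2 regularization and dropout. A minimal Keras-style sketch of where those knobs would sit (the layer sizes and rates are placeholders, not recommendations):

```python
import tensorflow as tf
from tensorflow.keras import layers, regularizers

# Placeholder architecture: the point is only where the variance-reduction
# knobs (L2 weight decay, dropout) go, not the specific values.
model = tf.keras.Sequential([
    layers.Dense(128, activation="relu",
                 kernel_regularizer=regularizers.l2(1e-4)),
    layers.Dropout(0.3),
    layers.Dense(64, activation="relu",
                 kernel_regularizer=regularizers.l2(1e-4)),
    layers.Dropout(0.3),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```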