Hey guys! While watching the videos in C3 - W2, I see that Prof. Ng says you shuffle the training set and carve out a piece for a training-dev set. He also says that the NN will train on the training set but not on the training-dev set. So what does the training-dev set do? Does it do the same thing as the dev set?
He explains all of that in the same lecture where he introduces the concept. If you missed the full explanation, you should watch the lecture again. The point is that it is used as a stand-in for the “dev set” in the case that the dev and test sets are from a different distribution than the training set. Note that he makes a big point of the fact that the dev and test sets need to be from the same distribution, but that the training set does not. In the case that the training set and the dev/test sets are from different distributions, you train on the training set and then use the training-dev set to assess how well the training worked, whether you have overfitting or underfitting, and to guide all of that hyperparameter tuning. Once you’ve done that, you then compare the performance on the dev set. Seriously, he explains it all in the lectures. With my “mini-explanation” above in mind, please watch the relevant lectures again and I bet it will make sense this time.
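Here is a minimal sketch of how those splits might be laid out in code. The array names and sizes are made up purely for illustration; this is not code from the course, just the idea of carving a training-dev set out of the shuffled training-distribution data:

```python
import numpy as np

# Pretend X_web/y_web is plentiful data from the distribution that is easy to
# collect, and X_app/y_app is scarce data from the distribution you actually
# care about. (All names and sizes here are made up for illustration.)
rng = np.random.default_rng(0)
X_web, y_web = rng.normal(size=(20000, 64)), rng.integers(0, 2, 20000)
X_app, y_app = rng.normal(size=(2000, 64)), rng.integers(0, 2, 2000)

# Shuffle the training-distribution data and carve out a training-dev set.
perm = rng.permutation(len(X_web))
X_web, y_web = X_web[perm], y_web[perm]
X_train, y_train = X_web[:19000], y_web[:19000]          # trained on
X_train_dev, y_train_dev = X_web[19000:], y_web[19000:]  # same distribution, NOT trained on

# Dev and test both come from the target distribution,
# so they are from the same distribution as each other.
X_dev, y_dev = X_app[:1000], y_app[:1000]
X_test, y_test = X_app[1000:], y_app[1000:]
```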
Oh yes, thanks! Sorry, I asked this question while I was watching the lecture and hadn’t finished it yet. After finishing the lecture I started to understand, and with your answer it does make sense. Thanks for your time. Sorry…
So, let’s say I have these results from my training -
HLP (human-level performance) - 0%
training set error - 1%
training-dev error - 1.5%
dev set error - 10%
With this, I can say that my bias and variance are both low.
And I can also say that there is likely an issue with the dev set being from a different distribution.
I verify that the dev set reflects the user scenario very accurately.
Now, in this case, how do I improve my training so that my algorithm performs well on the real-world data?
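To make my reasoning explicit, this is how I’m reading the gaps between those numbers (just arithmetic on the figures above, using the terminology from the lectures):

```python
# Error figures from above, as fractions rather than percentages.
human_level     = 0.000   # HLP, used as a proxy for Bayes error
train_error     = 0.010
train_dev_error = 0.015
dev_error       = 0.100

avoidable_bias = train_error - human_level       # 0.010 -> bias looks low
variance       = train_dev_error - train_error   # 0.005 -> variance looks low
data_mismatch  = dev_error - train_dev_error     # 0.085 -> this is the big gap

print(f"avoidable bias: {avoidable_bias:.3f}")
print(f"variance:       {variance:.3f}")
print(f"data mismatch:  {data_mismatch:.3f}")
```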
It’s been several years since I took this course, but Prof Ng has 3 lectures at the beginning of Week 2 of DLS C3 that discuss this specific scenario: the training dataset is from a different statistical distribution than the dev/test sets. There are a total of 38 minutes of material in those three lectures. Have you watched all of them?
I don’t have time to watch all of them now, but I skimmed the interactive written transcript of the third of those lectures, and I think he does give some suggestions for how to address that kind of problem, although he does say there’s no simple “cut and dried” approach. It requires thought, analysis of your datasets, and experimentation.
Yes, I did check those, and various conversations on this portal. Mostly, they talk about ways to identify data mismatch and, once identified, how to address it.
My question is: if we are not able to address it, is there still something that can be done, based on the different techniques we’ve learned, to make sure that we can train the algorithm to work well with the data we have in the dev set?
This is my understanding of the steps we would take if we see high error on the dev set:
If there is a big difference between the training error and the dev error, there are 2 possibilities:
- high variance
- dev data is from a different distribution.
With the training-dev set, say we are able to identify that variance is not the problem, but data mismatch is.
Now, we should see if the dev data reflects real-world data.
- If not, then try to get better dev data that reflects the real world.
- If it does, then what?
It means the training data probably doesn’t capture the real-world data well. Options:
- get more training data (real or synthesized) that covers the real-world data (see the synthesis sketch at the end of this post).
- But if you don’t have enough real-world data to bolster the training data, what can you do?
Can you still use the methods to reduce variance, even though we don’t see high variance on the training-dev data?
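For reference, this is the kind of “synthesized” training data I have in mind, loosely inspired by the speech example in the lectures where clean audio is mixed with background noise. The function and the SNR handling are my own rough sketch, not anything from the course:

```python
import numpy as np

def synthesize_noisy_audio(clean_clip, noise_clip, snr_db=10.0):
    """Mix a clean clip with background noise at a chosen signal-to-noise ratio.
    Rough illustration of artificial data synthesis, not course code."""
    clean_power = np.mean(clean_clip ** 2)
    noise_power = np.mean(noise_clip ** 2) + 1e-12
    # Scale the noise so that the mix has roughly the requested SNR.
    scale = np.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10)))
    return clean_clip + scale * noise_clip

# Made-up example clips; in practice these would be real recordings.
clean = np.random.randn(16000)   # 1 second of "speech" at 16 kHz
noise = np.random.randn(16000)   # e.g. car noise
augmented = synthesize_noisy_audio(clean, noise, snr_db=5.0)
```

And by “methods to reduce variance” I mean things like L2 regularization and dropout. A minimal Keras-style sketch of where those knobs would sit (the layer sizes and rates are placeholders, not recommendations):

```python
import tensorflow as tf
from tensorflow.keras import layers, regularizers

# Placeholder architecture: the point is only where the variance-reduction
# knobs (L2 weight decay, dropout) go, not the specific values.
model = tf.keras.Sequential([
    layers.Dense(128, activation="relu",
                 kernel_regularizer=regularizers.l2(1e-4)),
    layers.Dropout(0.3),
    layers.Dense(64, activation="relu",
                 kernel_regularizer=regularizers.l2(1e-4)),
    layers.Dropout(0.3),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```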