Overfit/overtune the dev set?

Hi All,

I have a question about overfitting/overtuning.
In this slide, Prof. Ng said that if an algorithm does well on the training set but not on the dev set, we can try a bigger training set, because it is probably overfitting the training set. → I can understand this intuitively, because the algorithm’s parameters are learned from the training set, and it might be easier to overfit a small training set.

He also said that if an algorithm does well on the dev set but not on the test set, we can try a bigger dev set, because the algorithm is probably overtuned toward the dev set. → This is the part I don’t understand:

Since the algorithm is simply evaluated on the dev set and does not learn any parameters from it, how can it be “overtuned” toward the dev set?

Also, if the new, bigger dev set has the same distribution as the old, smaller one, I suppose the problem of doing well on the dev set but poorly on the test set won’t be solved (because the distribution is unchanged).

Can anybody help clarify these points? Many thanks!

Hi Shangran,

Good question. You are right that the model never ‘trains’ on the dev set, so it may seem that way. But during the training process we repeatedly change different hyperparameters and observe the performance on the train and dev sets. After testing multiple hyperparameter combinations, we may finally find a set of hyperparameters that gives decent results on both the train and dev sets.

However, since we have tested many combinations of hyperparameters against the dev set, it’s possible that we have simply found a set of hyperparameters which, when used with the training data, happens to work well on the dev set. Because we have tested so many times against the dev set, we are biased towards selecting the hyperparameters that work best on it, and thus the model is indirectly influenced by the dev set.
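To make that concrete, here is a rough sketch of the tuning loop I mean. The toy data, the candidate grid for C, and the LogisticRegression model are all just stand-ins I made up for illustration, not anything from the course:

```python
# Rough sketch of dev-set-driven model selection (toy data, made-up grid).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, random_state=0)
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.4, random_state=0)
X_dev, X_test, y_dev, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

best_C, best_dev_acc = None, -1.0
for C in [0.001, 0.01, 0.1, 1, 10, 100]:             # candidate hyperparameters
    model = LogisticRegression(C=C, max_iter=1000).fit(X_train, y_train)
    dev_acc = model.score(X_dev, y_dev)              # the dev set scores every candidate...
    if dev_acc > best_dev_acc:
        best_C, best_dev_acc = C, dev_acc            # ...and decides which one we keep,
                                                     # so the choice is biased toward it
print(best_C, best_dev_acc)
```

No parameter is ever fit on the dev set here, yet the dev set still picks the winner, which is exactly the indirect influence I described.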

That’s why we split the data into train-dev-test partitions rather than just a train-test partition: after we have done all our hyperparameter tuning, we still have some data against which we can verify whether the model truly performs well on unseen data.
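Continuing the sketch above, the test set is touched exactly once, after the loop has picked best_C, so it still gives an honest estimate of performance on unseen data:

```python
# Continuing the sketch: the test set is used only once, after all tuning.
final_model = LogisticRegression(C=best_C, max_iter=1000).fit(X_train, y_train)
print("dev accuracy: ", best_dev_acc)
print("test accuracy:", final_model.score(X_test, y_test))
```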

Also, a bigger dev set ‘may’ (or may not, as you point out) solve the problem. Doing well on the dev set but poorly on the test set suggests that the test set contains data from a distribution that isn’t captured by the dev set. So by expanding the dev set there’s a chance that we capture the missing distribution, and our tuning will then account for those missing patterns.
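To illustrate that distribution point, here is a second toy sketch where I artificially shift the test data away from the dev data. The noise injection is just a made-up way to simulate the mismatch:

```python
# Toy illustration of a dev/test distribution gap (the shift is simulated).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=3000, random_state=0)
X_train, X_dev, y_train, y_dev = train_test_split(X, y, test_size=0.2, random_state=0)

# Simulate a test set drawn from a shifted distribution by perturbing features.
rng = np.random.default_rng(0)
X_test, y_test = X_dev + rng.normal(scale=2.0, size=X_dev.shape), y_dev

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("dev accuracy: ", model.score(X_dev, y_dev))    # looks fine
print("test accuracy:", model.score(X_test, y_test))  # drops under the shift
# Adding more dev examples from the *same* distribution would not reveal this
# gap; a bigger dev set only helps if it also contains test-like examples.
```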

Hope this helps.


Thanks @SomeshChatterjee, that’s very helpful!