Train, Dev, and Test sets

I don’t understand why I should split the data into three parts: Train, Dev, and Test sets. Aren’t Train and Test sets enough?
Of course, the courses sometimes mention that train and test sets are enough, but they keep coming back to the dev set as being very important.
What exactly is the purpose of the Dev set that can’t be served by the Test set?

The Training set is used for training.
The Dev set (often called “validation set”) is used for adjusting the model.
The Test set is a final check of your completed model.

If these three tasks don’t each use a separate set of data, you risk overfitting to whichever set you reuse, and the model may not make good predictions on new data.
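As a rough illustration of the three roles above, here is a minimal sketch of a three-way split using only the Python standard library. The function name `split_three_ways` and the 60/20/20 proportions are assumptions chosen for the example, not something prescribed by the course.

```python
import random

def split_three_ways(examples, dev_fraction=0.2, test_fraction=0.2, seed=0):
    """Shuffle once, then cut into disjoint train, dev, and test lists.

    The fractions are illustrative defaults, not recommended values.
    """
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    n_dev = int(len(shuffled) * dev_fraction)
    test = shuffled[:n_test]                 # held back for the final check
    dev = shuffled[n_test:n_test + n_dev]    # used for adjusting the model
    train = shuffled[n_test + n_dev:]        # used for training
    return train, dev, test

train, dev, test = split_three_ways(range(100))
print(len(train), len(dev), len(test))  # prints: 60 20 20
```

Because the three lists come from one shuffled copy, no example can end up in more than one set.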

Thanks TMosh
I believe that we can use the test set to check for over- and under-fitting. Taking more samples from the train set for a dev set will weaken the train set. The test set is enough, I believe. Besides, almost all projects I know of do not use a dev set; they only use a test set.

You need all three. If you use the test set as the dev set, then you have no data left to check the system’s performance.

So you will not know if your model is good at making predictions.

All three sets are needed.

The reason most projects you know of don’t have a dev set is that they’re …

  1. examples,
  2. or demonstrations,
  3. or contests,

… none of which care about how good the model is at making future predictions (or for practical use).

I think you misunderstood something. You do not need three sets to make sure that the model performance is acceptable. Training and testing sets, or training and development (evaluation) sets: name them as you like, they are only two sets. You can apply any cross-validation technique, such as hold-out or k-fold, on the training and dev sets to make sure that there are no bias or variance problems.
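For reference, the k-fold idea mentioned here can be sketched as follows. This is a minimal stdlib-only version; the helper name `k_fold_indices` is hypothetical, and it assumes the examples have already been shuffled.

```python
def k_fold_indices(n_examples, k):
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation.

    Assumes the data is already shuffled; folds are contiguous index blocks.
    """
    indices = list(range(n_examples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_examples // k + (1 if i < n_examples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, val_idx
        start += size

folds = list(k_fold_indices(10, 5))
for train_idx, val_idx in folds:
    print(val_idx, "held out; trained on", len(train_idx), "examples")
```

Each example is held out exactly once, so every fold plays the validation role in turn. Note that every fold still influences the choices made about the model.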

But those other techniques are just ways to create more subsets of the data. Tom is just saying what Prof Ng has said at a number of points in the courses. Prof Ng discussed this in some detail in Week 1 of DLS Course 2 in the section on tuning hyperparameters. Then he discusses it in an even deeper way here in DLS Course 3 where he discusses techniques for dealing with the case that the training set is from a different statistical distribution than the dev and test sets. In fact in that section, he ends up using 4 subsets of the data: training set, training-dev set, dev set and test set.

In the train-dev-test case, we use the training and the dev sets to make every decision about the model, leaving the test set out for the final fair judgement of the final model. It is not fair to use either the training or the dev set for that judgement, because your final model is built on them and so it favours them.

In the train-test case, we use both of them to make decisions until the final model looks great on both sets.

The thing that ONLY the 3-set case can do is to give a performance value based on a dataset that is never used to make any decision about the final model.
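A small simulation can make this point concrete. The sketch below is a toy setup, not a real training run: the labels are coin flips and every candidate "model" is a set of random guesses, so no candidate can truly do better than 50% accuracy. All sizes and names are assumptions chosen for illustration.

```python
import random

rng = random.Random(0)
n_dev, n_test, n_models = 50, 1000, 200

# Labels are pure coin flips, so no model can truly beat 50% accuracy.
dev_labels = [rng.randint(0, 1) for _ in range(n_dev)]
test_labels = [rng.randint(0, 1) for _ in range(n_test)]

def accuracy(predictions, labels):
    """Fraction of predictions that match the labels."""
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)

# Each candidate "model" is just another set of random guesses.
candidates = []
for _ in range(n_models):
    dev_preds = [rng.randint(0, 1) for _ in range(n_dev)]
    test_preds = [rng.randint(0, 1) for _ in range(n_test)]
    candidates.append((accuracy(dev_preds, dev_labels),
                       accuracy(test_preds, test_labels)))

# "Tuning": keep the candidate that looks best on the dev set.
best_dev_acc, best_test_acc = max(candidates, key=lambda c: c[0])
print(f"dev accuracy of chosen model:  {best_dev_acc:.2f}")
print(f"test accuracy of chosen model: {best_test_acc:.2f}")
```

The chosen candidate's dev accuracy should land well above 0.5 purely because it was selected on that set, while its accuracy on the untouched test set should stay close to 0.5. Only the second number reflects how the model would do on new data.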

Let’s make an analogy: a restaurant’s staff (some cooks plus waiters and waitresses) has made all the decisions that lead to the food they think is the best. Why would the manager still want to invite outside people to a food tasting?