Usually, with non-neural-network ML techniques, I split my data into train and test sets.
The train set is used to train the model, while the test set is used to evaluate how good the model is.
Now, in the first video, Andrew presents another type of data set, dev, and I couldn't understand exactly what it is.
Can someone please describe in plain English the role of each data set: train, dev, and test?
Adding an example would be great as well.
Thanks!
Hi Noam, great question! Here is a good wiki article.
The distinction is that the training set is used to learn parameters such as the weights of the model, while the validation (or development) set is used to adjust hyperparameters such as the number of hidden layers and their sizes.
Well, this confuses me quite a bit.
Are you saying that the hyperparameters should be examined after the model has been trained on the dataset? If that's true, it doesn't make sense.
Obviously the hyperparameters affect the cost function, and hence the error on the whole training set, so it doesn't make sense to me that the hyperparameters are examined only after training.
As @gautamaltman said, the parameters are what the model learns during training (e.g. the weights), while the hyperparameters are the values you set before training (e.g. the number of neurons).
The process can be done this way:
1. Choose specific hyperparameters and train the model on the training set.
2. Evaluate the trained model on the dev set. If the performance is not good enough, tweak your hyperparameters and train again (step 1), repeating until you are satisfied with the performance on the dev set.
3. With your final model, evaluate on the test set to get an estimate of the performance on unseen data.
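Since an example was asked for, here is a minimal sketch of that loop in plain Python. It is only an illustration under my own assumptions, not something from the course: the data is synthetic (y = 3x plus noise), the "model" is a one-weight linear fit with an L2 penalty, the 60/20/20 split is arbitrary, and the helper names (`split`, `train`, `mse`) are made up for this post. The weight `w` plays the role of a learned parameter and the penalty `lam` plays the role of a hyperparameter.

```python
import random


def split(data, train_frac=0.6, dev_frac=0.2, seed=0):
    """Shuffle the data and cut it into train / dev / test lists."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_dev = int(len(shuffled) * dev_frac)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_dev],
            shuffled[n_train + n_dev:])


def train(train_set, lam):
    """Learn the parameter w of y ~ w * x, with L2 penalty lam (closed form)."""
    sxy = sum(x * y for x, y in train_set)
    sxx = sum(x * x for x, _ in train_set)
    return sxy / (sxx + lam)


def mse(w, dataset):
    """Mean squared error of the model y ~ w * x on a dataset."""
    return sum((y - w * x) ** 2 for x, y in dataset) / len(dataset)


# Synthetic data (assumption for the demo): y = 3x + Gaussian noise.
rng = random.Random(42)
data = []
for _ in range(200):
    x = rng.uniform(-2, 2)
    data.append((x, 3.0 * x + rng.gauss(0, 1)))

train_set, dev_set, test_set = split(data)

# Steps 1 and 2: for each hyperparameter value, train on the train set
# and score on the dev set. The test set is not touched here.
candidates = [0.0, 0.1, 1.0, 10.0, 100.0]
best_lam, best_w, best_dev = None, None, float("inf")
for lam in candidates:
    w = train(train_set, lam)      # parameters come from the train set
    dev_err = mse(w, dev_set)      # hyperparameter is judged on the dev set
    if dev_err < best_dev:
        best_lam, best_w, best_dev = lam, w, dev_err

# Step 3: evaluate the chosen model once on the test set.
test_err = mse(best_w, test_set)
print("chosen lam:", best_lam)
print("dev MSE:", best_dev)
print("test MSE:", test_err)
```

Note that the hyperparameter is still fixed before each training run. The dev set only tells you which of those runs to keep, and the test set is kept aside so that the final number is not biased by that choice.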
Here you have some previous discussions about this topic that may help you: