Hi Noam, great question! Here is a good wiki article.
A key distinction is that the training set is used to learn parameters, such as the weights of the model, while the validation (or development) set is used to tune hyperparameters, such as the number and size of the hidden layers.
Well, this is quite confusing to me.
Are you saying that the hyperparameters should be examined only after the model has been trained on the training set? If so, that doesn't make sense to me.
The hyperparameters obviously affect the cost function, and hence the error on the training set, so it doesn't make sense to me that they are examined only after training.
As @gautamaltman said, the parameters are what the model learns during training (e.g. the weights), while the hyperparameters are those you set before training (e.g. the number of neurons).
The process can be done in this way:
1. Set specific hyperparameters and train on the training set.
2. Evaluate the trained model on the dev set. If the performance is not good enough, tweak your hyperparameters and go back to step 1, until you are satisfied with the performance on the dev set.
3. With your final model, evaluate on the test set to get an estimate of the performance on unseen data.
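The loop above can be sketched in code. This is just a minimal illustration, assuming scikit-learn is available and using its built-in digits dataset; the specific hyperparameter tried here (hidden layer size) and the candidate values are arbitrary choices for the example:

```python
# Sketch of the train / dev / test workflow: hyperparameters are chosen
# on the dev set, and the test set is touched exactly once at the end.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)

# Split once into train / dev / test (roughly 60 / 20 / 20 here).
X_train, X_tmp, y_train, y_tmp = train_test_split(
    X, y, test_size=0.4, random_state=0)
X_dev, X_test, y_dev, y_test = train_test_split(
    X_tmp, y_tmp, test_size=0.5, random_state=0)

best_score, best_size = -1.0, None
for hidden_size in (16, 32, 64):  # step 1: candidate hyperparameter values
    model = MLPClassifier(hidden_layer_sizes=(hidden_size,),
                          max_iter=500, random_state=0)
    model.fit(X_train, y_train)        # parameters (weights) learned here
    score = model.score(X_dev, y_dev)  # step 2: judged on the dev set
    if score > best_score:
        best_score, best_size = score, hidden_size

# Step 3: retrain the chosen configuration and evaluate once on the test set.
final = MLPClassifier(hidden_layer_sizes=(best_size,),
                      max_iter=500, random_state=0)
final.fit(X_train, y_train)
print("chosen hidden size:", best_size)
print("test accuracy:", round(final.score(X_test, y_test), 3))
```

Note that only the dev score drives the choice of hyperparameters; the test score is reported, never optimized against.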
Here are some previous discussions about this topic that may help you: