This is of course going to sound crazy, but why have a training set at all? If the purpose of the training set is to decide which model I want to use, and I already know I want to use a deep neural network, what purpose does the training set actually serve? It's not like we use the model from the training set to initialize the parameters for the dev and test sets. (That actually sounds like a good idea, though.)
So if we are essentially starting from scratch on the dev set and test set, why not just start there? I am sure I am missing something, so can someone please enlighten me?
Let's recap what the training set, validation set (dev set), and test set each do.
We eventually need to evaluate the network. For that, we need brand-new data, meaning data that has never been fed into the network before. If we reuse data the network has already seen, we are effectively handing it the answers, and that's not a fair evaluation. So the very first step is to set aside the test set.
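To make that first step concrete, here is a minimal sketch of holding out a test set before doing anything else. The function name, the 20% test fraction, and the toy integer "dataset" are all hypothetical choices for illustration; real projects typically use a library utility such as scikit-learn's `train_test_split`.

```python
import random

def train_test_split(data, test_fraction=0.2, seed=0):
    """Shuffle the data once, then carve off a held-out test set.

    The test portion is set aside first and never touched during
    training, so the final evaluation only sees unseen examples.
    """
    rng = random.Random(seed)
    indices = list(range(len(data)))
    rng.shuffle(indices)
    n_test = int(len(data) * test_fraction)
    test = [data[i] for i in indices[:n_test]]
    train = [data[i] for i in indices[n_test:]]
    return train, test

samples = list(range(100))          # stand-in for a real dataset
train, test = train_test_split(samples)
assert len(train) == 80 and len(test) == 20
assert set(train).isdisjoint(test)  # no example leaks into evaluation
```

The disjointness check at the end is the whole point: once an example lands in the test set, it must never appear in training.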
Then the most important step is finalizing the weights of the network, which is genuinely hard work when you consider the number of trainable parameters; it often reaches into the millions.
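To get a feel for that scale, here is a quick back-of-the-envelope count for a small fully connected network. The layer sizes are a hypothetical MNIST-style 784-512-256-10 architecture, chosen only for illustration.

```python
# Hypothetical MNIST-style fully connected architecture.
layer_sizes = [784, 512, 256, 10]

# Each dense layer has (inputs x outputs) weights plus one bias per output.
n_params = sum((n_in + 1) * n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

print(n_params)  # -> 535818, over half a million trainable parameters
```

Even this modest network has over half a million weights to fit, and that is tiny compared to modern deep networks.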
Since training matters so much, we want to use as much data as possible for it. That's the training set, and it is used many times over the course of training.
Training time is not trivial. How can we understand and monitor what is going on during training? The model may be overfitting, the gradients may be exploding, the loss may not be converging, and so on. There are lots of conditions to check in order to set hyperparameters like the learning rate, number of epochs, and batch size. Which data can we use for these checks? Again, we cannot use any of the test data. So we allocate some data from the training set for this purpose: that's the validation set (dev set). In this sense, the validation set is not used for training the weights; it is used separately to see how training is going.
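Here is a sketch of what "watching the validation set during training" looks like in practice: a generic early-stopping monitor. The `train_step` and `validate` callables, the `patience` threshold, and the toy loss curves are all placeholders for a real framework's epoch of weight updates and held-out evaluation (frameworks ship this built in, e.g. Keras's `EarlyStopping` callback).

```python
def monitor_training(train_step, validate, max_epochs=50, patience=5):
    """Run training while watching validation loss each epoch.

    If validation loss stops improving for `patience` epochs even
    though training continues, the model is likely overfitting, so
    we stop early and keep the best validation score seen so far.
    """
    best_val, epochs_without_improvement = float("inf"), 0
    for epoch in range(max_epochs):
        train_step()                # one epoch of weight updates
        val_loss = validate()       # evaluate on the held-out dev set
        if val_loss < best_val:
            best_val, epochs_without_improvement = val_loss, 0
        else:
            epochs_without_improvement += 1
        if epochs_without_improvement >= patience:
            return epoch + 1, best_val   # stopped early: overfitting
    return max_epochs, best_val

# Toy curves: training loss keeps falling, but validation loss turns
# upward after epoch 10 -- the monitor should stop well before epoch 50.
train_losses = iter(1.0 / (t + 1) for t in range(50))
val_losses = iter(abs(t - 10) * 0.1 + 0.5 for t in range(50))
epochs_run, best = monitor_training(lambda: next(train_losses),
                                    lambda: next(val_losses))
assert epochs_run < 50     # stopped early
assert best == 0.5         # best validation loss, hit at epoch 10
```

Notice that the validation data never influences the weight updates themselves; it only influences decisions *about* training, such as when to stop.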
That's why we need separate training, validation, and test sets.
Hope this helps.