On which set should dropout be implemented: the training set, the dev set, or the test set? And why on that particular set?
Welcome to the community. In the video titled “Dropout Regularization”, Prof. Andrew makes a few indirect references indicating that dropout is applied to the training set. For instance,
- “And so for each training example, you would train it using one of these neural based networks.”
- “So, what you do is you use the d vector, and you’ll notice that for different training examples, you zero out different hidden units.”
- “And in fact, if you make multiple passes through the same training set, then on different passes through the training set, you should randomly zero out different hidden units.”
However, he doesn’t state it explicitly, which is perhaps what led you to ask this question. Similarly, he also makes a reference to dropout and the test set,
- “So what we’re going to do is not to use dropout at test time in particular which is …”
Now, let me add my two cents as well,
- We apply dropout on the training set while the model is training, and we don’t use dropout on the cross-validation and test sets, when the model is performing inference.
- The reason for this lies in the purpose of dropout. As clearly stated in the video, we use dropout to reduce over-fitting.
- Now, over-fitting happens while the model is training, not while it is performing inference. Note that we only detect over-fitting after running inference on the dev/test sets, but because the model is trained on the training set and not on the dev/test sets, it is the training set that the model overfits.
- And hence, it makes sense to use dropout on the training set, since that is the set on which over-fitting is taking place, and not on the dev and/or test sets.
- Additionally, when we are making predictions on any set, be it test set or be it dev set, we want the predictions to be stable, i.e., we want the model to predict the same output given the same input.
- During training, on the other hand, we want to reduce the dependence of neurons on one another, and randomly dropping units does exactly that. This is another reason for using dropout on the training set and not on the dev/test sets.
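To make the points above concrete, here is a minimal NumPy sketch of the "inverted dropout" variant, in which the mask `d` is applied only when a `training` flag is set. The function name `dropout_forward`, the `training` flag, the `keep_prob` value of 0.8, and the toy activations are my own assumptions for illustration, not code from the video.

```python
import numpy as np

def dropout_forward(a, keep_prob, training, rng):
    """Apply inverted dropout to activations `a`, but only during training."""
    if not training:
        # Dev/test time: keep every unit, so the same input
        # always produces the same output.
        return a
    # d is the random mask: True keeps a unit, False zeroes it out.
    d = rng.random(a.shape) < keep_prob
    # Divide by keep_prob so the expected activation stays the same.
    return a * d / keep_prob

rng = np.random.default_rng(0)
a = np.ones((4, 3))  # hypothetical hidden-layer activations

train_out = dropout_forward(a, keep_prob=0.8, training=True, rng=rng)
test_out_1 = dropout_forward(a, keep_prob=0.8, training=False, rng=rng)
test_out_2 = dropout_forward(a, keep_prob=0.8, training=False, rng=rng)
```

With `training=True`, each call draws a fresh mask, so different passes zero out different units. With `training=False`, the activations pass through unchanged, which is what gives stable predictions on the dev and test sets.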
I hope this helps.
Elemento did a great job of covering how this applies specifically to Dropout, but you can also make a more general statement than that: regularization by definition is only applied during training. That applies to all forms of regularization: L2, Lasso, Dropout and others. The point is that regularization modifies the training to result in a different model, which we hope, of course, is a more accurate model with less overfitting. Then you apply the resulting trained model to any other datasets you have: cross validation, test or “real” input data to make predictions.
I completely left the general viewpoint out of my answer and focused only on dropout. Thanks a lot for completing my answer, @paulinpaloalto Sir.
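The same pattern can be sketched for L2 regularization. This is only an illustration under my own assumptions: a linear model with mean squared error, hypothetical helper names (`mse`, `fit_l2`, `predict`), and synthetic data. The penalty strength `lam` appears only in the fitting step, while prediction and evaluation use the trained weights with no penalty term.

```python
import numpy as np

def mse(w, X, y):
    """Plain (unregularized) mean squared error, used for evaluation."""
    return float(np.mean((X @ w - y) ** 2))

def fit_l2(X, y, lam):
    """Minimize mse(w) + lam * ||w||^2 in closed form (training only)."""
    m, n = X.shape
    return np.linalg.solve(X.T @ X + m * lam * np.eye(n), X.T @ y)

def predict(w, X):
    """Inference: no regularization term is involved here."""
    return X @ w

# Synthetic data, made up purely for this example.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(20, 3))
y_train = X_train @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=20)
X_test = rng.normal(size=(5, 3))

w_plain = fit_l2(X_train, y_train, lam=0.0)  # no regularization
w_reg = fit_l2(X_train, y_train, lam=1.0)    # L2 changes the trained weights

test_predictions = predict(w_reg, X_test)    # lam plays no role here
train_error = mse(w_reg, X_train, y_train)   # evaluated without the penalty
```

Regularization changes which weights come out of training (`w_reg` has a smaller norm than `w_plain`), and after that the trained model is simply applied to whatever data you have.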