Hey, I have a small clarification here. In the dropout lectures, Professor Ng told us that we shouldn't apply dropout on the test set. My understanding is that all our predictions at the end depend on the W and b parameters that we learn. So if we optimise those during training using dropout, it doesn't make sense to apply dropout at test time, since the parameters have already been tuned that way. Is my understanding right? Kindly correct me if I am wrong.
All forms of regularization (L2, dropout, L1, …) are applied only during training, never when actually making predictions with the resulting trained model. The point is that regularization modifies the results of the training: you get a different model with regularization than without. But that is the model at the end of training. When you apply the model to the dev (cross-validation) set, the test set, or real-world inputs, you never include the regularization. Its effect is already "built in" to the weights of the model, which I think is what you are saying in your last sentence. At first I thought you were disagreeing with what Prof Ng said, but in the end it sounds like you are agreeing with him.
Note that you don't even include the dropout logic when you are making predictions on the training data. The point is that you are not changing (training) the model when you are simply making predictions. When you are finished with training, you need to calculate the prediction accuracy on the training data so that you can evaluate the performance of the model and detect things like bias and overfitting. Prediction with the model is always done without regularization; it is only when you are doing gradient descent during training that you include the dropout.
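To make that concrete, here is a minimal sketch in plain NumPy (not the assignment code; the function names, the ReLU activation, and the `keep_prob` value are just illustrative assumptions) showing one layer's forward pass with inverted dropout during training versus the same layer at prediction time:

```python
import numpy as np

def forward_train(A_prev, W, b, keep_prob=0.8):
    """One layer's forward pass during training with inverted dropout.
    Illustrative sketch only; names loosely follow the course's A/W/b notation."""
    Z = np.dot(W, A_prev) + b
    A = np.maximum(0, Z)                         # ReLU activation
    D = np.random.rand(*A.shape) < keep_prob     # random dropout mask
    A = (A * D) / keep_prob                      # zero out units, rescale the rest
    return A, D                                  # mask D is kept only for backprop

def forward_predict(A_prev, W, b):
    """Same layer at prediction time: no dropout mask at all.
    Used for the dev set, the test set, and even for measuring
    accuracy on the training set once training is finished."""
    Z = np.dot(W, A_prev) + b
    return np.maximum(0, Z)
```

The `/ keep_prob` rescaling in the training pass is what lets you drop the mask entirely at prediction time: the expected scale of the activations is the same in both functions, so the trained weights work unchanged when you predict without dropout.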