Hi,
In the video “Understanding Dropout”, Prof. Andrew says that with dropout the cost function of the NN is no longer well defined, which is quite understandable, but after that he says the learning curve (the plot of training loss vs. validation loss) cannot be used. Why? And after that, he says that he often “turns off dropout or sets keep_prob=1”, but when: in the training phase or the inference phase?
When I use TensorFlow and Keras to do some training, I know that the dropout layer behaves differently if we set the arg training=False or True, but I still don’t get the phrase “turn off dropout or set keep_prob=1”. When should I do that? And if I have a dropout layer in my NN, can I not plot a learning curve at all?
Isn’t the training loss in the learning curve computed using the NN with dropout?
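For example, this is roughly what I mean about the Keras training flag (a minimal sketch; the layer and numbers are just for illustration):

```python
import tensorflow as tf

x = tf.ones((1, 10))
drop = tf.keras.layers.Dropout(rate=0.5)

# training=True: roughly half the units are zeroed, the rest are scaled by 1/(1 - rate)
print(drop(x, training=True).numpy())

# training=False (the default at inference): dropout is a no-op, output equals input
print(drop(x, training=False).numpy())
```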
All forms of regularization (L1, L2, dropout, …) are only applied during training. So anytime you are in “inference” (prediction) mode, you disable regularization, which includes setting keep_prob = 1 if you are using dropout. I think Prof Ng’s point in the section you are referring to is that when you compute training accuracy, or even the cost for the purposes of plotting the learning curve, you treat it as if you are in prediction mode and set keep_prob = 1 to get a well defined cost function.
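To make that concrete, here is a minimal numpy sketch in the spirit of the course’s “inverted dropout” (the function here is just an illustration, not the assignment code): with keep_prob = 1 the mask keeps every unit, so the layer behaves as if dropout were not there.

```python
import numpy as np

def relu_layer_with_dropout(A_prev, W, b, keep_prob=1.0):
    """One hidden-layer forward step with inverted dropout.
    With keep_prob = 1.0 every unit is kept, i.e. dropout is effectively off."""
    Z = W @ A_prev + b
    A = np.maximum(0, Z)                         # ReLU activation
    D = (np.random.rand(*A.shape) < keep_prob)   # random keep/drop mask
    A = A * D / keep_prob                        # inverted dropout: rescale to preserve the expected value
    return A

# Training: call with keep_prob < 1 so units are randomly dropped.
# Prediction / cost for the learning curve: call with keep_prob = 1 (full network).
```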
Thanks for replying, sir. Then what about the part where Prof. Andrew says the learning curve cannot be used? I’m confused: if we use a dropout layer, can we still investigate the training process with the learning curve?
Prof Ng is recommending against doing that. But you could do it if you used the technique I alluded to above: every 100th iteration of training, you could rerun the forward propagation with keep_prob = 1 and then use those results to compute a consistently defined cost and training and validation accuracy values. In other words, you would be using the full trained model (not the randomly subsetted one) there every time, so the results would be meaningful. It would be a little more code, but it would allow you to track the convergence in a mathematically correct way.

But it’s an interesting question whether the mathematical point that Prof Ng is making here would really have that much effect in a “real world” sense. All this behavior is statistical anyway, so the fact that dropout is perturbing the cost function could be considered as just adding some more statistical noise to the learning curve data. Of course, how much noise will depend on your keep_prob value. You could try some experiments comparing the “pure” method I suggested above with just taking the “incorrect” perturbed cost every 100 iterations and see whether the two methods actually give you results that are all that different. But I’ve never tried that comparison, and Prof Ng is the expert here: he probably wouldn’t have brought this up and spent the time on it if it didn’t really matter.
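If you wanted to try it, the training loop might look roughly like this sketch. The helper names (forward_propagation_with_dropout, forward_propagation, compute_cost, and so on) are placeholders for whatever your own implementation provides, not a specific API:

```python
# Sketch only: the helpers called below are placeholders for your own functions.
train_costs, val_costs = [], []
for i in range(num_iterations):
    # Normal training step with dropout active (keep_prob < 1)
    AL, caches = forward_propagation_with_dropout(X_train, parameters, keep_prob=0.8)
    grads = backward_propagation_with_dropout(X_train, Y_train, caches, keep_prob=0.8)
    parameters = update_parameters(parameters, grads, learning_rate)

    if i % 100 == 0:
        # Rerun forward prop with dropout turned off (keep_prob = 1) so the
        # cost is consistently defined from one measurement to the next
        AL_train, _ = forward_propagation(X_train, parameters)  # full network, no dropout
        AL_val, _ = forward_propagation(X_val, parameters)
        train_costs.append(compute_cost(AL_train, Y_train))
        val_costs.append(compute_cost(AL_val, Y_val))
```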