I'm curious how backpropagation is implemented when using dropout and regularization. Do you just hold the "W parameter" updates static for the nodes that were dropped? Or am I thinking about this incorrectly? I'd welcome any links I may have missed when googling this.
Just started going through the regularization notebook and see it's covered there… never mind!
Hey @Rob_Chavez,
Glad you found it in the notebook. I just wanted to point out the Dropout paper http://www.cs.toronto.edu/~hinton/absps/JMLRdropout.pdf which has a specific section on how dropout affects backprop.
Cheers and happy learning!
It sounds like you've already found the answer, but one other point worth making is to generalize this to all forms of regularization: how regularization affects back prop is determined by how it affects forward prop. Forward prop is a composition of functions, and you use the Chain Rule to take the derivatives of all those layers of functions in order to do back prop, right? So what the functions are determines their derivatives, which determines what effect they have during back prop.

In the case of dropout, you are literally zeroing some of the neurons on a per-sample basis, so the derivative is also zero for those particular elements, and it is also affected by the "reverse scaling" by \frac{1}{keepProb}. In the case of L2 regularization, the mechanism is completely different: you just get a number of new terms in the summation of the cost function, and the derivatives of those terms are included in the gradients as well at back prop time.
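To make that concrete, here is a minimal NumPy sketch of a single ReLU layer that combines both mechanisms (the function names, the shapes, and the choice to mix dropout with L2 in one layer are my own for illustration, not the notebook's code): the backward pass reuses the exact mask `D` and the same \frac{1}{keepProb} scaling that the forward pass applied, and L2 just adds a \frac{\lambda}{m} W term to `dW`.

```python
import numpy as np

def forward_with_dropout(A_prev, W, b, keep_prob=0.8):
    """One hidden-layer forward step with inverted dropout (illustrative sketch)."""
    Z = W @ A_prev + b
    A = np.maximum(0, Z)                          # ReLU activation
    D = np.random.rand(*A.shape) < keep_prob      # dropout mask, drawn per sample
    A = A * D / keep_prob                         # zero dropped units, rescale the rest
    return A, (A_prev, W, Z, D)

def backward_with_dropout(dA, cache, keep_prob=0.8, lambd=0.0):
    """Matching backward step: apply the same mask and scaling to dA,
    then add the L2 term (lambd / m) * W to dW."""
    A_prev, W, Z, D = cache
    m = A_prev.shape[1]                           # number of samples in the batch
    dA = dA * D / keep_prob                       # same zeros and 1/keep_prob as forward prop
    dZ = dA * (Z > 0)                             # derivative of ReLU
    dW = (dZ @ A_prev.T) / m + (lambd / m) * W    # L2 regularization shows up here
    db = np.sum(dZ, axis=1, keepdims=True) / m
    dA_prev = W.T @ dZ                            # gradient passed to the previous layer
    return dA_prev, dW, db

# Tiny usage example with made-up shapes:
A_prev = np.random.randn(3, 5)                    # 3 features, 5 samples
W = np.random.randn(4, 3) * 0.01
b = np.zeros((4, 1))
A, cache = forward_with_dropout(A_prev, W, b, keep_prob=0.8)
dA = np.random.randn(*A.shape)                    # pretend gradient from the next layer
dA_prev, dW, db = backward_with_dropout(dA, cache, keep_prob=0.8, lambd=0.7)
```

So nothing is "held static" for the dropped nodes per se: their contribution to the gradient is simply zero for that sample, because the mask that zeroed them in forward prop is applied again in back prop.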
Thank you for the reply and the paper… and with Hinton as an author, no less. Sweet!
Thank you, @paulinpaloalto, that was a great explanation. I was able to calculate the derivatives for backprop in the first course and would like to give it a go again for this one.