Drop out gradient (scaled again)

donnie1123 · August 22, 2021, 5:53am

In Course 2 Week 1 Assignment 2, I understand that dW should be multiplied by the random zero matrix again. But do we also need to scale it by 1 / keep_prob? Since dW3 = 1./m * np.dot(dZ3, A2.T), where A2 is already scaled, why do we need to scale again? Thanks.

nramon · August 23, 2021, 6:35pm

Hi, @donnie1123.

That expression corresponds to the derivative of the dropout function. Here’s the derivation (the scaling factor is applied to the mask).
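As a rough sketch of why the second scaling shows up (this is not the assignment's code; `dropout_forward`, `dropout_backward`, and the toy loss are made-up names for illustration): the forward pass computes A2_drop = A2 * D2 / keep_prob, so by the chain rule the gradient flowing back through dropout is dA2 = dA2_drop * D2 / keep_prob. The A2 inside dW3 is scaled because of the forward pass, while dA2 is scaled because it is the derivative of that same operation. These are two different quantities, so nothing is scaled twice. A finite-difference check with the mask held fixed:

```python
import numpy as np

def dropout_forward(A, keep_prob, rng):
    # D is the binary keep-mask; dividing by keep_prob is the inverted-dropout scaling.
    D = (rng.random(A.shape) < keep_prob).astype(A.dtype)
    A_drop = A * D / keep_prob
    return A_drop, D

def dropout_backward(dA_drop, D, keep_prob):
    # Chain rule: d(A_drop)/dA = D / keep_prob, elementwise.
    return dA_drop * D / keep_prob

rng = np.random.default_rng(0)
keep_prob = 0.8                      # assumed value, for illustration only
A2 = rng.standard_normal((4, 5))
A2_drop, D2 = dropout_forward(A2, keep_prob, rng)

# Toy scalar loss L = sum(C * A2_drop), so dL/dA2_drop is simply C.
C = rng.standard_normal(A2.shape)
dA2 = dropout_backward(C, D2, keep_prob)

# Numerical gradient of L with respect to A2, reusing the same mask D2.
eps = 1e-6
num = np.zeros_like(A2)
for idx in np.ndindex(A2.shape):
    Ap = A2.copy(); Ap[idx] += eps
    Am = A2.copy(); Am[idx] -= eps
    loss_plus = np.sum(C * Ap * D2 / keep_prob)
    loss_minus = np.sum(C * Am * D2 / keep_prob)
    num[idx] = (loss_plus - loss_minus) / (2 * eps)

max_err = np.max(np.abs(num - dA2))
print("max abs difference:", max_err)
```

If the division by keep_prob were left out of `dropout_backward`, the analytic gradient would differ from the numerical one by a factor of keep_prob on every kept unit.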

Good luck with the assignment.
