Dropout gradient (scaled again)

In Course 2 Week 1, Assignment 2, I understand that during backpropagation the gradient should be multiplied by the same random zero-one mask again. But do we also need to scale it by 1 / keep_prob? Since dW3 = 1./m * np.dot(dZ3, A2.T), where A2 is already scaled, why do we need to scale again? Thanks

Hi, @donnie1123.

That expression corresponds to the derivative of the dropout function. Here's the idea: the forward pass computes A2 = (A2 * D2) / keep_prob, so the scaling factor is applied together with the mask. By the chain rule, the local derivative of that operation is D2 / keep_prob, which is why the backward pass multiplies dA2 by the same mask and factor. Nothing is scaled twice: A2 is scaled once in the forward pass, and dA2 is scaled once as the gradient passes back through the dropout step. The 1./m in dW3 = 1./m * np.dot(dZ3, A2.T) is a separate averaging term and is unrelated to keep_prob.
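Here is a minimal sketch of where each factor appears, reusing the assignment's variable names (A2, D2, dA2, keep_prob); it is an illustration of inverted dropout, not the assignment's exact code:

```python
import numpy as np

np.random.seed(1)
keep_prob = 0.8
A2 = np.random.randn(3, 5)    # activations of layer 2
dA2 = np.random.randn(3, 5)   # upstream gradient w.r.t. A2

# Forward pass: build the mask, zero out units, rescale ("inverted dropout").
D2 = np.random.rand(*A2.shape) < keep_prob
A2_dropped = (A2 * D2) / keep_prob   # A2 is scaled here, once

# Backward pass: the local derivative of (x * D2) / keep_prob w.r.t. x
# is D2 / keep_prob, so the chain rule applies the same mask and scale
# to the gradient. dA2 is scaled here, once -- nothing is scaled twice.
dA2_dropped = (dA2 * D2) / keep_prob
```

So dW3 uses the already-scaled A2 from the forward pass, while the 1 / keep_prob you apply in backprop belongs to dA2 at the dropout node, one step earlier in the chain.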

Good luck with the assignment :slight_smile: