Dropout gradient (scaled again)

In Course 2 Week 1 Assignment 2, I understand that dW should be multiplied by the random zero matrix (the dropout mask) again. But do we also need to scale it by 1 / keep_prob? Since dW3 = 1./m * np.dot(dZ3, A2.T), where A2 is already scaled, why do we need to scale again? Thanks.

That expression corresponds to the derivative of the dropout function. Here’s the derivation (the scaling factor is applied to the mask).
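In words, assuming the usual inverted-dropout forward step A2_drop = A2 * D2 / keep_prob, each output element depends on its input through the factor D2 / keep_prob, so the chain rule gives dA2 = dA2_drop * D2 / keep_prob. The scaling already inside A2 only affects dW3 through the value of A2; the second 1 / keep_prob belongs to the gradient that flows back through the dropout layer toward the earlier layers. Below is a minimal sketch that checks this numerically. The function names and the weighted-sum loss are hypothetical, chosen only for illustration, and plain Python lists stand in for the assignment's NumPy arrays (the arithmetic is the same, applied element by element).

```python
import random


def dropout_forward(a, mask, keep_prob):
    # Inverted dropout: zero the dropped units, scale the kept ones by 1/keep_prob.
    return [ai * di / keep_prob for ai, di in zip(a, mask)]


def dropout_backward(d_out, mask, keep_prob):
    # Chain rule: d(out_i)/d(a_i) = mask_i / keep_prob, so the upstream
    # gradient is multiplied by the same mask and the same 1/keep_prob.
    return [gi * di / keep_prob for gi, di in zip(d_out, mask)]


def loss(a, mask, keep_prob, w):
    # Hypothetical scalar loss: a weighted sum of the dropout output.
    out = dropout_forward(a, mask, keep_prob)
    return sum(wi * oi for wi, oi in zip(w, out))


random.seed(0)
keep_prob = 0.8
n = 6
a = [random.uniform(-1.0, 1.0) for _ in range(n)]
w = [random.uniform(-1.0, 1.0) for _ in range(n)]
mask = [1.0 if random.random() < keep_prob else 0.0 for _ in range(n)]

# Analytic gradient with respect to the pre-dropout activations.
# For this loss, the gradient with respect to the dropout output is just w.
analytic = dropout_backward(w, mask, keep_prob)

# Central finite-difference estimate of the same gradient.
eps = 1e-6
numeric = []
for i in range(n):
    a_plus = list(a)
    a_minus = list(a)
    a_plus[i] += eps
    a_minus[i] -= eps
    diff = loss(a_plus, mask, keep_prob, w) - loss(a_minus, mask, keep_prob, w)
    numeric.append(diff / (2 * eps))

max_error = max(abs(x - y) for x, y in zip(analytic, numeric))
print("max |analytic - numeric| =", max_error)
```

If the division by keep_prob is left out of dropout_backward, the analytic gradient comes out smaller than the finite-difference estimate by a factor of keep_prob on every kept unit, which is the mismatch the second scaling prevents.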

