Hi everyone, I understand the idea behind dropout in forward prop, where we mask A1 with D1 and then scale the result by dividing A1 by keep_prob.
A1 = A1*D1
A1 = A1/keep_prob
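
For reference, here's roughly the full forward step as I understand it (just my own sketch with a made-up shape for A1, using numpy):

import numpy as np
keep_prob = 0.8
A1 = np.random.randn(4, 5)                                                # layer 1 activations (made-up shape)
D1 = (np.random.rand(A1.shape[0], A1.shape[1]) < keep_prob).astype(int)   # random 0/1 mask
A1 = A1 * D1                                                              # zero out the dropped units
A1 = A1 / keep_prob                                                       # inverted dropout: scale up the survivors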
However, since we have already applied the dropout during forward prop, the cost function for the first iteration would have taken that into account. So wouldn't dA1 already be based on the scaled and masked A1? Why do we still need to multiply dA1 by D1 and then divide it by keep_prob?
Am I missing something here?
The gradients are just the derivatives of the forward propagation steps. So if the forward function has a factor of 1/keep_prob, then the derivative will have that factor as well, right?
Also remember that dA1 is one output of the back prop calculation at layer 2, so it does not automatically have any entries zeroed by the mask. That happens as we do the back prop calculation for layer 1. dA1 is just an intermediate value that we need to calculate the gradients that we actually apply, which of course are dW1 and db1 for layer 1.
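
To make that concrete, the layer 1 back prop step with dropout looks roughly like this. This is a sketch in the style of the assignment, not the exact notebook code; the shapes and the relu choice for layer 1 are assumptions just for illustration:

import numpy as np
np.random.seed(1)
keep_prob, m = 0.8, 5
X  = np.random.randn(3, m)                   # inputs (made-up shape)
A1 = np.maximum(0, np.random.randn(4, m))    # stand-in for layer 1 activations
D1 = (np.random.rand(4, m) < keep_prob).astype(int)   # the mask saved from forward prop
W2  = np.random.randn(2, 4)
dZ2 = np.random.randn(2, m)                  # stand-in for the layer 2 gradient

dA1 = np.dot(W2.T, dZ2)                      # dA1 comes out of the layer 2 calculation, nothing masked yet
dA1 = dA1 * D1                               # apply the same mask D1 that was used in forward prop
dA1 = dA1 / keep_prob                        # and the same 1/keep_prob scaling
dZ1 = dA1 * np.int64(A1 > 0)                 # assuming layer 1 uses relu
dW1 = 1./m * np.dot(dZ1, X.T)                # these are the gradients we actually apply
db1 = 1./m * np.sum(dZ1, axis=1, keepdims=True)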
Thank you for your prompt reply! Perhaps I am still missing something here.
For example, suppose we have a vector where each value is calculated from the equation y = x^2 + 2x.
Let's say it has 5 entries, with x = 1, 2, 3, 4, 5, so v = [[3, 8, 15, 24, 35]]. After multiplying by a D1 mask and dividing by keep_prob (0.8), v(shut) = [[3.75, 0, 18.75, 0, 43.75]].
Thereafter, when we use the values of v(shut) to calculate the derivative values, aren’t they already masked and scaled accordingly?
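
In code, what I mean is something like this (just my own toy example, with a hand-picked mask):

import numpy as np
x = np.array([[1., 2., 3., 4., 5.]])
v = x**2 + 2*x                       # [[ 3.,  8., 15., 24., 35.]]
D1 = np.array([[1, 0, 1, 0, 1]])     # pretend this is the random mask
v_shut = (v * D1) / 0.8              # [[ 3.75, 0., 18.75, 0., 43.75]]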
The point is that the derivative is the derivative of a function, not a substitution of a particular set of values. The “values” are being back propagated from the “downstream” layers. Do you know what the Chain Rule is and how it works? Note that Prof Ng does not really cover the derivation of back prop because these courses are designed not to require a knowledge of calculus. Here’s a thread that covers the basic derivation of back prop for a feed forward net without dropout. Maybe based on that, it will make more sense if you contemplate how to incorporate dropout into that picture.
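
As a quick sanity check of the Chain Rule point, here is a toy numerical comparison (my own sketch, not code from the assignment). If forward prop computes A1_drop = A1 * D1 / keep_prob, then the Chain Rule gives dJ/dA1 = (dJ/dA1_drop) * D1 / keep_prob, which is exactly the mask-and-scale step in back prop:

import numpy as np
np.random.seed(0)
keep_prob = 0.8
A1 = np.random.randn(3, 4)
D1 = (np.random.rand(3, 4) < keep_prob).astype(float)
dA1_drop = np.random.randn(3, 4)         # stand-in for the gradient flowing back into the dropout step

def J(a):                                # toy scalar "cost" built on the dropped activations
    return np.sum(dA1_drop * (a * D1 / keep_prob))

dA1 = dA1_drop * D1 / keep_prob          # chain rule: same mask, same 1/keep_prob factor

eps = 1e-6
A1_plus = A1.copy()
A1_plus[0, 0] += eps
grad_num = (J(A1_plus) - J(A1)) / eps    # numerical derivative w.r.t. A1[0, 0]
print(dA1[0, 0], grad_num)               # these two agree, which is the whole point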