Hello, please explain the following line of code in the backward_propagation_with_regularization function: dZ = np.multiply(dA, np.int64(A > 0)). Why are we multiplying dA by np.int64(A > 0)?
Everything in this exercise looks different from the fully general L-layer code we built in C1 Week 4, right? They’ve just hard-coded everything for a specific 3-layer network to keep the code simple: there are no layers of functions like linear_activation_backward calling relu_backward and linear_backward.
The general formula being implemented there is:
dZ^{[l]} = dA^{[l]} * g^{[l]'}(Z^{[l]})
But here we’re in the specific case where the activation function is ReLU. Think about what g'(Z) looks like for ReLU for a sec and the light should go on!
You could legitimately observe that it would be more literally correct if they had written np.int64(Z > 0), but if A = ReLU(Z), then A > 0 iff Z > 0, right? If I were writing the code, I would have written it as np.float64(A > 0), but that’s just me. I prefer not to assume Python’s type coercion is going to do exactly what I expect.
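Here is a minimal sketch (with made-up Z and dA values, not the assignment’s actual cache) showing that multiplying dA by the 0/1 mask is exactly the ReLU backward step, and that masking on A > 0 or Z > 0 gives the same result:

import numpy as np

# Made-up example values; in the assignment these come from the cache
Z = np.array([[1.5, -2.0, 0.3],
              [-0.7, 2.2, -0.1]])
A = np.maximum(0, Z)                 # A = ReLU(Z), so A > 0 exactly where Z > 0
dA = np.array([[0.4, -0.6, 1.0],
               [0.9, -0.2, 0.5]])

# dZ = dA * g'(Z): ReLU's derivative is 1 where Z > 0 and 0 elsewhere
dZ_from_A = np.multiply(dA, np.int64(A > 0))    # the assignment's form
dZ_from_Z = np.multiply(dA, np.float64(Z > 0))  # the "more literal" form

print(np.allclose(dZ_from_A, dZ_from_Z))        # True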
Thank you for your explanation! The derivative of ReLU is 1 or 0, and applying np.int64 or np.float64 to the boolean values just converts them to 1 or 0.
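For the record, a quick way to see that conversion in isolation (just a toy boolean array, not anything from the assignment) is:

import numpy as np

mask = np.array([True, False, True])
print(np.int64(mask))     # [1 0 1]
print(np.float64(mask))   # [1. 0. 1.]
print(mask.astype(int))   # [1 0 1], the more explicit spelling of the same cast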