[C2W2 - Gradient Checking lab] ReLU backprop formula

In the backward_propagation_n function of the lab, the gradient of the ReLU activation (say, for Z2) is given as np.int64(A2 > 0).

However, shouldn't this be np.int64(Z2 > 0) instead, since the gradient is taken with respect to Z2 and not A2? I think either works in this case, because Z2 > 0 is equivalent to A2 > 0 under the ReLU function, but I'm wondering whether I'm misunderstanding something.

Thank you in advance!

Your analysis is correct: either will work because of the underlying behavior of ReLU. A2 and A1 have already been referenced earlier in the code, so you could argue it is ever so slightly more efficient to reuse variables that are already in memory. But even that is not really true, since the other values were loaded from the cache variable a couple of lines earlier. Oh, well. Using the Z values would arguably have been more mathematically correct.

Also note that you have to be careful there: if you use >= instead of >, it only works with Z2. Since A2 = ReLU(Z2) is never negative, A2 >= 0 is true everywhere and effectively erases the ReLU derivative, whereas Z2 >= 0 differs from Z2 > 0 only where Z2 is exactly zero. The sketch below illustrates this.
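Here is a minimal NumPy sketch of that point. The Z2 values are made up for illustration and are not taken from the lab; only the np.int64(... > 0) expression mirrors the lab's code.

```python
import numpy as np

# Illustrative pre-activation values (not from the lab) and their ReLU output.
Z2 = np.array([[-1.5, 0.0, 2.3],
               [ 0.7, -0.2, 0.0]])
A2 = np.maximum(0, Z2)           # ReLU forward pass

# The two ReLU-derivative masks from the discussion:
mask_from_Z = np.int64(Z2 > 0)   # 1 where Z2 > 0, else 0
mask_from_A = np.int64(A2 > 0)   # identical, since A2 > 0 exactly when Z2 > 0

print(np.array_equal(mask_from_Z, mask_from_A))   # True

# The >= pitfall: A2 = ReLU(Z2) is never negative, so A2 >= 0 is true
# everywhere and the "derivative" collapses to all ones.
print(np.int64(A2 >= 0))   # all ones -- the ReLU derivative is lost
print(np.int64(Z2 >= 0))   # differs from Z2 > 0 only where Z2 == 0
```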
