In the “Summary of gradient descent” image, dZ1 is computed using g1’(Z1). However, when I use Z1 in the exercise, I do not pass the test. If I instead use A1 (as indicated in the tips section of the exercise), I pass the test. Why do we use A1 to compute dZ1 in the exercise when it appears from the image we need to use Z1?
The way they constructed the test cases here is not really correct. We have this mathematical relationship:
g(z) = tanh(z)
g'(z) = (1 - tanh^2(z))
So in our particular case here, we should have:
A1 = tanh(Z1)
g'(Z1) = (1 - A1^2) = (1 - tanh^2(Z1))
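As a quick sanity check of that identity, here is a minimal sketch in plain Python. The helper names (tanh_prime_from_z, tanh_prime_from_a, central_difference) and the sample z values are made up for illustration and are not part of the assignment. It computes the derivative from Z, from A = tanh(Z), and by a finite difference, so you can see that all three agree whenever A really is tanh(Z).

```python
import math


def tanh_prime_from_z(z):
    """Derivative of tanh computed from the pre-activation z."""
    return 1.0 - math.tanh(z) ** 2


def tanh_prime_from_a(a):
    """Same derivative, computed from the activation a = tanh(z)."""
    return 1.0 - a ** 2


def central_difference(f, z, h=1e-6):
    """Numerical derivative of f at z, used only as an independent check."""
    return (f(z + h) - f(z - h)) / (2.0 * h)


# Arbitrary sample points chosen for illustration.
z_values = [-2.0, -0.5, 0.0, 0.3, 1.7]

for z in z_values:
    a = math.tanh(z)  # the forward-propagation step: A = g(Z)
    from_z = tanh_prime_from_z(z)
    from_a = tanh_prime_from_a(a)
    numeric = central_difference(math.tanh, z)
    print(f"z={z:+.1f}  from_z={from_z:.6f}  from_a={from_a:.6f}  numeric={numeric:.6f}")
```

The two closed forms only match because a was produced by math.tanh(z) on the line above; that dependency is the whole point of the identity.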
But when they constructed the test case, they did not derive A1 from Z1: both are just independent random numbers, so A1 = tanh(Z1) does not hold for the test inputs. You therefore have to write the code using (1 - A1^2) in order to get the expected answer. That form is also simpler to write and more efficient, because it reuses the A1 already computed in forward propagation instead of calling tanh again, so we just need to go with that for now. I filed a bug about the test case a while back, but changing it turns out to be not so simple, and it’s not clear whether they will agree to go through with that.
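To see why the two formulas disagree on such a test case, here is a small sketch that mimics the situation. Everything in it is an assumption for illustration: the seed, the list sizes, and the names A1_unrelated, A1_consistent, and max_gap are hypothetical, and the values are not the grader's actual data.

```python
import math
import random

random.seed(0)  # arbitrary seed, only so the run is repeatable

# Hypothetical stand-ins for test inputs where Z1 and A1 are drawn independently.
Z1 = [random.gauss(0.0, 1.0) for _ in range(4)]
A1_unrelated = [random.gauss(0.0, 1.0) for _ in range(4)]

# What A1 would be if it had actually come from forward propagation.
A1_consistent = [math.tanh(z) for z in Z1]


def grad_from_z(zs):
    """g'(Z) evaluated directly from Z."""
    return [1.0 - math.tanh(z) ** 2 for z in zs]


def grad_from_a(activations):
    """g'(Z) evaluated from A, valid only when A = tanh(Z)."""
    return [1.0 - a ** 2 for a in activations]


def max_gap(xs, ys):
    """Largest elementwise absolute difference between two lists."""
    return max(abs(x - y) for x, y in zip(xs, ys))


print("gap with consistent A1:", max_gap(grad_from_z(Z1), grad_from_a(A1_consistent)))
print("gap with unrelated A1: ", max_gap(grad_from_z(Z1), grad_from_a(A1_unrelated)))
```

With a consistent A1 the gap is at floating-point noise level; with an unrelated A1 it is not, which is why only the A1-based formula reproduces the expected values in that kind of test.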
Also note that the instructions explicitly tell you to use the (1 - A1^2) method. In fact, they wrote out the code for you.
Thank you for the explanation. It seems I forgot that g1’(Z1) equals 1 - A1^2 when A1 = tanh(Z1), and that is why we use A1 in the code. Thank you!