The instructions for calculating dZ[1] look like this according to the script:
We use the tanh function here, whose derivative is 1 - tanh(x)^2. The formula clearly states that the input to the derivative of tanh is Z[1]. However, the solution only passes if I use A1 as the input (as the task also suggests). Can someone please explain why we use A1 here and not cache["Z1"]?
Sorry if this question has already been answered somewhere else.
It doesn’t mean that the input is Z^{[1]}. It means it’s the derivative with respect to Z^{[1]}, which is 1 - tanh(Z^{[1]})^2, and tanh(Z^{[1]}) is equal to A^{[1]}.
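Written out, the substitution is a single step. The symbol g below is only an illustrative name for the layer-1 tanh activation; it is not taken from the assignment text.

```latex
g(z) = \tanh(z), \qquad g'(z) = 1 - \tanh^2(z)
\\[4pt]
A^{[1]} = g\left(Z^{[1]}\right)
\;\Longrightarrow\;
g'\left(Z^{[1]}\right) = 1 - \tanh^2\left(Z^{[1]}\right) = 1 - \left(A^{[1]}\right)^2
```

So the derivative is still "evaluated at" Z^{[1]}; it is simply expressed through the already cached activation.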
This is a bug in the grader test case. They just generated all the A and Z values as random values and then calculated the expected output assuming you used (1 - A1^2) for the derivative of tanh. Your version using (1 - tanh(Z1)^2) is mathematically correct, but it doesn’t pass the test case because with their values:
A1 \neq tanh(Z1)
I have filed a bug about this, but fixing it will take some work and they haven’t gotten around to it yet.
So you just need to follow their instructions and use A1 in this case. It’s more efficient anyway: tanh is pretty expensive to compute, so it saves work to reuse the value already computed during forward propagation.
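To make the difference concrete, here is a minimal NumPy sketch. It is not the assignment's code or the grader's test: the array shapes, the seed, and the variable names (Z1_fake, A1_fake, and so on) are assumptions chosen only for illustration. It compares the two ways of computing the tanh derivative, first with values from a real forward pass and then with independently drawn random values, which is roughly the situation described above.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only

# Case 1: A1 really is the output of the forward pass, so A1 = tanh(Z1).
Z1 = rng.standard_normal((4, 3))
A1 = np.tanh(Z1)
from_cache = 1 - np.power(A1, 2)            # reuses the cached activation
recomputed = 1 - np.power(np.tanh(Z1), 2)   # calls tanh a second time
consistent = np.allclose(from_cache, recomputed)

# Case 2: A1 and Z1 are drawn independently (hypothetical stand-in for
# test data where A1 != tanh(Z1)), so the two formulas disagree.
Z1_fake = rng.standard_normal((4, 3))
A1_fake = rng.standard_normal((4, 3))
mismatch = not np.allclose(1 - A1_fake**2, 1 - np.tanh(Z1_fake)**2)

print("forward-pass values agree:", consistent)
print("independent random values disagree:", mismatch)
```

With a genuine forward pass both expressions give the same result, so using A1 costs nothing in correctness and avoids a second tanh evaluation.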
Thank you both for the explanation. Now I understand what is meant.