Hi,
I keep getting this error even though my dW1 matches the expected dW1 exactly.
To be more precise, to compute dW1 I implemented this expression:
dZ1 = np.dot(W2.T, dZ2) * (1 - np.tanh(cache['Z1'])**2)
dW1 = 1/m * np.dot(dZ1, X.T)
The result for dW1 in my implementation is:
dW1 = [[ 0.00301023 -0.00747267]
[ 0.00257968 -0.00641288]
[-0.00156892 0.003893 ]
[-0.00652037 0.01618243]]
Thus matching the expected output for dW1
Expected output
dW1 = [[ 0.00301023 -0.00747267]
[ 0.00257968 -0.00641288]
[-0.00156892 0.003893 ]
[-0.00652037 0.01618243]]
This is the complete traceback:
AssertionError Traceback (most recent call last)
<ipython-input-48-a06d396e2b09> in <module>
7 print ("db2 = "+ str(grads["db2"]))
8
----> 9 backward_propagation_test(backward_propagation)
~/work/release/W3A1/public_tests.py in backward_propagation_test(target)
177 assert output["db2"].shape == expected_output["db2"].shape, f"Wrong shape for db2."
178
--> 179 assert np.allclose(output["dW1"], expected_output["dW1"]), "Wrong values for dW1"
180 assert np.allclose(output["db1"], expected_output["db1"]), "Wrong values for db1"
181 assert np.allclose(output["dW2"], expected_output["dW2"]), "Wrong values for dW2"
AssertionError: Wrong values for dW1
Am I missing something?
Well, it’s possible that the test case you fail is somehow different from the one that you seem to pass. Also note that you are working too hard to compute the derivative of tanh: you already have A1 available in the cache, right? It’s possible that your solution has different rounding behavior, or maybe the cache input in the other test case doesn’t have a consistent value for Z1.
Yes, my last theory turns out to be right: you can check by printing A1 and np.tanh(Z1) and you’ll see they are the same for the test case you can see in the notebook, but not for the test case in the file public_tests.py. Here are the first few lines of the test routine:
def backward_propagation_test(target):
    np.random.seed(1)
    X = np.random.randn(3, 7)
    Y = (np.random.randn(1, 7) > 0)
    parameters = {'W1': np.random.randn(9, 3),
                  'W2': np.random.randn(1, 9),
                  'b1': np.array([[0.], [0.], [0.], [0.], [0.], [0.], [0.], [0.], [0.]]),
                  'b2': np.array([[0.]])}
    cache = {'A1': np.random.randn(9, 7),
             'A2': np.random.randn(1, 7),
             'Z1': np.random.randn(9, 7),
             'Z2': np.random.randn(1, 7)}
So you can see that there is no mathematical relationship between Z1 and A1 in the cache for that test case. You could argue this is bogus, but why not just use the A1 value? It’s easier (and more efficient at runtime) to write the code that way in any case.
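If you want to see this for yourself, here is a minimal sketch (it just replays the same random draws as the test routine above, outside the grader) showing that the cached A1 is not the tanh of the cached Z1:

import numpy as np

# Replay the random draws from backward_propagation_test so the cache
# values match what the test actually uses (b1/b2 are zeros and consume
# no random numbers, so they are omitted here).
np.random.seed(1)
X = np.random.randn(3, 7)
Y = (np.random.randn(1, 7) > 0)
W1 = np.random.randn(9, 3)
W2 = np.random.randn(1, 9)
cache = {'A1': np.random.randn(9, 7),
         'A2': np.random.randn(1, 7),
         'Z1': np.random.randn(9, 7),
         'Z2': np.random.randn(1, 7)}

# Prints False: A1 and Z1 are independent random draws, so A1 != tanh(Z1).
print(np.allclose(cache['A1'], np.tanh(cache['Z1'])))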
Thanks a lot! Quick question though: I don’t really get how I can compute dZ1 using A1, assuming I follow this formula:
$\frac{\partial \mathcal{J}}{\partial z_{1}^{(i)}} = W_2^T \frac{\partial \mathcal{J}}{\partial z_{2}^{(i)}} * \left(1 - \left(a^{[1](i)}\right)^2\right)$
Note: the LaTeX equation is not displaying inside the code embed.
Well, this is a true statement, right?
A1 = np.tanh(Z1)
That’s what it means for tanh to be the layer 1 activation function. You just finished writing that code in the previous function. So just plug A1 in place of np.tanh(Z1) in your existing code.
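To make that concrete, here is a sketch of the change (assuming, as in your earlier post, that dZ2, W2, X, m and the cache are already in scope):

A1 = cache['A1']                                   # cached activation from forward propagation
dZ1 = np.dot(W2.T, dZ2) * (1 - np.power(A1, 2))    # g'(Z1) written in terms of A1 instead of tanh(Z1)
dW1 = 1/m * np.dot(dZ1, X.T)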
Just use $ bracketing in normal markdown text for LaTeX and it works fine.
Oh, ok. I also just saw the tip in the exercise that explains this further. Thanks again for your time.
Oh, right, they literally wrote the code out for you in the instructions.
Hi, I have this kind of error too, but my output and the expected output look the same. I used
dZ1 = np.dot(W2.T, dZ2) * (1 - np.power(np.tanh(cache['Z1']), 2))
which should come to the same thing, if I’m right? I don’t understand; do you have any idea?
Hi, Hichem.
Your code is correct from a mathematical point of view, but I explained earlier on this thread why it doesn’t work with the test cases they have here. Their test cases are not correctly constructed because the Z1 and A1 values are not related. The relationship should be:
A1 = np.tanh(Z1)
But you can see in this post that it’s not. So you just have to write the g'(Z1) expression as (1 - A1^2) in order to pass the tests there. You have A1 in the cache, so it’s actually more efficient to write it that way in any case: the code will run faster because computing tanh is non-trivial (you need to evaluate e^x). They also gave you a very explicit hint to do it that way in the instructions for this section.
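As a rough illustration of that efficiency point (timings vary by machine; this is just a sketch, not part of the assignment):

import timeit
import numpy as np

Z1 = np.random.randn(9, 70000)
A1 = np.tanh(Z1)   # forward propagation already paid for this

t_recompute = timeit.timeit(lambda: 1 - np.tanh(Z1)**2, number=100)
t_cached = timeit.timeit(lambda: 1 - np.power(A1, 2), number=100)
print(t_recompute, t_cached)   # the cached form skips the extra tanh evaluation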
Hi,
Thanks for your help. I guess it’s better to use A1 then.