Hi,
I keep getting this error even though my dW1 matches the expected dW1 exactly.
To be more precise, to compute dW1 I implemented this expression:
dZ1 = np.dot(W2.T, dZ2) * (1 - np.tanh(cache['Z1'])**2)
dW1 = 1/m * np.dot(dZ1, X.T)
The result for dW1 in my implementation is:
dW1 = [[ 0.00301023 -0.00747267]
[ 0.00257968 -0.00641288]
[-0.00156892 0.003893 ]
[-0.00652037 0.01618243]]
Thus matching the expected output for dW1
Expected output
dW1 = [[ 0.00301023 -0.00747267]
[ 0.00257968 -0.00641288]
[-0.00156892 0.003893 ]
[-0.00652037 0.01618243]]
This is the complete traceback:
AssertionError Traceback (most recent call last)
<ipython-input-48-a06d396e2b09> in <module>
7 print ("db2 = "+ str(grads["db2"]))
8
----> 9 backward_propagation_test(backward_propagation)
~/work/release/W3A1/public_tests.py in backward_propagation_test(target)
177 assert output["db2"].shape == expected_output["db2"].shape, f"Wrong shape for db2."
178
--> 179 assert np.allclose(output["dW1"], expected_output["dW1"]), "Wrong values for dW1"
180 assert np.allclose(output["db1"], expected_output["db1"]), "Wrong values for db1"
181 assert np.allclose(output["dW2"], expected_output["dW2"]), "Wrong values for dW2"
AssertionError: Wrong values for dW1
Am I missing something?
Well, it’s possible that the test case you fail is somehow different from the one that you seem to pass. Also note that you are working too hard to compute the derivative of tanh: you already have A1 available in the cache, right? It’s possible that your solution has different rounding behavior, or maybe the cache input in the other test case doesn’t have a consistent value for Z1.
Yes, my last theory turns out to be right: you can check by printing A1 and np.tanh(Z1) and you’ll see they are the same for the test case you can see in the notebook, but not for the test case in the file public_tests.py. Here are the first few lines of the test routine:
def backward_propagation_test(target):
    np.random.seed(1)
    X = np.random.randn(3, 7)
    Y = (np.random.randn(1, 7) > 0)
    parameters = {'W1': np.random.randn(9, 3),
                  'W2': np.random.randn(1, 9),
                  'b1': np.array([[0.], [0.], [0.], [0.], [0.], [0.], [0.], [0.], [0.]]),
                  'b2': np.array([[0.]])}
    cache = {'A1': np.random.randn(9, 7),
             'A2': np.random.randn(1, 7),
             'Z1': np.random.randn(9, 7),
             'Z2': np.random.randn(1, 7)}
So you can see that there is no mathematical relationship between Z1 and A1 in the cache for that test case. You could argue this is bogus, but why not just use the A1 value? It’s easier (and more efficient at runtime) to write the code that way in any case.
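If you want to see this for yourself, here is a minimal sketch (it just replays the same random draws as the test routine above, outside the grader) showing that the cached A1 is not the tanh of the cached Z1:

import numpy as np

# Replay the random draws from backward_propagation_test so the cache
# values match what the test actually uses (b1/b2 are zeros and consume
# no random numbers, so they are omitted here).
np.random.seed(1)
X = np.random.randn(3, 7)
Y = (np.random.randn(1, 7) > 0)
W1 = np.random.randn(9, 3)
W2 = np.random.randn(1, 9)
cache = {'A1': np.random.randn(9, 7),
         'A2': np.random.randn(1, 7),
         'Z1': np.random.randn(9, 7),
         'Z2': np.random.randn(1, 7)}

# Prints False: A1 and Z1 are independent random draws, so A1 != tanh(Z1).
print(np.allclose(cache['A1'], np.tanh(cache['Z1'])))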
Thanks a lot! Quick question though: I don’t really get how I can compute dZ1 using A1, assuming I follow this formula:
$\frac{\partial \mathcal{J}}{\partial z_{1}^{(i)}} = W_2^T \frac{\partial \mathcal{J}}{\partial z_{2}^{(i)}} * \left(1 - \left(a^{[1](i)}\right)^2\right)$
Note: the LaTeX equation is not displaying inside the code embed.
Well, this is a true statement, right?
A1 = np.tanh(Z1)
That’s what it means for tanh to be the layer 1 activation function. You just finished writing that code in the previous function. So just plug A1 in place of np.tanh(Z1) in your existing code.
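To make that concrete, here is a sketch of the change (assuming, as in your earlier post, that dZ2, W2, X, m and the cache are already in scope):

A1 = cache['A1']                                   # cached activation from forward propagation
dZ1 = np.dot(W2.T, dZ2) * (1 - np.power(A1, 2))    # g'(Z1) written in terms of A1 instead of tanh(Z1)
dW1 = 1/m * np.dot(dZ1, X.T)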
Just use $ bracketing in normal markdown text for LaTeX and it works fine.
Oh, ok. I also just saw the tip in the exercise that explains this further. Thanks again for your time.
Oh, right, they literally wrote the code out for you in the instructions.
Hi, I have this kind of error too, but my output and the expected output look the same. I used
dZ1 = np.dot(W2.T, dZ2) * (1 - np.power(np.tanh(cache['Z1']), 2))
which should come to the same thing, if I’m right? I don’t understand; do you have any idea?
Hi, Hichem.
Your code is correct from a mathematical point of view, but I explained earlier on this thread why it doesn’t work with the test cases they have here. Their test cases are not correctly constructed because the Z1 and A1 values are not related. The relationship should be:
A1 = np.tanh(Z1)
But you can see in this post that it’s not. So you just have to write the g'(Z1) expression as (1 - A1^2) in order to pass the tests there. You have A1 in the cache, so it’s actually more efficient to write it that way in any case: the code will run faster because computing tanh is non-trivial (you need to evaluate e^x). They also gave you a very explicit hint to do it that way in the instructions for this section.
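As a rough illustration of that efficiency point (timings vary by machine; this is just a sketch, not part of the assignment):

import timeit
import numpy as np

Z1 = np.random.randn(9, 70000)
A1 = np.tanh(Z1)   # forward propagation already paid for this

t_recompute = timeit.timeit(lambda: 1 - np.tanh(Z1)**2, number=100)
t_cached = timeit.timeit(lambda: 1 - np.power(A1, 2), number=100)
print(t_recompute, t_cached)   # the cached form skips the extra tanh evaluation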
Hi,
Thanks for your help. I guess it’s better to use A1 then.