I have successfully completed all the tests so far (up to exercise 6), but I’ve been stuck for some time now on the backward propagation function.
It seems I’m calculating all the derivatives for the 2nd layer (db2, dW2) correctly, but my value for dW1 does not match the expected solution, so I must be doing something wrong.
Since the 2nd-layer values are correct, I guess the error is in the calculation of dZ1. I’m calculating it following the given equation:

$$dZ^{[1]} = W^{[2]T} dZ^{[2]} * g^{[1]'}(Z^{[1]})$$
I have tried both the np.multiply function and the * operator to compute the elementwise product of g^{[1]'}(Z^{[1]}) and the dot product W^{[2]T} dZ^{[2]}.
The given formula should be straightforward, but somehow it is not working for me. Any ideas what it might be?
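A quick dimension check of the formula can rule out one class of mistakes. This assumes the course convention of one example per column, with n_x inputs, n_h hidden units, n_y output units and m examples (these size symbols are labels chosen here, not names from the notebook). The dot product is what brings the shapes into agreement; the elementwise product then requires both factors to have identical shapes:

$$
\begin{aligned}
\underbrace{W^{[2]T}}_{n_h \times n_y}\,\underbrace{dZ^{[2]}}_{n_y \times m} &\;:\; n_h \times m \\
g^{[1]'}(Z^{[1]}) &\;:\; n_h \times m \\
dZ^{[1]} = W^{[2]T} dZ^{[2]} * g^{[1]'}(Z^{[1]}) &\;:\; n_h \times m \quad \text{(elementwise, shapes must match)} \\
dW^{[1]} = \tfrac{1}{m}\, dZ^{[1]} X^{T} &\;:\; (n_h \times m)(m \times n_x) = n_h \times n_x
\end{aligned}
$$

If every intermediate has these shapes but the values are still off, the problem is more likely a wrong variable or a wrong g^{[1]'} than a wrong operator.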
Welcome, @Joao_Neto. First, thank you for an exemplary post! You framed the problem well and explained what you have tried to fix it. Well done. There is only one thing missing, though. It’s a great help to include a snapshot of the “traceback”, i.e., the detailed error log that is printed to your screen. Only the traceback, not your code: as you are probably well aware, posting code is not allowed as a matter of policy. It would be great if you could append that to your message. Thanks! @kenb
Right, exactly as you reported. I think I am correct to assume that all preceding functions have passed their tests, in which case the best I can offer is to check and recheck your expressions for the gradient. Also make sure that you have not accidentally changed other parts of the function. If you no longer have the original copy available as a reference, you can download a fresh one using this FAQ.
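One systematic way to check a gradient expression is a numerical gradient check: perturb each entry of W1 slightly, measure how the cost changes, and compare that with the dW1 from the backprop formulas. The sketch below is not the notebook’s code. It assumes a tanh hidden layer, a sigmoid output and the binary cross-entropy cost, and the function names (`forward`, `cost`, `analytic_dW1`, `numeric_dW1`) and array sizes are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def forward(W1, b1, W2, b2, X):
    """Forward pass of a 2-layer net (assumed: tanh hidden layer, sigmoid output)."""
    A1 = np.tanh(W1 @ X + b1)
    A2 = sigmoid(W2 @ A1 + b2)
    return A1, A2


def cost(W1, b1, W2, b2, X, Y):
    """Binary cross-entropy averaged over the m examples (columns of X)."""
    m = X.shape[1]
    _, A2 = forward(W1, b1, W2, b2, X)
    return -np.sum(Y * np.log(A2) + (1 - Y) * np.log(1 - A2)) / m


def analytic_dW1(W1, b1, W2, b2, X, Y):
    """dW1 from the backprop formulas discussed in this thread."""
    m = X.shape[1]
    A1, A2 = forward(W1, b1, W2, b2, X)
    dZ2 = A2 - Y
    # dot product first, then elementwise multiply by tanh'(Z1) = 1 - A1**2
    dZ1 = np.dot(W2.T, dZ2) * (1 - np.power(A1, 2))
    return np.dot(dZ1, X.T) / m


def numeric_dW1(W1, b1, W2, b2, X, Y, eps=1e-6):
    """Centered finite-difference estimate of dJ/dW1, one entry at a time."""
    grad = np.zeros_like(W1)
    for idx in np.ndindex(W1.shape):
        W_plus, W_minus = W1.copy(), W1.copy()
        W_plus[idx] += eps
        W_minus[idx] -= eps
        grad[idx] = (cost(W_plus, b1, W2, b2, X, Y)
                     - cost(W_minus, b1, W2, b2, X, Y)) / (2 * eps)
    return grad


rng = np.random.default_rng(0)
n_x, n_h, n_y, m = 2, 4, 1, 5  # arbitrary small sizes
X = rng.standard_normal((n_x, m))
Y = (rng.random((n_y, m)) > 0.5).astype(float)
W1 = 0.5 * rng.standard_normal((n_h, n_x))
b1 = np.zeros((n_h, 1))
W2 = 0.5 * rng.standard_normal((n_y, n_h))
b2 = np.zeros((n_y, 1))

grad_a = analytic_dW1(W1, b1, W2, b2, X, Y)
grad_n = numeric_dW1(W1, b1, W2, b2, X, Y)
rel_err = np.linalg.norm(grad_a - grad_n) / (np.linalg.norm(grad_a) + np.linalg.norm(grad_n))
print(rel_err)  # a very small number means the two estimates agree
```

A relative error far below 1e-6 suggests the dW1 expression is right; a value close to 1 usually points to a wrong variable or operator in dZ1.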
Note that * and np.multiply are the same operation, namely “elementwise” multiplication. The expression for dZ1 (and therefore dW1) involves two different operations: a dot product between W^{[2]T} and dZ^{[2]}, followed by an elementwise multiply between that result and g^{[1]'}(Z^{[1]}).
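A small NumPy illustration of the distinction, using made-up 2×2 arrays: `*` and `np.multiply` give identical elementwise results, while `np.dot` computes a matrix product, which is a different operation with different values.

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[10, 20],
              [30, 40]])

elem_star = A * B              # elementwise product: [[10, 40], [90, 160]]
elem_mult = np.multiply(A, B)  # the same elementwise product
matmul = np.dot(A, B)          # matrix product: [[70, 100], [150, 220]]

print(np.array_equal(elem_star, elem_mult))  # True
print(np.array_equal(elem_star, matmul))     # False
```

So swapping `*` for `np.multiply` cannot change the result; only the choice between elementwise and dot product matters.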
It might help to take a look at this thread to learn more about the notational conventions Prof. Ng uses for the dot product and elementwise multiplication.
It might also help to print the value that your code produces for dW1; it may contain a clue. One common error on this particular line of code is to use dW^{[2]} instead of W^{[2]} in the dot product.
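This typo is easy to miss because it raises no error: dW^{[2]} has the same shape as W^{[2]}, so the dot product runs and only the values are wrong. The arrays below are made-up stand-ins (one output unit, four hidden units, three examples), not values from the assignment.

```python
import numpy as np

W2 = np.array([[0.5, -1.0, 2.0, 0.3]])       # stand-in weights, shape (1, 4)
dW2 = np.array([[0.01, 0.02, -0.03, 0.04]])  # stand-in gradient, same shape as W2
dZ2 = np.array([[0.2, -0.4, 0.6]])           # stand-in dZ2, shape (1, 3)

correct = np.dot(W2.T, dZ2)  # (4, 1) @ (1, 3) -> (4, 3)
buggy = np.dot(dW2.T, dZ2)   # the typo: dW2 instead of W2, still (4, 3)

print(correct.shape == buggy.shape)  # True: no shape error is raised
print(np.allclose(correct, buggy))   # False: dZ1, and hence dW1, come out wrong
```

This is also why a wrong dW1 of this kind shows up as a failed value comparison in the tests rather than as a traceback.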