For the second programming assignment of Week 4, which covers the implementation of a 2-layer NN model as well as an L-layer model, I can't get the 2-layer model working at all. I am getting nan as the cost across all the iterations, and I don't know what I did wrong. I computed the cost by invoking compute_cost() with the two parameters A2 and Y.
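For reference, my understanding is that compute_cost implements the usual cross-entropy cost; here is a minimal numpy sketch of that formula (shapes and conventions assumed from the course, not my actual code):

```python
import numpy as np

def compute_cost(A2, Y):
    """Cross-entropy cost. A2: sigmoid outputs, Y: 0/1 labels, both shape (1, m).
    A sketch of the course helper, assuming the usual conventions."""
    m = Y.shape[1]
    cost = -np.sum(Y * np.log(A2) + (1 - Y) * np.log(1 - A2)) / m
    return float(np.squeeze(cost))
```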
I did the same (with a slight change in the parameters) when implementing the L-layer model and got that model working just fine. So the second model is well implemented, but not the first one.
In your inputs, the first argument to the second call of linear_forward is the same as that of the first call.
The first argument should be A1, since the second layer's input comes from the first layer.
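Roughly, the data flow should be X → layer 1 → A1 → layer 2 → A2. As a sketch of the call pattern (placeholder names; note that each layer also needs its activation, which comes up below):

```python
# First call: layer 1 takes the network input X
out1, cache1 = linear_forward(X, W1, b1)

# Second call: layer 2 takes layer 1's output, not X again
out2, cache2 = linear_forward(out1, W2, b2)
```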
Also, as this is related to the Deep Learning Specialization, I am moving it to that category; if you continue to have problems, you can still ask there.
Hey guys, that was a great point about the input for the second layer. I changed it from X to A1, but the problem remains. The algorithm keeps giving nan for the cost.
The way to get NaN for the cost is if any of your A2 values are “saturated”, so that they round to exactly 0 or exactly 1. Mathematically the output of sigmoid is never exactly 0 or 1, but we are dealing with the pathetic limitations of floating point arithmetic here. Of course this shouldn’t happen with our test data, so most likely it means something is wrong with how you are handling the learning rate. Note that the subroutines from the Step by Step exercise can be assumed to be correct here, meaning the mistake is in how you are calling them, not in the functions themselves.
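To make that concrete, here is a small numpy demonstration of how one saturated activation turns the whole cost into NaN (illustrative values only):

```python
import numpy as np

A2 = np.array([[1.0, 0.5]])  # first activation saturated to exactly 1.0
Y  = np.array([[1.0, 0.0]])

# (1 - Y) * log(1 - A2) gives 0 * log(0) = 0 * -inf = nan at the first entry,
# and a single nan poisons the whole sum
cost = -np.mean(Y * np.log(A2) + (1 - Y) * np.log(1 - A2))
print(cost)  # nan (numpy also emits a RuntimeWarning about the log of zero)
```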
Oh, wait, the bug is obvious: where is the call to sigmoid in your logic? Your A2 values are the raw output of linear_forward, so they could even be negative, which causes the NaNs. You should not be calling linear_forward at that level, right? That’s the mistake. It should be linear_activation_forward, which includes the activation.
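If that is the bug, the forward pass fix is along these lines (a sketch; names and signatures assumed from the Step by Step assignment):

```python
# Wrong: raw linear output, no non-linearity, so values are unbounded
#   A1, cache1 = linear_forward(X, W1, b1)

# Right: the linear step plus the activation in one call
A1, cache1 = linear_activation_forward(X, W1, b1, activation="relu")
A2, cache2 = linear_activation_forward(A1, W2, b2, activation="sigmoid")
```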
Note that you made the same mistake in backward propagation as well. At least you’re consistent …
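Presumably the same substitution applies there: the calls should go through the helper that includes the activation derivative, along these lines (a sketch, with names assumed from the Step by Step assignment):

```python
# Use the version that includes the activation gradient,
# not the raw linear_backward
dA_prev, dW, db = linear_activation_backward(dA, cache, activation="sigmoid")
```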
Thanks, Paul, for pointing this out. However, for the programming exercise we don’t have control over the learning rate (it has been set by the assignment authors). We’re not going to change the learning rate; we simply called the implemented functions with the correct parameters.
Thanks, Paul, your second comment is very clear about the issue, and it helped solve the NaN value for the cost :). However, I’m getting some of the parameters wrong. Could you please take a glance at this? This is what I have:
You have the activation functions in the wrong order in the backward propagation steps. Remember you are going backward by definition, so the first step is the output layer, which has “sigmoid” as the activation, right?
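Concretely, the intended ordering is something like this (a sketch; cache names assumed from the assignment):

```python
# Going backward, the output layer (sigmoid) comes first,
# then the hidden layer (relu)
dA1, dW2, db2 = linear_activation_backward(dA2, cache2, activation="sigmoid")
dA0, dW1, db1 = linear_activation_backward(dA1, cache1, activation="relu")
```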
Could you please elaborate a bit further? I read your comment, and it seems to be consistent with my code, but I’m still getting the wrong result. I’ve spent time trying to spot the problem, without success.
Ok, you have fixed that bug. Are you sure you executed the actual function cell by doing “Shift-Enter” before you ran the test again? Just typing new code and then calling the function again does nothing: it just runs the old code again.
Yes, you are completely right, and the clarification is really helpful. I thought the code above would solve the issue, but I’m still getting it wrong. I’m confused.