For the second programming assignment of Week 4, which covers the implementation of a 2-layer NN model as well as an L-layer model, I can't get the 2-layer model working at all. I am getting nan as the cost across all the iterations, and I don't know what I did wrong. I computed the cost by invoking compute_cost() with the two parameters A2 and Y.
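For reference, my understanding is that compute_cost implements the usual cross-entropy cost; here is a minimal numpy sketch of that formula (shapes and conventions assumed from the course, not my actual code):

```python
import numpy as np

def compute_cost(A2, Y):
    """Cross-entropy cost. A2: sigmoid outputs, Y: 0/1 labels, both shape (1, m).
    A sketch of the course helper, assuming the usual conventions."""
    m = Y.shape[1]
    cost = -np.sum(Y * np.log(A2) + (1 - Y) * np.log(1 - A2)) / m
    return float(np.squeeze(cost))
```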
I did the same (with a slight change in the parameters) when implementing the L-layer model and got that model working just fine. So the second model is well implemented, but not the first one.
In your inputs, the first argument to the second call of linear_forward is the same as that of the first call.
The first argument should be A1, since the second layer's input comes from the first layer.
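Roughly, the data flow should be X → layer 1 → A1 → layer 2 → A2. As a sketch of the call pattern (placeholder names; note that each layer also needs its activation, which comes up below):

```python
# First call: layer 1 takes the network input X
out1, cache1 = linear_forward(X, W1, b1)

# Second call: layer 2 takes layer 1's output, not X again
out2, cache2 = linear_forward(out1, W2, b2)
```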
Also, as this is related to the Deep Learning Specialization, I am moving it to that category; if you continue to have problems, you can still ask there.
Hey guys, that was a great point about the input for the second layer. I changed it from X to A1, but the problem remains. The algorithm keeps giving nan for the cost.
The way to get NaN for the cost is if any of your A2 values are “saturated”, so that they round to exactly 0 or exactly 1. Mathematically the output of sigmoid is never exactly 0 or 1, but we are dealing with the pathetic limitations of floating point arithmetic here. Of course this shouldn’t happen with our test data, so most likely it means something is wrong with how you are handling the learning rate. Note that the subroutines from the Step by Step exercise can be assumed to be correct here, meaning the mistake is in how you are calling them, not in the functions themselves.
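To make that concrete, here is a small numpy demonstration of how one saturated activation turns the whole cost into NaN (illustrative values only):

```python
import numpy as np

A2 = np.array([[1.0, 0.5]])  # first activation saturated to exactly 1.0
Y  = np.array([[1.0, 0.0]])

# (1 - Y) * log(1 - A2) gives 0 * log(0) = 0 * -inf = nan at the first entry,
# and a single nan poisons the whole sum
cost = -np.mean(Y * np.log(A2) + (1 - Y) * np.log(1 - A2))
print(cost)  # nan (numpy also emits a RuntimeWarning about the log of zero)
```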
Oh, wait, the bug is obvious: where is the call to sigmoid in your logic? Your A2 values are the raw output of linear_forward, so they could even be negative, which causes the NaNs. You should not be calling linear_forward at that level, right? That’s the mistake. It should be linear_activation_forward, which includes the activation.
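If that is the bug, the forward pass fix is along these lines (a sketch; names and signatures assumed from the Step by Step assignment):

```python
# Wrong: raw linear output, no non-linearity, so values are unbounded
#   A1, cache1 = linear_forward(X, W1, b1)

# Right: the linear step plus the activation in one call
A1, cache1 = linear_activation_forward(X, W1, b1, activation="relu")
A2, cache2 = linear_activation_forward(A1, W2, b2, activation="sigmoid")
```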
Note that you made the same mistake in backward propagation as well. At least you’re consistent …
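Presumably the same substitution applies there: the calls should go through the helper that includes the activation derivative, along these lines (a sketch, with names assumed from the Step by Step assignment):

```python
# Use the version that includes the activation gradient,
# not the raw linear_backward
dA_prev, dW, db = linear_activation_backward(dA, cache, activation="sigmoid")
```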
Thanks, Paul, for pointing this out. However, for the programming exercise we don’t have control over the learning rate (it has been set by the assignment authors). We’re not going to change the learning rate; we simply called the implemented functions with the correct parameters.
Thanks, Paul, your second comment is very clear about the issue, and it helped solve the NaN value for the cost :). However, I’m getting some of the parameters wrong. Could you please take a glance at this? This is what I have:
You have the activation functions in the wrong order in the backward propagation steps. Remember you are going backward by definition, so the first step is the output layer, which has “sigmoid” as the activation, right?
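Concretely, the intended ordering is something like this (a sketch; cache names assumed from the assignment):

```python
# Going backward, the output layer (sigmoid) comes first,
# then the hidden layer (relu)
dA1, dW2, db2 = linear_activation_backward(dA2, cache2, activation="sigmoid")
dA0, dW1, db1 = linear_activation_backward(dA1, cache1, activation="relu")
```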
Could you please elaborate a bit further? I read your comment, and it seems to be consistent with my code, but I’m still getting the wrong result. I’ve spent time trying to spot the problem, without success.
Ok, you have fixed that bug. Are you sure you executed the actual function cell by doing “Shift-Enter” before you ran the test again? Just typing new code and then calling the function again does nothing: it just runs the old code again.
Yes, you are completely right, and the clarification is really helpful. I thought the code above would solve the issue, but I’m still getting it wrong. I’m confused.