W2_A2_Ex6 optimizing log(0) error

I’m getting a NaN error on iteration 42 of the second optimization loop in exercise 6. I’m trying to be careful not to reveal my answer publicly, per forum guidelines.

Stepping through it in the debugger, it turns out that A ends up containing 1.00000000e+00, which causes the np.log(1-A) part of the cost function to evaluate np.log(0), which is undefined. Is there a zero-safe way to compute the np.dot(1-Y, np.log(1-A).T) term?

This is the A value where things break:

A:[[1.43030959e-29 6.31160658e-42 1.00000000e+00]]
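
For reference, here is a minimal snippet that reproduces the symptom outside the assignment code (the Y below is made up purely for illustration, not my actual labels):

```python
import numpy as np

# A copied from the debugger; the third entry has rounded to exactly 1.0
A = np.array([[1.43030959e-29, 6.31160658e-42, 1.00000000e+00]])
Y = np.array([[0, 0, 1]])   # made-up labels, just to show the effect

# First two entries are 0.0, the last is -inf (with a divide-by-zero warning)
print(np.log(1 - A))

# On my setup the 0 * -inf term propagates as nan, which is the NaN cost I see
print(np.dot(1 - Y, np.log(1 - A).T))
```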

Yes, you get NaN from the cost function if any of your \hat{y} values “saturate” and round to exactly 1. Of course, if we were doing pure math here, the output of sigmoid could never be exactly 1, but it can happen in floating point.

But the bigger point here is that this should not be happening. There must be something else wrong with your code. Are you sure you pass all the tests for the various functions? Please show the output of the model test and then what you get when you run the real training, which I assume is where you’re seeing that NaN value.

Update: Oh, sorry, exercise 6 is optimize, not model. Then the story is more straightforward: this most likely means your “update parameters” logic is incorrect. Things to check: that you subtract rather than add, and that you use the actual learning rate that was passed in rather than hard-coding it to the default or the like.
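
To make that concrete, here is a minimal sketch of the standard gradient descent update step; the function and variable names are placeholders of my own, not the graded solution:

```python
import numpy as np

def update_parameters(w, b, dw, db, learning_rate):
    """One gradient descent step: move against the gradient, scaled by the
    learning rate that was passed in (not a hard-coded default)."""
    w = w - learning_rate * dw
    b = b - learning_rate * db
    return w, b

# Tiny sanity check with made-up numbers
w, b = update_parameters(np.array([[1.0], [2.0]]), 0.5,
                         np.array([[0.1], [-0.2]]), 0.05,
                         learning_rate=0.009)
print(w, b)  # w moves opposite the sign of dw; b moves opposite the sign of db
```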

But it is an interesting point that, in a more general case, it might be necessary to defend yourself against this “saturation” problem. In 64-bit floats, it only takes z > 36 to saturate sigmoid; in 32-bit floats the threshold is even lower. Here’s a thread which discusses a general way to do that. Eventually we will convert to using packages like TensorFlow, which have sophisticated implementations for defending against this.
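
To illustrate both points, here is a small sketch. The sigmoid helper, the safe_log_loss name, and the eps value are my own choices, and clipping is just one common defense, not the assignment’s intended approach:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# In float64, sigmoid saturates to exactly 1.0 once z exceeds roughly 36
print(sigmoid(np.float64(36.0)))  # still strictly less than 1
print(sigmoid(np.float64(37.0)))  # exactly 1.0, so log(1 - A) would be -inf

def safe_log_loss(A, Y, eps=1e-15):
    """Cross-entropy cost with A clipped away from 0 and 1 before taking logs."""
    A = np.clip(A, eps, 1.0 - eps)
    m = Y.shape[1]
    return -(np.dot(Y, np.log(A).T) + np.dot(1 - Y, np.log(1 - A).T)) / m

A = np.array([[1.43030959e-29, 6.31160658e-42, 1.0]])
Y = np.array([[0, 0, 1]])
print(safe_log_loss(A, Y))  # finite, no -inf/nan (though the real fix is the update step)
```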

Thanks for your response. Here are the outputs from the propagate and optimize functions. All tests pass until optimize. Could it be an issue with the dw/db calculation causing w and b to be too large?

Propagate method output:

Optimize method outputs (standard and with additional details):


Yes, your values are clearly wrong in the optimize case. Did you compare your code to the math formulas for the “update parameters” step? I’ll bet you used + instead of - there. We subtract because the gradient points in the direction of most rapid increase of the function (the cost in our case). What we want is to decrease the cost, so we go in the opposite direction.
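
A tiny one-dimensional illustration of why the sign matters (nothing to do with the graded code, just J(w) = w^2 with made-up numbers):

```python
# For J(w) = w**2 the gradient is dJ/dw = 2*w, which points toward larger cost.
w_minus, w_plus = 3.0, 3.0
lr = 0.1
for _ in range(20):
    w_minus -= lr * (2 * w_minus)   # subtracting the gradient: cost shrinks
    w_plus += lr * (2 * w_plus)     # adding the gradient: cost blows up

print(w_minus ** 2)  # close to 0 (cost decreased)
print(w_plus ** 2)   # large (cost increased), the same symptom as a wrong sign in optimize
```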

Thank you very much; I got it working. I did have - in the w and b update steps, but in the update function I was calling sigmoid on the dw/db values instead of multiplying them by alpha, hence ignoring the learning rate. This case can be closed.