W2_A2_Ex6 optimizing log(0) error

I’m getting a NaN error on iteration 42 of the 2nd optimization loop in exercise 6. I’m trying to be careful to not reveal my answer publicly per forum guidelines.

Debugging through it, it turns out that A ends up being 1.00000000e+00, which causes the np.log(1-A) part of the cost function to call np.log(0), which is undefined. Is there a zero-safe way to evaluate the np.dot(1-Y, np.log(1-A).T) term?

This is the A value where things break:

A:[[1.43030959e-29 6.31160658e-42 1.00000000e+00]]
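A minimal sketch reproducing the failure with that A value (the Y vector here is hypothetical, chosen so the saturated entry has label 1):

```python
import numpy as np

# Hypothetical labels: the third example (where A saturates to 1.0) has y = 1
Y = np.array([[0, 0, 1]])
A = np.array([[1.43030959e-29, 6.31160658e-42, 1.00000000e+00]])
m = Y.shape[1]

with np.errstate(divide="ignore", invalid="ignore"):
    # np.log(1 - A) is [0, 0, -inf]; the dot product then hits 0 * (-inf) = nan
    cost = -(np.dot(Y, np.log(A).T) + np.dot(1 - Y, np.log(1 - A).T)) / m

print(cost)  # [[nan]]
```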

Yes, you get NaN from the cost function if any of your \hat{y} values “saturate” and round to exactly 1. Of course, if we were doing pure math here, the output of sigmoid could never be exactly 1, but it can happen in floating point.

But the bigger point here is that this should not be happening: there must be something else wrong with your code. Are you sure you pass all the tests for the various functions? Please show the output of the model test and then what you get when you run the real training, which I assume is where you’re seeing that NaN value.

Update: Oh, sorry, exercise 6 is optimize, not model. Then the story is more straightforward: this most likely means your “update parameters” logic is incorrect. Things to check are that you subtracted rather than added, and that you are using the actual learning rate that was passed in, not hard-coding it to the default or the like.
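For reference, a minimal sketch of what the update step should look like (the function name and values here are made up for illustration; they are not the assignment's exact code):

```python
import numpy as np

def update_parameters(w, b, dw, db, learning_rate):
    # Subtract: step against the gradient, scaled by the learning rate
    # that was passed in (not a hard-coded default)
    w = w - learning_rate * dw
    b = b - learning_rate * db
    return w, b

w, b = update_parameters(np.array([[1.0], [2.0]]), 0.5,
                         np.array([[0.2], [-0.4]]), 0.1,
                         learning_rate=0.01)
print(w.ravel(), b)  # [0.998 2.004] 0.499
```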

But it is an interesting point that in a more general case, it might be necessary to defend yourself against this “saturation” problem. In 64-bit floats, it only takes z > 36 to saturate sigmoid; in 32-bit floats it’s even less. Here’s a thread which discusses a general way to do that. Eventually we will convert to using packages like TensorFlow, and they have sophisticated implementations for defending against this.
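One common defensive pattern (not required for this exercise, and the epsilon choice is a judgment call) is to clip A strictly inside (0, 1) before taking logs. A sketch, along with a quick demonstration of the saturation itself:

```python
import numpy as np

# Saturation demo: for large enough z, sigmoid rounds to exactly 1.0 in float64
z = 40.0
print(1.0 / (1.0 + np.exp(-z)) == 1.0)  # True

def safe_cost(A, Y, eps=1e-15):
    m = Y.shape[1]
    A = np.clip(A, eps, 1 - eps)  # keep both log() arguments strictly positive
    return float(-(np.dot(Y, np.log(A).T)
                   + np.dot(1 - Y, np.log(1 - A).T)) / m)

A = np.array([[1.43030959e-29, 6.31160658e-42, 1.00000000e+00]])
Y = np.array([[0, 0, 1]])
print(safe_cost(A, Y))  # finite (tiny), instead of NaN
```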

Thanks for your response. Here are the outputs from the propagate and optimize functions. All tests pass until optimize. Could the issue be in the dw and db calculations, causing w and b to grow too large?

Propagate method output:

Optimize method outputs (standard and with additional details):

Yes, your values are clearly wrong in the optimize case. Did you compare your code to the math formulas for the “update parameters” step? I’ll bet you used + instead of - there. We subtract because the gradient points in the direction of most rapid increase of the function (the cost in our case). What we want is to decrease the cost, so we go in the opposite direction.
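A toy illustration of why the sign matters, using f(w) = w² (gradient 2w), which has its minimum at w = 0:

```python
# Start both runs at the same point and take 50 gradient steps
w_minus, w_plus, lr = 3.0, 3.0, 0.1
for _ in range(50):
    w_minus -= lr * (2 * w_minus)  # correct: moves toward the minimum
    w_plus  += lr * (2 * w_plus)   # wrong sign: diverges away from it

print(abs(w_minus))  # very close to 0
print(abs(w_plus))   # very large
```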

Thank you very much; I got it working. I did have - in the w and b update steps, but I was applying sigmoid to the dw and db values in the update function instead of multiplying them by alpha, hence ignoring the learning rate. This case can be closed.