I’m getting a NaN error on iteration 42 of the 2nd optimization loop in exercise 6. I’m trying to be careful not to reveal my answer publicly, per forum guidelines.

Debugging through it, it turns out that `A` ends up being `1.00000000e+00`, which causes the `np.log(1-A)` part of the cost function to call `np.log(0)`, which is undefined. Is there a zero-safe way to execute the `np.dot(1-Y, np.log(1-A).T)` bit?

This is the A value where things break:

```
A:[[1.43030959e-29 6.31160658e-42 1.00000000e+00]]
```

Yes, you get NaN from the cost function if any of your \hat{y} values “saturate” and round to exactly 1. Of course, if we were doing pure math here, the output of sigmoid could never be exactly 1, but it can happen in floating point.
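If you do want a guard in a more general setting, one common trick is to clip `A` strictly inside (0, 1) before taking logs. Here’s a minimal sketch using NumPy and the usual logistic-regression cost; the `eps` value and the sample `Y` are my own choices, not from the exercise:

```python
import numpy as np

# Sample values: the saturated A from the post, with a made-up label row Y.
Y = np.array([[0, 0, 1]])
A = np.array([[1.43030959e-29, 6.31160658e-42, 1.00000000e+00]])

# Clip A away from exactly 0 and exactly 1 so both log(A) and log(1-A)
# stay finite. Any tiny eps works; 1e-15 is an arbitrary choice.
eps = 1e-15
A_safe = np.clip(A, eps, 1 - eps)

m = Y.shape[1]
cost = -(np.dot(Y, np.log(A_safe).T) + np.dot(1 - Y, np.log(1 - A_safe).T)) / m
print(np.isfinite(cost).all())  # True: no NaN even with a saturated A
```

The clip changes the cost by a negligible amount for non-saturated values, so it is a cheap insurance policy rather than a change to the algorithm.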

But the bigger point here is that this should not be happening: there must be something else wrong with your code. Are you sure you pass all the tests for the various functions? Please show the output of the `model` test and then what you get when you run the real training, which I assume is where you’re seeing that NaN value.

**Update:** Oh, sorry, exercise 6 is `optimize`, not `model`. Then the story is more straightforward: this most likely means your “update parameters” logic is incorrect. Things to check are that you subtracted rather than added, and that you are using the actual learning rate that was passed in, not hard-coding it to the default or the like.

But it is an interesting point that in a more general case, it might be necessary to defend yourself against this “saturation” problem. In 64 bit floats, it only takes z > 36 to saturate sigmoid. In 32 bit floats it’s even less. Here’s a thread which discusses a general way to do that. Eventually we will convert to using packages like TensorFlow, and they have sophisticated implementations for defending against this.
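You can see the saturation threshold directly: once e^{-z} drops below half a unit in the last place of 1.0, the denominator rounds to exactly 1.0. A quick check (my own illustration, not course code):

```python
import numpy as np

def sigmoid(z):
    # Plain definition; no numerical guards on purpose.
    return 1.0 / (1.0 + np.exp(-z))

# In float64, e^-36 ~ 2.3e-16 is still above half an ulp of 1.0 (~1.1e-16),
# so sigmoid(36) stays just below 1.0; by z = 37 it rounds to exactly 1.0.
print(sigmoid(np.float64(36.0)) < 1.0)   # True
print(sigmoid(np.float64(37.0)) == 1.0)  # True
```

So any example the model classifies very confidently can push `np.log(1-A)` to `log(0)` even though the math says it never should.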

Thanks for your response. Here are the outputs from the `propagate` and `optimize` functions. All tests pass until `optimize`. Could there be an issue with the `dw`/`db` calculation causing `w` and `b` to be too large?

Propagate method output:

Optimize method outputs (standard and with additional details):

Yes, your values are clearly wrong in the `optimize` case. Did you compare your code to the math formulas for the “update parameters” step? I’ll bet you used + instead of - there. We subtract because the gradient points in the direction of most rapid *increase* of the function (the cost, in our case). What we want is to *decrease* the cost, so we go in the opposite direction.

Thank you very much; I got it working. I did have - in the `w` and `b` adjustment methods, but I was calling `sigmoid` on the `dw`/`db` values in the adjustment function instead of multiplying by alpha, hence ignoring the learning rate. This case can be closed.