I assume this did not happen when you solved the exercises in this assignment. Note that (as is typical here) they set the random seeds, so that the results are actually reproducible.

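But since you are solving a different problem of your own, note that NaN can occur when you compute the cost in the case that sigmoid “saturates”. Mathematically the output of sigmoid can never be exactly 0 or 1, but that’s only if you’re doing “pure math” over the real numbers ℝ. We don’t have that luxury: we have to do everything in the finite representation of either 32 or 64 bit floating point. In float64, sigmoid rounds to exactly 1.0 once z is a bit above 35 (the cutoff is about z ≈ 36.7), and in float32 this happens much sooner, around z ≈ 17. At that point `log(1 - a)` becomes `log(0)`, which is `-inf`. If the label is y = 1, that term gets multiplied by (1 - y) = 0, and `0 * -inf` is NaN. (If the label is y = 0, you get a cost of `inf` instead.)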
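A quick NumPy check of both effects, assuming the plain `1 / (1 + np.exp(-z))` sigmoid and a cost written directly with `np.log`. The `z` and `y` values are made up for illustration, and the `np.errstate` block only silences the divide-by-zero and invalid-value warnings that `log(0)` and `0 * -inf` would otherwise print.

```python
import numpy as np

def sigmoid(z):
    # Plain textbook sigmoid, no special handling of large |z|
    return 1 / (1 + np.exp(-z))

# Illustrative logits: moderate, just below and just above the float64 cutoff
z = np.array([[2.0, 36.0, 37.0]])
a = sigmoid(z)
print(a == 1.0)  # only the last entry has rounded to exactly 1.0

# Naive cross-entropy cost with every label equal to 1
y = np.array([[1, 1, 1]])
m = y.shape[1]
with np.errstate(divide="ignore", invalid="ignore"):
    # For the last sample: (1 - y) * log(1 - a) = 0 * log(0) = 0 * -inf = nan
    cost = -np.sum(y * np.log(a) + (1 - y) * np.log(1 - a)) / m
print(cost)  # nan
```

With these inputs the cutoff shows up between z = 36 and z = 37: `sigmoid(36.0)` is still just below 1.0, so its `log(1 - a)` is a large but finite negative number.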

There are several things to say here:

For starters, just getting NaN for the cost doesn’t really do any harm by itself. The J value is not used in the parameter updates at all; it’s only there so you can monitor convergence. All we really care about are the gradients, and those are fine. With dZ = A - Y, a saturated sample with label y = 1 (the case that produces the NaN) simply contributes a gradient of exactly 0, but that’s only a problem if *all* your z values are > 35.
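Here is a sketch of the gradient side, assuming the standard logistic-regression gradients `dZ = A - Y`, `dw = X dZ^T / m`, `db = sum(dZ) / m`. The toy `X`, `Y`, `w`, `b` are hypothetical. The naive cost for this batch would be NaN (the third sample saturates with label 1), yet every gradient entry is finite and that sample just contributes 0.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Hypothetical toy batch: one feature, three samples, all labeled 1
X = np.array([[1.0, 10.0, 50.0]])
Y = np.array([[1, 1, 1]])
w = np.array([[1.0]])
b = 0.0
m = X.shape[1]

A = sigmoid(w.T @ X + b)  # third sample has z = 50, so A rounds to exactly 1.0
dZ = A - Y                # gradient of the cost with respect to z
dw = (X @ dZ.T) / m       # shape (1, 1)
db = np.sum(dZ) / m

print(dZ)      # last entry is exactly 0.0
print(dw, db)  # both finite
```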

You can also write your cost computation so that it handles the saturation cases explicitly. Here’s a thread which discusses that.
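Two common ways to do that are sketched below; this is an assumption about reasonable approaches, not necessarily what that thread recommends, and the function names `stable_cost` and `clipped_cost` are hypothetical. The first computes the cost from the logits Z using `np.logaddexp`, so `log(0)` is never evaluated. The second clips A away from exactly 0 and 1 before taking logs.

```python
import numpy as np

def stable_cost(Z, Y):
    """Cross-entropy cost computed from the logits Z (not from A = sigmoid(Z)).

    Uses the identities
        -log(sigmoid(z))     = log(1 + exp(-z)) = np.logaddexp(0, -z)
        -log(1 - sigmoid(z)) = log(1 + exp(z))  = np.logaddexp(0, z)
    so no log(0) appears even when sigmoid(z) would round to 0 or 1.
    """
    m = Y.shape[1]
    losses = Y * np.logaddexp(0, -Z) + (1 - Y) * np.logaddexp(0, Z)
    return np.sum(losses) / m

def clipped_cost(A, Y, eps=1e-15):
    """Naive cost, but with A clipped into [eps, 1 - eps] first.

    eps must be well above float64 resolution near 1 (about 1.1e-16),
    otherwise 1 - eps would round back to exactly 1.0.
    """
    m = Y.shape[1]
    A = np.clip(A, eps, 1 - eps)
    return -np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A)) / m

# Usage: two saturated samples, one correctly and one wrongly classified
Z = np.array([[2.0, 37.0, 50.0]])
Y = np.array([[1, 1, 0]])
A = 1 / (1 + np.exp(-Z))
print(stable_cost(Z, Y))   # finite
print(clipped_cost(A, Y))  # finite, but smaller
```

The logit version gives the exact per-sample loss (about 50 for the badly wrong third sample), whereas clipping caps each sample’s loss at roughly -log(eps) ≈ 34.5. Either way the cost stays finite, and since both functions only touch the cost, the gradient code doesn’t need to change.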