Course 1 Week 2 A2: np.dot leads to NaN

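Gradient descent does not depend on the value of J itself, only on the derivatives of J w.r.t. the parameters. So gradient descent still works even when your cost function returns NaN because your sigmoid values have “saturated”, meaning they have rounded to exactly 0 or 1 in floating point, so the cost ends up computing log(0) and then 0 * -inf, which is NaN. The gradients are computed from A - Y, which stays finite, so the parameter updates are unaffected. You can also add logic to detect and avoid the saturation cases. Here’s a thread which discusses this in more detail.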
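To make the saturation case concrete, here is a minimal sketch in NumPy. The function names (`sigmoid`, `naive_cost`, `clipped_cost`), the `eps` value, and the toy inputs are my own assumptions for illustration, not code from the assignment. It shows the cost going to NaN while the gradient term stays finite, and one possible way to guard against it by clipping the activations away from 0 and 1.

```python
import numpy as np

def sigmoid(z):
    # Standard logistic function; rounds to exactly 1.0 for large z in float64.
    return 1.0 / (1.0 + np.exp(-z))

def naive_cost(A, Y):
    # Cross-entropy cost with no protection against log(0).
    m = Y.shape[1]
    # Silence the warnings so the NaN result is easy to inspect.
    with np.errstate(divide="ignore", invalid="ignore"):
        return -np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A)) / m

def clipped_cost(A, Y, eps=1e-15):
    # Hypothetical guard: keep A strictly inside (0, 1) before taking logs.
    # eps is an assumed tolerance, not a value from the course.
    m = Y.shape[1]
    A = np.clip(A, eps, 1 - eps)
    return -np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A)) / m

# Toy data: the first example has a large z, so sigmoid(z) saturates to 1.0.
Z = np.array([[40.0, -2.0, 1.0]])
Y = np.array([[1.0, 0.0, 1.0]])

A = sigmoid(Z)
dZ = A - Y  # the term the gradients are built from; no logs involved

print("naive cost:  ", naive_cost(A, Y))
print("clipped cost:", clipped_cost(A, Y))
print("dZ finite:   ", np.all(np.isfinite(dZ)))
```

Clipping only changes the reported cost; it does not alter `dZ` here, which is consistent with the point that the NaN in J does not stop gradient descent from making progress.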