In Exercise 5, we are asked to compute cost using np.dot. The most obvious choice (to me) is

*{Moderator Edit: Solution Code Removed}*

However, this can produce ‘nan’ or ‘-inf’ values when A has entries that are exactly 0 or 1. How do we avoid this issue?
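To make the failure mode concrete, here is a small illustrative snippet (not the exercise solution) showing where the ‘nan’ comes from: when an activation rounds to exactly 0 or 1, one log term becomes `-inf`, and multiplying that by a 0 label gives `nan`:

```python
import numpy as np

# Illustrative values: sigmoid saturated to exactly 0.0 and 1.0,
# plus one ordinary activation for comparison.
a = np.array([0.0, 1.0, 0.5])
y = np.array([0.0, 1.0, 1.0])

# Suppress the runtime warnings so we can inspect the raw results.
with np.errstate(divide="ignore", invalid="ignore"):
    terms = y * np.log(a) + (1 - y) * np.log(1 - a)

print(terms)  # first two entries are nan: 0 * (-inf) is nan in IEEE floats
```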

Check the formula again:

J = -\frac{1}{m}\sum_{i=1}^{m}(y^{(i)}\log(a^{(i)})+(1-y^{(i)})\log(1-a^{(i)}))

You also have to use the sum function.

PS: Posting your code is not allowed. I am deleting it after this reply. Next time, only share your full error.

Sorry for posting the code.

I was able to pass all the tests. The issue I described appears only if I play around with different learning rates. For some learning rates, the cost function gives ‘nan’ values. I suspect this is because, with these learning rates, the weights/biases take values that are too extreme (in either direction), pushing the output of the sigmoid function to exactly 0 or 1.

I am wondering if there is a way to implement the cost function so that, even in the case above, it does not take ‘nan’ values but instead takes very large (or very small) values, giving gradient descent a chance to tame it over more iterations.

Thank you.

Gradient Descent does not actually depend on the J values themselves, only on the derivatives of J w.r.t. the various parameters. So Gradient Descent still works even if your cost function is throwing NaNs because your sigmoid values have “saturated”. But you can also add logic to detect and avoid the saturation cases. Here’s a thread which discusses this in more detail.
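One common way to add that “avoid saturation” logic is to clip the activations a small distance away from 0 and 1 before taking logs, so the cost stays a large finite number instead of nan. A minimal sketch, assuming a flat label array and an illustrative `eps` (the name `safe_cost` and the value `1e-15` are my choices, not from the course code):

```python
import numpy as np

def safe_cost(A, Y, eps=1e-15):
    """Cross-entropy cost with activations clipped away from 0 and 1.

    Clipping keeps both log terms finite, so a saturated sigmoid
    yields a large but usable cost value rather than nan.
    """
    A = np.clip(A, eps, 1 - eps)   # avoid log(0) on either side
    m = Y.size
    return -np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A)) / m

# A fully saturated, fully wrong prediction now gives a large
# finite cost (about -log(eps)) instead of nan:
print(safe_cost(np.array([1.0]), np.array([0.0])))
```

Note that clipping slightly biases the cost near the boundaries, but since the parameter updates come from the gradients rather than from J itself, this only affects the value you monitor, not the optimization.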