Cost function problem

This is an interesting question! Of course the \hat{y} values are the output of sigmoid, so mathematically they can never be exactly 0 or 1. But here we are dealing with the finite limitations of floating point representation, not the abstract beauty of \mathbb{R}, so the values can actually “saturate” and end up rounding to exactly 0 or 1.
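
Here is a quick demonstration of that saturation in 64-bit floats (the sigmoid helper is just for illustration):

import numpy as np

def sigmoid(z):
    return 1. / (1. + np.exp(-z))

with np.errstate(over="ignore"):   # exp() overflows for very negative z
    print(sigmoid(40.) == 1.)      # True: exp(-40) is below machine epsilon, so 1 + exp(-40) rounds to exactly 1
    print(sigmoid(-800.) == 0.)    # True: exp(800) overflows to inf, so 1 / (1 + inf) is exactly 0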

There are several ways to handle that:

You can test your \hat{y} values for exact equality to 0 or 1 and then slightly perturb the values before the cost computation:

A[A == 0.] = 1e-10        # nudge exact zeros up to a small positive value
A[A == 1.] = 1. - 1e-10   # nudge exact ones down just below 1
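
If you prefer a one-liner, np.clip does essentially the same job with the same 1e-10 margin; note that clipping also nudges values that are merely within the margin of the boundary, not just exactly equal to it:

import numpy as np

A = np.clip(A, 1e-10, 1. - 1e-10)   # force A into the open interval (0, 1)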

You can also use np.isnan() and np.isneginf() to replace any saturated values after the fact, although that’s a bit more code, since you need to catch the bad values while the cost is still the “loss” in vector form (before it gets summed down to the scalar J):

loss[np.isnan(loss) | np.isneginf(loss)] = 42.   # replace saturated entries with an arbitrary stand-in value

You could replace the non-numeric values with 0., but the point is that those cases represent a big error that should be punished pretty severely by the loss function. Of course the actual J value doesn’t really affect the gradients in any case: the derivatives are calculated separately.
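
Putting those pieces together, here is a minimal sketch of that cleanup, assuming the usual conventions (A for the sigmoid activations and Y for the labels, both of shape (1, m)); I use a negative stand-in value here so that the final sign flip turns it into a large positive cost:

import numpy as np

def compute_cost(A, Y, penalty=-42.):
    m = Y.shape[1]
    # Element-wise loss, kept in vector form so the bad entries can still be found
    with np.errstate(divide="ignore", invalid="ignore"):
        loss = Y * np.log(A) + (1. - Y) * np.log(1. - A)
    # -inf (confidently wrong prediction) and nan (0 * -inf) both show up here
    loss[np.isnan(loss) | np.isneginf(loss)] = penalty
    # The leading minus sign turns the negative stand-in into a large positive cost
    return -np.sum(loss) / m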

You can look up the documentation for numpy’s isnan() and isneginf(). There are two cases to worry about:

If \hat{y} is 1 and y is 0, then the (1 - y)\log(1 - \hat{y}) term is 1 \cdot \log(0), which evaluates to 1 \cdot (-\infty) = -\infty. But if \hat{y} is 1 and y is 1, that term is 0 \cdot (-\infty), and that is NaN (not a number). Of course you hit the same two cases in the opposite order for the y\log(\hat{y}) term, when \hat{y} = 0.
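
You can reproduce both cases directly in numpy (with the warnings suppressed for clarity):

import numpy as np

with np.errstate(divide="ignore", invalid="ignore"):
    print((1. - 0.) * np.log(1. - 1.))   # y = 0, y_hat = 1: 1 * log(0) -> -inf
    print((1. - 1.) * np.log(1. - 1.))   # y = 1, y_hat = 1: 0 * log(0) -> nan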
