Logistic loss function - divide by zero encountered in log

The logistic loss for a single example is

loss(f_{w,b}(x^{(i)}), y^{(i)}) = -y^{(i)}\log(f_{w,b}(x^{(i)})) - (1 - y^{(i)})\log(1 - f_{w,b}(x^{(i)}))

As the formula of the logistic loss shows, when the prediction f_wb is extremely close to 0 or 1, floating point rounding can make it exactly 0 or 1, so one of the log terms becomes np.log(0) and NumPy emits these warnings:

<ipython-input-128-06cade4fd647>:6: RuntimeWarning: divide by zero encountered in log
err = -y[i]*np.log(f_wb) - (1-y[i])*np.log(1-f_wb)
<ipython-input-128-06cade4fd647>:6: RuntimeWarning: invalid value encountered in multiply
err = -y[i]*np.log(f_wb) - (1-y[i])*np.log(1-f_wb)
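
For context, both warnings can be reproduced in isolation (a minimal sketch, not part of the assignment code):

import numpy as np

f_wb = np.array([1.0])             # a prediction that has saturated to exactly 1
y = np.array([1.0])
print(np.log(1 - f_wb))            # [-inf] -> "divide by zero encountered in log"
print((1 - y) * np.log(1 - f_wb))  # 0 * -inf = [nan] -> "invalid value encountered in multiply"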

What should be done when we come across this dilemma?

It is an interesting question! Of course in pure math terms, the output of sigmoid can never exactly equal 0 or 1. But we are dealing with the pathetic limitations of finite floating point representations here, not the abstract beauty of \mathbb{R}, so this can actually happen. Here’s a thread from DLS which discusses this point.
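
To see this concretely (a minimal sketch, not from the course notebooks), a float64 sigmoid really does saturate to exactly 1.0 or 0.0 once |z| is large enough:

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

print(sigmoid(np.array([40.0])))    # [1.] -- 1 + e^-40 rounds to exactly 1.0 in float64
print(sigmoid(np.array([-800.0])))  # [0.] -- e^800 overflows to inf (with a warning), so 1/inf = 0.0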

I’m not familiar with the assignments in MLS, so it’s also possible that I’m missing something here and you should not be hitting this case. But the above link applies in the general case …

Thanks for your prompt reply and it helps a lot!

Well, I didn’t run into this problem while doing the assignments in MLS, but when I was messing around with different data using the methods taught in the course. So your general-case explanation is exactly what I needed. Thank you : )

The issue arises when y_pred is exactly zero (or exactly one) and we compute the natural logarithm log(y_pred) (or log(1 - y_pred)), because the logarithm is undefined for values less than or equal to zero. To address this, y_pred is clipped away from 0 and 1 by a small positive value, called the “epsilon”, so that the logarithm is always well-defined. The value of epsilon is usually a small constant such as 1e-8 or 1e-16.

Here is an updated implementation that clips y_pred with epsilon:
import numpy as np

def cross_entropy(y_true, y_pred, epsilon=1e-8):
    n = y_true.shape[0]
    # Keep predictions strictly inside (0, 1) so log() never receives 0
    y_pred = np.clip(y_pred, epsilon, 1 - epsilon)
    # Mean binary cross-entropy over the n examples
    loss = -np.sum(y_true * np.log(y_pred) + (1-y_true) * np.log(1-y_pred)) / n
    return loss

where np.clip(y_pred, epsilon, 1 - epsilon) limits the values of y_pred to the interval [epsilon, 1 - epsilon], so neither np.log(y_pred) nor np.log(1 - y_pred) is ever evaluated at zero.
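
As a quick check (assuming the cross_entropy function above), predictions containing exact 0s and 1s now give a finite loss instead of nan:

y_true = np.array([1.0, 0.0, 1.0])
y_pred = np.array([1.0, 0.0, 0.5])   # contains exact 0 and 1
print(cross_entropy(y_true, y_pred)) # ~0.231 with epsilon=1e-8, and no warnings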