Hi @VladimirFokow, I think you do make a good point here about the `np.inf` problem, and you might also be interested in how popular ML packages handle it while keeping the loss formula in the form it is best known today, $-y\log(f)-(1-y)\log(1-f)$.
In TensorFlow, if you trace through the source code starting from this entry point, you will end up at these lines:
```python
output = tf.clip_by_value(output, epsilon_, 1. - epsilon_)
return -tf.reduce_sum(target * tf.math.log(output), axis)
```
The first line clips the values of `f` so that both `f` and `1 - f` always stay between `epsilon_` and `1 - epsilon_`, where `epsilon_` is a very small number (`1e-07`), so it avoids the `np.inf` problem.
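For intuition, here is a minimal NumPy sketch of the same trick (the function name and the default `eps` are my own choices, mirroring the TensorFlow snippet above rather than any particular library's API):

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-07):
    """Binary cross-entropy with clipping, as in the TensorFlow snippet above."""
    # Keep predictions strictly inside (0, 1) so np.log never sees 0
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
```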
sklearn’s implementation has this line as well: `y_pred = np.clip(y_pred, eps, 1 - eps)`, with `eps = 1e-15`.
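To see why the clipping matters, here is a quick check using the hypothetical sketch above (the prediction values are just illustrative):

```python
y_true = np.array([0.0, 1.0])
y_pred = np.array([1.0, 0.9])   # confidently wrong on the first example

# Without clipping: log(1 - 1.0) = log(0) -> -inf, so the loss blows up
# (NumPy also emits a divide-by-zero RuntimeWarning here)
raw = -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
print(raw)  # inf

# With clipping: a large but finite penalty
print(binary_cross_entropy(y_true, y_pred))  # ~8.11
```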
So I believe it is good to stick with the formula as it is, since that is indeed how popular implementations use it, and it keeps us consistent with the way we learn it.
I did the assignment as well and didn’t encounter the `np.inf` problem using the formula, so I think the assignment was designed to avoid the problem you described.