So this is from the course (not deeplearning.ai). This is what I understood from it:
In negative log-likelihood, we take the log of the joint probability. The log function increases as the probability increases, but probabilities lie in [0, 1], so their logs are negative. To make everything positive we multiply by -1, and the maximum log-value then becomes the minimum positive value. That is why we use np.argmin. If we did not multiply by -1, we would have used np.argmax instead.
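A minimal sketch of that sign flip (the log-likelihood values below are made up, purely to illustrate that the argmax of the log-likelihood is the argmin of the negative log-likelihood):

```python
import numpy as np

# Hypothetical log-likelihoods for three candidate models (values are made up).
log_likelihoods = np.array([-4.2, -1.3, -2.7])

# Picking the best candidate by maximizing the log-likelihood...
best_by_max = np.argmax(log_likelihoods)

# ...is the same as minimizing the negative log-likelihood.
neg_log_likelihoods = -log_likelihoods
best_by_min = np.argmin(neg_log_likelihoods)

print(best_by_max, best_by_min)  # both print 1
```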
I won’t say whether your version is correct or not, but I am happy to share my version with you.
We want to maximize the likelihood because it indicates that the model fits the data well, and we are happy to maximize the log-likelihood because that also maximizes the likelihood. We want to minimize the negative log-likelihood because that maximizes the log-likelihood and, as the screenshot said, it is the convention.
Therefore, if I formulate the problem as minimizing the negative log-likelihood, and my model is able to do that, it sticks with the convention and my model will fit the data well.
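A minimal sketch of that equivalence, assuming a simple Bernoulli (coin-flip) model and made-up data, not anything from the course:

```python
import numpy as np

# Made-up coin-flip data: 1 = heads, 0 = tails.
data = np.array([1, 1, 0, 1, 0, 1, 1])

# Candidate values for the probability of heads.
p_grid = np.linspace(0.05, 0.95, 19)

# Likelihood of the whole dataset under each candidate p (product of Bernoulli terms).
likelihood = np.array([np.prod(p ** data * (1 - p) ** (1 - data)) for p in p_grid])

# Log-likelihood and negative log-likelihood of the same dataset.
log_likelihood = np.log(likelihood)
neg_log_likelihood = -log_likelihood

# All three criteria pick the same candidate p.
assert np.argmax(likelihood) == np.argmax(log_likelihood) == np.argmin(neg_log_likelihood)
```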
Could you explain this using the np.argmax and np.argmin conventions, if possible?
Also, in the logistic regression loss function, since we use log and the predicted values fall in the range (0, 1), the log of this range is always negative. That is why we multiply by -1 to get the absolute value, since the negative sign is guaranteed by the log. We could also use abs(), but using the minus sign is more intuitive to convey that it is the negative log-likelihood.
Is that a correct understanding?
Yes, in this case we are guaranteed that the log values are negative, so just multiplying by -1 is the better solution. Using the absolute value function is to be avoided if possible, because it’s not differentiable at 0. Of course we manage to get by with ReLU also being non-differentiable at 0 with no apparent issues, but differentiable is better than non-differentiable if you have the choice.
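A minimal sketch of that sign behaviour, assuming NumPy and some made-up predicted probabilities and labels (not from the course):

```python
import numpy as np

# Made-up predicted probabilities (sigmoid outputs, so strictly in (0, 1)) and true labels.
y_hat = np.array([0.9, 0.2, 0.7, 0.4])
y = np.array([1, 0, 1, 1])

# Log-likelihood of each label under the model; every term is <= 0
# because the log of a value in (0, 1] is non-positive.
log_likelihood = y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat)

# Negative log-likelihood loss: multiply by -1 to get a non-negative quantity to minimize.
nll = -np.mean(log_likelihood)

# Because the log terms are guaranteed non-positive, abs() would give the same number here,
# but the minus sign makes the "negative log-likelihood" reading explicit.
assert np.isclose(nll, np.mean(np.abs(log_likelihood)))
print(nll)
```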