So this is from the course (not deeplearning.ai). This is what I understood from it:
In negative log-likelihood, we take the log of the joint probability. The log function increases as the probability increases, but probabilities lie in [0, 1], so their logs are negative. To make everything positive we multiply by -1, and the maximum log-value then becomes the minimum positive value. That is why we use np.argmin. If we did not multiply by -1, we would have used np.argmax instead.
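A minimal sketch of that sign flip (the log-likelihood values below are made up, purely to illustrate that the argmax of the log-likelihood is the argmin of the negative log-likelihood):

```python
import numpy as np

# Hypothetical log-likelihoods for three candidate models (values are made up).
log_likelihoods = np.array([-4.2, -1.3, -2.7])

# Picking the best candidate by maximizing the log-likelihood...
best_by_max = np.argmax(log_likelihoods)

# ...is the same as minimizing the negative log-likelihood.
neg_log_likelihoods = -log_likelihoods
best_by_min = np.argmin(neg_log_likelihoods)

print(best_by_max, best_by_min)  # both print 1
```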
I won’t say whether your version is correct or not, but I am happy to share my version with you.
We want to maximize the likelihood because it indicates that the model fits the data well, and we are happy to maximize the log-likelihood because that also maximizes the likelihood. We want to minimize the negative log-likelihood because that maximizes the log-likelihood and, as the screenshot said, it is the convention.
Therefore, if I formulate the problem as minimizing the negative log-likelihood, and my model is able to do that, it sticks with the convention and my model will fit the data well.
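A minimal sketch of that equivalence, assuming a simple Bernoulli (coin-flip) model and made-up data, not anything from the course:

```python
import numpy as np

# Made-up coin-flip data: 1 = heads, 0 = tails.
data = np.array([1, 1, 0, 1, 0, 1, 1])

# Candidate values for the probability of heads.
p_grid = np.linspace(0.05, 0.95, 19)

# Likelihood of the whole dataset under each candidate p (product of Bernoulli terms).
likelihood = np.array([np.prod(p ** data * (1 - p) ** (1 - data)) for p in p_grid])

# Log-likelihood and negative log-likelihood of the same dataset.
log_likelihood = np.log(likelihood)
neg_log_likelihood = -log_likelihood

# All three criteria pick the same candidate p.
assert np.argmax(likelihood) == np.argmax(log_likelihood) == np.argmin(neg_log_likelihood)
```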
Could you explain this using the np.argmax and np.argmin conventions, if possible?
Also, in the logistic regression loss function, since we use log and the predicted values fall in the range (0, 1), the log of this range is always negative. That is why we multiply by -1 to get the absolute value, since the negative sign is guaranteed by the log. We could also use abs(), but using the minus sign is more intuitive to convey that it is the negative log-likelihood.
Is that a correct understanding?
Yes, in this case we are guaranteed that the log values are negative, so just multiplying by -1 is the better solution. Using the absolute value function is to be avoided if possible, because it’s not differentiable at 0. Of course we manage to get by with ReLU also being non-differentiable at 0 with no apparent issues, but differentiable is better than non-differentiable if you have the choice.
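A minimal sketch of that sign behaviour, assuming NumPy and some made-up predicted probabilities and labels (not from the course):

```python
import numpy as np

# Made-up predicted probabilities (sigmoid outputs, so strictly in (0, 1)) and true labels.
y_hat = np.array([0.9, 0.2, 0.7, 0.4])
y = np.array([1, 0, 1, 1])

# Log-likelihood of each label under the model; every term is <= 0
# because the log of a value in (0, 1] is non-positive.
log_likelihood = y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat)

# Negative log-likelihood loss: multiply by -1 to get a non-negative quantity to minimize.
nll = -np.mean(log_likelihood)

# Because the log terms are guaranteed non-positive, abs() would give the same number here,
# but the minus sign makes the "negative log-likelihood" reading explicit.
assert np.isclose(nll, np.mean(np.abs(log_likelihood)))
print(nll)
```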