The Cost function sign (- or +)

Hi

I have a question about the cost function sign.

The video "Explanation of Logistic Regression Cost Function (Optional)" explains that the sign of the cost function must be positive:

but in the notes and programming assignment, the cost function is negative

and also here

and the test doesn’t pass with a positive cost function. Which one is correct?

Hi @Ali_Ghadimi. There appears to be a dropped minus sign on that first slide that you show. On the slide, Prof Ng is showing that cross-entropy loss can be derived from the maximum likelihood principle: given the data (assumed to be drawn from the “correct” distribution), what parameters are most likely to explain/predict the data? In other words, which parameters maximize the (log) likelihood function?

The underlying distribution in the (log) likelihood function (at top) is the Bernoulli distribution (a Binomial with a single trial), the basic distribution for a weighted coin toss. In the AI disciplines, the problem is typically couched in terms of a loss function: which parameters minimize the loss associated with deviations from the actual data? Hence, the objective function is multiplied by -1 to turn the maximization into a minimization problem.
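To make the sign flip concrete, here is a sketch of the derivation for a single example. It assumes the usual course notation, where \hat{y} is the predicted probability that y = 1; the name J(w, b) for the averaged cost is my assumption about the notation, not something read off the slide.

```latex
% Bernoulli likelihood of one label y \in \{0, 1\}, assuming \hat{y} = P(y = 1 \mid x)
p(y \mid x) = \hat{y}^{\,y} \, (1 - \hat{y})^{1 - y}

% Log-likelihood: the quantity to MAXIMIZE (always <= 0, since 0 < \hat{y} < 1)
\log p(y \mid x) = y \log \hat{y} + (1 - y) \log(1 - \hat{y})

% Loss: the same quantity times -1, to MINIMIZE (always >= 0)
\mathcal{L}(\hat{y}, y) = -\log p(y \mid x)
                        = -\bigl[\, y \log \hat{y} + (1 - y) \log(1 - \hat{y}) \,\bigr]

% Cost: the average loss over m examples
J(w, b) = \frac{1}{m} \sum_{i=1}^{m} \mathcal{L}\bigl(\hat{y}^{(i)}, y^{(i)}\bigr)
```

Maximizing the second line and minimizing the third line pick out the same parameters; only the sign convention differs.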

Postscript: To my mind the script-L function \mathcal{L} is suggestive of “loss” and so should include the minus sign. Prof Ng goes another way, but drops the minus sign in the last line (assuming that this snapshot was not taken a second before the minus sign appears), as if he too fell prey to the ambiguity. Your confusion is quite understandable. Paraphrasing Laplace (I think), “half the battle of mathematics is the invention of a good notation.” :nerd_face:


Right! The underlying point here is that the loss involves the logarithm of numbers between 0 and 1. Those logarithms are negative, so we multiply by -1 to get a positive cost value.
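A quick numerical check of that point may help. This is only a sketch: the predictions and labels below are made-up values for illustration, not data from the assignment.

```python
import numpy as np

# Hypothetical sigmoid outputs and labels, invented for this illustration.
y_hat = np.array([0.9, 0.2, 0.7, 0.4])
y = np.array([1, 0, 1, 0])

# Per-example log-likelihood: each term is the log of a number in (0, 1),
# so every entry is negative.
per_example_log_lik = y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat)

# Averaging keeps the value negative; multiplying by -1 gives a positive cost.
mean_log_lik = per_example_log_lik.mean()
cost = -mean_log_lik

print("log-likelihood terms:", per_example_log_lik)
print("mean log-likelihood: ", mean_log_lik)
print("cost:                ", cost)
```

Without the minus sign you would be reporting the mean log-likelihood, which is negative; that is presumably why a cost with the sign flipped fails the test.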

Prof Ng explains here why we need to drop the minus sign, but in the lecture notes the minus sign is there… and also in the code…

Yes, the slides are a bit confusing. But you just have to keep in mind what I said in my previous reply: the logarithms are all negative and we need the cost to be positive. So it’s just a question of where you put the minus sign: on the individual terms, inside the parens inside the sum, outside the summation (factored out) or incorporated into the definition of L(\hat{y}^{(i)}, y^{(i)}).
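Here is a small sketch showing that the four placements give the same number. The data and the helper name `loss` are hypothetical, chosen only for this illustration; the assignment's own function names may differ.

```python
import numpy as np

# Hypothetical sigmoid outputs and labels, invented for this illustration.
y_hat = np.array([0.9, 0.2, 0.7, 0.4])
y = np.array([1, 0, 1, 0])
m = y.shape[0]

# (a) Minus sign on the individual terms.
cost_a = (1 / m) * np.sum(-y * np.log(y_hat) - (1 - y) * np.log(1 - y_hat))

# (b) Minus sign in front of the parenthesized expression, inside the sum.
cost_b = (1 / m) * np.sum(-(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat)))

# (c) Minus sign factored out of the summation.
cost_c = -(1 / m) * np.sum(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

# (d) Minus sign folded into the definition of the per-example loss.
def loss(y_hat, y):
    """Hypothetical helper: per-example cross-entropy loss, sign included."""
    return -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

cost_d = np.mean(loss(y_hat, y))

print(cost_a, cost_b, cost_c, cost_d)
```

All four agree up to floating-point rounding, so the choice is purely notational, as long as exactly one minus sign ends up in the formula.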