Logistic Regression Cost Function

Dear Mentor,

I would like to make sure my understanding is correct, in particular whether the example values below are right or wrong:

  1. For y = 1, we want log(\hat{y}) to be small, for example -log(\hat{y}) = -0.000001.

  2. For y = 0, we want log(1 - \hat{y}) to be large, for example -log(1 - \hat{y}) = -100000000.



Note that all the logarithms here are of numbers between 0 and 1 (the output of sigmoid), so they are negative. Here is a graph of log(z) for 0 < z < 1:
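In case the graph does not render here, a quick numeric sketch (plain Python, using `math.log`, i.e. the natural log) shows the same shape: log(z) is negative everywhere on 0 < z < 1 and rises toward 0 as z approaches 1.

```python
import math

# log(z) for several z in (0, 1): every value is negative,
# and the values climb toward 0 as z approaches 1.
for z in [0.01, 0.1, 0.5, 0.9, 0.99]:
    print(f"log({z}) = {math.log(z):.4f}")
```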

Of course we know that log(1) = 0 because e^0 = 1.

So in the case of y = 1, what Prof Ng means by making log(\hat{y}) as large as possible is making it as far to the right on that graph as possible, i.e. a negative number as close to 0 as possible. That in turn makes the loss value -log(\hat{y}) approach 0 from the positive side.

Then for the y = 0 case, the loss is -log(1 - \hat{y}). To make that as small as possible, you want \hat{y} to approach 0, so that 1 - \hat{y} approaches 1 and -log(1 - \hat{y}) approaches 0.
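Putting both cases together, here is a small sketch (assuming the standard single-example cross-entropy loss, L = -[y log(\hat{y}) + (1 - y) log(1 - \hat{y})], as in the lectures) confirming that the loss is always positive and shrinks toward 0 as \hat{y} moves toward the true label:

```python
import math

def loss(y, y_hat):
    # Binary cross-entropy for a single example.
    return -(y * math.log(y_hat) + (1 - y) * math.log(1 - y_hat))

# y = 1: loss approaches 0 as y_hat -> 1, and blows up as y_hat -> 0.
print(loss(1, 0.999))  # small positive number (about 0.001)
print(loss(1, 0.001))  # large positive number (about 6.9)

# y = 0: loss approaches 0 as y_hat -> 0, and blows up as y_hat -> 1.
print(loss(0, 0.001))  # small positive number (about 0.001)
print(loss(0, 0.999))  # large positive number (about 6.9)
```

Note that the loss itself is never negative, which is why the example value -log(\hat{y}) = -0.000001 in the original question cannot occur; a near-perfect prediction gives a loss near +0.000001 instead.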