I am having a hard time understanding the logic in this.

If we want to minimise the loss function, why do we seek a large log(yhat)?

The way I understand it: to minimise the loss when y = 1, we want yhat close to 1, which gives log(1) = 0. Similarly for when y = 0.

However, I don't understand the line about "wanting log(yhat) large".

Thank You

Yes, the way he explains this can be a bit confusing if you’re not paying really close attention. You have to track whether he’s talking about the raw logarithm values (which are negative) or the actual loss values (-1 times the log value). In this case, he literally means that log(\hat{y}) is larger, meaning closer to zero from the negative side. “Larger” meaning further to the right on the number line. That makes the actual loss value (-log(\hat{y})) smaller (closer to zero from the positive side). Because we are talking about a sample with a label of 1, that also means you want \hat{y} to be as large as possible, meaning as close to 1 as possible. Of course sigmoid values are never exactly equal to either 0 or 1.
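The sign flip described above is easy to see numerically. Here is a minimal sketch (not from the course materials) showing that as `yhat` grows toward 1, `log(yhat)` gets "larger" (closer to zero from below) while the loss `-log(yhat)` gets smaller:

```python
import math

# For a sample with label y = 1, the per-sample loss is -log(yhat).
# A "larger" log(yhat) (closer to 0 from the negative side) means a
# smaller positive loss.
for yhat in (0.1, 0.5, 0.9, 0.99):
    log_val = math.log(yhat)   # negative number, moves right toward 0
    loss = -log_val            # positive number, shrinks toward 0
    print(f"yhat={yhat:<5} log(yhat)={log_val:+.4f} loss={loss:.4f}")
```

So "maximise log(yhat)" and "minimise -log(yhat)" are the same goal, just stated on opposite sides of the sign.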


Hey @Fedros_Fieros,

Welcome to the community. Just to add to what Paul Sir has said, `yhat` lies between 0 and 1. Now, when `y = 1`, we want `yhat` to be as close to 1 as possible, i.e., `yhat` should be as large as possible, or in other words, `log(yhat)` should be as large as possible. It’s simple with an example:

y = 1

yhat = 0 → 0.5 → 1

log(yhat) = -\infty → ≈ -0.69 → 0

As you can see, as `yhat` approaches 1, `log(yhat)` becomes larger and larger. I hope this helps.

Regards,

Elemento

Now I understand, thank you!

It does help, thank you Elemento