Week 2 / video 3

I am having a hard time understanding the logic in this.

If we want to minimise the loss function, why do we seek a large log(yhat)?

The way I understand it: to minimise the loss when y = 1, we want yhat close to 1, which gives log(1) = 0. Similarly for when y = 0.

However, I don't understand the line about wanting a "large log(yhat)".

Thank You

Yes, the way he explains this can be a bit confusing if you’re not paying really close attention. You have to track whether he’s talking about the raw logarithm values (which are negative) or the actual loss values (-1 times the log value). In the case in point, he literally means that log(\hat{y}) is larger, meaning closer to zero from the negative side. “Larger” meaning further to the right on the number line. That will make the actual loss value (-log(\hat{y})) smaller (closer to zero from the positive side). Because we are talking about a sample with a label of 1, that also means you want \hat{y} to be as large as possible, meaning as close to 1 as possible. Of course sigmoid values are never exactly equal to either 0 or 1.
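To make the "larger log(yhat) means smaller loss" point concrete, here is a minimal Python sketch (not part of the course code) for a sample with label y = 1, where the loss reduces to -log(yhat):

```python
import math

# For a sample with y = 1, the cross-entropy loss is -log(yhat).
# log(yhat) is negative for yhat in (0, 1); as yhat approaches 1,
# log(yhat) gets "larger" (closer to 0 from the negative side),
# so the loss -log(yhat) gets smaller (closer to 0 from the positive side).
for yhat in (0.1, 0.5, 0.9, 0.99):
    log_yhat = math.log(yhat)   # negative, rises toward 0 as yhat -> 1
    loss = -log_yhat            # positive, falls toward 0 as yhat -> 1
    print(f"yhat={yhat:.2f}  log(yhat)={log_yhat:+.3f}  loss={loss:.3f}")
```

Running this shows log(yhat) moving right on the number line while the loss shrinks, which is exactly the distinction between the raw logarithm and the loss described above.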


Hey @Fedros_Fieros,
Welcome to the community. Just to add to what Paul Sir has said, yhat lies between 0 and 1. Now, when y = 1, we want yhat to be as close to 1 as possible, i.e., yhat should be as large as possible, or in other words, log(yhat) should be as large as possible. It's easier to see with an example:

y = 1
yhat = 0 → 0.5 → 1
log(yhat) = -\infty → ≈ -0.69 → 0 (using the natural log)

As you can see, as yhat approaches 1, log(yhat) becomes larger and larger. I hope this helps.
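The same progression can be checked numerically. A small sketch (my own illustration, assuming the full binary cross-entropy loss from the lecture, with the natural log) covering both the y = 1 and y = 0 cases:

```python
import math

def loss(y, yhat):
    """Binary cross-entropy for one sample.

    yhat must lie strictly in (0, 1), which the sigmoid guarantees
    in practice since it never outputs exactly 0 or 1.
    """
    return -(y * math.log(yhat) + (1 - y) * math.log(1 - yhat))

# y = 1: as yhat climbs toward 1, log(yhat) grows toward 0 and the loss falls
print(loss(1, 0.1), loss(1, 0.5), loss(1, 0.99))

# y = 0: symmetrically, as yhat drops toward 0, the loss falls
print(loss(0, 0.9), loss(0, 0.5), loss(0, 0.01))
```

Note that log(0.5) ≈ -0.69 with the natural log, so the middle value in the progression above is approximate rather than exactly -1.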


Now I understand, thank you!

It does help, thank you Elemento