Clarification W2 Cost Function

MoHassan · May 8, 2021, 5:42am

Please, I need clarification for this part of the slide

Thanks in advance.

albertovilla · May 8, 2021, 8:39am

In order to calculate the total cost you are summing up two types of terms:

If y = 1, the (1-y) factor becomes 0, so the only term that matters is -log(y_hat)
If y = 0, the y factor becomes 0, so the only term that matters is -log(1-y_hat)

So you are summing either -log(y_hat) or -log(1-y_hat) depending on whether you have case 1 or 2. Then what’s important is to remember the properties of the logarithm function.
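As a minimal sketch of the two cases, assuming the slide shows the usual per-sample logistic loss -(y*log(y_hat) + (1-y)*log(1-y_hat)) with the natural log (the function name `sample_loss` is just for illustration):

```python
import math

def sample_loss(y, y_hat):
    """Per-sample logistic loss, assuming 0 < y_hat < 1."""
    return -(y * math.log(y_hat) + (1 - y) * math.log(1 - y_hat))

# Case 1, y = 1: the (1 - y) factor is 0, so only -log(y_hat) remains.
loss_pos = sample_loss(1, 0.9)

# Case 2, y = 0: the y factor is 0, so only -log(1 - y_hat) remains.
loss_neg = sample_loss(0, 0.9)

print(loss_pos, loss_neg)
```

With the same y_hat = 0.9, the loss is small when y = 1 and much larger when y = 0, because the prediction is close to the first label and far from the second.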

log(1) = 0, and the logarithm of a value below 1 is negative, becoming more negative as the value shrinks. For example, in base 10, log(0.9) ≈ -0.05 and log(0.5) ≈ -0.30 (with the natural log the values are about -0.105 and -0.693; the behaviour is the same).
When we only have the -log(y_hat) term we want log(y_hat) to be large, which means we want y_hat to be large: the closer y_hat is to 1, the closer -log(y_hat) is to 0.

Note that "log(y_hat) large" means a negative number close to 0: in the example above, -0.05 is larger than -0.30.

When we only have the -log(1-y_hat) term, we again want log(1-y_hat) to be large in the same sense, i.e. a negative number close to 0. That means we want 1-y_hat as high as possible and therefore y_hat small: the lower y_hat is, the closer 1-y_hat is to 1 and the closer log(1-y_hat) is to 0.

I don’t know if it is clearer now. Unfortunately Discourse doesn’t support LaTeX, so the mathematical part doesn’t look as good as it should.

MoHassan · May 16, 2021, 5:15am

I understand the second case now, but concerning the first case, how do we get the output close to 1?

albertovilla · May 16, 2021, 3:38pm

Hi @MoHassan, I’m not sure I understand what you mean by "the output close to 1". Do you mean y_hat? We are trying to predict y_hat so that it matches the ground truth y, so we are generating values between 0 and 1.

When y is 0 we want y_hat as close to 0 as possible and when y is 1 we want y_hat as close as possible to 1.

MoHassan · May 17, 2021, 6:29am

Yeah, but in both cases I see that we are outputting y_hat close to 0, so the prediction can’t be 1 in the first case.

albertovilla · May 17, 2021, 6:55am

In the first case, we want y_hat close to 1, so -log(y_hat) is close to 0.

In the second case, we want y_hat close to 0, so (1 - y_hat) is as close to 1 as possible and then -log(1-y_hat) is close to 0.
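A quick numeric check of both cases, as a sketch using the natural log and a few arbitrary y_hat values chosen only for illustration:

```python
import math

y_hats = [0.01, 0.5, 0.99]

# Case 1 (y = 1): loss is -log(y_hat); it shrinks as y_hat approaches 1.
case1 = [-math.log(p) for p in y_hats]

# Case 2 (y = 0): loss is -log(1 - y_hat); it shrinks as y_hat approaches 0.
case2 = [-math.log(1 - p) for p in y_hats]

for p, c1, c2 in zip(y_hats, case1, case2):
    print(f"y_hat={p:.2f}  loss if y=1: {c1:.3f}  loss if y=0: {c2:.3f}")
```

So the two cases pull y_hat in opposite directions: towards 1 when y = 1 and towards 0 when y = 0.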

MoHassan · May 19, 2021, 4:46am

Do we want just y_hat to be 0 or 1, or the final output to be 0 or 1?
I understand that we want the final output.

albertovilla · May 19, 2021, 11:15am

Leaving aside the cost function, the main concept is that ideally we want y_hat to be equal to the real y. If that were true for all y_i, the total cost would be 0.

When you apply that concept to the cost formula, you will see that the closer y_hat is to y, the lower the cost for that particular sample. If you manage that for all samples, you are minimizing the overall cost.
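To illustrate, here is a small sketch that assumes the total cost is the average of the per-sample losses (natural log). The labels and predictions are made up for the example, and `total_cost` is a hypothetical helper name:

```python
import math

def total_cost(ys, y_hats):
    """Average logistic loss over all samples, assuming 0 < y_hat < 1."""
    m = len(ys)
    losses = [-(y * math.log(p) + (1 - y) * math.log(1 - p))
              for y, p in zip(ys, y_hats)]
    return sum(losses) / m

ys = [1, 0, 1, 0]                    # made-up ground-truth labels
good = [0.99, 0.01, 0.98, 0.02]      # predictions close to the labels
bad = [0.60, 0.40, 0.30, 0.70]       # predictions far from the labels

cost_good = total_cost(ys, good)
cost_bad = total_cost(ys, bad)
print(cost_good, cost_bad)
```

The predictions that sit closer to their labels give the lower cost. Note that y_hat exactly equal to 0 or 1 would make `math.log(0)` raise a ValueError, so in this sketch the cost only approaches 0 rather than reaching it.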

MoHassan · May 21, 2021, 7:23am

Okay got it,
Thanks so much.
