Week 2 video 3 cost function

i200660_Mirza_Ubaidu · August 17, 2023, 3:24pm

Correct me if I am wrong. The video explains that the loss function should be minimized. It also says that when y = 1, the loss is -log(y_hat). To minimize this function, we want y_hat to be as large as possible (closest to 1), because the negative sign then makes the loss as small as possible. But my question is this: the negative sign simply represents the direction of the difference between the actual and predicted values. So if y_hat is close to 1, the loss will be approximately -1. But that is bad, no? We want our loss to be 0, not -1, even though -1 < 0.

saifkhanengr · August 17, 2023, 3:46pm

Let’s say \hat{y} is 0.99 and y is 1. What is the value of log(0.99)? It’s a negative value (about -0.01), so the cost will be -(negative value) = a positive value.

I don’t think the statement that the negative sign represents the direction of the difference is true. The negative sign does not represent a direction here.

paulinpaloalto · August 17, 2023, 4:58pm

Right! The point is that our \hat{y} values are between 0 and 1. Take a look at the graph of the natural log function and you’ll see that it is negative for the domain (0, 1). The range of the function on that domain is (-\infty, 0). So we need to multiply by -1 to get a positive value for the cross entropy loss.
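A quick numerical check of the sign argument above, as a minimal sketch. The sample values in `y_hats` are made up for illustration; any values strictly between 0 and 1 behave the same way.

```python
import math

# Hypothetical sample predictions, all strictly inside (0, 1).
y_hats = [0.01, 0.5, 0.9, 0.99]

for y_hat in y_hats:
    log_value = math.log(y_hat)  # natural log, negative on (0, 1)
    loss = -log_value            # loss for the y = 1 case, positive
    print(f"y_hat={y_hat:<5} log(y_hat)={log_value:8.4f}  -log(y_hat)={loss:7.4f}")
```

The closer y_hat gets to 1, the closer -log(y_hat) gets to 0 from above. It never goes negative.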

Here’s a nice explanation from Raymond of cross entropy loss.

Roee_Ben_Shoshan · August 17, 2023, 5:47pm

The idea probably comes from learning physics.
In physics this statement is actually true: the sign of a quantity such as force, velocity, or acceleration represents the direction in which it pushes, pulls, or accelerates.

paulinpaloalto · August 17, 2023, 6:43pm

But the equivalent of “force” in this instance is the derivative of the cost, right? In that case, the direction is expressed by the sign. The cost itself, though, is always positive.
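To make the "derivative carries the direction" point concrete, here is a small sketch. It differentiates the single-example loss with respect to \hat{y} directly, which is a simplification (in the course the gradients are taken with respect to the weights and bias), and the function names `bce_loss` and `bce_grad` are hypothetical.

```python
import math

def bce_loss(y, y_hat):
    """Binary cross entropy for one example, with y_hat strictly in (0, 1)."""
    return -(y * math.log(y_hat) + (1 - y) * math.log(1 - y_hat))

def bce_grad(y, y_hat):
    """Derivative of bce_loss with respect to y_hat."""
    return -(y / y_hat) + (1 - y) / (1 - y_hat)

# Same prediction, two different labels (illustrative values).
grad_pos = bce_grad(1, 0.7)  # negative: loss falls as y_hat moves up toward 1
grad_neg = bce_grad(0, 0.7)  # positive: loss falls as y_hat moves down toward 0

print(f"y=1: loss={bce_loss(1, 0.7):.4f}, dL/dy_hat={grad_pos:.4f}")
print(f"y=0: loss={bce_loss(0, 0.7):.4f}, dL/dy_hat={grad_neg:.4f}")
```

In both cases the loss is positive, while the sign of the derivative changes with the label. The sign tells you which way to move \hat{y}; the loss only tells you how far off you are.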

i200660_Mirza_Ubaidu · August 17, 2023, 6:48pm

Let’s try to break this down. The loss function is computed on a single data point from y_hat, the predicted outcome, and y, the actual outcome. Suppose the model fits a data point perfectly: if y = 1 (we know y can only be 1 or 0) and y_hat is also approximately 1, then the loss function -(y log(y_hat) + (1 - y) log(1 - y_hat)), also known as the binary cross entropy, becomes approximately 0. The same holds if y is 0 and y_hat is approximately 0.

So 0 is the ideal value for the loss function, correct? We would want to tune our parameters so that they yield a loss of 0 or close to 0.
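The behaviour described in this post can be checked with a few lines of Python. This is only a sketch: the name `binary_cross_entropy` and the `eps` clamp (added so that log(0) is never evaluated) are assumptions for illustration, not something from the course.

```python
import math

def binary_cross_entropy(y, y_hat, eps=1e-12):
    """Single-example loss: -(y*log(y_hat) + (1-y)*log(1-y_hat))."""
    # Hypothetical guard: keep y_hat strictly inside (0, 1) to avoid log(0).
    y_hat = min(max(y_hat, eps), 1 - eps)
    return -(y * math.log(y_hat) + (1 - y) * math.log(1 - y_hat))

# y = 1: predictions moving toward 1.
losses_y1 = [binary_cross_entropy(1, p) for p in (0.5, 0.9, 0.99, 0.999)]
# y = 0: predictions moving toward 0.
losses_y0 = [binary_cross_entropy(0, p) for p in (0.5, 0.1, 0.01, 0.001)]

print("y=1:", [round(v, 4) for v in losses_y1])
print("y=0:", [round(v, 4) for v in losses_y0])
```

Both lists shrink toward 0 as the prediction approaches the label, and every entry stays positive, so 0 is the floor the optimizer is heading for.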

paulinpaloalto · August 17, 2023, 6:54pm

Yes, that sounds correct.

i200660_Mirza_Ubaidu · August 17, 2023, 7:13pm

OK, I understand it now. I plotted the graph of log and then it became apparent to me. For a moment I forgot that the value of y_hat cannot exceed 1. And when it reaches 1 (in the case that y is also 1), the loss function becomes 0. Thanks, community!
