I am puzzling over this section of course 3
The red arrow indicates that the loss involves the whole target value.
The purple arrow indicates that the loss only uses the target as a toggle switch.
But remember that the point is that every y^{(i)} value must be either 1 or 0 by definition, right? They are the “labels” for the data samples and every sample is either a “yes” or a “no”. So the loss is defined for every sample by selecting the relevant one of those two formulas. The point is that your goal for the f_{w,b}(x^{(i)}) value is different depending on whether the y^{(i)} value is 1 or 0.
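For concreteness, here is a minimal sketch of that per-sample selection in NumPy (illustrative code, not taken from the course assignments):

```python
import numpy as np

def loss(f_wb, y):
    """Cross-entropy loss for one sample.
    f_wb: the prediction f_{w,b}(x^{(i)}), a value in (0, 1)
    y:    the label y^{(i)}, which is always 0 or 1
    """
    if y == 1:
        return -np.log(f_wb)      # goal: push f_wb toward 1
    else:
        return -np.log(1 - f_wb)  # goal: push f_wb toward 0

print(loss(0.9, 1))  # small loss: prediction agrees with the label
print(loss(0.9, 0))  # large loss: prediction disagrees with the label
```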
Or maybe I am just missing your point.
Thank you
I understand the target will be either a 1 or a 0
Are you saying that the prediction will also be either a 1 or a 0?
So the loss will vary between 1, -1, and 0?
The prediction is f_{w,b}(x^{(i)}) and it can take any value between 0 and 1. It is between 0 and 1 because we use the Sigmoid function.
Right! Which means the loss will be between 0 and +\infty.
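A quick numerical check of that (assuming the y = 1 branch, so the loss is -\log(\hat{y})):

```python
import numpy as np

# For a label of 1, the loss is -log(f). As the prediction f
# approaches 0 (the worst possible answer), the loss grows without bound:
for f in [0.5, 0.1, 0.01, 1e-6, 1e-12]:
    print(f, -np.log(f))
```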
Here’s a thread from DLS which discusses the cross entropy loss in more detail and shows the graph of the log function between 0 and 1.
Thank you.
In what scenario is the loss = +\infty?
If the y label is 1 and the \hat{y} prediction value is exactly 0, or if the y label is 0 and the \hat{y} prediction value is exactly 1. As you can see from the two formulas, in either of those cases you get -log(0) as the loss, and log(0) = -\infty. Of course, from a mathematical standpoint the output of sigmoid is never exactly 0 or 1; it only approaches them asymptotically. So you could say that the loss is never really infinite in mathematical terms, but we are doing everything in floating point here, so the values can actually “saturate” and round to 0 or 1. If that happens, you can end up with NaN (Not a Number) as the cost value because of the rules for propagating infinite values in IEEE 754 floating point (for example, 0 \cdot -\infty evaluates to NaN).
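You can see the saturation behavior directly in NumPy (a hypothetical demonstration, not course code):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

with np.errstate(over="ignore", divide="ignore", invalid="ignore"):
    f = sigmoid(40.0)      # saturates: exactly 1.0 in float64
    print(f)               # 1.0
    print(-np.log(1 - f))  # inf: the loss for a y = 0 label
    # The combined formula -y*log(f) - (1-y)*log(1-f) hits 0 * -inf:
    y = 1.0
    print(-y * np.log(f) - (1 - y) * np.log(1 - f))  # nan
```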
Sorry, I may be using different notation than they use in MLS. In DLS, the output of a model for a particular sample is called \hat{y}. So to put both the MLS and DLS notation together we have:
\hat{y}^{(i)} = f_{w,b}(x^{(i)})
The more mathematically correct way to say this is:
0 < loss < \infty
so the loss is never infinite in mathematical terms, but it can be arbitrarily large. You can always make the prediction worse (farther from the correct answer), although that is never the goal of course. Note that the loss can never be exactly zero either, but that is fine. The way we interpret the predictions is by comparing them to 0.5. If \hat{y} is > 0.5, then we consider that a “yes” answer and “no” otherwise. So the model can achieve 100% accuracy w.r.t. the labels on the samples without the loss actually being 0.
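Here is a small illustration of that point, with made-up predictions and labels:

```python
import numpy as np

f_wb = np.array([0.9, 0.7, 0.2, 0.4])  # hypothetical predictions
y    = np.array([1.0, 1.0, 0.0, 0.0])  # labels

preds = (f_wb > 0.5).astype(int)       # "yes" if above 0.5, else "no"
accuracy = np.mean(preds == y)
cost = np.mean(-y * np.log(f_wb) - (1 - y) * np.log(1 - f_wb))
print(accuracy)  # 1.0: every sample lands on the correct side of 0.5
print(cost)      # ~0.3: yet the average loss is still well above 0
```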
The point of the loss function is that it is the basis for “back propagation” which allows the model to be trained to produce correct answers. The derivatives of the loss tell the algorithm which direction to move the parameter values in order to improve the results and that’s where the learning happens.
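For logistic regression those derivatives work out to a particularly clean form, so one update step looks roughly like this (a sketch with illustrative names, using the standard gradients of the cross-entropy cost):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_step(X, y, w, b, alpha):
    """One gradient-descent step for logistic regression (illustrative).
    Uses the standard gradients of the cross-entropy cost:
      dJ/dw = (1/m) * X^T (f - y),  dJ/db = (1/m) * sum(f - y)
    """
    m = X.shape[0]
    f = sigmoid(X @ w + b)           # predictions for all m samples
    err = f - y                      # how far each prediction is off
    w = w - alpha * (X.T @ err) / m  # move w opposite the gradient
    b = b - alpha * np.sum(err) / m  # move b opposite the gradient
    return w, b
```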
Thank you.
What are MLS and DLS?
I notice on my calculator Log(0) gives a domain error. I guess that is another way of saying NaN?
MLS is the Machine Learning Specialization, which it looks like you are taking. DLS is the Deep Learning Specialization, which is perhaps the next set of courses to take after MLS. MLS gives you a survey of lots of different types of ML algorithms. DLS focuses specifically on Deep Neural Networks and goes into a lot of detail on how they work, what kinds of problems they can solve, and how to build such solutions.
My guess is that what your calculator means by “domain error” is that it considers the “domain” of the function log to be only positive numbers (strictly greater than zero). So 0 is not in the domain of that function as they define it. “Domain” is the term mathematicians use for the set of all possible inputs to a given function; the “range” of the function is the set of all possible output values from the function.
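You can see both behaviors in Python, which makes the distinction concrete:

```python
import math
import numpy as np

# Python's math module behaves like the calculator: 0 is outside the
# domain of log, so it raises an error instead of returning a value.
try:
    math.log(0)
except ValueError as e:
    print(e)            # "math domain error"

# NumPy follows IEEE 754 instead and returns a special value:
with np.errstate(divide="ignore"):
    print(np.log(0.0))  # -inf (NaN only shows up later, e.g. from 0 * -inf)
```

So a “domain error” is the calculator refusing the input, while -\infty and NaN are what IEEE 754 arithmetic produces when you go ahead and compute anyway.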