Hello! My question is: does the Y here indicate the ground truth or the prediction by the model?

Dear Noor Jamali,

For a given input, we always calculate the loss/error between the model's prediction and the ground truth.

In week 3, the labels are already defined for the given data as red = 0 and blue = 1.

Here, we just want the classifier to label the regions as either red or blue.

Okay, got it, Ma'am. Thanks!

Sincerely

Noor Jamali

@Rashmi In this lecture training-a-softmax-classifier, can someone explain the intuition behind the loss function?

I’ve seen the loss function written as L = -(y*log(a) + (1-y)*log(1-a)).

Thanks

The formula you show is the cross entropy loss function for binary (yes/no) classification, in which you would have (of course) only two terms. The first term is the loss for the y = 1 case and the second is the loss for the y = 0 case.
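
To see how the label picks out one term, here is a minimal NumPy sketch of that formula (my own illustration, not code from the course; the function name is made up):

```python
import numpy as np

def binary_cross_entropy(y, a, eps=1e-12):
    """Binary cross entropy for a single example.

    y: ground-truth label (0 or 1)
    a: predicted probability of class 1 (the sigmoid output)
    """
    a = np.clip(a, eps, 1 - eps)  # avoid log(0)
    return -(y * np.log(a) + (1 - y) * np.log(1 - a))

# When y = 1, only the first term is active; when y = 0, only the second.
print(binary_cross_entropy(1, 0.9))  # small loss: confident and correct
print(binary_cross_entropy(1, 0.1))  # large loss: confident and wrong
```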

When you generalize to a multiclass problem with more than 2 possible output classes, we switch from sigmoid for the output activation to softmax, which gives us a probability distribution across the various output classes for any given sample. So the loss function is just the generalization of the loss function for the binary case: it is still the same cross entropy calculation, but only one term will be selected by the y label value for each sample. It’s exactly the same formula as the one for the binary case if you think about it in that way.
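
To make the “only one term is selected” point concrete, here is a small NumPy sketch (again my own, with made-up logits and helper names):

```python
import numpy as np

def softmax(z):
    z = z - np.max(z)       # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def cross_entropy(y_onehot, a):
    # Only the term where y_onehot == 1 contributes to the sum;
    # every other term is multiplied by 0, just as in the binary case.
    return -np.sum(y_onehot * np.log(a))

z = np.array([2.0, 1.0, 0.1])   # raw scores (logits) for 3 classes
a = softmax(z)                  # probability distribution over the classes
y = np.array([1, 0, 0])         # one-hot label: the true class is class 0

print(cross_entropy(y, a))      # equals -log(a[0])
```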

I forget how much detail Prof Ng goes into in the lectures about explaining that. You should scan through those lectures again. Or there’s a nice lecture on YouTube by Prof Geoff Hinton covering softmax and the cross entropy loss function.

Or if your question is more basic about why the logarithm is used there, it comes from “maximum likelihood estimation” in statistics. Here’s a thread from mentor Raymond that gives a nice intuitive explanation with examples. And here’s a thread that shows the graph of log between 0 and 1.
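
As a quick sketch of that maximum-likelihood connection (my own summary, matching the binary formula above):

```latex
% Likelihood of one label y under predicted probability a (Bernoulli):
P(y \mid a) = a^{y} (1 - a)^{1 - y}

% Taking the log (logs turn products into sums and are monotone,
% so maximizing the log-likelihood maximizes the likelihood):
\log P(y \mid a) = y \log a + (1 - y) \log (1 - a)

% Negating to get a quantity to minimize gives exactly the
% binary cross entropy loss:
L = -\bigl( y \log a + (1 - y) \log (1 - a) \bigr)
```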