Doubt regarding the notation in Softmax Lab


The above image is from the Softmax Lab in Multiclass Classification. In the indicator function (the second equation in the image), what does the “n” represent? I am guessing it is a value in {1, ..., N}?

Also, in the third paragraph, the statement says that “only the line that corresponds to the target contributes to the loss, other lines are zero”. What does this mean? Is it in regard to the output probabilities being either 1 or 0?

image
Is the above equation, which is part of the last equation in the image above, the same as the common cross-entropy representation shown below?
image


Hello @tinted

Yes. You can also see it this way:

image

Look at the indicator function again:

image

For sample i, there is only one value for its label y^{(i)}, so even though the inner summation goes from j=1 to j=N, only one of the j's will pass the indicator function, and that is when j = y^{(i)}. Such use of the indicator function is consistent with the definition of the cross-entropy loss:

image

See, even though there are ten logs on the R.H.S., only one will actually be used.
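If it helps, here is a rough numpy sketch (my own illustration, not the lab's code, and with classes indexed from 0 to N-1 as numpy does rather than 1 to N) showing that the double sum with the indicator gives exactly the same number as picking out only the target class's negative log probability:

```python
import numpy as np

np.random.seed(0)
m, N = 4, 10                      # 4 samples, 10 classes
z = np.random.randn(m, N)         # raw logits z^{(i)}_j
y = np.array([3, 0, 7, 3])        # integer labels y^{(i)}, here in {0, ..., N-1}

# softmax: a_j = e^{z_j} / sum_k e^{z_k}
a = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)

# loss written with the explicit indicator: the inner sum visits every j,
# but 1{y^{(i)} == j} zeroes out every term except the target's
loss_indicator = 0.0
for i in range(m):
    for j in range(N):
        loss_indicator += (y[i] == j) * (-np.log(a[i, j]))
loss_indicator /= m

# equivalent: just pick the target column of each row
loss_pick = -np.log(a[np.arange(m), y]).mean()

print(loss_indicator, loss_pick)  # the two values match
```

Only one of the N log terms per sample survives the indicator, which is exactly what the paragraph you quoted means by “only the line that corresponds to the target contributes to the loss”.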

What do you think? What are the definitions of P(x) and Q(x)? How do you use them in this context? You may read this section of the article up to the Remarks.

Cheers,
Raymond

I am thinking that if we consider P(x) as the label “y” matrix (or, theoretically, it can be that indicator function as well), and then substitute the softmax function directly into the cost function (probably because of that `from_logits=True` parameter), then it should be similar to cross entropy.
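Writing out loosely what I mean, with P(j) as the one-hot indicator for the label of sample i and Q(j) as the softmax output (this is just my rough sketch of the substitution, not anything taken from the lab):

$$-\sum_{j=1}^{N} P(j)\log Q(j) = -\sum_{j=1}^{N} \mathbf{1}\{y^{(i)} == j\}\,\log \frac{e^{z^{(i)}_j}}{\sum_{k=1}^{N} e^{z^{(i)}_k}} = -\log \frac{e^{z^{(i)}_{y^{(i)}}}}{\sum_{k=1}^{N} e^{z^{(i)}_k}}$$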

I have two more doubts:

  1. How are you typing those notations?
    image
  2. I have seen some small negligible value being added to probability distributions so that calculating log(0) won't be a problem. Does this small value have anything to do with the round-off errors for which we use `from_logits=True`, or is it entirely because we are storing and using float values?
    image

Alright. You described some steps, but I am sure you can verify it yourself by actually writing down the steps on a piece of paper. Refer to that link if needed, since it is already a good example of how to argue and do the maths to go from the cross-entropy function to the loss for logistic regression. I will just leave that to you :wink:

For your first doubt: the notations are typed in LaTeX, e.g. `$y^{(i)}$`

See this for more examples
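For example, wrapped in dollar signs these render as the notations we have been using (the % comments are just my descriptions of each piece):

```latex
$y^{(i)}$                                  % superscript with parentheses
$\mathbf{1}\{y^{(i)} == n\}$               % the indicator function
$\frac{e^{z_j}}{\sum_{k=1}^{N} e^{z_k}}$   % one softmax term
```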

We know $0 \times \log(0) = 0$; however, a computer will complain or return the Not-a-Number token. To avoid that, we add a reasonably small number to lower-bound the argument of the log (e.g. `log(0. + 1e-10)`). Btw, why don't you just try `np.log(0.)` and `0. * np.log(0.)`?
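Here is the kind of quick check I mean, in plain numpy (with the warnings silenced so you can see the returned values):

```python
import numpy as np

with np.errstate(divide="ignore", invalid="ignore"):
    print(np.log(0.))        # -inf  (raises a divide-by-zero warning without errstate)
    print(0. * np.log(0.))   # nan   (0 * -inf is undefined in floating point)

# lower-bounding the argument keeps the log finite
eps = 1e-10                  # the small number added to the probability
print(np.log(0. + eps))      # roughly -23.03 instead of -inf
```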

Cheers,
Raymond
