The above image is from the Softmax Lab in Multiclass Classification. In the indicator function (the second equation in the image), what does the “n” represent? I am guessing it is a value in {1, …, N}?

Also, in the third paragraph, the statement says that “only the line that corresponds to the target contributes to the loss; the other lines are zero.” What does this mean? Is this in regard to the output probabilities being either 1 or 0?

Is the above equation, which is part of the last equation in the image above, the same as the common cross-entropy representation shown below?

For sample i, there is only one value for its label y^{(i)}, so even though the inner summation goes from j=1 to j=N, only one of the j's passes the indicator function, namely when j = y^{(i)}. Such use of the indicator function is consistent with the definition of the cross-entropy loss.
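A quick NumPy sketch of this point (the probabilities and labels here are made up for illustration): summing over all j with the indicator gives exactly the same loss as picking out the single log at j = y^{(i)}.

```python
import numpy as np

# Hypothetical example: 3 samples, N = 4 classes.
probs = np.array([
    [0.7, 0.1, 0.1, 0.1],
    [0.2, 0.5, 0.2, 0.1],
    [0.1, 0.1, 0.1, 0.7],
])
labels = np.array([0, 1, 3])  # y^(i) for each sample

# Cross entropy via the indicator function: the sum runs over all j,
# but the indicator 1{j == y^(i)} zeroes out every term except j = y^(i).
indicator = np.eye(probs.shape[1])[labels]           # one-hot rows
loss_indicator = -np.sum(indicator * np.log(probs), axis=1)

# Equivalent: directly pick the single log that "passes" the indicator.
loss_direct = -np.log(probs[np.arange(len(labels)), labels])

print(np.allclose(loss_indicator, loss_direct))  # True
```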

See, even though there are ten logs on the R.H.S., only one is actually used.

I am thinking that if we consider P(x) as the label “y” matrix (or, theoretically, as that indicator function) and then substitute the softmax function directly into the cost function (probably what the “from_logits = True” parameter does), then it should be the same as cross entropy.
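If I understand the idea, it can be sketched in NumPy like this (the logits here are made up; the `log_softmax` helper is my own illustration of the fused computation that `from_logits=True` enables, not the library's actual code):

```python
import numpy as np

def log_softmax(z):
    # Stable log-softmax: computes log(softmax(z)) in one step,
    # without forming the probabilities first.
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

logits = np.array([[2.0, 1.0, 0.1]])
label = 0

# Two-step: softmax first, then log. Can lose precision for extreme logits.
probs = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)
loss_two_step = -np.log(probs[0, label])

# One-step, fused form (softmax substituted into the cost function).
loss_fused = -log_softmax(logits)[0, label]

print(np.isclose(loss_two_step, loss_fused))  # True
```

For moderate logits the two agree; the fused form just avoids computing a probability that could underflow to 0 before the log is taken.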

I have two more doubts:

How are you typing those notations?

I have seen some small negligible value being added to probability distributions so that calculating log(0) won't be a problem. Does this small value have anything to do with the round-off errors for which we use “from_logits=True”, or is it entirely because we are storing and using float values?

Alright. You described some steps, but I am sure you can verify it yourself by actually writing down the steps on a piece of paper. Refer to that link if needed, since it's already a good example of how to argue and do the maths to convert the entropy function into the cross-entropy for logistic regression. I will just leave that to you.

We know 0 \times \log(0) = 0 by convention; however, a computer will complain or return a Not-A-Number (NaN) value. To avoid that, we add a reasonably small number (e.g. log(0. + 1e-10)) to lower-bound it. By the way, why don't you just try np.log(0.) and 0.*np.log(0.)?
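For the record, here is what those two expressions actually do in NumPy (the `errstate` context just silences the warnings so the values are visible):

```python
import numpy as np

with np.errstate(divide="ignore", invalid="ignore"):
    print(np.log(0.))        # -inf (NumPy also emits a divide-by-zero warning)
    print(0. * np.log(0.))   # nan: 0 * -inf is undefined in IEEE 754 arithmetic

# Adding a tiny epsilon keeps the log finite:
eps = 1e-10
print(np.log(0. + eps))      # about -23.03, a large but finite penalty
```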