I am having trouble understanding the softmax cost function. I don't have much math background, so here is my understanding of it:
This is the specific loss function for each class.
Is the summation over n there to check all the classes, to see which one is the true class?
Does the summation over m add up all the probabilities so that they total 1?
Then we divide by m to get the average cost?
And what does the indicator function part in the middle of the formula do?
This → in the denominator makes the predictions for all classes sum to one for each sample. It is the normalization factor.
This → , as you know, is the indicator function that does the picking for . It equals 1 when its condition holds and 0 otherwise, so only the term for the true class survives. With this in mind, if you read a bit more about indicator functions, I believe you will see what it does.
This → gives you the picked loss value for example i according to .
With a picked loss value for each sample i, the summation over m just adds up the loss values of all samples, and dividing by m then gives the average cost.
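To make the steps above concrete, here is a minimal NumPy sketch. It assumes the formula in question is the usual softmax cross-entropy cost, with m samples, n classes, and integer class labels. The names `softmax`, `softmax_cost`, `scores`, and `labels` are made up for illustration and are not taken from the original formula.

```python
import numpy as np


def softmax(scores):
    """Turn raw class scores of shape (m, n) into probabilities per sample."""
    # Subtracting the row maximum does not change the result, it only
    # avoids overflow in exp().
    shifted = scores - scores.max(axis=1, keepdims=True)
    exp_scores = np.exp(shifted)
    # The denominator is the normalization factor: it makes the
    # predictions for all n classes sum to one for each sample.
    return exp_scores / exp_scores.sum(axis=1, keepdims=True)


def softmax_cost(scores, labels):
    """Average cost over m samples (assumed form of the cost function)."""
    m, n = scores.shape
    probs = softmax(scores)

    # Indicator function written out as a 0/1 matrix:
    # indicator[i, k] is 1 if class k is the true class of sample i, else 0.
    indicator = np.zeros((m, n))
    indicator[np.arange(m), labels] = 1.0

    # Sum over the n classes: every term is multiplied by 0 except the
    # true class, so this "picks" one loss value per sample.
    picked_loss = (indicator * -np.log(probs)).sum(axis=1)

    # Sum over the m samples, then divide by m to get the average cost.
    return picked_loss.sum() / m


# Hypothetical data: 2 samples, 3 classes.
scores = np.array([[2.0, 1.0, 0.1],
                   [0.5, 2.5, 0.3]])
labels = np.array([0, 1])  # true class index of each sample

probs = softmax(scores)
cost = softmax_cost(scores, labels)
print(probs.sum(axis=1))  # each row of probabilities sums to one
print(cost)
```

Note that the sum over classes is the one that does the picking, while the sum over samples only accumulates the picked values before averaging.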