I am having trouble understanding the softmax cost function. I don't have much math background, so here is my understanding of it:
This is the specific loss function for each class.

Is the summation over n there to go through all the classes, to see which one is the true class?

Is the summation over m there to make all the probabilities add up to 1?

Then do we divide by m to get the average cost?

And what does the indicator function in the middle of the formula do?

This → in the denominator makes the predictions over all classes sum to one for each sample. It is the normalization factor.
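In case it helps to see the normalization in numbers, here is a minimal Python sketch. It assumes the model produces one raw score per class for a sample; the function name `softmax` and the example scores are made up for illustration and are not taken from the original formula.

```python
import math

def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    # Subtracting the largest score avoids overflow and does not change the result.
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)  # the denominator: the normalization factor
    return [e / total for e in exps]

# Hypothetical scores for one sample with three classes.
probs = softmax([2.0, 1.0, 0.1])
print(probs)       # one probability per class
print(sum(probs))  # adds up to 1, up to floating-point rounding
```

Note that the probabilities add up to 1 across the classes of a single sample, not across the samples.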

This → , as you know, is the indicator function, which does the picking for . With this in mind, if you read a bit more about indicator functions, I believe you will see what it does.

This → gives you the picked loss value for sample i according to .
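As a rough illustration of the picking, here is a small Python sketch. It assumes the per-class loss is the negative log of the predicted probability and that classes are numbered from 0; the names `indicator` and `sample_loss` and the probabilities are hypothetical.

```python
import math

def indicator(condition):
    """Return 1 if the condition is true, otherwise 0."""
    return 1 if condition else 0

def sample_loss(probs, true_class):
    """Sum over all classes; only the true class contributes a nonzero term."""
    return sum(
        indicator(true_class == j) * -math.log(p)
        for j, p in enumerate(probs)
    )

# Hypothetical predicted probabilities for one sample; the true class is 1.
probs = [0.7, 0.2, 0.1]
loss = sample_loss(probs, true_class=1)
print(loss)  # equals -log(0.2); the terms for classes 0 and 2 are multiplied by 0
```

So the sum over the classes does not really add anything up: the indicator zeroes out every term except the one for the true class.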

With a loss value picked for each sample i, the remaining summation just adds these loss values up over all samples; dividing by m then gives the average cost.
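Putting the pieces together, here is a minimal sketch of the whole cost in plain Python. It assumes the cost has the common form $J = -\frac{1}{m}\sum_{i=1}^{m}\sum_{j=1}^{N} 1\{y^{(i)} = j\}\,\log\frac{e^{z_j^{(i)}}}{\sum_{k=1}^{N} e^{z_k^{(i)}}}$, which may differ in notation from the formula in the question. The names `softmax_cost`, `all_scores` and `labels`, the 0-based class numbering, and the example data are all assumptions for illustration.

```python
import math

def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    shift = max(scores)  # for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_cost(all_scores, labels):
    """Average negative log-probability of the true class over m samples."""
    m = len(all_scores)
    total_loss = 0.0
    for scores, y in zip(all_scores, labels):  # sum over the m samples
        probs = softmax(scores)
        for j, p in enumerate(probs):          # sum over the N classes
            if y == j:                         # indicator: keep only the true class
                total_loss += -math.log(p)
    return total_loss / m                      # divide by m to get the average

# Hypothetical data: two samples, three classes, true classes 0 and 1.
scores = [[2.0, 1.0, 0.1], [0.5, 2.5, 0.3]]
labels = [0, 1]
cost = softmax_cost(scores, labels)
print(cost)
```

The inner loop mirrors the formula literally. In practice you would usually index the true class directly instead of looping over all classes, which gives the same value.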