How the softmax function associates output indices with categories

Hi everybody,

I would like to understand how the softmax activation function associates its output values with the different possible categories.

In the lectures and labs we tackled the handwritten digits problem, where we tried to figure out which number the image shows. When we had only two digits (0 and 1), we used a sigmoid function and a threshold to distinguish between the two possible numbers. Now that we have 10 numbers, how does the algorithm know that each index of the vector it returns corresponds to the probability that the image is one of these 10 possible numbers?

For example, suppose the digit is a “5”. How does the computation end up returning the probability that this number is a five at index 5 of the output? The same question applies to this week’s “clustering” lab exercise, where we have 4 possible target outputs.

Maybe that’s quite a philosophical question and the order is simply fixed from the beginning, but I’m curious about it.

Thank you!

It is controlled by the “labels” you give to your data. Every input sample has a “label” that identifies which “class” the sample is an instance of. If you have C output classes, the labels are the integer index values 0 to C - 1. In the handwritten digit case it’s pretty clear how you would want to assign those labels :grinning:, but you could be crazy and label “5” as index 1 and “6” as index 3. It’s up to you; the point is that you have to remember how you made the label assignments.
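Here is a minimal Python sketch of that bookkeeping, assuming you keep the assignment in a plain dictionary. The names `DIGIT_TO_LABEL`, `encode` and `decode` are hypothetical, and only “5” -> 1 and “6” -> 3 come from the example above; the rest of the “crazy” mapping is filled in arbitrarily. The model only ever sees the integers, and you use the inverse mapping to turn the index of the largest output back into a digit.

```python
# Hypothetical "crazy" label assignment: "5" -> 1 and "6" -> 3 as in the example,
# the other digits filled in arbitrarily. Any one-to-one mapping onto 0..9 works.
DIGIT_TO_LABEL = {"0": 0, "5": 1, "2": 2, "6": 3, "4": 4,
                  "1": 5, "3": 6, "7": 7, "8": 8, "9": 9}
# Inverse mapping, used to turn a predicted index back into a digit.
LABEL_TO_DIGIT = {label: digit for digit, label in DIGIT_TO_LABEL.items()}

def encode(digits):
    """Turn class names into the integer labels the model is trained on."""
    return [DIGIT_TO_LABEL[d] for d in digits]

def decode(probabilities):
    """Pick the index with the highest probability and map it back to a digit."""
    best_index = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return LABEL_TO_DIGIT[best_index]

print(encode(["5", "6", "0"]))  # [1, 3, 0]
# A made-up softmax output whose largest entry is at index 1.
fake_output = [0.02, 0.80, 0.03, 0.05, 0.01, 0.02, 0.03, 0.01, 0.02, 0.01]
print(decode(fake_output))      # 5
```

The only requirement is that the same dictionary is used when creating the training labels and when reading predictions.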

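For each sample, softmax outputs a vector of C probabilities (they sum to 1), indexed by the label values: element i is the model’s estimate of the probability that the sample belongs to the class with label i. Softmax itself knows nothing about digits; the association comes from training. The cross-entropy loss for a sample with label y is -log of element y of that vector, so minimizing it pushes the probability at index y up and, because the outputs sum to 1, the other entries down. After training, the index with the largest probability is the predicted label, which you map back to a class using the same assignment you used for the labels.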
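To see that the index/label link really comes from the loss, here is a small NumPy sketch of a linear softmax model (no hidden layers) trained by plain gradient descent. It is an illustration under stated assumptions, not the course’s implementation: the toy data are 4 hypothetical blobs, loosely in the spirit of the 4-class lab, and each blob is given a deliberately arbitrary label. The loss only reads the entry at each sample’s label index, and its gradient with respect to the logits is `probs - onehot`, which is what drives output k toward “label k” regardless of which blob that is.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax; subtracting the row max keeps exp() from overflowing."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def sparse_cross_entropy(probs, y):
    """Mean of -log(probs[i, y[i]]): only the entry at each label index is used."""
    n = probs.shape[0]
    return -np.mean(np.log(probs[np.arange(n), y]))

def train_softmax_regression(X, y, num_classes, lr=0.1, epochs=1000, seed=0):
    """Plain gradient descent on a linear softmax model."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.normal(scale=0.01, size=(d, num_classes))
    b = np.zeros(num_classes)
    onehot = np.eye(num_classes)[y]      # row i has a 1 at column y[i]
    for _ in range(epochs):
        probs = softmax(X @ W + b)
        grad_z = (probs - onehot) / n    # gradient of the mean loss w.r.t. the logits
        W -= lr * X.T @ grad_z
        b -= lr * grad_z.sum(axis=0)
    return W, b

# Hypothetical toy data: 4 well-separated blobs.
rng = np.random.default_rng(42)
centers = np.array([[-3.0, -3.0], [3.0, -3.0], [-3.0, 3.0], [3.0, 3.0]])
labels_per_blob = np.array([2, 0, 3, 1])  # arbitrary label assigned to each blob
X = np.vstack([c + 0.5 * rng.normal(size=(50, 2)) for c in centers])
y = np.repeat(labels_per_blob, 50)

W, b = train_softmax_regression(X, y, num_classes=4)
probs = softmax(X @ W + b)
print("loss:", sparse_cross_entropy(probs, y))
print("accuracy:", np.mean(probs.argmax(axis=1) == y))
```

Swapping the entries of `labels_per_blob` and retraining just moves each blob’s probability to a different output index, which is the “remember your assignment” point from the previous reply.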