Hi everybody,
I would like to understand how the softmax activation function maps its output values to the different possible categories.
In the lectures and labs we tackled the handwritten digits problem, where we tried to figure out which number an image shows. When we had only two digits (0 and 1), we used a sigmoid function and a threshold to distinguish between the two. Now that we have 10 digits, how does the algorithm know that each index of the output it returns holds the probability of the image being one particular digit out of these 10?
For example, suppose the digit is a “5”. How does the computation end up placing the probability of the image being a five in the output position that corresponds to 5? The same question applies to this week's “clustering” lab exercise, where we have 4 possible target outputs.
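To make the question concrete, here is a minimal sketch of the setup I have in mind. It is not the lab's code: the `logits` values are made up for illustration, and `softmax` is my own helper written with NumPy.

```python
import numpy as np

def softmax(z):
    # Subtract the max for numerical stability; the result is unchanged.
    e = np.exp(z - np.max(z))
    return e / e.sum()

# Hypothetical raw outputs (logits) of the 10 units in the last layer.
# These numbers are invented, not taken from a trained model.
logits = np.array([0.1, -1.2, 0.3, 0.0, -0.5, 4.0, 0.2, -0.8, 1.0, -0.3])

probs = softmax(logits)            # 10 non-negative values that sum to 1
predicted = int(np.argmax(probs))  # position of the largest probability

print(probs.round(3))
print(predicted)
```

Here the largest value sits at position 5 (counting from 0), so the prediction is read as the digit 5. What I don't understand is why that position, and not some other one, ends up standing for the digit 5.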
Maybe this is a rather philosophical question and the ordering is simply fixed from the beginning, but I’m curious about it.
Thank you!
Matías