As I understood from ANG’s talk in the Multiclass Classification section, there is a difference between Softmax Multiclass Classification and Classification with Multiple Outputs.
However, I do not understand why the Softmax probability output layer cannot be interpreted as multiple outputs.
For example, suppose you input an image containing both a cat and a dog. In this case the Softmax output vector will have high probabilities for both the Cat and Dog categories, right? If so, then we could interpret those two entries as multiple outputs.
Softmax will ultimately pick a single class, not the two that are present in the image, whereas Classification with Multiple Outputs reports, for every class, whether it is present in the image or not!
As far as I understood from the optional labs, Softmax outputs a probability vector normalized so that the sum of its components is equal to 1. At that point it is up to you to decide which classes are present in the input x. Right?
Yes, that’s right: the softmax outputs always sum to 1. Because of this constraint, raising the probability of one class necessarily lowers the others, so softmax suppresses the other classes in favor of one. Softmax is designed for classes that are mutually exclusive, not for several classes present in the same image; otherwise the classes “conflict” at the output stage and some of them have to be suppressed. And yes, the output is a vector of probabilities, and the prediction is then the arg max over them.
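To make the difference concrete, here is a minimal sketch in plain Python. The class names and logit values are made up for illustration (they are not from the course labs), and the 0.5 threshold for the sigmoid outputs is a common convention, assumed here. With equally strong evidence for cat and dog, softmax has to split the probability mass between them, while independent sigmoid outputs can both be high.

```python
import math

def softmax(logits):
    # Subtract the max logit for numerical stability; the result is unchanged.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical logits for an image that contains both a cat and a dog.
classes = ["cat", "dog", "bird"]
logits = [3.0, 3.0, -2.0]

# Softmax: one distribution over mutually exclusive classes (sums to 1).
softmax_probs = softmax(logits)        # roughly [0.498, 0.498, 0.003]
softmax_pick = classes[softmax_probs.index(max(softmax_probs))]

# Multiple outputs: one independent sigmoid per class (need not sum to 1).
sigmoid_probs = [sigmoid(z) for z in logits]   # roughly [0.953, 0.953, 0.119]
multi_label = [c for c, p in zip(classes, sigmoid_probs) if p >= 0.5]

print("softmax:", softmax_probs, "-> arg max:", softmax_pick)
print("sigmoid:", sigmoid_probs, "-> present:", multi_label)
```

Note that neither softmax probability reaches 0.5 even though both animals are clearly present, and the arg max can only return one of them. The per-class sigmoid outputs report both.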
Got it. Thank you for explaining this clearly.
I have played with the optional lab and now see how the Softmax function exaggerates the most likely output and suppresses all the other outputs.
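For anyone who wants to reproduce the effect outside the lab, here is a small sketch in plain Python. The scores are arbitrary values chosen for illustration, and the "linear" normalization is only a baseline for comparison, not something the course uses. Because softmax exponentiates the scores before normalizing, the largest score ends up with a bigger share than it would get from simply dividing by the sum.

```python
import math

def softmax(logits):
    # Subtract the max logit for numerical stability; the result is unchanged.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Arbitrary positive scores, chosen only for illustration.
scores = [4.0, 2.0, 1.0]

# Baseline: divide each score by the sum of the scores.
linear_share = [s / sum(scores) for s in scores]   # roughly [0.57, 0.29, 0.14]

# Softmax: exponentiate first, then normalize.
softmax_share = softmax(scores)                    # roughly [0.84, 0.11, 0.04]

print("linear :", linear_share)
print("softmax:", softmax_share)
```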