In the rock-paper-scissors example, I’m just curious why the output is always like [1,0,0], [0,1,0]… Shouldn’t the output from the softmax layer be a list of probabilities for the three different classes? I just can’t believe the model can have 100% confidence in the prediction of one class.
Yes, the softmax layer outputs probabilities. The 0’s and 1’s come from a later step that turns those probabilities into a prediction: the highest output is selected as the "True" case, and all the others are set to False.
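Here is a rough sketch of those two steps in plain NumPy, in case it helps. The logits are made-up numbers, and `softmax` and `to_one_hot` are just illustrative helpers I wrote for this post, not functions from the example's actual code.

```python
import numpy as np

def softmax(logits):
    # Subtract the max for numerical stability; the result is unchanged.
    shifted = logits - np.max(logits)
    exps = np.exp(shifted)
    return exps / np.sum(exps)

def to_one_hot(probs):
    # Put a 1 at the index of the largest probability and 0 everywhere else.
    one_hot = np.zeros_like(probs, dtype=int)
    one_hot[np.argmax(probs)] = 1
    return one_hot

# Hypothetical raw scores for (rock, paper, scissors); not taken from the real model.
logits = np.array([2.0, 0.5, -1.0])

probs = softmax(logits)         # probabilities that sum to 1, none exactly 0 or 1
prediction = to_one_hot(probs)  # hard prediction made of 0's and 1's

print(probs)
print(prediction)
```

The first print shows three probabilities strictly between 0 and 1, and the second shows the [1,0,0]-style vector, so the 100% look comes from the selection step rather than from softmax.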
Is this operation done by the softmax layer itself automatically?
It depends on the exact model and parameters…