I have one question about the output layer of a neural network. It is said that the number of neurons in the last layer should match the number of classes you are classifying, i.e., if you have 10 classes, the output layer should have 10 neurons.
In fact, I tried different numbers for it. If the number of neurons in the output layer is smaller than the number of classes, I get an error message, but if the number of neurons in the output layer is greater than the number of classes, I do not get any errors and it works.
Can anyone explain this to me?
The output of a multiclass network is fed into the softmax activation function, and then the “cross entropy” loss function is applied to compute the cost. If you have a dataset with 10 classes and you use a softmax output layer with, say, 13 neurons, it doesn’t really do much harm, at least in terms of the prediction accuracy of your model. You’ll have 3 output classes that never occur as labels: there are literally no samples that have those values as labels. That means if the network predicts one of those values for a particular sample, the cost function will punish that heavily, because it’s obviously a wrong answer. So assuming that you’ve made good choices for all your other hyperparameters, the trained model you get should never predict those three “extra” classes.
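To make this concrete, here is a minimal NumPy sketch of softmax followed by cross entropy with integer labels. It is not any particular framework's implementation, and the question doesn't say which framework was used, so the exact error you saw may look different. The function names, the random logits, and the choice of 8 outputs for the "too few" case are all assumptions made for illustration.

```python
import numpy as np

def softmax(logits):
    # Subtract the row-wise max for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def sparse_cross_entropy(logits, labels):
    # Pick the predicted probability of the true class for each sample.
    probs = softmax(logits)
    picked = probs[np.arange(len(labels)), labels]
    return -np.log(picked).mean()

rng = np.random.default_rng(0)
labels = np.array([0, 3, 9, 7])  # integer labels from a 10-class dataset

# 13 outputs: every label 0..9 has a matching column, so the loss is computed.
logits_13 = rng.normal(size=(4, 13))
loss_13 = sparse_cross_entropy(logits_13, labels)

# Gradient of the loss w.r.t. the logits is (softmax - one_hot).
# Columns 10..12 are never the true class, so their gradient is always
# positive and gradient descent keeps pushing those logits down.
grad = softmax(logits_13)
grad[np.arange(len(labels)), labels] -= 1.0
extra_grad_positive = bool((grad[:, 10:] > 0).all())

# 8 outputs: label 9 has no matching column, so the lookup fails.
try:
    sparse_cross_entropy(rng.normal(size=(4, 8)), labels)
    too_few_failed = False
except IndexError:
    too_few_failed = True
```

With too few outputs there is simply no output column for the largest labels, which is why you get an error. With too many, every real label still has a column, and the unused columns only ever receive a gradient that lowers their probability.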
So it should do no harm to the accuracy of your model, but it also does you no good and just wastes memory space and compute cycles. Your training will run slower and it has no other benefit, so it is recommended that you define your output layer correctly.
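As a rough illustration of the wasted memory, here is a sketch that counts the parameters of a fully connected output layer. The hidden width of 128 is a hypothetical value chosen only for this example; your network's last hidden layer may be a different size.

```python
def dense_param_count(n_inputs, n_outputs):
    # One weight per (input, output) pair plus one bias per output.
    return n_inputs * n_outputs + n_outputs

hidden_units = 128  # hypothetical width of the last hidden layer

params_10 = dense_param_count(hidden_units, 10)  # correctly sized output layer
params_13 = dense_param_count(hidden_units, 13)  # three unused output neurons
extra_params = params_13 - params_10

print(params_10, params_13, extra_params)  # 1290 1677 387
```

Each extra output neuron costs one weight per hidden unit plus a bias, and all of those parameters are trained only to keep the unused outputs near zero.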