In a multiclass network, the outputs of the last layer (the logits) are passed through the softmax activation function, and the “cross entropy” loss is then applied to compute the cost. If you have a dataset with 10 classes and you use a softmax output layer with, say, 13 units, it doesn’t really do much harm, at least in terms of the prediction accuracy of your model. You’ll have 3 labels that never occur: there are literally no samples that have those values as labels. Cross entropy is the negative log of the probability the network assigns to the true label, and softmax probabilities sum to 1, so any probability the network puts on one of those 3 values is taken away from the correct class and increases the cost. The gradient therefore pushes the outputs for those values down on every sample. So assuming that you’ve made good choices for all your other hyperparameters, the trained model you get should essentially never predict those three “extra” classes.
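If you want to see this for yourself, here is a minimal NumPy sketch rather than a real training setup. Everything in it is an assumption made for illustration: the synthetic data, the single linear layer, the learning rate, and the function name `train_extra_class_demo`. It trains a 13-unit softmax layer on labels that only ever take the values 0 to 9 and returns the final probabilities so you can inspect how much mass ends up on the 3 unused units.

```python
import numpy as np


def softmax(z):
    # Subtract the row-wise max for numerical stability before exponentiating.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def cross_entropy(probs, labels):
    # Mean negative log-probability assigned to the true label of each sample.
    n = labels.shape[0]
    return -np.log(probs[np.arange(n), labels]).mean()


def train_extra_class_demo(n_real=10, n_out=13, n_features=20,
                           n_samples=500, steps=300, lr=0.5, seed=0):
    """Hypothetical demo: softmax regression with more outputs than classes."""
    rng = np.random.default_rng(seed)

    # Synthetic data (an assumption): labels come from a random linear
    # "teacher" and only take values in 0..n_real-1.
    X = rng.normal(size=(n_samples, n_features))
    teacher = rng.normal(size=(n_features, n_real))
    y = (X @ teacher).argmax(axis=1)

    W = np.zeros((n_features, n_out))
    b = np.zeros(n_out)
    rows = np.arange(n_samples)

    for _ in range(steps):
        probs = softmax(X @ W + b)
        # Gradient of mean cross entropy w.r.t. the logits: probs - one_hot(y).
        # For the unused units the one-hot term is always 0, so their
        # gradient is just their probability and their logits get pushed down.
        grad = probs.copy()
        grad[rows, y] -= 1.0
        grad /= n_samples
        W -= lr * (X.T @ grad)
        b -= lr * grad.sum(axis=0)

    return softmax(X @ W + b), y


if __name__ == "__main__":
    probs, y = train_extra_class_demo()
    print("loss:", cross_entropy(probs, y))
    print("mean probability on the 3 unused units:", probs[:, 10:].sum(axis=1).mean())
    print("fraction of predictions on unused units:", (probs.argmax(axis=1) >= 10).mean())
```

At initialization each unit gets probability 1/13, so the unused units together start with 3/13 of the mass. The numbers printed after training let you compare against that starting point.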
So the extra units should do no harm to the accuracy of your model, but they also do you no good: they just waste memory and compute cycles. Your training will run slower with no benefit in return, so you should define your output layer with exactly as many units as you have classes.
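To put a rough number on the waste, here is a small sketch that counts the weights and multiply-accumulate operations in a fully connected output layer. The hidden width of 128 feeding the output layer is an assumption chosen only for illustration, and `dense_layer_cost` is a hypothetical helper, not a library function.

```python
def dense_layer_cost(n_in, n_out):
    """Return (parameter count, multiply-accumulates per sample) for a dense layer."""
    params = n_in * n_out + n_out  # weight matrix plus one bias per unit
    macs = n_in * n_out            # one multiply-accumulate per weight
    return params, macs


if __name__ == "__main__":
    hidden = 128  # assumed width of the layer feeding the output layer
    for n_out in (10, 13):
        params, macs = dense_layer_cost(hidden, n_out)
        print(f"{n_out} outputs: {params} parameters, {macs} MACs per sample")
```

Under these assumptions the 13-unit layer carries 30% more parameters and multiply-accumulates than the 10-unit one. The overhead is confined to the output layer, so its share of the whole network depends on how large the rest of your model is.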