First, the check should be len(np.unique(training_labels)) == np.max(training_labels) + 1. I have fixed this in my previous reply. Sorry about the typo.
As you noticed, the labels start at 0. Since the maximum label is 24, there are 25 categories.
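Here is a minimal sketch of that check. The training_labels array below is made-up data standing in for your real labels, and num_classes and labels_are_contiguous are just names I picked for illustration.

```python
import numpy as np

# Hypothetical stand-in for the real training_labels: integers from 0 to 24.
training_labels = np.array([0, 3, 24, 7, 3, 12] + list(range(25)))

# Labels start at 0, so the number of categories is the maximum label plus one.
num_classes = int(np.max(training_labels)) + 1

# True only if every integer from 0 to the maximum appears at least once.
labels_are_contiguous = len(np.unique(training_labels)) == num_classes

print(num_classes, labels_are_contiguous)  # 25 True
```

If the check comes out False, some label between 0 and the maximum never appears in the training set, and the count of unique labels would underestimate the number of output units you need.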
The number of units in the output dense layer should match the number of categories, which is 25 here. You don't need to one-hot encode the labels, since the sparse version of the loss takes integer-encoded true labels directly.
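To see why one-hot encoding is unnecessary, here is a rough NumPy sketch (not the library's actual implementation) comparing cross-entropy computed from integer labels with cross-entropy computed from one-hot labels. The logits are random numbers standing in for the output of a 25-unit layer, and the four labels are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
num_classes = 25

# Hypothetical raw outputs of a 25-unit dense layer for a batch of 4 samples.
logits = rng.normal(size=(4, num_classes))
# Hypothetical integer labels in the range 0..24.
labels = np.array([0, 5, 24, 13])

# Softmax over the class axis (shifted by the row max for numerical stability).
shifted = logits - logits.max(axis=1, keepdims=True)
probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)

# "Sparse" cross-entropy: pick the probability of the true class by index.
sparse_loss = -np.log(probs[np.arange(len(labels)), labels])

# Ordinary cross-entropy: one-hot encode first, then sum over classes.
one_hot = np.eye(num_classes)[labels]
dense_loss = -(one_hot * np.log(probs)).sum(axis=1)

losses_match = np.allclose(sparse_loss, dense_loss)
print(losses_match)  # True
```

Both routes give the same per-sample loss, so the one-hot step only adds memory and work. Assuming you are using Keras, this corresponds to pairing a Dense(25, activation="softmax") output layer with loss="sparse_categorical_crossentropy" and passing the integer labels as they are.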