The lab says that we use sparse categorical cross-entropy, which, briefly, is just cross-entropy for integer class labels. Since computing the final prediction is non-trivial (it requires an argmax across the channels), how does Keras calculate the loss without knowing how the prediction is made? The ground truth y has shape (None, 96, 128, 1) and our model outputs a y_hat of shape (None, 96, 128, 23), and nowhere do we explicitly say how predictions are derived from those 23 channels. So how does Keras compute the predictions and, in turn, the loss? A lot seems to be happening under the hood, which is smart, but I don’t quite understand it.
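To make the shapes concrete, here is a minimal NumPy sketch of what a sparse categorical cross-entropy computes on tensors like these. It is not Keras's actual implementation. It assumes the 23 output channels are raw logits (no softmax layer), uses random data in place of real images, and the function name and batch size of 2 are made up for illustration. The point it illustrates is that the loss never needs an argmax: the integer label is used as an index to pick out the log-probability of the true class at each pixel.

```python
import numpy as np

def sparse_categorical_crossentropy(y_true, logits):
    """Mean cross-entropy for integer labels.

    y_true: integer class ids, shape (..., 1)
    logits: raw scores, shape (..., C)  (assumption: no softmax applied yet)
    """
    # Numerically stable log-softmax over the channel axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # Use each label as an index into the 23 channels; no argmax involved.
    picked = np.take_along_axis(log_probs, y_true.astype(np.int64), axis=-1)
    return float(-picked.mean())

rng = np.random.default_rng(0)
batch = 2  # hypothetical value standing in for the None batch dimension
y_true = rng.integers(0, 23, size=(batch, 96, 128, 1))  # ground-truth mask
logits = rng.normal(size=(batch, 96, 128, 23))          # stand-in for y_hat

loss = sparse_categorical_crossentropy(y_true, logits)

# The argmax is only needed afterwards, to turn y_hat into a predicted mask.
pred_mask = logits.argmax(axis=-1)[..., np.newaxis]     # shape (batch, 96, 128, 1)
```

In this reading, the argmax is a separate step used only to display or evaluate a predicted mask, while the loss works directly on all 23 channel scores.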
Thanks a lot for your help.