Trouble understanding the UNet loss Keras


The lab says we use sparse categorical cross entropy, which briefly is just cross entropy adapted for integer class labels. Now, since computing the final prediction is non-trivial (an argmax across channels), how does Keras calculate the loss without knowing how the prediction is made? The ground truth y has shape (None, 96, 128, 1) while the model outputs y_hat with shape (None, 96, 128, 23), and nowhere do we explicitly say how predictions are derived from those 23 channels. So how is Keras computing predictions, and in turn the loss? There seems to be a lot happening under the hood, which is clever, but I don’t quite understand it.

Thanks a lot for your help

Hi devashishd,

I am not sure if I understand your question correctly, but let me try to answer what I think I understand.

The loss is only calculated during training; it plays no role at prediction time. The key point is that the loss never needs an argmax at all: sparse categorical cross entropy takes the integer label for each pixel, uses it as an index into the 23 channels of y_hat, and computes -log of the (softmaxed) probability at that index. So the (None, 96, 128, 23) output and the (None, 96, 128, 1) labels line up directly, and parameters are adjusted to push the probability of the true class toward 1. Only during prediction, when the parameters are frozen, is an argmax across the 23 channels applied to turn y_hat into a single class map.
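To make that concrete, here is a minimal NumPy sketch of what sparse categorical cross entropy does per pixel. The shapes are made up for illustration (a toy 2x2 image with 3 classes standing in for the lab's 96x128 image with 23 classes), but the indexing logic is the same:

```python
import numpy as np

# Toy example: a 2x2 "image" with 3 classes (the lab uses 96x128 and 23).
rng = np.random.default_rng(0)
logits = rng.normal(size=(2, 2, 3))  # model output: one score per class per pixel
y_true = np.array([[0, 2], [1, 1]])  # integer labels, shape (2, 2) -- no argmax needed

# Softmax over the channel axis turns scores into per-pixel class probabilities.
exp = np.exp(logits - logits.max(axis=-1, keepdims=True))
probs = exp / exp.sum(axis=-1, keepdims=True)

# Sparse categorical cross entropy: for each pixel, use the integer label
# to index into the channel axis, then take -log of that probability.
picked = np.take_along_axis(probs, y_true[..., None], axis=-1)[..., 0]
loss = -np.log(picked).mean()
print(loss)  # a positive scalar; no prediction was ever computed

# argmax is only needed at prediction time, to collapse the channels
# into a single class map.
pred = probs.argmax(axis=-1)  # shape (2, 2)
```

In Keras, tf.keras.losses.SparseCategoricalCrossentropy (with from_logits=True if the model outputs raw scores rather than softmax probabilities) does exactly this indexing internally, which is why no explicit prediction rule has to be specified for training.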

As I indicated, I am not sure whether this answers the question you asked. Please let me know if, and where, I have understood it wrongly.