Loss in semantic segmentation - C4 - Week 3

I still do not understand the loss function used for semantic segmentation with U-Net.

Sparse categorical cross-entropy receives a prediction volume of shape (None, height, width, num_classes). Does it do so because the loss function uses tf.argmax() to turn that volume into a single-channel label image? Is that correct?

The example uses only the accuracy metric.

But what if I want to check IoU, Dice, and so on?
I tried, but it complains that the output of the model has a different shape.

Any clues?

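Yes, the model output has that shape: it gives a softmax vector (or raw logits, if the loss is created with from_logits=True) at each pixel of the image. That’s the key point about image segmentation: the classification output is per pixel. The loss does not apply tf.argmax(), though. Sparse categorical cross-entropy takes the integer label at each pixel, picks out the predicted probability of that class, and averages the negative log of it over all pixels. So you use categorical cross-entropy as always with softmax output, but the difference is that it is computed per pixel, not once for the entire image. Metrics such as IoU and Dice, by contrast, compare label maps, so the prediction has to be reduced with argmax over the class axis before they are computed, which is the likely cause of the shape complaint.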
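To make the shapes concrete, here is a minimal NumPy sketch (not the course's code) of both pieces: a per-pixel sparse categorical cross-entropy that works on the full (batch, height, width, num_classes) volume, and IoU/Dice computed after collapsing the class axis with argmax. The function names `sparse_categorical_crossentropy` and `iou_and_dice` are made up for illustration, and the sketch assumes the model outputs raw logits and that classes missing from both label maps are skipped when averaging.

```python
import numpy as np

def sparse_categorical_crossentropy(y_true, logits):
    """Mean per-pixel cross-entropy.

    y_true: (batch, height, width) integer class ids
    logits: (batch, height, width, num_classes) unnormalized scores
    """
    # Numerically stable log-softmax over the class axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # Select the log-probability of the true class at each pixel.
    # Note that no argmax is involved: the label is used as an index.
    picked = np.take_along_axis(log_probs, y_true[..., None], axis=-1)[..., 0]
    return float(-picked.mean())

def iou_and_dice(y_true, logits, num_classes):
    """Mean IoU and mean Dice over the classes present in either map."""
    # Metrics compare label maps, so reduce the class axis first.
    y_pred = logits.argmax(axis=-1)  # (batch, height, width)
    ious, dices = [], []
    for c in range(num_classes):
        pred_c, true_c = (y_pred == c), (y_true == c)
        inter = np.logical_and(pred_c, true_c).sum()
        total = pred_c.sum() + true_c.sum()
        if total == 0:
            continue  # assumption: skip classes absent from both maps
        ious.append(inter / (total - inter))  # union = total - intersection
        dices.append(2 * inter / total)
    return float(np.mean(ious)), float(np.mean(dices))

# Tiny random example: 2 images of 4x4 pixels, 3 classes.
rng = np.random.default_rng(0)
labels = rng.integers(0, 3, size=(2, 4, 4))
logits = rng.normal(size=(2, 4, 4, 3))
print(sparse_categorical_crossentropy(labels, logits))
print(iou_and_dice(labels, logits, num_classes=3))
```

In Keras the same idea applies: a metric that expects class ids, such as `tf.keras.metrics.MeanIoU` with its default settings, needs the prediction passed through argmax over the last axis first, for example in a small subclass or custom metric function.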