Hi All,
This is my understanding: the last layer of the U-Net has the same (height x width) dimensions as the input, and its number of channels equals the number of classes we would like to label in the picture, in this case 23. Hence the output of the U-Net is (96, 128, 23), and the value in each channel represents the probability that a pixel belongs to a certain class.
But shouldn’t y_train (the mask) also have the same dimensions? In our assignment, the dimension of y_train (the mask) is only (96, 128, 1), and the third dimension holds the final label that a pixel belongs to, not a probability. How, then, does the model calculate the loss function?
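
To make the question concrete, here is a minimal sketch of one way a loss could be computed directly from integer labels. It assumes the loss is a "sparse" categorical cross-entropy, which I have not confirmed for our assignment. It uses a toy 2 x 2 image with 3 classes instead of (96, 128, 23), and every name in it (`softmax`, `sparse_loss`, `one_hot_loss`, `logits`, `mask`) is made up for illustration.

```python
import math

def softmax(scores):
    """Turn one pixel's raw class scores into probabilities that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sparse_loss(logits, mask):
    """Mean cross-entropy when the mask stores one integer label per pixel."""
    losses = []
    for row_scores, row_labels in zip(logits, mask):
        for scores, label in zip(row_scores, row_labels):
            probs = softmax(scores)
            # The integer label is used as an index into the class axis.
            losses.append(-math.log(probs[label[0]]))
    return sum(losses) / len(losses)

def one_hot_loss(logits, mask, num_classes):
    """Same loss, but with the mask first expanded to one-hot vectors."""
    losses = []
    for row_scores, row_labels in zip(logits, mask):
        for scores, label in zip(row_scores, row_labels):
            probs = softmax(scores)
            target = [1.0 if c == label[0] else 0.0 for c in range(num_classes)]
            losses.append(-sum(t * math.log(p) for t, p in zip(target, probs)))
    return sum(losses) / len(losses)

# Hypothetical model output with shape (2, 2, 3): height, width, classes.
logits = [[[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]],
          [[-0.5, 1.5, 0.0], [1.0, 1.0, 1.0]]]

# Hypothetical mask with shape (2, 2, 1): one integer class label per pixel.
mask = [[[0], [2]],
        [[1], [0]]]

loss_from_labels = sparse_loss(logits, mask)
loss_from_one_hot = one_hot_loss(logits, mask, num_classes=3)
```

If this assumption is right, the two values agree, because a one-hot target zeroes out every term except the one for the true class. In that case the mask would not need a 23-channel shape, since the label only selects which of the 23 predicted probabilities enters the loss for each pixel.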