Then why do we specify the number of channels as 3? Also, what exactly are we trying to do with the mask images in the preprocess_path function? That is, how does tf.math.reduce_max(mask, axis=-1, keepdims=True) help, and why take the maximum along the last axis?
We’re effectively dropping the alpha channel by decoding the PNG to just three channels, since we don’t need transparency information for what we’re doing.
While U-Net is fully convolutional and can take images of any size, we’d definitely want to consider performance. From my understanding, the assignment uses 96x128 for that reason.
I will get back to you on why we’re doing the max in a bit.
Now for the max: for the mask we just need one channel rather than all three. So what we’re doing here is taking the highest value across the RGB channels for each pixel (this is why the max is taken along the last axis). With keepdims=True the channel axis is kept with size 1, so the mask goes from shape (height, width, 3) to (height, width, 1).
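Here is a minimal sketch of that reduction. It uses NumPy's np.max as a stand-in for tf.math.reduce_max (same axis and keepdims arguments), and the tiny 2x2 mask is made up for illustration. It assumes each pixel carries its class ID in one channel, which may not match the actual dataset encoding.

```python
import numpy as np

# Hypothetical 2x2 RGB mask (values invented for illustration).
# Assumption: each pixel stores its class ID in a single channel.
mask = np.array(
    [[[0, 0, 0], [7, 0, 0]],
     [[0, 3, 0], [12, 0, 0]]],
    dtype=np.uint8,
)

# Take the max over the last (channel) axis.
# keepdims=True keeps that axis with size 1 instead of dropping it.
reduced = np.max(mask, axis=-1, keepdims=True)

print(mask.shape)       # shape before: (height, width, 3)
print(reduced.shape)    # shape after: (height, width, 1)
print(reduced[..., 0])  # one value per pixel
```

The result has one value per pixel, which is the shape a per-pixel class label needs.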
Do I understand correctly that for the mask images, we select the highest value regardless of whether it comes from R, G, or B? Why do we need to reduce the dimension from 3 channels to just 1? Does this create a risk of overlap, e.g. two classes having the same highest “R” value but different G and B values?
I had the same question. Unless I am missing something, there appears to be an implicit assumption about the RGB color values of the mask. For example, a purple mask and an orange mask might both have the same red value, and if red were the largest channel for both colors, then I think the max would map them to the same class.
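To make the concern concrete, here is a small sketch with made-up color values (not taken from the assignment's data). It shows that a per-pixel channel max would collapse two different colors onto the same label when their largest channel value matches, and that the problem goes away under the assumption that the class ID is stored in a single channel with the others set to zero.

```python
import numpy as np

# Hypothetical color-coded mask values, invented for illustration.
purple = np.array([200, 0, 150])
orange = np.array([200, 120, 0])

# The channel max is 200 for both, so the two colors become indistinguishable.
purple_label = int(purple.max())
orange_label = int(orange.max())
collide = purple_label == orange_label
print(purple_label, orange_label, collide)

# Assumed alternative encoding: class ID in one channel, zeros elsewhere.
# Here the max simply recovers the class ID, so distinct classes stay distinct.
class_a = np.array([3, 0, 0])
class_b = np.array([7, 0, 0])
print(int(class_a.max()), int(class_b.max()))
```

So whether the max is safe depends on how the masks were encoded, which is exactly the implicit assumption being asked about.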