Problems training on a different dataset. Week 3 Image Segmentation with U-Net Assignment

Hello!
Is there any problem if I try to train the U-Net Image Segmentation model introduced in the Week 3 assignment on a different dataset?

The dataset I’m trying to train my model with is “cityscapes_data”. It contains 3475 image pairs (RGB images and mask images), with 30 different classes defined. The major difference from the dataset used in the assignment (the CARLA dataset) is that its mask images do not have 4 channels; they only have 3 (RGB).

The only part of the code I have changed is the unet_model function (the number of classes is now n_classes = 30 instead of n_classes = 23), but I guess I will need to change many other parts of the code, such as the image preprocessing.

Does anyone know what I should change in order to run the same U-Net function defined in our notebook? Moreover, is there any way I could transform the cityscapes_data mask images into the same format (4 channels) as the CARLA mask images?
I would appreciate any help.
Thanks a lot!
Sara.

I took a look at the CARLA mask files and it looks to me like they are just PNG images with three channels, where the label value is in the first channel and the other two are zero. If you look at the logic in the U-Net notebook, it just does reduce_max across the channels dimension, so you end up with 480 x 640 x 1, and then it downsizes everything to 96 x 128 x 1. Because there are 23 distinct labels, all the actual pixel values in the masks are between 0 and 22. Are your masks formatted similarly, but with 30 possible labels? If so, then it sounds like changing n_classes should be enough to get things to work.
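For concreteness, here is a minimal sketch of that preprocessing, assuming a CARLA-style mask where the label lives in the first channel and the other channels are zero (the 96 x 128 target size is the one the notebook uses; this is my reconstruction, not the notebook's exact code):

```python
import numpy as np
import tensorflow as tf

def preprocess_mask(png_bytes, size=(96, 128)):
    """Decode a CARLA-style mask PNG and collapse it to one label channel.

    reduce_max across channels works because only the first channel
    carries the label and the others are zero.
    """
    mask = tf.image.decode_png(png_bytes, channels=3)        # (H, W, 3) uint8
    mask = tf.math.reduce_max(mask, axis=-1, keepdims=True)  # (H, W, 1)
    # "nearest" keeps label values integral instead of blending them.
    mask = tf.image.resize(mask, size, method="nearest")     # (96, 128, 1)
    return mask

# Synthetic example: a 480 x 640 mask with label 13 in channel 0, zeros elsewhere.
raw = np.zeros((480, 640, 3), dtype=np.uint8)
raw[..., 0] = 13
png = tf.io.encode_png(tf.constant(raw))
out = preprocess_mask(png)
print(out.shape)  # (96, 128, 1)
```

Note that tf.image.resize returns float32, so the labels come back as 13.0 here; you may want to cast back to an integer dtype before using them as sparse targets.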

Hello.
Let me show you an example of the dataset I am trying to use. It has 3475 images (RGB images with their respective mask images). Here is an example:
[example RGB image and its corresponding mask image]

On the one hand, the CARLA RGB files are tensors of shape=(480, 640, 3) and dtype=uint8, and my dataset’s images have the same characteristics. However, the CARLA mask files are black instead of coloured. I guess that is why, once you decode those images, the resulting tensor has shape=(480, 640, 4) and dtype=uint8, as you can see in the next image.

I don’t really know how these images are treated or what their encoding is. Is there any way I can transform my images to get the same characteristics as the images in the CARLA dataset?
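In case it helps to illustrate what I mean, something like this is what I imagine would be needed to map a colour-coded mask to class indices (the palette entries below are just placeholders, not the real Cityscapes class colours):

```python
import numpy as np

# Hypothetical colour -> class table; the real Cityscapes palette
# would need to be filled in here.
PALETTE = {
    (128, 64, 128): 0,   # e.g. road
    (244, 35, 232): 1,   # e.g. sidewalk
    (70, 70, 70): 2,     # e.g. building
}

def rgb_mask_to_labels(mask_rgb, palette=PALETTE, default=255):
    """Map an (H, W, 3) uint8 colour-coded mask to an (H, W, 1) label mask.

    Pixels whose colour is not in the palette get the `default` label.
    """
    h, w, _ = mask_rgb.shape
    labels = np.full((h, w), default, dtype=np.uint8)
    for color, cls in palette.items():
        match = np.all(mask_rgb == np.array(color, dtype=np.uint8), axis=-1)
        labels[match] = cls
    return labels[..., None]
```

Once the masks are single-channel label maps like this, they would be in the same logical format the notebook produces after its reduce_max step.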

Moreover, after just changing the parameter n_classes to 30, I receive the following error:

Thanks for your help and interest.
Sara.

I think you are somehow misinterpreting the CARLA mask files. Why did you specify channels = 4 in the “image decode” there? The logic they gave us in the notebook uses three channels. Here is some logic I wrote, based on the code in the notebook, to probe what is in the mask files:

for image, mask in dataset.take(1):
    print(image)
    print(mask)
    img = tf.io.read_file(image)
    print(f"type(img) {type(img)}")
    img = tf.image.decode_png(img, channels=3)
    print(f"img.shape {img.shape}")
    img = tf.image.convert_image_dtype(img, tf.float32)

    mask = tf.io.read_file(mask)
    print(f"type(mask) {type(mask)}")
    mask = tf.image.decode_png(mask, channels=3)
    print(f"mask.shape {mask.shape}")
    print(f"{mask[200,200:204,:]}")
    mask = tf.math.reduce_max(mask, axis=-1, keepdims=True)
    print(f"mask.shape {mask.shape}")
    print(f"{mask[200,200:204,:]}")
    maskmin = tf.math.reduce_min(mask, axis=None, keepdims=False)
    maskmax = tf.math.reduce_max(mask, axis=None, keepdims=False)
    print(f"maskmin {maskmin} maskmax {maskmax}")

Give that a try and see what you get. My guess is that the “image decode” synthesized the 255 values in the last channel. (Note that decode_png only accepts channels = 0, 1, 3, or 4, so you can’t ask for a fifth channel to probe any further.)

I haven’t looked at your second set of evidence yet. I think the first step is to get the data interpreted correctly. It’s all hopeless until we get that straightened out. :scream_cat:

No, sorry, I’m wrong about that: there do seem to be 4 channels in both the image and mask files of the CARLA dataset, but the last channel always seems to have the value 255 in both. I don’t know what that represents, but the logic they give us in the notebook does ignore the 4th channel of both the images and the masks. I suggest you do the same, until we figure out what that 4th channel is supposed to mean.

Ok, it looks like this is a “PNG thing”: the PNG format supports several different pixel layouts, and one of the common ones is RGBA with 8 bits per channel. The last channel is “alpha”, which gives the transparency value. In this case they appear not to actually use it and just leave the alpha values at 255 (fully opaque).

So I believe the correct thing to do is just to ignore the Alpha value, which (as I observed in my earlier reply) is what the code in the notebook does.
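For what it’s worth, decode_png will do that stripping for you if you ask for 3 channels. Here’s a tiny self-contained check with a synthetic RGBA image (not one of the CARLA files):

```python
import numpy as np
import tensorflow as tf

# A synthetic 4 x 4 RGBA image with a fully opaque alpha plane,
# mimicking what the CARLA PNGs appear to contain.
rgba = np.zeros((4, 4, 4), dtype=np.uint8)
rgba[..., 3] = 255
png = tf.io.encode_png(tf.constant(rgba))

# Requesting 3 channels makes decode_png drop the alpha plane.
rgb = tf.image.decode_png(png, channels=3)
print(rgb.shape)  # (4, 4, 3)
```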

Just for reference, I found that information on the Wikipedia page about PNG format.

I added some more logic to my code shown above to check that all the alpha values really are 255, at least in that one image. It also counts the number of pixels with label == 13 in the mask, which turns out to be about 1/3 of the total pixels, FWIW:

for image, mask in dataset.take(1):
    print(image)
    print(mask)
    img = tf.io.read_file(image)
    print(f"type(img) {type(img)}")
    img = tf.image.decode_png(img, channels=4)
    print(f"img.shape {img.shape}")
    print(f"{img[200,200:204,:]}")
    print(f"alpha !255 = {tf.math.reduce_sum(tf.cast(img[:,:,3] != 255, tf.float32))}")
    img = tf.image.convert_image_dtype(img, tf.float32)

    mask = tf.io.read_file(mask)
    print(f"type(mask) {type(mask)}")
    mask = tf.image.decode_png(mask, channels=4)
    print(f"mask.shape {mask.shape}")
    print(f"{mask[200,200:204,:]}")
    print(f"alpha !255 = {tf.math.reduce_sum(tf.cast(mask[:,:,3] != 255, tf.float32))}")
    # Remove alpha channel
    mask = mask[:,:,0:3]
    mask = tf.math.reduce_max(mask, axis=-1, keepdims=True)
    print(f"mask.shape {mask.shape}")
    print(f"{mask[200,200:204,:]}")
    print(f"mask == 13 = {tf.math.reduce_sum(tf.cast(mask[:,:,0] == 13, tf.float32))}")
    maskmin = tf.math.reduce_min(mask, axis=None, keepdims=False)
    maskmax = tf.math.reduce_max(mask, axis=None, keepdims=False)
    print(f"maskmin {maskmin} maskmax {maskmax}")