DLS Course 4 - Week 3: U-Net Image Segmentation Assignment

Hi All!

I’m referring to subsection 2.2 of the Image Segmentation assignment, where in the definition of the process_path function there’s a line whose purpose I don’t understand:

mask = tf.math.reduce_max(mask, axis=-1, keepdims=True)

What exactly are we trying to achieve here? I am trying to get this U-Net implementation working with a different dataset (consisting of PNG images with 3 channels instead of the 4 in the assignment), but something goes wrong here.

Any help would be appreciated!



Look at the shape of mask before and after that statement. It turns out that the inputs here are PNG files and they have 4 channels: RGBA. But for the mask values, only one of the channels has a non-zero value. So that reduce_max just gives you one output channel with the actual mask value.
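Here is a small self-contained sketch of what that line does, using a made-up 2x2 "RGBA" mask (not the course data) where only one channel per pixel holds a non-zero class index:

```python
import tensorflow as tf

# Synthetic 2x2 mask with 4 channels; per pixel, at most one channel
# carries the class index and the others are zero.
mask = tf.constant(
    [[[0, 0, 1, 0], [0, 2, 0, 0]],
     [[3, 0, 0, 0], [0, 0, 0, 0]]], dtype=tf.uint8)

print(mask.shape)  # (2, 2, 4)

# Taking the max over the channel axis recovers the class index
# and leaves a single channel.
mask1 = tf.math.reduce_max(mask, axis=-1, keepdims=True)
print(mask1.shape)               # (2, 2, 1)
print(mask1.numpy().squeeze())   # [[1 2]
                                 #  [3 0]]
```

So the statement is just collapsing the 4-channel PNG into the one-channel label map the rest of the notebook expects.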

If your data is in some other format like JPEG, your mileage may vary. Even if your masks are prepared in advance with only one channel holding the mask value, that logic will do no harm, other than wasting some computation and creating some potential confusion. :nerd_face:

Oh, I see, thank you! So in case we’re working with a dataset where the masks are defined over 3 RGB channels, would it make sense to change the previous line to:

msk = tf.image.decode_png(msk, channels = 1)

in order to get one greyscale channel for the masks? Or do I have to adapt the architecture to the masks I’m using?

Yes, it looks like that would work, but you might want to try it and make sure it does the same thing as the reduce_max logic that was shown. Just to make sure their definition of greyscale doesn’t involve any other transformations. You want the mask values to be the “labels” for the types of objects in the image, right? So they should be predefined integer index values.

You can look through how the masks are handled in the rest of the code. It appears that they are handled as one channel images everywhere. Of course the point is you want to train your model to produce those masks as output.

I should say that I have no expertise or experience here beyond what they show us in this notebook. Note that training such a network is going to be computationally expensive.

Doing so seems to do the trick, but the performance is rather poor. I’ll look into what the problem is! Again, thank you!