Greetings!
What is the idea behind the highlighted line of code below?
I see that mask image also has 3 channels (RGB), but why do we use the max value of all the 3 original channels?
Thank you.
This line of code takes the maximum mask value among all 3 RGB channels, to obtain a mask (greyscale) with 1 channel.
I see.
But will this method fail for the following two pixels of the image mask?
[255, 0, 0] and [255, 1, 2] # [R,G,B]
As I understand it, these two pixels, which originally correspond to different classes, will be recognized as a single class. Right?
I don’t remember the assignment now, but at that code line, you will have a mask result of 255 for both!
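A quick way to check this is the minimal sketch below. It uses NumPy rather than the notebook's TensorFlow code, on the assumption that the line in question takes the maximum over the channel axis; `max(axis=-1, keepdims=True)` stands in for `tf.math.reduce_max` here. The two pixel values are the ones from the question.

```python
import numpy as np

# The two mask pixels from the question, as [R, G, B].
pixels = np.array([[255, 0, 0],
                   [255, 1, 2]], dtype=np.uint8)

# Maximum over the channel axis, keeping a trailing axis of size 1.
# Assumed to mirror tf.math.reduce_max(mask, axis=-1, keepdims=True).
reduced = pixels.max(axis=-1, keepdims=True)

print(reduced.ravel().tolist())  # [255, 255]
```

Both pixels collapse to the same value, so the reduction only preserves the classes if the label lives in a single channel.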
I did.
However I still haven’t got the idea…
img = tf.io.read_file(image_path) # Reads the image file as bytes
img = tf.image.decode_png(img, channels=3) # Decodes PNG to a tensor with 3 color channels (RGB)
img = tf.image.convert_image_dtype(img, tf.float32) # Converts to float32 and scales values to [0, 1]
–
Keep Learning AI with DeepLearning.AI - Girijesh
I recommend you try some experiments to see what that line of code does.
Hello, @AKazak,
Here is how I looked into that:
I added the code cell above right after the notebook’s 4th code cell which is sampling the unmasked and masked version of an image in the dataset.
First, if you check, you will find that each mask has 4 channels, and if you list the unique values for each channel, you will see that the 2nd to the 4th channels are not informative because each of them has the same value across all pixels.
Then look at the part of the code you shared in this thread. After your own investigation, you will find that the 4th channel is simply ignored, leaving only the first 3 to consider when working out what reduce_max is doing.
If, for all masks, the 2nd and the 3rd channels were always zeros, then taking the reduce_max would be no different from ignoring them, and this makes sense because it leaves only the first channel, which represents the class labels.
However, are the 2nd and the 3rd channels always zeros? That will be for you to confirm, because I am only demonstrating how I would look into that, so it is just a starting point.
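The kind of per-channel check described above could look like the sketch below. It uses NumPy, and the mask is a small hypothetical array made up for illustration, not data from the notebook; for a real check you would load an actual mask from the dataset instead.

```python
import numpy as np

# Hypothetical 2x2 mask with 4 channels (made up for illustration):
# channel 0 holds class labels, channels 1-2 are all zeros,
# channel 3 is a constant 255.
mask = np.zeros((2, 2, 4), dtype=np.uint8)
mask[..., 0] = [[0, 3], [7, 3]]
mask[..., 3] = 255

# Collect the unique values found in each channel.
unique_per_channel = {
    c: np.unique(mask[..., c]).tolist() for c in range(mask.shape[-1])
}

for c, values in unique_per_channel.items():
    print(f"channel {c}: {values}")
```

A channel that reports a single unique value carries no per-pixel information. Running the same loop over every mask in the dataset is what would confirm whether the 2nd and 3rd channels are always zeros.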
Keep learning, and cheers,
Raymond
Thank you for pushing this forward!
This is exactly what I noticed. So out of the 4 channels, only 1 contains valuable information. If this is the case, then what is the point of running reduce_max?
Taking just the 0th channel seems more logical in this case.
If the 0th channel always contains the labels, then we can, as you said, just index it. These are just alternatives for achieving the same goal ;), although I prefer indexing, too.
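To make the comparison concrete, here is a minimal NumPy sketch with a hypothetical mask (random labels in channel 0, zeros elsewhere; the label range is arbitrary). It checks that the two approaches agree under that assumption, and that they stop agreeing as soon as another channel holds a larger value.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 3-channel mask: labels in channel 0, zeros in channels 1 and 2.
mask = np.zeros((4, 4, 3), dtype=np.uint8)
mask[..., 0] = rng.integers(0, 10, size=(4, 4))

by_max = mask.max(axis=-1, keepdims=True)   # analogous to a reduce_max over channels
by_index = mask[..., :1]                    # just take the 0th channel
same = np.array_equal(by_max, by_index)

# Counter-case: one pixel gets a larger value in channel 1.
mask[0, 0] = [5, 9, 0]
differs = not np.array_equal(mask.max(axis=-1, keepdims=True), mask[..., :1])

print(same, differs)  # True True
```

So the two are interchangeable only while the other channels never exceed channel 0, which is one more argument for indexing when the labels are known to live in channel 0.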
Cheers,
Raymond
I think that, in the general case, one should prefer the simplest and most straightforward of the alternatives.
Yes, I agree with you!