(C4W3) decode_png does not work on masks

I am testing the mask preprocessing stages provided in the 2nd assignment of C4 Wk3 on my own PNG masks, namely

def process_path(image_path, mask_path):
    img = tf.io.read_file(image_path)
    img = tf.image.decode_png(img, channels=3)
    img = tf.image.convert_image_dtype(img, tf.float32)

    mask = tf.io.read_file(mask_path)
    mask = tf.image.decode_png(mask, channels=0)
    mask = tf.math.reduce_max(mask, axis=-1, keepdims=True)
    return img, mask

When applying this code to my own PNG mask images (grayscale, 7 classes, pixel values 0 to 6), the converted masks are 0 at all pixels. This happens regardless of the “channels” parameter (set to 0, 1, or 3), and whether or not I use the reduce_max function, with or without keepdims=True.

The data checks earlier in the assignment, namely

img = imageio.imread(image_list[0])
mask = imageio.imread(mask_list[0])

show a good “img” and corresponding “mask”. In other words, imread reads the mask files correctly, but tf.image.decode_png does not. I tried a workaround using imread, but I could not get it to work for the dataset.
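(For reference, the sort of workaround I was attempting wraps imageio in tf.py_function. This is only a sketch, not my exact code; one known catch is that py_function loses static shape information, which may be where I went wrong:)

import numpy as np
import tensorflow as tf
import imageio

def process_path_imageio(image_path, mask_path):
    # Read both files with imageio inside the tf.data pipeline via tf.py_function
    def _read(img_p, msk_p):
        img = imageio.imread(img_p.numpy().decode()).astype(np.float32) / 255.0  # 8-bit image
        msk = imageio.imread(msk_p.numpy().decode()).astype(np.uint8)  # labels 0..6 fit in uint8
        return img, msk
    img, mask = tf.py_function(_read, [image_path, mask_path], [tf.float32, tf.uint8])
    # py_function returns tensors with unknown shape; restore it by hand
    img.set_shape([1024, 1024])
    mask.set_shape([1024, 1024])
    return img, mask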
Please help, and thanks beforehand.

The indentation in the code you posted is not correct.

Have you tried printing some sample values from your decoded mask files both before and after the reduce_max calls?
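Something along these lines would do it (just a sketch; “path/to/mask.png” stands in for one of your files):

import numpy as np
import tensorflow as tf

# Decode one mask and inspect the raw values before any post-processing
raw = tf.io.read_file("path/to/mask.png")
mask = tf.image.decode_png(raw, channels=0)
print(mask.dtype, mask.shape)
print("unique values before reduce_max:", np.unique(mask.numpy()))
print("unique values after reduce_max:",
      np.unique(tf.math.reduce_max(mask, axis=-1, keepdims=True).numpy()))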

Notice that in the notebook code, the input files both have 4 channels (RGBA), of which the A channel is always 255 and needs to be excluded. If your mask files are encoded as grayscale, meaning that they should have only 1 channel, then using either channels=0 or channels=1 in the decode_png call should have worked. So now your job is to figure out why it didn’t. My suggestion is to examine the actual values as the next step. Are you sure your mask files are encoded the way you think?
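One quick way to check, if you have Pillow available (again just a sketch; the path is a stand-in):

from PIL import Image

# "mode" reveals how the PNG is encoded on disk: 'L' is 8-bit grayscale,
# while 16-bit grayscale shows up as 'I' or 'I;16' depending on the Pillow version
with Image.open("path/to/mask.png") as im:
    print(im.mode, im.size)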

If you have a pair of files that you are willing to share here (one image and one mask) maybe we can run some experiments and learn something. If you prefer not to share those in a public way, you could DM them to me as attachments.

Thanks TMosh, but the change in indentation is an error of transcription to this page. The code is read and processed correctly. The result is all zeros.

Thanks Paul. The reduce_max call has no effect on the values that emerge from “decode_png”, which are still 0.
If I use reduce_max and examine the output values with this logic

for image, mask in image_ds.take(1):
    # sample_image, sample_mask = image, mask
    print(mask.shape)

the printed shape is

(1024, 1024)

and evaluating

mask

gives this:

<tf.Tensor: shape=(1024, 1024), dtype=uint8, numpy=
array([[0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       ...,
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0]], dtype=uint8)>

If I do not apply reduce_max, the shape becomes (1024, 1024, 1) and evaluating

mask

gives this:

<tf.Tensor: shape=(1024, 1024, 1), dtype=uint8, numpy=
array([[[0],
        [0],
        [0],
        ...,
        [0],
        [0],
        [0]],

       ...,

       [[0],
        [0],
        [0],
        ...,
        [0],
        [0],
        [0]]], dtype=uint8)>

(Again, the indentation error occurs upon transcription to this page.)

Here are the image and mask. Both are .png in my storage.
The mask image looks black. It could be because the maximum pixel value is 6 (7 classes). Or perhaps the Discourse decoder is the same tf.io.decode_png and does the same “zeroing” operation on the mask file.
Also, Discourse converts the image file, 0012.png, to JPEG.


It is recommended that you use the little “</>” tool to format any code that you copy/paste into a reply, so that the formatting doesn’t get mangled by being treated as plain text.

If you are on a Windows system, you can right-click the mask image and open its Properties page. In the “Details” tab, you will find its bit depth to be 16. Therefore, to read it properly with TensorFlow, you need to set the correct dtype (decoding a 16-bit PNG with the default uint8 most likely keeps only the high-order byte, so small labels like 0 to 6 all collapse to 0):

[screenshot: the decode_png call with dtype set to tf.uint16]
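Spelled out in code, the fix looks like this (a sketch; mask_path stands in for one of your mask files):

import tensorflow as tf

mask = tf.io.read_file(mask_path)
# decode_png defaults to dtype=tf.uint8; a 16-bit PNG must be decoded as uint16,
# otherwise the small label values are lost
mask = tf.image.decode_png(mask, channels=0, dtype=tf.uint16)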

Cheers,
Raymond


Ok, I first took a look at your files using just straight matplotlib, with no TF funny business. Here’s my code:

# Experiment with Eduardo's image and mask
import imageio
import matplotlib.pyplot as plt

img = imageio.imread("./Eduardo/0012.png")
mask = imageio.imread("./Eduardo/0012Labeling.png")
print(f"type(img) {type(img)}")
print(f"img.shape {img.shape}")
print(f"type(mask) {type(mask)}")
print(f"mask.shape {mask.shape}")
fig, arr = plt.subplots(1, 2, figsize=(14, 10))
arr[0].imshow(img)
arr[0].set_title('Image')
arr[1].imshow(mask)
arr[1].set_title('Segmentation')

And here’s what I see when I run that:

type(img) <class 'imageio.core.util.Array'>
img.shape (1024, 1024)
type(mask) <class 'imageio.core.util.Array'>
mask.shape (1024, 1024)

So you can see that your files actually only have 2 dimensions. That’s probably what the problem is. They render correctly for me with the code shown above.

Here’s the code to fix that:

import numpy as np

img = np.expand_dims(img, 2)
mask = np.expand_dims(mask, 2)
print(f"type(img) {type(img)}")
print(f"img.shape {img.shape}")
print(f"type(mask) {type(mask)}")
print(f"mask.shape {mask.shape}")

Running that gives this:

type(img) <class 'imageio.core.util.Array'>
img.shape (1024, 1024, 1)
type(mask) <class 'imageio.core.util.Array'>
mask.shape (1024, 1024, 1)

Now the next step is to try the TF method with and without the expand_dims and see if that explains the funky behavior.

One other funky thing I just noticed:

The dtype on your image file is uint8, but on the mask it is uint16. Not sure if that will cause problems, but it’s something to check.

print(f"type(img) {type(img)}")
print(f"img.shape {img.shape}")
print(f"img.dtype {img.dtype}")
print(f"type(mask) {type(mask)}")
print(f"mask.shape {mask.shape}")
print(f"mask.dtype {mask.dtype}")

That gives this:

type(img) <class 'imageio.core.util.Array'>
img.shape (1024, 1024, 1)
img.dtype uint8
type(mask) <class 'imageio.core.util.Array'>
mask.shape (1024, 1024, 1)
mask.dtype uint16

Actually, all my activity above compensating for the 2D shape turns out not to be an issue with the TF version of things. I think it’s only the uint16 datatype on the mask, and Raymond’s recipe fixes it. If I try this, both images render correctly for me:

img = tf.io.read_file("./Eduardo/0012.png")
img = tf.image.decode_png(img, channels=0)
img = tf.image.convert_image_dtype(img, tf.float32)

mask = tf.io.read_file("./Eduardo/0012Labeling.png")
mask = tf.image.decode_png(mask, channels=0, dtype=tf.dtypes.uint16)

print(f"type(img) {type(img)}")
print(f"img.shape {img.shape}")
print(f"img.dtype {img.dtype}")
print(f"type(mask) {type(mask)}")
print(f"mask.shape {mask.shape}")
print(f"mask.dtype {mask.dtype}")

fig, arr = plt.subplots(1, 2, figsize=(14, 10))
arr[0].imshow(img.numpy())
arr[0].set_title('Image')
arr[1].imshow(mask.numpy())
arr[1].set_title('Segmentation')

I won’t show the images, but here’s the text output:

type(img) <class 'tensorflow.python.framework.ops.EagerTensor'>
img.shape (1024, 1024, 1)
img.dtype <dtype: 'float32'>
type(mask) <class 'tensorflow.python.framework.ops.EagerTensor'>
mask.shape (1024, 1024, 1)
mask.dtype <dtype: 'uint16'>

Raymond, I guess you solved the problem! But let me check if it works on my system…

Indeed, I changed the mask rank from 2 to 3 and back, and it didn’t make a difference. I will check whether setting the correct dtype works in my code and let you know. Many thanks to both mentors.

Setting the right dtype, as prescribed by Raymond and Paul, solved the problem. Many thanks to all. Next I will have somewhat more substantial questions on Semantic Segmentation.
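For anyone who finds this later, the working version of process_path looks roughly like this (a sketch; the cast back to uint8 is an addition so that downstream code expecting 8-bit labels still works):

def process_path(image_path, mask_path):
    img = tf.io.read_file(image_path)
    img = tf.image.decode_png(img, channels=3)
    img = tf.image.convert_image_dtype(img, tf.float32)

    mask = tf.io.read_file(mask_path)
    # The masks are 16-bit PNGs, so decode with the matching dtype
    mask = tf.image.decode_png(mask, channels=0, dtype=tf.uint16)
    # Labels run from 0 to 6, so they fit comfortably in uint8
    mask = tf.cast(mask, tf.uint8)
    mask = tf.math.reduce_max(mask, axis=-1, keepdims=True)
    return img, mask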

I know one shouldn’t add comments to a “Solved” topic, but this might be of help. The mask images that caused this trouble were generated by Labelkit, a plugin of ImageJ, and output as PNG files by ImageJ. The 16-bit depth was generated, without warning, by Labelkit, even though the images to segment were 8-bit.
Why the labeling utility would generate masks at a greater depth than the images, given that the labels are small integers, is beyond my comprehension.

Well, just because your pixels only have 256 possible values, that doesn’t mean that the number of types of objects represented in the picture necessarily has to be < 256. Just because they are greyscale images, that doesn’t limit the variety of what the picture can represent, right? Ansel Adams took pictures of quite a few different things in black and white. :laughing:

The point is that the tool has to support the arbitrary “general case”. But they could have built the tool to recognize that if the number of labels you actually use is representable in 8 bits, then they could have used uint8 as the output type in the generated file.
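If the 16-bit masks are a nuisance downstream, one option is to rewrite them as 8-bit files once, up front. A minimal sketch with Pillow and numpy (file names are stand-ins; it assumes every label actually fits in uint8):

import numpy as np
from PIL import Image

# Load the 16-bit mask, check that the labels fit in 8 bits, and re-save as 8-bit
mask = np.array(Image.open("0012Labeling.png"))
assert mask.max() < 256, "labels do not fit in uint8"
Image.fromarray(mask.astype(np.uint8)).save("0012Labeling_8bit.png")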
