Week 3, Programming Assignment 2, section 2.2

Dear community,

I am currently completing the programming assignment of Week 3. In section 2.2 I see the following piece of code:

def process_path(image_path, mask_path):
    img = tf.io.read_file(image_path)
    img = tf.image.decode_png(img, channels=3)
    img = tf.image.convert_image_dtype(img, tf.float32)

    mask = tf.io.read_file(mask_path)
    mask = tf.image.decode_png(mask, channels=3)
    **mask = tf.math.reduce_max(mask, axis=-1, keepdims=True)**
    return img, mask

def preprocess(image, mask):
    input_image = tf.image.resize(image, (96, 128), method='nearest')
    input_mask = tf.image.resize(mask, (96, 128), method='nearest')

    return input_image, input_mask

image_ds = dataset.map(process_path)
processed_image_ds = image_ds.map(preprocess)

I understand everything about this code; the only thing I don’t get is why we are applying the reduce_max function on the last axis of the mask (the line in bold, marked with **). Am I missing something?

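The classes are represented in the first channel (index 0) of the mask, and that channel alone is sufficient for the mask to show up properly on the screen when using plt.imshow.

Since there are 23 classes in total for the assignment, that first channel can take on values in the range [0, 22]. When you apply reduce_max along the last axis with keepdims=True, the image mask goes from shape (480, 640, 3) (which can be used for display purposes) to shape (480, 640, 1), i.e. a single channel holding one integer class ID per pixel. As long as the other channels never hold a larger value than the first one, the maximum is just the class ID. That single-channel mask can be used as the class labels when using sparse categorical crossentropy as the loss function in training the U-Net model.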
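To see the effect on shapes, here is a small sketch on a made-up 2x2 mask rather than the assignment data. It assumes, purely for illustration, that the class ID sits in channel 0 and the other two channels are zero; the variable names are hypothetical.

```python
import tensorflow as tf

# Toy 2x2 mask with 3 channels. Assumption for illustration only:
# the class ID is in channel 0 and the remaining channels are zero.
mask = tf.constant(
    [[[0, 0, 0], [5, 0, 0]],
     [[22, 0, 0], [1, 0, 0]]],
    dtype=tf.uint8,
)

# Collapse the channel axis; keepdims=True keeps it as a size-1 axis.
labels = tf.math.reduce_max(mask, axis=-1, keepdims=True)

# Without keepdims the channel axis is dropped entirely.
labels_2d = tf.math.reduce_max(mask, axis=-1)

print(mask.shape)       # (2, 2, 3)
print(labels.shape)     # (2, 2, 1)
print(labels_2d.shape)  # (2, 2)
print(labels[..., 0].numpy().tolist())  # [[0, 5], [22, 1]]
```

Keeping the size-1 channel axis means the mask is still a rank-3 tensor, like the image, so the same tf.image.resize call in preprocess can be applied to both.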

Got it, thank you!