In the programming assignment notebook “Image_segmentation_Unet_v2”, in the functions:
def process_path(image_path, mask_path):
    img = tf.io.read_file(image_path)
    img = tf.image.decode_png(img, channels=3)
    img = tf.image.convert_image_dtype(img, tf.float32)
    mask = tf.io.read_file(mask_path)
    mask = tf.image.decode_png(mask, channels=3)
    mask = tf.math.reduce_max(mask, axis=-1, keepdims=True)
    return img, mask

def preprocess(image, mask):
    input_image = tf.image.resize(image, (96, 128), method='nearest')
    input_mask = tf.image.resize(mask, (96, 128), method='nearest')
    input_image = input_image / 255.
    return input_image, input_mask
I think this line may be a bug:

    input_image = input_image / 255.

According to the documentation for tf.image.convert_image_dtype(…), when called with dtype=tf.float32 it already rescales integer pixel values to [0, 1], so the division by 255 in preprocess should not be needed again. I verified this by reading back the processed_image_ds dataset: np.max(img) ≈ 1/255 instead of the expected value of 1.0. Unless the U-Net is somehow meant to be trained with 1/255² normalization, I think this is unintended.
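The double scaling is easy to reproduce with a synthetic tensor (a sketch assuming TensorFlow 2.x; the array below is made up for illustration, not the notebook's actual data):

```python
import numpy as np
import tensorflow as tf

# A uint8 "image" covering the full dynamic range 0..255.
img = tf.constant(np.arange(256, dtype=np.uint8).reshape(16, 16, 1))

# convert_image_dtype already rescales uint8 [0, 255] -> float32 [0, 1].
img_f = tf.image.convert_image_dtype(img, tf.float32)
print(float(tf.reduce_max(img_f)))        # 1.0

# Dividing again reproduces the ~1/255 maximum observed in processed_image_ds.
img_double = img_f / 255.
print(float(tf.reduce_max(img_double)))   # ~0.0039
```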
In any case, I think the model may still train and perform well with this sort of rescaling. The only real problem is visualization: if someone tries to plt.imshow(…) an image coming straight out of the dataset, the pixel values of ~1/255 will render as an almost completely black image.
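If the extra division is indeed unintended, one possible fix is simply to drop it from preprocess, since process_path already produces float32 images in [0, 1]. This is my own sketch, not an official correction to the notebook:

```python
import tensorflow as tf

def preprocess(image, mask):
    # Resize only; no further scaling, because convert_image_dtype in
    # process_path already mapped the image to float32 values in [0, 1].
    input_image = tf.image.resize(image, (96, 128), method='nearest')
    input_mask = tf.image.resize(mask, (96, 128), method='nearest')
    return input_image, input_mask

# Quick sanity check on a synthetic all-ones "image" and mask.
img = tf.ones((200, 300, 3), dtype=tf.float32)
mask = tf.ones((200, 300, 1), dtype=tf.uint8)
out_img, out_mask = preprocess(img, mask)
print(out_img.shape)                     # (96, 128, 3)
print(float(tf.reduce_max(out_img)))     # 1.0 -- range preserved, so imshow works
```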