Image_segmentation_Unet_v2 mask dimension 0 meaning

In section 2, Load and Split Data, the mask is read from mask_list.
It has a shape of (480, 640, 4).

What exactly do these dimensions mean? The first two are presumably the image dimensions, but what does the third dimension mean?
Specifically, further down, when we display the original image alongside the segmentation on line 9 of the cell, the last dimension is set to 0: arr[1].imshow(mask[:, :, 0])

What does 0 mean? Is this the class for the pixel or the probability of object presence? What are the other 3 indices along that dimension?

import imageio
import matplotlib.pyplot as plt

N = 2
# image_list and mask_list are the file-path lists defined earlier in the notebook
img = imageio.imread(image_list[N])
mask = imageio.imread(mask_list[N])
#mask = np.array([max(mask[i, j]) for i in range(mask.shape[0]) for j in range(mask.shape[1])]).reshape(img.shape[0], img.shape[1])
fig, arr = plt.subplots(1, 2, figsize=(14, 10))
arr[0].imshow(img)
arr[0].set_title('Image')
arr[1].imshow(mask[:, :, 0])
arr[1].set_title('Segmentation')

The mask images here are PNG files, and the PNG format supports several different representations. Here they are using the 4-channel version, where the channels are R, G, B and A (alpha). Alpha is there to support “alpha blending”, i.e. transparency. But here the masks are the “labels” for the data, right? For every pixel in the input image (480 x 640) we need the “semantic label” for that pixel, which tells what it is: pedestrian, car, tree, drivable road surface, sidewalk, stop sign, and so on. So there is really only one index of the third dimension that contains the actual label.

If I have an array with 4 elements, the indexing works the same way it always does, right? myArray[0] is the first element, myArray[3] is the last element, and so forth. So they are using only the first element along the third dimension: mask[:, :, 0] is the 2D array of per-pixel class labels, not a probability.

Given that, it’s unclear why they would waste the space by using the 4-channel representation, but whatever …
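
To make the indexing concrete, here is a minimal sketch that uses a synthetic array in place of a real mask file. The class ids (7 and 10) are made up for illustration, and the assumption that the G and B channels are empty is mine, not something taken from the dataset.

```python
import numpy as np

# Synthetic stand-in for one mask: height 480, width 640, 4 channels (R, G, B, A).
# The label values below are hypothetical, not taken from the assignment's data.
mask = np.zeros((480, 640, 4), dtype=np.uint8)
mask[:240, :, 0] = 7    # hypothetical class id for the top half of the image
mask[240:, :, 0] = 10   # hypothetical class id for the bottom half
mask[:, :, 3] = 255     # alpha channel: fully opaque, carries no label

# Index 0 along the last axis selects the first channel: a 2D label map.
labels = mask[:, :, 0]
print(labels.shape)       # (480, 640)
print(np.unique(labels))  # [ 7 10]

# The single-channel label map needs a quarter of the bytes of the 4-channel array.
print(mask.nbytes, labels.nbytes)  # 1228800 307200
```

Running np.unique on mask[:, :, 1], mask[:, :, 2] and mask[:, :, 3] of a real mask is a quick way to check whether the other channels hold anything useful.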


Not clear why the assignment authors took this approach, but thank you for taking the time to look into my question.