In section 2, "Load and Split Data", the mask is read from mask_list.
It has a shape of (480, 640, 4).
What exactly do these dimensions mean? The first two are presumably the image height and width, but what does the third dimension represent?
Specifically, further down, when the original image is displayed alongside the segmentation, the last dimension is indexed with 0: arr[1].imshow(mask[:, :, 0])
What does the 0 mean? Is it the class for the pixel, or the probability of object presence? And what do the other three channels hold?
import imageio
import numpy as np
import matplotlib.pyplot as plt

N = 2
img = imageio.imread(image_list[N])
mask = imageio.imread(mask_list[N])
# mask = np.array([max(mask[i, j]) for i in range(mask.shape[0]) for j in range(mask.shape[1])]).reshape(img.shape[0], img.shape[1])

fig, arr = plt.subplots(1, 2, figsize=(14, 10))
arr[0].imshow(img)
arr[0].set_title("Image")
arr[1].imshow(mask[:, :, 0])
arr[1].set_title("Segmentation")
The mask images here are PNG files, and the PNG format supports several different representations. Here they are using the 4-channel version, where the channels are R, G, B and A (alpha). The alpha channel exists to support "alpha blending". But here the masks are the "labels" for the data, right? For every pixel in the input image (480 x 640) we need the "semantic label" for that pixel, which tells what it is: pedestrian, car, tree, drivable road surface, sidewalk, stop sign, yadda, yadda. So only one of the indices along the third dimension contains the actual label.

If I have an array with 4 elements, the indexing works the same way it always does, right? myArray[0] is the first element, myArray[3] is the last element, and so forth. So they are using only the first element along the third dimension.
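One quick way to convince yourself of this is to inspect each channel with np.unique. This is a sketch on a hypothetical miniature mask (the class ids here are made up; on the real data you would load an element of mask_list instead), but it mimics the layout: labels in channel 0, zeros in channels 1 and 2, and a fully opaque alpha in channel 3.

```python
import numpy as np

# Hypothetical 4 x 4 RGBA mask mimicking the dataset's layout.
mask = np.zeros((4, 4, 4), dtype=np.uint8)
mask[:, :, 0] = np.array([
    [0, 0, 7, 7],
    [0, 1, 7, 7],
    [1, 1, 2, 2],
    [1, 1, 2, 2],
], dtype=np.uint8)   # channel 0: per-pixel class ids (0, 1, 2, 7)
mask[:, :, 3] = 255  # channel 3 (alpha): fully opaque everywhere

# Channel 0 carries the distinct class labels ...
print(np.unique(mask[:, :, 0]))  # -> [0 1 2 7]
# ... while the other channels carry no label information.
print(np.unique(mask[:, :, 1]))  # -> [0]
print(np.unique(mask[:, :, 3]))  # -> [255]
```

Running the same np.unique calls on an actual mask from mask_list is a good sanity check that your data matches this layout.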
Given that, it's unclear why they would waste the space of the 4-channel representation, but whatever …
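If the extra channels bother you, nothing stops you from slicing the label channel off once after loading and working with a plain (480, 640) label map from then on. A minimal sketch, using a zero-filled placeholder array in place of a real loaded mask:

```python
import numpy as np

# Placeholder standing in for a loaded RGBA mask of the dataset's shape.
rgba_mask = np.zeros((480, 640, 4), dtype=np.uint8)
rgba_mask[:, :, 3] = 255  # opaque alpha, as these PNGs typically have

# Keep only the label channel: (480, 640, 4) -> (480, 640).
labels = rgba_mask[:, :, 0]

print(labels.shape)      # -> (480, 640)
print(labels.nbytes)     # 307200 bytes, vs. 1228800 for the full RGBA array
```

Note that basic slicing like this returns a view, not a copy, so it costs nothing until you explicitly call .copy() or save the result.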