In the U-Net programming assignment, we were provided with annotated training data. But if I want to train on my own data with the U-Net architecture, I need to annotate my image dataset myself, and I don't know how to proceed. One option is to use photo-editing software like Photoshop, but for a dataset of 10,000 images that would be an extremely painful process. Can anybody advise me on how to label data for semantic segmentation?
One question about data annotation for semantic segmentation: for each training example in the dataset, the ground-truth label is an image that is simply a mask of the corresponding RGB image, with one value per object or class, right?
But when I inspect the dataset provided by the course instructors, I see images that look completely black, with no visible objects. Is this a mask that stores the class labels 1, 2, 3, … as pixel intensities?
I think you just aren’t rendering the labeled images correctly. They show you how to do that in the notebook. The labeled images (masks) are PNG files that have 4 channels, but the labels are on channel 0.
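Here is a minimal sketch of what that rendering looks like, assuming the masks are 4-channel PNGs with the labels on channel 0 as described above (the file path is hypothetical):

```python
import numpy as np
from PIL import Image
import matplotlib.pyplot as plt

# Load a mask; RGBA PNGs come in with shape (H, W, 4).
mask = np.array(Image.open("masks/sample_mask.png"))
labels = mask[:, :, 0]  # channel 0 holds the integer class labels

# Viewed raw, values like 1, 2, 3 are nearly black. A colormap
# spreads them across the visible range so the objects show up.
plt.imshow(labels, cmap="tab20")
plt.title("Class labels (channel 0)")
plt.show()
```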
We’re told the dataset used in this exercise is the “CARLA self-driving dataset”. Google will find you the CARLA website, although I spent some time poking around there and did not find any information about this particular dataset.
Have a more careful look at the files. Both the image files and the mask files are PNG files with 4 channels. In PNG format, you can have either 3-channel (RGB) images or 4-channel (RGBA) images, where A is “alpha”, used by some more sophisticated graphics techniques. Our images here all have A = 0. In the mask files, the object label for each pixel is the value on channel 0 (R), and the other channels are all 0. You can see how they handle that in the cell near the start of the notebook that displays sample image and mask files. Add some logic there to print the shapes and some sample channel values to see what is going on.
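For example, a quick inspection along those lines might look like this (the file names here are hypothetical; substitute whatever the notebook’s paths are):

```python
import numpy as np
from PIL import Image

img = np.array(Image.open("images/000026.png"))
mask = np.array(Image.open("masks/000026.png"))

print("image shape:", img.shape)  # expect (H, W, 4) for an RGBA PNG
print("mask shape:", mask.shape)  # expect (H, W, 4) as well

# Per-channel summary of the mask: channel 0 should contain the class
# labels, and channels 1-3 should be all zeros per the above.
for c in range(mask.shape[-1]):
    print(f"mask channel {c}: unique values =", np.unique(mask[:, :, c]))
```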
Yeah, I played with the masks and finally found what I was looking for: the ground-truth labels. But I couldn’t understand why 4-channel images (.png) are used for the masks, since a greyscale (single-channel) image could store the same information. Is it a matter of choice, or are there specific criteria for choosing .png over other formats like .jpg?
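To illustrate the point that only one channel carries information, here is a sketch that collapses a mask to a single-channel greyscale PNG and checks that nothing is lost (file names are hypothetical):

```python
import numpy as np
from PIL import Image

mask = np.array(Image.open("masks/000026.png"))  # (H, W, 4), uint8

# Keep only channel 0 and save it as a single-channel ("L" mode) PNG.
labels = np.ascontiguousarray(mask[:, :, 0])
Image.fromarray(labels, mode="L").save("masks/000026_gray.png")

# Round-trip check: the labels survive unchanged, since PNG is lossless.
reloaded = np.array(Image.open("masks/000026_gray.png"))
assert np.array_equal(reloaded, labels)
```

As for .png versus .jpg: one common reason to prefer PNG for masks is that PNG compression is lossless, whereas JPEG is lossy and would blur the integer label values near object boundaries, producing pixels with meaningless in-between class IDs. Why this particular dataset uses 4 channels rather than 1 is likely just how the CARLA export was configured, though that is speculation on my part.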