C4W3: Data Annotation for Semantic Segmentation

Hello,

In the programming assignment for the U-Net part, we were already provided with annotated data for training. But if I wanted to train a U-Net on my own image dataset, I would need to annotate it myself, and I don't know how to proceed. One possible way is to use photo-editing software, but for a dataset of 10,000 images that would be an extremely painful process. Can anybody please advise me on how to label data for semantic segmentation?

Thank you

Regards
Shravista Kashyapo

Creating a new labeled data set is very expensive and time-consuming. It is often the most challenging part of a machine learning solution.

One common old-school method is to use crowd-sourced volunteers (or grad students) to manually record the labels.

For some tasks, there are automation tools available. Here is one example:

The cost of labeling a new data set is why there are many commonly-used standard data sets.

Ok, thanks for your help.

One doubt about data annotation for semantic segmentation: for each training example in the dataset, the ground-truth label is itself an image, namely a mask of the corresponding RGB image with one value per object class, right?

But when I tried to inspect the dataset provided by the course instructors, I see images with a completely black background and no visible objects. So is this a mask that contains the class labels 1, 2, 3, … as pixel intensities?

I think you just aren’t rendering the labeled images correctly. They show you how to do that in the notebook. The labeled images (masks) are PNG files that have 4 channels, but the labels are on channel 0.
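As a sanity check, here is a minimal sketch (the file path is a hypothetical placeholder, not the notebook's) of how you could render channel 0 so the labels become visible:

```python
import numpy as np
from PIL import Image
import matplotlib.pyplot as plt

# Load one mask file (hypothetical path) as a NumPy array.
mask = np.array(Image.open("CameraSeg/000026.png"))
print(mask.shape)               # expect (H, W, 4) for an RGBA PNG
print(np.unique(mask[..., 0]))  # the class IDs live on channel 0

# Raw label values (0, 1, 2, ...) look almost black when drawn as-is,
# so let matplotlib map them to distinct colors instead.
plt.imshow(mask[..., 0], cmap="tab20")
plt.colorbar()
plt.show()
```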

They tell us that the dataset used in this exercise is the "CARLA self-driving dataset". Google will find you the CARLA website, although I spent some time poking around there and did not find any information about the dataset itself.

Ok, I think I overlooked the data. So this means the labeled data is simply an n-channel array, where each channel represents an object class, right?

Have a more careful look at the files. Both the image files and the mask files are PNG files with 4 channels. In PNG format, you can have either 3-channel (RGB) images or 4-channel (RGBA) images, where the A is "alpha", used by some more sophisticated graphics techniques. Our images here all have A = 0. In the case of the mask files, the object label for each pixel is the value on channel 0 (R) and the other channels are all 0. You can see how they handle that in the cell that shows sample image and mask files early in the notebook. Add some logic there to print the shapes and some sample channel values to see what is going on.
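For example, something like this rough sketch (the file names are hypothetical placeholders) would confirm that only channel 0 carries labels:

```python
import tensorflow as tf

# Decode one image/mask pair as 4-channel PNGs (hypothetical file names).
img = tf.image.decode_png(tf.io.read_file("CameraRGB/000026.png"), channels=4)
msk = tf.image.decode_png(tf.io.read_file("CameraSeg/000026.png"), channels=4)
print("image shape:", img.shape, "mask shape:", msk.shape)

# Channel 0 should carry the class IDs; channels 1-3 should be all zero.
for c in range(4):
    vals, _ = tf.unique(tf.reshape(msk[..., c], [-1]))
    print(f"mask channel {c} unique values:", vals.numpy())
```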

Yeah, I have played with the masks and finally found what I was looking for: the true labels. But I couldn't understand why 4-channel images (.png) are used for masks, since a greyscale (single-channel) image could be stored instead. Is it a matter of choice, or are there specific criteria for choosing .png over other formats like .jpg?

For the photographic input images it's largely a matter of taste, but for the masks there is a practical reason to prefer PNG: PNG compression is lossless, whereas JPG is lossy and would alter the pixel values, corrupting the class labels.
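If the extra channels bother you, you can drop them when building your dataset. A minimal sketch (not the notebook's exact code; the function name is my own):

```python
import tensorflow as tf

def load_mask(mask_png_path):
    """Load a 4-channel mask PNG and keep only the label channel."""
    mask = tf.image.decode_png(tf.io.read_file(mask_png_path), channels=4)
    mask = mask[..., 0:1]           # keep channel 0: the per-pixel class IDs
    return tf.cast(mask, tf.int32)  # integer labels suit a sparse CE loss
```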


Thank you very much, that answers all my questions.