It’s an interesting and important question. I don’t really know the answer, but there have been some discussions of this point in the past. At a high level, it’s pretty clear that creating data with the detailed labeling required for semantic segmentation, where literally every single pixel in the image needs to be labeled, would be pretty labor intensive, and it’s not immediately clear how you could automate that. There’s a fundamental chicken-and-egg problem: you need labeled data to train a model to do the labeling, so how do you get the data to train the first such model? Of course, this technology has been around for a while, so maybe we’ll get lucky and earlier pioneers have solved this problem for us.
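Just to make concrete what “every pixel needs a label” means: the label for an H×W image is itself an H×W array with one class ID per pixel. Here’s a minimal sketch in plain NumPy (the class names are illustrative, not CARLA’s actual tag set):

```python
import numpy as np

# Illustrative class IDs -- not CARLA's actual tag set.
CLASSES = {0: "unlabeled", 1: "road", 2: "vehicle", 3: "pedestrian"}

H, W = 600, 800                           # image height and width
mask = np.zeros((H, W), dtype=np.uint8)   # one class ID per pixel, all "unlabeled"

# Labeling means assigning a class to every one of the H*W pixels,
# e.g. marking a rectangular region as "vehicle":
mask[300:400, 350:500] = 2

print(f"{mask.size} pixels need a label in a single {W}x{H} image.")
print(f"Pixel (350, 400) is labeled: {CLASSES[mask[350, 400]]}")
```

Doing that by hand for hundreds of thousands of pixels per image, across thousands of images, is exactly why this kind of dataset is so expensive to create.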
When the question first came up, I looked at the CARLA website. There is a lot of information there, including links to the datasets used here, but in my exploration I never found any explanation of how they created the dataset.
Here’s a thread with links to several other threads that have information relevant to your question. One of those threads links to a company that is apparently in the data-labeling business, but I did not investigate that any further.
In another context, someone told me about Label Studio, an open-source software package that (I think) is intended to help streamline the labeling process. I have not personally tried it, so I can’t speak to how easy or difficult it is to use.
Sorry, that’s all I have. Maybe we’ll get lucky and others with more knowledge in this area will see this thread and chime in.