Hi there,
I was wondering how the true labels in the training set are generated for bounding boxes and semantic segmentation. At one point I even thought Andrew Ng was saying that we could do bounding boxes without true labels for the box coordinates, but I think I was wrong.
Are they generated through data synthesis? Or with software designed specifically to outline or box objects with human help?
Thank you for your time.
Unsupervised learning for object detection/localization has been a key research area for many years. A classic approach is clustering; a more modern one is to use a GAN (Generative Adversarial Network), which includes a generator. But I believe it is still a research area, and not at a level usable for autonomous driving.
Typically, human intervention is required. So there are many projects that try to minimize the human effort needed for annotation, from a usability point of view. (Of course, there are also businesses that will annotate data for you.)
Microsoft VoTT (Visual Object Tagging Tool) is one of the major tools; it supports Pascal VOC and TFRecords for TensorFlow, and the CNTK format for the Microsoft Cognitive Toolkit. The current version does not support the YOLO format, but there are conversion tools, of course.
LabelImg is another good tool for drawing bounding boxes visually, and it supports the YOLO format in addition to Pascal VOC. (I’m using this one, since it is handy for YOLO.)
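For what it’s worth, converting between the two box formats is just arithmetic: Pascal VOC stores absolute pixel corners (xmin, ymin, xmax, ymax), while YOLO stores a normalized center point plus width/height. A minimal sketch (the function name is my own, not from either tool):

```python
def voc_to_yolo(xmin, ymin, xmax, ymax, img_w, img_h):
    """Convert a Pascal VOC box (absolute pixel corners) to YOLO format
    (center x, center y, width, height, all normalized to [0, 1])."""
    x_center = (xmin + xmax) / 2.0 / img_w
    y_center = (ymin + ymax) / 2.0 / img_h
    width = (xmax - xmin) / img_w
    height = (ymax - ymin) / img_h
    return x_center, y_center, width, height

# Example: a 200x100 px box in a 400x400 px image
print(voc_to_yolo(100, 100, 300, 200, 400, 400))
# → (0.5, 0.375, 0.5, 0.25)
```

The YOLO label file then stores these four numbers after a class index, one object per line.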