Is a specific anchor box related to a specific class?

Doron_Modan · January 17, 2023, 1:05pm

Suppose we have 3 classes: pedestrian, car and motorcycle. Does it make sense to pick 3 corresponding anchor boxes, or not necessarily? For example, one narrow, tall anchor box to ‘catch’ the pedestrian, a square to ‘catch’ the motorcycle, and a wide rectangle to ‘catch’ the car?
Or rather, we choose a bigger number of anchor boxes?
Is there at all any relation between the number of classes and number of anchor boxes?
Also, must anchor boxes be made of straight lines? Or could they be circular, for example?

reinoudbosch · January 17, 2023, 1:51pm

Hi Doron_Modan,

You will want to find the shape of the bounding box that best detects the classes you want to detect. Have a look at this blogpost. This includes setting the shape of the bounding box e.g. to circle, which is then called a bounding circle. See, e.g., this article about medical object detection.

ai_curious · January 17, 2023, 2:46pm

I disagree. In my experience you want to find the shapes of bounding boxes that best represent the shapes of the objects in your training data, not the classes. Anchor shape is primarily about good localization, not classification. If you have a lot of nearby cars and a lot of far away cars in your data, you likely need at least 2 anchor box shapes to localize them…not one for all cars. Further, if the objects in your training data don’t represent the objects you want to predict, you have another, different, problem.

The YOLO designers used K-means to derive good anchor box shapes. There is a thread about it in this forum here: Deriving YOLO anchor boxes
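
A minimal sketch of that idea, assuming the only input is a list of (width, height) pairs taken from the training labels. As I understand it, the YOLOv2 paper clusters with distance 1 - IoU rather than Euclidean distance, so that large boxes do not dominate the error. The function names and the sample boxes below are hypothetical, not taken from any YOLO codebase.

```python
import random

def shape_iou(a, b):
    """IoU of two (width, height) boxes aligned at a common corner."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    union = a[0] * a[1] + b[0] * b[1] - inter
    return inter / union

def kmeans_anchors(boxes, k, iters=100, seed=0):
    """Cluster (w, h) pairs using 1 - IoU as the distance; return k shapes."""
    rng = random.Random(seed)
    centroids = rng.sample(boxes, k)
    for _ in range(iters):
        # Assign each box to the centroid it overlaps most (smallest 1 - IoU).
        clusters = [[] for _ in range(k)]
        for box in boxes:
            best = max(range(k), key=lambda i: shape_iou(box, centroids[i]))
            clusters[best].append(box)
        # Move each centroid to the mean width/height of its members.
        new_centroids = []
        for i, members in enumerate(clusters):
            if not members:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid
                continue
            w = sum(m[0] for m in members) / len(members)
            h = sum(m[1] for m in members) / len(members)
            new_centroids.append((w, h))
        if new_centroids == centroids:
            break
        centroids = new_centroids
    # Sort by area so the result is easy to read, smallest anchor first.
    return sorted(centroids, key=lambda c: c[0] * c[1])

# Hypothetical label shapes in pixels: far pedestrians, mid-range cars, nearby cars.
boxes = [(20, 60), (22, 64), (18, 55),
         (90, 60), (100, 64), (95, 58),
         (300, 200), (320, 210), (310, 190)]
anchors = kmeans_anchors(boxes, k=3)
print(anchors)
```

Note that nothing in the clustering looks at class labels: the anchors follow the distribution of box shapes, which is why nearby and far-away cars can end up with different anchors.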

reinoudbosch · January 17, 2023, 2:49pm

Indeed, I should have written “bounding boxes”. This is exemplified in the first link in my previous post.

ai_curious · January 17, 2023, 2:54pm

The number of anchor boxes in a YOLO architecture directly impacts the shape of the network output and the amount of computation. It is unlikely that one could afford the memory or compute time to support a network with enough anchor boxes to assign one per class for ImageNet, for example, which contains 1,000 classes.
I found in my own analyses that there were steadily diminishing returns in accuracy from increasing the number of anchor boxes, and the best tradeoff of accuracy vs compute was at around 8 anchor boxes. That number will vary depending on the data set. If you have few classes but objects at widely varying distances, you probably want more anchor boxes than classes. If you have lots of classes, you will have significantly fewer anchor boxes than classes, possibly even 2 orders of magnitude fewer. Hope this helps.
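
To make the output-shape point concrete, here is a rough sketch that assumes a YOLOv2-style head in which every anchor in every grid cell predicts 1 objectness score, 4 box coordinates, and one score per class. The 19x19 grid and the function name are assumptions for illustration; other YOLO versions lay out the output differently.

```python
def yolo_output_shape(grid, num_anchors, num_classes):
    """Output tensor shape for an assumed YOLOv2-style detection head."""
    per_anchor = 5 + num_classes  # objectness + 4 box coords + class scores
    return (grid, grid, num_anchors * per_anchor)

def num_outputs(grid, num_anchors, num_classes):
    """Total number of values predicted per image."""
    g1, g2, depth = yolo_output_shape(grid, num_anchors, num_classes)
    return g1 * g2 * depth

# 5 anchors, 80 classes on an assumed 19x19 grid.
print(yolo_output_shape(19, 5, 80))    # (19, 19, 425)

# 1,000 classes with 8 anchors versus one anchor per class.
print(num_outputs(19, 8, 1000))        # 2902440
print(num_outputs(19, 1000, 1000))     # 362805000
```

Under these assumptions the output grows linearly with the number of anchors, so one anchor per class for 1,000 classes costs 125 times as many output values as 8 anchors.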

PS: note that it’s not just aspect ratio, ‘taller than wide’, that matters for anchor box shape, but the actual size in pixels. Anchor boxes that aren’t close in shape to the training objects cause problems in training.
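
A small illustration of why size matters as well as aspect ratio, using IoU between shapes aligned at a common corner. The pixel values are made up for the example.

```python
def shape_iou(a, b):
    """IoU of two (width, height) boxes aligned at a common corner."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    union = a[0] * a[1] + b[0] * b[1] - inter
    return inter / union

anchor = (40, 80)          # a 1:2 'taller than wide' anchor
same_ratio = (200, 400)    # same 1:2 aspect ratio, 5x larger
similar_size = (36, 84)    # slightly different ratio, similar size

iou_same_ratio = shape_iou(anchor, same_ratio)
iou_similar_size = shape_iou(anchor, similar_size)
print(iou_same_ratio)      # about 0.04
print(iou_similar_size)    # about 0.86
```

The object with the identical aspect ratio barely overlaps the anchor, while the object of similar size overlaps it well despite a different ratio.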

Also note that at least in my quick first read, the article linked above talks about the size and shape of anchor boxes, not a ‘face’ anchor box.
