Here’s how I think of it. Suppose we want to do classification when there is one object in an image. You can run a CNN forward pass and easily generate a prediction, right? Cat. But how do you deal with images containing two objects when the network only produces a single output?

The initial approach was to divide the input image into regions and run the same classification network on each one. If there is one object in each region, we’re good. Except now you’re doing a lot more computation, and some of those regions may still contain multiple objects.

YOLO was a reaction to this challenge: how do you handle multiple objects, possibly near each other, and still run in real time? By dividing the image into an S×S grid of cells and predicting B bounding boxes per cell (S and B are the symbols used in the YOLO papers), a YOLO CNN outputs S·S·B box predictions, plus class probabilities, from a single forward pass. (Anchor boxes, which give each of the B predictors a prior shape, were actually introduced later in YOLOv2.) You kind of get the best of both worlds: good accuracy even on multi-object images at a very high frame rate.

When it was introduced circa 2016, YOLO was competitive in accuracy with state-of-the-art region-based approaches but was substantially faster, which is why it is still studied 6 years later. Hope this helps.
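To make that output shape concrete, here’s a quick back-of-the-envelope sketch using the numbers from the original YOLO paper (S=7 grid, B=2 boxes per cell, C=20 classes for PASCAL VOC):

```python
# Output layout of YOLOv1, using the values from the 2016 paper:
# S = 7 (grid is S x S), B = 2 boxes per cell, C = 20 classes.
S, B, C = 7, 2, 20

# Each box prediction is (x, y, w, h, confidence) = 5 numbers,
# and each cell also predicts C class probabilities.
per_cell = B * 5 + C            # numbers per grid cell
output_size = S * S * per_cell  # total outputs from one forward pass
total_boxes = S * S * B         # candidate boxes per image

print(per_cell, output_size, total_boxes)  # 30 1470 98
```

So a single forward pass yields 98 candidate boxes in one 7×7×30 tensor, which is why there’s no per-region rerunning of the network.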