Grids in YOLO Algorithm

Aditya_Ranganath · January 14, 2024, 9:06pm

Why use grids at all in the YOLO algorithm? Why can’t we just treat the input image itself as 1 big cell and have (bx, by, bh, bw) for multiple cars/objects based on that one cell? It would presumably be the same as having multiple cars/objects in one of, say, 19x19 grid cells, and we could handle it the same way when the input image has multiple objects.

gent.spah · January 15, 2024, 6:40am

I think this is because the computation on a large image is much more expensive. That’s why you separate it into regions that you can easily manage!

ai_curious · January 15, 2024, 10:59am

Hmmm. But YOLO doesn’t actually divide the input image into smaller sub-images…that’s sliding windows, right? YOLO inputs the entire image into the network at once.

I might think of it this way: YOLO doesn’t divide the input into multiple small regions, but it does divide the output into multiple small regions in order to support a fixed maximum number of object predictions per image (S*S*B for v2).
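To make that counting concrete, here is a minimal sketch of how the output size follows from the grid. The function names are made up for illustration, and the values B = 5 anchor boxes and C = 80 classes are assumptions (only the 19x19 grid comes from this thread).

```python
# Hypothetical helpers, not part of any YOLO library.

def output_shape(S, B, C):
    """Shape of the output volume for an S x S grid, B boxes per cell, C classes.

    Each box carries 5 numbers (objectness, bx, by, bh, bw) plus C class scores.
    """
    return (S, S, B, 5 + C)


def max_predictions(S, B):
    """Fixed upper bound on the number of boxes predicted per image."""
    return S * S * B


# Assumed example values: S = 19 (from the thread), B = 5 and C = 80 (assumptions).
print(output_shape(19, 5, 80))
print(max_predictions(19, 5))

# The "one big cell" idea with 30 boxes is the same structure with S = 1, B = 30.
print(output_shape(1, 30, 3))
print(max_predictions(1, 30))
```

Either layout gives a fixed-size output. What changes is how the prediction slots are distributed across the image.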

gent.spah · January 15, 2024, 2:34pm

Hi @ai_curious, long time! Hope you are doing well! @Aditya_Ranganath, please listen to ai_curious; he is much more knowledgeable than me on this subject!

Aditya_Ranganath · January 15, 2024, 4:20pm

Thanks @ai_curious and @gent.spah for your replies.

We split the images into grids so we can have our output (y) be more structured with some fixed number of predictions. Is that right?

But then, if we treat the entire image as 1 cell and say we have 3 total possible classes, why can’t we just have a fixed number of predictions (say 30 → 10 predictions per class)? This would also have a fixed maximum number of object predictions per image.

Any clarification would be highly appreciated.

Thanks!

ai_curious · January 15, 2024, 5:14pm

My intuition is that a single cell (or even merely fewer cells, say 3 or 5) would adversely impact localization training and prediction accuracy. But I haven’t thought this all the way through, and at the end of the day it’s an empirical decision. Do some experiments and see what works best in training and in the operational environment, taking into account costs of computation, memory, throughput, etc. These are all just engineering tradeoffs one can choose among, right? Go where your data takes you.
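One way to picture the localization side of this is a small sketch of how a ground-truth box center could be assigned to a grid cell and encoded relative to it. This assumes center coordinates normalized to [0, 1] and a v2-style cell-relative encoding; `encode_center` is a hypothetical helper, and the sketch only illustrates the intuition rather than demonstrating that one design trains better.

```python
# Hypothetical helper, assuming normalized image coordinates in [0, 1].

def encode_center(x, y, S):
    """Assign a box center (x, y) to a cell of an S x S grid.

    Returns (row, col, bx, by), where (row, col) is the responsible cell and
    (bx, by) is the center's offset within that cell, in units of one cell.
    """
    gx, gy = x * S, y * S
    # Clamp so a center exactly on the right/bottom edge stays inside the grid.
    col = min(S - 1, int(gx))
    row = min(S - 1, int(gy))
    return (row, col, gx - col, gy - row)


# With a 19x19 grid, the cell index already carries the coarse location,
# and the network only has to regress a small offset inside that cell.
print(encode_center(0.5, 0.25, 19))

# With a single cell, the index is always (0, 0), so every box slot has to
# regress a position over the whole image.
print(encode_center(0.5, 0.25, 1))
```

With S = 1, nothing in the output structure ties a particular prediction slot to a particular part of the image, which is one plausible reason localization could be harder to learn. Whether it actually is remains something to check experimentally.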

Aditya_Ranganath · January 15, 2024, 5:27pm

I see. If I understood correctly you’re saying neither design is right / wrong - it’s just a matter of what works better in practice. So there could be a hypothetical scenario where using just 1 large cell might make sense. Is this a reasonable assumption?

Thanks @ai_curious.
