Object detection using YOLO

While doing object detection using the YOLO algorithm, why do we get more than one bounding box for each object before applying non-max suppression?

YOLO v2, which is the basis for the autonomous vehicle programming exercise, makes S*S*B predictions each forward pass, where S*S is the grid cell count and B is the number of anchor boxes. Each of those S*S*B locations (the original YOLO paper refers to them as detectors) makes its own prediction about whether an object is present. If an object is near a grid cell boundary, or significantly overlaps two grid cells, it is entirely possible that two neighboring detectors will each think the object center is in their location. Non-max suppression can then remove the duplicates by assuming that if two predicted bounding boxes mostly overlap (highly similar location and shape), they must be the same object.
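
To make that concrete, here is a minimal sketch of plain greedy non-max suppression. The box format (x1, y1, x2, y2) and the 0.5 IoU threshold are illustrative assumptions, not the exercise's exact implementation:

```python
import numpy as np

def iou(box_a, box_b):
    # Boxes are (x1, y1, x2, y2); intersection-over-union measures overlap.
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def non_max_suppression(boxes, scores, iou_threshold=0.5):
    # Keep the highest-confidence box, drop every box that overlaps it
    # by more than iou_threshold, then repeat on the survivors.
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        order = np.array([i for i in order[1:]
                          if iou(boxes[best], boxes[i]) < iou_threshold])
    return keep
```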

Notice that if the localization prediction were always 100% accurate this would never happen, because each object is only in one actual location in the image (at least assuming a non-quantum solution space for these discussions!). This situation arises when there is a lack of precision in the localization output.

My understanding:
An object can cover multiple grid cells, and each covered grid cell can claim that it has the object by giving a bounding box as output, resulting in multiple boxes. Of these, we select the one claimed with the most confidence.

Please suggest any corrections.

After further consideration, I believe it can also happen within one grid cell because of anchor boxes. Especially if the detected object is not close in size to any one anchor box, I think two anchor boxes from the same grid cell can end up with bounding box predictions, only one of which should survive non-max suppression.

So the overall idea is that the multiple bounding boxes are a result of multiple claims, by either grid cells or anchor boxes, that they have the object. Is that right?

I'm confident that's what you were thinking, but let's be precise.

Each vector of predictions [p_c, b_x, b_y, b_w, b_h, c_1, …, c_n] sits at a specific location in the network output [S_x, S_y, B_i, …]. Duplicates most likely come from neighboring grid cells (any anchor box), though if the anchor boxes and training are good, it is likely the same anchor box in each grid cell. They can also come from the same grid cell with different anchor boxes; I think this happens especially when two anchor boxes are close in shape. Hopefully, if the anchor boxes are quite distinct in shape, training would reduce the occurrence of ‘false positives’, which is really what these multiple predictions on the same object are.
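
To illustrate that layout, here is a small sketch of indexing into a YOLO v2-style output tensor. The grid size, anchor count, and class count below are assumptions for illustration, not the exercise's values:

```python
import numpy as np

S, B, n_classes = 19, 5, 80                      # assumed grid size, anchors, classes
output = np.random.rand(S, S, B, 5 + n_classes)  # stand-in for a real network output

# The prediction vector [p_c, b_x, b_y, b_w, b_h, c_1, ..., c_n]
# for grid cell (row, col) and anchor box index b:
row, col, b = 7, 11, 2
p_c = output[row, col, b, 0]            # objectness: is an object present here?
b_x, b_y = output[row, col, b, 1:3]     # predicted box center
b_w, b_h = output[row, col, b, 3:5]     # predicted box width and height
class_scores = output[row, col, b, 5:]  # c_1 ... c_n class scores
```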

Sounds to me like you have this :+1::bulb:

Yesss…

I don’t think they can be called ‘false positives’ because they really do have the object, but they just don’t contribute to the labels that interest us.

Almost… :sweat_smile:
Now I'm trying to understand how an object spanning multiple grid cells is detected when the prediction is made by a single grid cell.

I say that because we're considering the case where there is only one object, but two grid cells 'claim' it, to use your word. The object center is only in one of those locations, so any others are mistakes.

The key to understanding how a bounding box prediction can be larger than one grid cell is this diagram from the paper…

[Figure: bounding box prediction diagram from the YOLO v2 paper]

Bounding box width and height b_w and b_h are multiples of the anchor box width and height p_w and p_h. Here p is used for the anchor box because the paper refers to them as priors. The anchor box shape is multiplied by e^{t}, where t is the direct output of the network. e^{t} can be any positive number; if t > 0 then e^{t} > 1, so b will be larger than p, and can even be larger than the grid cell size.
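
To make the arithmetic concrete, here is a tiny sketch; the prior width and network output below are made-up numbers, not values from the paper or the exercise:

```python
import math

p_w = 1.0                  # assumed prior (anchor) width, in grid-cell units
t_w = 0.9                  # assumed raw network output for width, t_w > 0

b_w = p_w * math.exp(t_w)  # b_w = p_w * e^{t_w}
print(b_w)                 # ~2.46 grid cells: larger than the prior,
                           # and larger than one grid cell
```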

Here’s another recent thread that should resonate…