Output layer for detecting same object (a bounding box) multiple times in an image

Samanyu_Kansara · March 1, 2023, 12:26pm

Hi,

I’m unable to find information on how to get multiple bounding boxes as output for an image when the object I’m looking for appears in the image at least once.

I’ve seen some existing projects that can do this, such as the CRAFT text detector, which draws bounding boxes around all detected text.

Any help is greatly appreciated!

ai_curious · March 1, 2023, 4:26pm

[Embedded video: Joseph Redmon at the podium]

YOLO can output multiple bounding boxes per forward pass per image. However, it is designed to generate a single bounding box per object. Generally, multiple bounding boxes for the same object are undesirable. Perhaps you can elaborate on the functional/business requirement to help us make better recommendations.
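To illustrate the "single bounding box per object" point: detectors of this kind typically produce many overlapping candidate boxes and then prune them with non-max suppression (NMS). The following is a minimal pure-Python sketch of that pruning step, not YOLO's actual implementation. The function names, the `(x1, y1, x2, y2)` box format, and the 0.5 IoU threshold are all assumptions chosen for the example.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Return the indices of the boxes to keep, highest score first.

    A box is dropped when it overlaps an already-kept, higher-scoring
    box by more than iou_threshold (an assumed, tunable value).
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Hypothetical candidates: the first two cover the same object,
# the third covers a separate object of the same class.
boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
scores = [0.9, 0.8, 0.7]
kept = non_max_suppression(boxes, scores)
print(kept)  # indices of the surviving boxes
```

Here the two overlapping candidates collapse into one box, while the third, non-overlapping box survives, so each distinct object ends up with its own box.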

Samanyu_Kansara · March 2, 2023, 10:15am

Thanks for the response.

I checked out YOLO, but it is not what I need. I want to train the network to detect all occurrences of one specific object in the provided image. What I want to accomplish is to extract regions of interest from a scanned, specially marked document.

Detecting all the faces in an image is a similar use case, and one that most mobile phones can handle in real time.

ai_curious · March 2, 2023, 11:05am

Common terminology is that objects are unique instances, and thus each occurs only once within an image. If there are two faces in an image, those are two objects, although one class. That is, unless your objective is to go further and recognize that the two faces are actually the same person. In that case, I think you’re looking at a process with at least two phases: one to detect (localize plus classify), then a second one to interpret (facial comparison, reading characters from a license plate, etc.).

Samanyu_Kansara · March 2, 2023, 11:33am

I think I did not word it right. I want to detect all objects belonging to the same class. I understand that all the objects are unique, but I want them all detected.

ai_curious · March 2, 2023, 2:14pm

YOLO can detect (localize plus classify) multiple objects of the same or different classes in an image. You can always filter out objects with uninteresting class(es) afterwards. If you watched the video, Mr. Redmon also talks briefly about other approaches that were state of the art circa 2015 when YOLO was invented (e.g., Deformable Parts Models, R-CNN, Fast R-CNN) and how they compare in speed and accuracy. The video also shows at least one image with several airplanes, each with its own bounding box and class label, which seems like what you are interested in doing.
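As a rough illustration of filtering by class after detection, here is a small Python sketch. It assumes the detector's output has already been decoded into a list of (box, label, score) records. The `Detection` type, the `keep_class` helper, the labels, and the 0.5 score cutoff are all hypothetical and not part of any particular YOLO library.

```python
from typing import NamedTuple


class Detection(NamedTuple):
    """One decoded detection (hypothetical record format)."""
    box: tuple    # (x1, y1, x2, y2) in pixels
    label: str    # predicted class name
    score: float  # confidence in [0, 1]


def keep_class(detections, wanted_label, min_score=0.5):
    """Keep every detection of one class that clears the score cutoff."""
    return [
        d for d in detections
        if d.label == wanted_label and d.score >= min_score
    ]


# Made-up detector output for a single image.
detections = [
    Detection((10, 10, 50, 50), "face", 0.92),
    Detection((60, 15, 95, 55), "face", 0.88),
    Detection((0, 80, 200, 160), "car", 0.75),
    Detection((120, 20, 150, 50), "face", 0.30),
]

faces = keep_class(detections, "face")
for d in faces:
    print(d.label, d.box, d.score)
```

Every instance of the wanted class that passes the cutoff is returned as its own box, so the result is a variable-length list rather than a single output.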

To the best of my knowledge that was the first public presentation of YOLO, so it’s an historic event and worth the 13 minutes to watch even if you end up using something else in your project.

Random YOLO screen capture found on the web…

It wasn’t shown in this example (above), but a YOLO-based system could be used in near real time to detect the license plates, read them, and determine whether each plate was carried by a car or a truck in order to charge the vehicle owner an appropriate toll/tariff.

ai_curious · March 4, 2023, 4:09pm

Something similar is described here:
