When I make a prediction on an image and run it many times, instead of getting one bounding box for each object I get overlapping boxes. How can I get a single box for each object even if I run it twice or three times?
This was discussed in the lectures and in the assignment under the heading of “non-max suppression”. That was section 2.4 and exercise 3 of the assignment.
That section covers how to remove the extra bounding boxes. I did all those steps and then made the prediction. When I ran the prediction once, the image showed a single, correct bounding box for each object. But when I rerun that line, the previous boxes are not eliminated and the new ones overlap them. How can I avoid this, so that each time a prediction is run on an image it gives just one bounding box per object?
I don’t know the answer, but if I correctly understand what you are saying, you get different results the first time you predict versus the second and later times. Well, just as a scientist, one would observe that if you try what you think is the same thing and get different results, it must not really have been the same thing, right? You could theorize that the whole process is non-deterministic, but the Occam’s Razor version would be that there is some aspect of this that is “stateful”, or perhaps you are not actually executing the same thing. We can’t see what you are doing, so you are the one in the best position to investigate further. Try to construct a more “pure” experiment. E.g. try “Kernel → Restart and Clear Output” and then do “Cell → Run All”. When you start from a clean state, does that change the behavior, e.g. make it more reproducible?
Or maybe we get lucky and someone who knows more than I do about YOLO and TF in general will be able to suggest a better theory …
If only there were a way to take a list of bounding boxes and confidence scores and suppress the likely duplicate boxes, say all the ones with the non-maximum confidence scores. That would be pretty useful.
@paulinpaloalto had it right, as usual. It is difficult to be precise when describing what “YOLO” does, since there are multiple versions and multiple code implementations of each version that have appeared over the years, and the OP doesn’t elaborate on which was used. That said, one fundamental idea is common across all versions: each grid cell + anchor box pair acts as an independent detector and makes its own prediction about object presence, location, shape, and class. Therefore it is not at all unexpected that multiple predictions of the same object are produced by different nearby detectors on each forward propagation. And given that each detector has its own learned parameters, it is also not unexpected that they produce slightly different predicted values. To prune the list of predictions down to only the best, a post-neural-net processing step must occur. Non-maximum suppression is one such step. Here is the official TensorFlow version of it:
https://www.tensorflow.org/api_docs/python/tf/image/non_max_suppression
You pass in lists of box coordinates and confidence scores, along with IOU and confidence score thresholds and a maximum count, and get back a list of indices into the input boxes. The surviving boxes are those with a confidence score higher than the threshold. Additionally, if a set of boxes is determined to be effectively co-located because their pairwise IOU exceeds the provided IOU threshold, then only the member with the highest confidence score is retained.
The result of the pruning is a subset of the original list where each box represents the highest-confidence prediction for that location. If you really want only one prediction per image, run NMS with max_output_size = 1.
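A minimal sketch of that call, using made-up boxes and scores purely for illustration (the box coordinates, scores, and thresholds here are assumptions, not values from any assignment):

```python
import tensorflow as tf

# Three hypothetical candidate boxes in [y1, x1, y2, x2] format,
# which is the format tf.image.non_max_suppression expects.
boxes = tf.constant([
    [0.0,  0.0,  1.0, 1.0],   # detector A's box for an object
    [0.05, 0.05, 1.0, 1.0],   # detector B's near-duplicate of the same object
    [2.0,  2.0,  3.0, 3.0],   # a different object elsewhere in the image
])
scores = tf.constant([0.9, 0.75, 0.8])

# Keep at most 10 boxes; among boxes whose IOU exceeds 0.5, retain only
# the highest-scoring one; drop anything scoring below 0.6.
selected = tf.image.non_max_suppression(
    boxes, scores,
    max_output_size=10,
    iou_threshold=0.5,
    score_threshold=0.6,
)

# The return value is indices into `boxes`; gather to get the boxes themselves.
kept_boxes = tf.gather(boxes, selected)
print(selected.numpy())  # → [0 2]: the near-duplicate (index 1) was suppressed
```

The two overlapping boxes have IOU ≈ 0.9, well above the 0.5 threshold, so only the higher-scoring one survives, while the distant box is untouched. That is exactly the pruning behavior described above.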
<rant>
It is important to recognize that YOLO is not merely the CNN. It is critical to pick a useful number of anchor boxes with shapes derived from the operational data. It is critical to perform post-processing. It is critical to understand which version of YOLO is being used and how it was trained. You shouldn’t just download some code and/or prebuilt models from GitHub and expect good results on all inputs.
</rant>