I tried reading the YOLOv1 paper and am unable to understand this:
It weights localization error
equally with classification error which may not be ideal.
Also, in every image many grid cells do not contain any
object. This pushes the “confidence” scores of those cells
towards zero, often overpowering the gradient from cells
that do contain objects. This can lead to model instability,
causing training to diverge early on
The above is from page 3 of the paper, under the Training section. Link to the paper:
Regarding the first part of your question, you can have a look here. The second part of your question is addressed here.
Also, does YOLO give multiple bounding boxes during training only? Is it the case that during prediction/testing, YOLO outputs only one bounding box?
The paper states "On PASCAL VOC the network predicts 98 bounding boxes per image and class probabilities for each box."
The output of a CNN forward pass is the same regardless of whether you are training or doing what you call prediction/testing. What differs is whether there is backprop and iterative modification (learning) of the parameters; forward propagation produces an output of the same dimensions in all cases. In YOLO that dimension, and thus the number of bounding-box predictions, is driven by the SxSxB shape of the output layer (7x7x2 = 98 on PASCAL VOC). You train it to make that many predictions at a time, and when you run it operationally, that's what it does. This is generally true of all machine learning: you train how you fight, and fight how you train.
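To make the fixed output size concrete, here is a minimal NumPy sketch using the paper's PASCAL VOC settings (S=7, B=2, C=20); the variable names are my own, and the tensor is just zeros standing in for real network activations:

```python
import numpy as np

# YOLOv1 settings from the paper for PASCAL VOC:
# S = 7 grid cells per side, B = 2 boxes per cell, C = 20 classes.
S, B, C = 7, 2, 20

# The final layer always emits an S x S x (B*5 + C) tensor,
# in training and in inference alike; only backprop differs.
output = np.zeros((S, S, B * 5 + C))

print(output.shape)  # (7, 7, 30)
print(S * S * B)     # 98 bounding boxes per image
```

The 5 per box covers (x, y, w, h, confidence); the 98 boxes fall directly out of the output shape, which is why the network predicts the same number of boxes whether the image contains 5 objects or 50.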
Sliding windows, other region based approaches, and YOLO were all invented to deal with the challenge of detecting multiple objects per scene. YOLO did it close to as well and much much faster, which is why you are studying it 5 years on.
So after these 98 bounding boxes are predicted, and supposing that the image has only 5 objects (and thus we should get only 5 bounding boxes), does this algorithm, for each grid cell, select the bounding box having the highest probability among all the bounding boxes for that particular grid cell?
Yes, that is demonstrated in figure 2 in the paper.
That also means that Intersection over Union is not used at prediction time, as there is no ground truth available?
No, because intersection over union is used in non-max suppression: overlapping predictions are compared by IoU, and only the highest-scoring box among them is kept. This is explained in the assignment. On this, the YOLO paper states: "some large objects or objects near the border of multiple cells can be well localized by multiple cells. Non-maximal suppression can be used to fix these multiple detections." (p. 4).
Remember that IoU, or the Jaccard index, is a general-purpose mechanism for comparing two regions. It is used differently in different parts of the overall YOLO solution. Initially it is used to compare anchor boxes with ground-truth bounding boxes during training-data setup. Later it is used to compare two outputs of forward propagation (neither of which is a ground-truth bounding box) during the NMS phase of operational execution.
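The two pieces above can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: the `iou` and `nms` function names, the `[x1, y1, x2, y2]` box format, and the 0.5 threshold are my own choices for the example.

```python
import numpy as np

def iou(a, b):
    """Intersection over union (Jaccard index) of two [x1, y1, x2, y2] boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy non-max suppression: keep the highest-scoring box, discard
    remaining boxes whose IoU with it exceeds the threshold, repeat."""
    order = np.argsort(scores)[::-1]  # indices sorted by descending score
    keep = []
    while len(order) > 0:
        best = order[0]
        keep.append(best)
        order = [i for i in order[1:]
                 if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep
```

For example, two heavily overlapping detections of the same object plus one detection elsewhere collapse to two surviving boxes: `nms([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]], [0.9, 0.8, 0.7])` keeps indices 0 and 2. Note that neither argument is a ground-truth box; at prediction time IoU only compares the network's own outputs with each other.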