Queries regarding YOLO and Sliding window

To give a little more detail on question 2), in addition to Kader’s excellent response: the training of YOLO to recognize objects is not as focussed on the grid cells as you might expect. The grid cells are primarily a convenient way to organize the presentation of the results. There is no requirement that an object be contained completely within a grid cell. An object can span many cells, but it is assigned to the single grid cell that contains its centroid. That also makes the Non-Max Suppression (NMS) post-processing more efficient, since it’s unlikely that two objects presented in the output are really the same object if their centroids are in different grid cells.
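To make the centroid rule concrete, here is a minimal sketch of how a ground-truth box could be mapped to its grid cell. The function name, the corner-format box, and the 608 x 608 image with a 19 x 19 grid are assumptions chosen for illustration, not the course's or any library's actual code.

```python
def assign_grid_cell(box, image_size, grid_size):
    """Return the (row, col) of the grid cell containing the box centroid.

    box:        (x_min, y_min, x_max, y_max) in pixels (assumed format)
    image_size: (width, height) in pixels
    grid_size:  (cols, rows), e.g. (19, 19)
    """
    x_min, y_min, x_max, y_max = box
    width, height = image_size
    cols, rows = grid_size

    # Centroid of the bounding box.
    cx = (x_min + x_max) / 2.0
    cy = (y_min + y_max) / 2.0

    # Scale to grid units and truncate; clamp so a centroid lying exactly
    # on the right or bottom image edge still lands in the last cell.
    col = min(int(cx / width * cols), cols - 1)
    row = min(int(cy / height * rows), rows - 1)
    return row, col


# This box is 400 pixels wide and tall, so it covers many 32-pixel cells,
# yet it is assigned to exactly one of them.
cell = assign_grid_cell((100, 200, 500, 600), (608, 608), (19, 19))
```

Only the centroid matters for the assignment: the box's width and height never enter the calculation, which is why an object does not need to fit inside its cell.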

YOLO is by far the most sophisticated algorithm we have seen so far in DLS. There are a number of threads on the forum that explore various aspects of YOLO in quite a bit more detail than is covered in the lectures. For example, here’s one that talks about how grid cells and anchor boxes are used in YOLO. And here’s one that talks about the Non-Max Suppression that I referred to earlier.
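For readers who want a feel for Non-Max Suppression before following up on that thread, here is a minimal single-class sketch in plain Python. The function names, the corner-format boxes, and the 0.5 IoU threshold are assumptions for illustration; real implementations are vectorized and typically run per class.

```python
def iou(a, b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Return indices of the boxes to keep, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any
        # higher-scoring box that has already been kept.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = non_max_suppression(boxes, scores)
```

In this example the second box overlaps the first heavily and has a lower score, so it is dropped, while the third box does not overlap either of the others and is kept.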