It’s not in any code we write or would have seen, right? The output of the model includes both the bounding box and the object type for every object detected by the algorithm. The model is trained to recognize objects, and that includes computing the bounding boxes for us. If you have the coordinates of an object’s bounding box, then it’s pretty straightforward to compute the centroid of that object. It’s not really the centroid in the physics sense (center of mass), but simply the center of the rectangle, i.e. the intersection of the two diagonals of the bounding box. Once the algorithm has that value, it uses it to determine which grid cell will hold the data for that object. Note that there is no requirement that an object be contained within a single grid cell: an object can span multiple cells, and the only real purpose of the grid cells is to “hang” the objects somewhere, so that we can write the loops to process them in a localized way.
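Just to make the “center of the bounding box” point concrete, here’s a little sketch (not from the assignment code) of how you could compute that center and figure out which grid cell “owns” the object. I’m assuming boxes are given as (x_min, y_min, x_max, y_max) in coordinates normalized to [0, 1] and a 19 x 19 grid as in the lectures; the function names are just made up for illustration.

```python
def box_center(box):
    """Center of the bounding box rectangle (not a physics centroid)."""
    x_min, y_min, x_max, y_max = box
    return (x_min + x_max) / 2.0, (y_min + y_max) / 2.0

def owning_cell(box, grid_size=19):
    """Grid cell (row, col) whose area contains the box center."""
    cx, cy = box_center(box)
    col = min(int(cx * grid_size), grid_size - 1)
    row = min(int(cy * grid_size), grid_size - 1)
    return row, col

# Example: a box that spans several cells is still "hung" on exactly one cell,
# the one containing its center.
print(owning_cell((0.10, 0.20, 0.60, 0.90)))  # -> (10, 6)
```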
One other wrinkle here is that, by the nature of the YOLO algorithm and its training, it can recognize the same object multiple times in slightly different ways. We deal with that ex post facto by the “culling” process called Non-Max Suppression, which is covered both in the lectures and in the assignment. There are a number of articles on the forum about that as well, e.g. this one.
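If it helps to see the “culling” idea in code, here’s a rough plain-Python sketch of Non-Max Suppression. It’s only meant to illustrate the “keep the best box, drop anything that overlaps it too much, repeat” logic, not to reproduce what the assignment does (if I recall correctly, the assignment uses TensorFlow’s built-in NMS op rather than hand-rolling it). I’m assuming boxes are (x_min, y_min, x_max, y_max) tuples with one confidence score per box.

```python
def iou(a, b):
    """Intersection over Union of two boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Return indices of the boxes to keep, highest scores first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)          # highest-scoring remaining box survives
        keep.append(best)
        # discard every remaining box that overlaps the survivor too heavily
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep
```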