+1 for calling out this really important point, which isn't always mentioned or emphasized. In YOLO v2, the predicted bounding box shape is the anchor box shape multiplied by a scaling factor. The derived bounding box shape is used in the cost function, but the factor is what the network actually learns to generate (more precisely, the network outputs t_w and t_h, and the factors are e^{t_w} and e^{t_h}). This also reinforces why it is important to have a good set of anchor boxes: the closer the anchor box shapes are to the ground-truth bounding box shapes, the faster and better the network learns the weights that produce good factors and lower localization error.
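To make the anchor/factor relationship concrete, here is a minimal sketch of the width/height decoding. It assumes the YOLOv2 (YOLO9000) parameterization b_w = p_w · e^{t_w}, b_h = p_h · e^{t_h}, with sizes measured in grid-cell units. The function names and example sizes are hypothetical, and the sketch covers only the box shape, not the box center or the cost function.

```python
import math

# Assumes YOLOv2-style shape decoding; all sizes are in grid-cell units (illustrative).


def decode_wh(t_w, t_h, anchor_w, anchor_h):
    """Turn raw network outputs (t_w, t_h) into a predicted box width/height.

    The anchor shape is scaled by exp(t), so exp(t) is the learned factor.
    """
    return anchor_w * math.exp(t_w), anchor_h * math.exp(t_h)


def target_t(gt_size, anchor_size):
    """Raw output the network would need so the decoded size equals gt_size."""
    return math.log(gt_size / anchor_size)


# Hypothetical ground-truth box 3.0 cells wide: compare a well-matched anchor
# (2.9 cells) with a poorly matched one (0.5 cells).
gt_w = 3.0
for anchor_w in (2.9, 0.5):
    t = target_t(gt_w, anchor_w)
    w, _ = decode_wh(t, 0.0, anchor_w, 1.0)
    print(f"anchor_w={anchor_w}: target t_w={t:.3f}, factor={math.exp(t):.2f}, decoded w={w:.2f}")
```

Both anchors can reach the same 3.0-cell width, but the well-matched anchor only needs t_w close to 0 (a factor near 1), while the small anchor needs a factor of 6 and therefore a much larger raw output. That gap is one way to see why good anchor boxes make the learning problem easier.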
@Zolids, there are many threads that go deep into the questions you posed above. Here is one that might be useful:
It contains the mathematical expressions behind what @Alireza_Saei wrote, which I quoted above. From those equations you can see how a predicted bounding box can be larger than a grid cell.
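For reference, this is how the YOLOv2 (YOLO9000) decoding equations are usually written. Here (c_x, c_y) is the offset of the grid cell, (p_w, p_h) is the anchor (prior) width and height, and σ is the logistic sigmoid. Treating every quantity in grid-cell units is an assumption made for this worked example.

```latex
\begin{aligned}
b_x &= \sigma(t_x) + c_x, & b_y &= \sigma(t_y) + c_y,\\
b_w &= p_w \, e^{t_w},    & b_h &= p_h \, e^{t_h}.
\end{aligned}
```

Because σ(·) lies in (0, 1), the box center stays inside its own cell, but e^{t_w} has no upper bound, so the predicted width exceeds one cell whenever p_w e^{t_w} > 1. For example, with p_w = 1.5 cells and t_w = 0.5, b_w = 1.5 · e^{0.5} ≈ 2.47 cells.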