Not quite. Grid cells in YOLO have a fixed size, set before training starts, and their shape is not part of the network output. The predicted bounding box shape, b_w and b_h, can be smaller than, equal to, or larger than the grid cell in which the box center is located.
@Mithun_Kar
If you read the original papers carefully, or some of the several YOLO threads discoverable through the one linked by @paulinpaloalto above, you’ll see that YOLO doesn’t directly predict any of b_x, b_y, b_w, or b_h. Rather, the raw floating-point values it outputs are passed through a further transformation to produce the location and shape coordinates, and the inverse transformation must be applied to the ground truth boxes when building the training data. Other than that, these values are produced exactly the same way any neural network produces any floating-point output. By that I mean the labels provide ground truth values Y, the network generates predicted outputs \hat{Y}, and training minimizes a loss function that measures the discrepancy between Y and \hat{Y}.
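To make that concrete, here is a rough sketch of the forward and inverse transformations, assuming the parameterization from the YOLOv2/YOLOv3 papers: b_x = c_x + sigmoid(t_x), b_y = c_y + sigmoid(t_y), b_w = p_w * exp(t_w), b_h = p_h * exp(t_h). Here t_x, t_y, t_w, t_h are the raw network outputs, (c_x, c_y) is the top-left corner of the grid cell, and (p_w, p_h) is the anchor box shape, all in grid-cell units. The function names `decode_box` and `encode_box` are mine, purely for illustration, and actual implementations differ in the details.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def decode_box(t_x, t_y, t_w, t_h, c_x, c_y, p_w, p_h):
    """Raw network outputs -> box center and shape, in grid-cell units."""
    b_x = c_x + sigmoid(t_x)   # sigmoid keeps the center inside cell (c_x, c_y)
    b_y = c_y + sigmoid(t_y)
    b_w = p_w * math.exp(t_w)  # scales the anchor; not limited to one cell
    b_h = p_h * math.exp(t_h)
    return b_x, b_y, b_w, b_h


def encode_box(b_x, b_y, b_w, b_h, c_x, c_y, p_w, p_h):
    """Inverse transformation: ground truth box -> training targets."""
    eps = 1e-9  # keep the logit finite when the center sits on a cell edge
    f_x = min(max(b_x - c_x, eps), 1.0 - eps)
    f_y = min(max(b_y - c_y, eps), 1.0 - eps)
    t_x = math.log(f_x / (1.0 - f_x))  # logit, the inverse of sigmoid
    t_y = math.log(f_y / (1.0 - f_y))
    t_w = math.log(b_w / p_w)
    t_h = math.log(b_h / p_h)
    return t_x, t_y, t_w, t_h


# Cell (3, 4) with a 1.5 x 2.0 anchor: t_w = 1.0 gives a box about 4 cells wide.
print(decode_box(0.0, 0.0, 1.0, 0.0, 3, 4, 1.5, 2.0))
```

Note that the grid cell only enters through the offsets c_x and c_y. Nothing in the width and height terms ties the box to the size of a cell, which is why b_w and b_h can exceed it.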
The expressions relating the b_{…} coordinates with the direct network outputs are discussed here:
Hope this helps