How are the box width and height determined in YOLO?

I don’t quite understand this part.
Let me explain what I understand so far.
For example: after the CNN, we find the probability that each grid cell’s detection is a car, and then take the grid cell with the highest probability as the center point. Do we then run K-means on this grid cell? Are the width and height determined from the clustering results? And is IoU applied after that?
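For comparison, in YOLOv2 K-means is not run per grid cell at inference time. It is run once, offline, over the (width, height) pairs of the ground-truth boxes in the training set, with 1 − IoU as the distance, to choose the anchor box shapes. The sketch below shows that idea; it assumes boxes are given as (w, h) pairs, and the function names and toy data are hypothetical.

```python
import numpy as np

def iou_wh(boxes, anchors):
    """IoU between box shapes, assuming every box shares the same center.

    boxes: (N, 2) array of (w, h); anchors: (K, 2) array of (w, h).
    Returns an (N, K) matrix of IoU values.
    """
    inter_w = np.minimum(boxes[:, None, 0], anchors[None, :, 0])
    inter_h = np.minimum(boxes[:, None, 1], anchors[None, :, 1])
    inter = inter_w * inter_h
    area_boxes = boxes[:, 0] * boxes[:, 1]
    area_anchors = anchors[:, 0] * anchors[:, 1]
    union = area_boxes[:, None] + area_anchors[None, :] - inter
    return inter / union

def kmeans_anchors(boxes, k, iters=100, seed=0):
    """K-means over box shapes using the distance d = 1 - IoU."""
    rng = np.random.default_rng(seed)
    # Initialize centroids with k distinct training boxes.
    anchors = boxes[rng.choice(len(boxes), size=k, replace=False)].astype(float)
    for _ in range(iters):
        # Smallest 1 - IoU is the same as largest IoU.
        assign = np.argmax(iou_wh(boxes, anchors), axis=1)
        new_anchors = anchors.copy()
        for j in range(k):
            members = boxes[assign == j]
            if len(members) > 0:
                # Move each centroid to the mean shape of its members.
                new_anchors[j] = members.mean(axis=0)
        if np.allclose(new_anchors, anchors):
            break
        anchors = new_anchors
    return anchors
```

With toy boxes such as `[[10, 20], [12, 22], [11, 19], [100, 50], [95, 55], [105, 48]]` and `k=2`, this should end up with one small anchor and one large anchor. The point is that the anchor shapes are fixed before training; the network then learns how much to stretch them.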

Here is another idea of mine:

The CNN eventually shrinks the image down to a very small size. When it is scaled back up, are the car features used as the midpoint and the original image as the border?
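In YOLOv2 the mapping from the small grid back to the original image is done per box: the network outputs raw values (tx, ty, tw, th) for each grid cell and anchor. The center is `(sigmoid(t) + cell index) / grid size`, and the width and height are `anchor * exp(t) / grid size`, both normalized to [0, 1] and then multiplied by the image size. The sketch below assumes anchors are expressed in grid-cell units (the YOLOv2 convention); `decode_box` and its parameter names are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def decode_box(t, cell, anchor, grid_size, image_size):
    """Turn one raw YOLOv2-style prediction into a box in pixels.

    t:          (tx, ty, tw, th) raw network outputs for one cell/anchor
    cell:       (cx, cy) column and row index of the grid cell
    anchor:     (pw, ph) anchor width/height in grid-cell units
    grid_size:  (gw, gh) number of cells across and down, e.g. (19, 19)
    image_size: (W, H) image width and height in pixels
    Returns (x_center, y_center, width, height) in pixels.
    """
    tx, ty, tw, th = t
    cx, cy = cell
    pw, ph = anchor
    gw, gh = grid_size
    W, H = image_size
    # Center: sigmoid keeps the offset inside the cell; add the cell index
    # and divide by the grid size to get a value in [0, 1].
    bx = (sigmoid(tx) + cx) / gw
    by = (sigmoid(ty) + cy) / gh
    # Size: stretch the anchor by exp(tw), exp(th), normalized by grid size.
    bw = pw * math.exp(tw) / gw
    bh = ph * math.exp(th) / gh
    # Scale the normalized box back to the original image.
    return bx * W, by * H, bw * W, bh * H
```

For example, `decode_box((0.0, 0.0, 0.0, 0.0), cell=(9, 9), anchor=(2.0, 1.0), grid_size=(19, 19), image_size=(608, 608))` gives approximately `(304, 304, 64, 32)`: a box centered in the middle cell with exactly the anchor’s shape, since exp(0) = 1.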

Is my understanding correct?

What have I missed?

Hey @Fatcar2002,

Can you please clarify whether you are talking about the original YOLO implementation as per DLS C4 W3 A1, or whether you are trying to modify the implementation somehow?