I have never personally tried anything like that, but there are some very detailed threads on the forum about YOLO, both how it works and how to train it. Here’s one to start with.
Here’s one that explains how Anchor Boxes, a key part of the algorithm, work. Here’s another that says more about Anchor Boxes and also describes the training dataset.