To the best of my knowledge, there are six “versions” of YOLO:
The original, circa 2015/2016, is described here: [1506.02640] You Only Look Once: Unified, Real-Time Object Detection
The next two, v2 and YOLO 9000, are described in the same paper: [1612.08242] YOLO9000: Better, Faster, Stronger
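It was YOLO 9000 that introduced hierarchical (not mutually exclusive) classification over a label tree (WordTree). Rather than one softmax over all classes, it applies a separate softmax within each group of sibling labels. It was v3 that dropped softmax altogether in favor of independent logistic classifiers, which allows multilabel predictions.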
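To make the classification distinction concrete, here is a minimal NumPy sketch (not code from any YOLO release) contrasting a single softmax over all classes, a per-group softmax over sibling labels in the spirit of the YOLO 9000 WordTree, and independent logistic (sigmoid) outputs of the kind v3 uses. The five-label hierarchy, the grouping, and the logit values are made up for illustration.

```python
import numpy as np


def softmax(z):
    """Single softmax over the last axis: outputs sum to 1 (mutually exclusive classes)."""
    z = z - z.max(axis=-1, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def sigmoid(z):
    """Independent logistic output per class: each score is in (0, 1) on its own."""
    return 1.0 / (1.0 + np.exp(-z))


def grouped_softmax(z, groups):
    """Separate softmax within each group of sibling labels, giving
    conditional probabilities P(label | parent) as in a WordTree-style hierarchy."""
    out = np.empty_like(z, dtype=float)
    for idx in groups:
        out[idx] = softmax(z[idx])
    return out


# Hypothetical toy hierarchy: "animal" and "vehicle" are siblings under the root,
# "dog" and "cat" are children of "animal", and "car" is the only child of "vehicle".
labels = ["animal", "vehicle", "dog", "cat", "car"]
groups = [[0, 1], [2, 3], [4]]
logits = np.array([2.0, -1.0, 1.5, 0.5, 0.0])  # made-up network outputs

flat = softmax(logits)                # one distribution over all five labels
independent = sigmoid(logits)         # five unrelated probabilities
p = grouped_softmax(logits, groups)   # each sibling group sums to 1

# Absolute probability of a leaf = product of conditionals along its path to the root.
p_dog = p[labels.index("dog")] * p[labels.index("animal")]
print(flat, independent, p, p_dog)
```

In the grouped version, predicting “dog” with high confidence also implies “animal”, because the leaf probability is built from its ancestors' conditionals; a single flat softmax cannot express that.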
The last paper from the original team is known as v3 and is described here: [1804.02767] YOLOv3: An Incremental Improvement
After v3, lead researcher Joseph Redmon stopped doing computer vision research, citing concerns about military applications and privacy. Since then there have been two additional releases, both by other people.
v4: [2004.10934] YOLOv4: Optimal Speed and Accuracy of Object Detection
v5: https://docs.ultralytics.com/
I’m not aware of a published paper for v5. Some places on the web refer to a ‘new’, ‘modified’, or ‘improved’ v5, but nothing I have seen claims to be a new “version.”
The lectures and code in this class are based on v2. It’s important to provide that context whenever asking questions about or discussing YOLO, because there are some substantial differences between the versions.
Regarding the question on anchor boxes and classes: in YOLO v2, anchor boxes are indeed class independent. The number to use and their sizes are determined by exploratory data analysis on the training set (the v2 paper runs k-means clustering on the widths and heights of the training boxes, using 1 - IoU as the distance). Picking good anchors affects training effectiveness and efficiency because each predicted box's width and height are scaled from its anchor's shape, and those predicted shapes feed directly into the loss calculation. Bad anchors lead to bad predictions both during training and at runtime. I wrote up some of my experience generating anchors for a custom dataset in Deriving YOLO anchor boxes, which may help.
Cheers
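For anyone who wants to experiment, here is a minimal sketch of anchor derivation by k-means with the distance d = 1 - IoU, where boxes are compared by shape only (as if they shared a center). It assumes box shapes are (width, height) pairs in grid-cell units and that each centroid is updated with its cluster's mean (some implementations use the median instead). It also includes the v2 size decoding b_w = p_w · e^(t_w), b_h = p_h · e^(t_h) to show how an anchor's shape scales every prediction made from it. The function names and the synthetic data are hypothetical, not taken from the course code.

```python
import numpy as np


def iou_wh(boxes, anchors):
    """IoU between box shapes, treating every box as if centered at the same point.
    boxes: (N, 2) array of (w, h); anchors: (K, 2) array of (w, h) -> (N, K) IoUs."""
    inter = (np.minimum(boxes[:, None, 0], anchors[None, :, 0])
             * np.minimum(boxes[:, None, 1], anchors[None, :, 1]))
    area_boxes = (boxes[:, 0] * boxes[:, 1])[:, None]
    area_anchors = (anchors[:, 0] * anchors[:, 1])[None, :]
    return inter / (area_boxes + area_anchors - inter)


def kmeans_anchors(boxes, k, iters=100, seed=0):
    """k-means over (w, h) with distance d = 1 - IoU; returns (anchors, mean best IoU)."""
    rng = np.random.default_rng(seed)
    # Initialize the centroids with k distinct training boxes.
    anchors = boxes[rng.choice(len(boxes), size=k, replace=False)].astype(float)
    for _ in range(iters):
        # Assign each box to the anchor at the smallest distance 1 - IoU.
        assign = np.argmin(1.0 - iou_wh(boxes, anchors), axis=1)
        # Assumption: update each centroid with its cluster's mean shape;
        # an empty cluster keeps its previous centroid.
        new = np.array([boxes[assign == j].mean(axis=0) if np.any(assign == j) else anchors[j]
                        for j in range(k)])
        if np.allclose(new, anchors):
            break
        anchors = new
    mean_iou = iou_wh(boxes, anchors).max(axis=1).mean()
    return anchors, mean_iou


def decode_wh(t_wh, anchors):
    """YOLO v2 size decoding: b_w = p_w * exp(t_w), b_h = p_h * exp(t_h).
    The anchor (prior) shape scales whatever offsets t_w, t_h the network predicts."""
    return anchors * np.exp(t_wh)


# Synthetic, made-up box shapes in grid-cell units: one tall group and one wide group.
rng = np.random.default_rng(42)
tall = rng.normal([1.0, 3.0], 0.2, size=(100, 2))
wide = rng.normal([3.0, 1.0], 0.2, size=(100, 2))
boxes = np.abs(np.vstack([tall, wide]))

anchors, mean_iou = kmeans_anchors(boxes, k=2)
print(anchors, mean_iou)
print(decode_wh(np.zeros(2), anchors))  # t_w = t_h = 0 reproduces the anchor shapes
```

With a real dataset you would run this for several values of k and compare the mean IoU against the cost of more anchors; the v2 paper settled on k = 5 as a trade-off between model complexity and recall.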