I’m trying to understand what the minimum and maximum object sizes are that YOLO is capable of detecting. Are anchor boxes somehow related to this?
I have my own dataset of scanned documents. Sometimes a particular object spans basically the whole page and thus takes up 90-95% of the image. I’m having great difficulty training the YOLO model to recognize these large objects.
Am I missing any basics here? Are there any relevant topics I should study to understand more deeply what is going on?
Thank you in advance,
Anchor boxes don’t impose a hard limit on the size of predicted objects, but they definitely influence the predictions during training. Training examples whose shapes differ significantly from your anchor box shapes will produce large errors early in training, possibly leading to instability and certainly increasing training time and cost.
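To make the mismatch concrete, here's a small sketch comparing a near-full-page box against a set of anchor shapes using the usual center-aligned IoU. The anchor values are made-up examples, not from any real model:

```python
# Sketch: IoU between anchor shapes and a ground-truth box, with widths
# and heights normalized to image size. Anchor values are hypothetical.

def wh_iou(w1, h1, w2, h2):
    # IoU of two boxes aligned at a common center, as used for anchor matching
    inter = min(w1, w2) * min(h1, h2)
    union = w1 * h1 + w2 * h2 - inter
    return inter / union

anchors = [(0.05, 0.08), (0.15, 0.20), (0.40, 0.55)]  # hypothetical anchors
gt = (0.93, 0.95)  # an object covering ~90-95% of the page

for w, h in anchors:
    print(round(wh_iou(w, h, *gt), 3))
```

Even the largest of these anchors overlaps the full-page box by only about 25% IoU, so the box regressor has to make a big correction from any prior, which is exactly the "large imprecision early in training" situation.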
Are you computing your own anchor box shapes to incorporate these large objects? Following the approach described by the YOLO paper authors regarding separating classification from localization early in the training? Using their ideas about optimization and learning rates? Etc. I learned a lot from trying to train a YOLO from scratch but never got satisfactory localization results - though my problems were more acute with multiple small objects, not huge ones. I’d love to hear that yours worked better.
This is very interesting, thank you. I’m using the k-means algorithm to generate anchor boxes from training data. What surprised me is that my hunch about problems with large objects was probably right.
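For reference, k-means anchor generation with 1 − IoU as the distance (the YOLOv2 approach) can be sketched roughly like this; the box data below is synthetic and the function names are my own:

```python
import numpy as np

# Minimal sketch of k-means anchor generation using 1 - IoU as the
# distance metric. `boxes` holds (w, h) pairs of training labels,
# normalized to image size; the data at the bottom is synthetic.

def iou_wh(boxes, anchors):
    # Center-aligned IoU between every box and every anchor, shape (N, k)
    inter = np.minimum(boxes[:, None, 0], anchors[None, :, 0]) * \
            np.minimum(boxes[:, None, 1], anchors[None, :, 1])
    areas = boxes[:, 0] * boxes[:, 1]
    union = areas[:, None] + anchors[:, 0] * anchors[:, 1] - inter
    return inter / union

def kmeans_anchors(boxes, k, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    anchors = boxes[rng.choice(len(boxes), k, replace=False)]
    for _ in range(iters):
        # Assign each box to the anchor with the highest IoU (lowest 1 - IoU)
        assign = np.argmax(iou_wh(boxes, anchors), axis=1)
        for j in range(k):
            if np.any(assign == j):
                anchors[j] = boxes[assign == j].mean(axis=0)
    return anchors

boxes = np.array([[0.05, 0.08], [0.06, 0.07], [0.5, 0.6],
                  [0.9, 0.95], [0.92, 0.9]])
print(kmeans_anchors(boxes, k=2))
```

One thing worth checking with data like yours: if full-page boxes are rare, they may get averaged into a cluster dominated by smaller boxes, leaving no anchor near the page-sized extreme.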
Here is what I did as an experiment:
- I resized every image to about 20% of its original size
- I padded it back to the original size with black
And all of a sudden, YOLO works like a charm.
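The shrink-and-pad experiment above can be sketched like this (pure NumPy with a nearest-neighbor downscale for simplicity; function name and scale factor are my own):

```python
import numpy as np

# Sketch of the experiment: downscale the image to ~20% of its size
# (nearest-neighbor here for simplicity) and pad it back to the original
# dimensions with black. Note that box labels must be scaled by the same
# factor to stay aligned with the shrunken content.

def shrink_and_pad(img, scale=0.2):
    h, w = img.shape[:2]
    new_h, new_w = max(1, int(h * scale)), max(1, int(w * scale))
    # nearest-neighbor downscale via index sampling
    ys = (np.arange(new_h) / scale).astype(int).clip(0, h - 1)
    xs = (np.arange(new_w) / scale).astype(int).clip(0, w - 1)
    small = img[ys][:, xs]
    out = np.zeros_like(img)      # black canvas at the original size
    out[:new_h, :new_w] = small   # paste the shrunken image at the top-left
    return out

img = np.full((100, 100, 3), 255, dtype=np.uint8)  # white dummy page
padded = shrink_and_pad(img)  # bright 20x20 block, black elsewhere
```

This also hints at what was going wrong: after shrinking, the objects land in the size range your anchors (and the network's receptive field) handle well, which is consistent with the anchor-mismatch explanation above.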