Object detection with varying object sizes

In a data-centric approach we iteratively clean the data and continuously monitor for concept drift and data drift. When detecting objects that span a wide range of scales (from very small to very large: very close to the camera and in focus, or very far away), which is better: a) training on several scales together (e.g., small and medium objects), or b) solving for a single scale first and adding more scales later?
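For context, before choosing either option it can help to measure how the dataset is actually distributed across scales. Below is a minimal sketch (not from the original post) that buckets ground-truth boxes using COCO-style area thresholds (small < 32², medium < 96², large otherwise); the box list is hypothetical example data.

```python
from collections import Counter

# COCO-style area thresholds (in pixels^2) for size buckets.
SMALL_MAX = 32 ** 2   # area < 1024  -> "small"
MEDIUM_MAX = 96 ** 2  # area < 9216  -> "medium", else "large"

def size_bucket(width: float, height: float) -> str:
    """Classify a bounding box into a COCO-style size bucket by area."""
    area = width * height
    if area < SMALL_MAX:
        return "small"
    if area < MEDIUM_MAX:
        return "medium"
    return "large"

def scale_distribution(boxes):
    """Count boxes per size bucket; boxes is a list of (w, h) tuples."""
    return Counter(size_bucket(w, h) for w, h in boxes)

# Hypothetical ground-truth annotations: (width, height) per box.
boxes = [(10, 12), (50, 40), (200, 150), (8, 8), (30, 35)]
print(scale_distribution(boxes))
# -> Counter({'small': 2, 'medium': 2, 'large': 1})
```

If one bucket dominates (or is nearly empty), that imbalance is itself a data-centric signal for which option to try first.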