Let’s say I have 10,000 images that I want to use to train a cat classifier. Of these, 2,000 contain a cat, and the other 8,000 have similar backgrounds but no cat.
I was wondering whether having more ‘negative’ (no cat) examples than ‘positive’ (cat) examples causes any problems. Are there any guidelines on how the training images should be distributed across the labels?
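To put numbers on the split, here is a minimal sketch that uses only the counts above. It computes the share of each label, the accuracy a trivial model would get by always answering "no cat", and inverse-frequency class weights of the form `total / (n_classes * count)`. That weighting formula is the one scikit-learn documents for `class_weight="balanced"`. It is included here as one common convention, not as a recommendation, and the label names `cat` and `no_cat` are just placeholders.

```python
# Label counts from the scenario above: 2,000 cat images, 8,000 non-cat images.
# The label names are placeholders chosen for this sketch.
counts = {"cat": 2000, "no_cat": 8000}

total = sum(counts.values())
n_classes = len(counts)

# Share of each label in the training set.
proportions = {label: n / total for label, n in counts.items()}

# Accuracy of a trivial model that always predicts the majority label ("no_cat").
majority_baseline = max(counts.values()) / total

# Inverse-frequency weights: total / (n_classes * count).
# This matches the formula scikit-learn uses for class_weight="balanced".
class_weights = {label: total / (n_classes * n) for label, n in counts.items()}

print(proportions)        # {'cat': 0.2, 'no_cat': 0.8}
print(majority_baseline)  # 0.8
print(class_weights)      # {'cat': 2.5, 'no_cat': 0.625}
```

The 0.8 baseline is why plain accuracy can look good on this split even for a model that never detects a cat.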