C3_W2: Distribution of images across different labels in training set

Let’s say I have 10,000 images that I want to use to train a cat classifier. Of those 10,000 images, 2,000 are cat images, and the other 8,000 have a similar background but no cat.

I was wondering whether there is any issue with having more ‘negative’ examples than ‘positive’ ones. Are there any guidelines on how the images should be distributed across the labels in the training set?

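Generally you would like to have roughly equal numbers of ‘true’ and ‘false’ cases.
But some amount of skew is acceptable. As a rule of thumb, about 10% ‘true’ examples seems to be a good working lower limit. Your split of 2,000 out of 10,000 is 20% ‘true’, so it is within that limit.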

How much skew is acceptable depends somewhat on the total number of examples you have for training.
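
To check a dataset against the rough 10% limit mentioned above, a minimal sketch like the following may help. It assumes the labels are available as a plain list of 0/1 values; the names `positive_fraction`, `balanced_class_weights`, and `MIN_POSITIVE_FRACTION` are hypothetical and chosen for illustration. The inverse-frequency class weights are not something recommended in this thread; they are just one common way to compensate for skew, shown here as an option.

```python
from collections import Counter


def positive_fraction(labels):
    """Return the share of examples labelled 1 (the 'true' class)."""
    if not labels:
        raise ValueError("labels must not be empty")
    return sum(1 for y in labels if y == 1) / len(labels)


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_total / (n_classes * n_in_class)."""
    counts = Counter(labels)
    n_total = len(labels)
    return {cls: n_total / (len(counts) * n) for cls, n in counts.items()}


# Hypothetical label list matching the question: 2,000 cats, 8,000 non-cats.
labels = [1] * 2000 + [0] * 8000

# Assumption: the rough working limit suggested in the reply above.
MIN_POSITIVE_FRACTION = 0.10

frac = positive_fraction(labels)
weights = balanced_class_weights(labels)

print(f"positive fraction: {frac:.2f}")
print(f"meets suggested limit: {frac >= MIN_POSITIVE_FRACTION}")
print(f"class weights: {weights}")
```

With these weights, each cat image would count about four times as much as each non-cat image in a weighted loss, so both classes contribute equally in total. Whether that helps in practice is something to verify on a validation set.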