The image cropping code extends images when necessary with all-zero (black) pixels. Might a classifier using these cropped and extended images work better if the extensions used gray rather than black pixels? Gray ought to be statistically closer to average image pixel values.
Hi @jlammens,
Thank you for the question. This is indeed an interesting one.
My gut feeling is that it would probably make little difference either way, as the neural network would simply learn to disregard the padding. That said, be careful to use the same background in the training data as in the test data: a type of background the network was not trained on might easily confuse it.
I was curious about your question and did a bit of a search through the literature. I couldn't find a direct comparison of what you are asking (though I assume that if gray were better, people would already be using it), but I found this one, which might be interesting:
They compared rescaling the image to padding it. Their main conclusion regarding padding is as follows:
“Our study showed that zero-padding had no effect on the classification accuracy but considerably reduced the training time. The reason is that neighboring zero input units (pixels) will not activate their corresponding convolutional unit in the next layer.”
So if their claim is correct, black padding doesn't make results worse, but it does make training faster. And speed is in many cases a very important factor. Of course, every case is slightly different, and in our case we are actually doing both rescaling and padding, which complicates the theoretical picture a bit.
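The quoted claim about zero inputs not activating downstream units can be checked directly with a tiny NumPy sketch. The example below is hypothetical (a naive, bias-free "valid" convolution written by hand, not the course's actual model): windows whose receptive field lies entirely in the zero-padded region produce exactly zero pre-activations, so with ReLU and no bias those units stay inactive.

```python
import numpy as np

np.random.seed(0)

# Hypothetical 1-channel image: left half is content, right half is zero padding.
img = np.zeros((6, 6), dtype=np.float32)
img[:, :3] = np.random.rand(6, 3).astype(np.float32)

kernel = np.random.rand(3, 3).astype(np.float32)

# Naive "valid" convolution with no bias, just to illustrate the point.
out = np.zeros((4, 4), dtype=np.float32)
for i in range(4):
    for j in range(4):
        out[i, j] = np.sum(img[i:i + 3, j:j + 3] * kernel)

# The last output column's receptive field (input columns 3-5) is all zeros,
# so its pre-activation is exactly 0; the first column sees real content.
print(out[:, -1])  # all zeros
print(out[:, 0])   # nonzero values
```

With gray (nonzero) padding, by contrast, every window would produce a nonzero pre-activation, which is consistent with the paper's point that zero padding can reduce computation-relevant activity during training.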
If you are curious and have the capacity to dig deeper into this issue, you can try it yourself: pad the data with the mean value, retrain the network, and observe the results. In light of the article above, you could also try rescaling differently (for example, no upscaling, only downsizing when necessary). People have also tried fake backgrounds, removing backgrounds, etc. The possibilities are many.
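If you want to try the mean-value experiment, a minimal padding helper might look like the sketch below. The function name, signature, and centering choice are all hypothetical, for illustration only; it pads an HxWxC image to a square canvas, filling either with black or with the image's per-channel mean (the "gray" alternative discussed above).

```python
import numpy as np

def pad_to_square(img, mode="zeros"):
    """Pad an HxWxC image to a square canvas (hypothetical helper).

    mode="zeros" pads with black; mode="mean" pads with the image's
    per-channel mean colour.
    """
    h, w, c = img.shape
    side = max(h, w)
    if mode == "mean":
        fill = img.mean(axis=(0, 1))           # per-channel mean colour
    else:
        fill = np.zeros(c, dtype=img.dtype)    # black padding
    canvas = np.broadcast_to(fill, (side, side, c)).astype(img.dtype).copy()
    # Center the original image on the canvas.
    top = (side - h) // 2
    left = (side - w) // 2
    canvas[top:top + h, left:left + w] = img
    return canvas

# Example: a 60x100 image becomes a 100x100 canvas with mean-colour borders.
img = np.random.rand(60, 100, 3).astype(np.float32)
padded = pad_to_square(img, mode="mean")
print(padded.shape)  # (100, 100, 3)
```

Swapping `mode` between `"zeros"` and `"mean"` in the data pipeline, then retraining, would give a direct apples-to-apples comparison for your question.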
All these small nuances are what makes this field so interesting (and not yet fully explored and understood), and there are always countless possibilities for improving the algorithms. Some of them may do wonders; others may make things worse. It is up to us to try.
Interesting, thanks for your reply! Zero values (black RGB pixels) indeed imply no activation for the corresponding input ‘neurons’. For what it’s worth, this is actually the inverse of what happens in human retinas, where photoreceptors use ‘negative’ or ‘inverted’ coding with activation being inversely proportional to amount of received light…
That’s an interesting fact! Thanks for sharing.