I am training an EfficientNetB4 model on chest x-ray images for disease detection. The second layer in the EfficientNetB4 model is the normalization layer which, by default, uses the ImageNet data’s mean and variance for normalization. I am wondering whether it is really necessary to normalize images in this case. One reason I can think of is that if the images are sent to the model without normalizing (i.e. the pixel values are in the range 0-255), the values in the calculations down the line can become very large. Also, normalization is needed during inference to bring the dataset closer to the training data’s distribution. What other reasons are there? What if I only rescale the images (divide by 255 to bring each image array into the range 0-1) and do not normalize?
Normalization helps limit the range of the values of the features.
An 8-bit integer has a range of over two orders of magnitude (0 to 255).
The same data converted to a float has a range of less than one order of magnitude (0 to 1).
This makes a big difference in how well a gradient-based algorithm can converge to a solution.
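To make the convergence point concrete, here is a minimal sketch on a toy problem. It is not EfficientNet: it fits a one-feature linear model with plain gradient descent, and the function name, learning rate, step count, and synthetic target are all assumptions chosen for illustration. The same learning rate is used for raw 0-255 inputs and for inputs divided by 255.

```python
import numpy as np

def gradient_descent_loss(x, y, lr=0.1, steps=1000):
    """Fit y = w * x + b with plain gradient descent and return the final MSE."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        err = w * x + b - y
        w -= lr * 2.0 * np.mean(err * x)  # gradient of the MSE w.r.t. w
        b -= lr * 2.0 * np.mean(err)      # gradient of the MSE w.r.t. b
    return float(np.mean((w * x + b - y) ** 2))

rng = np.random.default_rng(0)
pixels = rng.integers(0, 256, size=500).astype(np.float64)  # raw 8-bit values
target = pixels / 255.0 * 3.0 + 1.0                         # synthetic target (assumption)

# Raw inputs: the updates overshoot and blow up, so overflow warnings are silenced.
with np.errstate(over="ignore", invalid="ignore"):
    loss_raw = gradient_descent_loss(pixels, target)

# Same data and same learning rate, but inputs rescaled to [0, 1].
loss_scaled = gradient_descent_loss(pixels / 255.0, target)

print("raw inputs    :", loss_raw)
print("scaled inputs :", loss_scaled)
```

With raw inputs the loss stops being a finite number, while the rescaled run settles close to zero. A real network with batch normalization and an adaptive optimizer is more forgiving than this toy model, so treat it as an illustration of the mechanism only.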
I used EfficientNet some time ago, so I am not 100% sure what the current default is, but as I remember it, the normalization layer simply divided the inputs by 255. If that is still the case, it is no different from what you are proposing, i.e. rescaling the images to the range 0-1 by dividing by 255.
I suggest you verify this: load the model, pull out the normalization layer, and test it by passing some known numbers through it.
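A rough sketch of that kind of test is below. It assumes the preprocessing is a per-channel affine map (out = scale * x + offset) and recovers the scale and offset by feeding in an all-0 and an all-255 image. The helper `probe_preprocessing` and the two stand-in functions are hypothetical, written only so the example runs without downloading a model. The mean and standard deviation are the commonly quoted ImageNet values; whether the layer in your installed version uses exactly these is something the probe would tell you.

```python
import numpy as np

def probe_preprocessing(layer_fn, channels=3, size=4):
    """Infer per-channel scale and offset, assuming out = scale * x + offset."""
    zeros = np.zeros((1, size, size, channels), dtype=np.float32)
    full = np.full((1, size, size, channels), 255.0, dtype=np.float32)
    out_zero = np.asarray(layer_fn(zeros))[0, 0, 0, :]  # response to pixel value 0
    out_full = np.asarray(layer_fn(full))[0, 0, 0, :]   # response to pixel value 255
    scale = (out_full - out_zero) / 255.0
    offset = out_zero
    return scale, offset

# Commonly quoted ImageNet channel statistics for images scaled to [0, 1].
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def rescale_only(x):
    """Hypothetical stand-in: a layer that only divides by 255."""
    return x / 255.0

def rescale_then_standardize(x):
    """Hypothetical stand-in: divide by 255, then subtract mean and divide by std."""
    return (x / 255.0 - IMAGENET_MEAN) / IMAGENET_STD

scale_a, offset_a = probe_preprocessing(rescale_only)
scale_b, offset_b = probe_preprocessing(rescale_then_standardize)

print("rescale only        :", scale_a, offset_a)
print("rescale + normalize :", scale_b, offset_b)
```

If the probe returns a scale of about 1/255 and an offset of 0, the layer only rescales. A non-zero offset means a mean is being subtracted as well. With the real model you would pass the layer you pulled out in place of the stand-ins, since Keras layers can be called directly on an array.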
Btw, I agree with your point that normalization is needed at inference time so that the inputs match what the model was trained on.
Our model needs to work with data defined in the same way as the data used for training.
If EfficientNet was trained on photos whose pixel values are defined to be in [0, 255] (taking into account that it has its own normalization layer that rescales [0, 255] into [0, 1]), then [0, 255] is also the required pixel value range for our photos at inference time.
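Here is a small sketch of what goes wrong when that range is not respected. `builtin_rescale` is a hypothetical stand-in for a model's own rescaling step, assuming it simply divides by 255. The image is random data, not a real x-ray.

```python
import numpy as np

def builtin_rescale(x):
    """Hypothetical stand-in for the model's own preprocessing: [0, 255] -> [0, 1]."""
    return x / 255.0

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(8, 8, 3)).astype(np.float32)  # raw [0, 255] pixels

as_expected = builtin_rescale(image)            # caller passes raw [0, 255] values
double_scaled = builtin_rescale(image / 255.0)  # caller already divided by 255

max_expected = float(as_expected.max())
max_double = float(double_scaled.max())

print("max after built-in rescale, raw input    :", max_expected)
print("max after built-in rescale, [0, 1] input :", max_double)
```

In the second case every value reaching the rest of the network is 255 times smaller than what it saw during training, so the image looks almost uniformly black to the model even though nothing raises an error.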
I believe you were making the same point when you said “to bring the dataset closer to the training data’s distribution”, but I would not call it the “training data’s distribution”. To me, the “training data’s distribution” is a property of the whole set of photos, whereas the range [0, 255] is a property of every single photo. So, as far as the pixel value range is concerned, “training data’s distribution” refers to something else.