Hi everyone,
In the week 3 assignment, why do we divide the image array by 127.5 and also subtract one?
def read_image_tfds(image, bbox):
    image = tf.cast(image, tf.float32)
    shape = tf.shape(image)
    factor_x = tf.cast(shape[1], tf.float32)   # original width
    factor_y = tf.cast(shape[0], tf.float32)   # original height
    image = tf.image.resize(image, (224, 224))
    image = image / 127.5
    image -= 1                                 # pixels now in [-1, 1]
    bbox_list = [bbox[0] / factor_x,
                 bbox[1] / factor_y,
                 bbox[2] / factor_x,
                 bbox[3] / factor_y]           # bbox scaled relative to the original image size
    return image, bbox_list
In this case the pixel values will be between -1 and 1. The main purpose of doing such normalization is so that the neural network can converge faster: the weights do not oscillate as much in magnitude, and less computation is needed compared to when you are dealing with bigger numbers.
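Just to illustrate the arithmetic (a tiny sketch of my own, not part of the assignment code): dividing by 127.5 maps 0..255 onto 0..2, and subtracting 1 shifts that to [-1, 1]:

import numpy as np

# Endpoints and midpoint of the 0..255 pixel range, with the same scaling as above
pixels = np.array([0.0, 127.5, 255.0])
print(pixels / 127.5 - 1)   # -> [-1.  0.  1.]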
Thank you for the answers. I have one more question.
Is there any reason for choosing the pixel range [-1, 1] over the range [0, 1], or is this choice arbitrary?
I don’t think there is much difference between the ranges [-1, 1] and [0, 1] in terms of computation and convergence; they are essentially the same, just shifted and rescaled. I suspect that with [-1, 1] there is better separation of pixel values, and this kind of normalization might also be used for some other purpose downstream, but these are just suspicions; I would have to study it thoroughly to give a more precise answer…
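For what it's worth, here is a small sketch (my own illustration, not from the course material) showing that the two ranges are related by a simple affine map, which a first layer with a bias can in principle absorb:

import numpy as np

x01 = np.linspace(0.0, 1.0, 5)    # values already scaled to [0, 1]
xpm1 = 2.0 * x01 - 1.0            # the same values rescaled to [-1, 1]
print(x01)    # [0.   0.25 0.5  0.75 1.  ]
print(xpm1)   # [-1.  -0.5  0.   0.5  1. ]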
I agree and feel that [0,1] or [-1,1] should not matter much.
One thought here is that with [0,1], the darkest pixels would definitively get scaled to “0”, which could falsely trick the network into thinking that these pixels are not important. An easy way to look at this is through the activation function g(w*x + b): if a pixel x = 0, it contributes nothing to w*x + b, and the gradient of the loss with respect to that pixel's weight is zero, which implies no learning from that pixel (there is a small sketch below illustrating this).
Of course, you could still have “0” pixels with the [-1,1] normalization, but since this range is stretched around zero, the probability of that happening is much lower.
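Here is a minimal sketch of that point (a toy example I made up, not from the assignment): with a zero input, the gradient of the loss with respect to the weight attached to that input is zero, so that weight gets no update from the example:

import tensorflow as tf

x = tf.constant([[0.0, 0.5]])            # first "pixel" is exactly 0
w = tf.Variable([[1.0], [1.0]])          # one weight per input
b = tf.Variable([0.0])

with tf.GradientTape() as tape:
    y = tf.nn.relu(tf.matmul(x, w) + b)  # g(w*x + b) as above
    loss = tf.reduce_sum(y)

dw, db = tape.gradient(loss, [w, b])
print(dw.numpy())  # [[0.], [0.5]] -- no gradient reaches the weight of the zero input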
Let me know if this makes sense.