In the “show_tensor_images” function in the exercise, the first line of code looks like this:
image_tensor = (image_tensor + 1) / 2
But I don’t understand why we need to add 1 to image_tensor
and then halve the result. Why don’t we just use image_tensor
directly? Can someone help explain?
Many thanks!
It depends on the normalization of the image_tensor. I am guessing it’s in [-1, 1], so this moves it into the range [0, 1]. Normalizing like this is better for convergence in general.
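Here’s a minimal sketch of what that line does, assuming a PyTorch tensor (the values below are made up just to show the mapping):

```python
import torch

# Hypothetical example values; the real image_tensor comes from the notebook.
image_tensor = torch.tensor([-1.0, -0.5, 0.0, 0.5, 1.0])

# The line from show_tensor_images: an affine map from [-1, 1] to [0, 1]
rescaled = (image_tensor + 1) / 2
print(rescaled)  # tensor([0.0000, 0.2500, 0.5000, 0.7500, 1.0000])
```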
Yes, it’s exactly as Gent says: you have to check how the image data is normalized. Raw RGB images have unsigned 8-bit integer values, so they range from 0 to 255. That gives terrible convergence, so the common practices are either to normalize the images to the range [-1, 1], as was done here, or simply to divide the values by 255. In the latter case you’d get the range [0, 1], which also gives good convergence and has the added benefit that the standard image rendering algorithms can handle that representation directly.

Here they used [-1, 1], which is why they need to perform that rescaling in order for the image rendering to work properly. Try it without doing that and watch what happens! Just comment out that line and rerun the rendering.
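For context, here’s a sketch of the kind of preprocessing that typically produces the [-1, 1] range in the first place, plus the inverse used for display. This assumes torchvision and single-channel images; your notebook’s actual transform may differ:

```python
import torch
from torchvision import transforms

# Hypothetical version of the usual preprocessing: ToTensor() converts
# uint8 pixels in [0, 255] to floats in [0, 1], and Normalize with
# mean=0.5, std=0.5 then maps [0, 1] to [-1, 1].
transform = transforms.Compose([
    transforms.ToTensor(),                 # [0, 255] uint8 -> [0, 1] float
    transforms.Normalize((0.5,), (0.5,)),  # [0, 1] -> [-1, 1]
])

def denormalize(image_tensor):
    # Inverse of Normalize(0.5, 0.5): x * 0.5 + 0.5, i.e. (x + 1) / 2,
    # which is exactly the line in show_tensor_images.
    return (image_tensor + 1) / 2
```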
Thanks very much for your answers!