While preparing the data, why do we normalize X_train and X_test by dividing by 255?
That is one of the standard ways to normalize image data. The raw pixel values are unsigned 8-bit integers, so they range from 0 to 255. If you train on the raw values, the gradients get very large and it’s very difficult to get convergence. Try it for yourself and watch what happens: you need an incredibly low learning rate in order not to overshoot, but then it takes an outrageous number of iterations to converge, if you can converge at all.
Dividing by 255 gives you floating point numbers between 0.0 and 1.0, which allows gradient descent to work much better. Because this style of normalization is so common, most image rendering functions will directly handle images normalized this way.
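If it helps to see this concretely, here is a minimal sketch using NumPy. The array `X_train_orig` is random stand-in data rather than the real dataset, and the single linear unit with a squared-error loss is only an illustrative assumption, not the model from the assignment. It shows the scaling itself and how much larger the gradient is on raw pixel values.

```python
import numpy as np

# Hypothetical stand-in for image data: 4 random 8x8 RGB images stored as uint8.
rng = np.random.default_rng(0)
X_train_orig = rng.integers(0, 256, size=(4, 8, 8, 3), dtype=np.uint8)

# Convert to float first, then scale the pixel values into [0, 1].
X_train = X_train_orig.astype(np.float64) / 255.0

print(X_train_orig.dtype, X_train_orig.min(), X_train_orig.max())
print(X_train.dtype, X_train.min(), X_train.max())

# Illustrative assumption: one linear unit with loss 0.5 * (w.x - y)**2.
x_raw = X_train_orig[0].reshape(-1).astype(np.float64)  # one flattened raw image
x_norm = X_train[0].reshape(-1)                         # same image, normalized
w = rng.normal(scale=0.01, size=x_raw.shape)            # small random weights
y = 0.0                                                 # arbitrary target


def grad(w, x, y):
    """Gradient of 0.5 * (w.x - y)**2 with respect to w."""
    return (w @ x - y) * x


# With y = 0 the gradient scales with the square of the input scale,
# so this ratio comes out at about 255**2 = 65025.
ratio = np.linalg.norm(grad(w, x_raw, y)) / np.linalg.norm(grad(w, x_norm, y))
print(ratio)
```

The same learning rate therefore takes a vastly bigger step on the raw data, which is why you would have to shrink it so drastically to avoid overshooting.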
I see. Thanks a lot for the explanation.