Why do we need to normalize data in the gradient descent algorithm?

In addition to @gent.spah's excellent response, you can verify his statement yourself by looking at the definition of the Gaussian (normal) distribution, on which the normalization step relies, with standard deviation $\sigma$ and mean $\mu$:

$$\frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}$$


So the Gaussian probability density function is defined in terms of the mean, not the median. For a normal distribution this makes no difference, since the mean and median coincide because of its perfect symmetry. That said, the median and mean can of course differ in a data set that lacks this symmetry, that is, when the data does not follow a symmetric distribution such as the Gaussian or Student's t distribution.
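If you want to check this numerically, here is a minimal sketch using only the Python standard library. The sample sizes, the distribution parameters, and the helper names `mean_median_gap` and `standardize` are my own assumptions for illustration and are not taken from the course material. It compares the mean and median for a symmetric (Gaussian) sample and a skewed (exponential) sample, then standardizes the skewed sample with its mean and standard deviation.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is reproducible

# Symmetric sample: draws from a normal distribution (mu = 5, sigma = 2).
symmetric = [random.gauss(5.0, 2.0) for _ in range(10_000)]

# Skewed sample: draws from an exponential distribution with rate 1
# (theoretical mean 1, theoretical median ln 2, so they differ).
skewed = [random.expovariate(1.0) for _ in range(10_000)]


def mean_median_gap(data):
    """Absolute difference between the sample mean and the sample median."""
    return abs(statistics.mean(data) - statistics.median(data))


def standardize(data):
    """Z-score normalization: subtract the mean, divide by the std deviation."""
    mu = statistics.mean(data)
    sigma = statistics.pstdev(data)  # population standard deviation
    return [(x - mu) / sigma for x in data]


gap_symmetric = mean_median_gap(symmetric)
gap_skewed = mean_median_gap(skewed)
z = standardize(skewed)

print(f"mean-median gap, symmetric sample: {gap_symmetric:.4f}")
print(f"mean-median gap, skewed sample:    {gap_skewed:.4f}")
print(f"standardized mean: {statistics.mean(z):.4f}, std: {statistics.pstdev(z):.4f}")
```

For the symmetric sample the gap should be close to zero, while for the skewed sample it should be clearly larger. In both cases the standardized data ends up with mean 0 and standard deviation 1, because the normalization uses the mean and not the median.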

The other part of your question seems to be covered in these threads, too:

Hope that helps!

Best regards
