Why do we need to normalize data in the gradient descent algorithm?

Why do we need to normalize data in the gradient descent algorithm?

Since normalization uses the mean and standard deviation:
why do we use the mean instead of the median?

Thank you in advance

We need to normalize data because it reduces oscillations during gradient descent and lowers the computational cost (values in a range like 0 to 1 are easier to work with than huge numbers), which overall helps and speeds up convergence to a minimum.
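As a rough illustration of the two common ways to rescale a feature, here is a minimal sketch using only the Python standard library. The function names `min_max_scale` and `standardize` and the sample values are made up for this example; both functions assume the feature is not constant.

```python
from statistics import mean, pstdev

def min_max_scale(values):
    """Rescale values linearly into the range [0, 1] (assumes max > min)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def standardize(values):
    """Rescale values to zero mean and unit standard deviation (z-scores).

    Uses the population standard deviation and assumes it is non-zero.
    """
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

# Hypothetical feature with large raw values (e.g. house sizes).
sizes = [850.0, 1200.0, 1500.0, 2100.0, 3000.0]

scaled_01 = min_max_scale(sizes)  # smallest value maps to 0, largest to 1
z_scores = standardize(sizes)     # centered on 0 with spread 1

print(scaled_01)
print(z_scores)
```

Min-max scaling gives the 0 to 1 range mentioned above, while the mean/standard deviation version is the one the question asks about.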

The Gaussian distribution is defined in terms of the mean, and the mean is in general different from the median. Using the mean gives a more balanced normalization than using the median.

Also, normalizing the features allows us to use a larger learning rate without risk of the solution diverging due to excessively large gradients for individual features.
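To make the learning-rate point concrete, here is a small sketch (not a definitive implementation) that runs plain batch gradient descent for linear regression on the same made-up data, once with raw features and once with standardized ones. The helper names, the data, and the chosen learning rates are all assumptions for illustration.

```python
import math
from statistics import mean, pstdev

def standardize(column):
    """Z-score a feature column (assumes it is not constant)."""
    mu, sigma = mean(column), pstdev(column)
    return [(v - mu) / sigma for v in column]

def gradient_descent(rows, targets, learning_rate, steps=500):
    """Batch gradient descent on the MSE of a linear model.

    Returns (last_loss, diverged); diverged is True once the loss
    is no longer a finite number.
    """
    n, d = len(rows), len(rows[0])
    weights, bias = [0.0] * d, 0.0
    loss = math.inf
    for _ in range(steps):
        errors = [sum(w * x for w, x in zip(weights, row)) + bias - t
                  for row, t in zip(rows, targets)]
        loss = sum(e * e for e in errors) / n
        if not math.isfinite(loss):
            return loss, True
        for j in range(d):
            grad_j = 2.0 / n * sum(e * row[j] for e, row in zip(errors, rows))
            weights[j] -= learning_rate * grad_j
        bias -= learning_rate * 2.0 / n * sum(errors)
    return loss, False

# Made-up data: one small-scale and one large-scale feature, noise-free target.
rooms = [1.0, 2.0, 3.0, 4.0, 5.0]
sizes = [500.0, 1500.0, 1000.0, 2500.0, 2000.0]
targets = [3.0 * r + 0.01 * s + 5.0 for r, s in zip(rooms, sizes)]

rows_raw = [list(p) for p in zip(rooms, sizes)]
rows_scaled = [list(p) for p in zip(standardize(rooms), standardize(sizes))]

# Same learning rate on raw and on standardized features.
loss_raw, raw_diverged = gradient_descent(rows_raw, targets, learning_rate=0.1)
loss_scaled, scaled_diverged = gradient_descent(rows_scaled, targets, learning_rate=0.1)
# A much smaller learning rate keeps the raw run stable, but it learns slowly.
loss_raw_small, _ = gradient_descent(rows_raw, targets, learning_rate=1e-7)

print("raw, lr=0.1:     diverged =", raw_diverged)
print("scaled, lr=0.1:  loss =", loss_scaled)
print("raw, lr=1e-7:    loss =", loss_raw_small)
```

In this toy setup the large-scale feature dominates the gradient, so the raw run only stays stable with a tiny learning rate and then makes little progress on the other parameters within the same number of steps.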

In addition to @gent.spah's excellent response, you can also check that his statement is true by taking a look at the definition of the Gaussian (normal) distribution, on which the normalization step relies, with the standard deviation $\sigma$ and the mean $\mu$:

$$\frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}$$

So the Gaussian probability density function is defined by the mean, not by the median. In a normal distribution this makes no difference, though, since the mean and median are identical due to its perfect symmetry. That being said, the median and mean can of course differ in a data set that lacks this symmetry, especially if the data does not follow a symmetric distribution like the Gaussian or Student's t-distribution.
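A quick way to see this is to compare the two statistics on a symmetric sample and on a skewed one. The numbers below are made up purely for illustration:

```python
from statistics import mean, median

symmetric = [1.0, 2.0, 3.0, 4.0, 5.0]   # evenly spread around 3
skewed = [1.0, 2.0, 3.0, 4.0, 100.0]    # same sample with one large outlier

print(mean(symmetric), median(symmetric))  # 3.0 3.0
print(mean(skewed), median(skewed))        # 22.0 3.0
```

For the symmetric sample the mean and median coincide, while the single outlier pulls the mean far away from the median in the skewed one.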

The other part of your question seems to be covered in these threads, too:

Hope that helps!

Best regards
Christian
