Why do we need to normalize data in gradient descent algorithm?

Romain_Montoriol · May 31, 2023, 12:35pm

Why do we need to normalize data in the gradient descent algorithm?

Considering the use of mean and standard deviation:
why do we use the mean instead of the median?

Thank you in advance

gent.spah · May 31, 2023, 1:10pm

We need to normalize data because it reduces oscillations during gradient descent and the compute required (a range like 0 to 1 is better than huge numbers), which overall helps and speeds up convergence to a minimum.

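The Gaussian distribution is defined in terms of the mean, and the mean is in general different from the median. Using the mean makes the normalization more balanced than using the median: subtracting the mean centers each feature exactly at zero.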
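
For illustration, here is a minimal sketch of z-score normalization using the mean and standard deviation. The function name `z_score` and the sample values in `sizes` are made up for this example, and it assumes the population standard deviation is used.

```python
import statistics

def z_score(values):
    """Return (v - mean) / std for every value (hypothetical helper)."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)  # population standard deviation (assumption)
    if sigma == 0:
        raise ValueError("cannot normalize a constant feature")
    return [(v - mu) / sigma for v in values]

# Made-up feature values on a large scale.
sizes = [2104.0, 1416.0, 1534.0, 852.0, 1940.0, 1100.0]
scaled = z_score(sizes)

# After scaling, the feature has mean ~0 and standard deviation ~1.
print(statistics.fmean(scaled), statistics.pstdev(scaled))
```

After this step every feature lives on a comparable scale, whatever its original units were.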

TMosh · May 31, 2023, 3:20pm

Also, normalizing the features allows us to use a larger learning rate without risk of the solution diverging due to excessively large gradients for individual features.
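
A rough sketch of this effect, assuming plain batch gradient descent on a linear regression with a squared-error cost. The data set, the helper names (`z_score_columns`, `gradient_descent`) and the learning rate of 0.1 are all invented for the example; the point is only that the same learning rate diverges on the raw features and converges on the normalized ones.

```python
import statistics

def z_score_columns(rows):
    """Standardize each column to mean 0 and population std 1 (hypothetical helper)."""
    cols = list(zip(*rows))
    mus = [statistics.fmean(c) for c in cols]
    sigmas = [statistics.pstdev(c) for c in cols]
    return [[(v - m) / s for v, m, s in zip(row, mus, sigmas)] for row in rows]

def gradient_descent(rows, targets, lr, steps, blowup=1e30):
    """Batch gradient descent for linear regression; returns the cost history."""
    n, d = len(rows), len(rows[0])
    w, b = [0.0] * d, 0.0
    costs = []
    for _ in range(steps + 1):
        errors = [sum(wj * xj for wj, xj in zip(w, x)) + b - t
                  for x, t in zip(rows, targets)]
        cost = sum(e * e for e in errors) / (2 * n)
        costs.append(cost)
        if cost > blowup:  # stop once the iterates have clearly diverged
            break
        grad_w = [sum(e * x[j] for e, x in zip(errors, rows)) / n for j in range(d)]
        grad_b = sum(errors) / n
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return costs

# Made-up data: [size, bedrooms] -> price. The two features differ widely in scale.
X_raw = [[2104, 5], [1416, 3], [1534, 3], [852, 2], [1940, 4], [1100, 2]]
y = [460, 232, 315, 178, 400, 210]

costs_raw = gradient_descent(X_raw, y, lr=0.1, steps=500)
costs_norm = gradient_descent(z_score_columns(X_raw), y, lr=0.1, steps=500)

print("raw:        stopped after", len(costs_raw) - 1, "steps, cost", costs_raw[-1])
print("normalized: ran", len(costs_norm) - 1, "steps, cost", costs_norm[-1])
```

On the raw features the large-scale column dominates the gradient, so the cost explodes within a few steps; to make it converge you would have to shrink the learning rate by several orders of magnitude.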

Christian_Simonis · May 31, 2023, 6:21pm

In addition to @gent.spah's excellent response, you can also check that his statement is true by taking a look at the definition of the Gaussian normal distribution (on which the normalization step relies) with the standard deviation $\sigma$ and the mean $\mu$:

$\displaystyle \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}$

source

So the Gaussian probability density function is defined by the mean, not by the median. In the end this does not matter for a normal distribution, since the mean and median are identical due to its perfect symmetry. That being said, the median and mean can of course differ in a data set without this symmetry, that is, when the data does not follow a symmetric distribution like the Gaussian or Student's t-distribution.
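
A tiny sketch of that last point, using two made-up samples (the names `symmetric` and `skewed` are just for this example): in the symmetric one the mean and median coincide, while a single large value pulls the mean away from the median.

```python
import statistics

symmetric = [1.0, 2.0, 3.0, 4.0, 5.0]   # symmetric around 3
skewed = [1.0, 2.0, 3.0, 4.0, 50.0]     # one large value on the right

print(statistics.fmean(symmetric), statistics.median(symmetric))  # 3.0 3.0
print(statistics.fmean(skewed), statistics.median(skewed))        # 12.0 3.0
```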

The other part of your question seems to be covered in these threads, too:

Hope that helps!

Best regards
Christian
