Vagueness & Ambiguity In Z Normalization

Hi all,

In the 2nd video of "Gradient descent in practice", the professor says we can apply z-normalization to any variable (at university we learned this as the standardization of the Gaussian distribution).

But theory says we must first establish that the numbers follow N(μ, σ^2) (and if the distribution is not normal, how do we rescale at all?) and only afterwards standardize the distribution to Z ~ N(0, 1) using the formula Z = (X - μ)/σ.

Isn’t the video vague on this point?

Looking forward to your reply.

Thanks in advance!


Hi @Menelaos_Gkikas,

You can apply z-normalization regardless of whether the variable is normally distributed or not.

E.g., imagine you have several features with a Student's t-distribution, which has fatter tails than a normal distribution.

Still, after scaling with z-normalization you have made them comparable, which is beneficial for your algorithm. After all, you want your features to be in a comparable, reasonable range so that training goes smoothly and gradient descent runs more effectively, without biasing the algorithm toward high-magnitude features. See also: Questions on normalizing really huge data - #2 by Christian_Simonis
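
To make this concrete, here is a minimal sketch (not taken from the course material) that z-normalizes two non-normal features using only the Python standard library. The feature names, scales, sample size, and seed are assumptions chosen purely for illustration. The point is that the formula only uses each feature's mean and standard deviation, so it works for any distribution with a finite, non-zero standard deviation. It puts the features on a comparable scale, but it does not turn them into normal distributions.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed, only so the sketch is reproducible

def student_t(df):
    """Draw one Student's t sample as normal / sqrt(chi-square / df)."""
    chi2 = random.gammavariate(df / 2, 2)  # chi-square with df degrees of freedom
    return random.gauss(0, 1) / math.sqrt(chi2 / df)

def z_normalize(values):
    """Return (x - mean) / std for every x; no normality assumption is used."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return [(x - mu) / sigma for x in values]

# Two hypothetical features on very different scales, neither of them normal
heavy_tailed = [1000.0 * student_t(5) for _ in range(10_000)]  # fat tails
skewed = [random.expovariate(100.0) for _ in range(10_000)]    # right-skewed

z_heavy = z_normalize(heavy_tailed)
z_skewed = z_normalize(skewed)

# After scaling, both features have mean close to 0 and std close to 1
for name, z in [("heavy_tailed", z_heavy), ("skewed", z_skewed)]:
    print(name, statistics.fmean(z), statistics.pstdev(z))
```

Before scaling, the first feature has values in the thousands and the second in the hundredths. Afterwards both are centered at roughly 0 with a spread of roughly 1, while the fat tails and the skew are still there.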

If you have really strange distribution shapes, you would have to consider whether other scaling approaches (such as min/max scaling) might work better.
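
For comparison, min/max scaling could be sketched like this. The function name and the sample values are hypothetical, and this is just one common variant that maps each feature to the range [0, 1].

```python
def min_max_scale(values):
    """Map values linearly so the minimum becomes 0 and the maximum becomes 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature has no range to rescale
        raise ValueError("cannot min/max scale a constant feature")
    return [(x - lo) / (hi - lo) for x in values]

feature = [2.0, 4.0, 10.0, 50.0]  # hypothetical raw feature values
scaled = min_max_scale(feature)
print(scaled)  # first entry is 0.0, last entry is 1.0
```

Note that a single extreme value sets the range here, so with strong outliers most of the remaining values can end up squeezed close to 0.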

Hope that helps! Please let me know if you have any further questions.

Best regards