My doubt is: when should I use Z-score normalisation, and when should I use mean normalisation?
Please elaborate on why we need both, and which normalisation we should use for all practical purposes.
Regards
This is a duplicate of other questions on the same topic.
The choice between these methods often comes down to the structure of your data and the specific algorithm you're using. Z-score normalization is useful when you care about the standard deviation and want features with a standardized distribution, while mean normalization is simpler and often effective when you just want to rescale values into a fixed range, such as -1 to 1. You may want to:
Use z-score normalization when your data follows a Gaussian distribution, or when the algorithm assumes normally distributed features (for example k-means clustering, logistic regression, SVMs, or neural networks).
Use mean normalization when the data distribution is unknown or arbitrary (non-Gaussian data) and you just need to scale the features so they have a balanced impact on model training (e.g., recommender systems). It's often sufficient for algorithms that don't make strong assumptions about the data distribution, when you only care about centering the data and keeping it within a certain range (like -1 to 1). A short sketch of both formulas follows below.
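To make the two formulas concrete, here is a minimal NumPy sketch (the numbers are invented for illustration; z-score divides by the standard deviation, mean normalization divides by the range):

```python
import numpy as np

# Toy feature column (values are made up for illustration).
x = np.array([2104.0, 1416.0, 1534.0, 852.0, 3000.0])

# Z-score normalization: subtract the mean, divide by the standard deviation.
# The result has mean 0 and standard deviation 1.
x_zscore = (x - x.mean()) / x.std()

# Mean normalization: subtract the mean, divide by the range (max - min).
# The result is centered at 0 and lies roughly within [-1, 1].
x_meannorm = (x - x.mean()) / (x.max() - x.min())

print(x_zscore)
print(x_meannorm)
```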
Sir,
Do you mean logistic regression expects us to give it data that follows a Gaussian distribution as input?
Also SVMs?
I am surprised Dr Andrew never mentioned this in the course.
There was no mention of using z-score specifically for logistic regression.
You are right to be surprised: logistic regression and SVMs do not require the input data to follow a Gaussian (normal) distribution. My earlier explanation may have confused you about when z-score normalization is appropriate. Z-score normalization isn't required for logistic regression or SVMs, but feature scaling in general is helpful to ensure that the optimization process works efficiently. It's not about the distribution of the data; it's about putting the features on comparable scales so gradient-based optimization isn't dominated by the largest-valued features.
Dr. Andrew Ng doesn’t specifically mention in the course that z-score normalization is required for logistic regression or SVM, because these algorithms don’t depend on the data being normally distributed. The course emphasizes feature scaling in general because it helps optimization algorithms, such as gradient descent, converge faster. Whether you use mean normalization or z-score normalization depends more on your dataset and the specific problem you’re solving.
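As an illustration, here is a minimal sketch (not from the course; scikit-learn and the toy numbers are my own choices) showing z-score scaling applied before logistic regression purely to put the features on comparable scales, not because the data needs to be Gaussian:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Two features on very different scales (values invented for illustration):
# column 0 is in the thousands, column 1 is single digits.
X = np.array([[2104.0, 3.0],
              [1416.0, 2.0],
              [1534.0, 3.0],
              [ 852.0, 2.0],
              [3000.0, 4.0]])
y = np.array([1, 0, 1, 0, 1])

# Z-score scaling puts both columns on comparable scales, which helps the
# gradient-based solver converge; it makes no assumption about normality.
X_scaled = StandardScaler().fit_transform(X)

clf = LogisticRegression().fit(X_scaled, y)
print(clf.coef_)
```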
Hope this helps!