Confused about normalisation

Now, I am kind of confused about the differences between min-max standardization, z-score normalization, and transformations such as the Box-Cox transformation. How do we know which method to choose when normalizing the data to prevent skewness?

Hi, @LimXiuXian96.

Normalization does not alter skewness. If you want to make your data more normal distribution-like, this may be helpful. If you’re worried about your dataset being imbalanced, take a look at this.

You may be interested in the Practical Data Science Specialization :slight_smile:


I forgot to include this link in case you were asking about the different normalization methods and their uses :nerd_face:


From the lectures, I know that normalization makes training much faster at reaching a lower cost. But I am still confused about how skewness in the data might affect us when training our deep learning model. Does it have any effect on training?

If you’re referring to the distribution of the individual features, I’d say it doesn’t. But I’ll see if I can find any references :thinking:


Ok noted, thanks a lot for the info provided.



All your questions are very important and demonstrate your understanding of how the data affects the model performance.
I’d like to point out that there is a critical difference between rescaling a feature and changing the shape of its distribution.
Min-max scaling and z-scoring both concern the scale or unit of a feature; neither corrects the shape of its distribution. For example, min-max scaling fits the feature to be within [0, 1], but this does not mean it will be normally distributed. It may still be skewed. A z-score, on the other hand, shifts and rescales the feature to mean 0 and standard deviation 1. If the feature is roughly normal, most of the data points will fall within 2\sigma of the mean, so the values will generally not lie in [0, 1]. Because both are linear rescalings, the skewness stays the same; to reduce it you need a nonlinear transformation such as Box-Cox.
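If it helps, here is a minimal sketch for checking how each transformation affects skewness. It uses only the Python standard library. The log-normal `feature` is made-up data, the helper names are my own, and `box_cox` uses a fixed lambda of 0 (a plain log) rather than fitting lambda as a library routine would.

```python
import math
import random
import statistics


def skewness(xs):
    """Population skewness: the mean of the cubed z-scores."""
    mu = statistics.fmean(xs)
    sigma = statistics.pstdev(xs)
    return statistics.fmean([((x - mu) / sigma) ** 3 for x in xs])


def min_max(xs):
    """Linearly rescale values into [0, 1]."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]


def z_score(xs):
    """Linearly rescale values to mean 0 and standard deviation 1."""
    mu = statistics.fmean(xs)
    sigma = statistics.pstdev(xs)
    return [(x - mu) / sigma for x in xs]


def box_cox(xs, lam):
    """Box-Cox transform for strictly positive data with a fixed lambda."""
    if lam == 0:
        return [math.log(x) for x in xs]
    return [(x ** lam - 1) / lam for x in xs]


random.seed(0)
# Hypothetical right-skewed feature: log-normal samples (assumption)
feature = [random.lognormvariate(0.0, 1.0) for _ in range(10_000)]

scaled = min_max(feature)
standardized = z_score(feature)

skew_raw = skewness(feature)
skew_minmax = skewness(scaled)
skew_z = skewness(standardized)
skew_boxcox = skewness(box_cox(feature, 0))

print("raw     :", skew_raw)
print("min-max :", skew_minmax)
print("z-score :", skew_z)
print("box-cox :", skew_boxcox)
```

The first three values should agree up to floating-point error, since min-max and z-score are linear, while the Box-Cox value should be close to zero for this particular data. The z-scored values also include negatives, whereas the min-max values stay within [0, 1].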

Which method you choose depends on your application and what the model requires. For example, if you want the feature to be non-negative, min-max may be better than z-score. (z-score can take a negative value.)

You are correct that normalizing/standardizing often leads to better performance and efficiency.
For a simple example, think of a linear model. Though it’s not imperative that each feature is normalized, the model with normalized features is easier to fit and to interpret, because the weights are all on the same scale. This is just as important, if not more so, for larger, more complex models such as deep learning models.
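To make the scale argument concrete, here is a rough sketch for a two-feature linear model. The data, the coefficients, and the helper names are all invented for illustration. It compares the size of the mean-squared-error gradient with respect to each weight, before and after z-scoring the features.

```python
import random
import statistics


def z_score(xs):
    """Rescale values to mean 0 and standard deviation 1."""
    mu = statistics.fmean(xs)
    sigma = statistics.pstdev(xs)
    return [(x - mu) / sigma for x in xs]


def mse_gradient(x1, x2, y, w1, w2, b):
    """Gradient of the MSE of y_hat = w1*x1 + w2*x2 + b with respect to (w1, w2)."""
    n = len(y)
    errors = [w1 * a + w2 * c + b - t for a, c, t in zip(x1, x2, y)]
    g1 = 2 / n * sum(e * a for e, a in zip(errors, x1))
    g2 = 2 / n * sum(e * c for e, c in zip(errors, x2))
    return g1, g2


random.seed(0)
n = 1000
x1 = [random.uniform(0, 1) for _ in range(n)]     # small-scale feature (assumption)
x2 = [random.uniform(0, 1000) for _ in range(n)]  # large-scale feature (assumption)
y = [2.0 * a + 0.003 * c + random.gauss(0, 0.1) for a, c in zip(x1, x2)]

# Gradients at the starting point w1 = w2 = b = 0
g1_raw, g2_raw = mse_gradient(x1, x2, y, 0.0, 0.0, 0.0)
g1_std, g2_std = mse_gradient(z_score(x1), z_score(x2), y, 0.0, 0.0, 0.0)

ratio_raw = abs(g2_raw / g1_raw)
ratio_std = abs(g2_std / g1_std)
print("gradient ratio, raw features         :", ratio_raw)
print("gradient ratio, standardized features:", ratio_std)
```

On the raw features the gradient with respect to the large-scale weight dominates by orders of magnitude, so a single learning rate cannot suit both weights. After z-scoring, the two gradients are of comparable size.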

Skewness, on the other hand, should not affect performance in terms of training efficiency. But here is what you should remember about skewness: for skewed data, the train-dev-test split needs to be done more carefully so that every split covers the full range of the data. You don’t want to end up with a test split that contains no negative samples, for example.
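One common way to guard against that is a stratified split. Below is a minimal standard-library sketch, assuming a classification setting with imbalanced labels. `stratified_split` and the 95/5 label counts are hypothetical, made up for this example.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, test_fraction, seed=0):
    """Return (train_idx, test_idx) with each class kept at roughly its share."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        if len(indices) > 1:
            # Keep at least one test sample for every class with 2+ members
            n_test = max(1, round(len(indices) * test_fraction))
        else:
            n_test = 0
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Hypothetical imbalanced labels: 95 positives, 5 negatives
labels = [1] * 95 + [0] * 5
train_idx, test_idx = stratified_split(labels, test_fraction=0.2)

train_counts = Counter(labels[i] for i in train_idx)
test_counts = Counter(labels[i] for i in test_idx)
print("train:", dict(train_counts))
print("test :", dict(test_counts))
```

In practice you would probably reach for a library routine instead, for example the `stratify` argument of scikit-learn's `train_test_split`. For a skewed continuous feature or target, one option is to bin the values first and stratify on the bins.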

I hope that helps. Let me know if you have any questions.


Thanks a lot for the info provided!

So based on my current understanding, to recap:

  1. z-score transformation/min-max scaling speeds up the training of a deep learning model because it RESCALES the features to a suitable range. This avoids a much larger gradient wrt one feature than wrt the other features, which would make optimization harder (more time and more steps to reach a low cost).

  2. Skewness does not affect a deep learning model (a neural net, due to its universal approximation property), but it will affect other statistical machine learning models such as linear regression/logistic regression, right?

  3. The z-score transform does not alter skewness, i.e. it does not change the shape of a feature's distribution (it only tells how far a value is from the feature's mean), right?

Please correct me if anything is wrong. Thanks in advance.

I think you are right on!
Normalization does not correct for skewness. And you are right on point about the universal approximation theorem.
And correcting skewness WOULD be helpful if you are doing a very simple linear regression, since heavily skewed features tend to produce influential outliers and non-normal residuals (strictly speaking, the normality assumption is on the residuals, not on the features).