Disadvantages of feature scaling

We know that feature scaling effectively improves the speed of the gradient descent algorithm, especially when one feature dominates the others. However, after scaling, all features are put on the same footing in terms of their range. I was wondering: what are the disadvantages of feature scaling? Does it always lead to better results?

Hi @Trung_Hoang

welcome to the community!

Potential disadvantages are:

  • If some features contain very strong outliers (and others don't), scaling can actually hurt those features: the outlier dominates the scaling (e.g. it becomes the max in min/max scaling), so the remaining values are squashed into a tiny range and the feature loses much of its sensitivity. In extreme cases, where the numerical precision of the data type comes into play, this can even mean a loss of information (think of a value of 1e32 in a feature column whose typical range is 0-10). It therefore helps to have a clear strategy for handling outliers before scaling, e.g. winsorizing or clipping if the business problem allows it; see the first sketch after this list.

  • Sometimes, if you look at a scaled feature on its own, interpretability suffers a bit: after scaling (e.g. min/max scaling) you can still interpret the relative distribution (e.g. 0.9 is rather high), but you cannot directly read off the value in its original unit (e.g. 30 Newton). In practice this is rarely a problem, because you have hopefully done this kind of interpretation already in your Data Understanding step, and you can always map the scaled values back to the original unit afterwards; see the second sketch after this list.
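
To make the first point concrete, here is a minimal sketch (assuming scikit-learn and NumPy are available; the data is made up for illustration) of how a single extreme value dominates min/max scaling and how clipping before scaling restores the feature's sensitivity:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# A feature that mostly lives in the 0-10 range, plus one extreme outlier.
x = np.array([[1.0], [3.0], [7.0], [9.0], [1e6]])

scaler = MinMaxScaler()
print(scaler.fit_transform(x).ravel())
# The first four values end up squashed near 0 (a few 1e-6 apart),
# so the feature loses almost all of its sensitivity.

# Clipping (a simple form of winsorizing) before scaling keeps the spread
# of the "normal" values intact -- only do this if the business problem allows it.
x_clipped = np.clip(x, a_min=None, a_max=10.0)
print(MinMaxScaler().fit_transform(x_clipped).ravel())
```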
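
And for the second point, a small sketch (again assuming scikit-learn; the Newton values are just an invented example) showing that you can always map scaled values back to the original unit with inverse_transform when you need to interpret them:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

force_newton = np.array([[5.0], [12.0], [30.0], [18.0]])  # original unit: Newton

scaler = MinMaxScaler()
force_scaled = scaler.fit_transform(force_newton)
print(force_scaled.ravel())  # relative interpretation: 1.0 is "rather high"

# Whenever you need the original unit back for interpretation:
print(scaler.inverse_transform(force_scaled).ravel())  # back to Newton
```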

As you can see, the good news is that we can deal with these points well: scaling really is an essential part of machine learning.

Please let me know if this helps.

Best regards
Christian

Thank you!
Best regards

My pleasure!

Happy learning @Trung_Hoang!

Best regards
Christian
