Feature Scaling - When to Scale

Hi @mvrbiguv,

This post discusses the relationship between learning rate size, feature scales, and the regularity of the cost contours. You will also notice that the discussion is based on a lecture slide, so you might want to review the lectures again for more explanation.

The key to feature scaling is getting all features to span a similar range, not the best range and not necessarily a small range. People usually apply one of the first three methods in this Wikipedia section to all features for the job. Those three methods all produce a "small" range around zero, but being small is not the point; having a similar range across all scaled features is.
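If it helps, here is a minimal NumPy sketch of those three methods (rescaling to [0, 1], mean normalization, and standardization); the array `x` is just made-up data for illustration:

```python
import numpy as np

# A made-up feature column with a large, offset range.
x = np.array([200.0, 300.0, 500.0, 800.0, 1200.0])

# 1. Rescaling (min-max normalization): maps values into [0, 1].
x_minmax = (x - x.min()) / (x.max() - x.min())

# 2. Mean normalization: centers around zero, values fall within [-1, 1].
x_meannorm = (x - x.mean()) / (x.max() - x.min())

# 3. Standardization (z-score): zero mean, unit standard deviation.
x_standard = (x - x.mean()) / x.std()

print(x_minmax, x_meannorm, x_standard, sep="\n")
```

Applied to every feature, any one of these keeps all the scaled columns in a comparable range around zero, which is what matters.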

If you scale all features to a similar range, you are likely to get a more regular cost contour (as exemplified in the lecture slide quoted in the linked post). If you exempt some features from scaling, you risk a less regular contour, which in turn forces a smaller learning rate and slower convergence.
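To make that concrete, here is a small, self-contained experiment along those lines; the dataset, the learning rates, and the choice of standardization are all just assumptions for illustration:

```python
import numpy as np

def gradient_descent(X, y, lr, steps=1000):
    """Plain batch gradient descent on mean squared error; returns the final cost."""
    w = np.zeros(X.shape[1])
    m = len(y)
    for _ in range(steps):
        err = X @ w - y
        w -= lr * (X.T @ err) / m
    err = X @ w - y
    return (err @ err) / (2 * m)

def add_bias(X):
    """Prepend a column of ones so the model can learn an intercept."""
    return np.column_stack([np.ones(len(X)), X])

rng = np.random.default_rng(0)
m = 100
# Two features on very different scales: roughly [0, 1] and [0, 1000].
X_raw = np.column_stack([rng.uniform(0, 1, m), rng.uniform(0, 1000, m)])
y = 3 * X_raw[:, 0] + 0.002 * X_raw[:, 1] + rng.normal(0, 0.1, m)

# Standardize both columns so they span a similar range around zero.
X_scaled = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0)

# With unscaled features, a learning rate much larger than this diverges,
# so convergence is slow; with scaled features a far larger learning rate
# is stable and the cost drops much further in the same 1000 steps.
print("unscaled:", gradient_descent(add_bias(X_raw), y, lr=1e-6))
print("scaled:  ", gradient_descent(add_bias(X_scaled), y, lr=0.1))
```

You should see the scaled run reach a far lower cost, and if you push the unscaled learning rate up instead, the cost blows up; that is the irregular-contour effect from the slide.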

I recommend trying to answer your own questions with some real experiments on different datasets. That will give you a more concrete idea, and you will see firsthand what you are trading off when you do not scale each and every feature using the methods mentioned in that Wikipedia section.

Raymond