Hi @mvrbiguv,
This post discusses the relationship between the learning rate, feature scales, and the regularity of the cost contours. You will also notice that the discussion is based on a lecture slide, so you might review the lectures again for more explanation.
The key to feature scaling is for all features to span a similar range, not the “best” range and not a small range. Usually people apply one of the first three methods in this Wikipedia section to all features for the job (sketched below). Those three methods all result in a “small” range around zero, but being small is not the key; having a similar range across all scaled features is the key.
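In case it helps, here is a minimal NumPy sketch of what I mean, assuming the section you linked lists min-max rescaling, mean normalization, and z-score standardization as its first three methods (the sample values below are made up for illustration):

```python
import numpy as np

def min_max_rescale(X):
    # Rescaling (min-max normalization): maps each column to [0, 1]
    return (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

def mean_normalize(X):
    # Mean normalization: centers each column at 0, range roughly [-1, 1]
    return (X - X.mean(axis=0)) / (X.max(axis=0) - X.min(axis=0))

def standardize(X):
    # Standardization (z-score): zero mean, unit standard deviation per column
    return (X - X.mean(axis=0)) / X.std(axis=0)

# Two features on very different scales: e.g. house size and bedroom count
X = np.array([[2104., 3.],
              [1600., 3.],
              [2400., 4.],
              [1416., 2.]])

for f in (min_max_rescale, mean_normalize, standardize):
    Xs = f(X)
    print(f.__name__, "-> per-column min:", Xs.min(axis=0), "max:", Xs.max(axis=0))
```

Notice that the three methods give different numbers, but in every case the two columns end up spanning a similar range, which is the property that matters.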
If you scale all features to a similar range, you can achieve a more regular cost contour (as exemplified in the lecture slide quoted in the linked post). If you exempt some features from scaling, you take the risk of having a less regular contour.
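To make that risk concrete, here is a hedged sketch (plain batch gradient descent on an invented linear regression problem, so all numbers are for illustration only) showing that unscaled features force a tiny learning rate while standardized features tolerate a much larger one:

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up data: feature 0 spans roughly [0, 2000], feature 1 roughly [0, 5]
X = np.column_stack([rng.uniform(0, 2000, 100), rng.uniform(0, 5, 100)])
y = 0.003 * X[:, 0] + 1.5 * X[:, 1] + rng.normal(0, 0.1, 100)

def final_cost(X, y, lr, steps=1000):
    # Batch gradient descent on mean squared error, with a bias column
    Xb = np.column_stack([np.ones(len(X)), X])
    w = np.zeros(Xb.shape[1])
    for _ in range(steps):
        grad = 2 * Xb.T @ (Xb @ w - y) / len(y)
        w -= lr * grad
    return np.mean((Xb @ w - y) ** 2)

# Unscaled: a learning rate above roughly 1e-6 makes the cost blow up here,
# and even a stable one crawls because the contours are long and narrow
print("unscaled, lr=5e-7:", final_cost(X, y, 5e-7))

# Standardized: both features span a similar range, so a much larger
# learning rate converges quickly toward the noise floor
Xs = (X - X.mean(axis=0)) / X.std(axis=0)
print("scaled,   lr=0.1: ", final_cost(Xs, y, 0.1))
```

If you also plot the cost per iteration for a few learning rates, you will see the contour story from the lecture slide play out directly.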
I recommend trying to answer your own questions by doing some real experiments on different datasets; the sketches above could be a starting point. That will give you a more concrete idea, and you will see in practice what you are trading off when you don't scale each and every feature with one of the methods from that Wikipedia section.
Raymond