Hi there,
yes, in my experience it is an important step: it lets the training work effectively and ensures that all features are treated equally during training, so that no feature dominates just because of its larger numeric range. Usually you want this. For gradient-based methods, for example, it also helps to reach the optimum more quickly. I recently read a nice outline here:
Personally, I have had good experiences with thinking in advance about how the features are distributed in reality, in order to choose the right method for normalising the range, e.g.
- z-transformation / standardization for normally distributed features
- min/max scaling for e.g. uniformly distributed features
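To make the two options above concrete, here is a minimal sketch in plain Python. The function names (standardize, min_max_scale) and the sample values are made up for illustration and are not taken from any particular library; I also assume the population standard deviation for the z-transformation.

```python
from statistics import mean, pstdev


def standardize(values):
    """z-transformation: shift to mean 0 and scale to standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population standard deviation
    if sigma == 0:
        # A constant feature carries no spread; map everything to 0.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def min_max_scale(values):
    """min/max scaling: map the smallest value to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature has no range; map everything to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Made-up example features, for illustration only.
heights_cm = [150.0, 160.0, 170.0, 180.0, 190.0]  # treated as roughly normal
ages = [18.0, 30.0, 42.0, 54.0, 66.0]             # treated as roughly uniform

z = standardize(heights_cm)
scaled = min_max_scale(ages)

print(z)
print(scaled)
```

In practice you would probably not write this by hand: scikit-learn, for example, provides StandardScaler and MinMaxScaler for the same two transformations.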
Hope that helps!
Best
Christian