Normalisation/feature scaling

Should we scale before or after splitting the dataset into train, validation (cv), and test sets?

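Always split first and scale afterwards: compute the scaling parameters (for example, the mean and standard deviation, or the min and max) on the training data only, then apply those same parameters to the validation and test data. This prevents data leakage, where information from the validation or test data influences training and makes the evaluation look better than it should. The same applies to every transformation step that learns something from the data, not just scaling.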
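A minimal sketch of the split-then-scale order, using only the Python standard library. The dataset, the 60/20/20 split, and the helper name `standardise` are assumptions made for illustration; they are not from the original notes.

```python
import random
import statistics

random.seed(0)  # fixed seed so the split is reproducible

# Hypothetical one-feature dataset, used only for illustration.
data = [float(x) for x in range(1, 101)]
random.shuffle(data)

# 1. Split first: 60% train, 20% validation, 20% test.
train, valid, test = data[:60], data[60:80], data[80:]

# 2. Estimate the scaling parameters from the training set only.
train_mean = statistics.mean(train)
train_std = statistics.pstdev(train)  # population standard deviation


def standardise(values, mean, std):
    """Scale values using a mean and standard deviation computed elsewhere."""
    return [(v - mean) / std for v in values]


# 3. Apply the same training-set parameters to every split.
train_scaled = standardise(train, train_mean, train_std)
valid_scaled = standardise(valid, train_mean, train_std)
test_scaled = standardise(test, train_mean, train_std)
```

After this, the training set has mean 0 and standard deviation 1, while the validation and test sets will usually be slightly off, which is expected. With scikit-learn the same pattern is to call `fit` on the training data and `transform` on the other splits, for example with `StandardScaler`.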

The only reason people do it the other way round is that it is easier, not that it is correct.