I’m not sure which scaling method I should use. In which cases is it better to divide by the max, use mean normalization, or use z-score normalization?
This is totally up to you.
Primarily it depends on how you want to handle outliers in the data set, and its overall statistics.
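- Using the range (max - min) puts a lot of emphasis on outliers, since a single extreme value sets the scale.
- Using the standard deviation reduces the impact of outliers, since it reflects the spread of all the data points rather than only the two extremes.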
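To make the two bullets above concrete, here is a minimal sketch that measures how much each scale factor changes when one outlier is added. The data (the integers 1 to 100 plus a single value of 1000) and the helper name `value_range` are made up for illustration; they are not from this thread.

```python
import numpy as np

# Hypothetical feature: the integers 1..100, then the same data plus one outlier.
clean = np.arange(1, 101, dtype=float)
with_outlier = np.append(clean, 1000.0)


def value_range(x):
    """Return max - min, the divisor used when scaling by the range."""
    return x.max() - x.min()


# How much does each divisor grow once the outlier is included?
range_growth = value_range(with_outlier) / value_range(clean)
std_growth = with_outlier.std() / clean.std()

print(f"range grows by a factor of {range_growth:.2f}")
print(f"std grows by a factor of {std_growth:.2f}")
```

For this particular data the range grows by roughly a factor of 10, while the standard deviation grows by roughly a factor of 3.4. The gap depends on the sample size: with only a handful of points, a single outlier can move the standard deviation about as much as the range.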
Hello victor,
Z-score normalization, also called standardisation, rescales each feature so that it has a mean of 0 and a standard deviation of 1, like a standard Gaussian distribution. It shifts and scales the data but does not change the shape of its distribution.
If your distribution is not Gaussian, or the standard deviation is very small, then mean normalization works better. Mean normalization subtracts the mean and divides by the range (max - min), so the values fall between -1 and 1; the related min-max scaling maps them between 0 and 1.
Disadvantages: because mean normalization divides by the range, it is sensitive to outliers. If there are outliers in your dataset, mean normalization is not preferred, and that is when standardisation (z-score normalization) can be used.
Outliers are observations that do not fit the rest of the data: they have extreme values far from where most of the data lies.
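For reference, the three scaling methods asked about in this thread can be sketched in NumPy as follows. This is only an illustration: the function names and the sample values are assumptions, `scale_by_max` assumes the largest value is positive, and `x.std()` uses NumPy's default population standard deviation.

```python
import numpy as np


def scale_by_max(x):
    """Divide by the maximum (assumes the maximum is positive)."""
    return x / x.max()


def mean_normalize(x):
    """Subtract the mean, then divide by the range (max - min)."""
    return (x - x.mean()) / (x.max() - x.min())


def z_score(x):
    """Subtract the mean, then divide by the standard deviation."""
    return (x - x.mean()) / x.std()


# Hypothetical feature values, chosen only for illustration.
x = np.array([300.0, 850.0, 1200.0, 1600.0, 2000.0])

print("divide by max:     ", scale_by_max(x))
print("mean normalization:", mean_normalize(x))
print("z-score:           ", z_score(x))
```

After mean normalization the values are centered on 0 and span a total width of 1, so they always lie within -1 and 1. After z-score normalization the mean is 0 and the standard deviation is 1, but the values themselves are not confined to a fixed interval.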
Regards
DP
Note that the outliers may be the most important data in the model, so be careful how they are considered.