How to deal with different units in a dataset

In feature scaling, I use standardization or normalization to bring the features into the same range. But what if I have at least two features, one in meters and the other in centimeters, both with values in the same range, say between 1 and 2? Should I first convert them to the same unit (m or cm) and then scale them? Or should I just scale them without any conversion? My worry is that the second option would be garbage in, garbage out: the inputs are in different units, yet after scaling they would look the same. What do you think?

Hello, @khaldon,

Just scale them individually without first converting cm to m or vice versa.

Standardization essentially removes the unit, because we divide the difference from the mean (which has the unit of the feature) by the standard deviation (which also has the unit of the feature).

Since the units are taken away, all scaled features become the same in terms of unit, because they are all unitless.

Speaking of the division, even if we convert m to cm by multiplying the difference from the mean (the dividend) by 100, the standard deviation (the divisor) will also be scaled up by 100. In other words, the conversion factor of 100 cancels out, so the conversion is not needed.
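
If you want to check the cancellation yourself, here is a minimal sketch with NumPy. The sample values and the helper names `standardize` and `min_max` are made up for illustration; they are not from any particular library. It scales the same feature once in meters and once in centimeters and compares the results. Min-max normalization is included as well, on the assumption that this is what "normalization" refers to in the question; the factor of 100 cancels there in the same way.

```python
import numpy as np

def standardize(x):
    """Z-score scaling: subtract the mean, divide by the standard deviation."""
    return (x - x.mean()) / x.std()

def min_max(x):
    """Min-max scaling to the range [0, 1]."""
    return (x - x.min()) / (x.max() - x.min())

# Hypothetical feature measured in meters, and the same feature in centimeters.
heights_m = np.array([1.2, 1.5, 1.7, 1.9, 2.0])
heights_cm = heights_m * 100

# The factor of 100 appears in both the numerator and the denominator,
# so the scaled values agree up to floating-point error.
same_z = np.allclose(standardize(heights_m), standardize(heights_cm))
same_mm = np.allclose(min_max(heights_m), min_max(heights_cm))
print(same_z, same_mm)
```

Both comparisons should come out as equal, which is why converting units before scaling makes no difference to what the model sees.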

Cheers,
Raymond

Ignore it. To machine learning, the measurement units don’t matter. They’re just floating point numbers.