During normalization we see X (uppercase) rather than x. Does that mean we normalize all the features of the training dataset?
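To make the question concrete, here is a minimal sketch of normalizing every feature (column) of X. It assumes z-score standardization and X stored as a list of rows, neither of which the question specifies, and `standardize_columns` is a hypothetical helper. Each column is scaled by its own mean and standard deviation, and the returned statistics would be reused to transform validation or test data:

```python
from statistics import mean, pstdev

def standardize_columns(X):
    """Z-score each column (feature) of X, a list of rows, using that column's own statistics."""
    n_features = len(X[0])
    means = [mean([row[j] for row in X]) for j in range(n_features)]
    stds = [pstdev([row[j] for row in X]) for j in range(n_features)]
    # Guard against constant columns (std == 0) to avoid division by zero.
    stds = [s if s > 0 else 1.0 for s in stds]
    X_scaled = [[(row[j] - means[j]) / stds[j] for j in range(n_features)] for row in X]
    return X_scaled, means, stds

# Hypothetical training matrix: 3 samples (rows) x 2 features (columns).
X_train = [[1.0, 200.0],
           [2.0, 400.0],
           [3.0, 600.0]]
X_scaled, means, stds = standardize_columns(X_train)
print(means)  # → [2.0, 400.0]
```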
Also, when we have features of mixed data types (numerical, categorical, ordinal), does it still make sense to normalize only the numerical features and one-hot-encode the categorical/ordinal data?
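A minimal sketch of the setup described above: standardize the numerical columns and one-hot-encode the categorical/ordinal ones. The row format (a dict per sample), the helper `encode_mixed`, and the column names `age`, `color` and `size` are all hypothetical, chosen only for illustration:

```python
from statistics import mean, pstdev

def encode_mixed(rows, numeric_cols, categorical_cols):
    """Standardize numeric columns and one-hot-encode categorical/ordinal columns.

    rows: list of dicts mapping column name -> value (hypothetical format).
    Returns (feature_names, encoded_rows).
    """
    # Per-column (mean, std) for the numeric features; constant columns get std 1.0.
    stats = {}
    for col in numeric_cols:
        values = [r[col] for r in rows]
        sd = pstdev(values)
        stats[col] = (mean(values), sd if sd > 0 else 1.0)
    # Sorted list of observed categories for each categorical column.
    categories = {col: sorted({r[col] for r in rows}) for col in categorical_cols}

    feature_names = list(numeric_cols) + [
        f"{col}={cat}" for col in categorical_cols for cat in categories[col]
    ]
    encoded = []
    for r in rows:
        vec = [(r[col] - stats[col][0]) / stats[col][1] for col in numeric_cols]
        for col in categorical_cols:
            # One indicator per category: 1.0 for the row's category, 0.0 otherwise.
            vec.extend(1.0 if r[col] == cat else 0.0 for cat in categories[col])
        encoded.append(vec)
    return feature_names, encoded

rows = [
    {"age": 30, "color": "red", "size": "S"},
    {"age": 40, "color": "blue", "size": "M"},
    {"age": 50, "color": "red", "size": "L"},
]
names, X = encode_mixed(rows, ["age"], ["color", "size"])
print(names)  # → ['age', 'color=blue', 'color=red', 'size=L', 'size=M', 'size=S']
```

One-hot encoding an ordinal column such as `size` discards its ordering; mapping its categories to integer ranks (S=0, M=1, L=2) is a common alternative when the order matters.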
How should we best deal with timestamp data? I don't mean a complete time series, but a few timestamps that could be encoded into 10-15 values. Is it better to aggregate the data for each of the 10-15 timestamps, especially since non-time-series algorithms do not work very well when they encounter a time data type?
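One way to picture the aggregation option: a minimal sketch, assuming each entity has a list of (timestamp, value) pairs and that the 10-15 values correspond to fixed-width time buckets (daily here, and 3 buckets for brevity). `bucket_means` and the choice of 0.0 for empty buckets are hypothetical; the point is that every entity ends up with the same number of plain numeric columns, which a non-time-series model can consume:

```python
from datetime import datetime, timedelta
from statistics import mean

def bucket_means(records, start, bucket_width, n_buckets, fill=0.0):
    """Aggregate (timestamp, value) pairs into n_buckets fixed-width time bins.

    Returns a fixed-length list: the mean value per bin, or `fill` for empty bins.
    """
    bins = [[] for _ in range(n_buckets)]
    for ts, value in records:
        # timedelta // timedelta yields the floored integer bin index.
        idx = (ts - start) // bucket_width
        if 0 <= idx < n_buckets:  # drop records outside the window
            bins[idx].append(value)
    return [mean(b) if b else fill for b in bins]

start = datetime(2024, 1, 1)
records = [
    (datetime(2024, 1, 1, 8), 2.0),
    (datetime(2024, 1, 1, 20), 4.0),
    (datetime(2024, 1, 3, 12), 5.0),
]
print(bucket_means(records, start, timedelta(days=1), n_buckets=3))  # → [3.0, 0.0, 5.0]
```

The fill value is a design choice: 0.0 conflates "no events" with "events of value 0", so a separate per-bucket count column or a missing-value marker may be preferable depending on the data.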
Verifying the distribution of the features: my understanding is that the ideal training dataset should contain an equal share of every possible combination of feature values. For example, if x1 and x2 were binary categorical features, each of the combinations 00, 01, 10 and 11 would make up 25% of the data. For numerical features, each feature individually should follow a normal distribution, and we might also need to check the joint distribution of the multivariate inputs. What if we find skew in the multivariate distribution? Are there techniques to augment multivariate data so that such distributions become more balanced?
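The checks described above can be measured with a short sketch: the empirical share of each combination of categorical values (compared against an equal share), and per-feature skewness as one rough indicator of departure from normality (not a formal normality test). The helpers `combination_shares` and `skewness` are hypothetical names, and the sample data is made up:

```python
from collections import Counter
from itertools import product
from statistics import mean, pstdev

def combination_shares(samples, levels):
    """Share of each combination of categorical values, including unseen ones (share 0)."""
    counts = Counter(tuple(s) for s in samples)
    n = len(samples)
    return {combo: counts[combo] / n for combo in product(*levels)}

def skewness(values):
    """Population skewness: mean((x - mu)^3) / sigma^3; 0.0 for constant data."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return 0.0
    return mean([(x - mu) ** 3 for x in values]) / sigma ** 3

# Hypothetical samples of two binary features (x1, x2).
samples = [(0, 0), (0, 0), (0, 1), (1, 0)]
shares = combination_shares(samples, levels=[(0, 1), (0, 1)])
print(shares)  # → {(0, 0): 0.5, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.0}

# Largest gap from the equal share (1/4 for four combinations).
equal = 1 / len(shares)
print(max(abs(s - equal) for s in shares.values()))  # → 0.25

print(skewness([1.0, 1.0, 1.0, 10.0]))  # positive: long right tail
```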
Thanks for your help.