Use data engineering to help the loss function converge

Data engineering can rescale the features to the same range so that the loss function converges better. I think we could instead use a different learning rate when different features have different ranges. I googled this, and it seems that Adam optimization can use a different learning rate for each feature.
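
By rescaling I mean something like this toy NumPy sketch (the feature values are made up):

```python
import numpy as np

# Two features with very different ranges (made-up values)
X = np.array([[0.5, 2000.0],
              [0.1,  500.0],
              [0.9, 3500.0]])

# Min-max scaling puts every feature into [0, 1]
X_scaled = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
```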

I am wondering which is better: using data engineering or using different learning rates for different features. Thanks!

Hi @guangze_xia,

Both data engineering* and algorithms with adaptive learning rates (like the Adam optimizer) can be used together.

*Data preprocessing is laborious, so people do not normally do it for the first tests of an ML/DL model; instead they rely on algorithms like Adam to account for the different variances of the features.
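
Here is a minimal sketch of combining the two, assuming a small Keras regression model (the data, layer size, and learning rate are arbitrary choices for illustration):

```python
import numpy as np
import tensorflow as tf

# Toy data: two features on very different scales (made-up values)
X = np.random.rand(1000, 2) * np.array([1.0, 1000.0])
y = X @ np.array([3.0, 0.002]) + 1.0

# Data engineering: standardize each feature to zero mean, unit variance
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)

# Adaptive learning rates: Adam keeps a per-parameter step size
# on top of the already-scaled inputs
model = tf.keras.Sequential([
    tf.keras.Input(shape=(2,)),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.01), loss="mse")
model.fit(X_scaled, y, epochs=20, verbose=0)
```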

Hi Nydia,

Thank you!