With the Adam optimizer, each feature will get its own learning rate. So do we still need to do normalization for the input layer?
Thanks,
Lizhang
Hi Lizhang, we do normalization because it helps gradient descent work better. If you don't normalize, you may have a difficult time determining the best initial learning rate for Adam. You might run some experiments comparing different optimizers and initial learning rates on a dataset with features of very different scales, and share the results with us!
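Something like the minimal sketch below could be a starting point (this assumes TensorFlow/Keras; the synthetic dataset, model size, and learning rates are just illustrative choices, not part of the course material):

```python
import numpy as np
import tensorflow as tf

# Synthetic regression data with two features on very different scales.
rng = np.random.default_rng(0)
X = np.hstack([rng.normal(0, 1, (1000, 1)),       # feature on unit scale
               rng.normal(0, 1000, (1000, 1))])   # feature ~1000x larger
y = X[:, :1] + 0.001 * X[:, 1:]                   # both features matter equally

def build_model(normalize):
    layers = []
    if normalize:
        # Normalization layer learns mean/variance from the data.
        norm = tf.keras.layers.Normalization()
        norm.adapt(X)
        layers.append(norm)
    layers += [tf.keras.layers.Dense(16, activation="relu"),
               tf.keras.layers.Dense(1)]
    return tf.keras.Sequential(layers)

# Compare training with/without normalization at a few initial learning rates.
for normalize in (False, True):
    for lr in (1e-3, 1e-2):
        model = build_model(normalize)
        model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=lr),
                      loss="mse")
        hist = model.fit(X, y, epochs=20, verbose=0)
        print(f"normalize={normalize}, lr={lr}: "
              f"final loss={hist.history['loss'][-1]:.4f}")
```

You can vary the feature scales, the optimizer, and the learning rates to see how sensitive the unnormalized case is.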
Cheers,
Raymond
Great, thanks. Normalization is certainly still needed then.