Do we still need normalization with Adam?

With the Adam optimizer:
– Each parameter gets its own adaptive learning rate.

So do we still need to normalize the features in the input layer?
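
For reference, Adam's standard update (here $g_t$ is the gradient, $\theta_t$ the parameter, and $\alpha$, $\beta_1$, $\beta_2$, $\epsilon$ the usual hyperparameters) adapts the step size for every parameter from that parameter's own gradient history:

$$
\begin{aligned}
m_t &= \beta_1 m_{t-1} + (1-\beta_1)\,g_t, \\
v_t &= \beta_2 v_{t-1} + (1-\beta_2)\,g_t^2, \\
\hat m_t &= \frac{m_t}{1-\beta_1^t}, \qquad \hat v_t = \frac{v_t}{1-\beta_2^t}, \\
\theta_t &= \theta_{t-1} - \frac{\alpha\,\hat m_t}{\sqrt{\hat v_t} + \epsilon},
\end{aligned}
$$

so each parameter's effective step size is roughly $\alpha / (\sqrt{\hat v_t} + \epsilon)$.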

Thanks,

Lizhang

Hi Lizhang, we normalize because it helps gradient descent work better. If you don't normalize, you may have a hard time finding a good initial learning rate for Adam. You might run some experiments comparing different optimizers and initial learning rates on a dataset whose features have very different scales, and share the results with us!
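
Something like this minimal sketch could be a starting point for that experiment (assuming TensorFlow/Keras is available; the synthetic two-feature dataset, network size, learning rates, and epoch count are illustrative choices, not values from the course):

```python
# Compare optimizers and learning rates on data with badly scaled features,
# with and without an input normalization layer. Assumes TensorFlow >= 2.6.
import numpy as np
import tensorflow as tf

rng = np.random.default_rng(0)

# Synthetic regression data: one feature on scale ~1, one on scale ~10,000.
n = 1000
x1 = rng.uniform(0, 1, size=(n, 1))
x2 = rng.uniform(0, 10_000, size=(n, 1))
X = np.hstack([x1, x2]).astype("float32")
y = (3.0 * x1 + 0.002 * x2 + rng.normal(0, 0.1, size=(n, 1))).astype("float32")

def run(optimizer, normalize):
    layers = []
    if normalize:
        norm = tf.keras.layers.Normalization()  # learns per-feature mean/variance
        norm.adapt(X)
        layers.append(norm)
    layers += [tf.keras.layers.Dense(16, activation="relu"),
               tf.keras.layers.Dense(1)]
    model = tf.keras.Sequential(layers)
    model.compile(optimizer=optimizer, loss="mse")
    hist = model.fit(X, y, epochs=30, batch_size=32, verbose=0)
    return hist.history["loss"][-1]

for name, make_opt in [("SGD",  lambda lr: tf.keras.optimizers.SGD(learning_rate=lr)),
                       ("Adam", lambda lr: tf.keras.optimizers.Adam(learning_rate=lr))]:
    for lr in [1e-2, 1e-3, 1e-4]:
        for normalize in [False, True]:
            loss = run(make_opt(lr), normalize)
            print(f"{name:4s} lr={lr:<7g} normalized={normalize!s:5s} final MSE={loss:.4f}")
```

Runs where the unnormalized features meet a too-large learning rate will typically diverge (NaN loss), which is exactly the difficulty mentioned above.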

Cheers,
Raymond

Great, thanks. Normalization is certainly still needed.