With the Adam optimizer, each feature will get its own learning rate. So do we still need to do normalization for the input layer?
Thanks,
Lizhang
Hi Lizhang, we do normalization because it helps gradient descent work better. If you don't normalize, you may have a difficult time determining the best initial learning rate for Adam. You might run some experiments comparing different optimizers and initial learning rates on a dataset with features of very different scales, and share the results with us!
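Something like the minimal sketch below could be a starting point (this assumes TensorFlow/Keras; the synthetic dataset, model size, and learning rates are just illustrative choices, not part of the course material):

```python
import numpy as np
import tensorflow as tf

# Synthetic regression data with two features on very different scales.
rng = np.random.default_rng(0)
X = np.hstack([rng.normal(0, 1, (1000, 1)),       # feature on unit scale
               rng.normal(0, 1000, (1000, 1))])   # feature ~1000x larger
y = X[:, :1] + 0.001 * X[:, 1:]                   # both features matter equally

def build_model(normalize):
    layers = []
    if normalize:
        # Normalization layer learns mean/variance from the data.
        norm = tf.keras.layers.Normalization()
        norm.adapt(X)
        layers.append(norm)
    layers += [tf.keras.layers.Dense(16, activation="relu"),
               tf.keras.layers.Dense(1)]
    return tf.keras.Sequential(layers)

# Compare training with/without normalization at a few initial learning rates.
for normalize in (False, True):
    for lr in (1e-3, 1e-2):
        model = build_model(normalize)
        model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=lr),
                      loss="mse")
        hist = model.fit(X, y, epochs=20, verbose=0)
        print(f"normalize={normalize}, lr={lr}: "
              f"final loss={hist.history['loss'][-1]:.4f}")
```

You can vary the feature scales, the optimizer, and the learning rates to see how sensitive the unnormalized case is.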
Cheers,
Raymond
Great, thanks. Normalization is certainly still needed then.