Learning rate decay vs RMSprop

Hi @am003e

Upfront: Adam computes exponentially weighted moving averages of the gradients and of their squares, effectively combining Momentum and RMSProp. So it doesn't apply a classic learning rate decay (LRD) schedule, but rather an adaptive, per-parameter learning rate (see the sketch below).
Therefore, this thread should be interesting to you:
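
For reference, here is a minimal NumPy sketch of the Adam update rule (per Kingma & Ba, 2015) showing where the Momentum and RMSProp pieces enter and why the effective step size adapts per parameter. The function name `adam_step` and the toy example are just for illustration:

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update combining a Momentum-style and an RMSProp-style average."""
    m = beta1 * m + (1 - beta1) * grad       # exp. weighted mean of gradients (Momentum part)
    v = beta2 * v + (1 - beta2) * grad ** 2  # exp. weighted mean of squared gradients (RMSProp part)
    m_hat = m / (1 - beta1 ** t)             # bias correction for zero-initialized averages
    v_hat = v / (1 - beta2 ** t)
    # per-parameter effective step: lr is rescaled by the gradient history, no fixed decay schedule
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# toy usage: minimize f(x) = x^2, whose gradient is 2x
theta = np.array([5.0])
m, v = np.zeros_like(theta), np.zeros_like(theta)
for t in range(1, 501):
    theta, m, v = adam_step(theta, 2 * theta, m, v, t)
print(theta)  # approaches 0
```
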

I find your thoughts interesting… Do you have certain cost functions in mind where you see this method as particularly useful, also compared to other gradient-based optimizers?

Please let me know what you think :slightly_smiling_face:

Best regards
Christian