Hi @am003e
Upfront: Adam computes exponentially weighted moving averages of the gradient and of its square, combining the ideas of Momentum and RMSProp. So it does not rely on a classic learning rate decay (LRD) schedule, but rather on a dynamic (adaptive) per-parameter learning rate.
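For reference, here is a minimal sketch of one Adam update step (first- and second-moment estimates with bias correction, as in Kingma & Ba); the function and variable names are just illustrative:

```python
import numpy as np

def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update: exponential moving averages of the gradient (Momentum part)
    and of the squared gradient (RMSProp part), with bias correction at step t."""
    m = beta1 * m + (1 - beta1) * grad        # first moment estimate (Momentum)
    v = beta2 * v + (1 - beta2) * grad**2     # second moment estimate (RMSProp)
    m_hat = m / (1 - beta1**t)                # bias-corrected first moment
    v_hat = v / (1 - beta2**t)                # bias-corrected second moment
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)  # adaptive per-parameter step
    return param, m, v
```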
Therefore, this thread should be interesting to you:
I find your thoughts interesting… Do you have specific cost functions in mind for which you see this method as particularly useful, also compared to other gradient-based optimizers?
Please let me know what you think
Best regards
Christian