Learning rate decay vs RMSprop

Hi @am003e

Upfront: Adam computes exponentially weighted moving averages of the gradients and of their squares, effectively combining Momentum and RMSProp. So it doesn't apply a classic learning rate decay (LRD) schedule, but rather an adaptive, per-parameter learning rate (see the sketch below).
Therefore, this thread should be interesting to you:
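
For reference, here is a minimal NumPy sketch of the Adam update rule (per Kingma & Ba, 2015) showing where the Momentum and RMSProp pieces enter and why the effective step size adapts per parameter. The function name `adam_step` and the toy example are just for illustration:

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update combining a Momentum-style and an RMSProp-style average."""
    m = beta1 * m + (1 - beta1) * grad       # exp. weighted mean of gradients (Momentum part)
    v = beta2 * v + (1 - beta2) * grad ** 2  # exp. weighted mean of squared gradients (RMSProp part)
    m_hat = m / (1 - beta1 ** t)             # bias correction for zero-initialized averages
    v_hat = v / (1 - beta2 ** t)
    # per-parameter effective step: lr is rescaled by the gradient history, no fixed decay schedule
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# toy usage: minimize f(x) = x^2, whose gradient is 2x
theta = np.array([5.0])
m, v = np.zeros_like(theta), np.zeros_like(theta)
for t in range(1, 501):
    theta, m, v = adam_step(theta, 2 * theta, m, v, t)
print(theta)  # approaches 0
```
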

I find your thoughts interesting… Do you have certain cost functions in mind where you see this method as particularly useful, also compared to other gradient-based optimizers?

Please let me know what you think :slightly_smiling_face:

Best regards
Christian