The point of RMSProp is to slow down parameter updates for derivatives that contribute to oscillation (the vertical direction) in order to speed up learning. What if a derivative is larger in the horizontal direction than in the vertical direction? If we slowed down the updates to the horizontal parameter, wouldn’t that slow down learning?
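Well, RMSProp divides each parameter’s update by the root mean square of that parameter’s own recent gradients, so whichever direction has the larger derivatives is the one that gets damped. If the large derivatives are in the direction you actually want to move, then maybe that situation is not a good application of RMSProp. This is yet another example of the “meta” theme of everything in Week 1 of Course 2: it all depends. There is no single magic “silver bullet” that works best in all cases. The point is to be familiar with the suite of tools at your disposal and to recognize which of them might apply in a given problem scenario.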
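To make the trade-off concrete, here is a minimal sketch of the per-parameter RMSProp update in plain Python, assuming the course-style form `s = beta*s + (1-beta)*g^2` and `w = w - lr*g/(sqrt(s)+eps)`. The function name `rmsprop_step`, the hyperparameter values, and the constant gradients `[10.0, 0.1]` are all hypothetical, chosen only to illustrate a case where the horizontal derivative is consistently larger than the vertical one.

```python
import math


def rmsprop_step(params, grads, sq_avg, lr=0.01, beta=0.9, eps=1e-8):
    """One RMSProp update, applied independently to each parameter.

    params, grads, sq_avg are equal-length lists of floats; sq_avg holds the
    exponentially weighted average of each parameter's squared gradient.
    Returns (new_params, new_sq_avg, steps), where steps are the amounts
    subtracted from each parameter.
    """
    new_params, new_sq_avg, steps = [], [], []
    for p, g, s in zip(params, grads, sq_avg):
        s = beta * s + (1 - beta) * g * g       # running mean of g^2
        step = lr * g / (math.sqrt(s) + eps)    # divide by its root mean square
        new_params.append(p - step)
        new_sq_avg.append(s)
        steps.append(step)
    return new_params, new_sq_avg, steps


# Hypothetical scenario: horizontal gradient is consistently 100x the vertical one.
params = [0.0, 0.0]   # [horizontal, vertical]
sq_avg = [0.0, 0.0]
grads = [10.0, 0.1]
lr = 0.01
for _ in range(100):
    params, sq_avg, steps = rmsprop_step(params, grads, sq_avg, lr=lr)

# Step size relative to lr for each direction.
print([round(s / lr, 3) for s in steps])  # prints [1.0, 1.0]
```

Both parameters end up moving by roughly `lr` per step. Plain gradient descent would have moved the horizontal parameter by `lr * 10.0 = 0.1` per step, so here RMSProp really does slow the larger-gradient direction, which is exactly the concern raised in the question. Whether that helps (damping oscillation) or hurts (slowing progress toward the minimum) depends on the shape of the cost surface.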