Why use only the squared term and not the variance in RMSprop?

P_R_Siddharthan · March 6, 2022, 3:11pm

In RMSprop, we divide dW by sqrt(E[dW^2]). Why not divide it by sqrt(E[dW^2] - (E[dW])^2), where E[dW^2] - (E[dW])^2 is the true variance of dW over roughly the last 1/(1-beta) terms?
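For reference, here is the RMSprop update as presented in the lecture, using its S_dW notation with decay rate β, learning rate α, and a small ε for numerical stability. The last line is the standard identity linking the second raw moment to the variance, which is one way to compare the two denominators:

```latex
\begin{aligned}
S_{dW} &\leftarrow \beta\, S_{dW} + (1-\beta)\, dW^2 \\
W &\leftarrow W - \alpha\, \frac{dW}{\sqrt{S_{dW}} + \varepsilon} \\
\mathbb{E}[dW^2] &= \operatorname{Var}(dW) + \big(\mathbb{E}[dW]\big)^2
\end{aligned}
```

Both terms on the right-hand side are non-negative, so \sqrt{\mathbb{E}[dW^2]} \ge |\mathbb{E}[dW]|. Loosely speaking, this keeps the typical step at or below about α. If you divide by the standard deviation instead, the ratio becomes |\mathbb{E}[dW]| / \sigma_{dW}, a signal-to-noise ratio that has no upper bound when the gradient is consistent and \sigma_{dW} \to 0. This is offered as an intuition, not as the rationale given in the lecture.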

Is the true variance not as useful as just E[dW^2]? Why not?
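The difference can be made concrete with a minimal pure-Python sketch for a single scalar parameter. `rmsprop_steps` and its arguments are hypothetical names chosen for illustration. `centered=True` uses the variance-based denominator proposed in the question, and no bias correction is applied. A variant along these lines does exist in practice: PyTorch's `torch.optim.RMSprop` has a `centered` flag that normalizes the gradient by an estimate of its variance.

```python
import math


def rmsprop_steps(grads, lr=0.01, beta=0.9, eps=1e-8, centered=False):
    """Return the update applied at each step to one scalar parameter.

    centered=False: plain RMSprop, divide by sqrt(EWMA of g^2).
    centered=True:  divide by sqrt(EWMA of g^2 - (EWMA of g)^2), i.e. an
                    estimate of the standard deviation of recent gradients.
    No bias correction is applied (hypothetical helper, for illustration).
    """
    s = 0.0  # exponentially weighted average of g^2
    v = 0.0  # exponentially weighted average of g (used only if centered)
    steps = []
    for g in grads:
        s = beta * s + (1 - beta) * g * g
        v = beta * v + (1 - beta) * g
        denom = s - v * v if centered else s
        denom = max(denom, 0.0)  # clamp tiny negative values from round-off
        steps.append(lr * g / (math.sqrt(denom) + eps))
    return steps


# Case 1: a consistent gradient (same sign and size every step).
constant = [1.0] * 100
plain_c = rmsprop_steps(constant)
centered_c = rmsprop_steps(constant, centered=True)

# Case 2: pure zero-mean noise (alternating +1 / -1).
alternating = [1.0 if i % 2 == 0 else -1.0 for i in range(100)]
plain_a = rmsprop_steps(alternating)
centered_a = rmsprop_steps(alternating, centered=True)

print(f"consistent gradient: plain {plain_c[-1]:.4f}, centered {centered_c[-1]:.4f}")
print(f"zero-mean noise:     plain {plain_a[-1]:.4f}, centered {centered_a[-1]:.4f}")
```

With these settings, the plain update settles near lr = 0.01 in both cases. The centered update is about the same for zero-mean noise. For the consistent gradient, however, it is well over 100 times larger and keeps growing with more steps, because the estimated variance shrinks toward zero and only ε keeps the step finite. This is one plausible reason to prefer E[dW^2]: it contains Var(dW) plus (E[dW])^2, so the step stays bounded whether the gradient is noisy or consistent.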

kenb · March 7, 2022, 4:31pm

Hi @P_R_Siddharthan. It looks like you are now in Course 2 but you posted in the Course 1 forum. Please repost there. Thanks!

paulinpaloalto · March 7, 2022, 4:41pm

I just used the little “edit pencil” on the title to move this to DLS Course 2. Unfortunately, I don’t know the answer to the actual question here. If I can scare up a few minutes, I’ll go watch that lecture again. But that’s my suggestion for finding the answer in any case: listen to the lecture again a bit more carefully, and I’ll bet Prof Ng addresses this question. I did a quick scan, and towards the end of that lecture he mentions that RMSprop was first described by Prof Geoff Hinton in a Coursera course. I did a quick Google search: here are the PDF slides from that course, and here’s the YouTube video of the lecture.

P_R_Siddharthan · March 13, 2022, 2:43pm

Unfortunately, Geoff Hinton’s lecture does not address why the variance is not used. It just states how the update is computed, exactly as in our DL Specialization.

paulinpaloalto · March 13, 2022, 4:06pm

Sorry that he doesn’t give any further explanation, but it’s good to hear that the two statements of the method are consistent.

Joel_Wigton · April 13, 2023, 10:42pm

This might explain it:
