Gradient descent with Momentum - Week 2

Sao-Mai · May 15, 2024, 6:14pm

#Week2
#Gradient descent with Momentum

Hi,
I have a question regarding gradient descent with momentum, please.

In the video, it says that some research papers, instead of writing the equation on the left, write the equation on the right: $v_{dw} = \beta v_{dw} + dW$. As I understand it, we get this equation by multiplying the equation on the left by $\frac{1}{1-\beta}$, which gives the equation on the right.
I do not understand why we don't have $\frac{1}{1-\beta} v_{dw} = \frac{1}{1-\beta} v_{dw} + dW$ instead, please?

Secondly, our $v_{dw}$ is being scaled by $\frac{1}{1-\beta}$. So, when performing the gradient descent update, we need to change alpha by $\frac{1}{1-\beta}$.
I do not understand this. If we use the equation above, $v_{dw} = \beta v_{dw} + dW$, and also change alpha by $\frac{1}{1-\beta}$, then the last term $dW$ has already been multiplied by $\frac{1}{1-\beta}$ (as in the first image), and we apply $\frac{1}{1-\beta}$ a second time through alpha…

Can someone explain the intuition to me, please?
I hope my explanation was clear.
Thank you,
Kind regards,
Sao Mai

TMosh · May 18, 2024, 4:31pm

Is this thread related to your question?

Sao-Mai · May 20, 2024, 8:28pm

Hi,
Thank you very much for your reply!
In fact, not really.
My question was: the formula we learnt in the videos for gradient descent with momentum is $v_{dw} = \beta v_{dw} + (1 - \beta) dW$.
However, in some literature we often see $v_{dw} = \beta v_{dw} + dW$, with the $(1 - \beta)$ omitted. I wanted to know, please, how we end up with this formula, $v_{dw} = \beta v_{dw} + dW$?
Secondly, when using this formula, $v_{dw} = \beta v_{dw} + dW$, how do we need to update our parameter $W$, please? (Instead of $W = W - \alpha v_{dw}$, what should we use?)

I would like to understand the mechanism behind this, please.
Thank you very much
Sao Mai

TMosh · May 20, 2024, 8:48pm

Sorry, I do not know.

Sao-Mai · May 20, 2024, 8:59pm

No worries! Thank you so much for your reply.

saifkhanengr · May 21, 2024, 4:12am

We are not deriving the formula on the right-hand side ($v_{dw} = \beta v_{dw} + dW$) from the formula written on the left-hand side ($v_{dw} = \beta v_{dw} + (1 - \beta) dW$). These are two different formulas for momentum to control vertical movement, and as Prof. Andrew said, “both of these will work just fine”, though he also mentioned some limitations of the right-hand side formula.
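
For readers wondering how the two formulas relate, here is a sketch of the unrolled recursions. It assumes both velocities start at zero and both see the same gradients $dW_1, \dots, dW_t$. The step index $t$ and the name $u_t$ (for the velocity without the $(1 - \beta)$ factor) are introduced here only for illustration and are not from the course.

$$
v_t = \beta v_{t-1} + (1 - \beta)\, dW_t = (1 - \beta) \sum_{k=1}^{t} \beta^{t-k}\, dW_k
$$

$$
u_t = \beta u_{t-1} + dW_t = \sum_{k=1}^{t} \beta^{t-k}\, dW_k = \frac{1}{1 - \beta}\, v_t
$$

Under these assumptions, the second formula is not obtained by multiplying both sides of the first by $\frac{1}{1-\beta}$; it simply accumulates a velocity that is $\frac{1}{1-\beta}$ times larger. The update $W = W - \alpha u_t$ then equals $W = W - \frac{\alpha}{1-\beta} v_t$, which is why the learning rate has to be retuned when switching between the two.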

It's the same as for other methods: $W = W - \alpha v_{dw}$ and $b = b - \alpha v_{db}$
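
The point about the update rule can be checked numerically. Below is a minimal sketch, not course code: the function `momentum_path`, the toy loss $f(w) = w^2$, and the chosen values of alpha and beta are all assumptions made for illustration. It runs both velocity formulas with the same update $W = W - \alpha v_{dw}$ and compares the resulting parameter paths.

```python
import math


def momentum_path(grad_fn, w0, alpha, beta, steps, averaged):
    """Run gradient descent with momentum on a scalar parameter.

    averaged=True  uses v = beta * v + (1 - beta) * dW  (formula from the videos)
    averaged=False uses v = beta * v + dW               (variant without 1 - beta)
    The parameter update is w = w - alpha * v in both cases.
    """
    w, v = float(w0), 0.0
    path = []
    for _ in range(steps):
        dW = grad_fn(w)
        if averaged:
            v = beta * v + (1.0 - beta) * dW
        else:
            v = beta * v + dW
        w = w - alpha * v
        path.append(w)
    return path


def grad_fn(w):
    # Gradient of the toy loss f(w) = w**2 (hypothetical example problem).
    return 2.0 * w


alpha, beta, steps = 0.1, 0.9, 50

# Formula from the videos with learning rate alpha.
path_avg = momentum_path(grad_fn, 5.0, alpha, beta, steps, averaged=True)

# Variant without (1 - beta), learning rate rescaled to alpha * (1 - beta).
path_rescaled = momentum_path(
    grad_fn, 5.0, alpha * (1.0 - beta), beta, steps, averaged=False
)

# Variant without (1 - beta), learning rate left unchanged.
path_unscaled = momentum_path(grad_fn, 5.0, alpha, beta, steps, averaged=False)


def paths_match(a, b):
    # Compare two paths element by element, allowing for rounding error.
    return all(
        math.isclose(x, y, rel_tol=1e-9, abs_tol=1e-12) for x, y in zip(a, b)
    )


print("rescaled alpha matches:", paths_match(path_avg, path_rescaled))
print("unchanged alpha matches:", paths_match(path_avg, path_unscaled))
```

In this sketch the update line is identical for both variants. What changes is the value of alpha needed to get the same steps, which is one way to read the remark in the video that alpha has to be adjusted when the $(1 - \beta)$ factor is dropped.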

Sao-Mai · May 22, 2024, 7:39pm

I see, we are not deriving the formula on the right-hand side from the formula written on the left-hand side.
Thank you so much for your explanation, everything is clear.
