Course 2, Week 2: suggestion for Gradient Descent with Momentum

levietminh.ftu2 · November 14, 2023, 3:56pm

I suggest rewriting the formulas for v_dW and v_db to show how they act as momentum for gradient descent.

The formula in the lesson:
v_dW := beta * v_dW + (1 - beta) * dW
W := W - alpha * v_dW

The rewritten formula:
v_dW := dW + beta * (v_dW - dW)
W := W - alpha * v_dW = W - [ alpha * dW + alpha * beta * (v_dW - dW) ]
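
The two forms are algebraically identical. As a short sketch of the missing step, write v_dW^prev for the value of v_dW before the update (this superscript is notation introduced here, not used in the lesson):

```latex
\begin{aligned}
v_{dW} &= \beta\, v_{dW}^{\text{prev}} + (1-\beta)\, dW \\
       &= \beta\, v_{dW}^{\text{prev}} + dW - \beta\, dW \\
       &= dW + \beta\,\bigl(v_{dW}^{\text{prev}} - dW\bigr)
\end{aligned}
```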

So we can see that the term beta * (v_dW - dW) is the momentum component of the update. Here v_dW inside the parentheses is the value from the previous iteration.

If v_dW < dW: it acts as a negative acceleration that limits oscillation
If v_dW > dW: it acts as a positive acceleration that speeds up convergence
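
A quick numerical check of the rewrite above may help. This is only a minimal sketch on a scalar parameter: the function names, the random gradients, and the values of alpha and beta are illustrative assumptions, not taken from the course.

```python
import math
import random


def step_lesson(W, v_dW, dW, alpha, beta):
    """One update using the formula from the lesson."""
    v_dW = beta * v_dW + (1 - beta) * dW
    return W - alpha * v_dW, v_dW


def step_rewritten(W, v_dW, dW, alpha, beta):
    """One update using the rewritten formula: gradient plus a correction term."""
    v_dW = dW + beta * (v_dW - dW)
    return W - alpha * v_dW, v_dW


random.seed(0)
alpha, beta = 0.1, 0.9      # assumed hyperparameters, for illustration only
W_a = W_b = 1.0             # same starting parameter for both forms
v_a = v_b = 0.0             # velocity initialised to zero
forms_agree = True

for _ in range(20):
    dW = random.uniform(-1.0, 1.0)   # stand-in for a real gradient
    W_a, v_a = step_lesson(W_a, v_a, dW, alpha, beta)
    W_b, v_b = step_rewritten(W_b, v_b, dW, alpha, beta)
    # The two forms should match up to floating-point rounding.
    forms_agree = (
        forms_agree
        and math.isclose(W_a, W_b, abs_tol=1e-12)
        and math.isclose(v_a, v_b, abs_tol=1e-12)
    )

print("forms agree:", forms_agree)
```

Both functions are fed the same gradients, so any difference between them would come from the formulas themselves and not from the data.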

paulinpaloalto · November 14, 2023, 4:17pm

That’s an interesting point! Of course, they wrote it the way they did to emphasize that the update is effectively using the exponentially weighted average of dW. It would be nice to add your formulation to the lectures or to the explanations in the assignment, to give more intuition about why it is useful and how it achieves that effect. Thanks for pointing this out!
