Help! On momentum gradient descent. course2

Ryan4 · June 29, 2022, 2:14pm

Why is the following formula used for the exponentially weighted average?
v_t = β·v_(t−1) + (1 − β)·θ_t
In the week 2 programming assignment, gradient descent with momentum uses:
v_dW[l] = β·v_dW[l] + (1 − β)·dW[l]
Why does the first term, β·v_dW[l], use l rather than l−1? And if every v_dW is initialized to 0, wouldn't the first term be meaningless?

Ryan4 · June 29, 2022, 4:11pm

Oh, I misunderstood. In momentum gradient descent, [L] stands for the Lth parameter, not the Lth iteration.

paulinpaloalto · June 29, 2022, 10:11pm

Yes, the l (lower case ell) there is the layer number, nothing to do with iterations.
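
To make the distinction concrete, here is a minimal sketch (not the assignment's code) of how the update behaves across iterations. The v on the right-hand side is simply the value left over from the previous iteration, overwritten in place, so the time index t never appears in the code. The function name `momentum_step`, the dictionary keys, and the scalar gradients are assumptions made for illustration; in the assignment the entries would be NumPy arrays with one entry per layer.

```python
def momentum_step(grads, v, beta):
    """One momentum iteration: update the velocity of every layer in place."""
    for key, g in grads.items():
        # v[key] on the right still holds the value from the previous iteration.
        v[key] = beta * v[key] + (1 - beta) * g
    return v


beta = 0.9
# One velocity per layer (the [l] index), all initialized to zero.
v = {"dW1": 0.0, "dW2": 0.0}

# Hypothetical gradients for two consecutive iterations.
grads_per_iteration = [
    {"dW1": 1.0, "dW2": -2.0},
    {"dW1": 1.0, "dW2": -2.0},
]

history = []
for grads in grads_per_iteration:
    momentum_step(grads, v, beta)
    history.append(dict(v))  # snapshot of the velocities after this iteration

for t, snapshot in enumerate(history, start=1):
    print(t, snapshot)
```

On the first iteration the β·v term is indeed zero, so the velocity is just (1 − β)·dW. From the second iteration onward that term carries the accumulated history, which is the whole point of the exponentially weighted average.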
