Update_parameters_with_momentum

Well_Zhang · June 7, 2023, 5:17am

Hi, I’m curious why the equation for gradient descent with momentum here doesn’t rely on the previous gradient. As I remember it, the exponentially weighted moving average is defined as v_t = \beta * v_{t-1} + (1-\beta)*\theta_t. The dependence on v_{t-1} is what connects each v_t to the earlier values, and that is why we get a smoother curve.
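For reference, the moving average in that formula can be sketched in a few lines of Python. This is an illustrative sketch, not course code: the function name `ewma` is hypothetical, and v starts at zero, matching the zero initialization discussed later in this thread.

```python
def ewma(thetas, beta=0.9):
    """Exponentially weighted moving average: v_t = beta * v_{t-1} + (1 - beta) * theta_t."""
    v = 0.0  # v_0 = 0
    averages = []
    for theta in thetas:
        v = beta * v + (1 - beta) * theta  # each v_t depends on the previous v_{t-1}
        averages.append(v)
    return averages

# Usage: smooth a short noisy sequence
print(ewma([10.0, 12.0, 8.0, 11.0], beta=0.5))  # [5.0, 8.5, 8.25, 9.625]
```

A larger beta gives more weight to the history and a smoother (but more lagged) curve.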

But in the case of gradient descent with momentum, why don’t we involve v_{dW^{[l-1]}}?

Also, since we initialize all v_{dW^{[l]}} to zeros, what’s the point of multiplying \beta by zero here?

Mujassim_Jamal · June 7, 2023, 5:59am

The update rule is applied to each layer separately, because each layer has its own parameters. Therefore, the term v_{t-1} represents the velocity computed at the previous time step (the previous iteration) for the same layer l, not the velocity from the previous layer l-1.
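Writing the iteration index explicitly makes the distinction visible. In the equations below, the superscript (t) is added notation (not the assignment's) for the iteration, [l] is the layer, and \alpha is the learning rate. Nothing on the right-hand side refers to layer l-1:

```latex
\begin{aligned}
v_{dW^{[l]}}^{(t)} &= \beta\, v_{dW^{[l]}}^{(t-1)} + (1-\beta)\, dW^{[l](t)} \\
v_{db^{[l]}}^{(t)} &= \beta\, v_{db^{[l]}}^{(t-1)} + (1-\beta)\, db^{[l](t)} \\
W^{[l]} &:= W^{[l]} - \alpha\, v_{dW^{[l]}}^{(t)} \\
b^{[l]} &:= b^{[l]} - \alpha\, v_{db^{[l]}}^{(t)}
\end{aligned}
```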

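Multiplying \beta by zero only happens on the first iteration. After that, v_{dW^{[l]}} and v_{db^{[l]}} are no longer zero, because (1 - \beta)dW^{[l]} and (1 - \beta)db^{[l]} have been added to them, so the \beta term starts carrying the history forward.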
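A quick numeric check of this point, using NumPy and made-up gradient values for a single layer (illustrative only): on the first iteration the \beta * 0 term contributes nothing, so the velocity is just (1 - \beta)dW; on the second iteration \beta multiplies a non-zero velocity.

```python
import numpy as np

beta = 0.9
v_dW = np.zeros((2, 2))  # velocity initialized to zeros

# Hypothetical gradients for one layer over two iterations
dW_iter1 = np.array([[1.0, -2.0], [0.5, 4.0]])
dW_iter2 = np.array([[3.0, 0.0], [-1.0, 2.0]])

# Iteration 1: beta * 0 adds nothing, so this equals (1 - beta) * dW_iter1
v_after_1 = beta * v_dW + (1 - beta) * dW_iter1
print(v_after_1)

# Iteration 2: the velocity is now non-zero, so beta * v_after_1 contributes
v_after_2 = beta * v_after_1 + (1 - beta) * dW_iter2
print(v_after_2)
```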

rmwkwok · June 7, 2023, 7:29am

Hi @Well_Zhang,

In addition to @Mujassim_Jamal’s explanation, especially on the meaning of t and [l]:

The previous gradients are already there: v_{dW^{[l]}} and v_{db^{[l]}} hold the averaged gradients from earlier iterations. You then update them with dW^{[l]} or db^{[l]}, and they become the “current” ones.

Cheers,
Raymond

Well_Zhang · June 7, 2023, 4:00pm

Thank you. Since we call update_parameters_with_momentum() on every iteration, the old v_{dW^{[l]}} here actually comes from the previous iteration, and we update it again on each new iteration.
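The same idea in a toy training loop: the velocity is initialized once, before the loop, and every iteration reads the value the previous iteration left behind. This is a hypothetical example (the name `minimize_with_momentum` is made up) that minimizes f(w) = 0.5 * w**2, whose gradient is w, standing in for a real network's training loop.

```python
def minimize_with_momentum(w0, beta=0.9, learning_rate=0.1, num_iterations=500):
    """Gradient descent with momentum on f(w) = 0.5 * w**2 (gradient is w)."""
    w = w0
    v = 0.0  # initialized once, before the loop
    for _ in range(num_iterations):
        dw = w  # gradient at the current w
        v = beta * v + (1 - beta) * dw  # uses v from the previous iteration
        w = w - learning_rate * v
    return w

print(minimize_with_momentum(5.0))
```

Moving `v = 0.0` inside the loop would reset the history every iteration and turn this back into plain gradient descent with a smaller step.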

Mujassim_Jamal · June 8, 2023, 3:28am

Yes, you are right …
