Doubt regarding course 2 Week 2 assignment

In the video explaining momentum, the equation was given as:
Screenshot 2022-11-27 at 12-21-17 C2_W2.pdf

But in the programming exercise, the momentum update for v_t is given as:

Can you explain this?

Hi @M_A_Naidu

This image is a clarification of Screenshot 2022-11-27 at 12-21-17 C2_W2.pdf

An exponentially weighted average is a kind of mean, so it can be used in many applications, such as time series and gradient descent with momentum. How to apply the concept of exponentially weighted averages to gradient descent with momentum is shown in this photo:


That is, it shows how you can update the parameters (W, b) by taking advantage of exponentially weighted averages:
image
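
As a rough illustration of that update, here is a minimal NumPy sketch. It assumes the standard momentum form: a velocity that is an exponentially weighted average of the gradients, followed by a step scaled by a learning rate. The helper name `momentum_step`, the `alpha` and `beta` defaults, and the toy values are assumptions for illustration, not the assignment's code.

```python
import numpy as np

def momentum_step(W, b, dW, db, v_dW, v_db, beta=0.9, alpha=0.01):
    """One gradient-descent-with-momentum step (hypothetical helper)."""
    # Exponentially weighted average of the gradients
    v_dW = beta * v_dW + (1 - beta) * dW
    v_db = beta * v_db + (1 - beta) * db
    # Update the parameters with the averaged gradients instead of the raw ones
    W = W - alpha * v_dW
    b = b - alpha * v_db
    return W, b, v_dW, v_db

# Toy parameters and gradients (made-up numbers)
W = np.array([[1.0, 2.0]])
b = np.array([[0.5]])
dW = np.array([[0.1, -0.2]])
db = np.array([[0.3]])

# Velocities start at zero, with the same shapes as the parameters
v_dW = np.zeros_like(W)
v_db = np.zeros_like(b)

W, b, v_dW, v_db = momentum_step(W, b, dW, db, v_dW, v_db)
print(v_dW, W)
```

With beta = 0.9, each new gradient contributes only 10% to the velocity, so a single noisy gradient moves the parameters much less than it would in plain gradient descent.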

Please feel free to ask any questions.
Thanks,
Abdelrahman

Hey @M_A_Naidu,

In addition to @AbdElRhaman_Fakhry’s explanation, we can also try to map the variables in the 1st screenshot to the variables in the 1st equation of the 2nd screenshot:

  1. \theta_t to dW^{[l]}
  2. v_{t-1} to v_{dW^{[l]}} (as it appears on the right-hand side)
  3. v_t to v_{dW^{[l]}} (as it appears on the left-hand side)

Now (1) makes sense because the “thing” that we apply momentum to is the gradient. (2) is just the gradient’s momentum at the previous time step (time step = t-1), whereas (3) is the updated momentum at the current time step t.
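
Written out, and assuming the two screenshots show the standard forms of these equations, the mapping looks like this:

```latex
% 1st screenshot: exponentially weighted average, indexed by time step
v_t = \beta \, v_{t-1} + (1 - \beta) \, \theta_t

% 1st equation of the 2nd screenshot: same formula, no time index
v_{dW^{[l]}} = \beta \, v_{dW^{[l]}} + (1 - \beta) \, dW^{[l]}
```

The two lines have the same shape term by term; the only difference is that the second one reuses a single symbol for both the old and the new momentum value.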

You see t and t-1 in (3) and (2) respectively in the 1st screenshot but not in the 2nd screenshot, because the 2nd screenshot is oriented toward how a program works. In the program, we do not store the momentum value for each time step; instead, we have one variable called v_{dW^{[l]}} which aggregates over time steps.

What the 1st equation of the 2nd screenshot does is this: we take the current value of v_{dW^{[l]}} (which represents the value at the previous time step), multiply it by \beta, and add the gradient of the current time step multiplied by 1-\beta. The result replaces the current value of v_{dW^{[l]}}, so that it becomes the momentum value at the current time step.
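
To check that overwriting one variable gives the same numbers as indexing by time step, here is a small Python sketch. The gradient values, `beta`, and the variable names are made up for illustration, and a plain scalar stands in for dW^{[l]}.

```python
beta = 0.9
grads = [1.0, 2.0, 3.0]  # made-up gradients for time steps t = 1, 2, 3

# Form in the 1st screenshot: keep every v_t, indexed by time step
v_by_step = [0.0]  # v_0 = 0
for theta_t in grads:
    v_by_step.append(beta * v_by_step[-1] + (1 - beta) * theta_t)

# Form in the 2nd screenshot: a single variable that is overwritten
v_dW = 0.0
history = []
for dW in grads:
    # The right-hand side reads the old value (time step t-1);
    # the assignment stores the new value (time step t).
    v_dW = beta * v_dW + (1 - beta) * dW
    history.append(v_dW)

# Both loops perform the same arithmetic in the same order
print(history == v_by_step[1:])
```

The list in the first loop exists only to make the time index visible. The second loop needs no such storage, which is why the programming exercise drops the subscripts t and t-1.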

Does this make sense to you?

Cheers,
Raymond