# Doubt regarding course 2 Week 2 assignment

In the video explaining momentum, the equation was given as

v_t = \beta v_{t-1} + (1 - \beta) \theta_t

but in the programming exercise, the exponentially weighted momentum for v_t is given as

v_{dW^{[l]}} = \beta v_{dW^{[l]}} + (1 - \beta) dW^{[l]}

Why do the two forms differ?

What the image clarifies is the meaning of exponentially weighted averages, which can be used in many applications such as time-series smoothing and gradient descent with momentum. The photo shows how to apply the concept of exponentially weighted averages to gradient descent with momentum.

In other words, it shows how you can update the weights (W, b) by taking advantage of exponentially weighted averages.
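To make the "many applications" point concrete, here is a minimal sketch of an exponentially weighted average applied to a 1-D time series, using the video's formula v_t = \beta v_{t-1} + (1 - \beta) \theta_t. The function name `ewa` and the sample data are illustrative, not from the course:

```python
def ewa(series, beta=0.9):
    """Exponentially weighted average: v_t = beta * v_{t-1} + (1 - beta) * theta_t."""
    v = 0.0
    averaged = []
    for theta in series:
        v = beta * v + (1 - beta) * theta  # same recurrence as in the video
        averaged.append(v)
    return averaged

temps = [10, 12, 11, 30, 12, 11]  # a noisy series with a spike at index 3
smoothed = ewa(temps)
# the spike at index 3 is heavily damped in the smoothed series
```

The same recurrence smooths a temperature series or a sequence of gradients; only the input changes.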

Thanks,
Abdelrahman

Hey @M_A_Naidu,

In addition to @AbdElRhaman_Fakhry’s explanation, we can also try to map the variables in the 1st screenshot to the variables in the 1st equation of the 2nd screenshot:

1. \theta_t to dW^{[l]}
2. v_{t-1} to v_{dW^{[l]}}
3. v_t to v_{dW^{[l]}}

Now (1) makes sense because the “thing” that we accumulate momentum for is the gradient. (2) is just the gradient’s momentum at the last time step (time step t-1), whereas (3) is the updated momentum at the current time step t.

You see t and t-1 respectively in (3) and (2) in the 1st screenshot but not in the 2nd screenshot, because the 2nd screenshot is oriented to how a program works. In the program, we do not store the momentum value by its time step; instead we keep one variable called v_{dW^{[l]}} which aggregates over time steps.

What’s done in the 1st eq. of the 2nd screenshot is that we take the current value of v_{dW^{[l]}} (which represents the value from the last time step), multiply it by \beta, and add the gradient of the current time step multiplied by 1-\beta. The result replaces the current value of v_{dW^{[l]}}, so that it becomes the momentum value at the current time step.
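The in-place update described above can be sketched in a few lines. The variable names (`v_dW`, `beta`, `learning_rate`) mirror the course notation, but the gradient values and hyperparameters here are made up for illustration:

```python
import numpy as np

beta, learning_rate = 0.9, 0.01
W = np.array([[1.0, -2.0]])
v_dW = np.zeros_like(W)  # one aggregate variable, no per-time-step storage

# pretend these gradients came from two consecutive training iterations
for dW in [np.array([[0.5, 0.5]]), np.array([[0.3, -0.1]])]:
    v_dW = beta * v_dW + (1 - beta) * dW  # overwrite v_dW with the current momentum
    W = W - learning_rate * v_dW          # gradient-descent step using the momentum
```

Because `v_dW` is overwritten each iteration, it always holds the momentum of the current time step, which is exactly why no t or t-1 subscripts appear in the programming exercise.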

Does this make sense to you?

Cheers,
Raymond