In the week two video “Exponentially Weighted Averages”, the formula for exponentially weighted averages is given as V_t = β V_(t-1) + (1 − β) θ_t. In the video “Gradient Descent with Momentum”, the same idea is applied in the context of neural networks and the formula is written as V_dW = β V_dW + (1 − β) dW. My question is: why does the first formula have V_(t-1), but the version applied to neural networks just has V_dW instead of something like V_dW(t-1)?
Hello @Stephano_Cotsoradis,
We take the value of V_dW from the last time step, multiply it by β, and add (1 − β) dW to get the value of V_dW for the current time step. There is just no t or t−1 in the symbols, but the concept of different time steps is implied.
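In code, the same variable is simply overwritten on every step, so the V_dW on the right-hand side is automatically the previous step's value. Here is a minimal NumPy sketch of that idea (the toy parameter matrix, the random stand-in for dW, and the hyperparameter values are all just made up for illustration):

```python
import numpy as np

np.random.seed(0)

beta = 0.9                    # momentum hyperparameter β
learning_rate = 0.01
W = np.random.randn(3, 3)     # toy parameter matrix
v_dW = np.zeros_like(W)       # V_dW at "t = 0", initialized to zero

for t in range(1, 6):         # a few gradient descent steps
    dW = np.random.randn(*W.shape)   # stand-in for the gradient at the current step t

    # The right-hand side reads the *previous* v_dW (from step t-1);
    # the assignment then overwrites it with the value for the current step t.
    v_dW = beta * v_dW + (1 - beta) * dW

    W -= learning_rate * v_dW        # parameter update using the smoothed gradient
```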
Cheers,
Raymond
@rmwkwok So in the formula V_dW = β V_dW + (1 − β) dW, the V_dW in the β V_dW term is from the previous step, and the dW in the (1 − β) dW term is from the current step?
Yes! Give me the latest Vdw, give me the latest dw, then I give you the next latest Vdw.
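For a concrete (made-up) numeric example: with β = 0.9, if the latest V_dW is 1.0 and the current dW is 2.0, then the next V_dW is 0.9 × 1.0 + 0.1 × 2.0 = 1.1, and that 1.1 becomes the "previous" value in the following step.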
Thanks man!