Momentum Formula

Here is the formula for momentum gradient descent in this week's assignment, which passed the test:

    for l in range(1, L + 1):
        v["dW" + str(l)] = beta * v["dW" + str(l)] + (1 - beta) * grads["dW" + str(l)]
        v["db" + str(l)] = beta * v["db" + str(l)] + (1 - beta) * grads["db" + str(l)]
        parameters["W" + str(l)] = parameters["W" + str(l)] - learning_rate * v["dW" + str(l)]
        parameters["b" + str(l)] = parameters["b" + str(l)] - learning_rate * v["db" + str(l)]
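In math form, this is the per-layer update from the lecture:

$$v_{dW^{[l]}} = \beta \, v_{dW^{[l]}} + (1 - \beta) \, dW^{[l]}, \qquad W^{[l]} = W^{[l]} - \alpha \, v_{dW^{[l]}}$$

(and likewise for $b^{[l]}$).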

And vdW1, vdW2, … are all set to zero initially.

What I find by iterating over l is that v does not take the previous v into account, as opposed to what Andrew taught in his class: vdW1 will simply be updated as a zero vector plus dW1 * alpha, and vdW2 will likewise be updated as a zero vector plus dW2 * alpha. There is no link between v1 and v2.
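Tracing the first call of the code above by hand (with v initialized to zeros):

    l = 1:  v["dW1"] = beta * 0 + (1 - beta) * grads["dW1"]   # no earlier v involved
    l = 2:  v["dW2"] = beta * 0 + (1 - beta) * grads["dW2"]   # no link to v["dW1"]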
I really think the formula should change so that vdW2 would be vdW1 - dW1 * alpha, rather than the vdW2 - dW1 * alpha shown in Andrew's lecture and also in the assignment.

Please teach me if I deduced wrongly.

Either my deduction is wrong, or the lecture teaches it wrongly. Please enlighten me!

@Yifei1 Keep in mind you have a couple of small ('Python-wise') math errors here.

But keep in mind we are updating our parameters, so the

v["dW" + str(l)]

that goes in on the right-hand side is not the same as the

v["dW" + str(l)] =

that comes out on the left. Thus the first one effectively reflects the averaged gradient at the 'time step before' the present one, or v^{t-1}.
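A minimal sketch of that in-place update, with made-up numbers (beta = 0.9 and a constant scalar 'gradient' of 1.0, purely for illustration):

    beta = 0.9
    v = {"dW1": 0.0}                  # initialized to zero, as in the assignment

    for t in range(1, 4):             # three consecutive update steps
        grad = 1.0                    # pretend the gradient is 1.0 every step
        # the v["dW1"] read here is the value written on the previous step: v^{t-1}
        v["dW1"] = beta * v["dW1"] + (1 - beta) * grad
        print(t, v["dW1"])            # ~0.1, then 0.19, then 0.271: history accumulates

The same dictionary entry is read and then overwritten, so each call sees the velocity left behind by the one before it.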

Hi @Yifei1,

And vdW1, vdW2, … are all set to zero initially.

If you take a look at the model() function, you will see that v is initialized to zero only once, at the start of the optimization loop, and is then updated per minibatch by calling the update_parameters_with_momentum() function. This function adjusts the parameters and v at each layer, and it is called once per minibatch, with v carried over from one call to the next, until all minibatches are done.
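Schematically, model() is structured something like this (a rough sketch, not the exact notebook code; helper names such as initialize_velocity() and random_mini_batches() come from the assignment, but the details are abbreviated):

    v = initialize_velocity(parameters)        # zeros; done ONCE, outside the loop

    for i in range(num_epochs):
        minibatches = random_mini_batches(X, Y, mini_batch_size)
        for minibatch_X, minibatch_Y in minibatches:
            a, caches = forward_propagation(minibatch_X, parameters)
            grads = backward_propagation(minibatch_X, minibatch_Y, caches)
            # v from the previous minibatch is read and overwritten here,
            # so the momentum history carries across minibatches and epochs
            parameters, v = update_parameters_with_momentum(
                parameters, grads, v, beta, learning_rate)

So the 'previous v' in the formula is the v from the previous minibatch (time step), not from the previous layer.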

I really think the formula should change so that vdW2 would be vdW1 - dW1 * alpha

Why do you think the formula should be changed?

vdW2 - dW1 * alpha

Also, dW1 here should refer to dW of layer l (lowercase L), not 1.

Oh, thank you for clarifying that momentum is updated batch by batch. I was thinking of it as layer by layer.

Thanks for the clarification. I understand now.