Confused about the momentum updates!

In the Week 2 assignment of the DLS course, where we were asked to implement the function update_parameters_with_momentum(), I computed the functions below:

However, the v["dW1"], v["db1"], v["dW2"], v["db2"] entries are all initialized as arrays of zeros. As far as I can tell, v["dW" + str(l)] and v["db" + str(l)] always refer to arrays of zeros. Hence, the previous result in v["dW" + str(l - 1)] or v["db" + str(l - 1)] never seems to have any bearing on its counterpart in the next iteration.
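For concreteness, here is a minimal sketch (not necessarily the assignment's exact code) of the zero initialization I am describing:

```python
import numpy as np

def initialize_velocity(parameters):
    """Create a velocity dict with one zero array per W and b of each layer."""
    L = len(parameters) // 2  # number of layers
    v = {}
    for l in range(1, L + 1):
        # Each velocity starts at zero, with the same shape as its parameter.
        v["dW" + str(l)] = np.zeros_like(parameters["W" + str(l)])
        v["db" + str(l)] = np.zeros_like(parameters["b" + str(l)])
    return v
```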

Therefore, I was wondering how momentum is generated if the l-th calculation does not take the v value from the previous (l - 1)-th calculation.

I am very confused. Did I overlook something?
Not sure if I am explaining my question well, but let me know what you think.

Thank you for helping out.

Hi, @Linfeng_W.

The averaging takes place across epochs (for simplicity, let’s assume we’re doing batch gradient descent), not across layers. These are the iterations you should be thinking of:

for i in range(num_epochs):
	# ... forward propagation, cost, and backward propagation produce grads ...
	parameters, v = update_parameters_with_momentum(parameters, grads, v, beta, learning_rate)

The loop inside update_parameters_with_momentum simply updates the parameters of every layer.
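To make that concrete, here's a rough sketch of what such an update function can look like (the assignment's exact code may differ). The key point is that v is returned and passed back in on the next epoch, which is where the exponentially weighted averaging happens; the loop over l just visits each layer within a single update:

```python
import numpy as np

def update_parameters_with_momentum(parameters, grads, v, beta, learning_rate):
    """One momentum step. v carries the running average ACROSS calls (epochs);
    the loop below merely updates every layer within this one call."""
    L = len(parameters) // 2  # number of layers
    for l in range(1, L + 1):
        # Exponentially weighted average of the gradients for layer l.
        v["dW" + str(l)] = beta * v["dW" + str(l)] + (1 - beta) * grads["dW" + str(l)]
        v["db" + str(l)] = beta * v["db" + str(l)] + (1 - beta) * grads["db" + str(l)]
        # Parameter update uses the velocity, not the raw gradient.
        parameters["W" + str(l)] -= learning_rate * v["dW" + str(l)]
        parameters["b" + str(l)] -= learning_rate * v["db" + str(l)]
    return parameters, v
```

Calling it twice with the same gradients shows the carry-over: after the first call the velocity is nonzero, and the second call blends that previous velocity with the new gradient.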

Was that helpful? :slight_smile:


Ah, I see! Somehow I thought momentum was carried over between layers. You are totally right that the averaging takes place across epochs. It makes sense now.

Thank you so much for clearing things up for me.


Glad I could help. Good luck with the rest of the course :slight_smile: