I got my code to pass the checker, but I don't entirely understand the math behind it.

`v["dW" + str(l)] = beta * v["dW" + str(l)] + (1 - beta) * grads["dW" + str(l)]`  (1)

This is supposedly the correct implementation, yet it is the same as:

`v["dW" + str(l)] = (1 - beta) * grads["dW" + str(l)]`  (2)

since `v["dW" + str(l)]` is initialized to 0.

I tried (2) and it passed all the tests.
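A minimal sketch for checking when (1) and (2) actually coincide, assuming `v` is carried over from one training iteration to the next (as it is when it's returned and passed back in). The helper names `momentum_eq1` / `momentum_eq2` and the scalar "gradients" are hypothetical stand-ins for the per-layer arrays; the update is elementwise, so the arithmetic is the same:

```python
import math

def momentum_eq1(v, grads, beta, L):
    # Equation (1): blend the stored velocity with the new gradient.
    for l in range(1, L + 1):
        v["dW" + str(l)] = beta * v["dW" + str(l)] + (1 - beta) * grads["dW" + str(l)]
    return v

def momentum_eq2(v, grads, beta, L):
    # Equation (2): keeps only the current gradient, no memory of earlier steps.
    for l in range(1, L + 1):
        v["dW" + str(l)] = (1 - beta) * grads["dW" + str(l)]
    return v

beta, L = 0.9, 2
# Hypothetical scalar gradients for a 2-layer net, fed in on every iteration.
grads = {"dW1": 1.0, "dW2": 2.0}

# Both velocity dicts start at zero, like the initialization in the question.
v1 = {"dW1": 0.0, "dW2": 0.0}
v2 = {"dW1": 0.0, "dW2": 0.0}

# Iteration 1: beta * 0 == 0, so the two updates agree.
momentum_eq1(v1, grads, beta, L)
momentum_eq2(v2, grads, beta, L)
print(math.isclose(v1["dW1"], v2["dW1"]))  # True

# Iteration 2: v1 now holds the previous step's velocity, so the updates diverge.
momentum_eq1(v1, grads, beta, L)
momentum_eq2(v2, grads, beta, L)
print(round(v1["dW1"], 4), round(v2["dW1"], 4))  # 0.19 0.1
```

If a check only calls the update once, starting from zeros, it would not be able to tell the two versions apart.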

Should it be `v["dW" + str(l-1)]` (note the **l-1**) for l > 1, and just 0 for l = 1, since we take the `beta` part of the LAST momentum and give it a bit more acceleration?
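Writing the iteration counter explicitly may help keep the two indices in this question apart. A sketch, where the notation is an assumption: ⟨t⟩ marks the training iteration and [l] the layer. ⟨t⟩ appears nowhere in the code; there, the "previous" velocity on the right-hand side of (1) is simply whatever `v` held when the function was called.

```latex
% <t> = training iteration, [l] = layer; v^{<0>} = 0
v_{dW^{[l]}}^{\langle t \rangle} = \beta \, v_{dW^{[l]}}^{\langle t-1 \rangle} + (1-\beta) \, dW^{[l]\langle t \rangle}

% Unrolling the recursion from v^{<0>} = 0:
v_{dW^{[l]}}^{\langle t \rangle} = (1-\beta) \sum_{k=1}^{t} \beta^{\,t-k} \, dW^{[l]\langle k \rangle}

% At t = 1 this reduces to (1-\beta) \, dW^{[l]\langle 1 \rangle}, i.e. equation (2)
```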

Am I understanding this correctly?