I suggest rewriting the formula for v_dW and v_db so that it shows how the update acts as momentum for gradient descent.
The formula in the lesson:
v_dW := beta * v_dW + (1 - beta) * dW
W := W - alpha * v_dW
The rewritten formula:
v_dW := dW + beta * (v_dW - dW)
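W := W - alpha * v_dW = W - [ alpha * dW + alpha * beta * (v_dW - dW) ]
(inside the brackets, v_dW is the value from the previous iteration)
So the update is a plain gradient step, alpha * dW, plus a correction, and the component beta * (v_dW - dW) is the momentum term for gradient descent.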
If v_dW < dW: the term is negative, so it acts as a negative acceleration that limits the oscillation
If v_dW > dW: the term is positive, so it acts as a positive acceleration that speeds up convergence
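
A quick numerical check may help here. The sketch below is only an illustration, not code from the lesson: the function names `v_lesson`, `v_rewritten` and `run`, the choice beta = 0.9, the starting value v_dW = 0, and the two made-up scalar gradient sequences are all assumptions. It confirms that the two forms give the same v_dW at every step, and compares a gradient that flips sign each step with one that stays constant.

```python
import math

def v_lesson(v_prev, dW, beta):
    # Form used in the lesson: exponentially weighted average of gradients.
    return beta * v_prev + (1 - beta) * dW

def v_rewritten(v_prev, dW, beta):
    # Rewritten form: current gradient plus a correction beta * (v_prev - dW).
    return dW + beta * (v_prev - dW)

def run(grads, beta=0.9):
    # Feed a sequence of scalar gradients through both forms, starting from v = 0.
    # Returns the final v_dW and whether the two forms agreed at every step.
    v = 0.0
    agree = True
    for dW in grads:
        a = v_lesson(v, dW, beta)
        b = v_rewritten(v, dW, beta)
        agree = agree and math.isclose(a, b, abs_tol=1e-12)
        v = b
    return v, agree

# Hypothetical gradient sequences, chosen only for illustration.
v_oscillating, agree_osc = run([1.0, -1.0] * 50)  # sign flips every step
v_steady, agree_steady = run([1.0] * 50)          # same direction every step

print("forms agree:", agree_osc and agree_steady)
print("final v_dW, oscillating gradient:", v_oscillating)
print("final v_dW, steady gradient:", v_steady)
```

With the oscillating gradient, v_dW stays much smaller in magnitude than the raw gradient, so the step along that direction is damped. With the steady gradient, v_dW climbs toward the gradient value, so the steps stay close to full size in the consistent direction.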