Hi, @MalayAgr.
It’s the same m, the size of the mini-batch. Here’s an interesting discussion about this scaling factor.
When you add momentum, W is still calculated as $W := W - \alpha V_{dW}$. But $V_{dW}$ depends on $dW$, which now has an additional term.
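In case it helps to see this in code, here is a minimal sketch of one momentum step. The formula for the extra term did not survive in this post, so I am assuming it is the L2 regularization term `(lambd / m) * W`; that is a guess based on the scaling factor discussed above, not something the post confirms. The function name `momentum_step` and all the numbers are hypothetical.

```python
import numpy as np

def momentum_step(W, dW, V_dW, alpha=0.01, beta=0.9):
    """One gradient-descent-with-momentum update (illustrative sketch)."""
    # Exponentially weighted average of the gradients.
    V_dW = beta * V_dW + (1 - beta) * dW
    # The update rule for W itself is unchanged: it only sees V_dW.
    W = W - alpha * V_dW
    return W, V_dW

# Made-up values, purely for illustration.
m = 4                              # mini-batch size
lambd = 0.1                        # regularization strength (assumed)
W = np.array([1.0, -2.0])
dW_data = np.array([0.5, 0.5])     # gradient of the unregularized loss

# ASSUMPTION: the "additional term" is the L2 penalty gradient (lambd / m) * W.
dW = dW_data + (lambd / m) * W

V_dW = np.zeros_like(W)
W_new, V_new = momentum_step(W, dW, V_dW)
```

So the extra term never appears in the update for W directly; it enters through `dW` and is then averaged into `V_dW`.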
Hope that helped 